January 1974 | ROBERT A. WAGNER AND MICHAEL J. FISCHER
The string-to-string correction problem involves determining the minimum cost sequence of edit operations needed to transform one string into another. The allowed operations are changing a character, deleting a character, or inserting a character. An efficient algorithm is presented that computes this edit distance in time proportional to the product of the lengths of the two strings. The algorithm has applications in spelling correction and finding the longest common subsequence between two strings.
The edit distance between two strings A and B is defined as the minimum cost of all sequences of edit operations that transform A into B. A cost function is assigned to each edit operation, and the goal is to find the minimum cost sequence. The problem is solved by defining a trace, which is a structure that captures the essential steps of the edit operations without redundancy. Traces have properties that allow the edit distance to be computed efficiently.
The algorithm uses dynamic programming to compute the edit distance. It defines a table D(i, j) representing the minimum cost to transform the first i characters of string A into the first j characters of string B. The value of D(i, j) is computed based on three possibilities: changing the last character of A to the last character of B, deleting the last character of A, or inserting the last character of B. The algorithm runs in O(|A| × |B|) time, where |A| and |B| are the lengths of the two strings.
The algorithm can also be used to find the longest common subsequence between two strings. By assigning a cost of 0 to change operations where characters are the same and 2 otherwise, the edit distance can be used to compute the length of the longest common subsequence. The algorithm is efficient and has applications in various areas, including spelling correction and string comparison.The string-to-string correction problem involves determining the minimum cost sequence of edit operations needed to transform one string into another. The allowed operations are changing a character, deleting a character, or inserting a character. An efficient algorithm is presented that computes this edit distance in time proportional to the product of the lengths of the two strings. The algorithm has applications in spelling correction and finding the longest common subsequence between two strings.
The edit distance between two strings A and B is defined as the minimum cost of all sequences of edit operations that transform A into B. A cost function is assigned to each edit operation, and the goal is to find the minimum cost sequence. The problem is solved by defining a trace, which is a structure that captures the essential steps of the edit operations without redundancy. Traces have properties that allow the edit distance to be computed efficiently.
The algorithm uses dynamic programming to compute the edit distance. It defines a table D(i, j) representing the minimum cost to transform the first i characters of string A into the first j characters of string B. The value of D(i, j) is computed based on three possibilities: changing the last character of A to the last character of B, deleting the last character of A, or inserting the last character of B. The algorithm runs in O(|A| × |B|) time, where |A| and |B| are the lengths of the two strings.
The algorithm can also be used to find the longest common subsequence between two strings. By assigning a cost of 0 to change operations where characters are the same and 2 otherwise, the edit distance can be used to compute the length of the longest common subsequence. The algorithm is efficient and has applications in various areas, including spelling correction and string comparison.