October 1996; Revised October 1997 | Eric Sven Ristad, Peter N. Yianilos
This report by Eric Sven Ristad and Peter N. Yianilos introduces a stochastic model for string edit distance, which allows string edit distances to be learned automatically from a corpus of examples. The authors demonstrate the approach on the challenging problem of learning the pronunciation of words in conversational speech, where the trained model achieves one fourth the error rate of the untrained Levenshtein distance. The report covers the following key points:
1. **String Edit Distance**: The report defines string edit distance and its variants, including the Viterbi edit distance and the stochastic edit distance.
2. **Stochastic Model**: It presents a stochastic interpretation of string edit distance, modeling it as a memoryless stochastic transduction between underlying and surface strings.
3. **Learning Algorithm**: An efficient algorithm is provided to learn the primitive edit costs from a corpus of string pairs using the Expectation-Maximization (EM) framework.
4. **String Classification**: The report extends the stochastic model to string classification problems, using it to learn a powerful string classifier from a corpus of labeled strings.
5. **Application to Pronunciation Recognition**: The techniques are applied to the Switchboard corpus of conversational speech, achieving superior performance over the Levenshtein distance.
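To make the distinction between the three distances in points 1 and 2 concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual formulation of a memoryless transducer: each insertion, deletion, and substitution has a probability, a separate termination probability ends the string pair, and all of these sum to one. The helper names (`toy_delta`, `path_score`) and the hand-picked weights are hypothetical; in the report the edit probabilities are learned with EM, which this sketch does not attempt.

```python
import math
from itertools import product


def levenshtein(x, y):
    """Classic unit-cost edit distance (the untrained baseline)."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def toy_delta(alphabet, p_end=0.1, match_weight=10.0):
    """Hypothetical hand-picked edit probabilities (NOT learned values).

    Keys are (underlying_symbol, surface_symbol); "" marks an empty side.
    The edit probabilities plus p_end sum to one.
    """
    weights = {}
    for a in alphabet:
        weights[(a, "")] = 1.0  # deletion
        weights[("", a)] = 1.0  # insertion
    for a, b in product(alphabet, repeat=2):
        weights[(a, b)] = match_weight if a == b else 1.0  # substitution
    total = sum(weights.values())
    delta = {op: (1.0 - p_end) * w / total for op, w in weights.items()}
    return delta, p_end


def path_score(x, y, delta, p_end, combine):
    """Dynamic program over edit sequences that turn x into y.

    combine=sum adds up all edit sequences (forward probability);
    combine=max keeps only the single most probable one.
    """
    n, m = len(x), len(y)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            terms = []
            if i > 0:  # delete x[i-1]
                terms.append(delta.get((x[i - 1], ""), 0.0) * table[i - 1][j])
            if j > 0:  # insert y[j-1]
                terms.append(delta.get(("", y[j - 1]), 0.0) * table[i][j - 1])
            if i > 0 and j > 0:  # substitute x[i-1] -> y[j-1]
                terms.append(delta.get((x[i - 1], y[j - 1]), 0.0)
                             * table[i - 1][j - 1])
            table[i][j] = combine(terms)
    return table[n][m] * p_end  # account for the termination event


def stochastic_distance(x, y, delta, p_end):
    """Negative log of the total probability of the pair (x, y)."""
    return -math.log(path_score(x, y, delta, p_end, sum))


def viterbi_distance(x, y, delta, p_end):
    """Negative log probability of the single best edit sequence."""
    return -math.log(path_score(x, y, delta, p_end, max))


if __name__ == "__main__":
    delta, p_end = toy_delta("abcd")
    print(levenshtein("kitten", "sitting"))  # 3
    print(stochastic_distance("abc", "abd", delta, p_end))
    print(viterbi_distance("abc", "abd", delta, p_end))
```

Because the stochastic distance sums over every edit sequence while the Viterbi distance keeps only the best one, the stochastic distance is never larger than the Viterbi distance for the same parameters.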
The authors conclude that their statistical techniques can potentially obviate the need for manual creation of pronouncing lexicons and enable accurate recognition of new words from a single example.