Word Association Norms, Mutual Information, and Lexicography

Kenneth Ward Church, Patrick Hanks
This paper introduces a new objective measure for estimating word association norms, based on mutual information, that can be derived directly from computer-readable corpora. The proposed measure, called the association ratio, provides a statistical description of linguistic phenomena, including semantic relations and co-occurrence constraints between words. It is more objective and less costly than the subjective methods traditionally used in psycholinguistic research.

The association ratio is based on the information-theoretic concept of mutual information, which measures the dependence between two words. It compares the joint probability of two words appearing together with the product of their individual probabilities, which is the co-occurrence rate expected if the two words were independent. The measure can be used to identify interesting word associations, such as the strong association between "save" and "from," and to help lexicographers organize concordances.

Practical applications include improving speech and optical character recognition, disambiguating syntactic structures, retrieving texts from large databases, and enhancing the productivity of computational linguists and lexicographers. The measure can also identify interesting lexico-syntactic relationships between verbs and their typical arguments and adjuncts.

The association ratio is particularly useful in lexicography, where it can help identify semantic classes, support a more systematic approach to analyzing concordances, and surface important associations that may not be obvious with traditional methods. However, it is limited in its ability to capture semantic meaning and may require additional preprocessing to highlight natural similarities between words. Overall, the association ratio gives lexicographers a practical and objective measure for organizing concordances and identifying important word associations.
Despite its limitations, it has the potential to be an important aid in the field of lexicography.
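
As a rough illustration of the calculation summarized above, the following Python sketch scores ordered word pairs with log2(P(x, y) / (P(x) P(y))), where each probability is a plain relative frequency. The function name `association_ratios`, the `window` and `min_count` parameters, the way co-occurrence is counted, and the toy sentence are all assumptions made for this example; they are not taken from the summary and may differ from the paper's exact estimation procedure.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5, min_count=1):
    """Score ordered pairs (x, y) with log2(P(x, y) / (P(x) * P(y))).

    Assumptions of this sketch (hypothetical, not from the summary):
    - P(x) and P(y) are word counts divided by the corpus size.
    - P(x, y) counts how often y follows x within `window` tokens,
      where x itself occupies the first position of the window.
    - Pairs seen fewer than `min_count` times are skipped.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, x in enumerate(tokens):
        # y ranges over the (window - 1) tokens that follow x
        for y in tokens[i + 1 : i + window]:
            pair_freq[(x, y)] += 1

    scores = {}
    for (x, y), f_xy in pair_freq.items():
        if f_xy < min_count:
            continue
        p_xy = f_xy / n
        p_x = word_freq[x] / n
        p_y = word_freq[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy text invented for illustration; real use needs a large corpus.
toy = "save us from harm save them from ruin they ran from home".split()
ranked = sorted(association_ratios(toy, window=3).items(),
                key=lambda item: item[1], reverse=True)
for (x, y), score in ranked[:5]:
    print(f"{x} -> {y}: {score:.2f}")
```

Positive scores mark pairs that co-occur more often than independence would predict, and negative scores mark pairs that co-occur less often. On a corpus this small the scores are dominated by rare words, which is one reason a minimum-count threshold such as the hypothetical `min_count` is worth considering in practice.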