This paper discusses the use of normalized mutual information (NMI) and normalized pointwise mutual information (NPMI) in collocation extraction. Collocation extraction is the task of identifying word combinations that show idiosyncratic distribution patterns. The paper introduces normalized variants of MI and PMI to improve interpretability and reduce sensitivity to occurrence frequency, and presents an empirical study evaluating the effectiveness of these measures.
Mutual information (MI) measures the information overlap between two random variables, while pointwise mutual information (PMI) measures, for a single pair of outcomes, the log ratio of the observed co-occurrence probability to the probability expected under independence. PMI is sensitive to low-frequency data and tends to assign inflated scores to rare word pairs. Normalized PMI (NPMI) addresses this by rescaling PMI so that its maximum value is 1, which gives scores a fixed, interpretable range and reduces their sensitivity to occurrence frequency. Similarly, normalized MI (NMI) rescales MI to a maximum of 1, making it easier to interpret as a measure of the degree of dependence between the two variables.
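As a minimal sketch of how these pointwise scores might be computed, the snippet below follows the commonly cited normalization npmi(x, y) = pmi(x, y) / (-log p(x, y)), which caps scores at 1; the function names and example probabilities are illustrative, not taken from the paper.

```python
import math

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information: log ratio of the observed
    co-occurrence probability to the product of the marginals."""
    return math.log2(p_xy / (p_x * p_y))

def npmi(p_xy, p_x, p_y):
    """Normalized PMI: divide PMI by -log2 p(x, y), its upper bound
    for this pair. Scores then fall in [-1, 1]: below 0 for pairs
    rarer than chance, 0 at independence, 1 for pairs that only
    ever occur together."""
    return pmi(p_xy, p_x, p_y) / -math.log2(p_xy)

# In practice the probabilities would be maximum-likelihood
# estimates from corpus counts, e.g. p_xy = count(x, y) / N.
print(round(npmi(0.1, 0.1, 0.1), 3))   # perfect co-occurrence -> 1.0
print(round(npmi(0.01, 0.1, 0.1), 3))  # independence -> 0.0
```

Note that a low-frequency pair that always co-occurs gets a very high raw PMI (its upper bound -log p(x, y) grows as the pair gets rarer), whereas its NPMI is capped at 1, which is the frequency-sensitivity issue the normalization addresses.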
The paper compares the performance of NPMI and NMI against traditional PMI and MI on three datasets. The results show that NPMI performs slightly better than PMI, particularly where low-frequency pairs are prevalent. NMI, on the other hand, behaves more like a pointwise measure and can be less effective in some scenarios. The study suggests that NPMI may be an effective replacement for PMI in collocation extraction tasks.
The paper concludes that while normalized measures offer advantages in interpretability and sensitivity, their effectiveness depends on the specific task and data. Further empirical studies are needed to determine the best measures for different collocation extraction tasks. The paper also suggests that alternative normalization strategies may be useful in future research.