Automatic Recognition of Multi-Word Terms: the C-value/NC-value Method

Katerina Frantzi, Sophia Ananiadou, Hideki Mima
The paper introduces the C-value/NC-value method for automatically extracting multi-word terms from machine-readable special-language corpora. The method combines linguistic and statistical information. C-value improves on plain frequency ranking by accounting for nested terms, that is, candidate terms that also occur inside longer candidate terms. It uses part-of-speech tagging and linguistic filters to select candidate strings, then ranks them with statistical features. NC-value extends C-value by incorporating context words and their weights to re-rank the extracted terms. The method is applied to a medical corpus and evaluated with precision and recall. In that evaluation, C-value performs better than frequency-based ranking, and NC-value outperforms both, giving better term ranking and higher precision while extracting real terms with less noise.
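
The two scores described above can be illustrated with a minimal sketch. It assumes the usual formulation of the method: a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, and the result is scaled by the base-2 logarithm of its length in words. The function names (`contains`, `c_values`, `nc_values`), the toy frequencies, the context counts, and the context-word weights are all hypothetical and chosen for illustration only. The linguistic filter and any cut-off thresholds are omitted, and the 0.8/0.2 split in `nc_values` is an assumed default exposed as a parameter.

```python
import math


def contains(longer, shorter):
    """True if `shorter` is a contiguous word sequence strictly inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_values(freq):
    """Score candidates; `freq` maps a tuple of words to its corpus frequency."""
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of the longer candidates in which `a` appears nested.
        nesting = [f_b for b, f_b in freq.items() if contains(b, a)]
        if nesting:
            # Discount by the mean frequency of the containing candidates.
            adjusted = f_a - sum(nesting) / len(nesting)
        else:
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def nc_values(c_scores, context_counts, context_weight, alpha=0.8, beta=0.2):
    """Re-rank by mixing the C-value with a weighted sum over context words.

    context_counts[a][w] is how often word w appears as context of term a;
    context_weight[w] is a precomputed weight for w (taken as given here).
    """
    result = {}
    for a, c in c_scores.items():
        ctx = context_counts.get(a, {})
        ctx_score = sum(n * context_weight.get(w, 0.0) for w, n in ctx.items())
        result[a] = alpha * c + beta * ctx_score
    return result


# Toy input: made-up frequencies, not taken from the paper's corpus.
freq = {
    ("soft", "contact", "lens"): 8,
    ("hard", "contact", "lens"): 4,
    ("contact", "lens"): 20,
}
scores = c_values(freq)

# Made-up context information for a single term.
context_counts = {("contact", "lens"): {"wear": 3}}
context_weight = {"wear": 0.5}
reranked = nc_values(scores, context_counts, context_weight)
```

In this toy input, "contact lens" occurs 20 times but is nested in two longer candidates with a mean frequency of 6, so its adjusted frequency is 14. The unnested three-word candidates keep their full frequency and gain the larger length factor.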