Untangling Text Data Mining

Marti A. Hearst
The paper examines the challenges and opportunities of text data mining (TDM) and distinguishes it from information access and computational linguistics. Information access, including information retrieval, focuses on retrieving information that is already known, whereas TDM aims to discover previously unknown information from text collections. The author argues that TDM is not merely an extension of data mining but a distinct field with its own applications, and stresses that trends and patterns uncovered in text can lead to new discoveries.

The paper also considers the role of computational linguistics in TDM: it contributes to language analysis but does not necessarily lead to broader discoveries. Examples of TDM applications include using text to form hypotheses about disease and analyzing patent texts to uncover social impacts. The LINDI project is presented as an example of TDM tools that support exploratory data analysis. The paper concludes that combining computational and user-driven approaches can lead to significant advances in TDM, enabling the discovery of new information from large text collections.