Understanding Web mining research: a survey

This paper surveys research in the area of Web mining. It highlights the confusion surrounding the term "Web mining" and proposes three categories: Web content mining, Web structure mining, and Web usage mining. The paper discusses the relationship between these categories and the agent paradigm, focusing on representation issues, processes, learning algorithms, and applications. Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services. It is a research area in which several communities converge, including databases, information retrieval, and AI, particularly machine learning and natural language processing. Web mining is closely related to information retrieval (IR) and information extraction (IE), but differs in its focus on the structure and usage of the Web. Web content mining deals with extracting useful information from Web content, Web structure mining focuses on the link structure of the Web, and Web usage mining analyzes user behavior on the Web. The paper also discusses the connection between Web mining and machine learning, noting that while mining is not a perfect metaphor for knowledge discovery, Web mining is an important application of machine learning on the Web. The paper concludes with a discussion of research issues and future directions in Web mining.

Web Mining Research: A Survey

July 2000 | Raymond Kosala, Hendrik Blockeel