The paper addresses the challenge of automatically extracting metadata from unstructured hypertext databases to enable structured search, resolve keyword ambiguity, and improve the quality of search and filtering. The authors propose robust statistical models and a relaxation labeling technique that exploits link information in small neighborhoods around each document to improve classification accuracy. Experiments on pre-classified samples from Yahoo! and the US Patent Database show significant improvements over traditional text-only classification: the hypertext classifier reduces classification error from 36% to 21% on the patent corpus and from 68% to 21% on the Yahoo! sample, demonstrating the approach's effectiveness on diverse and fragmented web content. The method also degrades gracefully as the fraction of pre-classified documents in a neighborhood varies, making it robust and practical for real-world applications.
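To make the relaxation labeling idea concrete, the sketch below shows a minimal, generic version of the technique: each document starts with a text-only class prior, and labels are iteratively re-estimated using the current class estimates of linked neighbors. The function name, the `coupling` matrix (class-to-class link affinities), and the toy data layout are all illustrative assumptions, not the paper's actual model or parameters.

```python
def relaxation_labeling(priors, neighbors, coupling, iters=10):
    """Iteratively refine class probabilities using neighbor labels.

    priors    -- {node: {cls: P(cls | text)}}  text-only class priors
    neighbors -- {node: [linked nodes]}        link graph (assumed given)
    coupling  -- {cls: {cls2: affinity}}       hypothetical class-coupling
                                               weights, not from the paper
    """
    labels = {n: dict(p) for n, p in priors.items()}
    for _ in range(iters):
        new = {}
        for n, prior in priors.items():
            scores = {}
            for c, p in prior.items():
                s = p  # start from the text-only prior
                for m in neighbors.get(n, []):
                    # weight by the neighbor's current soft label
                    s *= sum(coupling[c][c2] * labels[m][c2] for c2 in prior)
                scores[c] = s
            z = sum(scores.values()) or 1.0
            new[n] = {c: v / z for c, v in scores.items()}  # renormalize
        labels = new
    return labels
```

On a toy graph, an ambiguous document (e.g. a 50/50 text prior) linked to two confidently labeled neighbors of the same class is pulled toward that class after a few iterations, which is the qualitative behavior the paper exploits.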