Hierarchical Attention Networks for Document Classification


June 12-17, 2016 | Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, Eduard Hovy
This paper proposes a Hierarchical Attention Network (HAN) for document classification. The model has two distinguishing features: a hierarchical structure that mirrors the natural structure of documents (words form sentences, sentences form a document), and two levels of attention mechanisms, applied at the word and sentence levels, that let the model attend differentially to more and less important content when constructing the document representation.

The architecture consists of a word sequence encoder, a word-level attention layer, a sentence encoder, and a sentence-level attention layer. The word encoder is a bidirectional GRU that produces an annotation for each word; a word-level attention mechanism then weights the informative words and aggregates their annotations into a sentence vector. The sentence encoder is a second bidirectional GRU that encodes the sequence of sentence vectors, and a sentence-level attention mechanism aggregates its annotations into a document vector, which is fed to a softmax classifier.
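At the word level, the attention takes the following form (notation as in the paper): each word annotation $h_{it}$ is projected through a one-layer MLP, scored against a learned word-level context vector $u_w$, and the sentence vector $s_i$ is the attention-weighted sum of the annotations.

```latex
u_{it} = \tanh(W_w h_{it} + b_w), \qquad
\alpha_{it} = \frac{\exp(u_{it}^{\top} u_w)}{\sum_{t'} \exp(u_{it'}^{\top} u_w)}, \qquad
s_i = \sum_{t} \alpha_{it} h_{it}
```

The sentence-level attention is analogous, with a sentence context vector $u_s$ producing the document vector $v$ from the sentence annotations.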
The model is evaluated on six large-scale text classification datasets: Yelp reviews (2013, 2014, and 2015), IMDB reviews, Yahoo Answers, and Amazon reviews. HAN outperforms previous methods on all six tasks, with the best results achieved by the full HN-ATT variant, which combines the hierarchical structure with both levels of attention.

Visualizations of the attention layers show that the model captures context-dependent word importance: the attention weight distribution for a word such as "good" concentrates on high weights in positive reviews and low weights in negative ones, and the reverse holds for "bad". The same visualizations show the model selecting the informative sentences and words within a document, suggesting that the attention layers, and not only the hierarchical encoders, contribute to the improved classification performance.
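To make the two-level architecture concrete, here is a minimal sketch in PyTorch. The framework choice, the module names (AttentionPool, HAN), and the default sizes are illustrative assumptions, not the authors' code; the hidden size of 50 per GRU direction and 200-dimensional embeddings follow the settings reported in the paper.

```python
# Minimal sketch of a hierarchical attention network (illustrative, not the authors' code).
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Attention pooling: score each timestep against a learned context vector."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)                 # W, b of the one-layer MLP
        self.context = nn.Parameter(torch.randn(dim))   # learned context vector u

    def forward(self, h):                               # h: (batch, steps, dim)
        u = torch.tanh(self.proj(h))                    # u_it = tanh(W h_it + b)
        scores = u @ self.context                       # similarity to context, (batch, steps)
        alpha = torch.softmax(scores, dim=1)            # attention weights
        return (alpha.unsqueeze(-1) * h).sum(dim=1)     # weighted sum -> (batch, dim)

class HAN(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_dim=200, hidden=50):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.word_gru = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(2 * hidden)
        self.sent_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.sent_attn = AttentionPool(2 * hidden)
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, docs):                            # docs: (batch, sents, words) word ids
        b, s, w = docs.shape
        x = self.emb(docs.view(b * s, w))               # embed every sentence independently
        h, _ = self.word_gru(x)                         # word annotations
        sent_vecs = self.word_attn(h).view(b, s, -1)    # sentence vectors s_i
        hs, _ = self.sent_gru(sent_vecs)                # sentence annotations
        doc_vec = self.sent_attn(hs)                    # document vector v
        return self.out(doc_vec)                        # logits for the softmax classifier
```

A quick shape check, with hypothetical sizes:

```python
model = HAN(vocab_size=50_000, num_classes=5)
docs = torch.randint(1, 50_000, (8, 10, 30))  # 8 documents, 10 sentences, 30 words each
logits = model(docs)                          # -> torch.Size([8, 5])
```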