June 12-17, 2016 | Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, Eduard Hovy
The paper introduces a Hierarchical Attention Network (HAN) for document classification, designed to capture the hierarchical structure of documents and to attend to differentially important content at both the word and sentence levels. The model is evaluated on six large-scale text classification tasks, demonstrating significant improvements over previous methods. The HAN architecture comprises a word sequence encoder, a word-level attention layer, a sentence encoder, and a sentence-level attention layer. The word and sentence attention mechanisms allow the model to focus on more informative words and sentences, improving its performance.

Visualizations of the attention layers show that the model selects qualitatively informative words and sentences, providing insight into its classification decisions. The paper also discusses the context-dependent nature of word importance and presents experiments verifying that the model captures diverse contexts and assigns context-dependent weights to words. Overall, the HAN outperforms a range of baselines, including traditional approaches and neural network-based methods, on both sentiment estimation and topic classification tasks.
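The word-level and sentence-level attention layers in the paper share the same pooling form: project each encoder annotation through a one-layer MLP, score it against a learned context vector, softmax the scores, and take the weighted sum of the annotations. A minimal NumPy sketch of that pooling step (the dimensions, random initialization, and variable names here are illustrative assumptions, not the paper's actual implementation or trained parameters):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(H, W, b, u_ctx):
    """Attention pooling in the HAN style:
    u_t = tanh(W h_t + b); alpha_t = softmax(u_t . u_ctx); s = sum_t alpha_t h_t.
    H: (T, d) encoder annotations; returns pooled vector (d,) and weights (T,)."""
    U = np.tanh(H @ W + b)       # (T, d_a) hidden representations of annotations
    alpha = softmax(U @ u_ctx)   # (T,) normalized importance weights
    return alpha @ H, alpha      # weighted sum over time steps

# Illustrative shapes: 5 word annotations of dimension 8, attention size 6.
rng = np.random.default_rng(0)
T, d, d_a = 5, 8, 6
H = rng.standard_normal((T, d))       # word encoder outputs h_t
W = rng.standard_normal((d, d_a))     # projection weights (hypothetical init)
b = np.zeros(d_a)
u_ctx = rng.standard_normal(d_a)      # learned context vector (u_w in the paper)

s, alpha = attention_pool(H, W, b, u_ctx)
```

The same function would be applied twice in a full model: once over word annotations to form each sentence vector, and once over sentence annotations to form the document vector, with separate parameters at each level.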