20 Sep 2016 | Jianpeng Cheng, Li Dong and Mirella Lapata
This paper introduces a machine reading simulator that improves sequence-level networks' ability to handle structured input. The model processes text incrementally from left to right, using memory and attention for shallow reasoning. It replaces the single memory cell in Long Short-Term Memory (LSTM) with a memory network, enabling adaptive memory usage and weakly inducing relations among tokens. The system is initially designed for single-sequence processing but can be integrated with an encoder-decoder architecture. Experiments on language modeling, sentiment analysis, and natural language inference show that the model matches or outperforms state-of-the-art results.
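To make the core idea concrete, the sketch below reconstructs a single LSTMN step in PyTorch from the paper's description: an intra-attention module scores all previously stored hidden and cell states (the "tapes"), and the resulting adaptive summaries stand in for the previous hidden and cell state in the usual LSTM gates. This is a minimal illustration, not the authors' code; the class name LSTMNCell and the tape/summary variable names are assumptions made for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMNCell(nn.Module):
    """One step of an LSTMN-style cell: attention over all past states
    replaces the single recurrent memory of a standard LSTM (illustrative)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        # scoring network for intra-attention over the memory tape
        self.W_h = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_x = nn.Linear(input_size, hidden_size, bias=False)
        self.W_htilde = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)
        # standard LSTM gates, driven by the attended summary instead of h_{t-1}
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x_t, hidden_tape, memory_tape, h_tilde_prev):
        # hidden_tape / memory_tape hold all previous h_i / c_i: (t, batch, hidden)
        scores = self.v(torch.tanh(
            self.W_h(hidden_tape) + self.W_x(x_t) + self.W_htilde(h_tilde_prev)))
        attn = F.softmax(scores, dim=0)          # soft links to earlier tokens
        h_tilde = (attn * hidden_tape).sum(0)    # adaptive hidden summary
        c_tilde = (attn * memory_tape).sum(0)    # adaptive memory summary

        i, f, o, c_hat = self.gates(torch.cat([h_tilde, x_t], dim=-1)).chunk(4, dim=-1)
        c_t = torch.sigmoid(f) * c_tilde + torch.sigmoid(i) * torch.tanh(c_hat)
        h_t = torch.sigmoid(o) * torch.tanh(c_t)
        return h_t, c_t, h_tilde, attn.squeeze(-1)
```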
The model addresses three challenges faced by sequence-level networks: gradients that vanish or explode during training, the compression of an entire sequence into a single memory cell, and the lack of any mechanism for handling the structure of the input. It incorporates memory and attention to strengthen memorization and to discover relations among tokens. The resulting Long Short-Term Memory-Network (LSTMN) is a reading simulator that can be applied to sequence processing tasks. It processes text incrementally, learning which past tokens are relevant to the current token and how they relate to it, and induces undirected relations among tokens as an intermediate step in learning representations.
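The per-step attention distributions are what the paper interprets as weakly induced, soft relations between the current token and its predecessors. A hedged usage sketch of the cell above, unrolling it over a toy sequence and collecting those distributions; the zero "dummy" first tape slot, the toy shapes, and the soft_relations list are simplifications for illustration, not part of the original model:

```python
import torch

cell = LSTMNCell(input_size=32, hidden_size=64)
tokens = torch.randn(6, 1, 32)           # (seq_len, batch, input): a toy sentence
hidden_tape = torch.zeros(1, 1, 64)      # dummy slot so step 1 has something to attend to
memory_tape = torch.zeros(1, 1, 64)
h_tilde = torch.zeros(1, 64)

soft_relations = []                      # per-step attention = induced soft token links
for x_t in tokens:
    h_t, c_t, h_tilde, attn = cell(x_t, hidden_tape, memory_tape, h_tilde)
    soft_relations.append(attn)          # (t, batch): weights over earlier tokens
    # grow the tapes with the newly computed hidden and cell states
    hidden_tape = torch.cat([hidden_tape, h_t.unsqueeze(0)], dim=0)
    memory_tape = torch.cat([memory_tape, c_t.unsqueeze(0)], dim=0)
```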
The LSTMN is evaluated on language modeling, sentiment analysis, and natural language inference. It performs comparably to or better than state-of-the-art models and consistently outperforms vanilla LSTMs. Because it uses attention, the relations it captures between tokens are soft and differentiable, in contrast to shift-reduce type models whose intermediate decisions are hard and discrete. The LSTMN captures undirected lexical relations and is therefore distinct from work on dependency grammar induction.
The model is also combined with an encoder-decoder architecture to handle two sequences, as in machine translation and textual entailment, using intra-attention within each sequence and inter-attention between the two. Tested on the Penn Treebank, the Stanford Sentiment Treebank, and SNLI, it performs strongly across tasks: the LSTMN outperforms LSTM baselines and reaches state-of-the-art performance on natural language inference. Performance is measured with perplexity for language modeling and accuracy for sentiment analysis and natural language inference. The paper attributes this success to the model's ability to capture long-term dependencies and to handle structured input, and concludes that the LSTMN is a promising approach for machine reading and sequence processing tasks.
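A possible shape of the inter-attention component is sketched below, assuming a simple additive scoring function: at each decoding step the current state attends over the encoder's hidden tape, and the resulting context summarizes the other sequence. Module and variable names are assumptions rather than the authors' implementation, and deep attention fusion (feeding attended cell states back into the recurrence) is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterAttention(nn.Module):
    """Attention from a decoder state over the encoder's hidden tape (illustrative)."""

    def __init__(self, hidden_size):
        super().__init__()
        self.W_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, encoder_tape, h_dec):
        # encoder_tape: (src_len, batch, hidden); h_dec: (batch, hidden)
        scores = self.v(torch.tanh(self.W_enc(encoder_tape) + self.W_dec(h_dec)))
        attn = F.softmax(scores, dim=0)          # soft alignment to source tokens
        context = (attn * encoder_tape).sum(0)   # inter-attention summary
        return context, attn.squeeze(-1)

# toy usage: a premise encoded into a hidden tape, a hypothesis-side state attending to it
inter = InterAttention(hidden_size=64)
premise_tape = torch.randn(7, 1, 64)
h_dec = torch.randn(1, 64)
context, alignment = inter(premise_tape, h_dec)  # context: (1, 64)
```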