26 Aug 2015 | Tsung-Hsien Wen, Milica Gašić, Nikola Mrkšić, Pei-Hao Su, David Vandyke and Steve Young
This paper presents a statistical natural language generation (NLG) system based on a semantically controlled Long Short-term Memory (LSTM) recurrent neural network. The system is designed to generate natural, varied responses for spoken dialogue systems. Unlike traditional rule-based or heuristic-based NLG systems, which produce rigid and stylized responses, the proposed LSTM-based generator learns from unaligned data by jointly optimizing sentence planning and surface realization using a simple cross-entropy training criterion. It can generate diverse language by sampling from output candidates, and it requires fewer heuristics, leading to improved performance compared to previous methods.
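To make the training criterion concrete, the objective is roughly of the following form (a sketch in assumed notation, not copied verbatim from the paper): the generator minimizes the per-word cross-entropy between the predicted distribution y_t and the reference p_t, plus penalties that push the DA control vector d_t to be fully consumed by the end of the sentence and to decay gradually rather than all at once:

$$
F(\theta) \approx -\sum_{t} \mathbf{p}_t^{\top} \log \mathbf{y}_t \;+\; \|\mathbf{d}_T\| \;+\; \sum_{t=0}^{T-1} \eta\, \xi^{\|\mathbf{d}_{t+1}-\mathbf{d}_t\|}
$$

Here d_T is whatever remains of the DA vector after the last word (non-zero entries mean unrendered slots), and η and ξ are hyperparameters weighting the smoothness penalty; the exact regularizers follow the paper's description only approximately.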
The system is built around a semantically controlled LSTM (SC-LSTM) cell, which couples the sentence planning and surface realization components within a single recurrent unit. Alongside the standard LSTM gates, the cell keeps the dialogue act (DA) representation in a dedicated DA cell and uses a sigmoid reading gate to gradually consume it as words are generated, steering the output so that the required semantic content is expressed. The architecture is extended to a deep structure, allowing more complex features to be learned, and a backward LSTM reranker is introduced to improve fluency: the system is trained and decoded using a combination of forward and backward generation processes.
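To make the cell mechanics concrete, below is a minimal numpy sketch of a single SC-LSTM step. The parameter names (W_i, W_f, W_o, W_c, W_wr, W_hr, W_d), the alpha scaling on the reading gate, and the omission of bias terms are illustrative assumptions rather than the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sc_lstm_step(w_t, h_prev, c_prev, d_prev, params, alpha=0.5):
    """One SC-LSTM step (sketch, biases omitted).
    w_t: current word embedding; h_prev, c_prev: previous hidden/cell state;
    d_prev: remaining dialogue-act (DA) vector; params: dict of weight matrices."""
    p = params
    x = np.concatenate([w_t, h_prev])           # shared input to the standard gates

    i = sigmoid(p["W_i"] @ x)                   # input gate
    f = sigmoid(p["W_f"] @ x)                   # forget gate
    o = sigmoid(p["W_o"] @ x)                   # output gate
    c_hat = np.tanh(p["W_c"] @ x)               # candidate cell update

    # Reading gate: decides how much of the remaining DA vector to consume now.
    r = sigmoid(p["W_wr"] @ w_t + alpha * (p["W_hr"] @ h_prev))
    d = r * d_prev                              # DA vector decays as slots are realized

    # Standard LSTM cell update plus a contribution from the DA cell.
    c = f * c_prev + i * c_hat + np.tanh(p["W_d"] @ d)
    h = o * np.tanh(c)
    return h, c, d
```

The word distribution at each step would then come from a softmax over h, and generation stops when an end-of-sentence token is produced; ideally d has decayed to (near) zero by then, meaning every slot in the DA has been rendered.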
The system is evaluated on two domains: restaurants and hotels. Objective metrics such as BLEU-4 and slot error rate (ERR) show that the proposed method outperforms several baselines, including handcrafted generators, k-nearest neighbor (kNN), and class-based language models. Human evaluations also confirm that the SC-LSTM system is preferred for its informativeness and naturalness.
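As an illustration of the slot error rate, the sketch below counts missing and redundant delexicalised slot tokens and normalizes by the number of slots the dialogue act requires; the token names are hypothetical and the paper's exact counting rules (e.g. for binary slots) may differ.

```python
from collections import Counter

def slot_error_rate(required_slots, realized_slots):
    """Rough slot error rate: (missing + redundant) / number of required slots.
    Both arguments are lists of delexicalised slot tokens such as "SLOT_NAME"
    (hypothetical names, for illustration only)."""
    req, real = Counter(required_slots), Counter(realized_slots)
    missing = sum((req - real).values())     # requested by the DA but absent from the output
    redundant = sum((real - req).values())   # present in the output but not requested by the DA
    return (missing + redundant) / max(len(required_slots), 1)

# Example: the DA asks for a name and a food type; the output repeats the name
# and drops the food slot -> one missing + one redundant over two required = 1.0.
print(slot_error_rate(["SLOT_NAME", "SLOT_FOOD"], ["SLOT_NAME", "SLOT_NAME"]))
```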
The system is trained on unaligned data and can be scaled to multiple domains and languages. It uses a deep architecture with multiple LSTM layers and skip connections to mitigate the vanishing gradient problem. Dropout is applied to prevent overfitting, and the system is trained using backpropagation through time. The system's ability to learn and adapt to different dialogue acts and slot-value pairs makes it more flexible and effective in generating natural language responses. The proposed approach represents a significant advancement in NLG for spoken dialogue systems, offering a more natural and varied response generation method.
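As a rough picture of how the deep variant fits together, the sketch below stacks several SC-LSTM layers (reusing sc_lstm_step from the earlier sketch), applies dropout only to the non-recurrent layer-to-layer connections, and wires every layer's hidden state into the output softmax via skip connections so gradients reach the lower layers directly. How the DA vector is threaded through the layers is simplified here; this is an assumption for illustration, not the paper's exact wiring.

```python
import numpy as np

rng = np.random.default_rng(0)

def deep_sc_lstm_step(w_t, states, d_prev, layer_params, W_out, p_drop=0.25):
    """One time step through a stack of SC-LSTM layers (sketch).
    states: list of (h_prev, c_prev) per layer; layer_params: list of weight dicts;
    W_out: output projection applied to the concatenated skip connections."""
    x, d = w_t, d_prev
    new_states, skip = [], []
    for (h_prev, c_prev), params in zip(states, layer_params):
        h, c, d = sc_lstm_step(x, h_prev, c_prev, d, params)  # cell from the earlier sketch
        new_states.append((h, c))
        skip.append(h)                                        # skip connection to the output layer
        # Dropout only on the non-recurrent connection feeding the next layer.
        mask = (rng.random(h.shape) > p_drop) / (1.0 - p_drop)
        x = h * mask
    logits = W_out @ np.concatenate(skip)
    y = np.exp(logits - logits.max())
    return y / y.sum(), new_states, d                         # word distribution, new states, remaining DA
```

At decoding time dropout would be disabled (p_drop = 0), and the full forward pass would be unrolled over time and trained with backpropagation through time as described above.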