2024 | Urchade Zaratiana, Nadi Tomeh, Pierre Holat, Thierry Charnois
This paper introduces an autoregressive text-to-graph framework for joint entity and relation extraction from unstructured text. The authors propose a novel approach that generates a linearized graph where nodes represent text spans and edges represent relation triplets, using a transformer encoder-decoder architecture with a pointing mechanism on a dynamic vocabulary of spans and relation types. This method captures the structural characteristics and boundaries of entities and relations while grounding the output in the original text. The model is evaluated on benchmark datasets (CoNLL 2004, SciERC, and ACE 05) and demonstrates competitive performance, achieving state-of-the-art results on CoNLL 2004 and SciERC. The paper also discusses the effectiveness of the pointing mechanism, the impact of various hyperparameters, and the interpretability of the model's attention maps and learned structure embeddings. The code for the model is available at https://github.com/urchade/ATG.
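
To make the "linearized graph" idea concrete, here is a minimal Python sketch of how entity spans and relation triplets could be serialized into a flat target sequence. The `Span` class, the `linearize` function, and the token format are illustrative assumptions, not the authors' exact serialization; the real model decodes pointer indices into a dynamic vocabulary of candidate spans and relation types rather than strings (see https://github.com/urchade/ATG for the actual implementation).

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Span:
    start: int   # token index of span start
    end: int     # token index of span end (inclusive)
    label: str   # entity type

def linearize(tokens: List[str],
              entities: List[Span],
              relations: List[Tuple[int, int, str]]) -> List[str]:
    """Serialize entities and relations into a flat target sequence.

    Each entity becomes a node element grounded in the source tokens;
    each relation becomes a (head span, relation type, tail span) edge.
    A decoder with a pointing mechanism would emit indices into a
    dynamic vocabulary of spans instead of free-form text.
    """
    sequence: List[str] = []
    # Nodes: text spans with their boundaries and entity types.
    for ent in entities:
        surface = " ".join(tokens[ent.start:ent.end + 1])
        sequence.append(f"<span {ent.start}:{ent.end} [{ent.label}] '{surface}'>")
    sequence.append("<sep>")  # boundary between nodes and edges
    # Edges: relation triplets referring back to the spans above.
    for head_idx, tail_idx, rel_type in relations:
        head, tail = entities[head_idx], entities[tail_idx]
        sequence.append(
            f"<rel ({head.start}:{head.end}) -[{rel_type}]-> ({tail.start}:{tail.end})>"
        )
    return sequence

if __name__ == "__main__":
    tokens = "John Smith works for Acme Corp".split()
    entities = [Span(0, 1, "PER"), Span(4, 5, "ORG")]
    relations = [(0, 1, "Work_For")]  # head entity 0, tail entity 1
    print(linearize(tokens, entities, relations))
```

Because every node is a pointer to a span of the input and every edge refers back to those spans, the generated sequence stays grounded in the original text, which is the property the paper emphasizes over free-form generation.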