1994 | Hamish Cunningham, Yorick Wilks, Robert J. Gaizauskas
GATE is a General Architecture for Text Engineering, developed by the University of Sheffield to address the challenges of reusing algorithmic resources in Natural Language Engineering (NLE). Despite progress in reusable data resources like grammars and thesauruses, algorithmic resources have seen limited reuse due to factors such as cultural resistance and integration overheads. GATE aims to overcome these issues by providing a general architecture and development environment specifically designed for text processing systems.
GATE offers a common infrastructure for building language engineering (LE) systems. It has three parts: a database for storing information about texts (the GATE Document Manager, GDM), a graphical interface for launching tools and evaluating results (the GATE Graphical Interface, GGI), and a Collection of REusable Objects for Language Engineering (CREOLE). GDM is based on the TIPSTER document manager, with enhancements to its SGML capabilities planned. GGI allows for the interactive assembly of system configurations and the testing of LE components. CREOLE modules handle the actual text analysis tasks, providing standardized APIs for accessing data via GDM and GGI.
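The division of labour between a document manager and analysis modules can be illustrated with a minimal sketch. It assumes a TIPSTER-style stand-off model in which each annotation is a typed span over the text with optional attributes. The names `Annotation`, `Document`, `tokenizer`, `capitalised_name_tagger` and `run_pipeline` are hypothetical and are not GATE's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    """Stand-off annotation: a typed span over the text plus attributes."""
    type: str
    start: int  # offset of the first character covered
    end: int    # offset one past the last character covered
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    """Hypothetical stand-in for a document held by a GDM-like manager."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, ann_type: str, start: int, end: int, **attributes: str) -> Annotation:
        ann = Annotation(ann_type, start, end, dict(attributes))
        self.annotations.append(ann)
        return ann

    def select(self, ann_type: str) -> List[Annotation]:
        return [a for a in self.annotations if a.type == ann_type]

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.start:ann.end]


# A module is any callable that reads and writes annotations on a document.
Module = Callable[[Document], None]


def tokenizer(doc: Document) -> None:
    """Toy module: mark whitespace-separated tokens."""
    pos = 0
    for word in doc.text.split():
        start = doc.text.index(word, pos)
        end = start + len(word)
        doc.annotate("token", start, end)
        pos = end


def capitalised_name_tagger(doc: Document) -> None:
    """Toy module: builds on the token annotations left by another module."""
    for tok in doc.select("token"):
        word = doc.covered_text(tok).rstrip(".,;:")
        if word[:1].isupper():
            doc.annotate("name", tok.start, tok.start + len(word), source="toy")


def run_pipeline(doc: Document, modules: List[Module]) -> Document:
    """Apply the modules in order; they communicate only through the document."""
    for module in modules:
        module(doc)
    return doc


doc = run_pipeline(
    Document("GATE was built at Sheffield."),
    [tokenizer, capitalised_name_tagger],
)
names = [doc.covered_text(a) for a in doc.select("name")]
```

Because the two modules never call each other and share only the annotations on the document, either one could be replaced by a different implementation that produces the same annotation types.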
GATE supports the reuse of existing components, reducing the overhead of developing new systems from scratch. It allows researchers to mix and match elements of MUC technology with their own components, enabling the benefits of large-scale systems without the associated overheads. The initial release of GATE includes a CREOLE set comprising a complete MUC-compatible Information Extraction (IE) system, with some components based on freely available software and others derived from Sheffield's MUC-6 entrant, LaSIE.
GATE also addresses the challenges of integrating diverse LE modules by enforcing a separation between information representation and storage mechanisms, thereby reducing integration overheads. While GATE cannot eliminate all integration barriers, it provides a common model for expressing text information and a common storage mechanism, facilitating algorithmic reuse. The modularity of GATE-based systems contributes to reducing engineering overheads when porting systems to different domains.
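A minimal sketch of what separating information representation from storage can look like in practice follows. It assumes that modules are written against an abstract store interface and that the backend can be swapped without touching module code. `AnnotationStore`, `MemoryStore`, `JsonFileStore` and `sentence_marker` are hypothetical names chosen for illustration, not part of GATE.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List

# One annotation is a plain record: {"type": ..., "start": ..., "end": ...}
Record = Dict[str, object]


class AnnotationStore(ABC):
    """What modules see: a model of the information, not how it is stored."""

    @abstractmethod
    def add(self, doc_id: str, record: Record) -> None: ...

    @abstractmethod
    def get(self, doc_id: str) -> List[Record]: ...


class MemoryStore(AnnotationStore):
    """Keeps all records in a dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Record]] = {}

    def add(self, doc_id: str, record: Record) -> None:
        self._data.setdefault(doc_id, []).append(record)

    def get(self, doc_id: str) -> List[Record]:
        return list(self._data.get(doc_id, []))


class JsonFileStore(AnnotationStore):
    """Keeps one JSON file per document in a directory."""

    def __init__(self, directory: str) -> None:
        self._dir = directory

    def _path(self, doc_id: str) -> str:
        return os.path.join(self._dir, doc_id + ".json")

    def get(self, doc_id: str) -> List[Record]:
        path = self._path(doc_id)
        if not os.path.exists(path):
            return []
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)

    def add(self, doc_id: str, record: Record) -> None:
        records = self.get(doc_id)
        records.append(record)
        with open(self._path(doc_id), "w", encoding="utf-8") as fh:
            json.dump(records, fh)


def sentence_marker(store: AnnotationStore, doc_id: str, text: str) -> None:
    """Toy module: depends only on the AnnotationStore interface."""
    start = 0
    for i, ch in enumerate(text):
        if ch == ".":
            store.add(doc_id, {"type": "sentence", "start": start, "end": i + 1})
            start = i + 2  # skip the space after the full stop


text = "GATE stores text. Modules add annotations."

memory = MemoryStore()
sentence_marker(memory, "doc1", text)

with tempfile.TemporaryDirectory() as tmp:
    on_disk = JsonFileStore(tmp)
    sentence_marker(on_disk, "doc1", text)
    same_result = memory.get("doc1") == on_disk.get("doc1")
```

The same module runs unchanged against both backends and yields identical records, which is the property that keeps integration overhead low when the storage mechanism changes.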
GATE is intended to benefit both LE researchers and industrialists, offering a flexible development environment that allows for the customization of interfaces and the upgrading of systems with better technology. Despite these advantages, challenges remain in porting LE systems to new domains, which is an ongoing research issue. Overall, GATE aims to increase confidence in algorithmic reuse by providing a standardized framework for developing and evaluating LE components.