Collin F. Baker, Charles J. Fillmore, and John B. Lowe
The Berkeley FrameNet project is a three-year NSF-supported initiative focused on corpus-based computational lexicography. The project aims to create a database of frame-semantic descriptions for several thousand English lexical items, supported by semantically annotated examples from contemporary English corpora. Key features include:
1. **Commitment to Corpus Evidence**: The project emphasizes the use of corpus data for semantic and syntactic generalizations.
2. **Frame Semantics**: The semantic portion of the project uses frame semantics to describe the valences of target words (mainly nouns, adjectives, and verbs).
3. **Database Components**:
- **Lexicon**: Comprises conventional dictionary data, formulas for morphosyntactic realization, links to annotated example sentences, and links to other resources.
- **Frame Database**: Describes the conceptual structure of each frame and provides names and descriptions for its elements.
- **Annotated Example Sentences**: Marked up to illustrate the semantic and morphosyntactic properties of lexical items.
4. **Scope**: The project covers various semantic domains, including HEALTH CARE, CHANCE, PERCEPTION, COMMUNICATION, TRANSACTION, TIME, SPACE, BODY, MOTION, LIFE STAGES, SOCIAL CONTEXT, EMOTION, and COGNITION.
5. **Conceptual Model**: FrameNet uses local frame elements (FEs) within specific conceptual structures (frames), with some frames being general and others specific to a small set of lexical items.
6. **Workflow**: The work proceeds in four main steps: Preparation, Subcorpus Extraction, Annotation, and Entry Writing. It is divided among three roles: Vanguard, Annotators, and Rearguard.
7. **Implementation**: The data structures are implemented in SGML, and the software suite includes Perl/CGI-based tools for various tasks.
8. **Conclusion**: The project has made significant progress, with nearly 10,000 annotated sentences and over 20,000 frame element tokens marked. The final database is expected to contain 250,000 annotated sentences and over half a million tokens of frame elements.
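
The relationship between the three database components can be sketched in a few lines of Python. This is only an illustrative model, not the project's SGML schema: the class names, the `make_span` helper, the "Telling" frame, its frame elements, and the example sentence are all hypothetical and chosen for the sketch. It shows a frame that names and describes its frame elements, a lexical entry that links to annotated sentences, and a check that every labeled span uses a frame element defined by the entry's frame.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrameElement:
    """A named, described participant that is local to one frame."""
    name: str
    description: str


@dataclass
class Frame:
    """Frame database record: conceptual structure plus its frame elements."""
    name: str
    domain: str
    elements: dict[str, FrameElement] = field(default_factory=dict)

    def add_element(self, name: str, description: str) -> None:
        self.elements[name] = FrameElement(name, description)


@dataclass(frozen=True)
class FESpan:
    """One frame element token: a labeled character span (end exclusive)."""
    fe_name: str
    start: int
    end: int


@dataclass
class AnnotatedSentence:
    """Corpus sentence with the target word and its labeled constituents."""
    text: str
    target: str
    spans: list[FESpan] = field(default_factory=list)

    def labelled(self) -> dict[str, str]:
        # Map each frame element name to the text it covers.
        return {s.fe_name: self.text[s.start:s.end] for s in self.spans}


@dataclass
class LexicalEntry:
    """Lexicon record: a lemma, its frame, and links to annotated examples."""
    lemma: str
    pos: str
    frame: Frame
    examples: list[AnnotatedSentence] = field(default_factory=list)

    def add_example(self, sentence: AnnotatedSentence) -> None:
        # Reject labels that the entry's frame does not define.
        unknown = [s.fe_name for s in sentence.spans
                   if s.fe_name not in self.frame.elements]
        if unknown:
            raise ValueError(f"unknown frame elements: {unknown}")
        self.examples.append(sentence)

    def fe_token_count(self) -> int:
        return sum(len(e.spans) for e in self.examples)


def make_span(text: str, fe_name: str, phrase: str) -> FESpan:
    """Hypothetical helper: label the first occurrence of phrase in text."""
    start = text.index(phrase)
    return FESpan(fe_name, start, start + len(phrase))


# Illustrative data only; not taken from the FrameNet database.
telling = Frame(name="Telling", domain="COMMUNICATION")
telling.add_element("Speaker", "The person who conveys the message")
telling.add_element("Addressee", "The person who receives the message")
telling.add_element("Message", "The content that is conveyed")

text = "Kim told the committee the truth"
sentence = AnnotatedSentence(
    text=text,
    target="told",
    spans=[
        make_span(text, "Speaker", "Kim"),
        make_span(text, "Addressee", "the committee"),
        make_span(text, "Message", "the truth"),
    ],
)

tell = LexicalEntry(lemma="tell", pos="verb", frame=telling)
tell.add_example(sentence)
```

Keeping frame elements inside the frame that defines them, rather than in a global role inventory, mirrors the "local frame elements" idea in item 5; the validation in `add_example` is one simple way to keep annotations consistent with the frame database.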
The project aims to provide a comprehensive resource for computational linguistics and lexicography, enhancing the understanding and use of lexical items in natural language processing.