1998 | Collin F. Baker, Charles J. Fillmore, and John B. Lowe
The Berkeley FrameNet Project is a three-year NSF-funded initiative in corpus-based computational lexicography, now in its second year. The project focuses on creating a database that represents the semantic and syntactic valences of target words, primarily nouns, adjectives, and verbs, using frame semantics. The database will include descriptions of semantic frames underlying word meanings, valence representations of thousands of words and phrases, and annotated corpus examples that illustrate the linkages between frame elements and their syntactic realizations. The project's goals include developing computational tools for lexicography and providing a lexical resource and associated software tools.
The project's primary focus is encoding semantic knowledge in machine-readable form, guided by corpus-based research. The semantic domains covered include health care, chance, perception, communication, transaction, time, space, body, motion, life stages, social context, emotion, and cognition. The project's results are embodied in a FrameNet database with three major components: a lexicon whose entries combine conventional dictionary data, formulas capturing morphosyntactic realizations, links to annotated example sentences, and connections to other resources; a frame database describing conceptual structures and their frame elements; and a collection of annotated example sentences that exemplify semantic and morphosyntactic properties.
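The three-component structure described above can be pictured as a set of linked records: lexicon entries point into the frame database and out to annotated sentences. The sketch below is a hypothetical illustration of that linkage, not the project's actual schema; the frame name, frame element names, and field layout are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A conceptual structure together with its named roles (frame elements)."""
    name: str
    frame_elements: list[str]

@dataclass
class LexicalUnit:
    """A lexicon entry: conventional dictionary data plus links to the other components."""
    lemma: str
    pos: str
    frame: Frame                 # link into the frame database
    definition: str              # conventional dictionary data
    example_ids: list[int] = field(default_factory=list)  # links to annotated sentences

# Hypothetical example: a "Transaction" frame and one verb that evokes it.
transaction = Frame("Transaction", ["Buyer", "Seller", "Goods", "Money"])
buy = LexicalUnit("buy", "V", transaction,
                  "acquire in exchange for payment", example_ids=[1041])

assert "Buyer" in buy.frame.frame_elements
```

Keeping frame elements inside the `Frame` record mirrors the point made below: roles are local to a conceptual structure rather than drawn from a global inventory.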
The FrameNet project is similar to efforts to describe argument structures in terms of case roles or theta roles, but in FrameNet, frame elements are local to specific conceptual structures. The project involves four main steps: preparation, subcorpus extraction, annotation, and entry writing. The computational side of the project centers on capturing human insights into semantic structure, with most of the work devoted to semantic tagging, frame structure specification, and dictionary-style entry writing. The software used includes Perl-based programs, and the data model is implemented in SGML. The project has so far produced over 10,000 annotated sentences and over 20,000 frame element tokens, and the inventory is expected to grow rapidly: the final database of 5,000 lexical units may contain 250,000 annotated sentences and over half a million frame element tokens.
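An annotated sentence pairs a corpus example with labeled frame-element spans, and each labeled span counts as one frame element token. The SGML-style markup and the extraction code below are a hypothetical illustration of that idea, assuming a simple inline-tagging scheme rather than the project's actual SGML data model.

```python
import re

# Hypothetical SGML-style annotation of one corpus sentence for the verb "buy":
# each tagged span realizes one frame element of an assumed Transaction frame.
annotated = ("<fe name='Buyer'>Kim</fe> bought "
             "<fe name='Goods'>a bicycle</fe> "
             "<fe name='Money'>for $200</fe>.")

# Extract (frame-element, text) pairs; each pair is one frame element token.
tokens = re.findall(r"<fe name='([^']+)'>([^<]+)</fe>", annotated)
print(tokens)  # [('Buyer', 'Kim'), ('Goods', 'a bicycle'), ('Money', 'for $200')]
```

Counting pairs like these across the corpus is what yields figures such as the 20,000 frame element tokens reported above.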