2004 | Adam Kilgarriff, Pavel Rychly, Pavel Smrz, David Tugwell
The Sketch Engine is a corpus-based tool that generates word sketches, a thesaurus, and sketch differences for any language. Word sketches are one-page summaries of a word's grammatical and collocational behavior, and they were first used in the Macmillan English Dictionary. Given a corpus and a set of grammar patterns for a language, the Sketch Engine produces word sketches for the words of that language. It also produces a thesaurus and sketch differences, which show how near-synonyms are alike and how they differ.
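To make the input/output relationship concrete, the following minimal Python sketch turns (headword, relation, collocate) triples into a word sketch: one ranked collocate list per grammatical relation. Such triples are what grammar patterns might extract from a corpus. The triple format, the function name build_word_sketch, and the scoring are illustrative assumptions. In particular, the score is plain pointwise mutual information, not necessarily the salience statistic the Sketch Engine itself uses.

```python
import math
from collections import Counter, defaultdict


def build_word_sketch(triples, headword, top_n=5):
    """Return {relation: [(collocate, score), ...]} for one headword.

    triples: list of (headword lemma, relation, collocate lemma) tuples,
    a hypothetical format standing in for the output of grammar patterns.
    Score: pointwise mutual information over triple counts,
        log2( f(w,r,c) * f(*,r,*) / (f(w,r,*) * f(*,r,c)) )
    This is an illustrative choice, not the Sketch Engine's own statistic.
    """
    f_wrc = Counter(triples)                        # f(w, r, c)
    f_r = Counter(r for _, r, _ in triples)         # f(*, r, *)
    f_wr = Counter((w, r) for w, r, _ in triples)   # f(w, r, *)
    f_rc = Counter((r, c) for _, r, c in triples)   # f(*, r, c)

    sketch = defaultdict(list)
    for (w, r, c), n in f_wrc.items():
        if w != headword:
            continue
        score = math.log2(n * f_r[r] / (f_wr[(w, r)] * f_rc[(r, c)]))
        sketch[r].append((c, score))

    # One list per relation, best-scoring collocates first (ties alphabetical).
    return {
        r: sorted(cols, key=lambda x: (-x[1], x[0]))[:top_n]
        for r, cols in sorted(sketch.items())
    }


if __name__ == "__main__":
    toy_triples = [
        ("drink", "object", "tea"),
        ("drink", "object", "tea"),
        ("drink", "object", "water"),
        ("drink", "subject", "cat"),
        ("make", "object", "tea"),
        ("make", "object", "decision"),
    ]
    for relation, collocates in build_word_sketch(toy_triples, "drink").items():
        print(relation, [(c, round(s, 2)) for c, s in collocates])
    # prints:
    # object [('water', 0.74), ('tea', 0.15)]
    # subject [('cat', 0.0)]
```

On these toy counts the rarer water outranks the more frequent tea. PMI is known to favor low-frequency pairs, which is why practical collocation tools tend to prefer frequency-adjusted association scores.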
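The Sketch Engine organizes a word's collocates by grammatical relation, such as the objects of a verb or the adjectives modifying a noun, so that each grammatical role the word plays gets its own list of collocates. It works on corpora that have already been lemmatized and POS-tagged and are supplied in a specific input format. The grammatical relations themselves are defined as patterns over the POS tags. For Czech, a language with free word order, the patterns are based on the grammar of SYNT, a deep parser for Czech. Because the two words in a relation need not be adjacent or appear in a fixed order, the patterns allow gaps between them.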
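A pattern with a gap can be illustrated on a lemmatized, POS-tagged Czech sentence. In the minimal Python sketch below, a hypothetical object relation links a verb with an accusative noun. The noun may come before or after the verb, with at most max_gap tokens in between and no other verb in the gap. Several details are assumptions: the Token layout, the function name match_with_gap, the gap limit, and the simplified tags. The tags are V for verb, D for adverb, P for pronoun, and N1 and N4 for nominative and accusative nouns, following the Czech numbering of cases. This is neither the SYNT grammar nor the Sketch Engine's own pattern language.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    tag: str  # hypothetical simplified tag: V, N1, N4, D, P, ...


def match_with_gap(tokens, head_tag, coll_tag, relation, max_gap=3, blockers=("V",)):
    """Extract (head lemma, relation, collocate lemma) triples from one sentence.

    The collocate may precede or follow the head (free word order), with at
    most max_gap tokens in between, none of which may carry a blocker tag.
    """
    triples = []
    for i, head in enumerate(tokens):
        if head.tag != head_tag:
            continue
        for j, coll in enumerate(tokens):
            if j == i or coll.tag != coll_tag:
                continue
            lo, hi = min(i, j), max(i, j)
            gap = tokens[lo + 1:hi]  # tokens strictly between the two words
            if len(gap) > max_gap or any(t.tag in blockers for t in gap):
                continue
            triples.append((head.lemma, relation, coll.lemma))
    return triples


if __name__ == "__main__":
    # "Včera koupil Petr knihu." (Yesterday Petr bought a book.)
    sentence = [
        Token("Včera", "včera", "D"),
        Token("koupil", "koupit", "V"),
        Token("Petr", "Petr", "N1"),
        Token("knihu", "kniha", "N4"),
    ]
    print(match_with_gap(sentence, "V", "N4", "object"))
    # prints [('koupit', 'object', 'kniha')]
```

Czech typically marks the direct object by accusative case rather than by position. The pattern therefore keys on the accusative tag and accepts either order, letting the subject Petr sit in the gap without being mistaken for the object.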
The Sketch Engine also builds a thesaurus, in which two words count as similar to the extent that they share collocates in the same grammatical relations. It also builds sketch differences, which compare two near-synonyms by showing the collocates they share and the collocates found with only one of them. A case study of Czech word sketches showed that they can be useful for lexicographic work, since they provide information that is not easily found in traditional dictionaries.
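Both ideas can be sketched over the same kind of (headword, relation, collocate) triples. In this minimal Python sketch, similarity scores two words by the (relation, collocate) pairs they share, using a plain Jaccard coefficient. That measure is a stand-in for whatever weighted measure the Sketch Engine actually applies. The sketch_diff function splits, relation by relation, the collocates of two near-synonyms into shared ones and ones found with only one word. All function names and the relation label "modifies" are hypothetical.

```python
from collections import defaultdict


def contexts(triples, word):
    """Set of (relation, collocate) pairs the word occurs in."""
    return {(r, c) for w, r, c in triples if w == word}


def similarity(triples, w1, w2):
    """Jaccard overlap of two words' grammatical contexts, from 0.0 to 1.0."""
    a, b = contexts(triples, w1), contexts(triples, w2)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def thesaurus_entry(triples, word, top_n=5):
    """Other headwords with nonzero similarity to word, most similar first."""
    candidates = {w for w, _, _ in triples if w != word}
    scored = [(w, similarity(triples, word, w)) for w in candidates]
    scored = [x for x in scored if x[1] > 0]
    return sorted(scored, key=lambda x: (-x[1], x[0]))[:top_n]


def sketch_diff(triples, w1, w2):
    """Per relation: collocates shared by both words and those unique to each."""
    by_rel = defaultdict(lambda: {w1: set(), w2: set()})
    for w, r, c in triples:
        if w in (w1, w2):
            by_rel[r][w].add(c)
    result = {}
    for r, cols in sorted(by_rel.items()):
        a, b = cols[w1], cols[w2]
        result[r] = {
            "shared": sorted(a & b),
            f"only {w1}": sorted(a - b),
            f"only {w2}": sorted(b - a),
        }
    return result


if __name__ == "__main__":
    toy_triples = [
        ("strong", "modifies", "tea"),
        ("strong", "modifies", "wind"),
        ("strong", "modifies", "argument"),
        ("powerful", "modifies", "argument"),
        ("powerful", "modifies", "engine"),
        ("powerful", "modifies", "wind"),
        ("weak", "modifies", "tea"),
    ]
    print(thesaurus_entry(toy_triples, "strong"))
    print(sketch_diff(toy_triples, "strong", "powerful"))
```

With these toy triples, strong and powerful share argument and wind, while tea goes only with strong and engine only with powerful. This is the kind of contrast a sketch difference is meant to surface.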
The Sketch Engine is available as a commercial product and can be used over the web. It currently supports Czech, Irish, and English corpora, and clients can also have their own corpora hosted. Future plans include support for multi-word items and extending sketch differences to compare words across different subcorpora. The tool is aimed at lexicographers and at researchers who study word behavior and the differences between near-synonyms.