Adam Kilgarriff, Pavel Rychly, Pavel Smrz, David Tugwell
The Sketch Engine is a corpus-based tool developed to generate word sketches, thesauri, and 'sketch differences' for any language. Word sketches are one-page summaries of a word's grammatical and collocational behavior, originally used in the Macmillan English Dictionary. The Sketch Engine takes a corpus with appropriate linguistic markup as input and outputs word sketches, which can be integrated into corpus query systems (CQS) like Manatee. The tool supports lemmatization and POS tagging, and identifies grammatical relations using regular expressions over POS tags. It also generates a thesaurus based on similarity measures, as well as 'sketch differences' that highlight similarities and differences between near-synonyms.
The paper discusses the development of the Sketch Engine, its application to Czech, and its evaluation, showing that word sketches can facilitate lexicographic work in languages with free word order, such as Czech. The Sketch Engine is available as a commercial product, and future plans include enhancing support for multi-word items and extending sketch difference functionality.
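
The idea of identifying grammatical relations with regular expressions over POS tags can be illustrated with a minimal Python sketch. It is not the Sketch Engine's own grammar formalism: the Penn-style tags, the one-letter tag codes, the two relation patterns, and the names `tag_code`, `RELATION_PATTERNS`, and `extract_relations` are all assumptions made for illustration. Each token's tag is reduced to a single character so that a match offset in the tag string is also a token index.

```python
import re
from collections import Counter

def tag_code(tag):
    """Map a Penn-style POS tag to a one-letter code (illustrative mapping)."""
    if tag.startswith("NN"):
        return "N"  # noun
    if tag.startswith("VB"):
        return "V"  # verb
    if tag.startswith("JJ"):
        return "A"  # adjective
    if tag == "DT":
        return "D"  # determiner
    return "x"      # anything else

# Hypothetical relation definitions: each regex runs over the string of tag
# codes and names the head and dependent positions with named groups.
RELATION_PATTERNS = {
    # adjective immediately before a noun
    "modifier": re.compile(r"(?P<dep>A)(?P<head>N)"),
    # verb, optional determiner, any adjectives, then a noun
    "object": re.compile(r"(?P<head>V)D?A*(?P<dep>N)"),
}

def extract_relations(sentence):
    """Return (relation, head lemma, dependent lemma) triples.

    `sentence` is a list of (word, lemma, tag) tuples.
    """
    codes = "".join(tag_code(tag) for _, _, tag in sentence)
    lemmas = [lemma for _, lemma, _ in sentence]
    triples = []
    for name, pattern in RELATION_PATTERNS.items():
        for match in pattern.finditer(codes):
            head = lemmas[match.start("head")]
            dep = lemmas[match.start("dep")]
            triples.append((name, head, dep))
    return triples

sentence = [
    ("The", "the", "DT"), ("dog", "dog", "NN"), ("chased", "chase", "VBD"),
    ("a", "a", "DT"), ("black", "black", "JJ"), ("cat", "cat", "NN"),
]
triples = extract_relations(sentence)
counts = Counter(triples)  # over a whole corpus, these counts would feed a sketch
```

For the example sentence, the sketch yields a `modifier` triple linking `cat` and `black` and an `object` triple linking `chase` and `cat`. Aggregating such triples over a corpus gives the per-relation collocate lists that a word sketch summarizes. Purely positional patterns like these are likely to need loosening for a free-word-order language such as Czech.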