SUBTLEX-UK: A new and improved word frequency database for British English

2014 | Walter J. B. van Heuven, Pawel Mandera, Emmanuel Keuleers, and Marc Brysbaert
The paper presents SUBTLEX-UK, a new word frequency database for British English based on subtitles of British television programs. It shows that SUBTLEX-UK frequencies explain more variance in the lexical decision times of the British Lexicon Project than frequencies from the British National Corpus (BNC) or SUBTLEX-US, which indicates that subtitle-based frequencies predict lexical decision times better than frequencies drawn from written sources. The database includes word frequencies, contextual diversity measures, part-of-speech-specific frequencies, child-focused word frequencies, and word bigram frequencies. It is distributed in three files that give detailed information on word types, frequencies, and other linguistic features, and it is available to researchers working on British English.
The paper also introduces a new measure, the Zipf scale, to address the limitations of earlier standardized measures such as frequency per million words (fpmw). The Zipf scale is a logarithmic scale with values from 1 to 7, where values of 3 or less indicate low-frequency words and values of 4 or more indicate high-frequency words. The authors recommend the Zipf scale because it makes word frequency effects easier to interpret.
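As a rough illustration of how a Zipf value relates to fpmw, the sketch below assumes the usual definition of the scale as the base-10 logarithm of frequency per billion words, that is, log10(fpmw) + 3. The summary above does not spell out the formula, and the published database values may additionally apply smoothing for rare or unseen words, which this sketch omits. The function names and the "intermediate" label for values between 3 and 4 are hypothetical and chosen only for illustration.

```python
import math


def zipf_from_fpmw(fpmw: float) -> float:
    """Convert a frequency per million words to a Zipf value.

    Assumes Zipf = log10(frequency per billion words) = log10(fpmw) + 3.
    No smoothing is applied, so fpmw must be strictly positive.
    """
    if fpmw <= 0:
        raise ValueError("fpmw must be positive")
    return math.log10(fpmw) + 3.0


def frequency_band(zipf: float) -> str:
    """Label a Zipf value using the cut-offs given in the summary.

    3 or less is low frequency and 4 or more is high frequency; the
    'intermediate' label for the values in between is an assumption.
    """
    if zipf <= 3.0:
        return "low"
    if zipf >= 4.0:
        return "high"
    return "intermediate"


if __name__ == "__main__":
    for fpmw in (0.01, 1.0, 10.0, 1000.0):
        zipf = zipf_from_fpmw(fpmw)
        print(f"{fpmw:>8} fpmw -> Zipf {zipf:.2f} ({frequency_band(zipf)})")
```

Under this assumption, a word that occurs once per million words has a Zipf value of 3 and a word that occurs ten times per million words has a Zipf value of 4, so the low and high cut-offs correspond to 1 fpmw and 10 fpmw.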