SUBTLEX-UK: A new and improved word frequency database for British English

2014 | Walter J. B. van Heuven, Pawel Mandera, Emmanuel Keuleers, and Marc Brysbaert
The paper introduces SUBTLEX-UK, a new database of word frequencies for British English based on television subtitles. The authors argue that subtitle-based word frequencies predict word processing times more accurately than frequencies based on written sources such as the British National Corpus, and that for British English they also outperform the American subtitle-based norms of SUBTLEX-US. They present various measures derived from the SUBTLEX-UK database, including part-of-speech-specific frequencies, contextual diversity, and word bigram frequencies. Additionally, they introduce the Zipf scale, a standardized frequency measure that addresses the limitations of the traditional frequency per million words (fpmw) measure. The Zipf scale is logarithmic (the base-10 logarithm of a word's frequency per billion words, equivalently log10(fpmw) + 3) and is designed to be more intuitive and easier to interpret, with values ranging from 1 (very low-frequency words) to 7 (very high-frequency content words).
The paper also includes validation studies showing that SUBTLEX-UK outperforms other frequency measures in predicting lexical decision times and accuracies from the British Lexicon Project. The database is available in three files, providing detailed information on word frequencies, part-of-speech frequencies, and word bigrams.
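The Zipf transformation is simple enough to compute directly from a raw count. The sketch below is a minimal Python illustration, assuming the definition of the Zipf value as log10 of a word's frequency per billion words, i.e. log10(fpmw) + 3. Under that definition Zipf 1 is one occurrence per 100 million words, Zipf 3 is one per million, and Zipf 7 is 10,000 per million. The function names and the 200-million-word corpus in the usage lines are hypothetical. The smoothed variant adds one to every count so that words missing from the corpus still get a finite value; treat it as an assumption and check the paper for the exact formula behind the released files.

```python
import math


def zipf_value(raw_count: int, corpus_size_words: int) -> float:
    """Zipf = log10(frequency per million words) + 3 (= log10 per billion words)."""
    if raw_count <= 0:
        # log10(0) is undefined, so unsmoothed Zipf needs a positive count
        raise ValueError("raw_count must be positive; use zipf_smoothed for zero counts")
    fpmw = raw_count / (corpus_size_words / 1_000_000)
    return math.log10(fpmw) + 3


def fpmw_from_zipf(zipf: float) -> float:
    """Invert the transformation: frequency per million words for a Zipf value."""
    return 10 ** (zipf - 3)


def zipf_smoothed(raw_count: int, corpus_size_words: int, n_word_types: int) -> float:
    """ASSUMPTION: add-one (Laplace) smoothing, with the denominator enlarged by
    the number of word types so the probability mass stays consistent.
    Gives a finite value even when raw_count is 0."""
    corpus_millions = corpus_size_words / 1_000_000
    types_millions = n_word_types / 1_000_000
    return math.log10((raw_count + 1) / (corpus_millions + types_millions)) + 3


if __name__ == "__main__":
    corpus = 200_000_000  # hypothetical corpus size in words
    print(zipf_value(200, corpus))  # 1 per million words -> 3.0
    print(zipf_value(20, corpus))  # 0.1 per million words -> about 2
    print(zipf_smoothed(0, corpus, 300_000))  # unseen word, still finite
```

Working on the log scale is what makes the Zipf values easy to read: each step of one unit is a tenfold change in frequency, so a difference between 0.02 and 0.2 per million is as visible as one between 20 and 200 per million.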