SubtlexUS is a database of word frequencies based on subtitles from American English movies and TV series.
The main strengths of this database are the following:
- Based on spoken-like language
- Based on 50 million words
Search the SUBTL database online
Follow this link, select the “SubtlexUS” database, and have fun! (Dynamic search, regular-expression search, sorting, etc.)
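For readers who prefer to work offline, the kind of regular-expression search and sorting mentioned above can be sketched in a few lines of Python. This is only an illustration: the `toy_counts` entries and the `search` helper are hypothetical and do not reflect real SubtlexUS values or the online tool's implementation.

```python
import re

# Hypothetical toy entries (word -> count); NOT real SubtlexUS values.
toy_counts = {
    "run": 120,
    "running": 45,
    "runner": 8,
    "walk": 90,
    "walking": 30,
}

def search(counts, pattern):
    """Return (word, count) pairs whose word fully matches the pattern,
    sorted from most to least frequent."""
    regex = re.compile(pattern)
    hits = [(word, count) for word, count in counts.items() if regex.fullmatch(word)]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Words ending in "ing", most frequent first.
results = search(toy_counts, r".*ing")
print(results)  # [('running', 45), ('walking', 30)]
```

Using `fullmatch` means the pattern must cover the whole word, which is usually what is wanted when searching a word list; `re.search` would match substrings instead.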
Documentation
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A Critical Evaluation of Current Word Frequency Norms and the Introduction of a New and Improved Word Frequency Measure for American English. Behavior Research Methods, 41(4), 977-990.
Download
Here is the corpus. For copyright reasons, its sentences have been randomized.
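Assuming that "randomized" means the order of the sentences was shuffled (the page does not spell this out), word frequencies computed from the distributed corpus are unaffected, because shuffling sentences changes neither which words occur nor how often. The sketch below illustrates this with a few made-up sentences; the `sentences` list and the `word_counts` helper are hypothetical.

```python
import random
from collections import Counter

# Hypothetical stand-in sentences; not taken from the actual corpus.
sentences = [
    "where are you going",
    "i am going home",
    "are you sure",
]

def word_counts(lines):
    """Count whitespace-separated tokens across all lines."""
    return Counter(token for line in lines for token in line.split())

# Shuffle a copy of the sentence list; the fixed seed only makes the sketch repeatable.
shuffled = sentences[:]
random.Random(0).shuffle(shuffled)

# The word counts are identical before and after shuffling.
same = word_counts(sentences) == word_counts(shuffled)
print(same)  # True
```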