SubtlexUS is a database of word frequencies based on subtitles from American English movies and TV series.
The main strengths of this database are the following:

  • Based on language close to everyday speech
  • Based on 50 million words

Search online in the SUBTL database

Follow this link, select the “SubtlexUS” database, and have fun! (dynamic search, regular-expression search, sorting, etc.)
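
For readers who want to try the same kind of query offline, here is a minimal Python sketch of a regular-expression search followed by a sort on frequency. It is only an illustration: the rows, their counts, and the search function are hypothetical and do not come from SubtlexUS or reflect how the online tool is implemented.

```python
import re

# Hypothetical (word, count) rows. The counts are made up for illustration
# and are NOT taken from SubtlexUS.
rows = [
    ("play", 120),
    ("player", 45),
    ("replay", 7),
    ("plan", 80),
    ("display", 15),
]


def search(rows, pattern):
    """Return the rows whose word fully matches the regular expression,
    sorted from most to least frequent."""
    regex = re.compile(pattern)
    hits = [(word, count) for word, count in rows if regex.fullmatch(word)]
    return sorted(hits, key=lambda row: row[1], reverse=True)


# Words starting with "pla", most frequent first.
for word, count in search(rows, r"pla.*"):
    print(word, count)
```

The pattern is matched against the whole word here, so "pla.*" finds words that begin with "pla" but not "display". Other matching rules (for example, matching anywhere in the word) would need re.search instead of fullmatch.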

Documentation

Brysbaert, M. & New, B. (2009). Moving beyond Kucera and Francis: A Critical Evaluation of Current Word Frequency Norms and the Introduction of a New and Improved Word Frequency Measure for American English. Behavior Research Methods, 41(4), 977-990.

Download

Here is the corpus. The order of its sentences has been randomized for copyright reasons.

Corpus