# Worldlex #

Worldlex provides word frequency tables for 64 languages, estimated from web pages (blogs, Twitter, and newspapers).

The web page corpora were assembled by Hans Christensen and are available at [HC-Corpora](http://corpora.epizy.com/index.html). According to this web site:

> The corpora are collected from publicly available sources by a web crawler. The crawler checks for language, so as to mainly get texts consisting of the desired language.

> Once the raw corpus has been collected, it is parsed further, to remove duplicate entries and split into individual lines. Approximately 50% of each entry is then deleted. Since you cannot fully recreate any entries, the entries are anonymised and this is a non-profit venture I believe that it would fall under Fair Use.

The frequency tables were created by [Manuel Gimenes](https://sites.google.com/site/manuelgimeneshomepage/) & [Boris New](http://psycho-usmb.fr/boris.new/).

**Website:**

**Publication:** Gimenes, Manuel, and Boris New. 2016. [Worldlex: Twitter and Blog Word Frequencies for 66 Languages.](https://drive.google.com/file/d/0B-sE9ac1ksCANWFVN3ZacHFWQ0k/view) _Behavior Research Methods_ 48 (3): 963–72.

----

Time-stamp: <2019-10-05 09:39:10 christophe@pallier.org>