Manuel Gimenes & Boris New

You can now access Worldlex_EN (English) and Worldlex_FR (French) online.

If you use them, you can cite the following article:
Gimenes, M., & New, B. (2015). Worldlex: Twitter and blog word frequencies for 66 languages. Behavior Research Methods, 1-10.

These corpora have been collected by Hans Christensen for HCCorpora.

Language               Year(s)   Size    Links
Afrikaans              2011      14.3    Link  Link
Albanian               2012      26.2    Link  Link
Amharic                2013      2.4     Link
Arabic                 2011-12   62.7    Link  Link
Armenian               2011      14.6    Link  Link
Azeri                  2012      12.7    Link  Link
Bengali                2012      5.8     Link  Link
Bosnian                2013      11.7    Link
Catalan                2013      19.7    Link  Link
Chinese Simplified     2011      94.1    Link
Croatian               2012      25.5    Link  Link
Czech                  2011      29.4    Link  Link
Danish                 2010-11   72      Link  Link
Dutch                  2011-12   61.4    Link  Link
English US             2012      104.2   Link  Link
Estonian               2011      29.4    Link  Link
Finnish                2011      26.4    Link  Link
French                 2011-12   84.2    Link  Link
Georgian               2011      9.5     Link
German                 2010-11   74.8    Link  Link
Greek                  2011-12   57.2    Link  Link
Greenlandic            2012      3.7     Link
Gujarati               2011      10.2    Link  Link
Hebrew                 2011      21.4    Link  Link
Hindi                  2011-12   13.7    Link  Link
Hungarian              2011-13   66.9    Link  Link
Icelandic              2011      16.7    Link  Link
Indonesian             2011-12   115.2   Link  Link
Italian                2011-12   79.5    Link  Link
Japanese               2011      40.9    Link
Kannada                2011      9.1     Link
Kazakh                 2012      5.6     Link  Link
Khmer                  2012      6       Link
Korean                 2011      55.7    Link
Latvian                2012      36.3    Link  Link
Lithuanian             2011      9.9     Link  Link
Macedonian             2012      15.4    Link  Link
Malayalam              2011      4       Link  Link
Malaysian              2011      23.9    Link
Mongolian              2012      10      Link  Link
Nepali                 2013      5.9     Link  Link
Norwegian              2011      44      Link  Link
Persian                2012      8.8     Link  Link
Polish                 2011-13   74.8    Link  Link
Portuguese Brazil      2011      50.7    Link  Link
Portuguese Europe      2011-12   68      Link  Link
Punjabi                2012      42.2    Link  Link
Romanian               2011-13   74.6    Link  Link
Russian                2011-12   63.6    Link  Link
Serbian (Latin)        2013      21      Link  Link
Sinhala                2011      10.8    Link  Link
Slovak                 2011      22.7    Link  Link
Slovenian              2012-13   35.5    Link  Link
Spanish South America  2012      45.8    Link  Link
Spanish Spain          2011-12   45.6    Link  Link
Swahili                2012      13.4    Link  Link
Swedish                2011-12   74.1    Link  Link
Tagalog                2012      13.9    Link
Tamil                  2011-12   10.8    Link
Telugu                 2011-12   9.9     Link  Link
Turkish                2011-12   63.7    Link  Link
Ukrainian              2011      28.8    Link  Link
Urdu                   2012      7.7     Link
Uzbek                  2012      5.1     Link
Vietnamese             2012      46.1    Link  Link
Welsh                  2013      3.8     Link  Link
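Once a frequency file has been downloaded, it can be loaded into a dictionary for quick word lookups. The following is only a minimal Python sketch, assuming the file is plain tab-separated text with a header row. The function name `load_frequencies`, the file name and the column names used in the usage line are hypothetical; check the header of the file you actually downloaded and pass its column names in.

```python
import csv
from pathlib import Path


def load_frequencies(path, word_col, freq_col, encoding="utf-8"):
    """Read a tab-separated word-frequency file into a {word: frequency} dict.

    Assumption: the file has a header row; word_col and freq_col are the
    header names of the word and frequency columns (hypothetical, pass the
    real ones from your file).
    """
    freqs = {}
    with Path(path).open(encoding=encoding, newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary text, since
        # words in social-media text may themselves contain quotes.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            value = row.get(freq_col)
            if value:  # skip rows whose frequency cell is empty or missing
                freqs[row[word_col]] = float(value)
    return freqs
```

Hypothetical usage: `freqs = load_frequencies("Worldlex_EN.txt", word_col="Word", freq_col="Freq")`, then `freqs.get("house")` returns that word's frequency, or `None` if the word is absent.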