This example shows how to use Python to select four random sets of twenty words from Lexique382: high-frequency nouns, low-frequency nouns, high-frequency verbs and low-frequency verbs. (If Python is not already installed, go to https://www.anaconda.com/distribution/, select your OS (Windows, macOS or Linux) and download the Python 3.7 installer.)
""" Example of selecting items from the Lexique382 database """
import pandas as pd
lex = pd.read_csv('http://www.lexique.org/databases/Lexique382/Lexique382.tsv', sep='\t')
# alternatively, you can download the table locally:
# lex = pd.read_csv("Lexique382.tsv", sep='\t')
# displays the first rows of the table (print() is needed when running as a script)
print(lex.head())
# restricts the search to words with a length between 5 and 8 letters
subset = lex.loc[(lex.nblettres >= 5) & (lex.nblettres <= 8)]
# separates nouns and verbs into two dataframes:
noms = subset.loc[subset.cgram == 'NOM']
verbs = subset.loc[subset.cgram == 'VER']
# splits by lexical frequency (freqlivres, per million words in the book corpus):
# high = above 50, low = between 1 and 10
noms_hi = noms.loc[noms.freqlivres > 50.0]
noms_low = noms.loc[(noms.freqlivres < 10.0) & (noms.freqlivres > 1.0)]
verbs_hi = verbs.loc[verbs.freqlivres > 50.0]
verbs_low = verbs.loc[(verbs.freqlivres < 10.0) & (verbs.freqlivres > 1.0)]
# chooses N random items from each of the 4 subsets and writes them
# to text files, one word per line (no index, no header):
N = 20
noms_hi.sample(N).ortho.to_csv('nomhi.txt', index=False, header=False)
noms_low.sample(N).ortho.to_csv('nomlo.txt', index=False, header=False)
verbs_hi.sample(N).ortho.to_csv('verhi.txt', index=False, header=False)
verbs_low.sample(N).ortho.to_csv('verlo.txt', index=False, header=False)
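If you need to rerun the selection with other parameters, the steps of the script can be wrapped in a function. The sketch below is one possible way to do it, not part of Lexique itself: the names select_items, save_items and CONDITIONS are hypothetical. It applies the same length and frequency criteria as the script, passes random_state to sample() so that a given seed reproduces the same lists, and raises an explicit error when a subset has fewer candidates than requested.

```python
"""Sketch: the Lexique382 selection as a reusable, reproducible function."""
import pandas as pd

LEXIQUE_URL = 'http://www.lexique.org/databases/Lexique382/Lexique382.tsv'

# Hypothetical condition table: output name -> (cgram, lower bound, upper bound)
# on freqlivres. Bounds are exclusive, as in the script; None means "no bound".
CONDITIONS = {
    'nomhi': ('NOM', 50.0, None),
    'nomlo': ('NOM', 1.0, 10.0),
    'verhi': ('VER', 50.0, None),
    'verlo': ('VER', 1.0, 10.0),
}


def select_items(lex, n=20, seed=None, min_len=5, max_len=8):
    """Return a dict mapping condition name -> Series of n randomly sampled words."""
    # between() is inclusive on both ends, like >= min_len and <= max_len
    subset = lex.loc[lex.nblettres.between(min_len, max_len)]
    samples = {}
    for name, (cgram, lo, hi) in CONDITIONS.items():
        cand = subset.loc[subset.cgram == cgram]
        if lo is not None:
            cand = cand.loc[cand.freqlivres > lo]
        if hi is not None:
            cand = cand.loc[cand.freqlivres < hi]
        if len(cand) < n:
            raise ValueError(f'{name}: only {len(cand)} candidates for {n} items')
        # a fixed seed makes the draw reproducible on the same table
        samples[name] = cand.sample(n, random_state=seed).ortho
    return samples


def save_items(samples):
    """Write each sample to <name>.txt, one word per line, without index or header."""
    for name, words in samples.items():
        words.to_csv(f'{name}.txt', index=False, header=False)
```

For example, save_items(select_items(pd.read_csv(LEXIQUE_URL, sep='\t'), n=20, seed=42)) writes nomhi.txt, nomlo.txt, verhi.txt and verlo.txt; the seed 42 is arbitrary, and rerunning with the same seed on the same table should give the same lists.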