Codebase for investigating the semantic capabilities of language models in formal language environments. Specifically, in these environments, the process generating the training data is idealized in terms of pragmatic theory and controllable via many hyperparameters. This enables testing hypotheses about how speakers' pragmatic concerns embed semantic information in raw text that language models can leverage.
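As a rough illustration of the kind of data-generating process involved (the function name and details below are hypothetical; generate.py defines the actual speaker models), an RSA-style informative speaker can be sketched as:

```python
import math

def informative_speaker(world, utterances, meanings, temp=5.0, cost=0.5):
    """Sketch of an RSA informative speaker: P(u | w) is proportional to
    exp(temp * (log L0(w | u) - cost)), where the literal listener L0 puts
    a uniform posterior over the worlds in which u is true. This is an
    illustrative reconstruction, not the repo's implementation."""
    scores = []
    for u in utterances:
        true_worlds = meanings[u]  # worlds where u is literally true
        if world in true_worlds:
            # Informativity: log of the literal listener's posterior on the world.
            informativity = -math.log(len(true_worlds))
            scores.append(math.exp(temp * (informativity - cost)))
        else:
            scores.append(0.0)  # the speaker never says something false
    z = sum(scores)
    return [s / z for s in scores]
```

With a higher temp, probability mass concentrates on the most informative true utterance; here cost penalizes every utterance uniformly, which is a simplifying assumption.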
Install the dependencies:

pip install allennlp allennlp_models
To generate training data in the powerset language:

python generate.py powerset --temp=5 --cost=.5 > documents.txt
Development and test sets can be generated by specifying a different random seed:
python generate.py powerset --seed=3 --temp=5 --cost=.5 > dev_documents.txt
The full set of commands for generating all the training data can be found in scripts/generate.s:
source scripts/generate.s
The following command shows how to train and save a language model on the synthetic data:
CUDA=0 TRAIN=documents.txt DEV=dev_documents.txt allennlp train training_config/bi_lm.jsonnet -s=rsa1_model
Note that this step can be done with whatever language modeling framework you want; I'm using AllenNLP.
I have also provided scripts to train a suite of models on the NYU Slurm cluster, assuming all the data has been set up with generate.s.
SPEAKER=literal source scripts/launch_train.sh
SPEAKER=informative source scripts/launch_train.sh
SPEAKER=independent source scripts/launch_train.sh
General evaluation (first set ROOT=$SCRATCH/synthetic-language-understanding):
python evaluate.py independent \
--model_dir=$ROOT/models/powerset-3/literal \
--eval_path=$ROOT/data/powerset-3/eval.tsv
Evaluation with cost set to its gold value:
ROOT=$SCRATCH/synthetic-language-understanding
python evaluate.py informative --cost=0.1 \
--model_dir=$ROOT/models/powerset-3/informative \
--eval_path=$ROOT/data/powerset-3/eval.tsv
We use the script generate_compositional_test_data.py to generate pairs of entailed and non-entailed texts. In addition to entailment labels, it also outputs the gold probability of each text according to the RSA model. Use the argument n_items to specify the number of worlds, and max_sent_len to specify the maximum number of lexical items in a premise or hypothesis (not including the stop token). For example:
cd ${PROJECT_ROOT}
for agent in vanilla dependent; do
python generate_compositional_test_data.py ${lang} \
--${agent} \
--n_items=3 \
--temp=5 \
--cost=0.1 \
--max_sent_len=5 \
--eval_dir=data/powerset/dependent
done
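For reference, entailment under this kind of conjunctive powerset semantics can be sketched as follows (a hypothetical reconstruction; the labels actually used come from generate_compositional_test_data.py): a text denotes the set of worlds consistent with all of its lexical items, and a premise entails a hypothesis when its denotation is a subset of the hypothesis's.

```python
def denotation(text, worlds, meanings):
    """Worlds in which every lexical item of the text is true (conjunctive
    semantics). meanings[item] is the set of worlds where item holds.
    A sketch only; the repo's generation script is authoritative."""
    ws = set(worlds)
    for item in text:
        ws &= meanings[item]
    return ws

def entails(premise, hypothesis, worlds, meanings):
    """Premise entails hypothesis iff every world satisfying the premise
    also satisfies the hypothesis."""
    return denotation(premise, worlds, meanings) <= denotation(hypothesis, worlds, meanings)
```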
The following command will reproduce our n-gram model evaluation.
cd ${PROJECT_ROOT}/src/evaluation
python evaluate_entailment.py \
--test_data=data/powerset/dependent/eval_entail-3_worlds-5_sents.tsv \
--distributional_model=ngram \
--lang=powerset \
--dependent \
--n_items=3 \
--cost=0.1 \
--training_dir=data/powerset/dependent \
--order=3 \
--size=100000000 \
--plot_type=line \
--complexity=length \
--n-increments=22 \
--auc
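The count-based n-gram baseline of order 3 can be sketched roughly as follows (the add-one smoothing here is an assumption; evaluate_entailment.py implements the actual model):

```python
import math
from collections import Counter

def train_ngram(corpus, order=3):
    """Count n-grams and their contexts over a tokenized corpus.
    A sketch of the order-3 baseline, not the repo's exact implementation."""
    counts, contexts = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] * (order - 1) + sent + ["</s>"]
        for i in range(order - 1, len(toks)):
            ctx = tuple(toks[i - order + 1 : i])
            counts[ctx + (toks[i],)] += 1
            contexts[ctx] += 1
    return counts, contexts

def log_prob(sent, counts, contexts, vocab_size, order=3):
    """Add-one-smoothed log probability of a sentence under the counts."""
    toks = ["<s>"] * (order - 1) + sent + ["</s>"]
    lp = 0.0
    for i in range(order - 1, len(toks)):
        ctx = tuple(toks[i - order + 1 : i])
        lp += math.log((counts[ctx + (toks[i],)] + 1) / (contexts[ctx] + vocab_size))
    return lp
```

Scores from such a model can then be compared across entailed and non-entailed pairs, e.g. to compute an AUC as with the --auc option.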