Formal Language Understanding

Codebase for investigating the semantic capabilities of language models in formal language environments. In these environments, the process generating the training data is idealized in terms of pragmatic theory and controllable via hyperparameters. This enables testing hypotheses about how speakers' pragmatic concerns embed semantic information in raw text that language models can leverage.
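For intuition, the speakers generating these corpora are variants of the Rational Speech Acts (RSA) model, with --temp controlling speaker rationality and --cost penalizing utterance length. Here is a minimal RSA sketch in Python (my own illustration, assuming worlds, utterances, and a denotes predicate; not the repo's implementation):

import math

# Minimal RSA sketch (illustrative only, NOT the repo's implementation).
# Assumptions: a world is any hashable object, an utterance is a tuple of
# tokens, and denotes(u, w) says whether utterance u is true in world w.

def literal_listener(u, worlds, denotes):
    """L0(w | u): uniform over the worlds in which u is true."""
    consistent = [w for w in worlds if denotes(u, w)]
    return {w: 1.0 / len(consistent) for w in consistent}

def informative_speaker(w, utterances, worlds, denotes, temp=5.0, cost=0.5):
    """S1(u | w) proportional to exp(temp * (log L0(w | u) - cost * len(u)))."""
    scores = {}
    for u in utterances:
        l0 = literal_listener(u, worlds, denotes)
        if w in l0:
            scores[u] = math.exp(temp * (math.log(l0[w]) - cost * len(u)))
    z = sum(scores.values())
    return {u: s / z for u, s in scores.items()}

Under this reading, a higher temp concentrates the speaker on maximally informative utterances, while a higher cost favors shorter ones; those pressures are what leave semantic traces in the raw text.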

Setup

pip install allennlp allennlp_models

Generate Synthetic Data

To generate training data in the powerset language:

python generate.py powerset --temp=5 --cost=.5  > documents.txt

Development and testing sets can be generated by specifying a different random seed.

python generate.py powerset --seed=3 --temp=5 --cost=.5  > dev_documents.txt

Full documentation of the commands that generate all the training data can be found in scripts/generate.s:

source scripts/generate.s
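Before training anything, it can help to sanity-check the generated corpus. A small snippet for that (my own helper, not part of the repo; it assumes one whitespace-tokenized document per line):

from collections import Counter

# Peek at the generated corpus: document count and token vocabulary.
with open("documents.txt") as f:
    docs = [line.split() for line in f]

vocab = Counter(tok for doc in docs for tok in doc)
print(f"{len(docs)} documents, {len(vocab)} token types")
print("most common tokens:", vocab.most_common(10))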

Train LMs

The following command shows how to train and save a language model on the synthetic data:

CUDA=0 TRAIN=documents.txt DEV=dev_documents.txt allennlp train training_config/bi_lm.jsonnet -s=rsa1_model

Note that this part can be done using whatever language modeling framework you want, but I'm using AllenNLP.
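For instance, a bare-bones PyTorch LM over the same files might look roughly like the sketch below (my own code under assumed tokenization; hyperparameters are arbitrary):

import torch
import torch.nn as nn

# Minimal PyTorch stand-in for the AllenNLP setup, just to show that any LM
# framework works on this data. Assumes one document per line, whitespace-
# tokenized, with <bos>/<eos> markers added here.

def load_corpus(path):
    with open(path) as f:
        docs = [["<bos>"] + line.split() + ["<eos>"] for line in f]
    vocab = {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}
    return [[vocab[t] for t in d] for d in docs], vocab

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)

docs, vocab = load_corpus("documents.txt")
model = LSTMLanguageModel(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for doc in docs:
        tokens = torch.tensor([doc])      # shape (1, len)
        logits = model(tokens[:, :-1])    # predict each next token
        loss = loss_fn(logits.squeeze(0), tokens[0, 1:])
        opt.zero_grad()
        loss.backward()
        opt.step()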

I have also provided scripts to train a batch of models on the NYU Slurm cluster, assuming all the data has been set up with generate.s.

SPEAKER=literal source scripts/launch_train.sh
SPEAKER=informative source scripts/launch_train.sh
SPEAKER=independent source scripts/launch_train.sh

Evaluation

General evaluation (with ROOT pointing at the project directory):

ROOT=$SCRATCH/synthetic-language-understanding
python evaluate.py independent \
    --model_dir=$ROOT/models/powerset-3/literal \
    --eval_path=$ROOT/data/powerset-3/eval.tsv

Evaluation setting cost to its gold value:

python evaluate.py informative --cost=0.1 \
    --model_dir=$ROOT/models/powerset-3/informative \
    --eval_path=$ROOT/data/powerset-3/eval.tsv
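I take the core comparison in evaluate.py to be between the LM's sentence scores and the gold RSA probabilities; purely as a hypothetical sketch (the two-column (sentence, gold probability) TSV layout and the score_fn hook are my assumptions, not the repo's actual schema):

import csv
from scipy.stats import spearmanr

def correlate_with_gold(eval_path, score_fn):
    # score_fn: a hypothetical hook returning your LM's score for a sentence.
    with open(eval_path) as f:
        rows = list(csv.reader(f, delimiter="\t"))
    gold = [float(r[1]) for r in rows]
    scores = [score_fn(r[0]) for r in rows]
    return spearmanr(gold, scores)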

Generate compositional test data

We use the script generate_compositional_test_data.py to generate pairs of entailed and non-entailed texts. In addition to entailment labels, it also outputs the gold probability of each text under the RSA model.

Use the argument n_items to specify the number of worlds, and max_sent_len to specify the maximum number of lexical items in a premise or hypothesis (not including the stop token).

For example:

cd ${PROJECT_ROOT}
for agent in vanilla dependent; do
  python generate_compositional_test_data.py ${lang} \
    --${agent} \
    --n_items=3 \
    --temp=5 \
    --cost=0.1 \
    --max_sent_len=5 \
    --eval_dir=data/powerset/dependent
done
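For intuition about the entailment labels: under one natural semantics for a powerset-style language (my assumption; see the generator for the actual definition), a sentence is true in any world containing all of its items, so premise-hypothesis entailment reduces to a subset check:

# Illustrative entailment check under an assumed semantics, where both
# worlds and sentences are sets of items. Not the repo's generator.

def satisfies(world: frozenset, sentence: frozenset) -> bool:
    return sentence <= world

def entails(premise: frozenset, hypothesis: frozenset) -> bool:
    # Every world satisfying the premise must satisfy the hypothesis;
    # for this semantics that is exactly the subset relation.
    return hypothesis <= premise

assert entails(frozenset("ab"), frozenset("a"))
assert not entails(frozenset("a"), frozenset("ab"))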

N-gram model evaluation

The following command will reproduce our n-gram model evaluation.

cd ${PROJECT_ROOT}/src/evaluation
python evaluate_entailment.py \
    --test_data=data/powerset/dependent/eval_entail-3_worlds-5_sents.tsv \
    --distributional_model=ngram \
    --lang=powerset \
    --dependent \
    --n_items=3 \
    --cost=0.1 \
    --training_dir=data/powerset/dependent \
    --order=3 \
    --size=100000000 \
    --plot_type=line \
    --complexity=length \
    --n-increments=22 \
    --auc
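As a rough picture of the baseline that --distributional_model=ngram with --order=3 selects, here is a count-based trigram sketch with add-one smoothing (mine, not the repo's estimator):

from collections import Counter, defaultdict

class NgramLM:
    def __init__(self, order=3):
        self.order = order
        self.counts = defaultdict(Counter)  # context tuple -> next-token counts
        self.vocab = set()

    def train(self, path):
        with open(path) as f:
            for line in f:
                toks = ["<bos>"] * (self.order - 1) + line.split() + ["<eos>"]
                self.vocab.update(toks)
                for i in range(self.order - 1, len(toks)):
                    ctx = tuple(toks[i - self.order + 1 : i])
                    self.counts[ctx][toks[i]] += 1

    def prob(self, ctx, tok):
        # Add-one smoothed conditional probability P(tok | ctx).
        c = self.counts[tuple(ctx)]
        return (c[tok] + 1) / (sum(c.values()) + len(self.vocab))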
