Natural Language Inference with Mixed Effects

This project implements reusable annotator random effects components for natural language inference models as presented in the following paper:

Natural Language Inference with Mixed Effects
William Gantt, Benjamin Kane, and Aaron Steven White
*SEM 2020


To begin development, run:

git clone git://
cd nli-mixed-models
pip install --user --no-cache-dir -r ./requirements.txt
python develop

If you add any dependencies as you're working, please be sure to add them to requirements.txt. I have not been working inside a Docker container, but anyone who does should update the Dockerfile as appropriate. All training scripts (which can be found in the scripts directory) should be run from the root directory as follows:

python -m scripts.{categorical,unit}.{script_name} --parameters [path/to/parameters/file]

If no file path is given as an argument to --parameters, the script uses the default JSON file in the appropriate directory — either scripts/categorical or scripts/unit. The config files provided in those directories are merely examples (from actual experiments reported in the paper) and can be consulted as templates for any additional parameters files you may wish to create.
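The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code used by the actual training scripts: the function name `load_parameters`, the constant `DEFAULT_PARAMETERS`, and the file name `example.json` are all hypothetical.

```python
import argparse
import json
from pathlib import Path

# Hypothetical default location (an assumption for this sketch). The real
# scripts keep their default JSON file in scripts/categorical or scripts/unit.
DEFAULT_PARAMETERS = Path("scripts/categorical/example.json")


def load_parameters(argv=None, default_path=DEFAULT_PARAMETERS):
    """Parse --parameters and return the decoded JSON file as a dict.

    If --parameters is not supplied, fall back to default_path.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--parameters",
        type=Path,
        default=None,
        help="path to a JSON parameters file",
    )
    args = parser.parse_args(argv)
    path = args.parameters if args.parameters is not None else default_path
    with open(path) as f:
        return json.load(f)
```

Calling `load_parameters()` with no arguments reads the command line; passing a list such as `["--parameters", "my_params.json"]` makes the function easy to exercise from a test.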

The structure of this package is largely modeled on that of torch-combinatorial, so look there for guidance on how to structure any additions to this package. The core components of the code were drawn from the "Natural Language Inference with Mixed Effects Models" notebook, which can be found here.

Throughout the code, you may see references to "subtask (a)" and "subtask (b)". These refer to different ways of performing inference at test time:

  • subtask (a): For a given test example, if we have learned random effects parameters for the annotator associated with that test example, then we use those parameters. Otherwise, we use the mean parameter value across annotators.
  • subtask (b): For a given test example, we always use the mean random effects parameter values across annotators, even if we have learned specific parameters for the relevant annotator.

The results presented in the paper are for subtask (a), so these are the results to look at if you are trying to reproduce the results from the paper.
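The difference between the two subtasks can be illustrated with a small sketch. It is not taken from the package: the names `mean_effect` and `select_effect` are hypothetical, random effects are represented as plain lists of floats keyed by annotator ID, and the "mean" is assumed to be the elementwise average of the learned per-annotator parameters.

```python
from statistics import fmean


def mean_effect(effects):
    """Average the per-annotator parameter vectors elementwise."""
    vectors = list(effects.values())
    return [fmean(column) for column in zip(*vectors)]


def select_effect(effects, annotator_id, subtask):
    """Pick the random effects parameters to use for one test example.

    effects: dict mapping annotator ID -> list of learned parameters
             (hypothetical representation, assumed for this sketch).
    subtask: "a" uses the annotator's own parameters when they were
             learned and the mean otherwise; "b" always uses the mean.
    """
    if subtask not in ("a", "b"):
        raise ValueError(f"unknown subtask: {subtask!r}")
    if subtask == "a" and annotator_id in effects:
        return effects[annotator_id]
    return mean_effect(effects)
```

With `effects = {"ann1": [1.0, 3.0], "ann2": [3.0, 5.0]}`, subtask (a) returns `[1.0, 3.0]` for `ann1` and the mean `[2.0, 4.0]` for an unseen annotator, while subtask (b) returns `[2.0, 4.0]` in both cases.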

Lastly, if these setup instructions are inadequate, please update them as you see fit.

