Code for the ACL 2018 paper "Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context"
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.

How Neural Language Models Use Context

This repository contains code for experiments described in the ACL 2018 publication: Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. It was developed using the AWD LSTM Language Model code (the older release in PyTorch v0.2.0) provided by Merity et al.

Running this code requires the user to first train a language model. Then, they can evaluate it with restricted context and perturbed inputs to better understand how it is modeling nearby vs. long-range context. For further details on our experiments, we refer the reader to the paper.

If you use this code or results from our paper, please cite:

  title={{Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context}},
  author={Khandelwal, Urvashi and He, He and Qi, Peng and Jurafsky, Dan},
  journal={Association of Computational Linguistics (ACL)},

Software Requirements

This code requires Python 3 and PyTorch v0.2.0.

Using Anaconda, you can set up a conda environment by running conda create -n lm_context python=3.4 and activating it with source activate lm_context. Then you can install PyTorch v0.2 with conda install pytorch=0.2.0 -c soumith.

Training the Language Model

We have provided all the scripts from Merity et al. that are necessary to train and finetune language models for the Penn Treebank (PTB) and Wikitext-2 (Wiki) datasets.

  • First run to obtain the data.
  • Then, train the model as follows:
    • For PTB, python --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save
    • And for Wiki, python --epochs 750 --data data/wikitext-2 --save --dropouth 0.2 --seed 1882
  • Finetune the model:
    • For PTB, python --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save
    • And for Wiki, python --epochs 750 --data data/wikitext-2 --save --dropouth 0.2 --seed 1882

For further details on the model training instructions, we refer the user to the original code.

Using your own language model

The analysis code provided in this repository is model-agnostic. It is widely usable with a number of different model classes.

If you would like to use your own language model, the analysis code expects a init_hidden function and a callable that returns a tuple of outputs, i.e. the output distributions, and the next hidden state. The sample evaluation script for the language model also expects all of the functionality in

Evaluation Script allows the user to evaluate the language model in the standard infinite-context setting, i.e. the model is provided all previous tokens when estimating the distribution of the next one.

For PTB: python -u --data data/penn/ --seed 141 --start_token 0 --checkpoint --logfile ptb_eval.pkl

For Wiki: python -u --data data/wikitext-2/ --seed 141 --start_token 0 --checkpoint --logfile wiki_eval.pkl

Note: the start_token flag controls how many tokens to exclude from the beginning of the corpus. This is useful when trying to build a fair comparison between the infinite-context model and a restricted context model. For instance, if the context size is restricted to 300 tokens, the first 299 tokens cannot be included in the loss computation and this must be reflected in the loss for the infinite context model as well.

Restricted context allows the user to evaluate their language model with a fixed size finite context. The seq_len flag defines how many tokens to use for context.

For PTB:
mkdir ptb_context_exps
python -u --data data/penn/ --checkpoint --seed 141 --batch_size 50 --seq_len 100 --max_seq_len 1000 --logdir ptb_context_exps/

For Wiki:
mkdir wiki_context_exps
python -u --data data/wikitext-2/ --checkpoint --seed 1882 --batch_size 50 --seq_len 100 --max_seq_len 1000 --logdir wiki_context_exps/

Note: You may need to use a smaller batch size for a larger sequence length and that would change the number of examples processed. Carefully pick the maximum sequence length first to process the same number of examples for each experiment.

Perturbed inputs

You can find a list of functions supported by running python -h and checking the --func flag. Sample commands for the reverse function:

For PTB:
mkdir ptb_reverse
python -u --data data/penn/ --checkpoint --seed 141 --logdir wiki_reverse --batch_size 50 --seq_len 300 --max_seq_len 1000 --freq 108 --rare 1986 --func reverse

For Wiki:
mkdir wiki_reverse
python -u --data data/wikitext-2/ --checkpoint --seed 1882 --logdir wiki_reverse --batch_size 50 --seq_len 300 --max_seq_len 1000 --freq 189 --rare 4282 --func reverse

Neural Cache runs the neural cache at evaluation. It contains added support for saving all the losses to a file.

For PTB: python --data data/penn --save --lambdasm 0.1 --theta 1.0 --window 500 --bptt 5000 --logfile ptb_cache.pkl

For Wiki: python --save --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 2000 --data data/wikitext-2 --logfile wiki_cache.pkl


  • Add visualization code samples
  • Upgrade to PyTorch v0.3 (to reflect upgrade in Merity et al. code)