I. Preparation

pip install -r requirements.txt
python setup.py develop

II. Explanation of the main script

(i) Preliminaries

The authors used this script to run all experiments.

The file contains six segments of code, each headed by an if False statement. The six segments correspond to the following (this information is also noted in comments within the file):

  • (1) training vanilla models (RNNs and LSTMs)
  • (2) evaluating vanilla models using regular decoding algorithms (for decoding algorithms, please refer to the decoding algorithms section below)
  • (3) evaluating vanilla models using consistent sampling algorithms
  • (4) training self-terminating RNN models
  • (5) training self-terminating LSTM models
  • (6) evaluating self-terminating RNN+LSTM models

Note that by default, we train each model using 10 different random seeds. The number of random seeds can be easily adjusted in the script.

At the end of the segments, we show how to modify parameters to run with the BPE-tokenized dataset.

(ii) Things to do before using the script

  • Set the values in user_folders, following the example included in the file.
  • Adjust the partition choices and the corresponding GPU options (to find where, search for args.partition).
  • After training and before final evaluation, adjust sweep_dirs in the evaluation segments of the code so that they point to the absolute locations of the checkpoint folders; examples are included in the file.

(iii) How to use

To use one segment, set its corresponding if False to if True and run the script with python.

Alternatively, users who are not in a slurm environment, or who prefer to run our code from the command line, can print the actual python commands by including the flag --print-commands.
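
To illustrate how such a driver script is typically organized, here is a minimal sketch of the segment-toggle pattern described above. The helper function, script names, and arguments (run_job, train.py, evaluate.py, --seed) are hypothetical placeholders, not names from this repository:

import argparse
import subprocess

parser = argparse.ArgumentParser()
parser.add_argument("--print-commands", action="store_true",
                    help="print the python commands instead of submitting jobs")
args = parser.parse_args()

def run_job(cmd):
    # Print the command, or execute it (e.g., hand it off to slurm).
    if args.print_commands:
        print(cmd)
    else:
        subprocess.run(cmd, shell=True, check=True)

# Segment (1): train vanilla models. Flip `if False` to `if True` to enable.
if False:
    for seed in range(10):  # default: 10 random seeds; adjust here if needed
        run_job(f"python train.py --model lstm --seed {seed}")

# Segment (2): evaluate vanilla models with regular decoding algorithms.
if False:
    run_job("python evaluate.py --consistent-sampling 0")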

III. Decoding algorithms

  • When the model is a self-terminating RNN/LSTM, only the greedy decoding algorithm is used.
  • When the model is a regular RNN/LSTM...
    • if --consistent-sampling 0, the following decoding algorithms are used: greedy decoding, ancestral sampling, beam search with beam sizes 2 and 4, top-k decoding with k=2 and k=4, and nucleus sampling with mu=0.2 and mu=0.4.
    • if --consistent-sampling 1, the following decoding algorithms are used: consistent top-k decoding with k=2 and k=4, and consistent nucleus sampling with mu=0.2 and mu=0.4 (a sketch of the consistent variants follows this list).
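
The key idea behind the consistent variants is that the eos token is always kept in the truncated candidate set, so termination has nonzero probability at every step. Below is a minimal illustrative sketch of consistent top-k sampling; the function and variable names (consistent_top_k_sample, probs, eos_id) are hypothetical, and the exact implementation in this repository may differ:

import torch

def consistent_top_k_sample(probs, k, eos_id):
    # probs: 1-D tensor of next-token probabilities; eos_id: index of eos.
    top_probs, top_ids = probs.topk(k)
    # Consistent variant: add eos to the candidate set if top-k cut it off.
    if eos_id not in top_ids:
        top_probs = torch.cat([top_probs, probs[eos_id].unsqueeze(0)])
        top_ids = torch.cat([top_ids, torch.tensor([eos_id])])
    # Renormalize over the candidate set and sample from it.
    top_probs = top_probs / top_probs.sum()
    return top_ids[torch.multinomial(top_probs, 1)].item()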

IV. GPT-2 experiments

The self-terminating wrapper supports Transformers 3.3.1 (the current version as of October 2020).

The gpt2 folder contains all the wrappers necessary to use the self-terminating layer with a HuggingFace pretrained model.
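
As background, a self-terminating output layer guarantees termination by making the probability of eos increase monotonically toward 1 over time steps. The sketch below illustrates this idea with an illustrative parameterization in which the non-eos probability mass shrinks multiplicatively at each step; it is not the exact layer shipped in the gpt2 folder, and the names (SelfTerminatingHead, prev_non_eos_mass) are hypothetical:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfTerminatingHead(nn.Module):
    def __init__(self, hidden_size, vocab_size, eos_id):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)
        self.eos_id = eos_id

    def forward(self, hidden, prev_non_eos_mass):
        # hidden: (batch, hidden_size); prev_non_eos_mass: (batch,),
        # initialized to ones before the first time step.
        logits = self.proj(hidden)
        # sigmoid(.) < 1, so the non-eos mass shrinks at every step and
        # p(eos) = 1 - mass grows monotonically toward 1.
        non_eos_mass = torch.sigmoid(logits[:, self.eos_id]) * prev_non_eos_mass
        # Spread the remaining mass over the non-eos tokens via a softmax.
        masked = logits.clone()
        masked[:, self.eos_id] = float("-inf")
        probs = F.softmax(masked, dim=-1) * non_eos_mass.unsqueeze(-1)
        probs[:, self.eos_id] = 1.0 - non_eos_mass
        return probs, non_eos_mass  # carry the mass to the next time step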

  • (1) Tokenize the wikitext-103 dataset.
  • (2) Fine-tune GPT-2 or self-terminating GPT-2 (a minimal fine-tuning sketch follows this list).
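
For orientation, here is a minimal sketch of a single fine-tuning step for the baseline GPT-2 under Transformers 3.3.1; the text, hyperparameters, and loop structure are placeholders, and the self-terminating variant would additionally swap in the wrapper from the gpt2 folder:

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Placeholder text; in practice, iterate over the tokenized wikitext-103
# dataset produced in step (1).
batch = tokenizer("an example sentence from wikitext-103", return_tensors="pt")

# In Transformers 3.3.1, passing labels returns the LM loss as the first
# element of the output tuple.
loss = model(batch["input_ids"], labels=batch["input_ids"])[0]
loss.backward()
optimizer.step()
optimizer.zero_grad()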

