wjxts/RegularizedBN

This repository contains the code and instructions to reproduce the neural machine translation experiments from our NeurIPS 2022 Spotlight paper "Understanding the Failure of Batch Normalization for Transformers in NLP".
For other tasks (language modeling, named entity recognition, text classification), you can reproduce the corresponding results by modifying the normalization module in the respective codebase. For licensing reasons, we do not include that code here. We are still adding new features.

The code is based on fairseq (v0.9.0).

The BN/RBN module is located at fairseq/modules/norm/mask_batchnorm3d.py

Reproduction

Install PyTorch (we use Python 3.6 and PyTorch 1.7.1; later versions of Python and PyTorch should also work):

conda create -n rbn python=3.6
conda activate rbn
conda install pytorch==1.7.1 torchvision torchaudio cudatoolkit=11.0 -c pytorch
# Alternatively, install with pip instead of conda:
# pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Install fairseq by:

cd RegularizedBN
pip install --editable ./

Install the other requirements:

pip install -r requirements.txt
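
After the installation steps, a quick import check can confirm that the environment is usable. The following is a minimal sketch, not part of the repository: the helper name check_module is hypothetical, and it assumes that `python` on your PATH is the interpreter of the `rbn` environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): report whether a Python
# module can be imported in the current environment, and its version if any.
check_module() {
  python -c 'import importlib, sys
m = importlib.import_module(sys.argv[1])
print(sys.argv[1], getattr(m, "__version__", "unknown"))' "$1" 2>/dev/null \
    || echo "$1 is not importable in this environment"
}

check_module torch
check_module fairseq
```

If either module is reported as not importable, re-check that the `rbn` environment is active before starting training.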

IWSLT14 De-En

Download the data from Google Drive and extract it into the data-bin directory. You can also download it from Baidu Netdisk.

cd data-bin
unzip iwslt14.tokenized.de-en.zip
cd ..
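
If the unzip step fails, it may help to check that the archive actually landed in data-bin. This is a small sketch run from the repository root; the helper name check_archive is hypothetical, and only the archive file name is taken from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the downloaded archive exists and list the
# first few entries it contains, without extracting anything.
check_archive() {
  local archive="$1"
  if [ -f "$archive" ]; then
    unzip -l "$archive" | head -n 10
  else
    echo "archive not found: $archive"
  fi
}

check_archive data-bin/iwslt14.tokenized.de-en.zip
```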

Train the model (8 GB of GPU memory is enough):

chmod +x ./iwslt14_bash/train-iwslt14-pre-max-epoch.sh ./iwslt14_bash/train-iwslt14-post-max-epoch.sh 
For Pre-Norm Transformer:  
BN: 
CUDA_VISIBLE_DEVICES=0 ./iwslt14_bash/train-iwslt14-pre-max-epoch.sh batch_1_1
RBN: 
CUDA_VISIBLE_DEVICES=1 ./iwslt14_bash/train-iwslt14-pre-max-epoch.sh batch_diff_0.1_0.01  
LN: 
CUDA_VISIBLE_DEVICES=2 ./iwslt14_bash/train-iwslt14-pre-max-epoch.sh layer_1
For Post-Norm Transformer:  
BN: 
CUDA_VISIBLE_DEVICES=0 ./iwslt14_bash/train-iwslt14-post-max-epoch.sh batch_1_1
RBN: 
CUDA_VISIBLE_DEVICES=1 ./iwslt14_bash/train-iwslt14-post-max-epoch.sh batch_diff_60_0
LN: 
CUDA_VISIBLE_DEVICES=2 ./iwslt14_bash/train-iwslt14-post-max-epoch.sh layer_1
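
The commands above assume three GPUs, one per normalization setting. If only a single GPU is available, the runs can be queued one after another. The sketch below only prints the commands (a dry run); the function name print_runs and the variable gpu_id are hypothetical, while the script paths and setting names are copied from the commands above.

```shell
#!/usr/bin/env bash
# Dry run: print one training command per normalization setting so they can
# be reviewed before being executed sequentially on a single GPU.
gpu_id=0  # assumption: the index of the GPU you want to use

print_runs() {
  local variant="$1"  # "pre" or "post"
  shift
  local script="./iwslt14_bash/train-iwslt14-${variant}-max-epoch.sh"
  local setting
  for setting in "$@"; do
    echo "CUDA_VISIBLE_DEVICES=${gpu_id} ${script} ${setting}"
  done
}

print_runs pre batch_1_1 batch_diff_0.1_0.01 layer_1
print_runs post batch_1_1 batch_diff_60_0 layer_1
```

Once the printed commands look right, they can be run one at a time, or the output of the script can be piped to `bash` to execute them in order.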
 
