Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation (ICML 2024) 🔥

News

🔥 May 2, 2024: BiPE is accepted at ICML 2024!

🔥 Apr 11, 2024: Released a 1.6B BiPE-RoPE model pre-trained on 300B tokens; it shows extrapolation ability consistent with that of the 151M model.

🔥 Apr 4, 2024: Initial commits. More code (YaRN fine-tuning, SCROLLS fine-tuning) is coming soon.

Overview

This repository contains the source code for the ICML 2024 paper Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation.

[Figure: Overview of BiPE]

Setup Environment

conda create -n bipe python=3.9
conda activate bipe
pip3 install -r requirements.txt

Data for Pretraining

We pretrain on the Pile with all copyrighted data removed.

cd BiPE;
DATA_DIR=./data # the directory to save the data
python3 download_data.py --dataset-cache-dir $DATA_DIR
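
If you want to inspect the cached data outside the training scripts, the sketch below uses the Hugging Face datasets library. The dataset name is an assumption (an uncopyrighted Pile mirror such as monology/pile-uncopyrighted); check download_data.py for the exact name and format.

# Minimal sketch: peek at the cached pretraining data with the `datasets` library.
# Assumption: download_data.py caches an uncopyrighted Pile mirror; the dataset
# name below is hypothetical -- match it to what the script actually downloads.
from datasets import load_dataset

DATA_DIR = "./data"  # same directory passed as --dataset-cache-dir

dataset = load_dataset(
    "monology/pile-uncopyrighted",  # hypothetical; see download_data.py
    split="train",
    cache_dir=DATA_DIR,
    streaming=True,  # avoid materializing the full corpus
)

for example in dataset.take(1):
    print(example["text"][:200])  # Pile-style examples carry a "text" field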

Pretraining

The scripts under script/ cover the commands for training and perplexity evaluation.

For training, the key modification for BiPE is computing token ids (intra-segment) and position ids (inter-segment) with the get_bilevel_ids function. The token ids are then used to look up absolute positional encodings (get_ape_embeddings), while the position ids drive the relative positional encodings; a minimal sketch of this id computation follows the config list below. For example, you can start training the 151M BiPE-RoPE model with the following command:

cd BiPE
OUTPUT_DIR=./output  # path to save checkpoints and tensorboard
DATA_DIR=./data  # path to load data
CONFIG_NAME=config/bipe_rope.json
bash script/train.sh

You can change CONFIG_NAME to choose among the positional encoding variants: config/bipe_rope.json, config/bipe_alibi.json, config/rope.json, or config/alibi.json.
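
To make the bilevel ids above concrete, here is a minimal sketch of how intra-segment token ids and inter-segment position ids can be derived from a separator token id. The repository's actual get_bilevel_ids may differ in details (e.g. how segment boundaries are marked), so treat this as illustrative:

import torch

def get_bilevel_ids_sketch(input_ids: torch.Tensor, sep_id: int):
    """Sketch of the bilevel id computation described above.

    Returns:
      token_ids    -- intra-segment positions, restarting at 0 in each segment
                      (used for the absolute encodings via get_ape_embeddings)
      position_ids -- inter-segment indices, i.e. which segment a token is in
                      (used for the relative encodings, e.g. RoPE or ALiBi)
    """
    is_sep = input_ids == sep_id
    # Inter-segment ids: count separators strictly before each position,
    # so a separator still belongs to the segment it closes.
    position_ids = torch.cumsum(is_sep, dim=-1) - is_sep.long()
    # Intra-segment ids: global index minus the start index of the enclosing segment.
    idx = torch.arange(input_ids.size(-1), device=input_ids.device).expand_as(input_ids)
    boundary = torch.where(is_sep, idx + 1, torch.zeros_like(idx))  # next segment starts at sep + 1
    prev_boundary = torch.cat([torch.zeros_like(boundary[..., :1]), boundary[..., :-1]], dim=-1)
    seg_start = torch.cummax(prev_boundary, dim=-1).values
    token_ids = idx - seg_start
    return token_ids, position_ids

# Example: input_ids = torch.tensor([[5, 7, 0, 9, 4]]) with sep_id = 0 gives
#   token_ids    -> [[0, 1, 2, 0, 1]]
#   position_ids -> [[0, 0, 0, 1, 1]]

In the model, token_ids would index the learned absolute position embeddings, while position_ids replace the usual 0..T-1 positions fed to RoPE or ALiBi.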

Perplexity Evaluation

For perplexity evaluation, you can use the following command:

cd BiPE;
DATA_DIR=./data  # path to load data
MODEL=./bipe_rope # model checkpoint path
bash script/eval.sh
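
Under the hood, eval.sh reports perplexity, i.e. the exponential of the average token-level negative log-likelihood. A minimal sketch of that computation (assuming a Hugging Face-style causal LM; the real script adds batching and long-sequence handling) looks like:

import math
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor) -> float:
    # With `labels` supplied, Hugging Face causal LMs return the mean
    # cross-entropy over the (shifted) tokens; exponentiating gives perplexity.
    outputs = model(input_ids=input_ids, labels=input_ids)
    return math.exp(outputs.loss.item())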

You can also download our pre-trained models (note that the 1.6B model is pre-trained with a batch size of 1024):

Model           | HuggingFace Checkpoint 🤗
BiPE_RoPE-151M  | link
BiPE_RoPE-1.6B  | link
RoPE-151M       | link
BiPE_ALiBi-151M | link
ALiBi-151M      | link

For example, to evaluate BiPE-RoPE-151M, you can use the following command:

git lfs install
git clone https://huggingface.co/hzy00/BiPE_RoPE-151M
DATA_DIR=./data  # path to load data
MODEL=./BiPE_RoPE-151M # model checkpoint path
bash script/eval.sh

Citations

@article{he2024two,
  title={Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation},
  author={He, Zhenyu and Feng, Guhao and Luo, Shengjie and Yang, Kai and He, Di and Xu, Jingjing and Zhang, Zhi and Yang, Hongxia and Wang, Liwei},
  journal={arXiv preprint arXiv:2401.16421},
  year={2024}
}
