Rotary Transformer

Rotary Transformer is an MLM pre-trained language model with rotary position embedding (RoPE). The RoPE is a relative position encoding method with promise theoretical properties. The main idea is to multiply the context embeddings (q,k in the Transformer) by rotation matrices depending on the absolute position. One can prove that the inner product of the context embeddings will become only depending on the relative position. EleutherAI also posted a blog that contains an intuitive explanation and experiments about RoPE.

To the best of our knowledge, RoPE is the only relative position embeddings that can be used in linear attentions. For more details, please refer to our paper or the original Blog (Chinese). EleutherAI also posted a blog that contains an intuitive explanation and experiments of using RoPE in various models.

Dependency

bert4keras 0.10.4

Implementation

You can implement the RoPE with a few lines of changes in the self-attention layer. Here we provide the pseudo code for instruction.

sinusoidal_pos.shape = [1, seq_len, hidden_size] # Sinusoidal position embeddings
qw.shape = [batch_size, seq_len, num_heads, hidden_size]  # query hiddens
kw.shape = [batch_size, seq_len, num_heads, hidden_size]  # key hiddens

cos_pos = repeat_elements(sinusoidal_pos[..., None, 1::2], rep=2, axis=-1)
sin_pos = repeat_elements(sinusoidal_pos[..., None, ::2], rep=2, axis=-1)
qw2 = stack([-qw[..., 1::2], qw[..., ::2]], 4)
qw2 = reshape(qw2, shape(qw))
qw = qw * cos_pos + qw2 * sin_pos
kw2 = K.stack([-kw[..., 1::2], kw[..., ::2]], 4)
kw2 = K.reshape(kw2, K.shape(kw))
kw = kw * cos_pos + kw2 * sin_pos

# Attention
a = tf.einsum('bjhd,bkhd->bhjk', qw, kw)

Or you can find the implementation in source code of bert4keras.

Download

Other Implementations

A pytorch implementation can be found here
x-transformer, GPT-Neo, GPT-NeoX and mesh-transformer-jax by EleutherAI

Citation

Bibtex:

@misc{su2021roformer,
      title={RoFormer: Enhanced Transformer with Rotary Position Embedding}, 
      author={Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
      year={2021},
      eprint={2104.09864},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
finetune_scm.py		finetune_scm.py
test_roformer_gpt.py		test_roformer_gpt.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Rotary Transformer

Dependency

Implementation

Download

Other Implementations

Citation

About

Releases

Packages

Languages

License

yhaiqiang/roformer

Folders and files

Latest commit

History

Repository files navigation

Rotary Transformer

Dependency

Implementation

Download

Other Implementations

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages