PyTorch Examples repo for "ReZero is All You Need: Fast Convergence at Large Depth"
This repo contains examples demonstrating the ReZero architecture; see the paper for details.

The official ReZero repo is here.


Final validation errors: Vanilla 7.74, FixUp 7.5, ReZero 6.38.


If you find ReZero or a similar architecture improves the performance of your application, you are invited to share a demonstration here.


To install ReZero via pip, run:

pip install rezero


We provide custom ReZero Transformer layers (RZTX).

For example, this will create a Transformer encoder:

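import torch
import torch.nn as nn
from rezero.transformer import RZTXEncoderLayer

# Stack six ReZero encoder layers using PyTorch's standard encoder container.
encoder_layer = RZTXEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
# Input shape: (sequence length, batch size, d_model).
src = torch.rand(10, 32, 512)
out = transformer_encoder(src)  # output has the same shape as src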
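
For intuition about what the RZTX layers build on, the core ReZero idea from the paper is a residual connection of the form x + alpha * F(x), where alpha is a learnable scalar initialized to zero. The following is a minimal sketch in plain PyTorch, not the implementation shipped in the rezero package: the class name ReZeroBlock, the feed-forward sublayer, and the sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ReZeroBlock(nn.Module):
    """Hypothetical residual block computing x + alpha * F(x)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # F(x): any sublayer could go here; a small feed-forward net is assumed.
        self.sublayer = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),
        )
        # Learnable residual weight, initialized to zero so that the block
        # is exactly the identity map before training starts.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)


# Illustrative sizes only: a stack of 32 blocks on a small input.
torch.manual_seed(0)
deep_stack = nn.Sequential(*[ReZeroBlock(64, 128) for _ in range(32)])
x = torch.rand(10, 32, 64)
y = deep_stack(x)
```

Because every alpha starts at zero, the whole stack passes its input through unchanged at initialization regardless of depth, and each block only begins to contribute once training moves its alpha away from zero.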


If you find rezero useful for your research, please cite our paper:

    title = "ReZero is All You Need: Fast Convergence at Large Depth",
    author = "Bachlechner, Thomas and
      Majumder, Bodhisattwa Prasad and
      Mao, Huanru Henry and
      Cottrell, Garrison W. and
      McAuley, Julian",
    booktitle = "arXiv",
    year = "2020",
    url = ""
