# An unofficial implementation of $\sigma$-Reparam

## Overview

This repository contains an unofficial implementation of $\sigma$-Reparam, proposed in *Stabilizing Transformer Training by Preventing Attention Entropy Collapse* (Zhai et al., ICML 2023).

Compared to plain spectral normalization, $\sigma$-Reparam introduces a learnable scalar $\gamma$ so that updates to the spectral norm of the weights are independent of the weight dimensionality:

$$ \hat{W} = \frac{\gamma}{\sigma(W)} W $$

where $\sigma(W)$ is the spectral norm (largest singular value) of the weight matrix $W$ and $\gamma$ is a learnable scalar.
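As an illustration, below is a minimal sketch of $\sigma$-Reparam written as a PyTorch parametrization. The class name `SigmaReparam` and the power-iteration details are assumptions for this sketch, not the code in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import parametrize


class SigmaReparam(nn.Module):
    """Sketch: reparametrizes a weight matrix as (gamma / sigma(W)) * W."""

    def __init__(self, weight: torch.Tensor, n_power_iterations: int = 1):
        super().__init__()
        self.n_power_iterations = n_power_iterations
        rows, cols = weight.shape
        # Power-iteration vectors used to estimate the spectral norm sigma(W).
        self.register_buffer("u", F.normalize(torch.randn(rows), dim=0))
        self.register_buffer("v", F.normalize(torch.randn(cols), dim=0))
        # Learnable scalar gamma, initialized to 1 (assumed initialization).
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                # Refresh the singular-vector estimates by power iteration.
                for _ in range(self.n_power_iterations):
                    self.v = F.normalize(weight.t() @ self.u, dim=0)
                    self.u = F.normalize(weight @ self.v, dim=0)
        sigma = torch.dot(self.u, weight @ self.v)  # spectral norm estimate
        return self.gamma / sigma * weight


# Usage: attach the parametrization to a linear layer's weight.
layer = nn.Linear(16, 32)
parametrize.register_parametrization(layer, "weight", SigmaReparam(layer.weight))
```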

Feedback and discussion are welcome on how $\sigma$-Reparam can be used to improve our models.

## Compatibility

The implementation is based on `torch.nn.utils.parametrizations.spectral_norm` in PyTorch v2.1.0. Incompatibilities may arise in newer versions.
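For context, one way to build on PyTorch's built-in `spectral_norm` is to chain a learnable scale onto it, since parametrizations registered on the same tensor compose in order. This is only a sketch of that composition, not necessarily the approach taken here, and it relies on `spectral_norm` behavior that may change across PyTorch versions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import parametrize
from torch.nn.utils.parametrizations import spectral_norm


class LearnableScale(nn.Module):
    # Multiplies an already spectrally normalized weight by a learnable gamma.
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        return self.gamma * weight


layer = spectral_norm(nn.Linear(16, 32))  # weight becomes W / sigma(W)
# Parametrizations compose, so the effective weight is gamma * W / sigma(W).
parametrize.register_parametrization(layer, "weight", LearnableScale())
```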

## Reference

Please refer to the original repository for the official implementation.