In [1]:
import warnings
warnings.filterwarnings('ignore')

import os
os.chdir("../..")

In [3]:
import torch

In [4]:
# Prefer Apple's Metal backend (MPS) when present, then CUDA, then fall back to CPU.
if torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print(f"Using device: {device}")

Using device: mps


# 1. Toy model setup

A model's representation space may or may not have a privileged basis. A privileged basis is a special or **preferred coordinate system or direction** in that space, usually one defined by the model's architecture, such as the basis formed by the individual neurons of a hidden layer.

Models without a privileged basis are elegant and can serve as an analogue for neural network representations that lack one, such as word embeddings or the transformer residual stream. Of primary interest here, however, are representations whose neurons do impose a privileged basis, such as transformer MLP layers or convolutional network neurons.
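To make the distinction concrete, the sketch below rotates the hidden space of a tiny tied-weight model and compares the outputs with and without an elementwise ReLU on the hidden layer. The weights `W`, the input `x`, the rotation `R`, and the purely linear form `W^T W x` are illustrative assumptions, not values taken from this notebook.

```python
import math
import torch

# Hypothetical weights: 3 input features mapped to 2 hidden dimensions.
W = torch.tensor([[1.0, 0.0, 0.5],
                  [0.0, 1.0, -0.5]])
x = torch.tensor([0.0, 0.0, 2.0])  # W @ x = [1, -1]

# Rotate the hidden space by 45 degrees.
theta = math.pi / 4
R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta), math.cos(theta)]])
W_rot = R @ W

# No hidden nonlinearity: the rotation cancels because R^T R = I.
out_linear = W.T @ (W @ x)
out_linear_rot = W_rot.T @ (W_rot @ x)

# Elementwise ReLU on the hidden layer: the rotation no longer cancels.
out_relu = W.T @ torch.relu(W @ x)
out_relu_rot = W_rot.T @ torch.relu(W_rot @ x)

linear_invariant = torch.allclose(out_linear, out_linear_rot, atol=1e-5)
relu_invariant = torch.allclose(out_relu, out_relu_rot, atol=1e-5)
print(f"linear hidden layer invariant to rotation: {linear_invariant}")
print(f"ReLU hidden layer invariant to rotation:   {relu_invariant}")
```

Because ReLU acts on each coordinate separately, the neuron axes are the only basis in which the computation keeps its form. That is the sense in which the hidden layer's basis is privileged.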

The simplest toy model with a privileged basis takes a model without one and adds an activation function to its hidden layer. The hidden layer then consists of neurons, as in a transformer MLP layer. Starting from the previous model, this amounts to applying a ReLU to the hidden layer:

\begin{align*}
h &= \text{ReLU}(Wx) \\
x' &= \text{ReLU}(W^T h + b)
\end{align*}
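A minimal PyTorch sketch of these two equations follows. It assumes the tied weights shown above (the same `W` for encoding and decoding). The class name `ReLUHiddenToyModel`, the layer sizes, and the Xavier initialization are hypothetical choices made for illustration.

```python
import torch
from torch import nn


class ReLUHiddenToyModel(nn.Module):
    """Sketch of h = ReLU(W x), x' = ReLU(W^T h + b) with tied weights."""

    def __init__(self, n_features: int, n_hidden: int):
        super().__init__()
        # W maps features to hidden neurons; its transpose maps back.
        self.W = nn.Parameter(torch.empty(n_hidden, n_features))
        nn.init.xavier_normal_(self.W)  # assumed initialization
        self.b = nn.Parameter(torch.zeros(n_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, n_features).
        h = torch.relu(x @ self.W.T)             # (batch, n_hidden)
        return torch.relu(h @ self.W + self.b)   # (batch, n_features)


torch.manual_seed(0)
model = ReLUHiddenToyModel(n_features=5, n_hidden=2)
x = torch.rand(4, 5)   # a batch of 4 random feature vectors
x_prime = model(x)
print(x_prime.shape)
```

The model and its inputs can be moved to the `device` selected earlier with `.to(device)`.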


# Sources

1. [Ground truth - Arena::Indirect Object Identification](https://arena-chapter1-transformer-interp.streamlit.app/[1.4.1]_Indirect_Object_Identification)
2. [Interpretability in the wild: A circuit for indirect object identification in GPT-2 small, by Wang, K., et al.](https://arxiv.org/pdf/2211.00593)
3. [Exploratory Analysis Demo, by Neel Nanda](https://colab.research.google.com/github/TransformerLensOrg/TransformerLens/blob/main/demos/Exploratory_Analysis_Demo.ipynb#scrollTo=WXktSe0CvBdh)
4. [An analogy for understanding transformers, by Callum McDougall](https://www.lesswrong.com/posts/euam65XjigaCJQkcN/an-analogy-for-understanding-transformers)