# Rotary Positional Embeddings (RoPE)

To understand RoPE, you must fundamentally change how you view "position."

- **Old View (Absolute/Learnable):** Position is a signal we add to the data.
- **New View (RoPE):** Position is an orientation in space.

## Core Idea of RoPE (one sentence)

**RoPE rotates each token's query and key vectors by a position-dependent angle, so that attention scores depend on the relative distance between tokens.**

## Why Rotation Works Better Than Addition

### Learnable Embeddings (old way)
```
embedding = word_vector + position_vector
```

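**Problem:**
- Position info is summed into the same vector as meaning, so the two are entangled
- The distance between two tokens is not explicit in the attention score; the model has to learn it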

### RoPE (new way)
```
embedding = ROTATE(word_vector, angle = position)
```

**Key advantage:**
- Rotation preserves length
- Only changes direction
- The difference in rotation angle between two tokens is proportional to their difference in position

## How Rotation is Done (simple math, no fear)

Each embedding is split into pairs:
```
[x1, x2], [x3, x4], [x5, x6], ...
```

Each pair acts like a **2D vector**.

### Rotation Formula:
```
[x', y'] = [ x*cosθ - y*sinθ ,
             x*sinθ + y*cosθ ]
```
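The formula above can be sketched in a few lines of Python. The function name `rotate_pair` is a hypothetical helper chosen for illustration, not part of any library; the angle is in radians.

```python
import math

def rotate_pair(x, y, theta):
    """Rotate the 2D vector (x, y) counter-clockwise by theta radians."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (x * cos_t - y * sin_t, x * sin_t + y * cos_t)

# Rotating (1, 0) by 90 degrees lands on (0, 1), up to floating-point error.
x_rot, y_rot = rotate_pair(1.0, 0.0, math.pi / 2)

# The length of a vector is the same before and after rotation.
length_before = math.hypot(3.0, 4.0)
length_after = math.hypot(*rotate_pair(3.0, 4.0, 0.7))
```

Comparing `length_before` and `length_after` is a quick way to see that rotation only changes direction, not length.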

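**The angle θ (theta) depends on:**
- Token position (tokens further along the sequence are rotated further)
- Which pair of dimensions is being rotated (each pair has its own rotation frequency)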
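As a rough sketch of how the angle could be computed from these two inputs: the notes above do not fix a schedule, so the snippet assumes the one from the original RoPE paper, where pair `i` of a `dim`-dimensional vector rotates at frequency `base ** (-2 * i / dim)` with `base = 10000`. The name `rope_angles` is hypothetical.

```python
def rope_angles(position, dim, base=10000.0):
    """Return one rotation angle (in radians) per pair for a token at `position`.

    Assumption: pair i uses frequency base ** (-2 * i / dim).
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even so it can be split into pairs")
    return [position * base ** (-2.0 * i / dim) for i in range(dim // 2)]

# Four pairs for an 8-dimensional vector at position 3.
# The first pair rotates fastest; later pairs rotate progressively slower.
angles = rope_angles(position=3, dim=8)
```

Under this assumed schedule, position 0 always gives an angle of zero, so the first token is left unrotated.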

## The Math: Rotation Matrices

Imagine a 2D vector $(x_1, x_2)$. In the complex plane, this is $x_1 + i x_2$.

To rotate this vector by an angle $\theta$, we multiply it by $e^{i\theta}$.
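A small check, using only Python's standard library, that multiplying by $e^{i\theta}$ gives the same result as the cos/sin rotation formula. Both function names are hypothetical helpers for illustration.

```python
import cmath
import math

def rotate_complex(x1, x2, theta):
    """Rotate (x1, x2) by treating it as x1 + i*x2 and multiplying by e^(i*theta)."""
    z = complex(x1, x2) * cmath.exp(1j * theta)
    return (z.real, z.imag)

def rotate_matrix(x1, x2, theta):
    """Rotate (x1, x2) with the explicit cos/sin formula."""
    c, s = math.cos(theta), math.sin(theta)
    return (x1 * c - x2 * s, x1 * s + x2 * c)

via_complex = rotate_complex(3.0, 4.0, 0.5)
via_matrix = rotate_matrix(3.0, 4.0, 0.5)
# The two results agree up to floating-point rounding.
```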

### How RoPE Works

RoPE applies this to the high-dimensional embedding vector by **chopping it into chunks of 2**.

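If the vector has 512 dimensions (for example $d_{model} = 512$), we treat it as **256 pairs** of coordinates. In practice the rotation is applied to each attention head's query and key vectors, so the number of pairs is half the head dimension.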
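The chunking step might look like the sketch below. It pairs adjacent coordinates, as in the `[x1, x2], [x3, x4], ...` layout described earlier; some implementations instead pair dimension `i` with dimension `i + d/2`. The helper name `split_into_pairs` is hypothetical.

```python
def split_into_pairs(vector):
    """Group a flat vector into adjacent (x1, x2) coordinate pairs."""
    if len(vector) % 2 != 0:
        raise ValueError("vector length must be even")
    return [(vector[i], vector[i + 1]) for i in range(0, len(vector), 2)]

# A dummy 512-dimensional vector whose entries are just 0.0, 1.0, 2.0, ...
embedding = [float(i) for i in range(512)]
pairs = split_into_pairs(embedding)  # 256 pairs, each treated as a 2D vector
```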

### The Rotation Formula

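For a token at position $m$, we rotate the pair $(x_1, x_2)$ by an angle $m\theta$, where $\theta$ here denotes the fixed rotation frequency of that pair, so the total angle grows linearly with position: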

$$\begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix} = \begin{pmatrix} \cos(m\theta) & -\sin(m\theta) \\ \sin(m\theta) & \cos(m\theta) \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
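Putting the pieces together, a minimal sketch of applying this rotation to every pair of a vector could look like the following. It assumes adjacent pairing and the per-pair frequency `base ** (-2 * i / dim)` with `base = 10000` from the original RoPE paper; `apply_rope` is a hypothetical name, and real implementations are vectorized rather than looped.

```python
import math

def apply_rope(vector, position, base=10000.0):
    """Rotate each adjacent pair of `vector` by position * theta_i.

    Assumption: theta_i = base ** (-2 * i / dim) for pair index i.
    """
    dim = len(vector)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    rotated = []
    for i in range(dim // 2):
        theta = base ** (-2.0 * i / dim)  # assumed frequency of pair i
        angle = position * theta          # m * theta in the formula above
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vector[2 * i], vector[2 * i + 1]
        rotated.extend([x1 * c - x2 * s, x1 * s + x2 * c])
    return rotated

vec = [1.0, 0.0, 1.0, 0.0]
at_zero = apply_rope(vec, position=0)  # position 0 leaves the vector unchanged
at_five = apply_rope(vec, position=5)  # same length, different direction
```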

### The Magic Property ✨

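If you take the dot product of a **Query** rotated by $m\theta$ and a **Key** rotated by $n\theta$, the two rotations combine into a single rotation by the difference of the angles. Writing $R_\alpha$ for the 2D rotation matrix with angle $\alpha$, and $q = (q_1, q_2)$, $k = (k_1, k_2)$ for one unrotated pair of the query and key:

$$\text{Score} = (R_{m\theta}\, q)^\top (R_{n\theta}\, k) = q^\top R_{(n-m)\theta}\, k = (q \cdot k)\cos((m - n)\theta) + (q_1 k_2 - q_2 k_1)\sin((m - n)\theta)$$

The full attention score is the sum of this expression over all pairs, each with its own $\theta$.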

**The absolute positions $m$ and $n$ disappear!** Only the relative distance $(m - n)$ remains.
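This property can be checked numerically. The sketch below rotates a query and a key at different absolute positions and compares the dot products. It assumes adjacent pairing and the frequency schedule `base ** (-2 * i / dim)` with `base = 10000`; the helper names `rope` and `dot` and the sample vectors are made up for illustration.

```python
import math

def rope(vector, position, base=10000.0):
    """Rotate each adjacent pair by position * base ** (-2 * i / dim) (assumed schedule)."""
    dim = len(vector)
    out = []
    for i in range(dim // 2):
        angle = position * base ** (-2.0 * i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vector[2 * i], vector[2 * i + 1]
        out.extend([x1 * c - x2 * s, x1 * s + x2 * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

query = [0.3, -1.2, 0.8, 0.5]
key = [1.1, 0.4, -0.7, 0.9]

score_a = dot(rope(query, 7), rope(key, 3))      # m - n = 4
score_b = dot(rope(query, 104), rope(key, 100))  # m - n = 4, both shifted by 97
score_c = dot(rope(query, 7), rope(key, 4))      # m - n = 3
```

`score_a` and `score_b` match up to floating-point rounding because both have the same offset, while `score_c` differs because the offset changed.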