# A Mathematical Interpretation of the Transformer



Adapted from [The Random Transformer](https://osanseviero.github.io/hackerllama/blog/posts/random_transformer/#the-random-encoder-decoder-transformer)

### Word embeddings

Hello -> [1,2,3,4]

World -> [2,3,4,5]

Stacking the two word vectors row by row gives the input matrix:
$$
E = \begin{bmatrix}
1 & 2 & 3 & 4 \\
2 & 3 & 4 & 5
\end{bmatrix}
$$
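The lookup step can be sketched in NumPy as follows. This is a minimal sketch under an assumption: `embedding_table` and `embed` are hypothetical names for a hard-coded toy vocabulary that maps each word to the 4-dimensional vector above. A real model learns these vectors instead of fixing them by hand.

```python
import numpy as np

# Hypothetical toy embedding table: word -> 4-dimensional vector (d_model = 4).
embedding_table = {
    "Hello": np.array([1.0, 2.0, 3.0, 4.0]),
    "World": np.array([2.0, 3.0, 4.0, 5.0]),
}

def embed(tokens):
    """Stack one embedding row per token into a (seq_len, d_model) matrix."""
    return np.stack([embedding_table[t] for t in tokens])

E = embed(["Hello", "World"])
print(E.shape)  # (2, 4)
```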


### Positional encoding


$$
\begin{gathered}
PE(\text{pos}, 2i)=\sin \left(\frac{\text{pos}}{10000^{2i / d_{\mathrm{model}}}}\right) \\
PE(\text{pos}, 2i+1)=\cos \left(\frac{\text{pos}}{10000^{2i / d_{\mathrm{model}}}}\right)
\end{gathered}
$$
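With $d_{\mathrm{model}} = 4$, as in this example, $i$ takes only the values 0 and 1. The formula therefore expands to four concrete entries per position:

$$
\begin{aligned}
PE(\text{pos}, 0) &= \sin\left(\frac{\text{pos}}{10000^{0/4}}\right) = \sin(\text{pos}) \\
PE(\text{pos}, 1) &= \cos\left(\frac{\text{pos}}{10000^{0/4}}\right) = \cos(\text{pos}) \\
PE(\text{pos}, 2) &= \sin\left(\frac{\text{pos}}{10000^{2/4}}\right) = \sin\left(\frac{\text{pos}}{100}\right) \\
PE(\text{pos}, 3) &= \cos\left(\frac{\text{pos}}{10000^{2/4}}\right) = \cos\left(\frac{\text{pos}}{100}\right)
\end{aligned}
$$

For $\text{pos} = 1$ these entries are approximately $\sin(1) \approx 0.84$, $\cos(1) \approx 0.54$, $\sin(0.01) \approx 0.01$, and $\cos(0.01) \approx 1.00$.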

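Position encodings for the two tokens ("Hello" at position 0, "World" at position 1, $d_{\mathrm{model}} = 4$):

“Hello” -> [0, 1, 0, 1]

“World” -> [0.84, 0.54, 0.01, 1.00]

Adding each position encoding to the corresponding word embedding gives the model input (rounded to two decimals):

$$
E = \begin{bmatrix}
1 & 3 & 3 & 5 \\
2.84 & 3.54 & 4.01 & 6
\end{bmatrix}
$$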
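The addition can be reproduced numerically with a loop-based transcription of the sinusoidal formula. This is a sketch that assumes an even `d_model`. The helper name `positional_encoding` is hypothetical and is meant only to mirror the equation term by term, not to be efficient.

```python
import math
import numpy as np

def positional_encoding(seq_len, d_model):
    """Loop-based transcription of the sinusoidal formula (assumes even d_model)."""
    pe = np.zeros((seq_len, d_model))
    for pos in range(seq_len):
        for i in range(d_model // 2):
            angle = pos / 10000 ** (2 * i / d_model)
            pe[pos, 2 * i] = math.sin(angle)      # even dimension 2i: sine
            pe[pos, 2 * i + 1] = math.cos(angle)  # odd dimension 2i+1: cosine
    return pe

# Word embeddings for "Hello" and "World" from the example.
word_emb = np.array([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 3.0, 4.0, 5.0]])

pe = positional_encoding(seq_len=2, d_model=4)
E = word_emb + pe  # element-wise sum, shape (2, 4)
print(np.round(E, 2))
```

The sum is element-wise, so the position information is mixed directly into each embedding dimension rather than appended as extra dimensions.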

```python
import numpy as np

def pos_embedding(max_len, d_model):
    """Return sinusoidal position embeddings of shape (1, max_len, d_model).

    max_len: maximum sequence length
    d_model: embedding dimension of the model
    """
    pos = np.arange(max_len).reshape(max_len, 1)   # (max_len, 1)
    i = np.arange(d_model).reshape(1, d_model)     # (1, d_model)
    # 2 * (i // 2) gives dimensions 2k and 2k+1 the same frequency
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    angle_rads = pos * angle_rates                 # (max_len, d_model) via broadcasting
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])  # even dimensions: sine
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])  # odd dimensions: cosine
    # add a leading batch axis so the result broadcasts over (batch, max_len, d_model)
    return angle_rads[np.newaxis, ...]
```
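In `pos_embedding`, the expression `2 * (i // 2)` makes each sine/cosine pair share one frequency. Integer division maps the dimension indices 0, 1, 2, 3 to the pair indices 0, 0, 1, 1. A small sketch of just that step for the `d_model = 4` case used here follows. The variable names are illustrative and not taken from the source.

```python
import numpy as np

d_model = 4
i = np.arange(d_model)                    # dimension indices 0, 1, 2, 3
exponents = (2 * (i // 2)) / d_model      # pair index 2k -> exponent 2k / d_model
angle_rates = 1 / np.power(10000, exponents)
print(exponents.tolist())    # [0.0, 0.0, 0.5, 0.5]
print(angle_rates.tolist())  # [1.0, 1.0, 0.01, 0.01]
```

Dimensions 0 and 1 both use rate 1, and dimensions 2 and 3 both use rate 1/100. This matches $10000^{2i/d_{\mathrm{model}}}$ with $i = 0$ and $i = 1$.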

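```python
print(pos_embedding(2, 4))
# exponent 2 * (i // 2) / d_model for dimension index i = 1: (2 * 0) / 4 = 0
print((2 * (1 // 2)) / np.float32(4))
```

Output:

```
[[[0.         1.         0.         1.        ]
  [0.84147098 0.54030231 0.00999983 0.99995   ]]]
0.0
```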
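The output shows that dimensions 2 and 3 barely change between positions 0 and 1. The reason is that each sine/cosine pair has wavelength $2\pi \cdot 10000^{2i/d_{\mathrm{model}}}$ (measured in positions). With `d_model = 4`, the second pair therefore cycles 100 times more slowly than the first. A short sketch computing those wavelengths follows. `pair_index` and `wavelengths` are illustrative names only.

```python
import numpy as np

d_model = 4
pair_index = np.arange(d_model // 2)                        # i = 0, 1
wavelengths = 2 * np.pi * 10000 ** (2 * pair_index / d_model)
print(wavelengths.round(1).tolist())  # [6.3, 628.3]
```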
