# Trigrams in Bilinear Transformers


In this document, we analyze what transformers do in their first attention layer, which turns out to be closely related to trigrams.

---

## Setup

We take a deliberately naive approach, the diagonal approach: the same token is fed both into the value component of a head (the virtual path) and along the residual stream (the direct path). Given some background knowledge, the setup is quite simple: we study the following matrix. I won't go into depth on the math here.

$$ U_{oq} P_{qa} (E_{jb} B_{aij} O_{il} V_{lk} E_{kb}) $$
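To make the contraction concrete, here is a minimal NumPy sketch on random toy weights. It assumes the bilinear tensor is factored into a left and a right input matrix, as the `w_l` and `w_r` arguments in the code below suggest. All sizes and names (`diagonal_interaction`, `d_hidden`, and so on) are made up for illustration and are not part of the `shared` library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vocab, d_model, d_hidden = 7, 4, 5  # toy sizes, not the real model's

# Random stand-ins for the model weights (assumed shapes).
E = rng.normal(size=(d_model, n_vocab))    # embedding, one column per token
OV = rng.normal(size=(d_model, d_model))   # combined OV matrix of one head
L = rng.normal(size=(d_hidden, d_model))   # left input of the bilinear MLP
R = rng.normal(size=(d_hidden, d_model))   # right input of the bilinear MLP
P = rng.normal(size=(d_model, d_hidden))   # MLP output projection
U = rng.normal(size=(n_vocab, d_model))    # unembedding


def diagonal_interaction(E, OV, L, R, P, U):
    """Output logits when the same token takes both the virtual and direct path."""
    virtual = L @ OV @ E       # token moved through the head, [hidden, vocab]
    direct = R @ E             # token read from the residual stream, [hidden, vocab]
    hidden = virtual * direct  # bilinear layer: elementwise product per token
    return U @ P @ hidden      # [output token, input token]


diag = diagonal_interaction(E, OV, L, R, P, U)

# The same contraction written as a single einsum, mirroring the notebook's pattern.
reference = np.einsum("ai,bi,ca,hc,hb,rh,or->oi", E, E, OV, L, R, P, U)
print(np.allclose(diag, reference))
```

Because the token index `i` is shared between both embedding factors, only the diagonal of the full (virtual token, direct token) interaction is computed, which is what keeps the result a matrix rather than a third-order tensor.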

## Usefulness

While the results are quite striking, I'm unsure how general this technique is. It is only useful for studying same-token heads. These do occur often in the first layer, but the results should still be taken with a pinch of salt.

## Code

In [9]:
from shared.transformer import Transformer, Config
import plotly.express as px
from shared.tensors import *
import torch
import pandas as pd
from einops import *

In [10]:
torch.set_grad_enabled(False)
pd.set_option('display.max_columns', None)

name = "tdooms/TinyStories-2-256"
config = Config.from_pretrained(name)
model = Transformer.from_pretrained(name, config=config).cuda()
vocab = model.vocab

_ = model.center_unembed().fold_norms()

In [11]:
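def attention_features(layer, head, mlp=0):
    # Contract the same token (index i) through both paths: via the head's OV
    # matrix into the left MLP input, and directly into the right MLP input.
    diag = einsum(
        model.w_e, model.w_e, model.ov[layer, head], model.w_l[mlp], model.w_r[mlp], model.w_p[mlp], model.w_u,
        "emb1 i, emb2 i, ov emb1, hid ov, hid emb2, res hid, out res -> out i"
    ).cpu()

    # torch.svd returns V rather than its transpose, hence the q.T below.
    o, s, q = torch.svd(diag)
    # px.line(s[:64].cpu()).show()

    # Strongest entries of the full matrix, indexed as [input token, output token].
    df = vocab.get_max_activations(diag.T, ["virtual", "direct"], 10)

    # Strongest entries of each of the first 10 rank-1 SVD components.
    for i in range(0, 10):
        tops = (o[:, i:i+1] @ torch.diag(s[i:i+1]) @ q.T[i:i+1])
        df = df.join(vocab.get_max_activations(tops.T, [f"virtual_{i}", f"direct_{i}"], 10, val_name=f"value_{i}"))

    return df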

## Results
The results are large pandas dataframes. They should be read as (input, output, value) triplets, where the input is a previous token that this specific attention head may look at. In the tables, the `virtual*` columns hold the input token and the `direct*` columns hold the output token. The value is simply how strong the connection is; there is no clear interpretation of its magnitude, except that higher means stronger.
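As a rough illustration of how such triplets can be read off a matrix, here is a small sketch. The helper `top_triplets` is a hypothetical stand-in, not the actual `vocab.get_max_activations`, whose implementation isn't shown here. The token list and values are made up.

```python
import numpy as np


def top_triplets(matrix, tokens, k=3):
    """Return the k largest entries of matrix[input, output] as (input, output, value)."""
    # Sort the flattened matrix, largest first, and keep the top k positions.
    flat = np.argsort(matrix, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, matrix.shape)
    return [(tokens[r], tokens[c], float(matrix[r, c])) for r, c in zip(rows, cols)]


# Made-up toy data: rows are input tokens, columns are output tokens.
tokens = ["no", "faces", "bark", "them"]
toy = np.array([
    [0.0, 0.9, 0.8, 0.1],
    [0.2, 0.0, 0.1, 0.3],
    [0.1, 0.2, 0.0, 0.1],
    [0.5, 0.1, 0.2, 0.0],
])

triplets = top_triplets(toy, tokens)
for inp, out, value in triplets:
    print(inp, out, value)
```

Each printed row corresponds to one row of a (virtual, direct, value) column group in the dataframes.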

The first triplet analyzes the whole diagonal matrix; the following ones are its SVD components.
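For readers less familiar with this use of the SVD, the sketch below shows what a "component" means: each one is a rank-1 matrix built from one singular value and its pair of singular vectors, and together they sum back to the original matrix. It uses NumPy on a random toy matrix. `rank_one_component` is a name chosen here for illustration. Note that `np.linalg.svd` returns the transposed right factor directly, whereas the notebook's `torch.svd` returns it untransposed.

```python
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.normal(size=(6, 6))  # toy stand-in for the diagonal matrix

# Singular values come back sorted from largest to smallest.
u, s, vt = np.linalg.svd(matrix, full_matrices=False)


def rank_one_component(u, s, vt, i):
    """The i-th SVD component: singular value times outer product of its vectors."""
    return s[i] * np.outer(u[:, i], vt[i])


components = [rank_one_component(u, s, vt, i) for i in range(len(s))]

# Summing all components recovers the original matrix.
print(np.allclose(sum(components), matrix))
```

Looking at the top entries of each component separately, as the notebook does for the first ten, isolates patterns that would otherwise be drowned out by the largest entries of the full matrix.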

Let's first look at attention head 0.0.

In [12]:
attention_features(layer=0, head=0).round(2)

rank,virtual,direct,value,virtual_0,direct_0,value_0,virtual_1,direct_1,value_1,virtual_2,direct_2,value_2,virtual_3,direct_3,value_3,virtual_4,direct_4,value_4,virtual_5,direct_5,value_5,virtual_6,direct_6,value_6,virtual_7,direct_7,value_7,virtual_8,direct_8,value_8,virtual_9,direct_9,value_9
0,",",##id,0.55,",",##gether,0.21,blanket,them,0.12,no,move,0.1,not,held,0.2,sara,##re,0.13,daughter,herself,0.09,not,sandwiches,0.09,##gged,##r,0.1,not,closely,0.16,",",##chy,0.09
1,",",##ve,0.54,",",mim,0.21,stones,them,0.12,no,faces,0.1,not,enjoyed,0.19,lila,##re,0.12,girl,herself,0.09,"""",sandwiches,0.09,##pping,##r,0.1,not,graceful,0.16,",",##t,0.08
2,",",##pt,0.48,",",ima,0.21,leaf,them,0.12,no,smile,0.1,not,remembered,0.19,sally,##re,0.12,no,##ving,0.09,said,sandwiches,0.09,##gged,##ic,0.09,not,alive,0.15,",",##ched,0.07
3,",",##ved,0.45,",",goodby,0.21,petals,them,0.12,no,bark,0.09,not,admired,0.19,lisa,##re,0.12,daughter,across,0.08,t,sandwiches,0.08,##pping,##ic,0.09,not,nicely,0.15,",",##ten,0.07
4,",",##ward,0.44,",",ey,0.21,needle,them,0.12,no,run,0.09,not,showed,0.19,anna,##re,0.12,girl,across,0.08,not,everywhere,0.08,##gged,##il,0.09,not,brightly,0.15,amaz,##chy,0.07
5,",",##ad,0.44,",",##issed,0.2,balls,them,0.11,no,laugh,0.09,not,followed,0.19,emily,##re,0.12,no,##ved,0.08,"""",everywhere,0.08,##gged,##ment,0.09,not,cheerful,0.14,",",##ged,0.07
6,a,##ers,0.44,",",ingred,0.2,branches,them,0.11,no,open,0.09,t,held,0.19,amy,##re,0.11,daughter,between,0.08,said,everywhere,0.08,##pping,##il,0.09,not,##uit,0.14,",",##mp,0.07
7,",",##ord,0.44,",",mov,0.2,not,them,0.11,no,spin,0.09,t,enjoyed,0.18,jane,##re,0.11,girl,between,0.08,t,everywhere,0.07,a,##r,0.09,not,colourful,0.14,",",##red,0.07
8,a,##ds,0.42,",",##adow,0.2,blanket,him,0.11,no,pick,0.09,t,remembered,0.18,lily,##re,0.11,daughter,everywhere,0.08,not,ducks,0.07,##pping,##ment,0.09,not,peaceful,0.13,",",##med,0.07
9,she,she,0.42,",",#,0.2,stones,him,0.11,no,hair,0.09,not,##ied,0.18,alice,##re,0.11,girl,everywhere,0.08,not,##ped,0.07,##gged,##or,0.09,not,attractive,0.13,upset,##chy,0.06


At first, this may seem a bit random. There is a lot of structure but what is it?

Looking at this from a trigram perspective makes a lot of sense. For instance, in component 2, it's easy to see that ``no (happy) faces`` or ``no (loud) bark`` make sense.

Component 4 pairs girls' names with ``##re``, so maybe it covers trigrams like ``<girl's name> (sno) ##re``.

Exactly determining these trigrams won't be covered here.

In [13]:
attention_features(layer=0, head=1)

rank,virtual,direct,value,virtual_0,direct_0,value_0,virtual_1,direct_1,value_1,virtual_2,direct_2,value_2,virtual_3,direct_3,value_3,virtual_4,direct_4,value_4,virtual_5,direct_5,value_5,virtual_6,direct_6,value_6,virtual_7,direct_7,value_7,virtual_8,direct_8,value_8,virtual_9,direct_9,value_9
0,mom,##ng,0.424141,res,=,0.146172,t,happens,0.091086,disgusting,clapped,0.093361,prun,##ack,0.073478,not,feed,0.203248,couldn,##t,0.074654,mom,##ed,0.146977,says,##ored,0.076489,i,anywhere,0.08643,"""",##ped,0.116126
1,mum,##ng,0.417375,res,wra,0.145447,t,until,0.088174,dangerous,clapped,0.088561,decor,##ack,0.073256,t,feed,0.170312,couldn,##s,0.073416,mom,##ped,0.134165,came,##cing,0.075401,i,##dom,0.078146,"""",##pl,0.104576
2,not,##robe,0.362936,res,ingred,0.143327,t,either,0.085537,not,clapped,0.087013,prun,##ment,0.066062,not,swim,0.161224,restaurant,##t,0.07286,mum,##ed,0.133942,mom,##cing,0.069667,i,along,0.078055,"""",wide,0.102012
3,mother,##ng,0.355985,res,diff,0.143183,t,##ses,0.084473,filthy,clapped,0.085358,decor,##ment,0.065862,not,read,0.156343,perf,##t,0.071764,mother,##ed,0.13364,came,##her,0.068546,i,tomorrow,0.074564,"""",asleep,0.09899
4,"""",##ened,0.347304,res,ima,0.142785,t,enough,0.082105,t,clapped,0.085137,comb,##ack,0.064812,not,##ture,0.156282,restaurant,##s,0.071653,mom,friends,0.124783,says,##less,0.066723,i,##oring,0.073217,"""",##ined,0.093748
5,mommy,##ng,0.344628,res,mim,0.14278,t,##corn,0.078062,smelly,clapped,0.085002,prun,##ort,0.064177,not,explore,0.154025,perf,##s,0.070575,mum,##ped,0.122266,appeared,##cing,0.06562,i,anymore,0.072583,"""",##ked,0.093328
6,mom,##ope,0.331457,res,avail,0.142003,t,imp,0.078033,disgusting,remembered,0.084827,decor,##ort,0.063983,not,sail,0.153505,couldn,##es,0.069961,mother,##ped,0.12199,came,asking,0.065078,i,everywhere,0.071709,"""",cozy,0.084636
7,mummy,##ng,0.32866,res,##ses,0.141648,not,happens,0.078024,broken,clapped,0.084335,ornament,##ack,0.06371,not,bounce,0.153389,not,morning,0.069339,mommy,##ed,0.121461,flexible,##cing,0.063782,if,anywhere,0.070214,"""",fast,0.083298
8,not,##ent,0.323354,res,mov,0.140648,not,until,0.075529,sick,clapped,0.083226,lotion,##ack,0.063451,not,hop,0.151896,mom,##t,0.069298,mom,##ged,0.118614,mom,##her,0.063334,i,##ling,0.069477,"""",blowing,0.08124
9,"""",##per,0.311646,res,#,0.1403,t,fast,0.075452,strange,clapped,0.082797,prun,##ured,0.063117,not,pay,0.150394,restaurant,##es,0.06828,mum,friends,0.113716,came,trying,0.062833,i,##s,0.068683,"""",##ged,0.081045


This head is slightly strange. I like to call it "the mommy head", but on closer inspection it does a lot more.

I don't really understand the first two components: why are ``res`` and ``t`` useful? They don't compose into ``rest``, because first, ``rest`` is itself a token, and second, it would otherwise have to be ``##res``. The third component (index 2) is something about negative adjectives. The fifth is about negation, and the sixth is maybe about trigrams of ``couldn (') t`` (apostrophes are always tokenized separately). The seventh is obviously the mommy one.

Anyway, from some further analysis (not shown), this seems to be a head that has triplets closely related to verbs and subjects.

In [14]:
attention_features(layer=0, head=2)

rank,virtual,direct,value,virtual_0,direct_0,value_0,virtual_1,direct_1,value_1,virtual_2,direct_2,value_2,virtual_3,direct_3,value_3,virtual_4,direct_4,value_4,virtual_5,direct_5,value_5,virtual_6,direct_6,value_6,virtual_7,direct_7,value_7,virtual_8,direct_8,value_8,virtual_9,direct_9,value_9
0,mom,##ophone,0.391306,flea,##ughter,0.293325,selfish,##d,0.12446,man,##ked,0.129392,ostrich,##ned,0.102307,troubled,ways,0.10057,such,##ight,0.077226,mom,##ow,0.174451,mom,##eath,0.107691,favourite,##fe,0.078223,nervous,##ys,0.068338
1,his,##ughter,0.384932,flea,##ople,0.267231,flexible,##d,0.122591,boy,##ked,0.119707,deer,##ned,0.095576,the,##oun,0.094146,he,vet,0.076098,mom,##ish,0.160988,mum,##eath,0.106417,favourite,##es,0.072178,nervous,##es,0.068299
2,mom,##ict,0.380579,jellyfish,##ughter,0.260898,miserable,##d,0.120109,man,##ts,0.119243,zebra,##ned,0.095523,the,##id,0.093686,he,pillow,0.07572,dad,##ow,0.145166,mom,##pl,0.095134,he,##fe,0.068429,reliable,##ys,0.06523
3,cricket,##ughter,0.380409,mosquito,##ughter,0.260285,helpless,##d,0.118525,boy,##ts,0.110318,dolphin,##ned,0.094954,the,##dd,0.093404,he,tra,0.073519,dad,##ish,0.133964,mum,##pl,0.094009,favourite,##ject,0.068428,reliable,##es,0.065192
4,mum,##ower,0.373173,cricket,##ughter,0.245844,selfish,##ten,0.115711,mum,##ked,0.109657,kangaroo,##ned,0.09458,troubled,times,0.092342,upon,##ight,0.070813,mom,##ent,0.133783,mom,lying,0.093303,favourite,##ha,0.068156,nervous,##akes,0.063651
5,mum,##eath,0.364686,flea,##q,0.245204,foolish,##d,0.115166,man,##aled,0.106704,lizard,##ned,0.093939,mum,##oun,0.089897,he,##let,0.069721,mommy,##ow,0.131191,mom,asleep,0.093123,passport,##fe,0.06725,frightened,##ys,0.062762
6,flea,##ughter,0.362513,flea,##os,0.243355,ugly,##d,0.114649,man,felt,0.106504,wolf,##ned,0.093054,mum,##id,0.089458,his,vet,0.068547,mom,##her,0.122024,mum,lying,0.0922,favourite,##re,0.064691,frightened,##es,0.062726
7,mom,forth,0.358202,snake,##ughter,0.24193,secret,##d,0.114101,man,herself,0.105846,chicken,##ned,0.093008,mum,##dd,0.089189,returned,##ight,0.068502,mum,##ow,0.121797,parents,##eath,0.092118,enough,##fe,0.063533,dependable,##ys,0.062601
8,mom,##eath,0.355956,lizard,##ughter,0.23804,flexible,##ten,0.113973,father,##ked,0.10562,alligator,##ned,0.092995,smelly,ways,0.088346,he,lun,0.068482,mommy,##ish,0.121066,mum,asleep,0.092022,favourite,##er,0.063386,dependable,##es,0.062566
9,mom,##ets,0.354445,jellyfish,##ople,0.237689,selfish,##led,0.113862,man,ellie,0.103833,goose,##ned,0.092684,unhappy,ways,0.088257,he,cushion,0.068304,mom,##ets,0.119336,mom,##board,0.090256,he,##es,0.063141,nervous,##ight,0.062552


I won't go too in-depth from now on. However, this head has a funny animal component (index 3) for some reason. I'm not sure what ``##ned`` is, though.

In [15]:
attention_features(layer=0, head=3)

rank,virtual,direct,value,virtual_0,direct_0,value_0,virtual_1,direct_1,value_1,virtual_2,direct_2,value_2,virtual_3,direct_3,value_3,virtual_4,direct_4,value_4,virtual_5,direct_5,value_5,virtual_6,direct_6,value_6,virtual_7,direct_7,value_7,virtual_8,direct_8,value_8,virtual_9,direct_9,value_9
0,"""",##dge,0.655147,clum,##ect,0.147636,bald,ways,0.230559,the,##ied,0.207824,the,##led,0.15106,cannot,drop,0.091068,extra,late,0.09234,"""",##qu,0.113713,at,##ging,0.149982,dug,corner,0.092314,talk,##es,0.102957
1,"""",##ey,0.631389,clum,mim,0.143648,charming,ways,0.211134,the,##s,0.175326,the,##ged,0.139498,cannot,##ts,0.089,such,late,0.088901,"""",##chy,0.110949,at,##ing,0.140404,dug,##ing,0.090643,his,##es,0.092482
2,"""",sorry,0.581847,clum,#,0.143358,bald,thing,0.196593,his,##ied,0.166459,never,walk,0.126979,cannot,##iest,0.087035,extra,interesting,0.087161,"""",##um,0.108085,for,##ging,0.131071,dug,cont,0.085637,"""",##es,0.090828
3,"""",##red,0.562648,clum,ima,0.140989,bald,way,0.190088,the,##ick,0.166124,never,laugh,0.120886,cannot,bite,0.084789,extra,difficult,0.086387,"""",##t,0.102591,for,##ing,0.122701,dug,##ered,0.079909,talk,understand,0.090032
4,"""",oh,0.555328,clum,norm,0.137459,ugly,ways,0.189934,skirt,##ied,0.16358,the,##les,0.120277,cannot,##ow,0.083553,more,late,0.085436,unknown,##qu,0.101974,at,##p,0.12203,dug,##oun,0.078375,your,##es,0.08676
5,"""",thank,0.541446,clum,##leep,0.136551,chubby,ways,0.189665,the,##pl,0.157101,the,##ed,0.118006,cannot,crawl,0.081012,such,interesting,0.083915,unknown,##chy,0.099495,at,##pl,0.117363,dug,stop,0.077042,talk,##dge,0.086362
6,"""",i,0.540035,clum,pige,0.13647,smelly,ways,0.189312,the,##cing,0.156705,the,##gs,0.114753,cannot,break,0.080956,extra,wrong,0.083382,"""",nice,0.099238,at,##able,0.115394,dug,dir,0.075054,my,##es,0.082109
7,"""",bottom,0.521694,clum,##que,0.136018,filthy,ways,0.18691,sc,##ied,0.156549,never,step,0.114242,cannot,s,0.079957,such,difficult,0.08317,unknown,##um,0.096927,took,##ging,0.113301,dug,##ier,0.074235,his,understand,0.080871
8,"""",balance,0.519567,clum,laund,0.135909,bald,new,0.185444,"""",##ied,0.156129,the,##n,0.113159,anna,drop,0.079645,extra,harsh,0.083108,"""",##ss,0.092866,make,##ging,0.110761,dug,##our,0.072681,"""",understand,0.079425
9,"""",##ughter,0.512992,clum,envel,0.135835,modern,ways,0.183166,the,##ing,0.153645,never,eat,0.112247,app,watched,0.078716,extra,since,0.082479,"""",##ri,0.092668,at,##board,0.108261,dug,secret,0.071383,the,##es,0.078556


Again, some cool structure, which seems somewhat related to quotes.