# Visualizing Attention in Huggingface Transformers

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/labmlai/inspectus/blob/main/notebooks/gpt2.ipynb)

This Jupyter notebook demonstrates how to use the Inspectus library with Huggingface Transformers, using the GPT-2 model as the example.

In [1]:
!pip install -qqq inspectus
!pip install -qqq transformers

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/117.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m117.2/117.2 kB[0m [31m3.2 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/731.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m727.0/731.2 kB[0m [31m25.5 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m731.2/731.2 kB[0m [31m16.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m94.6/94.6 kB[0m [31m7.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m225.1/225.1 kB[0m [31m15.5 MB/s[0m eta [36m0:00:00[0m
[?25h

In [2]:
from transformers import AutoTokenizer, AutoConfig, GPT2LMHeadModel
import inspectus
import torch

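The following cell sets up a GPT-2 model with Huggingface's Transformers library. It loads the `huggingface-course/code-search-net-tokenizer` tokenizer and the `gpt2` model configuration (resized to that tokenizer's vocabulary), then builds a GPT-2 model from the configuration. Because the model is constructed from a configuration rather than loaded with `from_pretrained`, its weights are randomly initialized, so the attention patterns and predictions below come from an untrained model.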

In [3]:
context_length = 128
tokenizer = AutoTokenizer.from_pretrained("huggingface-course/code-search-net-tokenizer")

config = AutoConfig.from_pretrained(
    "gpt2",
    vocab_size=len(tokenizer),
    n_ctx=context_length,
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

model = GPT2LMHeadModel(config)
model_size = sum(t.numel() for t in model.parameters())
print(f"GPT-2 size: {model_size/1000**2:.1f}M parameters")


GPT-2 size: 124.2M parameters
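The 124.2M figure can be sanity-checked by hand. The sketch below counts GPT-2 parameters from configuration values, assuming the `gpt2` defaults (12 layers, hidden size 768, 1,024 position embeddings) and that the language-modeling head shares its weight matrix with the token embedding, so it adds no parameters of its own. The vocabulary size is the free variable, and the function name `gpt2_param_count` is illustrative only.

```python
def gpt2_param_count(vocab_size, n_layer=12, n_embd=768, n_positions=1024):
    """Count GPT-2 parameters from config values (LM head tied to token embeddings)."""
    d = n_embd
    embeddings = vocab_size * d + n_positions * d  # wte + wpe
    per_layer = (
        2 * d                  # ln_1 weight and bias
        + d * 3 * d + 3 * d    # attn.c_attn: fused Q, K, V projection
        + d * d + d            # attn.c_proj
        + 2 * d                # ln_2 weight and bias
        + d * 4 * d + 4 * d    # mlp.c_fc
        + 4 * d * d + d        # mlp.c_proj
    )
    final_norm = 2 * d         # ln_f weight and bias
    return embeddings + n_layer * per_layer + final_norm


print(gpt2_param_count(50257))  # original GPT-2 vocabulary: 124439808
print(gpt2_param_count(50000))  # 124242432
```

Each block contributes 12·d² + 13·d parameters, so most of the model lives in the transformer layers, while the embedding term scales with the vocabulary. A vocabulary of exactly 50,000 entries reproduces the printed 124.2M; that is consistent with, but not a confirmation of, the tokenizer's actual size.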


This cell tokenizes a sentence with the tokenizer initialized above. It keeps the resulting input IDs for the model and builds a list of token strings by using the `offset_mapping` returned by the tokenizer (one `(start, end)` character span per token) to slice the original text.
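To see how offset pairs turn into token strings, here is a minimal pure-Python sketch. The offsets are hand-written for illustration (an assumption, not actual output of the code-search-net tokenizer); each pair is an end-exclusive character range into `text`.

```python
# Hypothetical offsets: one (start, end) pair per token, end-exclusive.
# In the notebook these come from tokenized['offset_mapping'][0].
text = "The quick brown fox"
offset_mapping = [(0, 3), (4, 9), (10, 15), (16, 19)]

# Slice the original string once per token, as the notebook cell does.
tokens = [text[s:e] for s, e in offset_mapping]
print(tokens)  # ['The', 'quick', 'brown', 'fox']
```

Slicing the original text, rather than decoding each ID, keeps the token labels identical to the characters the reader typed.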

In [4]:
text = 'The quick brown fox jumps over the lazy dog'
tokenized = tokenizer(
    text,
    return_tensors='pt',
    return_offsets_mapping=True
)
input_ids = tokenized['input_ids']

tokens = [text[s: e] for s, e in tokenized['offset_mapping'][0]]

In [5]:
with torch.no_grad():
    res = model(input_ids=input_ids.to(model.device), output_attentions=True)



The `attention` function from the Inspectus library visualizes the attention weights. It takes the attention weights from the model output and the list of tokens. The `chart_types` parameter selects which visualizations to generate, and the `color` parameter maps individual chart types to a color.
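With `output_attentions=True`, Transformers returns one attention tensor per layer, each shaped `(batch_size, num_heads, sequence_length, sequence_length)`. The NumPy sketch below fakes such a tuple for a tiny causal model (all sizes are made-up assumptions, not GPT-2's) and stacks it into a `(layer, head, query, key)` array for the single example. Each query row is a softmax over the keys, so it sums to 1, and the causal mask zeroes out attention to future positions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layer, n_head, seq_len = 2, 3, 4  # illustrative sizes only


def causal_softmax(scores):
    """Softmax over the key axis with future (upper-triangular) keys masked out."""
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


# Stand-in for res['attentions']: one (batch, head, query, key) array per layer.
attentions = tuple(
    causal_softmax(rng.normal(size=(1, n_head, seq_len, seq_len)))
    for _ in range(n_layer)
)

# Stack the layers and drop the batch dimension -> (layer, head, query, key).
stacked = np.stack(attentions)[:, 0]
print(stacked.shape)  # (2, 3, 4, 4)
```

In the notebook the per-layer entries are PyTorch tensors, so `torch.stack(res['attentions'])[:, 0]` would produce the same layout.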

In [6]:
inspectus.attention(
    res['attentions'],
    tokens,
    chart_types=['attention_matrix', 'query_token_heatmap', 'key_token_heatmap',
                 'dimension_heatmap', 'token_dim_heatmap', 'line_grid'],
    color={
        'query_token_heatmap': 'orange',
        'key_token_heatmap': 'green',
        'token_dim_heatmap': 'red',
    },
)

# **Visualizing Custom-Defined Attentions**

An attention matrix is a 2D array of shape (query_tokens, key_tokens). With multiple layers, or multiple layers and heads, the array becomes 3D or 4D, with shape (layer, query, key) or (layer, head, query, key) respectively. In the examples below, the second and third arguments are the labels for the query tokens and the key tokens.
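The three layouts differ only by leading axes. As a sketch of how they relate, the hypothetical helper `to_4d` below (not part of Inspectus) inserts singleton layer and head axes so every input ends up as `(layer, head, query, key)`.

```python
import numpy as np


def to_4d(attn):
    """Hypothetical helper: lift a 2D/3D/4D attention array to (layer, head, query, key)."""
    attn = np.asarray(attn)
    if attn.ndim == 2:  # (query, key): a single layer with a single head
        return attn[np.newaxis, np.newaxis]
    if attn.ndim == 3:  # (layer, query, key): one head per layer
        return attn[:, np.newaxis]
    if attn.ndim == 4:  # already (layer, head, query, key)
        return attn
    raise ValueError(f"expected 2 to 4 dimensions, got {attn.ndim}")


for shape in [(3, 5), (2, 3, 5), (2, 2, 3, 5)]:
    print(shape, "->", to_4d(np.random.rand(*shape)).shape)
# (3, 5) -> (1, 1, 3, 5)
# (2, 3, 5) -> (2, 1, 3, 5)
# (2, 2, 3, 5) -> (2, 2, 3, 5)
```

In every case the last two axes stay (query, key), which is why the same query and key labels work for all three examples.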

In [7]:
import numpy as np
import inspectus

In [8]:
# single attention matrix

arr = np.random.rand(3, 5)

inspectus.attention(arr, ['a', 'b', 'c'], [f'{i}' for i in range(5)])

In [9]:
# Multiple layers

arr = np.random.rand(2, 3, 5)

inspectus.attention(arr, ['a', 'b', 'c'], [f'{i}' for i in range(5)])

In [10]:
# Multiple layers and heads

arr = np.random.rand(2, 2, 3, 5)

inspectus.attention(arr, ['a', 'b', 'c'], [f'{i}' for i in range(5)])

# **Visualizing Tokens**

This section demonstrates how to use the Inspectus library to visualize tokens together with per-token data such as loss and entropy.

As before, this cell tokenizes the text (here a longer passage) with the tokenizer initialized earlier, keeps the input IDs for the model, and uses the `offset_mapping` returned by the tokenizer to slice the original text into individual tokens.

In [11]:
text = 'Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source. Lorem Ipsum comes from sections 1.10.32 and 1.10.33 of "de Finibus Bonorum et Malorum" (The Extremes of Good and Evil) by Cicero, written in 45 BC. This book is a treatise on the theory of ethics, very popular during the Renaissance. The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32. The standard chunk of Lorem Ipsum used since the 1500s is reproduced below for those interested. Sections 1.10.32 and 1.10.33 from "de Finibus Bonorum et Malorum" by Cicero are also reproduced in their exact original form, accompanied by English versions from the 1914 translation by H. Rackham.'
tokenized = tokenizer(
    text,
    return_tensors='pt',
    return_offsets_mapping=True
)
input_ids = tokenized['input_ids']

tokens = [text[s: e] for s, e in tokenized['offset_mapping'][0]]

In [12]:
with torch.no_grad():
    res = model(input_ids=input_ids.to(model.device), output_attentions=True)

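The following cell computes three things for each token: the model's loss on that token, the entropy of the distribution the model predicted for it, and the model's top 5 predictions at that position. Since GPT-2 is a causal language model, the token at position i is predicted by the logits at position i - 1, so the first token gets placeholder values.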
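For intuition about these quantities, here is a pure-Python sketch on a made-up four-word vocabulary (the logits are invented for illustration). The loss is the negative log-probability of the token that actually came next (cross-entropy); the entropy, −Σ p log p, measures how spread out the predicted distribution is; and the top-k predictions are the highest-scoring indices, so the ranking must be in descending order.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_metrics(logits, target, k=2):
    """Return (loss, entropy, top-k indices) for one prediction step."""
    log_probs = log_softmax(logits)
    loss = -log_probs[target]  # cross-entropy of the observed token
    entropy = -sum(math.exp(lp) * lp for lp in log_probs)
    top_k = sorted(range(len(logits)), key=lambda j: logits[j], reverse=True)[:k]
    return loss, entropy, top_k


vocab = ["the", "dog", "cat", "runs"]  # toy vocabulary (illustrative)

# Uniform scores: loss and entropy both equal log(4), about 1.386.
print(token_metrics([0.0, 0.0, 0.0, 0.0], target=1))

# Peaked scores: low loss when the observed token is the favourite.
loss, entropy, top_k = token_metrics([0.5, 3.0, 1.0, -1.0], target=1)
print([vocab[j] for j in top_k])  # ['dog', 'cat']
```

An untrained model like the one in this notebook tends to produce near-uniform distributions, so expect losses and entropies close to the log of the vocabulary size.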

In [14]:
import torch.nn.functional as F

logits = res['logits'][0]  # (seq_len, vocab_size) for the single example
ids = input_ids[0]
losses = []
entropies = []
token_info = []
for i in range(len(tokens)):
    if i == 0:
        # Nothing precedes the first token, so the model makes no prediction for it.
        losses.append(0.0)
        entropies.append(0.0)
        token_info.append("no prediction for the first token")
        continue

    # In a causal LM, the logits at position i - 1 score the token at position i.
    step_logits = logits[i - 1]
    log_probs = F.log_softmax(step_logits, dim=-1)

    # Loss: negative log-probability of the token that actually appears (cross-entropy).
    loss = -log_probs[ids[i]]
    losses.append(loss.item())

    # Entropy of the predicted distribution: -sum(p * log p).
    entropy = -torch.sum(log_probs.exp() * log_probs)
    entropies.append(entropy.item())

    # Top 5 predictions, most likely first (a plain argsort would return the least likely).
    pred_token_indices = torch.topk(step_logits, k=5).indices.tolist()
    pred_tokens = [tokenizer.decode([idx]) for idx in pred_token_indices]
    token_info.append("\n".join(f"pred {rank}: {tok}" for rank, tok in enumerate(pred_tokens, start=1)))


The `inspectus.tokens` function visualizes tokens using per-token metrics, here loss and entropy. Hover over a token to view additional info.
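The call below passes one value per token for each metric and one hover string per token. A hypothetical pre-flight check like the following (not part of Inspectus) can catch length mismatches, for example after changing how the first token is handled, before anything is rendered.

```python
def check_token_data(tokens, metrics, token_info=None):
    """Hypothetical helper: verify every per-token list lines up with `tokens`."""
    n = len(tokens)
    for name, values in metrics.items():
        if len(values) != n:
            raise ValueError(f"metric {name!r} has {len(values)} values for {n} tokens")
    if token_info is not None and len(token_info) != n:
        raise ValueError(f"token_info has {len(token_info)} entries for {n} tokens")


# Toy data: passes silently when everything lines up.
check_token_data(["The", " quick"], {"loss": [0.0, 2.3], "entropy": [0.0, 4.1]}, ["a", "b"])
```

In the notebook this would be `check_token_data(tokens, {"loss": losses, "entropy": entropies}, token_info)`.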

In [15]:
inspectus.tokens(tokens, {"loss": losses, "entropy": entropies}, token_info=token_info, theme="light")