In this example, we start by loading a pre-trained tokenizer and model from the Transformers library. We then define a sentence and tokenize it with the tokenizer. The resulting inputs object is a dictionary-like BatchEncoding of PyTorch tensors (input_ids, token_type_ids, and attention_mask for BERT) that can be unpacked directly into the model call.

We pass the inputs through the model to get the hidden states for each token in the sentence. We then take the last hidden state, a tensor of shape (batch size, sequence length, hidden size), and define a random query vector as before.

We calculate attention weights and apply attention as before. The weights form a probability distribution over the token positions in the sentence, and the attended input is the weighted sum of the token hidden states. It is therefore a single vector of the hidden size, not a sequence of tokens.
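The arithmetic behind this step can be checked on toy numbers. The following is a minimal, dependency-free sketch, not the notebook's implementation: the helper names softmax and attention_pool and the three two-dimensional "hidden states" are assumptions made up for illustration. It shows that the weights sum to one and that the pooled result is one vector, whatever the number of tokens.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    # One dot-product score per token position
    scores = [sum(h * q for h, q in zip(row, query)) for row in hidden_states]
    # Normalize the scores into a distribution over positions
    weights = softmax(scores)
    # Weighted sum of the rows: a single vector with len(query) entries
    pooled = [
        sum(w * row[d] for w, row in zip(weights, hidden_states))
        for d in range(len(query))
    ]
    return weights, pooled

# Toy stand-ins (hypothetical values): 3 token positions, hidden size 2
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]

weights, pooled = attention_pool(hidden_states, query)
print(len(weights), len(pooled))  # prints: 3 2
```

Positions whose hidden state aligns with the query (the first and third rows here) receive the larger weights, which is the same behavior the batched torch.bmm calls produce on real model outputs.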

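To report the result as text, we use the tokenizer's convert_ids_to_tokens() method, which maps token IDs to their string tokens. The IDs to convert are those of the input positions that received the largest attention weights; the attended vector itself lives in the model's hidden space, so its indices are not token IDs. The resulting attended_tokens variable contains the tokens the attention mechanism weighted most heavily. Because the query here is random, the selection is illustrative and not a meaningful importance ranking.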

In [None]:
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Load a pre-trained tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

# Define a sentence
sentence = "The quick brown fox jumps over the lazy dog."

# Tokenize the sentence
inputs = tokenizer(sentence, return_tensors='pt')

# Pass the inputs through the model to get hidden states (inference only, so no gradients)
with torch.no_grad():
    outputs = model(**inputs)

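# Get the last hidden state: (batch, seq_len, hidden_size)
last_hidden_state = outputs.last_hidden_state
# Random query vector, for illustration only: (batch, hidden_size)
query = torch.randn(1, last_hidden_state.shape[-1])
# Dot-product score per token, normalized over the sequence: (batch, seq_len)
attention_weights = F.softmax(torch.bmm(last_hidden_state, query.unsqueeze(2)).squeeze(2), dim=1)
# Weighted sum of the token hidden states: (batch, hidden_size)
attended_inputs = torch.bmm(last_hidden_state.transpose(1, 2), attention_weights.unsqueeze(2)).squeeze(2)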

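# Map the most-attended positions back to tokens. attended_inputs lives in
# hidden space, so its argmax is a feature index, not a token ID; select the
# positions from attention_weights instead (top 3 is an illustrative choice).
top_positions = attention_weights[0].topk(3).indices
attended_tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][top_positions].tolist())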

print('Input sentence:', sentence)
print('Attended tokens:', attended_tokens)
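
A list of tokens alone hides how strongly each position was weighted. As a rough sketch, tokens can be paired with their weights and ranked. The helper rank_tokens, the shortened token list, and the weight values below are all made up for illustration; in the cell above the tokens would come from the tokenizer and the weights from attention_weights.

```python
def rank_tokens(tokens, weights, k=3):
    # Each token position needs exactly one weight
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    # Sort (token, weight) pairs by weight, largest first, and keep the top k
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical tokens and weights (the weights sum to 1, like a softmax output)
tokens = ['[CLS]', 'the', 'quick', 'brown', 'fox', '[SEP]']
weights = [0.05, 0.10, 0.15, 0.20, 0.45, 0.05]

top = rank_tokens(tokens, weights, k=2)
print(top)  # prints: [('fox', 0.45), ('brown', 0.2)]
```

Keeping the weight next to each token makes it easier to notice when special tokens such as [CLS] or [SEP] dominate the distribution.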