In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys

# Use this if running this notebook from within its place in the truera repository.
sys.path.insert(0, "..")

# Or otherwise install trulens.
# !{sys.executable} -m pip install trulens

# Install transformers / huggingface.
!{sys.executable} -m pip install transformers pandas numpy

import os
os.environ['TRULENS_BACKEND']='torch'

from IPython.display import display
import matplotlib.pyplot as plt
import torch
import pandas as pd
import numpy as np
from pathlib import Path
import re

from torch.utils.data import DataLoader
from pandas import Series
from typing import Union

# Lab Week 2: NLP Example Usage and Stability

## Twitter Sentiment Model

[Huggingface](https://huggingface.co/models) offers a variety of pre-trained NLP models to explore. This notebook uses a [transformer-based Twitter sentiment classification model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) as its running example. Before getting started, familiarize yourself with the general Truera API as demonstrated in the [intro notebook using pytorch](intro_demo_pytorch.ipynb).

In [None]:
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer

# Wrap all of the necessary components.
class TwitterSentiment:
    MODEL = "cardiffnlp/twitter-roberta-base-sentiment"

    # Use the first CUDA device if one is available, otherwise fall back to the CPU.
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

    model = AutoModelForSequenceClassification.from_pretrained(MODEL).to(device)

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    @staticmethod
    def tokenize(inputs):
        return TwitterSentiment \
            .tokenizer(inputs, padding=True, return_tensors="pt") \
            .to(TwitterSentiment.device)
        # pt refers to pytorch tensor

    labels = ['negative', 'neutral', 'positive']

    NEGATIVE = labels.index('negative')
    NEUTRAL = labels.index('neutral')
    POSITIVE = labels.index('positive')

task = TwitterSentiment()

This model classifies tweets (or really any text you give it) according to their sentiment: positive, negative, or neutral. Let's try it out on some examples.

In [None]:
sentences = ["I'm so happy!", "I'm so sad!", "I cannot tell whether I should be happy or sad!", "meh"]

# Input sentences need to be tokenized first.

inputs = task.tokenize(sentences)

# The tokenizer gives us vocabulary indexes for each input token (in this case,
# words and some word parts like the "'m" part of "I'm" are tokens).

print(inputs)

# Decoding helps us inspect the tokenization produced:

print(task.tokenizer.batch_decode(torch.flatten(inputs['input_ids'])))
# Normally decode would give us a single string for each sentence but we would
# not be able to see some of the non-word tokens there. Flattening first gives
# us a string for each input_id.
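
The `padding=True` argument used in `tokenize` matters whenever sentences in a batch have different lengths. The following is a minimal pure-Python sketch of what padding does conceptually; `pad_batch`, `PAD_ID`, and the token ids are hypothetical and chosen only for illustration, and the real tokenizer performs this step internally.

```python
PAD_ID = 1  # hypothetical pad id, for illustration only


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every id sequence to the length of the longest one."""
    longest = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    # The attention mask marks real tokens with 1 and padding with 0.
    attention_mask = [
        [1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up token id sequences of different lengths.
batch = pad_batch([[0, 100, 200, 2], [0, 300, 2]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The padded positions are the ones we later hide from the visualizations via `hidden_tokens`.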

## Displaying tokens using *trulens*

The trulens library features some utilities for displaying tokenizations. To use these, we create an `NLP` object as below, configured with the class labels and the tokenizer's decoding and tokenizing functions.

In [None]:
from trulens.visualizations import NLP

V = NLP(
    labels=task.labels,
    decode=lambda x: task.tokenizer.decode(x),
    tokenize=task.tokenize,
    # huggingface models can take as input the keyword args as per produced by their tokenizers.

    input_accessor=lambda x: x['input_ids'],
    # for huggingface models, input/token ids are under input_ids key in the input dictionary

    hidden_tokens=set([task.tokenizer.pad_token_id])
    # do not display these tokens
)

display(V.tokens(sentences))#, show_id=True))

## Evaluating the model

Evaluating huggingface models is straightforward if we use the structure produced by the tokenizer.

In [None]:
outputs = task.model(**inputs)

print(outputs)

# From logits we can extract the most likely class for each sentence and its readable label.

predictions = [task.labels[i] for i in outputs.logits.argmax(axis=1)]

for sentence, logits, prediction in zip(sentences, outputs.logits, predictions):
    print(logits.to('cpu').detach().numpy(), prediction, sentence)
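
Logits are unnormalized scores, so they are easier to compare once converted to probabilities. Below is a minimal sketch of a softmax over one row of logits, written in plain Python; the logit values are made up for illustration and are not outputs of the model above.

```python
import math

# Hypothetical logits for one sentence, ordered as [negative, neutral, positive].
labels = ["negative", "neutral", "positive"]
logits = [-1.2, 0.3, 2.5]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
# The predicted class is the index of the largest logit (argmax).
predicted = labels[max(range(len(logits)), key=lambda i: logits[i])]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
print("predicted:", predicted)
```

Softmax preserves the ordering of the logits, so the argmax over probabilities and the argmax over logits always agree.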

Trulens can also help us view these results in a more readable manner. But first we need to tell it about the model by wrapping it in a library-specific container:

In [None]:
from trulens.nn.models import get_model_wrapper

task.wrapper = get_model_wrapper(task.model, input_shape=(None, task.tokenizer.model_max_length), device=task.device)

We also need to indicate how to retrieve the logits from the output of the model as in the `output_accessor` parameter below:

In [None]:
V = NLP(
    wrapper=task.wrapper,
    labels=task.labels,
    decode=lambda x: task.tokenizer.decode(x),
    tokenize=task.tokenize,
    # huggingface models can take as input the keyword args as per produced by their tokenizers.

    input_accessor=lambda x: x['input_ids'],
    # for huggingface models, input/token ids are under input_ids key in the input dictionary

    output_accessor=lambda x: x['logits'],
    # and logits under 'logits' key in the output dictionary

    hidden_tokens=set([task.tokenizer.pad_token_id])
    # do not display these tokens
)

display(V.tokens(sentences, show_id=True))

## Exploring real-world tweets

Let's try out the sentiment model on some real-world tweets. We first load the CSV file into a pandas `DataFrame`.

In [None]:
tweets = pd.read_csv(
    Path("resources") / "training.1600000.processed.noemoticon.csv",
    encoding='ISO-8859-1',
    header=None,
    names=["polarity", "id", "timestamp", "query", "user", "text"]
)
tweets

Let's take a look at the model's predictions on some of these tweets. Note that emojis were stripped from the dataset. These missing tokens are shown as �.

In [None]:
some_tweets = list(tweets['text'][0:10])

display(V.tokens(some_tweets))

Let's explore tweets that mention baseball teams. Below we have some utilities that find tweets containing team names and create team-less versions of them in which the team name is replaced with `:team:`. We can use such tweets to investigate the sensitivity of the sentiment model towards particular teams.

In [None]:
def to_team(team: str):
    """Replaces all instances of ':team:' with the given `team` in the given list of tweets."""

    def f(tweets: Union[Series, np.ndarray]):
        if isinstance(tweets, pd.Series):
            return tweets.map(subst(":team:", team))
        if isinstance(tweets, np.ndarray):
            return np.vectorize(subst(":team:", team))(tweets)
        raise TypeError(f"expected a pandas Series or numpy array, got {type(tweets).__name__}")
    return f

def word_pattern(word):
    """Create a pattern that matches the given `word` as long as it is not immediately next to an alpha-numeric character."""
    # Raw strings keep the regex escapes intact and avoid invalid-escape warnings.
    return r"(?<!\w)" + re.escape(word) + r"(?!\w)"

def subst(thing_from: str, thing_to: str):
    pat = re.compile(word_pattern(thing_from), re.IGNORECASE)
    def f(context: str):
        return pat.sub(thing_to, context)
    return f

def extract_teams(teams):
    """Create a method that extracts tweets that contain mentions of any of the terms in the given `teams`."""

    pattern = '|'.join(map(word_pattern, teams))
    reg = re.compile(pattern, re.IGNORECASE)

    def f(tweets: Series):
        indices = tweets.str.contains(reg)
        ret = tweets[indices]

        for team in teams:
            ret = ret.map(subst(team, ":team:"))

        return ret

    return f
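
To make the matching rule concrete, here is a small standalone sketch of the same lookbehind/lookahead idea used by `word_pattern`. The helper name `demo_pattern` and the example sentences are made up for illustration; the point is that a short team name such as "reds" should not match inside a longer word such as "redsox".

```python
import re


def demo_pattern(word):
    """Match `word` only when it is not adjacent to another word character."""
    return r"(?<!\w)" + re.escape(word) + r"(?!\w)"


pat = re.compile(demo_pattern("reds"), re.IGNORECASE)

# Made-up example sentences.
examples = [
    "Go Reds!",              # standalone word, different case
    "the redsox won again",  # "reds" is only part of a longer word
    "#reds fans are loud",   # "#" is not a word character
]

masked = [pat.sub(":team:", text) for text in examples]
for before, after in zip(examples, masked):
    print(f"{before!r} -> {after!r}")
```

Because the pattern is built with `re.escape`, team names containing spaces or punctuation are matched literally.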

This cell may take a minute to run. 

In [None]:
team_tweets = extract_teams([
    "diamondbacks", "braves", "orioles", "redsox", "red sox", "cubs", "whitesox", "white sox", "reds", 
    "guardians", "rockies", "tigers", "astros", "royals", "dodgers", "marlins", "brewers", "twins", 
    "mets", "yankees", "athletics", "phillies", "pirates", "padres", "giants", "mariners", "cardinals", 
    "rays", "rangers", "jays", "nationals"
    ]# ['redsox', 'red sox', 'yankees'],
    )(tweets['text'])

print(f"found {len(team_tweets)} team tweets")

Let's now focus on two particular teams and compare the sentiment model on them. We start by creating versions of the team tweets with a particular team filled in where the `:team:` marker was. We can visualize pairs of such tweets that differ only in the team via another trulens utility.

In [None]:
teams = ['redsox', 'yankees']
tweets_for_team = {team: to_team(team)(team_tweets) for team in teams}

display(V.tokens_stability(
    texts1=list(tweets_for_team[teams[0]][0:10]),
    texts2=list(tweets_for_team[teams[1]][0:10])
))

Let's inspect the distribution of logits across tweets of the two teams.

In [None]:
# First a method to help us evaluate the model on a large collection of instances.
def eval_batched(data: Series, batch_size=16):
    """Evaluate the model `task.model` on given `data` tokenized by `task.tokenizer` in a set of batches. Return the logits."""
    sentences = DataLoader(data.to_numpy(), batch_size=batch_size)

    all_logits = []

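    for batch in sentences:
        tokens = task.tokenizer(batch, padding=True, return_tensors='pt').to(task.device)
        # Gradients are not needed for evaluation; skipping them saves memory.
        with torch.no_grad():
            logits = task.model(**tokens)['logits'].to('cpu')
        del tokens
        # Extend the list with one logits row per sentence in the batch.
        all_logits += logits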

    return np.stack(list(map(torch.Tensor.numpy, all_logits)))

# Then get the logits for each team variant's tweets.
logits_for_team = {team: eval_batched(tweets) for team, tweets in tweets_for_team.items()}

In [None]:
amin = min(logits.min() for logits in logits_for_team.values())
amax = max(logits.max() for logits in logits_for_team.values())

colors = {teams[0]: 'red', teams[1]: 'blue'}

# Create a figure showing the histogram of logits for each of the three classes for all of the teams in `teams`.

fig, axs = plt.subplots(3,1, figsize=(10,10))
for idx, label in zip([task.NEGATIVE,task.NEUTRAL,task.POSITIVE], task.labels):
    for team, logits in logits_for_team.items():
        axs[idx].hist(logits[:, idx], bins=10, alpha=0.25, label=f"{team} {label}", color=colors[team], range=(amin, amax))
    axs[idx].legend()

Are there any individual tweets that have particularly disparate logits?

In [None]:
# Get the index of tweets sorted by absolute difference in logits across teams.
sort_idx = np.argsort(abs(logits_for_team[teams[0]] - logits_for_team[teams[1]]).sum(axis=1))[::-1]
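
The ranking above sums the absolute per-class differences in logits for each tweet (an L1 distance) and sorts from the largest to the smallest. As a minimal sketch of that computation in plain Python, with made-up logits for three tweets under two hypothetical team substitutions:

```python
# Made-up logits for three tweets; each row is [negative, neutral, positive].
logits_a = [[0.1, 0.2, 0.7], [1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
logits_b = [[0.1, 0.3, 0.6], [-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]]


def l1_distance(u, v):
    """Sum of absolute coordinate-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


# One disparity score per tweet.
disparity = [l1_distance(u, v) for u, v in zip(logits_a, logits_b)]

# Indices of tweets, most disparate first.
ranking = sorted(range(len(disparity)), key=disparity.__getitem__, reverse=True)

print(disparity)
print(ranking)
```

In this toy data the second tweet flips from negative to positive when the team changes, so it is ranked first, while the third tweet is unaffected and is ranked last.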

In [None]:
team_tweets_np = team_tweets.to_numpy()

V.tokens_stability(
    texts1 = list(tweets_for_team[teams[0]].to_numpy()[sort_idx[0:10]]),
    texts2 = list(tweets_for_team[teams[1]].to_numpy()[sort_idx[0:10]])
)

# Lab Week 3: NLP Attribution

We begin by importing the attribution components of trulens.

In [None]:
from trulens.nn.quantities import ClassQoI
from trulens.nn.attribution import IntegratedGradients, InputAttribution
from trulens.nn.attribution import Cut, OutputCut
from trulens.nn.distributions import GaussianDoi

## Attributions

Applying integrated gradients to the sentiment model is similar to the prior notebooks, except that special consideration must be given to the cuts used as the targets of the attribution (i.e. what we want to assign importance to). As you may have noted above, the model takes as input integer indexes associated with tokens. As we cannot take gradients with respect to these, we use an alternative: the embedding representation of those same inputs. To configure trulens accordingly, we need to inspect the layer names inside our model.

### Parameters

Here, `roberta_embeddings_word_embeddings` is the layer that produces a continuous representation of each input token, so we use it as the layer defining the **distribution of interest**. While most neural NLP models contain a token embedding, the layer name will differ from model to model.

The second thing to note is the form of model outputs. Specifically, outputs are structures which contain a 'logits' attribute that stores the model scores.

Putting these things together, we instantiate `IntegratedGradients` to attribute each embedding dimension to the maximum class (i.e. the predicted class).
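
For reference, the standard definition of integrated gradients for an input $x$ (here, the token embeddings), a baseline $x'$, and a scalar output $F$ (here, a class logit) is sketched below, together with its usual Riemann-sum approximation. We assume, without having checked the trulens internals, that the `resolution` parameter plays the role of the number of steps $m$.

```latex
\[
\mathrm{IG}_i(x)
  = (x_i - x'_i) \int_0^1
      \frac{\partial F\bigl(x' + \alpha (x - x')\bigr)}{\partial x_i} \, d\alpha
  \;\approx\;
    (x_i - x'_i) \, \frac{1}{m} \sum_{k=1}^{m}
      \frac{\partial F\bigl(x' + \tfrac{k}{m} (x - x')\bigr)}{\partial x_i}
\]
```

A larger $m$ gives a closer approximation of the integral at the cost of more gradient evaluations.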

In [None]:
# Alternatively we can look at a particular class:

infl_positive = IntegratedGradients(
    model = task.wrapper,
    resolution=10,
    doi_cut=Cut('roberta_embeddings_word_embeddings'),
    qoi=ClassQoI(task.POSITIVE),
    qoi_cut=OutputCut(accessor=lambda o: o['logits'])
)

"""
# Alternatively we can use plain input gradients at the point itself for the same class:
infl_positive = InputAttribution(
    model = task.wrapper,
    doi='point',
    # doi=GaussianDoi(var=0.001, resolution=10, cut=Cut('roberta_embeddings_word_embeddings')),
    doi_cut=Cut('roberta_embeddings_word_embeddings'),
    qoi=ClassQoI(task.POSITIVE),
    qoi_cut=OutputCut(accessor=lambda o: o['logits'])
)
"""

Getting attributions uses the same call as model evaluation.

A raw listing of attribution values is not very readable, so Trulens comes with some utilities to present token influences a bit more concisely. First we need to set up a few parameters to make use of them:

In [None]:
from trulens.visualizations import NLP

V = NLP(
    wrapper=task.wrapper,
    labels=task.labels,
    decode=lambda x: task.tokenizer.decode(x),
    tokenize=task.tokenize,
    # huggingface models can take as input the keyword args as per produced by their tokenizers.

    input_accessor=lambda x: x['input_ids'],
    # for huggingface models, input/token ids are under input_ids key in the input dictionary

    output_accessor=lambda x: x['logits'],
    # and logits under 'logits' key in the output dictionary

    hidden_tokens=set([task.tokenizer.pad_token_id])
    # do not display these tokens
)

display(
    V.tokens(list(tweets_for_team[teams[0]][0:10]), attributor=infl_positive)
)

## Baselines

We see in the above results that special tokens such as the sentence end **&lt;/s&gt;** are found to contribute a lot to the model outputs. While this may be useful in some contexts, we are more interested in the contributions of the actual words in these sentences. To focus on the words, we need to adjust the **baseline** used in the integrated gradients computation. By default, in the instantiations so far, the baseline for each token is a zero vector of the same shape as its embedding. By making the baseline identical to the explained instance on special tokens, we can remove their impact from our measurement. Trulens provides a utility for this purpose, `token_baseline`, which constructs the methods that compute the appropriate baseline.

In [None]:
from trulens.utils.nlp import token_baseline

inputs_baseline_ids, inputs_baseline_embeddings = token_baseline(
    keep_tokens=set([task.tokenizer.cls_token_id, task.tokenizer.sep_token_id]),
    # Which tokens to preserve.

    replacement_token=task.tokenizer.pad_token_id,
    # What to replace tokens with.

    input_accessor=lambda x: x.kwargs['input_ids'],

    ids_to_embeddings=task.model.get_input_embeddings()
    # Callable to produce embeddings from token ids.
)

We can now inspect the baselines on some example sentences. The first method returned by `token_baseline` gives us token ids to inspect while the second gives us the embeddings of the baseline which we will pass to the attributions method.

In [None]:
infl_positive_baseline = IntegratedGradients(
    model = task.wrapper,
    resolution=50,
    baseline = inputs_baseline_embeddings,
    doi_cut=Cut('roberta_embeddings_word_embeddings'),
    qoi=ClassQoI(task.POSITIVE),
    qoi_cut=OutputCut(accessor=lambda o: o['logits'])
)

print("QOI = POSITIVE WITH BASELINE")
V.tokens(list(tweets_for_team[teams[0]][0:10]), attributor=infl_positive_baseline)

As we see, the baseline eliminated the measurement of contribution of the special tokens.
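
The reason this works can be seen in a toy setting. The following is a minimal sketch of integrated gradients on a hand-written three-input function, not the trulens implementation; `integrated_gradients`, `f`, and `grad_f` are hypothetical names introduced only for this illustration. Each attribution is scaled by the difference between the input and the baseline, so any coordinate where the two agree receives exactly zero attribution.

```python
def integrated_gradients(grad_fn, x, baseline, resolution=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    n = len(x)
    avg_grads = [0.0] * n
    for k in range(resolution):
        # Midpoint of the k-th segment on the straight path from baseline to x.
        alpha = (k + 0.5) / resolution
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = grad_fn(point)
        for i in range(n):
            avg_grads[i] += grads[i] / resolution
    # Scale the averaged gradients by the input-baseline difference.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grads)]


# Toy "model": F(x) = x0 * x1 + x2**2, with its gradient written by hand.
def f(x):
    return x[0] * x[1] + x[2] ** 2


def grad_f(x):
    return [x[1], x[0], 2 * x[2]]


x = [1.0, 2.0, 3.0]

# Zero baseline: every coordinate can receive attribution.
attr_zero = integrated_gradients(grad_f, x, [0.0, 0.0, 0.0])

# Treat coordinate 2 as a "special token": the baseline copies the input there.
matched_baseline = [0.0, 0.0, x[2]]
attr_matched = integrated_gradients(grad_f, x, matched_baseline)

print("zero baseline:   ", attr_zero)
print("matched baseline:", attr_matched)
```

In both cases the attributions sum (up to numerical error) to the difference between the function value at the input and at the baseline, which is the completeness property of integrated gradients.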