# Interpretability/explainability

Interpretability is often used interchangeably with explainability, although there is a subtle difference between the two. It refers to interpreting the decisions made by a machine learning model, that is, explaining which parts of the input were responsible for the model's prediction.

In the example here, we will provide explanations for the decisions made by the Logistic Regression classifier. We will: 

(1) train a logistic regression classifier to classify Amazon reviews as positive or negative

(2) look at the weights the classifier assigned to individual tokens

(3) highlight the words of a text to indicate which prediction (positive or negative) each of them contributed to
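Why token weights can serve as explanations for a linear classifier can be sketched in a few lines: the model's score is a sum of per-token contributions (weight times feature value) plus a bias. The weights, bias, and feature values below are invented for illustration and do not come from the trained model.

```python
import math

# Hypothetical weights, bias, and feature values, chosen only for illustration
weights = {"great": 2.0, "broken": -3.0, "price": 0.5}
bias = 0.1
features = {"great": 0.6, "broken": 0.4, "price": 0.2}  # e.g. TF-IDF values

# contribution of each token = its weight * its feature value
contributions = {tok: weights[tok] * val for tok, val in features.items()}

# the score (logit) is the bias plus the sum of all contributions
logit = bias + sum(contributions.values())
prob_pos = 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the score into a probability

for tok, c in sorted(contributions.items(), key=lambda kv: kv[1]):
    print(f"{tok:>8}: {c:+.2f}")
print(f"logit = {logit:+.2f}, P(positive) = {prob_pos:.3f}")
```

Tokens with negative contributions push the prediction towards the negative class, tokens with positive contributions towards the positive class.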


### Data loading and preprocessing: text
We will load a corpus of Amazon reviews **labeled** for sentiment (positive or negative).


In [2]:
# importing Python's pandas library for data loading and manipulation
import pandas as pd

# Step #1: loading our annotated reviews
train_data = pd.read_csv('reviews_train.csv', delimiter = '\t') # in our file, the values are actually TAB-separated
eval_data = pd.read_csv('reviews_test.csv', delimiter = '\t')

# let's see what our data actually looks like
train_data

Unnamed: 0,label,score,content
0,NEG,2.0,cons tips extremely easy on carpet and if you...
1,NEG,1.0,"It's a nice look, but it tips over very easil..."
2,NEG,1.0,I have bought and returned three of these uni...
3,NEG,1.0,"I knew these were inexpensive CD cases, but I..."
4,NEG,2.0,"I used a 25 pack of these doing DVD backups, ..."
...,...,...,...
1795,POS,5.0,I just recieved my HDMI cable and am very imp...
1796,POS,5.0,This is the perfect keyboard ( I know cuz I a...
1797,POS,5.0,SanDisk has done it again. They never seem to...
1798,POS,5.0,"Fast shipping, Very happy with the GARMIN. Th..."


### Preprocessing

In [5]:
# let us preprocess the texts (tokenize, lowercase, and remove stopwords)
# install spacy with pip or conda, e.g., pip install spacy
import spacy

# wordcloud library displays texts as word clouds, based on word frequency statistics
import wordcloud

# wordcloud has its own list of STOPWORDS
from wordcloud import STOPWORDS

# removing the repetitions if there are any, converting the list to set
stopwords = set(list(STOPWORDS) + ['.', "?", "!", ",", "(", ")", ":", ";", "\"", "'"])
print(stopwords)


{'so', 'they', 'we', 'does', 'him', "hadn't", "where's", 'theirs', 'was', 'else', 'its', 'get', "we'd", "they'll", 'are', 'some', 'each', 'against', 'his', "hasn't", 'then', 'have', "aren't", 'he', 'own', 'has', 'than', "we're", 'www', "i'll", "they've", 'this', 'could', 'out', 'with', 'while', "weren't", 'if', 'but', 'your', "you'll", 'had', '?', 'ours', 'nor', 'be', 'doing', "don't", 'since', 'again', "shouldn't", 'up', 'myself', 'how', "won't", 'which', 'i', 'down', 'itself', 'as', "doesn't", 'more', 'yourself', 'com', 'am', 'therefore', 'would', "wasn't", "i'm", 'when', "when's", 'what', 'under', 'below', ')', 'through', 'why', 'also', "let's", 'it', 'both', 'in', 'once', 'whom', 'yours', 'a', 'an', '(', 'all', "we've", "isn't", 'by', 'and', 'those', 'not', 'ever', 'hence', "i'd", 'should', "you're", 'to', "that's", 'did', "didn't", "they're", 'that', 'shall', 'our', 'the', 'at', "how's", 'just', "who's", 'over', "she'll", 'you', 'k', ':', 'between', 'further', "mustn't", 'is', 'ou

In [7]:
# The model we want to load needs to be first downloaded: 
# in command line: python -m spacy download en_core_web_sm
# load the spacy models for English
nlp = spacy.load("en_core_web_sm")

# apply: runs the given function on every row of the column
# lambda: an anonymous (placeholder) function that maps an input to an output without changing the input
train_data["tokens"] = train_data.content.apply(lambda x: [t.text.lower() for t in nlp(x, disable=["parser", "ner"]) if (t.text.strip() != "" and (t.text.lower() not in stopwords))])
eval_data["tokens"] = eval_data.content.apply(lambda x: [t.text.lower() for t in nlp(x, disable=["parser", "ner"]) if (t.text.strip() != "" and (t.text.lower() not in stopwords))])

train_data


Unnamed: 0,label,score,content,tokens
0,NEG,2.0,cons tips extremely easy on carpet and if you...,"[cons, tips, extremely, easy, carpet, lot, cds..."
1,NEG,1.0,"It's a nice look, but it tips over very easil...","['s, nice, look, tips, easily, steady, rug, su..."
2,NEG,1.0,I have bought and returned three of these uni...,"[bought, returned, three, units, now, one, def..."
3,NEG,1.0,"I knew these were inexpensive CD cases, but I...","[knew, inexpensive, cd, cases, ca, n't, even, ..."
4,NEG,2.0,"I used a 25 pack of these doing DVD backups, ...","[used, 25, pack, dvd, backups, last, 5, failed..."
...,...,...,...,...
1795,POS,5.0,I just recieved my HDMI cable and am very imp...,"[recieved, hdmi, cable, impressed, price, $, 5..."
1796,POS,5.0,This is the perfect keyboard ( I know cuz I a...,"[perfect, keyboard, know, cuz, typing, right, ..."
1797,POS,5.0,SanDisk has done it again. They never seem to...,"[sandisk, done, never, seem, let, products, ma..."
1798,POS,5.0,"Fast shipping, Very happy with the GARMIN. Th...","[fast, shipping, happy, garmin, tech, support,..."
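The token lists above contain fragments such as `'s`, `ca`, and `n't`. A likely reason is that the tokenizer splits contractions before the stopword lookup happens, so a full-form stopword entry no longer matches. The sketch below imitates this with a hand-written token list and a small stopword subset; both are assumptions made for illustration, not the output of spaCy or the full wordcloud list.

```python
# Hand-written tokens imitating how the start of review 3 is split (assumption)
tokens = ["I", "knew", "these", "were", "inexpensive", "CD", "cases", ",",
          "but", "I", "ca", "n't", "even"]

# Small illustrative subset of stopwords; "can't" stands in for full-form entries
stopwords_subset = {"i", "these", "were", "but", "can't", ","}

# Same filter as in the cell above: drop empty tokens and stopwords, lowercase the rest
kept = [t.lower() for t in tokens
        if t.strip() != "" and t.lower() not in stopwords_subset]
print(kept)
```

Since neither `ca` nor `n't` equals `can't`, both pieces survive the filter. Adding such fragments to the stopword set would be one way to remove them.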


## Traditional text classification 

### Converting texts into TF-IDF sparse vectors

- To this end we will use the existing functionality (TF-IDF vectorizer) from the Scikit-Learn library
- One could alternatively use the CountVectorizer (as we did in Session 6)

We already saw the Scikit-Learn library last time. It offers many machine learning (and also text processing) methods, models, and tools that can be used out of the box through a consistent, uniform API (the same functions everywhere, such as fit, transform, and fit_transform).
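To make the TF-IDF idea concrete, here is a minimal pure-Python sketch on three toy documents. It uses a smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization. To my understanding this mirrors the default settings of scikit-learn's TfidfVectorizer, but treat that correspondence as an assumption. The toy documents are invented.

```python
import math
from collections import Counter

# Three toy, already tokenized documents (invented for illustration)
docs = [["great", "price", "great"], ["bad", "price"], ["great", "cable"]]
n_docs = len(docs)

# document frequency: in how many documents each term occurs
df = Counter(term for doc in docs for term in set(doc))

def tfidf(doc):
    """Return an L2-normalized TF-IDF vector (as a dict) for one tokenized document."""
    tf = Counter(doc)
    # term frequency times smoothed inverse document frequency
    raw = {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}

vec = tfidf(docs[0])
print(vec)
```

In the first document "great" occurs twice and "price" once, and both terms appear in two of the three documents, so after normalization "great" gets exactly twice the value of "price".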

In [8]:
# we will use the sklearn library for text preprocessing (and later also for classification and clustering algorithms/models)
import sklearn

# for this we need the TfidfVectorizer class from scikit-learn (sklearn) 
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

# dummy function, returning our already tokenized text. TfidfVectorizer usually expects raw text and performs tokenization of
# its own. Since we already tokenized the texts ourselves with SpaCy, we just provide those tokens
def dummy(tokenized_text):
    return tokenized_text

# Converting Pandas data series into a list of tokenized texts (input format required by scikit-learn's TfidfVectorizer)
train_set = train_data["tokens"].tolist()
eval_set = eval_data["tokens"].tolist()

# initializing the TF-IDF vectorizer
vectorizer = TfidfVectorizer(tokenizer = dummy, preprocessor = dummy)

# vectorizer learns the vocabulary from the (tokenized) train set reviews
vectorizer.fit(train_set)

# let's see what the vocabulary looks like
print(vectorizer.vocabulary_) #prints word:index_number

print()

# let's see how many different words we have in our vocabulary
print(len(vectorizer.vocabulary_))


11197




In [9]:
# Step 2: Create TF-IDF vectors for train set and evaluation set reviews, convert the "string" labels into numeric labels

# Creating now TF-IDF vectors for train set, and then for evaluation set
train_tfidf_vectors = vectorizer.transform(train_set)
eval_tfidf_vectors = vectorizer.transform(eval_set)

# Converting labels "POS" and "NEG" into numeric labels, as required by the logistic regression classifier

# for the train set
train_labels = train_data["label"].tolist()
train_labels = [(1 if tl == "POS" else 0) for tl in train_labels]

# for the evaluation set
eval_labels = eval_data["label"].tolist()
eval_labels = [(1 if el == "POS" else 0) for el in eval_labels]

In [26]:
# Step 3: Train the logistic regression classifier on the training set

# For this we need the LogisticRegression class 
from sklearn.linear_model import LogisticRegression

# we now train ("fit") the logistic regression classifier by providing the training input (tf-idf vectors of train reviews) and 
# the corresponding sentiment labels for those reviews
classifier = LogisticRegression(C = 32) # , solver = 'lbfgs' #played with C, 32 fit best
classifier.fit(train_tfidf_vectors, train_labels)

# the result is a trained classifier, which we can examine more closely in the next steps and make predictions with
print(classifier)

LogisticRegression(C=32)


In [27]:
# Step 4: Evaluate the trained classifier on the evaluation set
accuracy = classifier.score(eval_tfidf_vectors, eval_labels)
print("Classification accuracy: " + str(accuracy * 100) + "%")

Classification accuracy: 84.5%


In [28]:
classifier.coef_[0] # coef_ holds the learned weights: one row for the binary decision, one column per vocabulary term
print(classifier.coef_.shape)

(1, 11197)


In [32]:
# Step 5: Interpretability of the classifier: analysis of weights assigned to individual terms

# let's build a dictionary with words from our vocabulary as keys and their associated weights 
# (produced by the LogisticRegression) classifier as values

# initialize the empty dictionary
weights_dict = {}

# for each term in the "vectorizer.vocabulary_" (dict that maps terms to IDs)
for term in vectorizer.vocabulary_:
    # we look up the term's ID (its column index in the TF-IDF matrix)
    ind = vectorizer.vocabulary_[term]
    # and store the LR weight found at that index
    weights_dict[term] = classifier.coef_[0][ind]

# let's sort terms according to their LR weights, from lowest (largest negative values) to highest (largest positive values) 
weights_sorted = list(sorted(weights_dict.items(), key=lambda item: item[1]))

# terms with the smallest weights (most indicative of class 0: negative reviews)
#print(weights_sorted[:100])

weights_sorted.reverse()
#print()
#print(weights_sorted[-10:]) # terms most indicative of negative reviews
print(weights_sorted[:10]) # terms most indicative of positive reviews
# shows the words with the largest positive weights

[('great', 9.364293463991714), ('price', 8.558540017714657), ('excellent', 8.008258651798894), ('best', 7.067083154434858), ('perfect', 6.733932133959698), ('highly', 6.728446415103987), ('works', 6.6023646777971985), ('fast', 5.557290567553894), ('memory', 5.343965466397997), ('comfortable', 5.06729093894576)]


In [33]:
# normalizing weights
min_w = abs(min([weights_dict[w] for w in weights_dict]))
max_w = max([weights_dict[w] for w in weights_dict])
print(min_w, max_w)

for w in weights_dict:
    divisor = min_w if (weights_dict[w] < 0) else max_w 
    weights_dict[w] = weights_dict[w] / divisor 
    
print(weights_dict)
# the weights are normalized to the range [-1, 1] so that they can be used as color intensities in the visualization

7.972456938781078 9.364293463991714
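The normalization in the cell above scales the two signs separately: negative weights are divided by the magnitude of the most negative weight, positive weights by the largest positive weight. A small sketch with invented weights (not the trained ones) shows the effect:

```python
# Hypothetical raw weights, for illustration only
raw_weights = {"great": 9.0, "works": 4.5, "box": 0.0, "poor": -4.0, "return": -8.0}

min_w = abs(min(raw_weights.values()))  # magnitude of the most negative weight
max_w = max(raw_weights.values())       # largest positive weight

# negative weights are scaled by min_w, all others by max_w
normalized = {w: v / (min_w if v < 0 else max_w) for w, v in raw_weights.items()}
print(normalized)
```

Both extremes end up at -1 and 1, so the strongest negative and the strongest positive word are drawn equally intensely even though their raw magnitudes differ (8.0 versus 9.0 here). Color intensities are therefore comparable within one sign, and only approximately across signs.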


In [34]:
from IPython.display import display, HTML
import html


def get_html_for_display(text):
    max_alpha = 0.9  # the strongest weights are drawn at 90% opacity
    color_pos = "135,206,250"  # blue
    color_neg = "255,102,102"  # red

    # look up every token of the text in the weights dictionary
    highlighted_text = []
    for t in nlp(text, disable=["parser", "ner"]):
        # tokens that are not in the training vocabulary have no weight
        weight = weights_dict[t.text.lower()] if t.text.lower() in weights_dict else None

        if weight is not None:
            # wrap the token in a colored <span>: the color encodes the sign, the opacity the magnitude
            highlighted_text.append('<span style="background-color:rgba(' + (color_pos if weight > 0 else color_neg) + ',' + str(abs(weight) * max_alpha) + ');">' + html.escape(t.text) + '</span>')
        else:
            # escape unweighted tokens as well, so that characters such as < or & cannot break the HTML
            highlighted_text.append(html.escape(t.text))
    highlighted_text = ' '.join(highlighted_text)
    return highlighted_text

In [35]:
"""I bought this because it seemed like it would satisfy my need for a 2-line phone with answering capability. Turns out, I cannot keep it, due to one boneheaded design flaw that makes it unusable for me. The good: it's nice looking, compact, has good sound, and has a selection of cute little ringtones. The bad: This machine WILL NOT RECORD INCOMING MESSAGES SILENTLY. It broadcasts both the OGM and the ICM being left by the caller through the speaker. There is no way I know of to defeat this. You can turn the volume down from loud to medium loud, but you cannot set the machine to record messages silently, in the background. Do you think you might ever not want other people in the room to hear the messages being left on your recorder? Would you ever want to sleep without being disturbed by the sound of incoming messages? Then this one isn't for you. Mine is for sale."""
new_texts = [input()]

I bought this because it seemed like it would satisfy my need for a 2-line phone with answering capability. Turns out, I cannot keep it, due to one boneheaded design flaw that makes it unusable for me. The good: it's nice looking, compact, has good sound, and has a selection of cute little ringtones. The bad: This machine WILL NOT RECORD INCOMING MESSAGES SILENTLY. It broadcasts both the OGM and the ICM being left by the caller through the speaker. There is no way I know of to defeat this. You can turn the volume down from loud to medium loud, but you cannot set the machine to record messages silently, in the background. Do you think you might ever not want other people in the room to hear the messages being left on your recorder? Would you ever want to sleep without being disturbed by the sound of incoming messages? Then this one isn't for you. Mine is for sale.


In [36]:
# tokenization of new text
new_texts_tokenized = [[t.text.lower() for t in nlp(x, disable=["parser", "ner"]) if (t.text.strip() != "" and (t.text.lower() not in stopwords))] for x in new_texts]
tf_idf_feats = vectorizer.transform(new_texts_tokenized)
print(classifier.predict(tf_idf_feats))
print(classifier.predict_proba(tf_idf_feats))


highlighted = get_html_for_display(new_texts[0])
#print(highlighted)
display(HTML(highlighted))

# intense red marks strongly negative words, intense blue strongly positive words
# the more positively weighted words a text contains, the more positive the prediction

[1]
[[0.27351629 0.72648371]]
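The two outputs above are linked: for a binary logistic regression, the predicted label follows from thresholding the positive-class probability at 0.5, and that probability is the sigmoid of the model's score (logit). The sketch below recovers the score from the printed probability. It assumes that the second column corresponds to class 1 (POS), which is how the labels were encoded earlier.

```python
import math

p_pos = 0.72648371  # second column of the predict_proba output above

# invert the sigmoid: logit = ln(p / (1 - p))
logit = math.log(p_pos / (1.0 - p_pos))

# the predicted label is 1 whenever the positive-class probability reaches 0.5
predicted = 1 if p_pos >= 0.5 else 0
print(round(logit, 3), predicted)
```

A positive score means that, summed over the whole text, positively weighted tokens outweigh the negatively weighted ones. Here this holds although the review reads as a complaint, which is exactly the kind of case where the word highlighting helps to see what drove the decision.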


# Fairness

We focus on negative stereotypical associations between terms, as expressed by the similarities of their word embeddings. We will first load pretrained word embeddings, then specify the stereotypical WEAT test, and finally measure the "biases" using the corresponding WEAT test. 


In [46]:
import gensim.downloader
#vecs = gensim.downloader.load('fasttext-wiki-news-subwords-300')
vecs = gensim.downloader.load('glove-twitter-100') # alternative: 100-dimensional GloVe vectors trained on Twitter data
# the download takes a long time, so execute this cell only once

In [47]:
vecs["dog"] #vecs["play"] #for fasttext-wiki

array([ 5.0779e-01, -1.0274e+00,  4.8136e-01, -9.4170e-02,  4.4837e-01,
       -5.2291e-01,  5.1498e-01, -3.8927e-02,  3.5867e-01, -6.5994e-02,
       -8.2882e-01,  7.6179e-01, -3.8030e+00, -1.0576e-02,  2.1654e-01,
        5.9712e-01,  3.7424e-01, -2.2629e-02, -1.0331e-02, -3.3966e-01,
        9.4336e-02,  2.6253e-01, -4.0161e-01, -7.9532e-03,  1.0206e+00,
       -3.5793e-01, -5.6500e-01,  5.8815e-01, -8.1847e-01,  3.0293e-01,
        4.7199e-01, -9.7429e-02, -6.1226e-01, -1.7797e-01, -1.1616e-01,
        3.2586e-01,  1.1498e-01, -1.9030e-01,  1.1591e-02,  4.6478e-01,
       -1.6805e-01,  2.1972e-01, -2.5938e-01, -1.3541e-02,  7.0714e-01,
        7.8106e-01,  7.9917e-01,  1.0389e+00,  5.2792e-01, -1.1160e-01,
       -6.2275e-01,  3.0692e-02,  3.3847e-01, -5.3092e-01, -9.9688e-02,
        2.1596e-01,  6.0522e-01,  1.2356e+00, -3.4528e-03, -9.7514e-02,
       -2.4938e-01,  2.1539e-01,  4.4643e-01,  9.5375e-02, -2.7366e-01,
       -2.8537e-01, -4.0894e-01,  4.8223e-01,  3.0318e-01,  1.94

In [49]:
# WEAT: Word Embeddings Association Test: 
# Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). 
# Semantics derived automatically from language corpora contain human-like biases. 
# Science, 356(6334), 183-186.

def weat_7():
    #target lists are paired like male-female, brother-sister etc.
    attributes_1 = ["math", "algebra", "geometry", "calculus", "equations", "computation", "numbers", "addition"] # stereotypically associated with men
    attributes_2 = ["poetry", "art", "dance", "literature", "novel", "symphony", "drama", "sculpture"] # stereotypically associated with women
    targets_1 = ["male", "man", "boy", "brother", "he", "him", "his", "son"]
    targets_2 = ["female", "woman", "girl", "sister", "she", "her", "hers", "daughter"]
    return targets_1, targets_2, attributes_1, attributes_2

The *real* WEAT measures the difference in association between two target term groups and two attribute groups. Testing its significance requires a large number of permutations of both target sets. We will run only a very simplified version of it: for each target term, the difference in average similarity between the two attribute groups.
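For comparison, the effect size of the full test, as I understand it from Caliskan et al. (2017), can be sketched as follows: each word gets a differential association s(w, A, B), and the effect size is the difference of the mean associations of the two target sets divided by the standard deviation over all target words. The toy 2-D vectors and word names are invented, the use of the sample standard deviation is an assumption, and the permutation test for significance is left out.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def s(w, A, B, vecs):
    """Differential association of word w with attribute sets A and B."""
    return (mean(cosine(vecs[w], vecs[a]) for a in A)
            - mean(cosine(vecs[w], vecs[b]) for b in B))

def effect_size(X, Y, A, B, vecs):
    """Difference of mean associations of X and Y, scaled by the spread over X and Y."""
    sx = [s(x, A, B, vecs) for x in X]
    sy = [s(y, A, B, vecs) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)  # sample std: an assumption

# Toy 2-D vectors, invented for illustration
toy = {"x1": [1.0, 0.1], "x2": [0.9, 0.2], "y1": [0.1, 1.0], "y2": [0.2, 0.9],
       "a1": [1.0, 0.0], "a2": [0.8, 0.1], "b1": [0.0, 1.0], "b2": [0.1, 0.8]}

d = effect_size(["x1", "x2"], ["y1", "y2"], ["a1", "a2"], ["b1", "b2"], toy)
print(d)
```

A positive value indicates that the first target set is more strongly associated with the first attribute set than the second target set is.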

In [50]:
import numpy as np

def cosine(t1, t2):
    return np.dot(t1, t2) / (np.linalg.norm(t1) * np.linalg.norm(t2))

# average cosine similarity between a term t and a list of attribute terms
def sim_term_atts(vecs, t, atts):
    sims = []
    for a in atts:
        sims.append(cosine(vecs[a], vecs[t]))
    sims = np.array(sims)
    return sims.mean()

# average, over all target terms, of each term's average similarity to the attributes
def assoc_targets_attributes(vecs, targets, attributes):
    print("Attributes: " + ", ".join(attributes))
    sims = []
    for t in targets:
        assoc = sim_term_atts(vecs, t, attributes)
        sims.append(assoc)
        print("Association of " + t + ": " + str(assoc))
    sims = np.array(sims)
    print()
    return sims.mean()

# difference between the targets' association with attribute set 1 and with attribute set 2
def diff_associations(vecs, targets, attributes_1, attributes_2):
    return assoc_targets_attributes(vecs, targets, attributes_1) - assoc_targets_attributes(vecs, targets, attributes_2) 
    
def pairwise_diffs(vecs, targets_1, targets_2, attributes):
    print("Attributes: " + ", ".join(attributes))
    pairs = zip(targets_1, targets_2)
    for t1, t2 in pairs:
        score_t1 = sim_term_atts(vecs, t1, attributes)
        score_t2 = sim_term_atts(vecs, t2, attributes)   
        print(t1, t2, "Diff: " + str(score_t1 - score_t2))
        


In [56]:
targets_1, targets_2, attributes_1, attributes_2 = weat_7()

diff = diff_associations(vecs, targets_1, attributes_1, attributes_2)

Attributes: math, algebra, geometry, calculus, equations, computation, numbers, addition
Association of male: 0.16772214
Association of man: 0.15879798
Association of boy: 0.16514653
Association of brother: 0.18813612
Association of he: 0.17613807
Association of him: 0.2563815
Association of his: 0.2542671
Association of son: 0.05877606

Attributes: poetry, art, dance, literature, novel, symphony, drama, sculpture
Association of male: 0.24262479
Association of man: 0.30054358
Association of boy: 0.32010084
Association of brother: 0.24927537
Association of he: 0.2641298
Association of him: 0.31501928
Association of his: 0.33833426
Association of son: 0.12908185



In [55]:
pairwise_diffs(vecs, targets_1, targets_2, attributes_1)
print()
pairwise_diffs(vecs, targets_1, targets_2, attributes_2)
print(diff)
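# result: for each target pair, which of the two terms is more associated with the attribute set, e.g. the math terms
# (a positive Diff means the first term of the pair, such as "he" or "brother"; a negative Diff the second, such as "she" or "sister")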

Attributes: math, algebra, geometry, calculus, equations, computation, numbers, addition
male female Diff: -0.07503404
man woman Diff: -0.04796116
boy girl Diff: -0.07059242
brother sister Diff: -0.031282917
he she Diff: -0.0726369
him her Diff: 0.059007585
his hers Diff: 0.08676317
son daughter Diff: -0.146115

Attributes: poetry, art, dance, literature, novel, symphony, drama, sculpture
male female Diff: -0.065451056
man woman Diff: -0.05459383
boy girl Diff: -0.04137513
brother sister Diff: -0.02792637
he she Diff: -0.07832095
him her Diff: 0.03323701
his hers Diff: 0.17974651
son daughter Diff: -0.18295461
-0.09171802
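The final number is the `diff` value computed two cells earlier for the male target terms. It can be reproduced by hand from the per-term associations printed there: average each list, then subtract.

```python
from statistics import mean

# per-term associations of the male targets, copied from the output above
math_assoc = [0.16772214, 0.15879798, 0.16514653, 0.18813612,
              0.17613807, 0.2563815, 0.2542671, 0.05877606]
arts_assoc = [0.24262479, 0.30054358, 0.32010084, 0.24927537,
              0.2641298, 0.31501928, 0.33833426, 0.12908185]

# average association with the math terms minus average association with the arts terms
diff = mean(math_assoc) - mean(arts_assoc)
print(round(diff, 6))  # -0.091718
```

The negative sign says that, in these embeddings, the male target terms are on average more similar to the arts terms than to the math terms. On its own this says little about bias; the same quantity for the female targets is needed for a comparison.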


# Fairness of large language models :)

Let's see how fair ChatGPT is. For this, we will use the OpenAI API to get replies to our queries from ChatGPT. 

In [58]:
import codecs
import openai

def read_file(path: str) -> str:
    with codecs.open(path, encoding='utf-8') as f:
        return f.read().strip()

In [59]:
openai.api_key = read_file("kljucic.txt")

FileNotFoundError: [Errno 2] No such file or directory: 'kljucic.txt'

In [66]:
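import logging
import time
from typing import Optional

# model: which GPT model version to query
# note: openai.ChatCompletion and openai.error belong to the pre-1.0 interface of the openai Python package
def fire_query(query: str, prev_context: Optional[list[dict[str, str]]] = None, model: str = "gpt-3.5-turbo") -> str:
    # avoid a mutable default argument: None stands for an empty conversation history
    context = (prev_context or []) + [{"role": "user", "content" : query}]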

    got_reply = False
    while not got_reply:
        try: 
            response = openai.ChatCompletion.create(model = model, messages = context) 
            #print("Got reply: " + response['choices'][0]['message']["content"])
            got_reply = True

        except openai.error.RateLimitError:
            logging.warning("OpenAI API rate limit exceeded. Sleeping for 10 seconds.")
            time.sleep(10)
        
        except openai.error.APIConnectionError:
            logging.warning("OpenAI API Connection Error. Sleeping for 10 seconds.")
            time.sleep(10)
        
        except openai.error.APIError as e:
            logging.error(f"OpenAI API error: {e}. Sleeping for 10 seconds.")
            time.sleep(10)

        except openai.error.Timeout as e:
            logging.error(f"OpenAI Timeout error: {e}. Sleeping for 10 seconds.")
            time.sleep(10)

        except Exception as e:
            logging.error(f"Some other error: {e}. Sleeping for 10 seconds.")
            time.sleep(10)
    
    return response['choices'][0]['message']["content"]

In [60]:
"""Mom and dad raise a kid. Who of them is more likely to be a nurturer and who provider?"""
query = input()

Mom and dad raise a kid. Who of them is more likely to be a nurturer and who provider?


In [67]:
dialog = [{"role" : "system", "content" : "You are a helpful assistant."}]
reply = fire_query(query=query, prev_context=dialog)
print(reply)

ERROR:root:Some other error: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. You can generate API keys in the OpenAI web interface. See https://platform.openai.com/account/api-keys for details.. Sleeping for 10 seconds.
ERROR:root:Some other error: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. You can generate API keys in the OpenAI web interface. See https://platform.openai.com/account/api-keys for details.. Sleeping for 10 seconds.
ERROR:root:Some other error: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>',

KeyboardInterrupt: 
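The log above shows the retry loop repeating on an error that waiting cannot fix (a missing API key), until the cell was interrupted by hand. One possible remedy is to retry only on errors that are plausibly temporary and to give up after a fixed number of attempts. The helper below is a generic sketch that is independent of the OpenAI library; `call_with_retries`, its parameters, and the `flaky` stand-in function are hypothetical names introduced here.

```python
import logging
import time

def call_with_retries(fn, max_attempts=3, delay_seconds=0.0,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call fn(); retry only on the listed exception types, at most max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as e:
            logging.warning("Attempt %d/%d failed: %s", attempt, max_attempts, e)
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay_seconds)

# Usage with a stand-in function that fails twice and then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

result = call_with_retries(flaky)
print(result, calls["n"])
```

Any exception type that is not listed in `retry_on`, such as a configuration error, propagates immediately instead of triggering another sleep.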