techthiyanes/xai-nlp-notebooks


Content



This notebook contains an example of fine-tuning an Electra model on the GLUE SST-2 dataset. After fine-tuning, the Integrated Gradients interpretability method is applied to compute token attributions for each target class.

  • We will instantiate a pre-trained Electra model from the Transformers library.
  • The data is downloaded with the nlp library (the predecessor of today's datasets library). The input text is tokenized with the ElectraTokenizerFast tokenizer, backed by the HF tokenizers library.
  • Fine-tuning for sentiment analysis is handled by the Trainer class (see the sketch after this list).
  • After fine-tuning, the Integrated Gradients interpretability algorithm will assign importance scores to input tokens. We will use a PyTorch implementation from the Captum library (a second sketch follows below).
    • The algorithm requires a reference sample (a baseline), since attributions are computed from the change in the model's output as the input moves from the baseline to the actual sample.
    • The Integrated Gradients method satisfies the completeness property. We will look at the sum of attributions for a sample and show that the sum approximates (explains) the prediction's shift from the baseline value.
  • The final sections of the notebook contain a colour-coded visualization of attribution results made with captum.attr.visualization library.
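
A minimal sketch of this fine-tuning setup, with several assumptions: the checkpoint name, sequence length and training arguments are illustrative, and the current datasets library stands in for its predecessor nlp. It is not the notebook's exact code.

```python
# Minimal sketch, not the notebook's exact code: Electra fine-tuned on SST-2 with Trainer.
from datasets import load_dataset
from transformers import (ElectraForSequenceClassification, ElectraTokenizerFast,
                          Trainer, TrainingArguments)

model_name = "google/electra-small-discriminator"   # assumed checkpoint
tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
model = ElectraForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], padding="max_length", truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True).rename_column("label", "labels")
dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

args = TrainingArguments(output_dir="electra-sst2", num_train_epochs=1,
                         per_device_train_batch_size=32)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["validation"])
trainer.train()
```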

The notebook is based on the Hugging Face documentation, and the implementation of the Integrated Gradients attribution method is adapted from the Captum.ai tutorial Interpreting BERT Models (Part 1).
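
A sketch of the attribution step under a few assumptions: model and tokenizer come from the previous snippet, attributions target class 1 (positive), and the baseline keeps [CLS] and [SEP] while replacing every other token with [PAD]. The approach mirrors the Captum BERT tutorial; it is not the notebook's exact code.

```python
# Minimal sketch: LayerIntegratedGradients over the embedding layer, PAD-token baseline.
import torch
from captum.attr import LayerIntegratedGradients

model.eval()

def forward_func(input_ids, attention_mask):
    # Return the probability of class 1 so attributions explain that score.
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    return torch.softmax(logits, dim=-1)[:, 1]

text = "a gorgeous, witty, seductive movie"
enc = tokenizer(text, return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

# Baseline: keep [CLS]/[SEP], replace everything in between with [PAD].
baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
baseline_ids[0, 0] = tokenizer.cls_token_id
baseline_ids[0, -1] = tokenizer.sep_token_id

lig = LayerIntegratedGradients(forward_func, model.electra.embeddings)
attributions, delta = lig.attribute(
    inputs=input_ids,
    baselines=baseline_ids,
    additional_forward_args=(attention_mask,),
    return_convergence_delta=True,
)

# Completeness check: summed attributions ≈ score(sample) - score(baseline).
token_scores = attributions.sum(dim=-1).squeeze(0)
print(list(zip(tokenizer.convert_ids_to_tokens(input_ids[0]), token_scores.tolist())))
print("sum of attributions:", token_scores.sum().item())
print("score shift:", (forward_func(input_ids, attention_mask)
                       - forward_func(baseline_ids, attention_mask)).item())
```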

Visualization

The Captum visualization library shows in green the tokens that push the prediction towards the target class; tokens driving the score back towards the reference value are marked in red. As a result, words perceived as positive will appear in green if attribution is performed against class 1 (positive) but will be highlighted in red when attribution targets class 0 (negative).

Because importance scores are assigned to tokens, not words, some examples may show that attribution is highly dependent on tokenization.
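
A sketch of how such a colour-coded record can be rendered, assuming token_scores, input_ids, attention_mask, delta, tokenizer and forward_func from the previous snippet; it is meant to be run in a notebook cell.

```python
# Minimal sketch: render one colour-coded attribution record in a notebook.
from captum.attr import visualization as viz

scores = token_scores / token_scores.norm()           # normalise for display
pred_prob = forward_func(input_ids, attention_mask).item()

record = viz.VisualizationDataRecord(
    scores,                                           # per-token attributions
    pred_prob,                                        # predicted probability of class 1
    int(pred_prob > 0.5),                             # predicted class
    1,                                                # true class (assumed)
    1,                                                # class the attributions target
    token_scores.sum(),                               # total attribution
    tokenizer.convert_ids_to_tokens(input_ids[0]),    # tokens to display
    delta,                                            # convergence delta from IG
)
viz.visualize_text([record])
```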

Attributions for a correctly classified positive example


Attributions for a correctly classified negative example


Attributions for a negative sample misclassified as positive





In a world with an ever-growing amount of data, the task of automatically creating coherent and fluent summaries is gaining importance. Coming up with a shorter, concise version of a document can help derive value from large volumes of text.

This notebook contains an example of fine-tuning Bart for generating summaries of article sections from the WikiLingua dataset. WikiLingua is a multilingual set of articles. We will run the same code for two Bart checkpoints, including a non-English model from the Hugging Face Model Hub. We will be using:




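A minimal, hedged sketch of the generation step only; the checkpoint name is an assumption, and in the notebook the summaries come from Bart models fine-tuned on WikiLingua first.

```python
# Minimal sketch of generating a summary with Bart (checkpoint name is an assumption;
# in the notebook, generation follows fine-tuning on WikiLingua).
from transformers import BartForConditionalGeneration, BartTokenizerFast

checkpoint = "facebook/bart-base"    # assumed; replace with the fine-tuned checkpoint
tokenizer = BartTokenizerFast.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)

article_section = "..."              # an article section from WikiLingua
inputs = tokenizer(article_section, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=64, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
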
This notebook contains an example of two feature attribution methods applied to a PyTorch model predicting fuel efficiency for the Auto MPG Data Set.

We will use the following methods:

  • Integrated Gradients
  • SHAP

Attribution methods are applied per sample. As a result, each feature is assigned a value reflecting its contribution to the model's output or, more precisely, to the difference between the model's output for the sample and the expected value.

Both methods used in this notebook require setting a baseline, i.e. a vector of values used, for each feature, in place of a missing value. The baseline vector serves as a set of reference values that can be thought of as neutral and that represent a missing value whenever a method requires one. We will calculate the expected value as the model's output for a selected baseline.
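
A minimal sketch of the idea, with a stand-in model and random data in place of the notebook's Auto MPG preprocessing (shapes and architecture are assumptions):

```python
# Hypothetical stand-in for the notebook's regression model and preprocessed features.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(32, 7)                        # stand-in for normalised Auto MPG features

model = nn.Sequential(                        # assumed architecture
    nn.Linear(7, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

zero_baseline = torch.zeros(1, 7)             # "missing" represented by zeros
mean_baseline = X.mean(dim=0, keepdim=True)   # "missing" represented by feature means

with torch.no_grad():
    print("expected value (zero baseline):", model(zero_baseline).item())
    print("expected value (mean baseline):", model(mean_baseline).item())
```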

All attributions together account for the difference between the model's prediction for a sample and the expected value of the model's output for a selected baseline.

In the examples below we will consider various baselines and see how they influence the importance assigned to features. We will see that, for each sample, the attributions sum up to the difference between the model's output for the sample and the expected value (the model's output for the baseline used to compute the attributions).

Attributions explain prediction


Attributions sum up to the difference between the model's output and the expected value (model's output for the baseline vector).
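
A sketch of that check, continuing the stand-in model and baselines above. Captum's IntegratedGradients is used directly, and GradientShap stands in for the SHAP method; the notebook's exact implementations may differ.

```python
# Completeness check: attributions sum (approximately) to prediction - expected value.
from captum.attr import GradientShap, IntegratedGradients

def f(x):
    return model(x).squeeze(-1)               # scalar output per sample

sample = X[:1]                                # one sample to explain
ig = IntegratedGradients(f)
attr_ig = ig.attribute(sample, baselines=mean_baseline)

gs = GradientShap(f)                          # SHAP-style attributions
attr_gs = gs.attribute(sample, baselines=X)   # baselines given as a distribution of samples

with torch.no_grad():
    shift = (f(sample) - f(mean_baseline)).item()

print("IG attributions sum:   ", attr_ig.sum().item())
print("prediction - expected: ", shift)
```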

Features and attributions


The diagrams show how high and low values of features are distributed across the range of attributions assigned by IG and SHAP for various baselines. For some features, high values of the feature (in red) correlate with high attributions (x-axis); for others, they gather in the lower range, or there is no clear correlation.
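
One way to reproduce this kind of diagram for a single feature, continuing the stand-in model, data and the ig object from the snippets above; matplotlib is used only for illustration and the feature index is a placeholder.

```python
# Where do high/low values of one feature land on the attribution axis?
import matplotlib.pyplot as plt

attr_all = ig.attribute(X, baselines=mean_baseline)   # IG attributions for every sample
i = 0                                                  # placeholder feature index

plt.scatter(attr_all[:, i].detach(), X[:, i], c=X[:, i], cmap="coolwarm")
plt.xlabel("attribution")
plt.ylabel("feature value")
plt.colorbar(label="feature value (blue = low, red = high)")
plt.show()
```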

Impact of features


Accumulated feature importance varies more between baselines than it does between attribution methods. One intuitive explanation is that, since both methods use a baseline to stand in for a missing value, features that have a close-to-monotonic relationship to the target will be more consistently attributed a higher absolute impact when replaced by a zero.
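
A sketch of one way such accumulated importance can be computed: the mean absolute attribution per feature, continuing attr_all from the previous snippet (feature names are placeholders).

```python
# Accumulate importance as the mean absolute attribution per feature.
mean_abs_importance = attr_all.abs().mean(dim=0)
for i, score in enumerate(mean_abs_importance.tolist()):
    print(f"feature_{i}: {score:.4f}")
```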
