##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Model Remediation Case Study


In this notebook, we’ll train a ‘toxicity classifier’ to identify comments that could be considered toxic or harmful. Since this model could be used to automatically moderate forums on the internet, we will attempt to ensure that it works well on text containing references to sensitive identity terms in order to avoid unfairly censoring speech. We will first use [Fairness Indicators](https://www.tensorflow.org/tfx/fairness_indicators) to evaluate our baseline model’s performance on text containing references to religious groups. Then, we will try to improve performance on any underperforming slices by training with MinDiff. Finally, we’ll evaluate the new model’s performance.

Since our purpose is to demonstrate the MinDiff technique, we won’t perform in-depth fairness evaluation in this notebook. However, please note that thorough evaluation is an important precondition for applying MinDiff. For a more comprehensive tutorial showing how to use Fairness Indicators to evaluate for fairness in a similar use case, see [this module](https://developers.google.com/machine-learning/practica/fairness-indicators).


## Setup

We begin by installing Fairness Indicators and TensorFlow Model Remediation.


In [None]:
#@title Installs
!pip install --upgrade tensorflow-model-remediation
!pip install --upgrade fairness-indicators

Import all necessary components, including MinDiff and Fairness Indicators for evaluation.

In [None]:
#@title Imports
import copy
import os
import requests
import tempfile
import zipfile

import tensorflow_model_remediation.min_diff as md
from google.protobuf import text_format
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_model_analysis as tfma
import tensorflow_data_validation as tfdv
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators
from tensorflow_model_analysis.addons.fairness.view import widget_view

In [None]:
#@title Import Util
%load -r 16: min_diff_keras_util.py


We use a util to download the preprocessed data and prepare the labels to match the model’s output shape. The util also downloads the data as TFRecords to make later evaluation quicker. Alternatively, you may convert the Pandas DataFrame into TFRecords with any available utility conversion function.
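
If you prefer to do the conversion yourself, one option is to write one `tf.train.Example` per row with `tf.io.TFRecordWriter`. The sketch below is not the util's implementation. The function name `dataframe_to_tfrecord` is hypothetical, and the feature names and dtypes (a bytes text feature and a float label) are assumptions that would need to match whatever your evaluation configuration expects.

```python
import os
import tempfile

import pandas as pd
import tensorflow as tf


def dataframe_to_tfrecord(df, path, text_feature='comment_text', label='toxicity'):
  """Hypothetical helper: writes one tf.train.Example per DataFrame row.

  Assumes a string text column and a numeric label column.
  """
  with tf.io.TFRecordWriter(path) as writer:
    for text, y in zip(df[text_feature], df[label]):
      features = {
          text_feature: tf.train.Feature(
              bytes_list=tf.train.BytesList(value=[str(text).encode('utf-8')])),
          label: tf.train.Feature(
              float_list=tf.train.FloatList(value=[float(y)])),
      }
      example = tf.train.Example(features=tf.train.Features(feature=features))
      writer.write(example.SerializeToString())


# Usage on a toy DataFrame (hypothetical rows).
toy_df = pd.DataFrame({'comment_text': ['have a nice day', 'you are awful'],
                       'toxicity': [0.0, 1.0]})
toy_path = os.path.join(tempfile.mkdtemp(), 'toy.tfrecord')
dataframe_to_tfrecord(toy_df, toy_path)

# Read the records back to check the round trip.
feature_spec = {
    'comment_text': tf.io.FixedLenFeature([], tf.string),
    'toxicity': tf.io.FixedLenFeature([], tf.float32),
}
parsed = [tf.io.parse_single_example(record, feature_spec)
          for record in tf.data.TFRecordDataset(toy_path)]
```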


In [None]:
#@title Preprocess Data
# We use a helper utility to preprocess the data for convenience and speed.
data_train, data_validate, validate_tfrecord_file, labels_train, labels_validate = download_and_process_civil_comments_data()

Next, we define a few useful constants. We will train the model on the “comment_text” feature, with the label “toxicity.” Note that the batch size here is somewhat arbitrary; in a production setting, we would tune it for best performance.

In [None]:
#@title Constants
TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'
BATCH_SIZE = 128

Set random seeds for more reproducible results.

In [None]:
#@title Seeds
np.random.seed(1)
tf.random.set_seed(1)

## Define and train the baseline model

For efficiency, we use a pretrained model by default. It is a simple Keras sequential model with an initial embedding layer and convolution layers, outputting a toxicity prediction. If you prefer, set `use_pretrained_model` to `False` to train from scratch instead, using our util function `create_keras_sequential_model` to create the model.

In [None]:
#@title Train Model

use_pretrained_model = True #@param {type:"boolean"}

if use_pretrained_model:
  URL = 'https://storage.googleapis.com/civil_comments_model/baseline_model.zip'
  ZIPPATH = 'baseline_model.zip'
  DIRPATH = '/tmp/baseline_model'
  r = requests.get(URL, allow_redirects=True)
  r.raise_for_status()  # Fail early if the download did not succeed.
  with open(ZIPPATH, 'wb') as f:
    f.write(r.content)

  with zipfile.ZipFile(ZIPPATH, 'r') as zip_ref:
    zip_ref.extractall('/')

  baseline_model = tf.keras.models.load_model(DIRPATH, custom_objects={'KerasLayer' : hub.KerasLayer})

else:
  optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
  loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)

  baseline_model = create_keras_sequential_model()
  
  baseline_model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

  baseline_model.fit(x=data_train[TEXT_FEATURE],
                     y=labels_train, batch_size=BATCH_SIZE,
                     epochs=10)


We save the model so that we can evaluate it with Fairness Indicators.

In [None]:
#@title Save Model
base_dir = tempfile.mkdtemp(prefix='saved_models')
baseline_model_location = os.path.join(base_dir, 'model_export_baseline')
baseline_model.save(baseline_model_location, save_format='tf')


Next we run Fairness Indicators model analysis. In this notebook, we’re just going to perform sliced evaluation for comments referencing religious groups, since our intention is to showcase our remediation technique on those slices. In a production environment, we recommend taking a thoughtful approach to determining which categories to evaluate across. It’s very likely that you’ll need to perform evaluation across slices of data belonging to several sensitive categories. 

To compute model performance, the util function makes a few convenient choices for metrics, slices, and classifier thresholds.


In [None]:
#@title Run Model Analysis

# We use a helper utility to hide the evaluation logic for readability.
base_dir = tempfile.mkdtemp(prefix='eval') 
eval_subdir = 'eval_results_baseline'
eval_result = get_eval_results(baseline_model_location, base_dir,
                               eval_subdir, validate_tfrecord_file)

Let’s look at the evaluation results. Try selecting the metric “false_positive_rate” with threshold 0.450. The model does not perform as well for some minority religious groups as for others: it displays a higher false positive rate (FPR) for them at many thresholds. We’ve chosen to focus on this metric because a higher FPR means that comments referencing these identity groups are more likely than other comments to be incorrectly flagged as toxic. This could lead to unfair censorship of non-toxic conversations.


In [None]:
#@title Render Evaluation Results

widget_view.render_fairness_indicator(eval_result)
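
For intuition about the metric itself: the FPR is the fraction of ground-truth negatives that the model flags as positive at a given threshold, FP / (FP + TN). The following is a minimal pandas sketch of a per-slice FPR. It is not how TFMA computes its metrics (TFMA also reports confidence intervals), the function and column names are hypothetical, and it assumes each row belongs to exactly one slice, whereas the real `religion` feature can list several groups per comment.

```python
import pandas as pd


def sliced_fpr(df, slice_col, label_col, score_col, threshold):
  """Hypothetical helper: FPR per slice at a single threshold.

  A score at or above `threshold` counts as a positive prediction (an
  assumption about tie-breaking). Only ground-truth negatives (label 0)
  enter the FPR, so each slice's value is the share of its negatives flagged.
  """
  negatives = df[df[label_col] == 0]
  flagged = negatives[score_col] >= threshold
  return flagged.groupby(negatives[slice_col]).mean()


# Toy scores for two slices (hypothetical data, not Civil Comments results).
toy_scores = pd.DataFrame({
    'group': ['a', 'a', 'a', 'b', 'b', 'b'],
    'toxicity': [0, 0, 1, 0, 0, 1],
    'score': [0.9, 0.2, 0.8, 0.1, 0.3, 0.7],
})
fprs = sliced_fpr(toy_scores, 'group', 'toxicity', 'score', threshold=0.45)
gap = fprs.max() - fprs.min()  # largest FPR disparity between slices
```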

## Define and train the MinDiff model

Now that we’ve performed our evaluation and identified an issue, we want to improve (that is, lower) the FPR for underperforming groups, while ensuring that the overall FPR and the FPR for other groups don’t degrade beyond an acceptable level. We’ll attempt to do so using [MinDiff](https://arxiv.org/abs/1901.04562), a remediation technique that seeks to balance error rates across slices of your data by penalizing disparities in performance during training. When we apply MinDiff, model performance may degrade slightly on slices that otherwise display higher performance. As such, our goals with MinDiff will be:

- Improved performance for underperforming groups
- Limited degradation for the highest-performing groups and for overall performance
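
One way to write the resulting training objective, assuming the MinDiff penalty is simply added to the task loss with a weight $\lambda$ (the MinDiff strength set when training the model): $S$ and $N$ are the sensitive (minority) and nonsensitive (majority) sets of negatively labeled examples, $f_\theta$ is the model's score, and $k$ is a kernel. With an MMD loss, the penalty is the squared maximum mean discrepancy between the two score distributions, which is zero when they match. This is a simplified statement; details such as the kernel and its estimator may differ in the paper and the library.

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{task}}(\theta) + \lambda \, \mathrm{MMD}^2\big(\{f_\theta(x)\}_{x \in S},\ \{f_\theta(x)\}_{x \in N}\big)
$$

$$
\mathrm{MMD}^2(P, Q) = \mathbb{E}_{a, a' \sim P}\,[k(a, a')] + \mathbb{E}_{b, b' \sim Q}\,[k(b, b')] - 2\,\mathbb{E}_{a \sim P,\, b \sim Q}\,[k(a, b)]
$$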

It’s important to note that MinDiff requires sufficient examples from the underperforming groups during training. While the number of examples needed will depend on the model and use case, we’ve found 5,000 to be enough, and in many cases the actual number needed may be much lower. For more in-depth guidance on MinDiff’s requirements, please see the MinDiff guide.

In our case, the minority groups have 9,688, 3,906, and 759 negatively labeled examples. The other groups, which we didn’t include in either set, each have fewer than 350. The majority group has 22,044 negatively labeled examples.

Because this dataset is heavily class-imbalanced, we also recommend attempting to balance the class distribution by collecting more examples, even before applying MinDiff. In our case, there are very few comments referencing some religious groups, so we may attempt to collect or generate more. However, this may not be possible or practical for some use cases. We will skip this data collection step for now, since our intention is only to demonstrate how MinDiff can be applied.

### Preparing your data

To use MinDiff, we create two additional data splits:

- A split for the minority group. In our case, this includes comments that reference our underperforming identity terms. We don’t include some of the categories because they have too few examples, which leads to high uncertainty and wide confidence intervals.
- A split for the majority group, which includes comments that reference better-performing groups.

We select all negative examples for these groups, so that MinDiff can optimize on getting these examples right. It may seem counterintuitive to carve out sets of ground truth negative examples if we’re primarily concerned with disparities in false positive rate, but remember that a false positive prediction is a ground truth negative example that’s incorrectly classified as positive, which is the issue we’re trying to address.

In [None]:
#@title Create MinDiff DataFrames

# Create masks for the sensitive and nonsensitive groups
minority_mask = data_train.religion.apply(
    lambda x: any(religion in x for religion in
                  ('atheist', 'jewish', 'muslim')))
majority_mask = data_train.religion.apply(lambda x: x == "['christian']")

# Select nontoxic examples so that MinDiff can reduce the FPR for the
# sensitive group.
true_negative_mask = data_train[LABEL] == 0

data_train_main = copy.copy(data_train)
data_train_sensitive = data_train[minority_mask & true_negative_mask]
data_train_nonsensitive = data_train[majority_mask & true_negative_mask]
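
The masks rely on string matching against the `religion` column, which holds list-like strings such as `"['christian']"`. A small sketch on hypothetical toy rows shows two consequences of this logic: a comment that mentions both a minority group and Christianity lands only in the sensitive split (the majority mask requires an exact match), and toxic comments are excluded from both splits. The `toy_` names are illustrative and chosen so they don't overwrite the masks above.

```python
import pandas as pd

# Hypothetical toy rows in the same string format as the `religion` column.
toy = pd.DataFrame({
    'religion': ["['christian']", "['muslim']", "['christian', 'jewish']",
                 "[]", "['atheist']"],
    'toxicity': [0, 0, 0, 0, 1],
})

# Same mask logic as the cell above, applied to the toy rows.
toy_minority_mask = toy.religion.apply(
    lambda x: any(religion in x for religion in ('atheist', 'jewish', 'muslim')))
toy_majority_mask = toy.religion.apply(lambda x: x == "['christian']")
toy_true_negative_mask = toy['toxicity'] == 0

toy_sensitive = toy[toy_minority_mask & toy_true_negative_mask]
toy_nonsensitive = toy[toy_majority_mask & toy_true_negative_mask]
```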

We also need to convert our pandas DataFrames into TensorFlow Datasets for MinDiff input. Note that unlike passing DataFrame columns to Keras `fit` as separate `x` and `y` arguments, using Datasets means that we need to provide the model’s input features and labels together in one Dataset. Here we provide `comment_text` as the input feature and reshape the label to match the model's expected output.

We also batch the Datasets at this stage, since MinDiff expects batched Datasets. Tune the batch size the same way you would for the baseline model, weighing training speed and hardware considerations against model performance. Here we have chosen the same batch size for all three Datasets, but this is not a requirement.

In [None]:
#@title Create MinDiff Datasets

def df_to_dataset(df):
  """Pairs the text feature with the label, reshaped to (N, 1), and batches."""
  # Note: pop() removes the label column from the DataFrame in place.
  labels = df.pop(LABEL).values.reshape(-1, 1) * 1.0
  return tf.data.Dataset.from_tensor_slices(
      (df[TEXT_FEATURE].values, labels)).batch(BATCH_SIZE)

# Convert the pandas DataFrames to Datasets.
dataset_train_main = df_to_dataset(data_train_main)
dataset_train_sensitive = df_to_dataset(data_train_sensitive)
dataset_train_nonsensitive = df_to_dataset(data_train_nonsensitive)

### Training and evaluating the model

To train the model with MinDiff, wrap the original model in a `MinDiffModel` with a MinDiff loss and a MinDiff strength (the weight applied to the MinDiff loss). We use a strength of 1.5, which is both the default value in the cell below and the value used for the pretrained model. However, this parameter needs to be tuned for your use case, since the right value depends on your product requirements. You can experiment with different strengths to see how they affect the model: increasing the strength pushes the performance of the minority and majority groups closer together, but may come with more pronounced tradeoffs.

Then compile the model as usual, using the regular (non-MinDiff) loss, and call `fit` to train it.


In [None]:
#@title Train Model

use_pretrained_model = True #@param {type:"boolean"}
min_diff_strength = 1.5 #@param {type:"number"}

base_dir = tempfile.mkdtemp(prefix='saved_models')
min_diff_model_location = os.path.join(base_dir, 'model_export_min_diff')

if use_pretrained_model:
  URL = 'https://storage.googleapis.com/civil_comments_model/min_diff_model.zip'
  ZIPPATH = 'min_diff_model.zip'
  DIRPATH = '/tmp/min_diff_model'
  r = requests.get(URL, allow_redirects=True)
  r.raise_for_status()  # Fail early if the download did not succeed.
  with open(ZIPPATH, 'wb') as f:
    f.write(r.content)

  with zipfile.ZipFile(ZIPPATH, 'r') as zip_ref:
    zip_ref.extractall('/')

  min_diff_model = tf.keras.models.load_model(DIRPATH, custom_objects={'KerasLayer' : hub.KerasLayer})
  
  min_diff_model.save(min_diff_model_location, save_format='tf')

else:
  # Create the dataset that will be passed to the MinDiffModel during training.
  dataset = md.keras.utils.input_utils.pack_min_diff_data(
      dataset_train_main, dataset_train_sensitive, dataset_train_nonsensitive)

  # Create the original model.
  original_model = create_keras_sequential_model()
  
  # Wrap the original model in a MinDiffModel, passing in one of the min diff
  # losses and using a moderately high strength.
  min_diff_loss = md.losses.MMDLoss()
  min_diff_model = md.keras.MinDiffModel(original_model, min_diff_loss, min_diff_strength)

  # Compile the model normally after wrapping the original model. Note that
  # this means we use the baseline model's loss here.
  optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
  loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
  min_diff_model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

  min_diff_model.fit(dataset, epochs=10)

  min_diff_model.save_original_model(min_diff_model_location, save_format='tf')
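
To see what the MMD penalty measures, here is a minimal NumPy sketch of a biased squared-MMD estimate with a Gaussian kernel, computed between the scores a model assigns to sensitive and nonsensitive examples. It is an illustration, not `md.losses.MMDLoss`: the bandwidth of 1.0 is an arbitrary assumption, and the library's kernel defaults and batch handling may differ. The hypothetical `total_loss` mirrors the task-loss-plus-weighted-penalty structure, with `strength` playing the role of `min_diff_strength`.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
  """Pairwise Gaussian (RBF) kernel between two 1-D arrays of scores."""
  diff = a[:, None] - b[None, :]
  return np.exp(-(diff ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(sensitive_scores, nonsensitive_scores, bandwidth=1.0):
  """Biased estimate of the squared MMD between two sets of scores."""
  s = np.asarray(sensitive_scores, dtype=float)
  n = np.asarray(nonsensitive_scores, dtype=float)
  return (gaussian_kernel(s, s, bandwidth).mean()
          + gaussian_kernel(n, n, bandwidth).mean()
          - 2.0 * gaussian_kernel(s, n, bandwidth).mean())


def total_loss(task_loss, sensitive_scores, nonsensitive_scores, strength=1.5):
  """Hypothetical combined objective: task loss plus weighted MMD penalty."""
  return task_loss + strength * mmd_squared(sensitive_scores, nonsensitive_scores)


# Toy scores (hypothetical): a larger gap between groups gives a larger penalty.
toy_sensitive_scores = [0.70, 0.80, 0.75]
toy_far_scores = [0.10, 0.20, 0.15]
toy_near_scores = [0.68, 0.79, 0.77]
```

Identical score distributions give a penalty of zero, so the MinDiff term only pushes on the model when the two groups' scores differ.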

Next, we evaluate the new model.

In [None]:
#@title Run Model Analysis
min_diff_eval_subdir = 'eval_results_min_diff'
min_diff_eval_result = get_eval_results(min_diff_model_location, base_dir,
                                    min_diff_eval_subdir, validate_tfrecord_file,
                                    slice_selection='religion')


To compare the models fairly, we need to select a threshold at which the new model has an overall FPR similar to the baseline model’s. This threshold may be different from the one you selected for the baseline model. Try selecting the metric “false_positive_rate” with threshold 0.425. (Note that subgroups with very few examples have very wide confidence intervals and don’t have predictable results.)

In [None]:
#@title Render Evaluation Results

widget_view.render_fairness_indicator(min_diff_eval_result)
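
Picking a threshold whose overall FPR matches the baseline's can also be done programmatically if you have scores and labels for the validation set. A minimal NumPy sketch follows, assuming a score at or above the threshold counts as positive. The helper names, the candidate grid, and the toy data are hypothetical; in practice the target FPR would be the baseline model's overall FPR at its chosen threshold.

```python
import numpy as np


def false_positive_rate(labels, scores, threshold):
  """FP / (FP + TN): share of ground-truth negatives scored >= threshold."""
  negatives = np.asarray(labels) == 0
  flagged = np.asarray(scores) >= threshold
  return float((flagged & negatives).sum() / negatives.sum())


def match_threshold(labels, scores, target_fpr, candidates):
  """Picks the candidate threshold whose overall FPR is closest to target_fpr."""
  fprs = np.array([false_positive_rate(labels, scores, t) for t in candidates])
  best = int(np.argmin(np.abs(fprs - target_fpr)))
  return candidates[best], fprs[best]


# Toy example (hypothetical data): match a baseline overall FPR of 0.25.
toy_labels = [0, 0, 0, 0, 1, 1]
toy_new_scores = [0.10, 0.30, 0.44, 0.50, 0.60, 0.90]
matched_threshold, matched_fpr = match_threshold(
    toy_labels, toy_new_scores, target_fpr=0.25,
    candidates=[0.400, 0.425, 0.450, 0.475])
```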

Reviewing these results, we notice that our false positive rates seem to have improved. However, upon further inspection, we can see that the FPR for the Atheist group has not changed noticeably, and its wide confidence intervals make us less certain of any change. This is likely due to insufficient examples belonging to this group, and improving its performance may require collecting more examples.

Still, the FPR gap between our lowest-performing group and the majority group has narrowed from 0.024 to 0.008, and the FPR for two of the three target groups has improved dramatically. Given the improvements we’ve observed and the continued strong performance for the majority group, we’ve satisfied both of our goals. Depending on product requirements, further improvements may be necessary, but this approach has gotten our model one step closer to performing equitably for all users.
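
In relative terms, the reported gap shrinks by two thirds:

$$
\frac{0.024 - 0.008}{0.024} = \frac{0.016}{0.024} \approx 0.667
$$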

