# Fairness Indicators on TF-Hub Text Embeddings

In this colab, you will learn how to use [Fairness Indicators](https://github.com/tensorflow/fairness-indicators) to evaluate embeddings from [TF Hub](https://www.tensorflow.org/hub). Fairness Indicators is a suite of tools that facilitates evaluation and visualization of fairness metrics on machine learning models. Fairness Indicators is built on top of [TensorFlow Model Analysis](https://www.tensorflow.org/tfx/guide/tfma), TensorFlow's official model evaluation library.


# Imports

In [0]:
!pip install fairness-indicators

In [0]:
%tensorflow_version 2.x
import os
import tempfile
import apache_beam as beam
from datetime import datetime
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_model_analysis as tfma
from tensorflow_model_analysis.addons.fairness.view import widget_view
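# Imported for its side effect: it registers the `fairness_indicators` metric
# that is accessed below as `tfma.post_export_metrics.fairness_indicators`.
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators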
from fairness_indicators.examples import util

# Defining Constants

In [0]:
BASE_DIR = tempfile.gettempdir()

# The input and output features of the classifier
TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'

# Data



In this exercise, we'll work with the [Civil Comments dataset](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), approximately 2 million comments released by the [Civil Comments platform](https://github.com/reaktivstudios/civil-comments) in 2017 for ongoing research. This effort was sponsored by Jigsaw, which has hosted competitions on Kaggle to help classify toxic comments and minimize unintended model bias.

Each comment in the dataset has a toxicity label: 1 if the comment is toxic and 0 if it is non-toxic. A subset of the comments is also labeled with identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity.

You can either download the original dataset and process it in the colab, which may take several minutes, or download the preprocessed data.

In [0]:
download_original_data = False

if download_original_data:
  train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')

  # The identity terms are grouped by category using a threshold of 0.5.
  # Only the identity term columns, the text column, and the label column
  # are kept after processing.
  train_tf_file = util.convert_comments_data(train_tf_file)
  validate_tf_file = util.convert_comments_data(validate_tf_file)

else:
  train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',
                                          'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')
  validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',
                                             'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')
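
As a rough illustration of the grouping step described in the code comment above, the sketch below collapses per-term identity scores into category columns using a 0.5 threshold. The term-to-category map, the sample scores, the `group_identities` helper, and the inclusive `>=` comparison are all assumptions made for illustration; `util.convert_comments_data` may differ in its details.

```python
# Hypothetical subset of identity terms, keyed by category (illustration only).
CATEGORY_TERMS = {
    'gender': ['male', 'female'],
    'religion': ['christian', 'muslim'],
}
THRESHOLD = 0.5


def group_identities(term_scores, threshold=THRESHOLD):
  """Maps {term: score} to {category: [terms scoring at or above threshold]}."""
  grouped = {}
  for category, terms in CATEGORY_TERMS.items():
    grouped[category] = [
        term for term in terms if term_scores.get(term, 0.0) >= threshold
    ]
  return grouped


# A made-up comment annotated with per-term identity scores.
scores = {'male': 0.8, 'female': 0.2, 'christian': 0.5}
grouped = group_identities(scores)
print(grouped)
```

Representing each category as a list of matching terms lines up with the `VarLenFeature` slicing features used later, since a comment can mention zero, one, or several identities in the same category.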

## Identity Terms

You can select the subset of identity groups you are interested in by removing the others from the list below. By default, we will look at all identity terms.

In [0]:
IDENTITY_TERMS = ['gender', 'sexual_orientation', 'race', 'religion', 'disability']

# Creating a TensorFlow Model Analysis Pipeline

The Fairness Indicators library operates on [TensorFlow Model Analysis (TFMA) models](https://www.tensorflow.org/tfx/model_analysis/get_started). TFMA models wrap [TensorFlow models](https://www.tensorflow.org/guide/estimator) with additional functionality to evaluate and visualize their results. The actual evaluation occurs inside of an [Apache Beam pipeline](https://beam.apache.org/documentation/programming-guide/).

So we need to...
1. Build a TensorFlow model.
2. Build a TFMA model on top of the TensorFlow model.
3. Run the model analysis in a Beam pipeline.

## 1) Build a TensorFlow Model

### Define an Input Function

TensorFlow parses features from data using [`FixedLenFeature`](https://www.tensorflow.org/api_docs/python/tf/io/FixedLenFeature) and [`VarLenFeature`](https://www.tensorflow.org/api_docs/python/tf/io/VarLenFeature). So to allow TensorFlow to parse our data, we will need to map out our input feature, output feature, and any slicing features that we will want to analyze via Fairness Indicators.

In [0]:
FEATURE_MAP = {
    # input and output features
    LABEL: tf.io.FixedLenFeature([], tf.float32),
    TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),

    # slicing features
    'sexual_orientation': tf.io.VarLenFeature(tf.string),
    'gender': tf.io.VarLenFeature(tf.string),
    'religion': tf.io.VarLenFeature(tf.string),
    'race': tf.io.VarLenFeature(tf.string),
    'disability': tf.io.VarLenFeature(tf.string)
}

Now that we have defined our features and their types, we can create an input function for our model.

In [0]:
def input_fn(tf_file):
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example,
            parsed_example[LABEL])
  dataset = tf.data.TFRecordDataset(
      filenames=[tf_file]).map(parse_function).batch(512)
  return dataset
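
To see what the weight column does, the sketch below applies the same `label + 0.1` rule to a few made-up labels in plain Python. The sample labels and the `example_weight` helper are hypothetical; only the rule itself comes from `parse_function` above.

```python
def example_weight(label):
  """Mirrors the rule in parse_function: weight = label + 0.1."""
  return label + 0.1


labels = [0.0, 0.0, 0.0, 1.0]  # made-up batch: three non-toxic, one toxic
weights = [example_weight(label) for label in labels]
print(weights)

# Ratio between the weight of a toxic example and a non-toxic one.
ratio = example_weight(1.0) / example_weight(0.0)
print(round(ratio, 6))
```

Under this rule a toxic example contributes roughly eleven times as much to the loss as a non-toxic one, which offsets the fact that toxic comments are the minority class.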

### Train a Classifier

For each text embedding, we will train a **[DNN Classifier](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier)**.

**TF Hub** allows us to insert text embeddings as features to our model via **[`text_embedding_column`](https://www.tensorflow.org/hub/api_docs/python/hub/text_embedding_column)**. The function's signature is **`text_embedding_column(key, module_spec)`**, where...

* *`key`* is the name of the text feature in the input data (ex: `"comment_text"`)
* *`module_spec`* is the URL or path of a text embedding module (ex: `"https://tfhub.dev/google/nnlm-en-dim128/1"`)

Because each text embedding column is memory-intensive, the Colaboratory environment may crash if all embeddings are loaded at once. To avoid this, we encapsulate the embedding columns inside a pipeline and wait to get the pipeline's results before loading the next embedding.

In [0]:
def train_classifier(embedding):
  # Wrap the TF Hub module as a feature column over the raw comment text.
  embedded_text_feature_column = hub.text_embedding_column(
      key=TEXT_FEATURE,
      module_spec=embedding)
  # Use a timestamped directory so each run writes its own checkpoints.
  model_dir = os.path.join(
      BASE_DIR, 'train', datetime.now().strftime('%Y%m%d-%H%M%S'))
  classifier = tf.estimator.DNNClassifier(
      hidden_units=[500, 100],
      weight_column='weight',
      feature_columns=[embedded_text_feature_column],
      n_classes=2,
      optimizer=tf.optimizers.Adagrad(learning_rate=0.003),
      loss_reduction=tf.losses.Reduction.SUM,
      model_dir=model_dir)
  classifier.train(input_fn=lambda: input_fn(train_tf_file), steps=1000)
  return classifier

## 2) Build a TFMA model

TFMA represents datasets as [`tf.Examples`](https://www.tensorflow.org/tutorials/load_data/tfrecord#tfexample), which it parses with [`EvalInputReceivers`](https://github.com/tensorflow/model-analysis/blob/master/tensorflow_model_analysis/eval_saved_model/export.py#L42). Refer to [Getting Started with TensorFlow Model Analysis](https://www.tensorflow.org/tfx/model_analysis/get_started#modify_an_existing_model) for more info on creating `EvalInputReceivers`.

In [0]:
def eval_input_receiver_fn():
  """Create a tfma.export.EvalInputReceiver to parse input features."""
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')
  receiver_tensors = {'examples': serialized_tf_example}
  features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])
  return tfma.export.EvalInputReceiver(
    features=features,
    receiver_tensors=receiver_tensors,
    labels=features[LABEL])

TFMA represents models with its **[`EvalSharedModel`](https://github.com/tensorflow/model-analysis/blob/master/tensorflow_model_analysis/types.py#L172)** class, which accepts a list of metrics to evaluate and visualize. TFMA represents metrics as callbacks which are computed after the model is exported - hence, the name **[`post_export_metrics`](https://github.com/tensorflow/model-analysis/blob/master/tensorflow_model_analysis/post_export_metrics/post_export_metrics.py)**. The metric provided by the Fairness Indicators library is **`post_export_metrics.fairness_indicators`**.

`EvalSharedModel` is actually a thin wrapper around TFMA's **[`EvalSavedModel`](https://github.com/tensorflow/model-analysis/blob/master/tensorflow_model_analysis/eval_saved_model/load.py#L54)** class. To create an `EvalSavedModel`, we need to pass in a TensorFlow model and an `EvalInputReceiver`.

In [0]:
def create_tfma_model(classifier, eval_input_receiver_fn, metric_callbacks):

  # create EvalSavedModel
  eval_saved_model_path = tfma.export.export_eval_savedmodel(
      estimator=classifier,
      export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),
      eval_input_receiver_fn=eval_input_receiver_fn)

  # create EvalSharedModel
  return tfma.default_eval_shared_model(
      eval_saved_model_path=eval_saved_model_path,
      add_metrics_callbacks=metric_callbacks)

## 3) Get Evaluation Results in Apache Beam

Our [Model Evaluation pipeline](https://www.tensorflow.org/tfx/model_analysis/get_started) will have two steps. The first step is to read in the data in a TF-compatible format - we can use [`beam.io.ReadFromTFRecord`](https://beam.apache.org/releases/pydoc/2.15.0/apache_beam.io.tfrecordio.html#apache_beam.io.tfrecordio.ReadFromTFRecord) for that.

The second step is to evaluate the TFMA results. We use TFMA's [`ExtractEvaluateAndWriteResults`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/ExtractEvaluateAndWriteResults) API, a [`PTransform`](https://beam.apache.org/documentation/programming-guide/#transforms) that takes in an `EvalSharedModel`, computes metrics for the slices specified in `slice_spec`, and writes them to an `output_path`.

[`slice_spec`](https://www.tensorflow.org/tfx/tutorials/model_analysis/tfma_basic#slicing_and_dicing) is how TFMA decides how to group the data. In this case study, the slices refer to different identity groups. We'll show you how to create a `slice_spec` in the next section.
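
Conceptually, slicing by a column assigns each example to one slice per value of that column, alongside an overall slice that contains every example. The pure-Python sketch below mimics that grouping on made-up records. It does not use TFMA, and the `slice_keys` helper, the `'Overall'` key name, and the sample data are hypothetical.

```python
from collections import defaultdict


def slice_keys(example, columns):
  """Returns the slice keys an example falls into for the given columns."""
  keys = ['Overall']  # every example belongs to the overall slice
  for column in columns:
    for value in example.get(column, []):
      keys.append('{}:{}'.format(column, value))
  return keys


# Made-up examples; identity features are lists because a comment can
# mention zero, one, or several identities.
examples = [
    {'toxicity': 1.0, 'gender': ['female'], 'religion': []},
    {'toxicity': 0.0, 'gender': ['male', 'female'], 'religion': ['muslim']},
    {'toxicity': 0.0, 'gender': [], 'religion': []},
]

# Map each slice key to the indices of the examples it contains.
slices = defaultdict(list)
for index, example in enumerate(examples):
  for key in slice_keys(example, columns=['gender', 'religion']):
    slices[key].append(index)

for key in sorted(slices):
  print(key, slices[key])
```

Metrics are then computed separately over the examples in each slice, so a comment that mentions several identities contributes to several slices.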

Check the [Get Started with TensorFlow Model Analysis](https://www.tensorflow.org/tfx/model_analysis/get_started) tutorial for more information.

In [0]:
def get_eval_result(input_file, eval_shared_model,
                    slice_spec, eval_result_path):
  with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | 'ReadFromTFRecord' >> beam.io.ReadFromTFRecord(
            file_pattern=input_file)
        | 'ExtractEvaluateAndWriteResults' >>
        tfma.ExtractEvaluateAndWriteResults(
            eval_shared_model=eval_shared_model,
            slice_spec=slice_spec,
            compute_confidence_intervals=False,
            output_path=eval_result_path)
    )
  return tfma.load_eval_result(output_path=eval_result_path)

# Putting it all Together

In [0]:
def embedding_fairness_result(embedding):

  # First, we use our train_classifier() function to train a basic classifier
  # with our chosen embedding.
  print("Training classifier for " + embedding)
  classifier = train_classifier(embedding)

  # Next, we measure the accuracy of our classifier on our validation set.
  validation_result = classifier.evaluate(input_fn=lambda: input_fn(validate_tf_file))
  print('Validation set accuracy for {}: {accuracy}'.format(embedding, **validation_result))

  # We then create a fairness_indicators callback to use in our TFMA model.
  # `labels_key` is the target feature ("toxicity" in our case).
  # `thresholds` are the prediction-score cutoffs at which fairness metrics
  # are computed.
  fairness_indicator_callback = tfma.post_export_metrics.fairness_indicators(
                                    thresholds=[0.1, 0.3, 0.5, 0.7, 0.9],
                                    labels_key=LABEL)

  # We then use our create_tfma_model() function, which converts our classifier
  # to a TFMA EvalSharedModel that outputs Fairness Indicators.
  eval_shared_model = create_tfma_model(classifier, eval_input_receiver_fn,
                                        [fairness_indicator_callback])

  # We select the slices we want to compute Fairness results for.
  # In this case, we use the same identity terms that you selected at the
  # beginning of the colab.
  slice_spec = [tfma.slicer.SingleSliceSpec()]
  for identity in IDENTITY_TERMS:
    slice_spec.append(tfma.slicer.SingleSliceSpec(columns=[identity]))

  # We also need to create a unique path to store our results for this embedding.
  embedding_name = embedding.split('/')[-2]
  eval_result_path = os.path.join(BASE_DIR, 'eval_result', embedding_name)

  # Finally, we use our get_eval_result() function to compute and return the
  # Fairness Indicators results!
  eval_result = get_eval_result(validate_tf_file, eval_shared_model, slice_spec, eval_result_path)
  return eval_result
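
The result directory in `embedding_fairness_result` is named after the second-to-last segment of the module URL. As a quick check, the sketch below applies the same `split('/')[-2]` expression to the three module URLs used in this colab; the `urls` and `names` variables are introduced here only for illustration.

```python
urls = [
    'https://tfhub.dev/google/random-nnlm-en-dim128/1',
    'https://tfhub.dev/google/nnlm-en-dim128/1',
    'https://tfhub.dev/google/universal-sentence-encoder/2',
]

# Same expression as in embedding_fairness_result: drop the version segment
# and keep the module name.
names = [url.split('/')[-2] for url in urls]
print(names)
```

Note that a URL with a trailing slash would yield the version number instead, because the final segment after the split would then be an empty string.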

# Run TFMA & Fairness Indicators

## Fairness Indicators Metrics

See the [Fairness Indicators repository](https://github.com/tensorflow/fairness-indicators) for more information on Fairness Indicators. Some of the available metrics are listed below.

* [Negative Rate, False Negative Rate (FNR), and True Negative Rate (TNR)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#False_positive_and_false_negative_rates)
* [Positive Rate, False Positive Rate (FPR), and True Positive Rate (TPR)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#False_positive_and_false_negative_rates)
* [Accuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Accuracy)
* [Precision and Recall](https://en.wikipedia.org/wiki/Precision_and_recall)
* [Precision-Recall AUC](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/AUC)
* [ROC AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve)
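
Each of the rate metrics above is derived from confusion-matrix counts at a given decision threshold. The sketch below computes FPR and FNR by hand for made-up scores and labels, using the same threshold values passed to `fairness_indicators` in this colab. The `rates_at_threshold` helper and the data are hypothetical, and treating a score strictly greater than the threshold as a positive prediction is an assumption.

```python
def rates_at_threshold(labels, scores, threshold):
  """Returns (false positive rate, false negative rate) at one threshold."""
  tp = fp = tn = fn = 0
  for label, score in zip(labels, scores):
    predicted_toxic = score > threshold
    if predicted_toxic and label == 1:
      tp += 1
    elif predicted_toxic and label == 0:
      fp += 1
    elif not predicted_toxic and label == 0:
      tn += 1
    else:
      fn += 1
  # Guard against slices that contain no negatives or no positives.
  fpr = fp / (fp + tn) if (fp + tn) else float('nan')
  fnr = fn / (fn + tp) if (fn + tp) else float('nan')
  return fpr, fnr


labels = [1, 1, 0, 0, 0, 1]                  # made-up ground truth
scores = [0.95, 0.4, 0.6, 0.2, 0.05, 0.8]    # made-up model scores

for threshold in [0.1, 0.3, 0.5, 0.7, 0.9]:
  fpr, fnr = rates_at_threshold(labels, scores, threshold)
  print('threshold={:.1f}  FPR={:.2f}  FNR={:.2f}'.format(threshold, fpr, fnr))
```

Lowering the threshold trades false negatives for false positives, which is why the metrics are reported at several thresholds rather than one.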

## Text Embeddings

**[TF-Hub](https://www.tensorflow.org/hub)** provides several **text embeddings**. These embeddings will serve as the feature column for our different models. For this Colab, we use the following embeddings:

* [**random-nnlm-en-dim128**](https://tfhub.dev/google/random-nnlm-en-dim128/1): random text embeddings, which serve as a convenient baseline.
* [**nnlm-en-dim128**](https://tfhub.dev/google/nnlm-en-dim128/1): a text embedding based on [A Neural Probabilistic Language Model](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf).
* [**universal-sentence-encoder**](https://tfhub.dev/google/universal-sentence-encoder/2): a text embedding based on [Universal Sentence Encoder](https://arxiv.org/pdf/1803.11175.pdf).

## Fairness Indicator Results

For each of the above embeddings, we will compute fairness indicators with our `embedding_fairness_result` pipeline, and then render the results in the Fairness Indicator UI widget with `widget_view.render_fairness_indicator`.

Note that the `widget_view.render_fairness_indicator` cells may need to be run twice for the visualization to be displayed.

#### Random NNLM

In [0]:
eval_result_random_nnlm = embedding_fairness_result('https://tfhub.dev/google/random-nnlm-en-dim128/1')

In [0]:
widget_view.render_fairness_indicator(eval_result_random_nnlm)

#### NNLM

In [0]:
eval_result_nnlm = embedding_fairness_result('https://tfhub.dev/google/nnlm-en-dim128/1')

In [0]:
widget_view.render_fairness_indicator(eval_result_nnlm)

#### Universal Sentence Encoder

In [0]:
eval_result_use = embedding_fairness_result('https://tfhub.dev/google/universal-sentence-encoder/2')

In [0]:
widget_view.render_fairness_indicator(eval_result_use)

## Exercises
1. Pick an identity category, such as religion or sexual orientation, and look at False Positive Rate for the Universal Sentence Encoder. How do different slices compare to each other? How do they compare to the Overall baseline?
2. Now pick a different identity category. Compare the results of this category with the previous one. Does the model weigh one category as more "toxic" than the other? Does this change with the embedding used?
3. Does the model generally tend to overestimate or underestimate the number of toxic comments?
4. Look at the graphs for different fairness metrics. Which metrics seem most informative? Which embeddings perform best and worst for that metric?

