# Fairness Indicators on TF-Hub Text Embeddings

In this colab, you will learn how to use Fairness Indicators. Fairness Indicators is a suite of tools built on top of [TensorFlow Model Analysis](https://www.tensorflow.org/tfx/model_analysis/get_started) that facilitates the evaluation and visualization of fairness metrics on models.


## Read Me First

This colab is presently designed for **Python 2**.

In [0]:
%tensorflow_version 1.x
!pip install fairness-indicators

In [0]:
import sys
print(sys.version)

# Imports

In [0]:
import os
import tempfile
import apache_beam as beam
from datetime import datetime
import numpy as np
import pandas as pd
import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_model_analysis as tfma
from tensorflow_model_analysis.addons.fairness.view import widget_view
from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators  # must include to generate the post_export_metrics callback.

# Defining Constants

In [0]:
BASE_DIR = tempfile.gettempdir()

# The input and output features of the classifier
TEXT_FEATURE = 'comment_text'
LABEL = 'toxicity'

# Data

In this exercise, you'll work with the [Civil Comments dataset](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), approximately 2 million public comments released by the [Civil Comments platform](https://github.com/reaktivstudios/civil-comments) in 2017 for ongoing research. This effort was sponsored by Jigsaw, which has hosted competitions on Kaggle to help classify toxic comments and minimize unintended model bias.

Each individual text comment in the dataset has a toxicity label, with the label being 1 if the comment is toxic and 0 if the comment is non-toxic. Within the data, a subset of comments are labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity.

In [0]:
train_tf_file = tf.keras.utils.get_file('train.tf', 'https://storage.googleapis.com/civil_comments_dataset/train.tfrecord')
validate_tf_file = tf.keras.utils.get_file('validate.tf', 'https://storage.googleapis.com/civil_comments_dataset/validate.tfrecord')

## Identity Terms

You can select the subset of identity groups you are interested in by removing the others from the list below. By default, we will look at all identity terms.

In [0]:
IDENTITY_TERMS = ['gender', 'sexual_orientation', 'race', 'religion', 'disability']

# Creating a Pipeline to Compare Text Embeddings

Let's compare the performance of models built with different text embeddings. First, we need to create a model.

## Input Function

TensorFlow parses features from data using [`FixedLenFeature`](https://www.tensorflow.org/api_docs/python/tf/io/FixedLenFeature) and [`VarLenFeature`](https://www.tensorflow.org/api_docs/python/tf/io/VarLenFeature). So to allow TensorFlow to parse our data, we will need to map out our input feature, output feature, and any slicing features that we will want to analyze via Fairness Indicators.

In [0]:
FEATURE_MAP = {
    # input and output features
    LABEL: tf.FixedLenFeature([], tf.float32),
    TEXT_FEATURE: tf.FixedLenFeature([], tf.string),

    # slicing features
    'sexual_orientation': tf.VarLenFeature(tf.string),
    'gender': tf.VarLenFeature(tf.string),
    'religion': tf.VarLenFeature(tf.string),
    'race': tf.VarLenFeature(tf.string),
    'disability': tf.VarLenFeature(tf.string)
}

Now that we have defined our features and their types, we can create an input function for our model.

In [0]:
def input_fn(tf_file):
  def parse_function(serialized):
    parsed_example = tf.io.parse_single_example(
        serialized=serialized, features=FEATURE_MAP)
    # Adds a weight column to deal with unbalanced classes.
    parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)
    return (parsed_example,
            parsed_example[LABEL])
  train_dataset = tf.data.TFRecordDataset(
      filenames=[tf_file]).map(parse_function).batch(512)
  return train_dataset
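
To see what the weight column does, here is a minimal plain-Python sketch (no TensorFlow involved). The helper `example_weight` and the toy label list are made up for illustration. With `weight = label + 0.1`, a toxic example (label 1.0) gets weight 1.1 and a non-toxic example (label 0.0) gets weight 0.1, so each toxic example counts about 11 times as much as a non-toxic one.

```python
def example_weight(label, offset=0.1):
    """Mirrors tf.add(parsed_example[LABEL], 0.1) for a single float label."""
    return label + offset

# Hypothetical toy labels: 1 toxic comment out of 10.
labels = [1.0] + [0.0] * 9
weights = [example_weight(y) for y in labels]

toxic_weight = sum(w for y, w in zip(labels, weights) if y == 1.0)
total_weight = sum(weights)

# Unweighted, toxic comments are 10% of this toy data. Weighted, they carry
# 1.1 / (1.1 + 9 * 0.1) of the total weight, which is roughly 55%.
toxic_share = toxic_weight / total_weight
print('Toxic share of total weight: {:.2f}'.format(toxic_share))
```

Note that the evaluation receiver defined later sets every weight to 1, so this reweighting affects training only.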

## Classifier

For each text embedding, we will train a **[DNN Classifier](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier)**.

**TF Hub** allows us to insert text embeddings as features to our model via **[`text_embedding_column`](https://www.tensorflow.org/hub/api_docs/python/hub/text_embedding_column)**. We call it as **`text_embedding_column(key, module_spec)`**, where:

* *`key`* is the name of the text feature in the input data (ex: `"comment_text"`)
* *`module_spec`* is a URL or path to a text embedding module (ex: `"https://tfhub.dev/google/nnlm-en-dim128/1"`)

Because each text embedding column is memory-intensive, the Colaboratory environment may crash if all embeddings are loaded at once. To avoid this, we encapsulate the embedding columns inside a pipeline and wait to get the pipeline's results before loading the next embedding.

In [0]:
def train_classifier(embedding):
  embedded_text_feature_column = hub.text_embedding_column(
      key=TEXT_FEATURE, 
      module_spec=embedding)
  model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(
    "%Y%m%d-%H%M%S"))
  classifier = tf.estimator.DNNClassifier(
      hidden_units=[500, 100],
      weight_column='weight',
      feature_columns=[embedded_text_feature_column],
      n_classes=2,
      optimizer=tf.train.AdagradOptimizer(learning_rate=0.003),
      model_dir=model_dir)
  classifier.train(input_fn=lambda: input_fn(train_tf_file), steps=1000)
  return classifier

## Fairness Indicators in TFMA

To analyze our model's results with **Fairness Indicators**, we need to add Fairness Indicators as a post-export metrics callback, which TFMA runs when it evaluates the model.

To do this, we use [TFMA](https://www.tensorflow.org/tfx/model_analysis/get_started)'s **[`EvalSharedModel`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/types/EvalSharedModel)** class. `EvalSharedModel` builds on TFMA's **`EvalSavedModel`**, so we first need to export our `tf.estimator` classifier as an `EvalSavedModel`.

EvalSavedModels parse [`tf.Examples`](https://www.tensorflow.org/tutorials/load_data/tfrecord#tfexample) with an **`eval_input_receiver_fn`**, so we also need to create that.

In [0]:
def eval_model_with_fairness_indicators(classifier):
  eval_saved_model_path = tfma.export.export_eval_savedmodel(
      estimator=classifier,
      export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),
      eval_input_receiver_fn=eval_input_receiver_fn)
  fairness_indicator_callback = tfma.post_export_metrics.fairness_indicators(
                                    thresholds=[0.1, 0.3, 0.5, 0.7, 0.9],
                                    labels_key=LABEL)
  return tfma.default_eval_shared_model(
        eval_saved_model_path=eval_saved_model_path,
        add_metrics_callbacks=[fairness_indicator_callback])
  
def eval_input_receiver_fn():
  """Create a tfma.export.EvalInputReceiver to parse input features."""
  serialized_tf_example = tf.compat.v1.placeholder(
      dtype=tf.string, shape=[None], name='input_example_placeholder')
  receiver_tensors = {'examples': serialized_tf_example}
  features = tf.parse_example(serialized_tf_example, FEATURE_MAP)
  features['weight'] = tf.ones_like(features[LABEL])
  return tfma.export.EvalInputReceiver(
    features=features,
    receiver_tensors=receiver_tensors,
    labels=features[LABEL])
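
The `thresholds` argument lists the cut-offs at which predicted scores are turned into binary decisions before rates such as FPR and FNR are computed. The plain-Python sketch below illustrates that idea on made-up scores and labels. The helper `rates_at_threshold` is hypothetical (it is not a TFMA function), and treating scores strictly above the threshold as positive is an assumption of this sketch rather than a statement about TFMA's exact convention.

```python
def rates_at_threshold(labels, scores, threshold):
    """Returns (false_positive_rate, false_negative_rate) at one threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_toxic = s > threshold  # assumed convention for this sketch
        if predicted_toxic and y == 1:
            tp += 1
        elif predicted_toxic and y == 0:
            fp += 1
        elif not predicted_toxic and y == 0:
            tn += 1
        else:
            fn += 1
    # float() keeps the division correct under both Python 2 and Python 3.
    fpr = float(fp) / (fp + tn) if (fp + tn) else float('nan')
    fnr = float(fn) / (fn + tp) if (fn + tp) else float('nan')
    return fpr, fnr

# Hypothetical toy data: two toxic and three non-toxic comments.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.95, 0.40, 0.60, 0.20, 0.05]

for threshold in [0.1, 0.3, 0.5, 0.7, 0.9]:
    fpr, fnr = rates_at_threshold(toy_labels, toy_scores, threshold)
    print('threshold={:.1f} FPR={:.2f} FNR={:.2f}'.format(threshold, fpr, fnr))
```

Raising the threshold lowers the false positive rate and raises the false negative rate, which is why it helps to inspect several thresholds rather than a single one.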

## TFMA - Apache Beam

Now that we have our evaluation model with a callback to the Fairness Indicators results, we can create a function that computes and returns those results!

TFMA is built on **[Apache Beam](https://beam.apache.org/documentation/programming-guide/)**, a data processing framework. TFMA provides the [`ExtractEvaluateAndWriteResults`](https://www.tensorflow.org/tfx/model_analysis/api_docs/python/tfma/ExtractEvaluateAndWriteResults) API to use as a [`PTransform`](https://beam.apache.org/documentation/programming-guide/#transforms) in Beam pipelines. Check the [Get Started with TensorFlow Model Analysis](https://www.tensorflow.org/tfx/model_analysis/get_started) tutorial for more information.

In [0]:
def get_eval_result(embedding, eval_shared_model, eval_result_path):
  slice_spec = [tfma.slicer.SingleSliceSpec()]
  for identity in IDENTITY_TERMS:
    slice_spec.append(tfma.slicer.SingleSliceSpec(columns=[identity]))
  with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        | 'ReadFromTFRecord' >> beam.io.ReadFromTFRecord(
            file_pattern=validate_tf_file)
        | 'ExtractEvaluateAndWriteResults' >>
        tfma.ExtractEvaluateAndWriteResults(
                  eval_shared_model=eval_shared_model,
                  slice_spec=slice_spec,
                  compute_confidence_intervals=False,
                  output_path=eval_result_path)
    )
  return tfma.load_eval_result(output_path=eval_result_path)
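
Each `SingleSliceSpec(columns=[identity])` asks TFMA to group examples by the values of that feature and compute metrics per group, while the empty `SingleSliceSpec()` covers the whole dataset. The plain-Python sketch below imitates that grouping on made-up records. The helper `slice_keys`, the records, and their identity values are hypothetical, and the sketch assumes that an example with several values for a feature lands in several slices while an example with no value appears only in the overall slice.

```python
from collections import defaultdict

def slice_keys(example, columns):
    """Yields the overall slice plus one (column, value) slice per value."""
    yield ('Overall',)
    for column in columns:
        for value in example.get(column, []):
            yield (column, value)

# Hypothetical records; list-valued fields stand in for VarLenFeature columns.
toy_examples = [
    {'label': 1, 'gender': ['female'], 'religion': []},
    {'label': 0, 'gender': ['male', 'female'], 'religion': ['christian']},
    {'label': 0, 'gender': [], 'religion': []},
]

counts = defaultdict(int)
for example in toy_examples:
    for key in slice_keys(example, ['gender', 'religion']):
        counts[key] += 1

for key in sorted(counts):
    print('{}: {} example(s)'.format(' = '.join(key), counts[key]))
```

Because a slice only contains examples that carry that identity label, small slices can produce noisy metrics, so it is worth checking the example count of a slice before reading much into its rates.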

# Evaluate Embedding

Now that we have all the steps in place - training a model on an embedding, converting it to a TFMA format, and computing Fairness Indicators on it - we can make a pipeline to compare Fairness Indicators on different embeddings!

In [0]:
def embedding_eval_result(embedding):

  # First, we use the train_classifier() function we created earlier to train a
  # basic classifier using our chosen embedding.
  print("Training classifier for " + embedding)
  classifier = train_classifier(embedding)

  # Next, we measure the accuracy of our classifier on our validation set.
  train_eval_result = classifier.evaluate(input_fn=lambda: input_fn(validate_tf_file))
  print('Validation set accuracy for {}: {accuracy}'.format(embedding, **train_eval_result))

  # We then use our eval_model_with_fairness_indicators() function to convert
  # the model to a TFMA EvalSharedModel with a callback to Fairness Indicators.
  eval_shared_model = eval_model_with_fairness_indicators(classifier)

  # We also need to create a unique path to store our results for this
  # embedding.
  embedding_name = embedding.split('/')[-2]
  eval_result_path = os.path.join(BASE_DIR, 'eval_result', embedding_name)

  # Finally, we use our get_eval_result() function to compute and return the
  # Fairness Indicators results!
  eval_result = get_eval_result(embedding, eval_shared_model, eval_result_path)
  return eval_result
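
The result directory name comes from the second-to-last path component of the module URL. As a quick check, this sketch applies the same `split('/')[-2]` expression to the three module URLs used in this colab (the list name `EMBEDDING_URLS` is only for illustration):

```python
EMBEDDING_URLS = [
    'https://tfhub.dev/google/random-nnlm-en-dim128/1',
    'https://tfhub.dev/google/nnlm-en-dim128/1',
    'https://tfhub.dev/google/universal-sentence-encoder/2',
]

# The last component is the module version, so [-2] picks the module name.
embedding_names = [url.split('/')[-2] for url in EMBEDDING_URLS]
print(embedding_names)
# ['random-nnlm-en-dim128', 'nnlm-en-dim128', 'universal-sentence-encoder']
```

A URL with a trailing slash would yield the version number instead of the module name, so pass the URLs without one.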

# Run TFMA & Fairness Indicators

## Text Embeddings

**[TF-Hub](https://www.tensorflow.org/hub)** provides several **text embeddings**. These embeddings will serve as the feature column for our different models. For this Colab, we use the following embeddings:

* [**random-nnlm-en-dim128**](https://tfhub.dev/google/random-nnlm-en-dim128/1): random text embeddings, which serve as a convenient baseline.
* [**nnlm-en-dim128**](https://tfhub.dev/google/nnlm-en-dim128/1): a text embedding based on [A Neural Probabilistic Language Model](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf). 
* [**universal-sentence-encoder**](https://tfhub.dev/google/universal-sentence-encoder/2): a text embedding based on [Universal Sentence Encoder](https://arxiv.org/pdf/1803.11175.pdf).



## Fairness Indicators Results

See the [Fairness Indicators repository](https://github.com/tensorflow/fairness-indicators) for more information on analyzing data with Fairness Indicators. Below are some of the available metrics.

* [Negative Rate, False Negative Rate (FNR), and True Negative Rate (TNR)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#False_positive_and_false_negative_rates)
* [Positive Rate, False Positive Rate (FPR), and True Positive Rate (TPR)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#False_positive_and_false_negative_rates)
* [Accuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Accuracy)
* [Precision and Recall](https://en.wikipedia.org/wiki/Precision_and_recall)
* [Precision-Recall AUC](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/AUC)
* [ROC AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve)
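
For reference, the rate metrics in this list follow the standard confusion-matrix definitions, where $TP$, $FP$, $TN$, and $FN$ are the counts of true positives, false positives, true negatives, and false negatives at a given threshold, and $N$ is their sum:

```latex
\begin{aligned}
\mathrm{TPR} &= \frac{TP}{TP + FN}, &
\mathrm{FNR} &= \frac{FN}{TP + FN} = 1 - \mathrm{TPR}, \\
\mathrm{FPR} &= \frac{FP}{FP + TN}, &
\mathrm{TNR} &= \frac{TN}{FP + TN} = 1 - \mathrm{FPR}, \\
\mathrm{Positive\ Rate} &= \frac{TP + FP}{N}, &
\mathrm{Negative\ Rate} &= \frac{TN + FN}{N}, \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, &
\mathrm{Recall} &= \mathrm{TPR}, \\
\mathrm{Accuracy} &= \frac{TP + TN}{N}.
\end{aligned}
```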

Note that the `widget_view.render_fairness_indicator()` cells may need to be run twice for the visualization to be displayed.

##### Random NNLM

In [0]:
eval_result_random_nnlm = embedding_eval_result('https://tfhub.dev/google/random-nnlm-en-dim128/1')

In [0]:
widget_view.render_fairness_indicator(eval_result_random_nnlm)

##### NNLM

In [0]:
eval_result_nnlm = embedding_eval_result('https://tfhub.dev/google/nnlm-en-dim128/1')

In [0]:
widget_view.render_fairness_indicator(eval_result_nnlm)

##### Universal Sentence Encoder

In [0]:
eval_result_use = embedding_eval_result('https://tfhub.dev/google/universal-sentence-encoder/2')

In [0]:
widget_view.render_fairness_indicator(eval_result_use)

## Exercises

1. Pick an identity category, such as religion or sexual orientation, and look at False Positive Rate for the Universal Sentence Encoder. How do different slices compare to each other? How do they compare to the Overall baseline?
2. Now pick a different identity category. Compare the results of this category with the previous one. Does the model weigh one category as more "toxic" than the other? Does this change with the embedding used?
3. Does the model generally tend to overestimate or underestimate the number of toxic comments?
4. Look at the graphs for different fairness metrics. Which metrics seem most informative? Which embeddings perform best and worst for that metric?

