## Analyzing bias in web communities using word embeddings
DATA 512 Project Report

#### Introduction

This project is an attempt to understand what word embedding models can tell us about the sources on which they were trained. It builds on prior research on evaluating binary gender bias in word embeddings, and uses visualizations to compare several pre-trained word embeddings. The aim is to understand whether it is possible to infer dataset bias from model bias and, if so, whether any conclusions can be drawn about the biases of the datasets. In this analysis, I've compared two independent sets of models: Facebook's fastText models and Stanford University's GloVe models. Given the complexity of training a complete word embedding model from scratch, this project uses pre-trained word embeddings. As discussed in the limitations section below, this is a potential source of noise, which makes interpreting the results harder than expected. The 'Future work' section suggests a few potential solutions that might improve on this analysis.

#### Background / Prior Work

Word embeddings are a way to encode the semantic structure of words using a high-dimensional vector space. Each word is mapped to a real-valued vector such that words that tend to co-occur in a sentence tend to have similar vectors. The resulting vector space is able to capture interesting semantic structure.

Some common examples used to describe the expressiveness of word embedding models are:

1. vec(King) - vec(man) + vec(woman) ~= vec(Queen)
2. vec(Mom) - vec(Dad) ~= vec(Grandma) - vec(Grandpa) ~= vec(Her) - vec(He)
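
The first relation can be illustrated with a minimal sketch. The 2-D vectors below are made up for illustration (they are not taken from any of the models in this report), and the `analogy` helper is a hypothetical name, not part of the notebook's code.

```python
import numpy as np

# Hypothetical 2-D toy embedding (assumption, for illustration only):
# axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
toy = {
    'king':  np.array([0.9,  0.8]),
    'queen': np.array([0.9, -0.8]),
    'man':   np.array([0.1,  0.8]),
    'woman': np.array([0.1, -0.8]),
    'apple': np.array([-0.7, 0.0]),
}

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vocab):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = vocab[a] - vocab[b] + vocab[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

result = analogy('king', 'man', 'woman', toy)
print(result)  # queen
```

With real embeddings the match is only approximate, which is why the nearest neighbour by cosine similarity is used instead of an exact equality test.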

The second example above shows an example of a 'bias axis', which is used in this analysis. In its simplest form, a bias axis is defined by a pair of words (for example 'he' and 'she'). A more robust estimator for the bias axis (as described in the paper https://arxiv.org/pdf/1607.06520.pdf) is to collect a set of many word pairs that represent gender, and to compute the first principal component of their differences. One such set of word pairs is available at https://github.com/tolga-b/debiaswe/blob/master/data/definitional_pairs.json (licensed under MIT, and collected by Amazon Mechanical Turk workers).
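
As a rough sketch of the principal-component estimator, the snippet below takes the first right singular vector of the matrix of pair differences. The 3-D vectors and the `bias_direction` helper are assumptions made for illustration. Using an uncentered SVD is also a simplification, so this may differ in detail from the procedure in the paper.

```python
import numpy as np

def bias_direction(pairs, vectors):
    """
    Estimates a bias axis as the first principal direction of the
    difference vectors vec(a) - vec(b) over all (a, b) pairs.
    Uses an uncentered SVD (assumption); the sign of the result is arbitrary.
    """
    diffs = np.array([vectors[a] - vectors[b] for a, b in pairs])
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]  # unit-length direction

# Hypothetical toy vectors in which gender mostly lives on axis 0.
toy_vectors = {
    'she':    np.array([ 1.0, 0.2,  0.0]),
    'he':     np.array([-1.0, 0.2,  0.0]),
    'woman':  np.array([ 0.9, 0.0,  0.3]),
    'man':    np.array([-0.9, -0.1, 0.3]),
    'mother': np.array([ 0.8, 0.4, -0.1]),
    'father': np.array([-0.8, 0.5, -0.2]),
}
toy_pairs = [('she', 'he'), ('woman', 'man'), ('mother', 'father')]

direction = bias_direction(toy_pairs, toy_vectors)
print(np.round(np.abs(direction), 3))
```

The recovered direction is dominated by the first coordinate, which is where the toy pairs differ most. In this report the simpler two-word axis ('he', 'she') is used instead.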

#### Methods

The project contains two sets of comparisons:

- between glove models on common crawl and twitter text
- between fasttext models on wikipedia and common crawl text

Note that all comparisons are between models within the same set (either fastText or GloVe). The analysis avoids any comparison between a fastText model and a GloVe model, since the difference in algorithm could introduce a bias into the word embeddings that is not representative of the source text.

However, even keeping the type of model constant, there are other factors that might affect the results and conclusions of this project: the exact type of preprocessing applied to the text and the model parameters. All of these need to be kept constant across the models in each set. For this report, I've tried to pick pre-trained models that are as close to each other as possible, but given that the authors have not published any information on the exact parameters used, it is possible that they differ.

As described in the limitations section at the bottom of the report, given enough time, manually retraining each model from scratch with the same parameters and preprocessing would be ideal. Since this was not possible at this time, this report should be considered an experiment on whether model-based comparisons are possible, and not a claim that the exact conclusions drawn from visualizing bias in the pre-trained models I've chosen are representative of the sources.

In [4]:
import pickle
import numpy as np 
import pandas as pd
from scipy import spatial

from tqdm import tqdm
from collections import namedtuple

#### Data load

The pre-trained vector files are downloaded into the models folder. These files are linked below and must be downloaded and extracted before the script can be run.

The models are located at 

Glove:
- Large common crawl dataset: http://nlp.stanford.edu/data/glove.840B.300d.zip
- Small common crawl dataset: http://nlp.stanford.edu/data/glove.42B.300d.zip
- Twitter dataset: http://nlp.stanford.edu/data/glove.twitter.27B.zip

The above files are distributed from the site (https://nlp.stanford.edu/projects/glove/), under the Public Domain Dedication and License v1.0 (http://opendatacommons.org/licenses/pddl/)

Fasttext:
- Common crawl dataset: https://s3-us-west-1.amazonaws.com/fasttext-vectors/crawl-300d-2M-subword.zip
- Wikipedia dataset: https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.vec

The links are from the site (https://fasttext.cc) and the models are distributed under the Creative Commons Attribution-Share-Alike License 3.0. (https://creativecommons.org/licenses/by-sa/3.0/)

The analysis also uses a list of occupations, located at https://github.com/tolga-b/debiaswe/blob/master/data/professions.json, and is licensed under MIT.

In [5]:
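def load_vectors(file_name, dim, has_header=True):
    """
    Reads vectors from the given file.
    has_header controls if the first line should be ignored. The line is
    only skipped when it looks like a fastText "<count> <dim>" header, so
    GloVe files (which have no header) do not lose their first word.
    """
    vectors = []
    words = {}
    
    with open(file_name, encoding='utf-8') as f:
        if has_header:
            start = f.tell()
            if len(f.readline().split()) != 2:
                f.seek(start)  # not a header: rewind and parse it as data
        for line in tqdm(f):
            a = line.strip().split(' ')
            if len(a) != dim + 1:
                continue  # skip lines that do not hold exactly dim values
            # Index by position in `vectors` (not by line number), so that
            # skipped lines cannot misalign words and vectors.
            words[a[0]] = len(vectors)
            vectors.append(np.array(a[1:], dtype=np.float16))
    
    return words, np.stack(vectors)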

Given that the model files are large (~6 GB), the following functions parse each file and cache the model
in the NumPy format, which makes reading it into the notebook much faster.

In [6]:
def cache_model(model_path, name, dim = 300):
    """
    Caches the (words, vectors) tuple to disk for faster
    retrieval.
    """
    words, vectors = load_vectors(model_path, dim)
    np.save(f'./models/cache/{name}.vec.npy', vectors)
    with open(f'./models/cache/{name}.words.pkl', 'wb') as f:
        pickle.dump(words, f)
        
# The data structure we use to represent a word embedding model.
EmbeddingModel = namedtuple('EmbeddingModel', ['words', 'vectors'])

def load_cached_model(name):
    """
    Loads a model that was previously cached by cache_model
    """
    vectors = np.load(f'./models/cache/{name}.vec.npy')
    with open(f'./models/cache/{name}.words.pkl', 'rb') as f:
        words = pickle.load(f)
    
    return EmbeddingModel(words, vectors)

We cache all the models used in this analysis on the first run. Subsequent runs of this
notebook only load the cached variants.

In [7]:
def cache():
    """
    Helper function to cache all the models we want to use in the analysis.
    """
    cache_model('./models/fasttext/wiki.en.vec', 'wiki')
    cache_model('./models/fasttext/crawl-300d-2M-subword.vec', 'cc')    
    cache_model('./models/glove/glove.twitter.27B.200d.txt', 'twitter_glove', 200)
    cache_model('./models/glove/glove.42B.300d.txt', 'cc_42_glove')
    cache_model('./models/glove/glove.840B.300d.txt', 'cc_840_glove')
    
# cache() # This needs to be run only once.

wiki = load_cached_model('wiki')
cc = load_cached_model('cc')
glove_twitter = load_cached_model('twitter_glove')
glove_cc1 = load_cached_model('cc_42_glove')
glove_cc2 = load_cached_model('cc_840_glove')

Helper functions to fetch a vector for a word and to compute similarities between words, given a model.

In [24]:
def get_vector(model, word):
    """
    Fetches the unit-normalized vector of the given word.
    Returns None if the word does not exist in the model.
    """
    if word not in model.words:
        return None
    
    v = model.vectors[model.words[word]]
    return v / np.linalg.norm(v)

def compare_vectors(model, word_a, word_b):
    """
    Computes the absolute value of the cosine similarity between two
    words according to the given model.
    Returns None if either word does not exist in the model.
    """
    v1 = get_vector(model, word_a)
    v2 = get_vector(model, word_b)
    if v1 is None or v2 is None:
        return None
    
    # scipy returns the cosine distance, i.e. 1 - cosine similarity.
    return np.abs(1 - spatial.distance.cosine(v1, v2))    

def bias(m, axis, w, scale=False):
    """
    Computes the bias score for the word w with respect to the axis specified.
    (The bias computation is based on the definition in the 'Direct Bias'
    section of [1], with c = 1.)
    
    If scale is True, returns the absolute difference between the similarities
    of w to the two ends of the axis, or None if any word is missing.
    If scale is False, returns the pair of similarities, i.e. the 2-D point
    for the word w.r.t. the two ends of the bias axis, for plotting.
    
    [1] Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings: Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai: http://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf
    """
    a = compare_vectors(m, axis[0], w)
    b = compare_vectors(m, axis[1], w)
    
    if scale:
        if a is None or b is None:
            return None
        return np.abs(a - b)
    
    return a, b

The following code block contains helper functions to create Plotly plots.

In [46]:
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot

# Render Plotly figures inline in the notebook.
init_notebook_mode(connected=True)

def scatter(x, y, words, label=False, title='Occupations w.r.t he-she axis'):
    """
    Generates a scatter plot with the list of co-ordinates specified
    by zip(x, y)
    """
    trace = go.Scatter(
        x = x,
        y = y,
        text = words,
        mode = 'markers' + ('+text' if label else ''),
        textposition='bottom center'
    )

    data = [trace]
    layout= go.Layout(
        title= title,
        hovermode= 'closest',
        xaxis= dict(
            title= 'Similarity to "she"',
            ticklen= 5,
            zeroline= False,
            gridwidth= 2,
        ),
        yaxis=dict(
            title= 'Similarity to "he"',
            ticklen= 5,
            gridwidth= 2,
        ),
        showlegend= False
    )

    fig = go.Figure(data=data, layout=layout)
    iplot(fig)
    
def scatter_single_axis(points, words, label=True, title=""):
    """
    Creates a chart with multiple 1-D axes (one per model) to enable
    comparisons between the spreads of the bias scores of two models.
    """
    a = np.array(list(zip(*points)))
    data = []
    for r, w in zip(a, words):
        trace = go.Scatter(
            x = r[:, 0],
            y = r[:, 1],
            text = w,
            name = w,
            mode = 'markers' + ('+text' if label else ''),
            textposition='bottom center'
        )

        data.append(trace)

    layout= go.Layout(
        title= title,
        hovermode= 'closest',
        xaxis= dict(
            title= 'Bias score',
            ticklen= 5,
            zeroline= False,
            gridwidth= 2,
        ),
        yaxis=dict(
            autorange = True,
            categoryorder = "category descending",
            title = "",
            type = "category"
        ),
        showlegend= True
    )

    fig = go.Figure(data=data, layout=layout)
    iplot(fig)

#### Findings

In [11]:
print(get_vector(wiki, 'he'))
print(get_vector(wiki, 'she'))
print(get_vector(wiki, 'programmer'))

[ 0.02303   -0.01274   -0.01228    0.08746    0.014496  -0.07043
  0.04474   -0.04178   -0.08295    0.013214  -0.0367    -0.004097
 -0.002981  -0.07294   -0.00323   -0.082      0.0072     0.02426
 -0.03038    0.02136   -0.003014   0.08484    0.0233     0.0079
 -0.0921    -0.1343    -0.11536   -0.05005    0.0615     0.1045
 -0.047      0.007637  -0.01593    0.0318    -0.0626     0.11743
  0.00637    0.02963   -0.0533    -0.07227   -0.1285    -0.06335
  0.013885  -0.0055     0.0724     0.0312     0.03174   -0.00494
 -0.0394    -0.0813    -0.02652   -0.061      0.01103    0.05896
  0.03008   -0.0385     0.0519    -0.02095    0.0755    -0.00399
  0.0364     0.10443    0.04706    0.00819    0.03287    0.005127
  0.1263     0.0985    -0.04944   -0.04214    0.0545     0.00251
  0.1456     0.0836    -0.05157    0.05185    0.01952    0.000504
 -0.1051    -0.02686   -0.00909    0.0404     0.08655    0.02489
 -0.00826   -0.03796    0.0087     0.02452   -0.02812   -0.0532
  0.085      0.02737    0

In [18]:
(compare_vectors(wiki, 'he', 'programmer'), 
compare_vectors(wiki, 'she', 'programmer'))

(0.208740234375, 0.1787109375)

In [19]:
(compare_vectors(cc, 'he', 'programmer'), 
compare_vectors(cc, 'she', 'programmer'))

(0.321533203125, 0.26611328125)

The comparison above already shows that both models associate the word 'programmer' more with 'he' than with 'she'.

It is also interesting that the Common Crawl model assigns a higher similarity in both cases, and that the gap between the two similarities is larger (about 0.055, versus about 0.030 for the Wikipedia model). The word 'programmer' is not equidistant from 'he' and 'she' in either model, and it is more skewed in the Common Crawl model than in the Wikipedia model.
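
The size of the skew can be recomputed directly from the two outputs above. The snippet below hard-codes those printed similarities; the `similarities` and `gaps` names are only for illustration.

```python
# Similarities of 'programmer' to 'he' and 'she', copied from the outputs above.
similarities = {
    'wiki': {'he': 0.208740234375, 'she': 0.1787109375},
    'cc':   {'he': 0.321533203125, 'she': 0.26611328125},
}

gaps = {}
for name, s in similarities.items():
    gaps[name] = s['he'] - s['she']   # absolute skew towards 'he'
    ratio = s['he'] / s['she']        # relative skew towards 'he'
    print(f'{name}: gap = {gaps[name]:.3f}, ratio = {ratio:.2f}')

# wiki: gap = 0.030, ratio = 1.17
# cc: gap = 0.055, ratio = 1.21
```

Both the absolute gap and the ratio are larger for the Common Crawl model, although a single word is far too little evidence to generalize from.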

In [14]:
profs = pd.read_json('./data/professions.json')
profs.head()

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | accountant | 0.0 | 0.4 |
| 1 | acquaintance | 0.0 | 0.0 |
| 2 | actor | 0.8 | 0.0 |
| 3 | actress | -1.0 | 0.0 |
| 4 | adjunct_professor | 0.0 | 0.5 |


### Research question:

How do web communities differ in their gender biases? 
 - Comparing Wikipedia to Common crawl.
 - Comparing Twitter to Common crawl.

In [52]:
points = np.array([bias(wiki, ['she', 'he'], w) for w in profs[0].values])
scatter(points[:, 0], points[:, 1], profs[0].values, label = True, title='Fasttext Wikipedia')

If the above plot is not visible on GitHub, please use the following link
(GitHub does not render Plotly graphs):

http://nbviewer.jupyter.org/github/viv-r/Data512-HCDS-Final-Project/blob/master/Report.ipynb

The above visualization shows all occupations along with their similarities to both ends of the bias axis.
If the model were perfectly unbiased, we would expect all the words to lie on the x=y line through the origin.

The spread of the points around this line is an indication of bias. For the Wikipedia data, most of the words are clustered around (0.15, 0.12), which shows a slight bias towards 'she' for the list of occupations we've chosen.
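
One way to put a number on the spread around the x=y line is the signed perpendicular distance of each point from that line. This is a sketch rather than something the analysis in this report relies on; `diagonal_offsets` is a hypothetical helper and the three points are made up (the first is the approximate cluster center mentioned above).

```python
import numpy as np

def diagonal_offsets(points):
    """
    Signed perpendicular distance of each (x, y) point from the line x = y.
    With x = similarity to 'she' and y = similarity to 'he', a positive value
    means the word is closer to 'she', a negative value closer to 'he'.
    """
    points = np.asarray(points, dtype=float)
    return (points[:, 0] - points[:, 1]) / np.sqrt(2)

# Hypothetical points: (similarity to 'she', similarity to 'he').
toy_points = np.array([
    [0.15, 0.12],
    [0.20, 0.20],
    [0.10, 0.18],
])

offsets = diagonal_offsets(toy_points)
print('mean offset:', offsets.mean())  # overall lean of the word list
print('spread:', offsets.std())        # how much individual words deviate
```

The mean offset summarizes the overall lean of the word list, while the standard deviation captures how strongly individual words deviate from it.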

In [30]:
points = np.array([bias(cc, ['she', 'he'], w) for w in profs[0].values])
scatter(points[:, 0], points[:, 1], profs[0].values, label= True, title='Fasttext Common crawl')

A similar plot for the fastText model trained on the Common Crawl dataset shows that the similarities have larger magnitudes in general. The cluster center in this case is very close to the x=y line, suggesting that most of the occupations we've chosen are about equally similar to 'he' and 'she'.

In [32]:
points = np.array([bias(glove_twitter, ['she', 'he'], w) for w in profs[0].values])
scatter(points[:, 0], points[:, 1], profs[0].values, label= True, title='Glove Twitter')

In [34]:
points = np.array([bias(glove_cc1, ['she', 'he'], w) for w in profs[0].values])
scatter(points[:, 0], points[:, 1], profs[0].values, label= True, title='Glove Common crawl 1')

In [35]:
points = np.array([bias(glove_cc2, ['she', 'he'], w) for w in profs[0].values])
scatter(points[:, 0], points[:, 1], profs[0].values, label= True, title='Glove Common crawl 2')

The three plots above for the GloVe models show something different from the fastText models. The two Common Crawl models look relatively similar to each other. The Twitter model looks clearly different from the other two, and seems to have two clusters: one close to the origin and one located at approximately (3.5, 3). This suggests that a subset of occupations is biased differently from the rest, but it is not clear why this is so.

### Computing bias scores for words

This section contains plots comparing two models based on the magnitude of the difference between the similarities of each word to the two ends of the bias axis.
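
The score being plotted can be sketched on its own with toy vectors. This is a simplified, self-contained stand-in for the `bias(..., scale=True)` helper defined earlier; the 2-D vectors are made up, and `bias_score` is a hypothetical name.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def bias_score(word_vec, axis_a_vec, axis_b_vec):
    """
    Magnitude of the difference between the (absolute) cosine similarities
    of a word to the two ends of a bias axis. Zero means equidistant.
    """
    sim_a = abs(cosine_similarity(word_vec, axis_a_vec))
    sim_b = abs(cosine_similarity(word_vec, axis_b_vec))
    return abs(sim_a - sim_b)

# Hypothetical 2-D vectors, for illustration only.
she = np.array([1.0, 0.0])
he = np.array([0.0, 1.0])
neutral = np.array([1.0, 1.0])   # equally similar to both ends
skewed = np.array([3.0, 1.0])    # much closer to 'she'

neutral_score = bias_score(neutral, she, he)
skewed_score = bias_score(skewed, she, he)
print(neutral_score, skewed_score)
```

Because the score is an absolute difference, it measures how strongly a word is skewed but not in which direction.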

In [47]:
profs_subset = ['physician', 'boss', 'programmer', 'adventurer', 'trader', 'dancer', 'housekeeper', 'socialite']
glove_models = [('glove_twitter', glove_twitter), ('glove_cc1', glove_cc1), ('glove_cc2', glove_cc2)]
ft_models = [('wiki', wiki), ('cc', cc)]

points = np.array([[[bias(m, ['she', 'he'], w, scale=True), i] for w in profs_subset] for i, m in glove_models[:-1]])
scatter_single_axis(points, profs_subset)

In [48]:
# Score the same words that are passed as labels to scatter_single_axis below.
points = np.array([[[bias(m, ['she', 'he'], w, scale=True), i] for w in profs_subset] for i, m in ft_models])
scatter_single_axis(points, profs_subset)

While I expected the above visualizations to help understand whether the models agree on the magnitude of bias, the results are not clear for most of the words due to overlap. Coming up with a better way to visualize this is listed in the 'Future work' section below. Words like 'adventurer' and 'socialite' are placed at opposite ends in different models in each set, which could suggest that there is a significant difference in how these communities use these words.

#### Limitations

Comparing machine learning models is a hard problem: there are many factors that affect what a model learns, and keeping all of them consistent while varying just the data can be a challenge.

- Pre-trained models are good for prototyping, but ideally we'd want to train models from scratch to ensure all the model parameters are held constant (so that the bias introduced by the model itself is the same across datasets).
- The original scope was to compare the models on multiple types of bias (religion, race, and gender), but I've had to reduce the scope to gender only, with a binary gender model.
- Identification of the bias axis is hard.
  In this project I've used 'he' and 'she' as the bias axis.
  A more general approach would be to use the data from
  https://github.com/tolga-b/debiaswe/tree/master/data, where
  the authors have crowdsourced word pairs that define the binary gender axis.
  However, in general, this can be subjective and hard to define.


#### Future work

In the future, I would like to extend the comparisons performed in this notebook to other types of bias: religion, race, etc. In addition, it would be interesting to explore whether it is possible to use a non-binary bias axis to compare words against. It would also make the comparisons and results much more reliable if the models used were trained from scratch, so that the parameters can be held constant. Finally, a better way to interactively visualize the bias score for each word would make the analysis easier to understand.


#### Conclusions

The results of the analysis suggest that potentially significant differences exist between communities, and that model-based comparisons might be able to extract that information. Given that I had no control over the training of the exact models used in this analysis, I cannot claim that the results are conclusive. However, the methods used here could potentially be used in the following human-centered applications:

- Comparing across Wikipedia articles to check if the writing style in one category of pages is different from another.
- Validation of moderation policies to see if they result in changes to bias in text content, by comparing a model trained on text before introduction of policy to a model (with same parameters) trained on the text written after.






#### References

- Llorens, Marisa (2018) "Text Analytics Techniques in the Digital World: Word Embeddings and Bias," Irish Communication Review: Vol. 16: Iss. 1, Article 6. doi:10.21427/D7TJ05 Available at: https://arrow.dit.ie/icr/vol16/iss1/6
- Demographic Word Embeddings for Racism Detection on Twitter, Mohammed Hasanuzzaman, Gaël Dias, Andy Way: http://www.aclweb.org/anthology/I17-1093
- Quantifying and Reducing Stereotypes in Word Embeddings, Tolga Bolukbasi Kai-Wei Chang James Zou Venkatesh Saligrama Adam Kalai: https://pdfs.semanticscholar.org/2558/231cadaf0b1a4ac79d1a5c79322c8fbd327f.pdf
- Quantifying and Reducing Gender Stereotypes in Word Embeddings: https://drive.google.com/file/d/1IxIdmreH4qVYnx68QVkqCC9-_yyksoxR/view
- Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings: Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai: http://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf
- Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation.