# SISU Digital Humanities: Textual and Language Analysis on Social Media
### Session 6: Language Biases
Created by Tom van Nuenen (tom.van_nuenen@kcl.ac.uk) <br />

## Introduction
Language carries implicit biases, functioning both as a reflection and a perpetuation of stereotypes that people carry with them. Using Natural Language Processing tools, we can trace these biases in the many language datasets to be found online.

One way to discover language biases is with word embeddings. To do so, we first postulate concepts such as "male" or "female", each represented by a set of word vectors. Using these so-called *target concepts*, we can then compute the relative similarity of other word vectors to each concept, in particular words that act as evaluative attributes such as "strong" and "sensitive".

These words can be categorised through clustering algorithms and labeled through a semantic analysis system into more general (conceptual) biases, yielding a broad picture of the biases present in a discourse community.

See https://xfold.github.io/WE-GenderBiasVisualisationWeb/ for a web demo
and https://github.com/xfold/LanguageBiasesInReddit for the full repo.

## Training a WE model

First, we need to train our Word Embeddings model. We create a function that takes in a CSV file and applies Gensim's `simple_preprocess` method to the "body" column. It optionally lemmatizes the tokens, and finally trains a Word2Vec model with the parameters we pass to the function.

In [4]:
import math
from textblob import TextBlob as tb
import nltk
from nltk.corpus import stopwords
import re
# Current notebook only works with Gensim v3 - e.g. !pip install gensim==3.8.1
import gensim 
import pandas as pd
import logging
import os
import time
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

def train_model(csv_document, csv_comment_column='body', outputname='output_model', window = 4, minf=10, epochs=100, ndim=100, lemmatiseFirst = False):
    '''
    Load the comments from csv_document and train a WE model with the specified
    window, minf, epochs and ndim, where:
    csv_document : csv document containing all information, where each comment is on a different row
    csv_comment_column : name of the column that contains the text we want to process
    outputname : output path of the resulting model
    lemmatiseFirst : if True, lemmatise each token (as a noun) before training
    
    returns
    None; the trained model is saved to outputname
    '''
    
    def preprocess_csv(path, column = 'body'):
        df_com = pd.read_csv(path, lineterminator='\n')

        documents = []
        for i, row in enumerate(df_com[column]):
            if i%500000 == 0:
                print('\t...processing line {}'.format(i))
            try:
                pp = gensim.utils.simple_preprocess (row)
                if(lemmatiseFirst == True):
                    pp = [wordnet_lemmatizer.lemmatize(w, pos="n") for w in pp]
                documents.append(pp)
            except Exception:  # e.g. rows that are not strings (NaN)
                print('\terror with row {}'.format(row))

        logging.info ("Done reading and preprocessing data file {} ".format(path))
        return documents

    def train_WE_model(documents, outputfile, ndim, window, minfreq, epochss):
        '''
        size
        The size of the dense vector to represent each token or word. If you have very limited data, then size should be a much smaller
        value. If you have lots of data, it's good to experiment with various sizes. A value of 100-150 has worked well for me.

        window
        The maximum distance between the target word and its neighboring word. If your neighbor's position is greater than the maximum 
        window width to the left and the right, then some neighbors are not considered as being related to the target word. In theory, a 
        smaller window should give you terms that are more related. If you have lots of data, then the window size should not matter too 
        much, as long as it's a decent-sized window.

        min_count
        Minimum frequency count of words. The model would ignore words that do not satisfy the min_count. Extremely infrequent words are 
        usually unimportant, so it's best to get rid of those. Unless your dataset is really tiny, this does not really affect the model.

        workers
        How many threads to use behind the scenes?
        '''
        starttime = time.time()
        print('->->Starting training model {} with dimensions:{}, minf:{}, epochs:{}'.format(outputfile,ndim, minfreq, epochss))
        model = gensim.models.Word2Vec (documents, size=ndim, window=window, min_count=minfreq, workers=5)
        model.train(documents,total_examples=len(documents),epochs=epochss)
        model.save(outputfile)
        print('->-> Model saved in {}'.format(outputfile))
    
    print('->Starting with {} [{}], output {}, window {}, minf {}, epochs {}, ndim {}'.format(csv_document, 
                                                                                       csv_comment_column,
                                                                                       outputname, window, minf, epochs, ndim))
    docs = preprocess_csv(csv_document, csv_comment_column)
    starttime = time.time()
    ofile = outputname
    print('-> Output will be saved in {}'.format(ofile))
    train_WE_model(docs, ofile, ndim, window, minf, epochs)
    print('-> Model creation ended in {} seconds'.format(time.time()-starttime))
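
To get a feel for what the preprocessing step does to a single comment, here is a minimal sketch that approximates `simple_preprocess` with the standard library only. The function `rough_preprocess` is a hypothetical stand-in, not Gensim's own code: it assumes the default behaviour of lowercasing, splitting on anything that is not a letter, and dropping tokens shorter than 2 or longer than 15 characters.

```python
import re

def rough_preprocess(text, min_len=2, max_len=15):
    """Stdlib approximation of gensim.utils.simple_preprocess (an assumption, not Gensim's code)."""
    # Lowercase, then keep runs of letters/underscores; digits and punctuation split tokens.
    tokens = re.findall(r"[^\W\d]+", text.lower())
    # Drop very short or very long tokens, and tokens that start with an underscore.
    return [t for t in tokens if min_len <= len(t) <= max_len and not t.startswith("_")]

tokens = rough_preprocess("I don't like 3 cats!")
print(tokens)  # ['don', 'like', 'cats']
```

Note how the apostrophe splits "don't" and how single-letter tokens and numbers disappear. The same happens to every comment before it reaches Word2Vec.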


In [5]:
print(gensim.__version__)

3.8.3


This function is designed to run over different CSVs with different parameters (like we did with our topic models). Below, we create a `training_setup` list of dictionaries, each holding a CSV file and its parameters. This makes it a bit easier to replicate the process. For now, we've entered one CSV file: the one we have loaded. The output of our function, the Word Embeddings model, is saved to a file as well.

In [6]:
training_setup = [
    {'csvfile': "data/TRP-comments.csv", 'output_file': 'trp_w4_f10_e100_d200.model', 'w':4, 'minf': 2, 'epochs':100 ,'ndim':100}
]

for setup in training_setup:
        train_model(setup['csvfile'], 
        outputname = setup['output_file'],
        window = setup['w'],
        minf = setup['minf'],
        epochs = setup['epochs'],
        ndim = setup['ndim']
        )
 

->Starting with data/TRP-comments.csv [body], output trp_w4_f10_e100_d200.model, window 4, minf 2, epochs 100, ndim 100
	...processing line 0
-> Output will be saved in trp_w4_f10_e100_d200.model
->->Starting training model trp_w4_f10_e100_d200.model with dimensions:100, minf:2, epochs:100




->-> Model saved in trp_w4_f10_e100_d200.model
-> Model creation ended in 35.668002128601074 seconds
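
The output filename in the run above is typed by hand, and it mentions `f10` and `d200` although the parameters actually passed are `minf` 2 and `ndim` 100. One way to keep the label and the parameters in sync is to derive the name from the parameters. The sketch below uses a hypothetical helper, `make_model_name`, which is not part of the original notebook.

```python
def make_model_name(prefix, window, minf, epochs, ndim):
    """Hypothetical helper: build an output filename from the training parameters."""
    return "{}_w{}_f{}_e{}_d{}.model".format(prefix, window, minf, epochs, ndim)

params = {"w": 4, "minf": 2, "epochs": 100, "ndim": 100}
setup = dict(
    params,
    csvfile="data/TRP-comments.csv",
    output_file=make_model_name("trp", params["w"], params["minf"],
                                params["epochs"], params["ndim"]),
)
print(setup["output_file"])  # trp_w4_f2_e100_d100.model
```

If you rename the output file this way, remember that the later cells load the model by path, so `modelpath` would need to change too.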


## Load Model and get biased words

We now run our method of finding biased words towards our target sets.

Given a vocabulary and two sets of target words (such as, in this case, those for *women* and *men*), we rank the words from least to most biased. As such, we obtain two ordered lists of the most biased words towards each target set, giving an overall view of the bias distribution in that particular community with respect to those two target sets. 

Here's what's happening in the next block of code:
- We calculate the centroid of a target set by averaging the embedding vectors in our target set (e.g. the vectors for `he, son, his, him, father, male` for our target concept `male`);
- We calculate the cosine distance between the vector of each word in our vocabulary and each of our two centroids (we also apply POS-filtering to only work with parts of speech we expect to be relevant);
- We use a threshold based on standard deviation to determine how severe a bias needs to be before we include it;
- We rank the words in the vocabulary of our Word Embeddings model based on their bias towards either target concept.
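
The first two steps can be sketched on a toy example. The two-dimensional vectors below are made up for illustration (real embeddings here have 100 dimensions), and `word_a` and `word_b` are placeholder words. The sign convention follows the notebook's code: a positive value means the word lies closer to target set 1.

```python
import math

def centroid(vectors):
    """Average a list of equal-length vectors component by component."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_distance(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Made-up vectors, for illustration only.
toy = {
    "she": [1.0, 0.1], "her": [0.9, 0.2],
    "he": [0.1, 1.0], "him": [0.2, 0.9],
    "word_a": [0.8, 0.3],
    "word_b": [0.3, 0.8],
}
c1 = centroid([toy["she"], toy["her"]])  # target set 1
c2 = centroid([toy["he"], toy["him"]])   # target set 2

def bias(word):
    """Positive: closer to target set 1; negative: closer to target set 2."""
    v = toy[word]
    return cosine_distance(c2, v) - cosine_distance(c1, v)

for w in ("word_a", "word_b"):
    print(w, round(bias(w), 3))
```

Here `word_a` comes out positive and `word_b` negative, so they would end up in the lists for target set 1 and target set 2 respectively.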


In [7]:
from gensim.models import Word2Vec
from gensim.test.utils import datapath, get_tmpfile
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec
from operator import itemgetter
from scipy import spatial
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('averaged_perceptron_tagger')
nltk.download('vader_lexicon')
import inflect
import numpy as np
import statistics
import json
import itertools

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from datetime import datetime
import statistics

def _calculate_centroid(model, wordlist):
    '''
    Calculate centroid of the wordlist list of words based on the model embedding vectors
    '''
    centr = np.zeros( len(model.wv[wordlist[0]]) )
    for w in wordlist:
        centr += np.array(model.wv[w])
    return centr/len(wordlist)

def _keep_only_model_words(model, words):
    aux = [ word for word in words if word in model.wv.vocab.keys()]
    return aux

def _get_word_freq(model, word):
    if word in model.wv.vocab:
        wm = model.wv.vocab[word]
        return [word, wm.count, wm.index]
    return None

def _get_model_min_max_rank(model):
    minF = 999999
    maxF = -1
    for w in model.wv.vocab:
        wm = model.wv.vocab[w] #wm.count, wm.index
        rank = wm.index
        if(minF>rank):
            minF = rank
        if(maxF<rank):
            maxF = rank
    return [minF, maxF]

sid = SentimentIntensityAnalyzer()
def _get_sentiment(word):
    return sid.polarity_scores(word)['compound']

def _normalise(val, minF, maxF):
    '''
    Normalises a value in the positive space
    '''
    #print(val, minF, maxF)
    if(maxF<0 or minF<0 or val<0):
        raise Exception('All values should be in the positive space. minf: {}, max: {}, freq: {}'.format(minF, maxF, val))
    if(maxF<= minF):
        # Report val here: the name freq is not defined inside this function
        raise Exception('Maximum frequency should be bigger than min frequency. minf: {}, max: {}, freq: {}'.format(minF, maxF, val))
    val -= minF
    val = val/(maxF-minF)
    return val

def _get_cosine_distance(wv1, wv2):
    return spatial.distance.cosine(wv1, wv2)

def _get_min_max(dict_value):
    l = list(dict_value.values())
    return [ min(l), max(l)]

def _find_stdev_threshold_sal(dwords, stdevs):
    '''
    dwords is a list of objects like {'word':w, 'bias':bias, 'biasW':biasW, 'freq':freq, 'rank':rank, 'rankW':rankW, 'sal':val, 'wv':wv, 'sent':sent }
    stdevs : number of standard deviations above the mean at which to set the threshold

    returns
    outlier_thr : the threshold corresponding to stdevs, considering salience values from the dwords object list
    '''
    allsal = []
    for obj in dwords:
        allsal.append(obj['sal'])
    stdev = statistics.stdev(allsal)
    outlier_thr = (stdev*stdevs)+sum(allsal)/len(allsal)
    return outlier_thr

def calculate_biased_words(model, targetset1, targetset2, stdevs, 
                         acceptedPOS = ['JJ', 'JJS', 'JJR','NN', 'NNS', 'NNP', 'NNPS','VB', 'VBG', 'VBD', 'VBN', 'VBP', 'VBZ' ], 
                         words = None, force=False):
    '''
    this function calculates the list of biased words towards targetset1 and targetset2 with salience > than the 
    specified number (stdevs) of standard deviations above the mean.

    targetset1 <list of strings> : target set 1
    targetset2 <list of strings> : target set 2
    stdevs int : Minimum threshold, in standard deviations, to select biased words
    acceptedPOS <list<str>> : accepted list of POS to consider for the analysis, as defined in NLTK POS tagging lib. 
                              If None, no POS filtering is applied and all words in the vocab are considered
    words list<str> : list of words we want to consider. If None, all words in the vocab are considered
    '''
    if(model is None):
        raise Exception("You need to define a model to estimate biased words.")
    if(targetset1 is None or targetset2 is None):
        raise Exception("Target sets are necessary to estimate biased words.")
    if(stdevs is None):
        raise Exception("You need to define a minimum threshold for standard deviation to select biased words.")
   
    tset1 = _keep_only_model_words(model, targetset1) # remove target set words that do not exist in the model
    tset2 = _keep_only_model_words(model, targetset2) # remove target set words that do not exist in the model

    # We remove words in the target sets, and also their plurals from the set of interesting words to process.
    engine = inflect.engine()
    toremove = targetset1 + targetset2 + [engine.plural(w) for w in targetset1] + [engine.plural(w) for w in targetset2]
    if(words is None):
        words = [w for w in model.wv.vocab.keys() if w not in toremove]

    # Calculate centroids 
    tset1_centroid = _calculate_centroid(model, tset1)
    tset2_centroid = _calculate_centroid(model, tset2)
    [minR, maxR] = _get_model_min_max_rank(model)

    # Get biases for words
    biasWF = {}
    biasWM = {}
    for i, w in enumerate(words):
        p = nltk.pos_tag([w])[0][1]
        if acceptedPOS is not None and p not in acceptedPOS:
            continue
        wv = model.wv[w]
        diff = _get_cosine_distance(tset2_centroid, wv) - _get_cosine_distance(tset1_centroid, wv)
        if(diff>0):
            biasWF[w] = diff
        else:
            biasWM[w] = -1*diff

    # Get min and max bias for both target sets, so we can normalise these values later
    [minbf, maxbf] = _get_min_max(biasWF)
    [minbm, maxbm] = _get_min_max(biasWM)

    # Iterate through all 'selected' words
    biased1 = []
    biased2 = []
    for i, w in enumerate(words):
        # Print('..Processing ', w)
        p = nltk.pos_tag([w])[0][1]
        if acceptedPOS is not None and p not in acceptedPOS:
            continue
        wv = model.wv[w]
        # Sentiment
        sent = _get_sentiment(w)
        # Rank and rank norm
        freq = _get_word_freq(model, w)[1]
        rank = _get_word_freq(model, w)[2]
        rankW = 1-_normalise(rank, minR, maxR) 

        # Normalise bias
        if(w in biasWF):
            bias = biasWF[w]
            biasW = _normalise(bias, minbf, maxbf)
            val = biasW * rankW
            biased1.append({'word':w, 'bias':bias, 'biasW':biasW, 'freq':freq, 'rank':rank, 'rankW':rankW, 'sal':val, 'wv':wv.tolist(), 'sent':sent } ) 
        if(w in biasWM):
            bias = biasWM[w]
            biasW = _normalise(bias, minbm, maxbm)
            val = biasW * rankW
            biased2.append({'word':w, 'bias':bias, 'biasW':biasW, 'freq':freq, 'rank':rank, 'rankW':rankW, 'sal':val, 'wv':wv.tolist(), 'sent':sent } ) 

    # Calculate the salience threshold for both word sets, and select the list of biased words (i.e., which words do we discard?)
    stdevs1_thr = _find_stdev_threshold_sal(biased1, stdevs)
    stdevs2_thr = _find_stdev_threshold_sal(biased2, stdevs)
    # biased1.sort(key=lambda x: x['sal'], reverse=True)
    b1_dict = {}
    for k in biased1:
        if(k['sal']>=stdevs1_thr):
            b1_dict[k['word']] = k
    # biased2.sort(key=lambda x: x['sal'], reverse=True)
    b2_dict = {}
    for k in biased2:
        if(k['sal']>=stdevs2_thr):
            b2_dict[k['word']] = k

    #transform centroids to lists so they become serializable
    tset1_centroid = tset1_centroid.tolist() 
    tset2_centroid = tset2_centroid.tolist()
    return [b1_dict, b2_dict]

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/tomvannuenen/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/tomvannuenen/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [8]:
modelpath = "trp_w4_f10_e100_d200.model"

model = Word2Vec.load(modelpath)

In [9]:
# get similar words
sims = model.wv.most_similar('women', topn=10)  # get other similar words
sims

[('men', 0.8873928785324097),
 ('people', 0.7244042158126831),
 ('they', 0.6698467135429382),
 ('girls', 0.669687032699585),
 ('females', 0.658444881439209),
 ('guys', 0.657557487487793),
 ('feminists', 0.6263457536697388),
 ('them', 0.5586285591125488),
 ('alphas', 0.5215171575546265),
 ('sluts', 0.4877395033836365)]

Here we create the two target sets, called `t1` and `t2`. These two lists are the ones you'll want to swap out if you are going to create your own target sets to find biases!

In [10]:
t1=["sister" , "female" , "woman" , "girl" , "daughter" , "she" , "hers" , "her"]
t2=["brother" , "male" , "man" , "boy" , "son" , "he" , "his" , "him"] 

[b1, b2] = calculate_biased_words(model, t1, t2, 4)
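
The last argument in the call above, `4`, is the `stdevs` value: a word is only kept if its salience is at least the mean salience plus four sample standard deviations, which is what `_find_stdev_threshold_sal` computes. The following sketch shows the same rule on made-up salience values; `select_salient` and the toy words are hypothetical names used for illustration.

```python
import statistics

def stdev_threshold(values, stdevs):
    """Mean plus `stdevs` sample standard deviations."""
    mean = sum(values) / len(values)
    return mean + stdevs * statistics.stdev(values)

def select_salient(salience, stdevs):
    """Keep the words whose salience reaches the threshold."""
    thr = stdev_threshold(list(salience.values()), stdevs)
    return [w for w, s in salience.items() if s >= thr]

# Nine unremarkable words and one clear outlier (made-up values).
toy_salience = {"w{}".format(i): 0.1 for i in range(9)}
toy_salience["outlier"] = 0.9

print(select_salient(toy_salience, 2))  # ['outlier']
print(select_salient(toy_salience, 4))  # []
```

On a list this small, a threshold of 4 standard deviations selects nothing. On a vocabulary of thousands of words it keeps only the most extreme ones, so lowering `stdevs` is the way to get longer lists of biased words.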

Let's print some biases. Here you see the most-biased words towards our target concepts (1 being *women*, 2 being *men*).

In [11]:
print('Biased words towards target set 1')
print( [w for w in b1.keys()] )
print()
print('Biased words towards target set 2')
print( [w for w in b2.keys()] )

Biased words towards target set 1
['fact', 'present', 'line', 'body', 'anxiety', 'chances', 'reality', 'single', 'text', 'past', 'slut', 'obvious', 'spinning', 'plates', 'pussy', 'special', 'begin', 'partner', 'current', 'everyone', 'media', 'hamster', 'lawyer', 'assume', 'conversation', 'chick', 'yours', 'chicks', 'hubby', 'status', 'answer', 'word', 'number', 'exclusivity', 'plate', 'ignoring', 'levels', 'increases', 'increasing', 'phone', 'hang', 'everybody', 'physical', 'bang', 'moms', 'disorder', 'size', 'floor', 'option', 'hb', 'neg', 'subconscious', 'stranger', 'delete', 'suggest', 'league', 'minimal']

Biased words towards target set 2
['sense', 'elon', 'pill', 'became', 'died', 'continues', 'proud', 'shot', 'wish', 'sat', 'killed', 'went', 'poor', 'bought', 'worked', 'felt', 'taught', 'acted', 'rock', 'hell', 'saved', 'despise', 'en', 'fixed', 'petty', 'turned', 'incel', 'useless', 'voice', 'father', 'teaches', 'teen', 'pushed', 'roosh', 'bravo', 'sacrifice', 'savage', 'star',

## Clustering similar words (K-means + silhouette)

Here, we group our language biases into more general clusters. We do so using the K-means clustering algorithm, and use silhouette scores to pick the number of clusters whose members are most consistent with each other. 

In general, this results in words with similar meanings being clustered together. Clustering makes the biased words easier to interpret, as their context becomes clearer. 
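
To make the silhouette score less of a black box, here is a plain-Python sketch of what it measures. For each point it compares the mean distance to its own cluster (`a`) with the mean distance to the nearest other cluster (`b`). The notebook itself relies on scikit-learn's `silhouette_score`; the function below is an illustrative reimplementation using Euclidean distance, and the points and labels are made up.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient of a labelling (illustrative sketch, needs >= 2 clusters)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to all other points, grouped by cluster label.
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:  # a cluster with a single member scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # mixes the two groups

print(round(silhouette(points, good), 3))
print(round(silhouette(points, bad), 3))
```

Scores close to 1 indicate compact, well-separated clusters, scores around 0 indicate overlapping clusters, and negative scores suggest points sit in the wrong cluster. Keep this scale in mind when reading the silhouette values printed by the clustering cell.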

In [12]:
import pandas as pd
import gensim
import nltk.data
import numpy as np
from scipy import spatial
from sklearn.cluster import KMeans
import sklearn

'''
TARGET SET 1
'''
t1_embeddings = [b1[w]['wv'] for w in b1] # t1 embeddings = list of embeddings of words biased towards target set 1
t1_words = [w for w in b1.keys()]

# Clustering
rangek = range(2, int((len(t1_embeddings)/2)-1) ) # Clusters should be min size 2 at max half of the amount of words (speeding up + forcing clusters)
print('[Testing', rangek, 'clusters]')
kmeans_p = [ KMeans(n_clusters=k).fit_predict(t1_embeddings) for k in rangek] 
kmeans_sil = [ sklearn.metrics.silhouette_score(t1_embeddings, labels) for labels in kmeans_p] 
print('[Silhouette values', kmeans_sil)
indexmaxsil =  kmeans_sil.index(max(kmeans_sil))
print('[Max silhouette, ', max(kmeans_sil), '; index_k: ',indexmaxsil,']')

# Aggregating all clusters from same index in list
clusters1 = {}
for i, index in enumerate(kmeans_p[indexmaxsil]): # returns list of cluster index, telling you which cluster each word belongs to 
    if(index in clusters1):
        clusters1[index].append(t1_words[i])
    else:
        clusters1[index]  = [t1_words[i]]
        
        
'''
TARGET SET 2
'''
t2_embeddings = [b2[w]['wv'] for w in b2]
t2_words = [w for w in b2.keys()]

# Clustering
rangek = range(2, int((len(t2_embeddings)/2)-1) )
print('[Testing', rangek, 'clusters]')
kmeans_p = [ KMeans(n_clusters=k).fit_predict(t2_embeddings) for k in rangek ] 
kmeans_sil = [ sklearn.metrics.silhouette_score(t2_embeddings, labels) for labels in kmeans_p] 
print('[Silhouette values', kmeans_sil)
indexmaxsil =  kmeans_sil.index(max(kmeans_sil))
print('[Max silhouette, ', max(kmeans_sil), '; index_k: ',indexmaxsil,']')

clusters2 = {}
for i, index in enumerate(kmeans_p[indexmaxsil]):
    if(index in clusters2):
        clusters2[index].append(t2_words[i])
    else:
        clusters2[index]  = [t2_words[i]]

[Testing range(2, 27) clusters]
[Silhouette values [0.030211519649646124, 0.03267810714196066, 0.028373644792074285, 0.027131390245210288, 0.01980438774302252, 0.01828340052848148, 0.015945987405179173, 0.016639705849289702, 0.02007767374622047, 0.025500601998201197, 0.018971702697745903, 0.030832368019154684, 0.0003182656101688742, 0.021037563641097927, 0.024730202101530794, 0.010006703597840498, 0.023152831775073393, 0.023732118453075567, 0.016065623264325243, 0.03220781650600742, 0.03176385814749919, 0.014371858993829782, 0.0324774790352782, 0.025859487599828663, 0.030716967389854063]
[Max silhouette,  0.03267810714196066 ; index_k:  1 ]
[Testing range(2, 20) clusters]
[Silhouette values [0.08735057469401863, 0.05134573494961164, 0.0438227465425197, 0.02552790196924445, 0.034608487421963846, 0.019085415887634006, 0.015774570977916694, -0.005412749571853527, 0.00258519443851502, 0.003950236859032627, -0.023845020032209455, -0.01699597673500568, 0.015540500867026168, 0.007145657344673

In [13]:
Y = clusters1.values()
Y

dict_values([['fact', 'single', 'text', 'slut', 'obvious', 'spinning', 'plates', 'pussy', 'hamster', 'conversation', 'chick', 'chicks', 'hubby', 'answer', 'word', 'number', 'plate', 'phone', 'hang', 'bang', 'moms', 'size', 'floor', 'option', 'hb', 'neg', 'delete', 'league'], ['present', 'line', 'body', 'anxiety', 'chances', 'reality', 'past', 'begin', 'partner', 'current', 'lawyer', 'assume', 'yours', 'status', 'exclusivity', 'ignoring', 'levels', 'increases', 'increasing', 'physical', 'disorder', 'subconscious', 'suggest', 'minimal'], ['special', 'everyone', 'media', 'everybody', 'stranger']])

Let's print the clusters we've got.

In [14]:
print('Clusters target set 1')
print( list( clusters1.values()) )        

print('Clusters target set 2')
print( list( clusters2.values()) )        

Clusters target set 1
[['fact', 'single', 'text', 'slut', 'obvious', 'spinning', 'plates', 'pussy', 'hamster', 'conversation', 'chick', 'chicks', 'hubby', 'answer', 'word', 'number', 'plate', 'phone', 'hang', 'bang', 'moms', 'size', 'floor', 'option', 'hb', 'neg', 'delete', 'league'], ['present', 'line', 'body', 'anxiety', 'chances', 'reality', 'past', 'begin', 'partner', 'current', 'lawyer', 'assume', 'yours', 'status', 'exclusivity', 'ignoring', 'levels', 'increases', 'increasing', 'physical', 'disorder', 'subconscious', 'suggest', 'minimal'], ['special', 'everyone', 'media', 'everybody', 'stranger']]
Clusters target set 2
[['sense', 'elon', 'pill', 'continues', 'proud', 'wish', 'killed', 'poor', 'rock', 'hell', 'saved', 'despise', 'en', 'fixed', 'petty', 'incel', 'useless', 'voice', 'father', 'teaches', 'teen', 'roosh', 'bravo', 'sacrifice', 'savage', 'star', 'lone', 'rollo', 'jp', 'daring'], ['became', 'died', 'shot', 'sat', 'went', 'bought', 'worked', 'felt', 'taught', 'acted', 't

In [16]:
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

labels = list(model.wv.vocab)   # words, in the same order as the rows of X
X = model.wv[labels]            # index through model.wv; model[...] is deprecated in Gensim 3
tsne = TSNE(perplexity=40, n_components=2, init='pca', n_iter=2500, random_state=23)
X_tsne = tsne.fit_transform(X)
x = X_tsne[:, 0]
y = X_tsne[:, 1]

# Plot every word in the vocabulary; for a large vocabulary, consider plotting a subset
plt.figure(figsize=(16, 16)) 
for i in range(len(x)):
    plt.scatter(x[i],y[i])
    plt.annotate(labels[i],
                  xy=(x[i], y[i]),
                  xytext=(5, 2),
                  textcoords='offset points',
                  ha='right',
                  va='bottom')
plt.show()

## Creating your own target sets

If you want, you can try to expose the biases of your own dataset. You can use the target sets defined below, but also create your own. For instance, if you want to see which words are biased towards the political left and right, you could create two target sets "Left" and "Right" with respective terms such as `left-wing, leftist, progressive`, and `right-wing, reactionary, conservative`. The more expansive and accurate you can make your target set, the better the system will work. 
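
Before running the analysis on a new target set, it helps to check which of your terms actually occur in the model's vocabulary, because `_keep_only_model_words` drops missing words silently. The sketch below uses a hypothetical helper, `check_target_set`, and a made-up vocabulary. With a trained Gensim 3 model you could pass `model.wv.vocab` instead.

```python
def check_target_set(targets, vocab):
    """Split a candidate target set into words found in the vocabulary and words missing from it."""
    kept = [w for w in targets if w in vocab]
    missing = [w for w in targets if w not in vocab]
    return kept, missing

# Made-up vocabulary, for illustration only.
toy_vocab = {"leftist", "progressive", "conservative", "reactionary", "left", "right", "wing"}

left = ["left-wing", "leftist", "progressive"]
right = ["right-wing", "reactionary", "conservative"]

for name, targets in (("Left", left), ("Right", right)):
    kept, missing = check_target_set(targets, toy_vocab)
    print(name, "kept:", kept, "missing:", missing)
```

Hyphenated terms such as `left-wing` are a likely source of missing words: the preprocessing splits text on non-letter characters, so they would appear in the vocabulary as `left` and `wing` rather than as one token.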

## Existing target sets - details

*Gender target sets taken from Nosek, Banaji, and Greenwald 2002.*

Female: `sister, female, woman, girl, daughter, she, hers, her`.

Male: `brother, male, man, boy, son, he, his, him`.


*Religion target sets taken from Garg et al. 2017.*

Islam: `allah, ramadan, turban, emir, salaam, sunni, koran, imam, sultan, prophet, veil, ayatollah, shiite, mosque, islam, sheik, muslim, muhammad`.

Christianity: `baptism, messiah, catholicism, resurrection, christianity, salvation, protestant, gospel, trinity, jesus, christ, christian, cross, catholic, church`.

*Racial target sets taken from Garg et al. 2017*

White last names: `harris, nelson, robinson, thompson, moore, wright, anderson, clark, jackson, taylor, scott, davis, allen, adams, lewis, williams, jones, wilson, martin, johnson`.

Hispanic last names: `ruiz, alvarez, vargas, castillo, gomez, soto, gonzalez, sanchez, rivera, mendoza, martinez, torres, rodriguez, perez, lopez, medina, diaz, garcia, castro, cruz`.

Asian last names: `cho, wong, tang, huang, chu, chung, ng, wu, liu, chen, lin, yang, kim, chang, shah, wang, li, khan, singh, hong`.

Russian last names: `gurin, minsky, sokolov, markov, maslow, novikoff, mishkin, smirnov, orloff, ivanov, sokoloff, davidoff, savin, romanoff, babinski, sorokin, levin, pavlov, rodin, agin`.


*Career/family target and attribute sets taken from Garg et al. 2017.*

Career: `executive, management, professional, corporation, salary, office, business, career`.

Family: `home, parents, children, family, cousins, marriage, wedding, relatives`.

Math: `math, algebra, geometry, calculus, equations, computation, numbers, addition`.


*Arts/Science target and attribute sets taken from Garg et al. 2017.*

Arts: `poetry, art, sculpture, dance, literature, novel, symphony, drama`.

Science: `science, technology, physics, chemistry, Einstein, NASA, experiment, astronomy`.

### Sources

Nosek, B. A., Banaji, M. R., & Greenwald, A. G. (2002). Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics, 6(1), 101–115. https://doi.org/10.1037/1089-2699.6.1.101

Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2017). Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes, 1–33.