
## Evaluating mean bias amplification

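**Definition:** Bias amplification measures how much more strongly a model's predictions on the evaluation/test set associate an object with a gender than the training data does.

Bias in the training set: $b^{*}(o, g)$\
Bias in the model's predictions on the test set: $\tilde b(o, g)$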

If $o$ is positively correlated with $g$ (i.e., $b^{*}(o, g) > 1/||G||$) and $\tilde b(o, g)$ is larger than $b^{*}(o, g)$, we say that bias has been amplified. For example, if $b^{*}(\text{cooking}, \text{woman}) = 0.66$ and $\tilde b(\text{cooking}, \text{woman}) = 0.84$, then the bias of woman toward cooking has been amplified.
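The notebook does not spell out how a bias score is computed. Assuming the co-occurrence-ratio definition from the original bias amplification work (Zhao et al., 2017), the score is the fraction of an object's gendered co-occurrences that involve gender $g$: $b(o,g) = c(o,g) / \sum_{g' \in G} c(o,g')$. The minimal sketch below applies that definition to the cooking example above; the function names and the counts are hypothetical, chosen only to reproduce the 0.66 and 0.84 values.

```python
def bias_score(counts: dict, obj: str, gender: str) -> float:
    """b(o, g) = c(o, g) / sum over g' of c(o, g'); counts maps (object, gender) -> count."""
    total = sum(c for (o, _), c in counts.items() if o == obj)
    return counts.get((obj, gender), 0) / total if total else 0.0


def is_amplified(b_train: float, b_test: float, num_genders: int = 2) -> bool:
    """Amplified if the pair is biased in training (b* > 1/|G|) and the test bias is larger."""
    return b_train > 1 / num_genders and b_test > b_train


# Hypothetical counts: 66 of 100 training co-occurrences of "cooking" involve a woman,
# versus 84 of 100 in the model's test-set predictions.
train_counts = {("cooking", "woman"): 66, ("cooking", "man"): 34}
test_counts = {("cooking", "woman"): 84, ("cooking", "man"): 16}

b_star = bias_score(train_counts, "cooking", "woman")   # 0.66
b_tilde = bias_score(test_counts, "cooking", "woman")   # 0.84
print(b_star, b_tilde, is_amplified(b_star, b_tilde))
```

Objects with no gendered co-occurrences get a score of 0.0 here rather than raising a division error; that choice is an assumption of the sketch.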


**Mean bias amplification:**

$$\frac{1}{|O|}\sum_{g}\sum_{o \in \{o \in O \mid b^{*}(o,g) > 1/||G||\}} \left(\tilde b(o,g) - b^{*}(o,g)\right)$$


This score estimates the average magnitude of bias amplification over the object-gender pairs $(o, g)$ that exhibited bias in the training set.

Since we treat gender as binary, $G = \{\text{man}, \text{woman}\}$ and $||G|| = 2$, so a pair counts as biased when $b^{*}(o, g) > 0.5$.
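The mean bias amplification formula can be sketched in Python as follows. This is a hedged illustration, not the notebook's own `bias_amplification` function: it assumes bias scores are stored as `{gender: {object: score}}`, mirroring the per-gender `train_man_dict` / `train_woman_dict` loaded later, and takes `num_objects` as $|O|$. The function name and the example numbers (other than the cooking values) are hypothetical.

```python
def mean_bias_amplification(train_bias, test_bias, num_objects, num_genders=2):
    """(1/|O|) * sum over g, and over o with b*(o, g) > 1/|G|, of (b~(o, g) - b*(o, g)).

    train_bias / test_bias: {gender: {object: bias score}}.
    num_objects: |O|, the size of the full object vocabulary.
    """
    threshold = 1 / num_genders
    total = 0.0
    for gender, objects in train_bias.items():
        for obj, b_star in objects.items():
            # Only pairs biased in training contribute; pairs missing from the
            # test predictions are skipped (an assumption of this sketch).
            if b_star > threshold and obj in test_bias.get(gender, {}):
                total += test_bias[gender][obj] - b_star
    return total / num_objects


# Hypothetical scores: cooking/woman is amplified (+0.18), tennis/man is reduced (-0.02).
train_bias = {"woman": {"cooking": 0.66}, "man": {"tennis": 0.92}}
test_bias = {"woman": {"cooking": 0.84}, "man": {"tennis": 0.90}}
score = mean_bias_amplification(train_bias, test_bias, num_objects=2)
print(score)  # approximately 0.08
```

The divisor is $|O|$, the size of the whole object vocabulary, not the number of biased pairs, and a pair whose bias shrinks on the test set contributes a negative term that lowers the mean.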

In [2]:
from amp_utils import *
from bias_analysis import *
from pprint import pprint
import glob

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\parva\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\parva\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


## TO DO
- Bias amplification without balancing
- Bias amplification with balancing

In [2]:
# Sample captions used to test the bias amplification computation
test_captions = [
    'a group of men playing a game of baseball .',
    'a man holding a tennis racquet on a tennis court .',
    'a baseball player holding a bat near home plate .',
]

### Calculating bias amplification for the initial unbalanced dataset

Load the training bias dictionaries $b^{*}(o,g)$ for nouns:

In [3]:
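# Training bias b*(o, g) for nouns, one dictionary per gender
train_man_dict, train_woman_dict = get_bias_dict('nouns')
pprint(train_man_dict)
pprint(train_woman_dict)

# Mean bias amplification of the test captions relative to the training bias
mean_bias_amp = bias_amplification(
    [],
    test_captions=test_captions,
    train_dict_man=train_man_dict,
    train_dict_woman=train_woman_dict,
)
# print(mean_bias_amp)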

./training bias\female_nouns.txt
./training bias\male_nouns.txt
{'bags': 0.8333,
 'beach': 0.8529,
 'beer': 0.8889,
 'carriage': 0.8889,
 'dirt': 0.8947,
 'grass': 0.8889,
 'kite': 0.875,
 'road': 0.84,
 'skateboard': 0.8919,
 'sunglasses': 0.8333,
 'surfboard': 0.8333,
 'tennis': 0.9231,
 'wagon': 0.8333}
{'bed': 0.9048,
 'bridle': 0.8333,
 'curb': 0.8,
 'device': 0.8,
 'dress': 0.9286,
 'face': 0.7778,
 'fire': 0.8182,
 'flowers': 0.875,
 'lap': 0.9091,
 'leash': 0.7778,
 'mouth': 0.8571,
 'teeth': 0.8}
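`bias_amplification` comes from the star imports at the top of the notebook, and its code is not shown here, so how it derives $\tilde b(o,g)$ from `test_captions` is not visible. As a hedged illustration only, the sketch below estimates test-set bias by counting, for each object word, how often it appears in captions that mention exactly one gender. The gender word lists, whitespace tokenization, and function names are all hypothetical assumptions, not the notebook's method.

```python
from collections import defaultdict

# Hypothetical gender word lists; the lists used by amp_utils / bias_analysis may differ.
MALE_WORDS = {"man", "men", "boy", "boys", "he", "his", "male", "gentleman"}
FEMALE_WORDS = {"woman", "women", "girl", "girls", "she", "her", "female", "lady"}


def caption_gender(tokens):
    """Return 'man' or 'woman' if only one gender's words appear, else None."""
    has_male = bool(MALE_WORDS & tokens)
    has_female = bool(FEMALE_WORDS & tokens)
    if has_male != has_female:
        return "man" if has_male else "woman"
    return None


def estimate_test_bias(captions, objects):
    """Estimate b~(o, g) as {gender: {object: score}} from captions for the given objects."""
    counts = defaultdict(lambda: {"man": 0, "woman": 0})
    for caption in captions:
        tokens = set(caption.lower().split())
        gender = caption_gender(tokens)
        if gender is None:
            continue  # skip captions with no gender words or with both genders
        for obj in set(objects) & tokens:
            counts[obj][gender] += 1
    bias = {"man": {}, "woman": {}}
    for obj, c in counts.items():
        total = c["man"] + c["woman"]
        for g in ("man", "woman"):
            bias[g][obj] = c[g] / total
    return bias


test_captions = [
    "a group of men playing a game of baseball .",
    "a man holding a tennis racquet on a tennis court .",
    "a baseball player holding a bat near home plate .",
]
bias = estimate_test_bias(test_captions, {"tennis", "baseball"})
print(bias)
```

With the notebook's three test captions, "tennis" and "baseball" each co-occur only with male words, so both get $\tilde b(o, \text{man}) = 1.0$; the third caption contains no gender word and is ignored.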


Get all image IDs for the balanced set of biased terms.

In [15]:
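# os.mkdir('trainfolder/')
# Collect image ids from every activity-noun file (one id per line)
files = glob.glob("activity_nouns/*.txt")
ids = []
for file in files:
    with open(file) as f:
        # strip() removes the newline safely, even on a last line without one
        ids.extend(int(line.strip()) for line in f if line.strip())

# Deduplicate ids and exclude image 113159 before sampling
captions, train_set = get_traincaptions(set(ids) - {113159}, len(set(ids)) - 1)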

Getting samples:  548 nos.
[112769, 559062, 507167, 25864, 103413]
537
Copied sampled images to trainfolder/
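The `ids` list printed by the next cell contains repeated IDs (for example 239351, 62483 and 12754 each appear twice), which is why the sampling call deduplicates with `set(ids)`. A quick way to list such repeats is sketched below with `collections.Counter`; `duplicate_ids` is a hypothetical helper, and `sample` is just a handful of IDs taken from that list.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return {id: count} for every id that occurs more than once."""
    return {i: n for i, n in Counter(ids).items() if n > 1}


# A few ids taken from the list printed below
sample = [279806, 239351, 62483, 12754, 239351, 62483, 12754]
print(duplicate_ids(sample))  # {239351: 2, 62483: 2, 12754: 2}
```

In the notebook itself, `duplicate_ids(ids)` would report every repeated ID across the activity-noun files.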


In [16]:
ids

[279806,
 499177,
 370165,
 118965,
 490847,
 139169,
 504498,
 84447,
 538230,
 29187,
 29776,
 26209,
 135690,
 71918,
 175737,
 102331,
 400853,
 4069,
 303738,
 290170,
 128978,
 22816,
 106375,
 382554,
 148843,
 31984,
 511463,
 239387,
 239351,
 170147,
 552532,
 540372,
 520964,
 293505,
 560626,
 52853,
 97006,
 354241,
 415153,
 573088,
 489944,
 3209,
 15956,
 104691,
 49068,
 371283,
 422755,
 380552,
 154090,
 43331,
 62483,
 12754,
 281766,
 328,
 304379,
 49942,
 98656,
 44081,
 194499,
 279521,
 34299,
 260478,
 514826,
 125661,
 239351,
 109454,
 156045,
 163682,
 555361,
 571364,
 111683,
 90778,
 537376,
 293888,
 343852,
 573223,
 561256,
 62483,
 150144,
 48674,
 12754,
 259085,
 441247,
 138488,
 509192,
 430525,
 519542,
 481386,
 459680,
 141197,
 48044,
 32947,
 419828,
 228604,
 217440,
 273132,
 505471,
 443835,
 191613,
 32780,
 52087,
 487338,
 45976,
 378163,
 191691,
 555669,
 302292,
 442549,
 488240,
 564404,
 501919,
 318022,
 549450,
 363181,
 381134,