Remove problematic gender bias from word embeddings.

Debiaswe: try to make word embeddings less sexist

🔴FAT* 2018 tutorial slides

Here we have the code and data for the following paper: "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" by Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai, Proceedings of NIPS 2016.

Just looking to download a debiased embedding?

You can download a hard-debiased version, in binary or txt format, of Google's word2vec embedding trained on Google News (original: GoogleNews-vectors-negative300.bin.gz, found here).
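The downloaded .bin file uses the word2vec binary layout. If you want to inspect it without extra dependencies, here is a minimal NumPy sketch of a reader and writer. It is not the repository's own loader. It assumes the common layout: an ASCII header line "<vocab_size> <dim>", then each word followed by a single space and dim little-endian float32 values. The names load_word2vec_bin and save_word2vec_bin are hypothetical.

```python
import numpy as np


def load_word2vec_bin(path):
    """Read a word2vec binary file into (words, vectors).

    Assumed layout: header "<vocab_size> <dim>\n", then for each word its
    UTF-8 bytes, one space, and <dim> little-endian float32 values.
    """
    words = []
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        vectors = np.empty((vocab_size, dim), dtype=np.float32)
        for i in range(vocab_size):
            chars = []
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch != b"\n":  # skip the newline some writers put after each vector
                    chars.append(ch)
            words.append(b"".join(chars).decode("utf-8", errors="replace"))
            # Read exactly dim float32 values; a short read raises a shape error.
            vectors[i] = np.frombuffer(f.read(4 * dim), dtype="<f4")
    return words, vectors


def save_word2vec_bin(path, words, vectors):
    """Write (words, vectors) in the same assumed word2vec binary layout."""
    vectors = np.asarray(vectors, dtype="<f4")
    with open(path, "wb") as f:
        f.write(f"{len(words)} {vectors.shape[1]}\n".encode("ascii"))
        for word, vec in zip(words, vectors):
            f.write(word.encode("utf-8") + b" " + vec.tobytes() + b"\n")
```

For example: `words, vectors = load_word2vec_bin("GoogleNews-vectors-negative300-hard-debiased.bin")`. Reading byte by byte keeps the sketch simple, but it is slow on the full Google News vocabulary. gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` is a faster, well-tested alternative.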

Python scripts:

  • learn_gender_specific.py: given a word embedding and a seed set of gender-specific words (e.g., king, she), it learns a much larger list of gender-specific words.
  • debias.py: given a word embedding, a set of definitional gender pairs, a list of gender-specific words, and a set of pairs to equalize, it outputs a new word embedding. This version reads and writes the word2vec binary file format.

Example usage:

python learn_gender_specific.py ../embeddings/GoogleNews-vectors-negative300.bin 50000 ../data/gender_specific_seed.json gender_specific_full.json
python debias.py ../embeddings/GoogleNews-vectors-negative300.bin ../data/definitional_pairs.json ../data/gender_specific_full.json ../data/equalize_pairs.json ../embeddings/GoogleNews-vectors-negative300-hard-debiased.bin
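debias.py produces the hard-debiased embedding. The paper describes hard debiasing as two steps. The first step, neutralize, removes each gender-neutral word's component along the gender direction. The second step, equalize, places each pair so that both words are equidistant from every neutralized word. Below is a minimal NumPy sketch of those two steps, assuming a single gender direction g and an in-memory dict from word to vector. It is an illustration of the technique, not a copy of debias.py. The function hard_debias and its argument names are hypothetical.

```python
import numpy as np


def _unit(v):
    """Scale v to unit length."""
    return v / np.linalg.norm(v)


def _drop(v, g):
    """Remove the component of v along the unit vector g."""
    return v - np.dot(v, g) * g


def hard_debias(embedding, g, gender_specific, equalize_pairs):
    """Return a new dict of unit vectors after neutralize + equalize (sketch).

    embedding: dict word -> vector (assumed in-memory format, left unmodified)
    g: gender direction (normalized here)
    gender_specific: words the neutralize step leaves untouched
    equalize_pairs: (female, male) pairs made symmetric about the neutral subspace
    """
    g = _unit(np.asarray(g, dtype=float))
    specific = set(gender_specific)
    out = {w: _unit(np.asarray(v, dtype=float)) for w, v in embedding.items()}

    # Neutralize: zero the gender component of every word not listed as gender-specific.
    for w in out:
        if w not in specific:
            out[w] = _unit(_drop(out[w], g))

    # Equalize: share the pair's gender-neutral part y, and split the remaining
    # length symmetrically along g so both vectors stay unit length.
    for a, b in equalize_pairs:
        if a not in out or b not in out:
            continue  # skip pairs missing from the vocabulary
        y = _drop((out[a] + out[b]) / 2, g)
        z = np.sqrt(max(0.0, 1.0 - np.dot(y, y)))
        if np.dot(out[a] - out[b], g) < 0:
            z = -z  # keep each word on its original side of the neutral subspace
        out[a] = z * g + y
        out[b] = -z * g + y
    return out
```

In this sketch, any word missing from gender_specific is neutralized. A word such as "king" that should keep its gender component must therefore be listed there. That is the role the expanded list from learn_gender_specific.py plays in the commands above.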

We also include the seed data used for debiasing and the crowdsourced data used to evaluate the embeddings.

Data files:

  • gender_specific_seed.json: A list of 218 gender-specific words
  • gender_specific_full.json: A list of 1441 gender-specific words
  • definitional_pairs.json: The ten pairs of words we use to define the gender direction
  • equalize_pairs.json: Crowdsourced female-male (F-M) word pairs that represent the gender direction
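The definitional pairs define the gender direction. In the paper this is done with principal component analysis: each pair is centered on its own mean, and the top principal component of the centered vectors is taken as the gender direction. Below is a minimal NumPy sketch of that step. It assumes the embedding is a dict from word to (ideally unit-length) vector. It also assumes definitional_pairs.json holds a list of two-word lists, which you could load with `json.load`. The helper gender_direction is hypothetical, not part of the repository's API.

```python
import numpy as np


def gender_direction(embedding, definitional_pairs):
    """Estimate a single gender direction from pairs such as ("she", "he").

    embedding: dict mapping word -> numpy vector (assumed in-memory format).
    Returns a unit vector oriented from the second word of each pair toward
    the first, on average.
    """
    used, rows = [], []
    for a, b in definitional_pairs:
        if a not in embedding or b not in embedding:
            continue  # skip pairs missing from the vocabulary
        center = (embedding[a] + embedding[b]) / 2
        rows.append(embedding[a] - center)
        rows.append(embedding[b] - center)
        used.append((a, b))
    if not rows:
        raise ValueError("no definitional pair found in the embedding")

    # Each pair contributes +d and -d, so the rows already have zero mean and
    # the top right singular vector is the first principal component.
    _, _, vt = np.linalg.svd(np.array(rows), full_matrices=False)
    g = vt[0]

    # A principal component's sign is arbitrary; fix it using the pair order.
    if sum(np.dot(embedding[a] - embedding[b], g) for a, b in used) < 0:
        g = -g
    return g
```

Using SVD on the already-centered rows avoids a scikit-learn dependency. The result is the same first component a PCA would report, up to sign, which the last step pins down.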

🔵 This is only a partial repo at the moment. I will add more features as I get time.

(All external files that I refer to within this repo can be found in this folder.)
