# Operations on word vectors

Welcome to your second programming assignment of week 1. You are going to apply operations to word vectors in order to assess how useful they are. In the last part, you are going to reduce the gender bias encoded in these word vectors.

**After this assignment you will be able to:**
- Apply word vectors to operations like “king” - “queen”
- Code and understand cosine similarity
- Understand word analogies
- Encode concepts such as "gender" in a vector
- Debias word vectors using a series of projections

Let's get started! Run the following cell to load the packages you are going to use.

In [None]:
import numpy as np
import tensorflow as tf
from w2v_utils import *

Let's also load the word vectors from an embedding matrix that maps every English word (from a wide vocabulary) to a vector. In this notebook, we chose to use 50-dimensional GloVe vectors to represent our words. Run the following cell to load the `word_to_vec_map`, which contains all the vector representations.

In [None]:
words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

You've loaded:
- `words`: list of all the words from the vocabulary.
- `word_to_vec_map`: dictionary mapping words to their GloVe vector representation.
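
For intuition about what these two objects look like, here is a minimal sketch of how a GloVe-style text file could be turned into them. It assumes the usual GloVe text layout (one word per line, followed by its space-separated vector components). `parse_glove_lines` is a hypothetical helper written for illustration, not the `read_glove_vecs` function from `w2v_utils`, and the three 3-dimensional vectors are made up.

```python
import numpy as np

def parse_glove_lines(lines):
    """Hypothetical helper: build (words, word_to_vec_map) from GloVe-style lines."""
    words = []
    word_to_vec_map = {}
    for line in lines:
        parts = line.strip().split()
        if not parts:
            continue  # skip empty lines
        word = parts[0]
        words.append(word)
        # The remaining fields are the vector components.
        word_to_vec_map[word] = np.array([float(x) for x in parts[1:]])
    return words, word_to_vec_map

# Made-up 3-dimensional vectors, only to show the layout.
toy_lines = [
    "king 0.5 0.1 -0.3",
    "queen 0.4 0.2 -0.1",
    "apple -0.2 0.7 0.0",
]
toy_words, toy_map = parse_glove_lines(toy_lines)
print(toy_words)
print(toy_map["king"].shape)
```

With the real file, each vector would have 50 components instead of 3.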

You've seen that one-hot vectors are not comparable in terms of word meaning. Now that you have word representations, the question is: how do you define similarity between two word vectors?

## 1 - Cosine similarity

We need a way to measure the similarity between two vectors, and cosine similarity is one such measure. Mathematically, for two vectors $u$ and $v$,

$$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{||u||_2 \, ||v||_2} = \cos(\theta) \tag{1}$$

where $u \cdot v$ is the dot (or scalar) product of the two vectors, $||u||_2$ is the magnitude of the vector $u$, and $\theta$ is the angle between $u$ and $v$. The similarity depends only on the angle between $u$ and $v$, not on their lengths, as you can see in the figure below.

<img src="images/cosine_sim.png" style="width:800px;height:250px;">
<caption><center> **Figure 1**: The cosine of the angle between word vectors is a good metric to define their similarity</center></caption>

**Exercise**: Implement the function `cosine_similarity()` to evaluate similarity between word vectors.

**Reminder**: The magnitude of $u$ is defined by $||u||_2 = \sqrt{\sum_{i=1}^{n} u_i^2}$
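
Before working on real word vectors, it may help to check formula (1) on a few hand-picked 2-dimensional vectors. The sketch below is only an illustration: `cosine_sim_sketch` is a hypothetical helper name, the vectors are made up, and the function assumes neither input is the zero vector.

```python
import numpy as np

def cosine_sim_sketch(u, v):
    """Cosine of the angle between u and v (illustrative; assumes non-zero vectors)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 0.0])
same_direction = np.array([3.0, 0.0])   # a scaled copy of a
perpendicular = np.array([0.0, 2.0])
opposite = np.array([-1.0, 0.0])

print(cosine_sim_sketch(a, same_direction))  # 1.0: same direction, length is ignored
print(cosine_sim_sketch(a, perpendicular))   # 0.0: orthogonal vectors
print(cosine_sim_sketch(a, opposite))        # -1.0: opposite directions
```

Note that scaling a vector changes its L2 distance to other vectors but leaves the cosine similarity unchanged.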

In [None]:
# GRADED FUNCTION: cosine_similarity

def cosine_similarity(u, v):
    """
    Similarity metric defined by the formula above
    
    Arguments:
    u -- a word vector of shape (n,)
    v -- a word vector of shape (n,)
    
    Returns:
    cosine_similarity -- the cosine similarity between u and v defined by the formula above.
    """
    
    distance = 0.0
    
    ### START CODE HERE ###
    None
    ### END CODE HERE ###
    
    return cosine_similarity

In [None]:
father = word_to_vec_map["father"]
mother = word_to_vec_map["mother"]
ball = word_to_vec_map["ball"]
crocodile = word_to_vec_map["crocodile"]
france = word_to_vec_map["france"]
italy = word_to_vec_map["italy"]
paris = word_to_vec_map["paris"]
rome = word_to_vec_map["rome"]

print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother))
print("cosine_similarity(ball, crocodile) = ", cosine_similarity(ball, crocodile))
print("cosine_similarity(france - paris, rome - italy) = ", cosine_similarity(france - paris, rome - italy))

**Expected Output**:

<table>
    <tr>
        <td>
            **cosine_similarity(father, mother)** =
        </td>
        <td>
         0.890903844289
        </td>
    </tr>
        <tr>
        <td>
            **cosine_similarity(ball, crocodile)** =
        </td>
        <td>
         0.274392462614
        </td>
    </tr>
        <tr>
        <td>
            **cosine_similarity(france - paris, rome - italy)** =
        </td>
        <td>
         -0.675147930817
        </td>
    </tr>
</table>

Once you get the expected output, feel free to modify the test cell above to try the cosine similarity with your own words.

## 2 - Word analogy task

In the word analogy task, we complete the sentence <font color='brown'>"*a* is to *b* as *c* is to **____**"</font>. An example is <font color='brown'> '*man* is to *woman* as *king* is to *queen*' </font>. Mathematically, we are trying to find a word *d*, such that the associated word vectors $v_a, v_b, v_c, v_d$ are related in the following manner: $v_b - v_a$ is most similar to $v_d - v_c$. This can also be written as finding the word *d* such that $v_d$ is most similar to the "combined vector" $v_b - v_a + v_c$.
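
To make the "combined vector" idea concrete, here is a small sketch on a made-up 2-dimensional vocabulary. The vectors are invented so that the analogy works out exactly, and `toy_map`, `cosine` and `toy_analogy` are hypothetical names used only for this illustration.

```python
import numpy as np

# Made-up 2-D "embeddings": the second coordinate plays the role of gender.
toy_map = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.0]),
    "queen": np.array([3.0, 1.0]),
    "apple": np.array([-1.0, 2.0]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def toy_analogy(a, b, c, vec_map):
    """Return the word d whose vector is most similar to v_b - v_a + v_c."""
    target = vec_map[b] - vec_map[a] + vec_map[c]
    # The three input words are excluded, as in the exercise.
    candidates = [w for w in vec_map if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vec_map[w], target))

print(toy_analogy("man", "woman", "king", toy_map))  # queen
```

Real GloVe vectors will not line up this neatly, which is why the task searches for the most similar word rather than an exact match.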

**Exercise**: Complete the code below to be able to perform word analogies!

In [None]:
# GRADED FUNCTION: complete_analogy

def complete_analogy(w1, w2, w3):
    """
    Performs the word analogy task as explained above.
    
    Arguments:
    w1 -- a word, string
    w2 -- a word, string
    w3 -- a word, string
    
    Returns:
    best_word --  the word such that best_word is most similar to  w2 - w1 + w3
    """
    
    # convert words to lower case
    w1, w2, w3 = w1.lower(), w2.lower(), w3.lower()
    
    ### START CODE HERE ### (approx. 1 line)
    # Compute the combined vector
    combined_vector = None
    ### END CODE HERE ###
    
    max_cosine_sim = -1                # Initialize max_cosine_sim to the minimum possible cosine similarity
    best_word = None                   # Initialize best_word, it will help keep track of the word to output

    # loop over the whole word vector set
    for w in words:
        
        # skip the input words so that best_word cannot be w1, w2 or w3
        if w in [w1, w2, w3]:
            continue
        
        ### START CODE HERE ### (approx. 4 lines)
        # Compute cosine similarity between the combined_vector and the current word
        cosine_sim = None
        
        # If the cosine_sim is more than the max_cosine_sim seen so far,
        # then: set the new max_cosine_sim to the current cosine_sim and the best_word to the current word
        None
        ### END CODE HERE ###
        
    return best_word

Run the cell below to test your code. This may take 1-2 minutes.

In [None]:
triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('large', 'larger', 'small')]
for triad in triads_to_try:
    print ('{} -> {} :: {} -> {}'.format( *triad, complete_analogy(*triad)))

**Expected Output**:

<table>
    <tr>
        <td>
            **italy -> italian** ::
        </td>
        <td>
         spain -> spanish
        </td>
    </tr>
        <tr>
        <td>
            **india -> delhi** ::
        </td>
        <td>
         japan -> tokyo
        </td>
    </tr>
        <tr>
        <td>
            **man -> woman** ::
        </td>
        <td>
         boy -> girl
        </td>
    </tr>
        <tr>
        <td>
            **large -> larger** ::
        </td>
        <td>
         small -> smaller
        </td>
    </tr>
</table>

Once you get the expected output, feel free to modify the test cell above to try your own analogies.

## 3 - Debiasing word vectors

In the following exercise, you will look at the implications of using a particular training dataset. You will first compute the vector $v_1 - v_2$, where $v_1$ is the word vector for *woman* and $v_2$ is the word vector for *man*. The resulting vector roughly encodes the concept of "gender".

The code below encodes the meaning of "gender" in a vector by taking the difference between word vectors of "woman" and "man".

In [None]:
gender = word_to_vec_map['woman'] - word_to_vec_map['man']

Now, you will consider the cosine similarity of different words with the constructed *gender* vector. Consider what a positive value of similarity means vs a negative cosine similarity. 

In [None]:
print ('List of names and their similarities with constructed vector:')

# girls' and boys' names
name_list = ['john', 'marie', 'sophie', 'ronaldo', 'priya', 'rahul', 'danielle', 'reza', 'katy', 'yasmin']

for w in name_list:
    print (w, cosine_similarity(word_to_vec_map[w], gender))

As you can see, female first names tend to have a positive cosine similarity with our constructed *gender* vector while male first names tend to have a negative cosine similarity. This is not surprising, and it is not a bias. Let's try with other words.

In [None]:
print('Other words and their similarities:')
word_list = ['lipstick', 'guns', 'science', 'arts', 'literature', 'warrior','doctor', 'tree', 'receptionist', 
             'technology',  'fashion', 'teacher', 'engineer', 'pilot', 'computer', 'singer']
for w in word_list:
    print (w, cosine_similarity(word_to_vec_map[w], gender))

Do you notice anything surprising? These results reflect gender stereotypes that exist in real life and in the text the vectors were trained on. For example, "computer" is closer to "man" while "literature" is closer to "woman".

The dataset you train your word vectors on strongly shapes what they encode, so choose it with care. You will now remove the gender bias of some of these words.

Note that some word pairs, such as "actor"/"actress" or "grandmother"/"grandfather", should remain gender-specific, while other words, such as "receptionist" or "scientist", should be neutralized, i.e. not be gender-related.

You have to treat these two types of words differently when debiasing.

### 3.1 - Neutralize bias for non-gender specific words

The following figure should help you visualize what neutralizing does.

<img src="images/neutralize_kiank.png" style="width:800px;height:300px;">
<caption><center> **Figure 2**: The word vector for "receptionist" represented before and after applying the neutralize operation. </center></caption>

**Exercise**: Implement `neutralize()` to remove the bias of words such as "receptionist" or "scientist".

**Reminder**: a vector $u$ can be split into two parts: its projection onto the bias axis $v_B$ and its projection onto the subspace orthogonal to $v_B$:
$$u = u_B + u_{\perp}$$
where $u_B = \frac{u \cdot v_B}{||v_B||_2^2} \, v_B$ and $u_{\perp} = u - u_B$
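
As a quick sanity check of this decomposition, the sketch below applies it to made-up 2-dimensional vectors. The values of `u` and `v_B` are invented for illustration, and the axis is deliberately not unit length to show why the squared norm appears in the denominator.

```python
import numpy as np

u = np.array([3.0, 4.0])      # made-up "word vector"
v_B = np.array([2.0, 0.0])    # made-up bias axis (need not be unit length)

u_B = (np.dot(u, v_B) / np.dot(v_B, v_B)) * v_B   # component along the bias axis
u_perp = u - u_B                                  # component orthogonal to it

print(u_B)                   # [3. 0.]
print(u_perp)                # [0. 4.]
print(np.dot(u_perp, v_B))   # 0.0
```

The last line confirms that what remains after removing $u_B$ has no component along the bias axis, which is the property neutralizing relies on.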

In [None]:
def neutralize(word, bias_axis):
    """
    Removes the bias of "word" by projecting it on the space orthogonal to the bias axis. 
    This function ensures that gender neutral words are zero in the gender subspace.
    
    Arguments:
    word -- string indicating the word to debias
    bias_axis -- numpy-array of shape (50,), vector corresponding to the bias axis, e.g. gender
    """
    
    ### START CODE HERE ###
    None
    ### END CODE HERE ###
    
    return u

In [None]:
word = "receptionist"
bias = gender
print("cosine similarity between " + word + " and gender, before neutralizing: ", cosine_similarity(word_to_vec_map["receptionist"], gender))

v = neutralize("receptionist", gender)
print("cosine similarity between " + word + " and gender, after neutralizing: ", cosine_similarity(v, gender))

**Expected Output**:

<table>
    <tr>
        <td>
            **cosine similarity between receptionist and gender, before neutralizing:**
        </td>
        <td>
         0.330779417506
        </td>
    </tr>
        <tr>
        <td>
            **cosine similarity between receptionist and gender, after neutralizing:**
        </td>
        <td>
         -3.26732746085e-17
        </td>
    </tr>
</table>

### 3.2 - Equalize bias for gender-specific words

Now, you will debias gender-specific words using a technique called equalization.

Equalization is applied to pairs of words which should differ in meaning only because of their gender properties. For example, "businessman" and "businesswoman" should have the same vector representation in the space orthogonal to the *gender* space. They should differ only in the *gender* space.

Equalizing can be carried out in 6 steps as explained in the figure below.

<img src="images/equalize_kiank1.png" style="width:800px;height:300px;"> <br>
<img src="images/equalize_kiank2.png" style="width:800px;height:300px;"> <br>
<img src="images/equalize_kiank3.png" style="width:800px;height:300px;">
<caption><center> **Figure 3**: The 6 steps to carry out equalizing in order to debias gender-specific words. </center></caption>
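
The formulas live in the figure and are not reproduced in the text, so the following is only a sketch of the general idea on made-up 3-dimensional vectors. It follows one reading of the equalize step in Bolukbasi et al. (2016), which assumes unit-length embeddings. The helper names `project` and `equalize_sketch`, the toy vectors, and the use of `abs` as a guard for inputs that are not unit length are all assumptions, and the figure's formulas may differ in detail.

```python
import numpy as np

def project(u, axis):
    """Projection of u onto the direction given by axis."""
    return (np.dot(u, axis) / np.dot(axis, axis)) * axis

def equalize_sketch(u1, u2, axis):
    """Illustrative equalization of two vectors with respect to a bias axis."""
    mu = (u1 + u2) / 2.0                 # midpoint of the pair
    mu_B = project(mu, axis)             # its part along the bias axis
    mu_orth = mu - mu_B                  # its part orthogonal to the bias axis
    # Length left for the bias component so that each output has unit norm.
    # abs() is an assumed guard for inputs that are not unit length.
    scale = np.sqrt(abs(1.0 - np.dot(mu_orth, mu_orth)))
    u1_B = project(u1, axis)
    u2_B = project(u2, axis)
    new_u1_B = scale * (u1_B - mu_B) / np.linalg.norm(u1_B - mu_B)
    new_u2_B = scale * (u2_B - mu_B) / np.linalg.norm(u2_B - mu_B)
    # Both words share mu_orth and differ only along the bias axis.
    return new_u1_B + mu_orth, new_u2_B + mu_orth

axis = np.array([1.0, 0.0, 0.0])     # made-up bias axis
u1 = np.array([0.6, 0.3, 0.1])       # made-up vectors for a word pair
u2 = np.array([-0.2, 0.5, 0.1])
e1, e2 = equalize_sketch(u1, u2, axis)
print(e1)
print(e2)
```

After this step the two vectors have the same orthogonal part and mirror-image components along the bias axis, which is why the cosine similarities with the axis come out equal in magnitude and opposite in sign.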

**Exercise**: Implement `equalize()`.

In [None]:
def equalize(pair, bias_axis):
    """
    Debias gender specific words by following the equalize method described in the figure above.
    
    Arguments:
    pair -- pair of strings of gender specific words to debias, e.g. ("actress", "actor") 
    bias_axis -- numpy-array of shape (50,), vector corresponding to the bias axis, e.g. gender
    """
    
    ### START CODE HERE ###
    # Step 1: Select word vector representation of "word". Use word_to_vec_map. (≈ 2 lines)
    w1, w2 = None
    u1, u2 = None
    
    # Step 2: Compute the mean of u1 and u2 (≈ 1 line)
    mu = None

    # Step 3: Compute the projections of mu over the bias axis and the orthogonal axis (≈ 2 lines)
    mu_B = None
    mu_orth = None

    # Step 4: Set u1_orth and u2_orth to be equal to mu_orth (≈2 lines)
    u1_orth = None
    u2_orth = None
        
    # Step 5: Adjust the Bias part of u1 and u2 using the formulas given in the figure above (≈2 lines)
    u1_B = None
    u2_B = None

    # Step 6: Debias by equalizing u1 and u2 to the sum of their projections (≈2 lines)
    u1 = None
    u2 = None
    ### END CODE HERE ###
    
    return u1, u2

In [None]:
print("cosine similarities before equalizing:")
print("cosine_similarity(word_to_vec_map[\"man\"], gender) = ", cosine_similarity(word_to_vec_map["man"], gender))
print("cosine_similarity(word_to_vec_map[\"woman\"], gender) = ", cosine_similarity(word_to_vec_map["woman"], gender))

print()
u1, u2 = equalize(("man", "woman"), gender)
print("cosine similarities after equalizing:")
print("cosine_similarity(u1, gender) = ", cosine_similarity(u1, gender))
print("cosine_similarity(u2, gender) = ", cosine_similarity(u2, gender))

**Expected Output**:

cosine similarities before equalizing:
<table>
    <tr>
        <td>
            **cosine_similarity(word_to_vec_map["man"], gender)** =
        </td>
        <td>
         -0.117110957653
        </td>
    </tr>
        <tr>
        <td>
            **cosine_similarity(word_to_vec_map["woman"], gender)** =
        </td>
        <td>
         0.356666188463
        </td>
    </tr>
</table>

cosine similarities after equalizing:
<table>
    <tr>
        <td>
            **cosine_similarity(u1, gender)** =
        </td>
        <td>
         -0.700436428931
        </td>
    </tr>
        <tr>
        <td>
            **cosine_similarity(u2, gender)** =
        </td>
        <td>
         0.700436428931
        </td>
    </tr>
</table>

Please feel free to play with the above cell to equalize your own pair.

We also encourage you to run your implementations to tackle other types of bias such as:
- quantity, which can be encoded using: "numerous" - "single"
- reality, which can be encoded using: "real" - "fake"
- wealth, which can be encoded using: "poor" - "rich"
- ...

### Congratulations!
Congratulations on finishing this assignment. Here are the main points we would like you to remember:

- Many operations such as cosine similarity or analogies can be applied to word vectors.
- Cosine similarity is the main metric to compare word vectors, although L2 distance may also be used.
- Word vectors are learned by training a model on a dataset, so they inherit the biases present in that dataset.
- Different words call for different debiasing methods: gender-specific pairs need to be equalized, while words that should be gender-neutral need to be neutralized.

**References**:
- Bolukbasi et al., 2016, [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf)