<center> <font size = 24 color = 'steelblue'> <b>Machine Translation<br>
<img src = "https://drive.google.com/uc?export=view&id=1kDOK8t-HSazBsqscjf11ml0NFlhiQxRh" width = 600>

# <a id= 'f0'> 
<font size = 4>
    
**Table of Contents:**<br>
[1. Introduction](#f1)<br>
[2. Loading libraries](#f2)<br>
[3. Loading embeddings](#f3)<br>
[4. Translating English dictionary to French](#f4)<br> 
> [4.1 Working with embeddings](#f4.1)<br>
> [4.2 Computing the gradient of the loss with respect to the transform matrix R](#f4.2)<br>

[5. Cosine Similarity](#f5)

##### <a id = 'f1'>
<font size = 10 color = 'midnightblue'> **Introduction**

<div class="alert alert-block alert-success"> 
<font size = 4>

- Machine translation involves the use of automated systems to translate text or speech from one language to another.
- NLP plays a crucial role in understanding, interpreting, and generating human language in a way that considers context and meaning.
- NLP techniques are employed to enhance the quality and accuracy of machine translation systems.
- NLP helps in addressing linguistic nuances, context understanding, and idiosyncrasies specific to each language.

##### <a id = 'f2'>
<font size = 10 color = 'midnightblue'> **Load the libraries**

In [None]:
import nltk
import pdb
import pickle
import string
import pandas as pd

import time

import gensim
import matplotlib.pyplot as plt

import numpy as np
import scipy
import sklearn
from gensim.models import KeyedVectors
from nltk.corpus import stopwords, twitter_samples
from nltk.tokenize import TweetTokenizer
import re
from nltk.stem import PorterStemmer

from sklearn.metrics.pairwise import cosine_similarity

In [None]:
nltk.download('twitter_samples')
nltk.download('stopwords')

In [None]:
twitter_samples.fileids()

In [None]:
data = twitter_samples.strings('positive_tweets.json')

##### <a id = 'f3'>
<font size = 10 color = 'midnightblue'> **Load English and French embeddings**

In [None]:
with open("en_embeddings.p", "rb") as f:
  en_embeddings_subset = pickle.load(f)
with open("fr_embeddings.p", "rb") as f:
  fr_embeddings_subset = pickle.load(f)

In [None]:
# English-to-French dictionary (training set):
file = pd.read_csv('en-fr.train.txt', delimiter = ' ', header = None, index_col = [0]).squeeze("columns")
eng_to_french_dict_train  = file.to_dict()

In [None]:
file = pd.read_csv('en-fr.test.txt', delimiter = ' ', header = None, index_col = [0]).squeeze("columns")
eng_to_french_dict_test  = file.to_dict()

[top](#f0)

##### <a id= 'f4'>
<font size = 10 color = 'midnightblue'> **Translating English dictionary to French** <br>


##### <a id = 'f4.1'>
<font size = 6 color = 'pwdrblue'> <b>Working with embeddings

<div class="alert alert-block alert-success"> 
<font size = 4>
    
- Generate a matrix $X$ whose rows are the English embeddings.
- Generate a matrix $Y$ whose rows are the corresponding French embeddings.
- Find the projection matrix $R$ that minimizes the squared Frobenius norm $\|XR - Y\|_F^2$.

> - The goal is to find a transformation matrix $R$ that makes $XR$ as close as possible to $Y$.
> - The Frobenius norm measures the magnitude of a matrix: it is the square root of the sum of its squared entries.
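
As a small illustration of the Frobenius norm, the sketch below computes it by hand and with `np.linalg.norm` for a hypothetical 2x2 matrix `A` (the matrix and variable names are made up for this example and do not come from the notebook's data).

```python
import numpy as np

# Hypothetical 2x2 matrix, chosen so the arithmetic is easy to follow
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Frobenius norm: square root of the sum of squared entries
fro_manual = np.sqrt(np.sum(A ** 2))
fro_numpy = np.linalg.norm(A, ord="fro")

# The squared norm is the sum of squared entries: 1 + 4 + 9 + 16 = 30
fro_squared = np.sum(A ** 2)
print(fro_manual, fro_numpy, fro_squared)
```

Both ways of computing the norm should agree; the squared version is the quantity the projection matrix is trained to minimize.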

In [None]:
# get the words that have English and French embeddings

eng_words = en_embeddings_subset.keys()
frn_words = fr_embeddings_subset.keys()

# get the French words in the test dictionary
frnch_wrds_dict = eng_to_french_dict_test.values()


<font size = 5 color = 'seagreen'> <b>Check whether embeddings exist for both the English and French words in the translation dictionary

In [None]:
eng_emb, frn_emb = [],[]
for eng, frnc in eng_to_french_dict_train.items():
  if (eng in eng_words) and (frnc in frn_words):
    # get the embeddings and store
    eng_emb.append(en_embeddings_subset[eng])
    frn_emb.append(fr_embeddings_subset[frnc])



<font size = 5 color = 'seagreen'> <b>Create the English and French embedding matrices

In [None]:
X = np.vstack(eng_emb)
X.shape

In [None]:
Y = np.vstack(frn_emb)
Y.shape
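
The filtering and stacking steps above can be tried on toy data. The sketch below assumes hypothetical 2-dimensional embeddings and a three-entry dictionary (`toy_en`, `toy_fr`, and `toy_dict` are invented for illustration); the pair whose words have no embedding is skipped, and each remaining pair becomes one row of each matrix.

```python
import numpy as np

# Hypothetical toy embeddings (2-dimensional) and a toy translation dictionary
toy_en = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
toy_fr = {"chat": np.array([0.9, 0.1]), "chien": np.array([0.1, 0.9])}
toy_dict = {"cat": "chat", "dog": "chien", "bird": "oiseau"}

en_rows, fr_rows = [], []
for en_word, fr_word in toy_dict.items():
    # keep a pair only when both words have an embedding
    if en_word in toy_en and fr_word in toy_fr:
        en_rows.append(toy_en[en_word])
        fr_rows.append(toy_fr[fr_word])

# each embedding becomes one row, so row i of X_toy is paired with row i of Y_toy
X_toy = np.vstack(en_rows)
Y_toy = np.vstack(fr_rows)
print(X_toy.shape, Y_toy.shape)  # (2, 2) (2, 2)
```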

<font size = 5 color = 'seagreen'> <b>Translation

<div class="alert alert-block alert-success"> 
<font size = 4>
    
The loss function is the squared Frobenius norm of the difference between
the projected English embeddings $XR$ and the French embeddings $Y$, divided by the number of training examples $m$.
</div>

<font size = 5>
$$ L(X, Y, R)=\frac{1}{m}\sum_{i=1}^{m} \sum_{j=1}^{n}\left( a_{i j} \right)^{2}$$


<font size = 4>
    
<center> where $a_{i j}$ is the value in the $i$th row and $j$th column of the matrix $\mathbf{XR}-\mathbf{Y}$.
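
A minimal sketch of this loss as a function, evaluated on hypothetical toy matrices so the result can be checked by hand (`compute_loss` and the `_toy` names are assumptions made for this example):

```python
import numpy as np

def compute_loss(X, Y, R):
    """Squared Frobenius norm of XR - Y, divided by the number of rows m."""
    m = X.shape[0]
    diff = X @ R - Y
    return np.sum(diff ** 2) / m

# Hypothetical toy data: 2 examples, 2 dimensions
X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
Y_toy = np.array([[1.0, 1.0],
                  [0.0, 2.0]])
R_toy = np.eye(2)

# XR - Y = [[0, -1], [0, -1]], so the loss is (0 + 1 + 0 + 1) / 2 = 1.0
loss_toy = compute_loss(X_toy, Y_toy, R_toy)
print(loss_toy)
```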

##### <a id = 'f4.2'>
<font size = 6 color = 'pwdrblue'> <b>Computing the gradient of the loss with respect to the transform matrix R

<div class="alert alert-block alert-success"> 
<font size = 4>
    
* Calculate the gradient of the loss with respect to the transform matrix `R`.
* The gradient is a matrix that encodes how much a small change in each entry of `R`
changes the loss.
* The gradient points in the direction of steepest increase of the loss, so we update `R`
in the opposite direction to minimize it.
* $m$ is the number of training examples (number of rows in $X$).
* The formula for the gradient of the loss function $L(X,Y,R)$ is:

$$\frac{d}{dR}L(X,Y,R)=\frac{d}{dR}\Big(\frac{1}{m}\| X R -Y\|_{F}^{2}\Big) = \frac{2}{m}X^{T} (X R - Y)$$
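
One way to gain confidence in the gradient formula is to compare it against central finite differences. The sketch below does this on small random matrices; `compute_loss`, `compute_gradient`, and the random toy data are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def compute_loss(X, Y, R):
    """Loss (1/m) * ||XR - Y||_F^2."""
    return np.sum((X @ R - Y) ** 2) / X.shape[0]

def compute_gradient(X, Y, R):
    """Analytic gradient (2/m) * X^T (XR - Y)."""
    m = X.shape[0]
    return (2 / m) * X.T @ (X @ R - Y)

# Hypothetical random data: 5 examples, 3 dimensions
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 3))
Y_toy = rng.normal(size=(5, 3))
R_toy = rng.normal(size=(3, 3))

# Central finite differences, perturbing one entry of R at a time
eps = 1e-6
numeric_grad = np.zeros_like(R_toy)
for i in range(R_toy.shape[0]):
    for j in range(R_toy.shape[1]):
        step = np.zeros_like(R_toy)
        step[i, j] = eps
        loss_plus = compute_loss(X_toy, Y_toy, R_toy + step)
        loss_minus = compute_loss(X_toy, Y_toy, R_toy - step)
        numeric_grad[i, j] = (loss_plus - loss_minus) / (2 * eps)

analytic_grad = compute_gradient(X_toy, Y_toy, R_toy)
max_abs_diff = np.max(np.abs(analytic_grad - numeric_grad))
print(max_abs_diff)
```

Because the loss is quadratic in `R`, the two gradients should agree up to floating-point rounding.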



In [None]:
np.random.seed(129)
# R is a square matrix whose side length equals the number of dimensions in the word embedding
R = np.random.rand(X.shape[1], X.shape[1])
train_steps=400
learning_rate=0.8
for i in range(train_steps):
  if i % 25 == 0:
    diff = np.dot(X,R) - Y
    sq_diff = diff ** 2
    loss = np.sum(sq_diff)/X.shape[0]
    print(f"loss at iteration {i} is: {loss:.4f}")

  gradient = np.dot(X.transpose(),np.dot(X,R)-Y)*(2/X.shape[0])
  R -= learning_rate * gradient
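
As a sanity check on the update rule, gradient descent can be compared with the closed-form least-squares solution, which minimizes the same objective. This is a sketch on synthetic data: `np.linalg.lstsq` is a swapped-in technique that the notebook does not use, and the data, the step size of 0.1, and the 500 steps are assumptions chosen for this toy problem.

```python
import numpy as np

# Synthetic data: Y is a noisy linear transform of X
rng = np.random.default_rng(0)
m, n = 200, 3
X_toy = rng.normal(size=(m, n))
R_true = rng.normal(size=(n, n))
Y_toy = X_toy @ R_true + 0.01 * rng.normal(size=(m, n))

# Gradient descent with the same update rule as above, but a smaller step size
# (0.1 is an assumption that suits this synthetic data, not a tuned value)
R_gd = np.zeros((n, n))
for _ in range(500):
    gradient = (2 / m) * X_toy.T @ (X_toy @ R_gd - Y_toy)
    R_gd -= 0.1 * gradient

# Closed-form least-squares minimizer of ||XR - Y||_F^2
R_ls, *_ = np.linalg.lstsq(X_toy, Y_toy, rcond=None)

max_gap = np.max(np.abs(R_gd - R_ls))
print(max_gap)
```

If the step size is small enough for the iteration to converge, the two solutions should match closely; a large gap would suggest too few steps or a learning rate that is too high.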

[top](#f0)

##### <a id = 'f5'>
<font size = 10 color = 'midnightblue'> <b>Cosine similarity

<div class="alert alert-block alert-success"> 
<font size = 4>
    
Cosine similarity between vectors $u$ and $v$ is calculated as the cosine of the angle between them.
The formula is

$$\cos(u,v)=\frac{u\cdot v}{\left\|u\right\|\left\|v\right\|}$$

* $\cos(u,v)$ is $1$ when $u$ and $v$ lie on the same line and have the same direction.
* $\cos(u,v)$ is $-1$ when they have exactly opposite directions.
* $\cos(u,v)$ is $0$ when the vectors are orthogonal (perpendicular) to each other.
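
The three cases above can be checked with a few lines of NumPy. This is a minimal sketch; the helper `cosine` and the example vectors are hypothetical, and the function assumes non-zero 1-D inputs.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two non-zero 1-D vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 0.0])
same_direction = cosine(u, np.array([2.0, 0.0]))   # parallel, same direction
opposite = cosine(u, np.array([-3.0, 0.0]))        # parallel, opposite direction
orthogonal = cosine(u, np.array([0.0, 5.0]))       # perpendicular
print(same_direction, opposite, orthogonal)
```

The lengths of the vectors differ in each case, which shows that cosine similarity depends only on direction, not magnitude.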

In [None]:
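def nearest_neighbor(v, candidates, k=1):
  """Return the indices of the k rows of `candidates` most similar to `v`."""
  similarity_l = []
  for row in candidates:
    # sklearn's cosine_similarity expects 2-D inputs, so reshape both vectors
    cos_similarity = cosine_similarity(v.reshape(1, -1), row.reshape(1, -1))[0, 0]
    # append the similarity to the list
    similarity_l.append(cos_similarity)
  # sort the similarity list (ascending) and get the indices of the sorted list
  sorted_ids = np.argsort(similarity_l)
  # the last k indices belong to the k most similar candidate vectors
  k_idx = sorted_ids[-k:]
  return k_idx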

In [None]:
# cosine_similarity expects 2-D arrays, so slice out single rows instead of indexing them
cosine_similarity(np.dot(X[:1], R), Y[:1])

In [None]:
# The prediction is X times R
pred = np.dot(X,R)

# initialize the number correct to zero
num_correct = 0

# loop through each row in pred (each transformed embedding)
for i in range(len(pred)):
    # get the index of the nearest neighbor of pred at row 'i'; also pass in the candidates in Y
    pred_idx = nearest_neighbor(pred[i],Y)

    # if the index of the nearest neighbor equals the row of i...
    if pred_idx == i:
        # increment the number correct by 1.
        num_correct += 1

# accuracy is the number correct divided by the number of rows in 'pred' (also number of rows in X)
accuracy = num_correct / len(pred)
print(f"accuracy on the training pairs: {accuracy:.3f}")
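
The row-by-row loop above can be slow for large matrices. A possible vectorized variant is sketched below; it normalizes the rows and uses one matrix product to get all cosine similarities at once. The function name `nearest_neighbor_accuracy` and the toy arrays are assumptions for illustration, and the rows are assumed to be non-zero.

```python
import numpy as np

def nearest_neighbor_accuracy(pred, Y):
    """Fraction of rows i whose most cosine-similar row of Y is row i."""
    # scale every row to unit length so dot products equal cosine similarities
    pred_unit = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # sims[i, j] is the cosine similarity between pred[i] and Y[j]
    sims = pred_unit @ Y_unit.T
    nearest = np.argmax(sims, axis=1)
    return np.mean(nearest == np.arange(len(pred)))

# Hypothetical toy data: rows 0 and 1 are matched correctly, row 2 is not
Y_toy = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
pred_toy = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 1.0]])
accuracy_toy = nearest_neighbor_accuracy(pred_toy, Y_toy)
print(accuracy_toy)
```

In the toy data the third prediction is closest to the second candidate, so two of the three rows are counted as correct.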

[top](#f0)