# Introduction

This notebook contains the final version of the EM method for *case 3: purity constraint*.


1. [Loading of datasets](#Load-datasets)
2. [Transformation of datasets](#Transform-datasets)
3. [Helper functions](#Helper-functions)
4. [Manual EMANN](#Manual-EMANN)
    1. [EM Starts Here !](#EM-Starts-Here-!)
5. [Test the result](#Test-the-result)


[**[Back to top]**](#Introduction)

In [None]:
from __future__ import division, print_function
import sys
if '..' not in sys.path:
    sys.path.append('..')

import theano
import theano.tensor as T
import lasagne

import time
import visual

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

from sklearn.metrics import confusion_matrix, pairwise_distances

from nn.helper import CNN, NN
from nn import block as nnb
from nn import compilers as nnc


In [None]:
%matplotlib inline

# Load datasets

- The datasets are loaded or built.
- The batch size is defined.
- The first half of the data name (the source part) is defined.

[**[Back to top]**](#Introduction)

## Datasets Imports 

In [None]:
from datasets.toys import make_clouds, make_circles, make_X, make_moons
from datasets.utils import make_dataset, make_domain_dataset


# Transform datasets

- The transformed datasets are built.
- The last part of the data name (the target part) is defined.

[**[Back to top]**](#Introduction)

## Transformation Imports

In [None]:
from datasets.utils import make_domain_dataset, make_corrector_dataset
import datasets.transform as transform

# Helper functions

[**[Back to top]**](#Introduction)

In [None]:
# Import loggers
from logs import new_logger, empty_logger
logger = new_logger()

In [None]:
from align_learn.probability import mass, align, proba_src_P, proba_tgt_P, renorm, softmax_alpha

`softmax_alpha` computes:

$$res_{ij} = \frac{e^{\alpha x_{ij}}}{\sum_k e^{\alpha x_{ik}}}$$
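
The formula can be illustrated with a small NumPy sketch. The helper name `softmax_alpha_sketch` is hypothetical and the row-wise normalization is an assumption read from the formula; the actual implementation in `align_learn.probability` is not shown in this notebook and may differ.

```python
import numpy as np

def softmax_alpha_sketch(x, alpha=1.0):
    """Row-wise softmax with sharpness alpha (hypothetical helper, not the library code)."""
    x = np.asarray(x, dtype=float)
    z = alpha * x
    # Subtract the row maximum for numerical stability; the ratio is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

M = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.1, 0.8]])
soft = softmax_alpha_sketch(M, alpha=1)    # close to uniform
sharp = softmax_alpha_sketch(M, alpha=15)  # mass concentrates on the row maximum
```

A larger `alpha` pushes each row towards a one-hot vector on its largest entry, while the ordering of the entries inside a row is preserved.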

# Manual EMANN

[**[Back to top]**](#Introduction)

In [None]:
EM_ITER = 0
proba_P = proba_src_P
# proba_P = proba_tgt_P

## Generate data

First step: generate the data.

In [None]:
n_classes_1 = 4
n_classes_2 = 4
n_samples = 1000
X_src, y_src = make_clouds(n_samples=n_samples, n_classes=n_classes_1)

X_tgt, y_tgt = make_clouds(n_samples=n_samples, n_classes=n_classes_1)
# X_tgt, y_tgt = make_circles(n_samples=n_samples,  n_classes=n_classes_2)

data_name='Clouds -> Same'

## Clusters

Choose or build the partitions $C_{1i}$ and $C_{2j}$.

Each point gets a cluster label, stored in $l_{src}$ and $l_{tgt}$. $y_{src}$ and $y_{tgt}$ are kept for the true class labels.
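
The cell below also calls `mass` on the cluster labels. Its implementation is not shown here; one plausible reading, sketched below under that assumption, is the fraction of points that fall in each cluster. The name `cluster_mass_sketch` is hypothetical.

```python
import numpy as np

def cluster_mass_sketch(labels):
    """Fraction of points per cluster (assumed meaning of `mass`, not the library code)."""
    labels = np.asarray(labels, dtype=int)
    counts = np.bincount(labels).astype(float)  # number of points per cluster label
    return counts / counts.sum()

labels_demo = np.array([0, 0, 1, 2, 2, 2])
w_demo = cluster_mass_sketch(labels_demo)  # one weight per cluster, summing to 1
```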

In [None]:
from sklearn.cluster import KMeans

k_src = 10
k_tgt = 12
# The source and target data do not need the same number of clusters.
k_means_src = KMeans(n_clusters=k_src).fit(X_src)
k_means_tgt = KMeans(n_clusters=k_tgt).fit(X_tgt)
# labels
l_src, l_tgt = k_means_src.labels_, k_means_tgt.labels_
# l_src, l_tgt = np.asarray(y_src, dtype=int), np.asarray(y_tgt, dtype=int),

# Mass
w_src = mass(l_src)
w_tgt = mass(l_tgt)

# Params
n_class_tgt = len(np.unique(l_tgt))
n_class_src = len(np.unique(l_src))

## Initialization of the embedding probability matrix

$P_{ij}$ holds the probability that an element of partition $C_{1i}$ is embedded into $C_{2j}$.
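
As a rough illustration of a random start for such a matrix, the sketch below draws uniform entries and rescales each row to sum to 1. The helper `normalize_rows` is hypothetical, and the row-wise convention is an assumption: the `renorm` function used in the next cell may normalize along another axis or over the whole matrix.

```python
import numpy as np

def normalize_rows(P):
    """Scale each row to sum to 1 (assumed behaviour, not the actual `renorm`)."""
    P = np.asarray(P, dtype=float)
    return P / P.sum(axis=1, keepdims=True)

rng_init = np.random.RandomState(0)
# 3 source clusters, 4 target clusters: the matrix does not have to be square.
P_init_demo = normalize_rows(rng_init.uniform(0, 1, size=(3, 4)))
```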

In [None]:
P = np.random.uniform(0,1, size=(n_class_src, n_class_tgt))
P = renorm(P)
visual.mat(P)
plt.title("Proba matrix")
plt.show()

## Training dataset

Build the training datasets.

The source and target data are ordered so that each $x_s$ is paired with its corresponding $x_t$.

The training target is the probability that $x$ belongs to label $y$ in the source space.
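
The pairing of source and target points can be pictured with the sketch below. It is only one possible reading of `align`, whose code is not shown here: for each source point in cluster $i$, a target cluster $j$ is drawn from row $i$ of $P$, then a target point is drawn inside that cluster. The function name, the `rng` argument and the toy arrays are all hypothetical.

```python
import numpy as np

def align_sketch(P, l_src, l_tgt, rng):
    """Pick one target index per source point (hypothetical reading of `align`)."""
    P = np.asarray(P, dtype=float)
    rows = P / P.sum(axis=1, keepdims=True)  # each row as a distribution over target clusters
    # Indices of the target points belonging to each target cluster.
    members = [np.flatnonzero(l_tgt == j) for j in range(P.shape[1])]
    idx = np.empty(len(l_src), dtype=int)
    for n, i in enumerate(l_src):
        j = rng.choice(P.shape[1], p=rows[i])  # draw a target cluster for source cluster i
        idx[n] = rng.choice(members[j])        # draw a point inside that cluster
    return idx

rng_align = np.random.RandomState(0)
l_src_demo = np.array([0, 0, 1, 1])
l_tgt_demo = np.array([0, 1, 1, 2, 2])
# Degenerate matrix: source cluster 0 maps to target cluster 0, source cluster 1 to target cluster 2.
P_align_demo = np.array([[1.0, 0.0, 0.0],
                         [0.0, 0.0, 1.0]])
idx_demo = align_sketch(P_align_demo, l_src_demo, l_tgt_demo, rng_align)
```

Indexing the target arrays with the returned indices, as in `X_tgt[align_idx]`, then gives one target point per source point.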

In [None]:
# Get the alignment indexes according to the given probability matrix
align_idx = align(P, l_src, l_tgt)
# Align the data
X_S, y_S = X_src, l_src
X_T, y_T = X_tgt[align_idx], l_tgt[align_idx]
# Get the probability to be predicted for each couple of data point.
p_src, p_tgt = proba_P(P, l_src, l_tgt)
n_class = n_class_tgt if proba_P is proba_tgt_P else n_class_src

# Shuffle everything so that the index is not correlated with the labels
indices = np.arange(X_S.shape[0])
np.random.shuffle(indices)
X_S, X_T, p_src, p_tgt = X_S[indices], X_T[indices], p_src[indices], p_tgt[indices]
l_src, l_tgt = l_src[indices], l_tgt[indices]
# Build split dataset (train, valid, test)
src_data = make_dataset(X_S, p_src, batchsize=100)
tgt_data = make_dataset(X_T, p_tgt, batchsize=100)
adversarial_data = make_domain_dataset([src_data, tgt_data])

## Neural Network Architecture

Two inputs:
- one for the source data. The source data goes through two network parts: $\varphi$ (projection to the target space) and $\rho$ (classifier).
- one for the target data. The target data goes through a single network part: $\rho$ (classifier).

$\rho(\varphi (x_s)) = P(x_s\in C_{1i})$

$\rho(x_t) = P(x_t\in C_{1i} \mid x_t\in C_{2j})$

In [None]:
# Get general information :
# =========================
batchsize = src_data.batchsize
_shape = np.shape(src_data.X_train)
n_dim = len(_shape)
n_features = np.prod(_shape[1:])

shape = (batchsize,) + _shape[1:]
target_var = T.ivector('targets')

# Logs
logger.info('Building the input and output variables for : {}'.format(data_name))
logger.info('Input data expected shape : {}'.format(shape))

# WARNING :: A single probability layer. We predict the rows, not the columns!
# Build the layers :
# ==================
# Inputs layers
# -------------
input_layer_src = lasagne.layers.InputLayer(shape=shape)
input_layer_tgt = lasagne.layers.InputLayer(shape=shape)

# Representation layers for the source data
# ----------------------------------------
dense_1 = lasagne.layers.DenseLayer(input_layer_src, 3, nonlinearity=lasagne.nonlinearities.rectify)
dense_2 = lasagne.layers.DenseLayer(dense_1, shape[1], nonlinearity=None)
repr_layer = dense_2

# "Classification" layers for the source data
# -------------------------------------------
# WARNING :: A single probability layer. We predict the rows, not the columns!
# last = lasagne.layers.NonlinearityLayer(dense_2, nonlinearity=lasagne.nonlinearities.rectify)
dense_3 = lasagne.layers.DenseLayer(repr_layer, 2, nonlinearity=lasagne.nonlinearities.rectify)
cluster_src = lasagne.layers.DenseLayer(dense_3, n_class, nonlinearity=lasagne.nonlinearities.softmax)

# "Classification" layers for the target data
# -------------------------------------------
# WARNING :: A single probability layer. We predict the rows, not the columns!
dense_3_bis = lasagne.layers.DenseLayer(input_layer_tgt, 2, nonlinearity=lasagne.nonlinearities.rectify)
cluster_tgt = lasagne.layers.DenseLayer(dense_3_bis, n_class, nonlinearity=lasagne.nonlinearities.softmax,
                                         W=cluster_src.W, b=cluster_src.b)


## Compile the NN

Compile the functions:
- training, validation, proba output for the source path
- training, validation, proba output for the target path
- raw output for the representation
- training, validation, proba output for the adversarial path


In [None]:
# Instantiate the NN :
# ====================
nn = CNN(name='EMANN test')
nn.add_output('proba_src', cluster_src)
nn.add_output('proba_tgt', cluster_tgt)
nn.add_output('repr', repr_layer)
# Ok for the adversarial the code is not intuitive. [Further work]
nn.add_output('adversarial', [repr_layer, input_layer_tgt])

# Compile :
# =========
nn.compile('proba_src', nnc.crossentropy_sgd_mom, lr=0.1, mom=0.9)
nn.compile('proba_src', nnc.crossentropy_validation)
nn.compile('proba_src', nnc.output)
nn.compile('proba_tgt', nnc.crossentropy_sgd_mom, lr=0.1, mom=0.9)
nn.compile('proba_tgt', nnc.crossentropy_validation)
nn.compile('proba_tgt', nnc.output)
nn.compile('repr', nnc.output)
nn.compile('adversarial', nnc.adversarial, hp_lambda=0.1, lr=0.1, mom=0.9)

logger.info("Compilation Done")

## Train the NN

Now comes the training session.

It alternately trains each part of the neural network, one mini-batch at a time (one forward-backward pass per mini-batch):

- Source data $\to$ Predict the label of the source data in the source space
- Target data $\to$ Predict the probability that the target data belongs to each partition of the source space
- Adversarial $\to$ Predict which distribution the data comes from (source or target)
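
The alternation can be pictured with the small sketch below. It is not the code behind `nn.train`, whose scheduling is not shown in this notebook; `alternate_training`, `dummy_step` and the toy batches are hypothetical stand-ins for the compiled training functions and the dataset iterators.

```python
from itertools import cycle

def alternate_training(parts, n_rounds):
    """parts: list of (name, batch_iterator, train_step) triples (all hypothetical)."""
    log = []
    for _ in range(n_rounds):
        for name, batches, train_step in parts:
            batch = next(batches)     # next mini-batch for this part
            loss = train_step(batch)  # one forward-backward pass on this part only
            log.append((name, loss))
    return log

def dummy_step(batch):
    # Stand-in for a compiled training function; returns a fake "loss".
    return float(sum(batch))

parts_demo = [
    ('proba_src', cycle([[1, 2], [3, 4]]), dummy_step),
    ('proba_tgt', cycle([[5, 6]]), dummy_step),
    ('adversarial', cycle([[7, 8]]), dummy_step),
]
log_demo = alternate_training(parts_demo, n_rounds=2)
```

Each round touches every part once, so no part gets far ahead of the others between two updates.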



In [None]:
# Train the nn :
# ==============
# nn.train(data, num_epochs=100);
nn.train([src_data, tgt_data, adversarial_data], ['proba_src', 'proba_tgt', 'adversarial'], num_epochs=5);

In [None]:
# ================
# Learning curve
# ================
# Useful regex : 'proba.* loss', 'loss', 'acc'
fig, ax = visual.learning_curve(nn.global_stats, regex='loss')
#     SAVE
# fig.tight_layout()
# fig.savefig(fig_title+'-Learning_curve.png',bbox_inches='tight')
fig.show()

## Check some results

Check the output of the NN:

The predicted probability of being in a partition **vs** the true value.

In [None]:
y_pred = nn.parts['proba_src'].output(src_data.X_test)[0]
i = np.random.randint(0, src_data.X_test.shape[0])
# print('\n'.join('{:1.5f}--{:1.5f}'.format(pred, truth) for pred, truth in zip(y_pred[i], data.y_test[i])))
width=0.4
plt.bar(np.arange(n_class), y_pred[i], width, color='r', label='prediction')
plt.bar(np.arange(n_class)+width, src_data.y_test[i], width, color='b', label='true value')
plt.title("One point distrib")
plt.legend(bbox_to_anchor=(1.25,1.))
# plt.yscale('log')
plt.show()

In [None]:
y_pred = nn.parts['proba_tgt'].output(tgt_data.X_test)[0]
i = np.random.randint(0, tgt_data.X_test.shape[0])
# print('\n'.join('{:1.5f}--{:1.5f}'.format(pred, truth) for pred, truth in zip(y_pred[i], data.y_test[i])))
width=0.4
plt.bar(np.arange(n_class), y_pred[i], width, color='r', label='prediction')
plt.bar(np.arange(n_class)+width, tgt_data.y_test[i], width, color='b', label='true value')
plt.title("One point distrib")
plt.legend(bbox_to_anchor=(1.25,1.))
# plt.yscale('log')
plt.show()

In [None]:
X = nn.parts['repr'].output(X_src)[0]
fig, ax = visual.target_2D(X_tgt, y_tgt);
visual.corrected_2D(X, y_src, ax=ax);
visual.add_legend(ax)
plt.show()

## **EM Starts Here !**

[**[Back to top]**](#Introduction)

**Rebuild P**
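
The idea of this step, for the column-wise case, can be sketched on toy data: for each target cluster, draw a few of its points, take the network's predicted distributions for them, and store their median as the corresponding column of $P$. In the sketch the network outputs are assumed to be precomputed in `probs`; the function name and the toy arrays are hypothetical.

```python
import numpy as np

def rebuild_columns_sketch(P, probs, labels, n_draw, rng):
    """Replace column l of P by the median predicted distribution of cluster l."""
    P = np.array(P, dtype=float)  # work on a copy
    for l in np.unique(labels):
        members = np.flatnonzero(labels == l)       # points of cluster l
        size = min(n_draw, members.size)            # never ask for more points than available
        chosen = rng.choice(members, size=size, replace=False)
        P[:, l] = np.median(probs[chosen], axis=0)  # aggregate over the drawn points
    return P

rng_em = np.random.RandomState(0)
labels_em_demo = np.array([0, 0, 0, 1, 1, 1])
# One predicted distribution over 2 source clusters per point.
probs_em_demo = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                          [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]])
P_new_demo = rebuild_columns_sketch(np.zeros((2, 2)), probs_em_demo,
                                    labels_em_demo, n_draw=3, rng=rng_em)
```

The element-wise median of several distributions does not have to sum to 1 in general, so a renormalization of $P$ may be needed afterwards.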

In [None]:
# Get some outputs for the rows/columns of P
# ------------------------------------------
n_samples = X_T.shape[0]
n_train = int(0.6*n_samples)
n_val = int(0.15*n_samples)+n_train

# For each label
if proba_P is proba_tgt_P:
    for l in np.unique(l_src):
        # Get some training points whose label is l
        a = tgt_data.X_train[np.where(l_src[:n_train] == l)]
        x = a[np.random.choice(a.shape[0], size=10, replace=False)]
        # Get the output of the NN (first element, as in the other cells)
        p = nn.parts['proba_tgt'].output(x)[0]
        # Aggregate over the sampled points and update row l of P
        P[l, :] = np.median(p, axis=0)
else:
    for l in np.unique(l_tgt):
        # Get some training points whose label is l
        a = tgt_data.X_train[np.where(l_tgt[:n_train] == l)]
        x = a[np.random.choice(a.shape[0], size=10, replace=False)]
        # Get the output of the NN (first element, as in the other cells)
        p = nn.parts['proba_tgt'].output(x)[0]
        # Aggregate over the sampled points and update column l of P
        P[:, l] = np.median(p, axis=0)
    
# P = softmax_alpha(P, alpha=15)

fig, ax = visual.mat(P)
plt.title("Proba matrix")
plt.show()

**Dual Proba dataset**

In [None]:
# Get the alignment indexes according to the given probability matrix
align_idx = align(P, l_src, l_tgt)
# Align the data
X_S, y_S = X_src, l_src
X_T, y_T = X_tgt[align_idx], l_tgt[align_idx]
# Get the probability to be predicted for each couple of data point.
p_src, p_tgt = proba_P(P, l_src, l_tgt)
n_class = n_class_tgt if proba_P is proba_tgt_P else n_class_src

# Shuffle everything so that the index is not correlated with the labels
indices = np.arange(X_S.shape[0])
np.random.shuffle(indices)
X_S, X_T, p_src, p_tgt = X_S[indices], X_T[indices], p_src[indices], p_tgt[indices]
# Build split dataset (train, valid, test)
src_data = make_dataset(X_S, p_src, batchsize=100)
tgt_data = make_dataset(X_T, p_tgt, batchsize=100)

**Neural network** (re-initialization)

**Train the NN**

In [None]:
# Train the nn :
# ==============
# nn.train(data, num_epochs=100);
nn.train([src_data, tgt_data, adversarial_data], ['proba_src', 'proba_tgt', 'adversarial'], num_epochs=5);


In [None]:
EM_ITER += 1
print('Iteration n*', EM_ITER)

In [None]:
# ================
# Learning curve
# ================
# Useful regex : 'proba.* loss', 'loss', 'acc'
fig, ax = visual.learning_curve(nn.global_stats, regex='proba.* loss')
#     SAVE
# fig.tight_layout()
# fig.savefig(fig_title+'-Learning_curve.png',bbox_inches='tight')
fig.show()

**Check some results**

In [None]:
y_pred = nn.parts['proba_src'].output(src_data.X_test)[0]
i = np.random.randint(0, src_data.X_test.shape[0])
# print('\n'.join('{:1.5f}--{:1.5f}'.format(pred, truth) for pred, truth in zip(y_pred[i], data.y_test[i])))
width=0.4
plt.bar(np.arange(n_class), y_pred[i], width, color='r', label='prediction')
plt.bar(np.arange(n_class)+width, src_data.y_test[i], width, color='b', label='true value')
plt.title("One point distrib")
plt.legend(bbox_to_anchor=(1.25,1.))
# plt.yscale('log')
plt.show()

In [None]:
y_pred = nn.parts['proba_tgt'].output(tgt_data.X_test)[0]
i = np.random.randint(0, tgt_data.X_test.shape[0])
# print('\n'.join('{:1.5f}--{:1.5f}'.format(pred, truth) for pred, truth in zip(y_pred[i], data.y_test[i])))
width=0.4
plt.bar(np.arange(n_class), y_pred[i], width, color='r', label='prediction')
plt.bar(np.arange(n_class)+width, tgt_data.y_test[i], width, color='b', label='true value')
plt.title("One point distrib")
plt.legend(bbox_to_anchor=(1.25,1.))
# plt.yscale('log')
plt.show()

[**[EM LOOP]**](#EM-Starts-Here-!)

# Test the result

[**[Back to top]**](#Introduction)

In [None]:
X = nn.parts['repr'].output(X_src)[0]
fig, ax = visual.target_2D(X_tgt, y_tgt);
visual.corrected_2D(X, y_src, ax=ax);
visual.add_legend(ax)
plt.show()

# Remaining work

- Add more relevant plots and better monitoring of the results

- **Build 2 similar Notebooks for case 1 and case 2**