# Plan

* Transfer learning/fine-tuning
* ULMFiT
* TensorFlow Hub

## Transfer Learning, Fine-Tuning
### Idea

Transfer learning is an area of deep learning that studies how knowledge gained while solving one task can be applied to another.


<img src="http://ruder.io/content/images/2017/03/traditional_ml_setup.png" width="400">

# Computer Vision

### ImageNet

[ImageNet](http://www.image-net.org/) is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images.


<img src="http://ruder.io/content/images/2018/07/imagenet_challenge.png" width="500">

### AlexNet

<img src="https://www.researchgate.net/profile/Liangpei_Zhang/publication/319148488/figure/fig1/AS:528165933195264@1502935979026/The-AlexNet-architecture-with-side-supervision-The-AlexNet-architecture-with-side.png" width="500">


### VGG16

<img src="https://neurohive.io/wp-content/uploads/2018/11/vgg16-1-e1542731207177.png" width="500">


### ResNet

<img src="https://neurohive.io/wp-content/uploads/2019/01/resnet-neural-e1548772388921.png" width="500">

### The crocodile analogy

[Source](https://github.com/yandexdataschool/Practical_DL/tree/spring2019/week04_finetuning) of the analogy

3 steps to success:

1. Train a network on some dataset
2. Cut off the network's head and attach a new one (the body is used as a feature extractor)
3. Train on another dataset

<img src="1.png" width="500">

<img src="2.png" width="500">

<img src="3.png" width="500">
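The three steps can be sketched on toy data. This is a minimal NumPy illustration, not the setup from the slides: the "body" is a fixed random projection standing in for a pre-trained network, and the new "head" is a logistic regression. All names (`W_body`, `body`, `w_head`) and the data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained body (step 1): a fixed projection followed by ReLU.
# In practice these weights would come from a network trained on a large dataset.
W_body = rng.normal(size=(20, 8)) / np.sqrt(20)

def body(x):
    """Frozen feature extractor: its weights are never updated below."""
    return np.maximum(x @ W_body, 0.0)

# Toy "new" dataset (step 3): binary labels that depend on the body's features.
X = rng.normal(size=(200, 20))
feats = body(X)
y = (feats[:, 0] > feats[:, 1]).astype(float)

# New head (step 2): logistic regression on top of the frozen features.
w_head = np.zeros(feats.shape[1])
b_head = 0.0

def head_loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

initial_loss = head_loss(w_head, b_head)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))
    grad = p - y                       # gradient of the loss w.r.t. the logits
    w_head -= 0.5 * feats.T @ grad / len(y)
    b_head -= 0.5 * grad.mean()
final_loss = head_loss(w_head, b_head)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Only `w_head` and `b_head` change during training; fine-tuning would additionally update `W_body`, usually with a smaller learning rate.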

### Pre-trained layers

We hope the network learns useful features that can be reused on other tasks.

<img src="http://ruder.io/content/images/2018/07/feature_visualization.png" width="700">

[Distill link](https://distill.pub/2017/feature-visualization/)

# NLP

## Problems

1. Few labeled datasets
2. Different languages


## Word2vec

Remember word2vec? We initialize the embedding matrix with a model trained on a large unlabeled corpus.

<img src="http://ruder.io/content/images/2018/07/word2vec_relations.png" width="500">
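A minimal sketch of that initialization, assuming the pre-trained vectors are already available as a word-to-vector mapping. The `pretrained` dictionary, the vocabulary, and the 3-dimensional vectors are all made up for illustration; a real model would be loaded from a word2vec file.

```python
import numpy as np

# Hypothetical pre-trained vectors (in practice loaded from a word2vec model).
pretrained = {
    "king": np.array([0.5, 0.1, 0.3]),
    "queen": np.array([0.4, 0.2, 0.3]),
}
vocab = ["<unk>", "king", "queen", "crocodile"]
dim = 3

rng = np.random.default_rng(0)
# Words missing from the pre-trained model keep a small random vector.
embedding_matrix = rng.normal(scale=0.1, size=(len(vocab), dim))
for idx, word in enumerate(vocab):
    if word in pretrained:
        embedding_matrix[idx] = pretrained[word]

print(embedding_matrix.shape)
```

The resulting matrix would then be used as the initial weights of the model's embedding layer, either frozen or fine-tuned together with the rest of the network.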

## Language Modeling

NLP has huge unlabeled text corpora that can be used for pre-training.
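To see why no labels are needed, note that raw text already contains the training signal: each word is the "label" for the words before it. The sketch below uses a toy count-based bigram model rather than a neural LM; the corpus and the `predict_next` helper are made up for illustration.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate ."
tokens = corpus.split()

# Every adjacent pair (previous word, next word) is a free training example.
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    next_word_counts[prev][nxt] += 1

def predict_next(word):
    """Most frequent continuation seen after `word` in the corpus."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))
```

A neural language model replaces the count table with a network, but the source of supervision is the same.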


## ULMFiT

One of the papers that advanced transfer learning in NLP was [Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/abs/1801.06146).

<img src="https://memegenerator.net/img/instances/84330585/pretrain-your-nlp-models-and-everybody-loses-their-minds.jpg" width="500">

### Steps to success

1. Train an LM (AWD-LSTM) on a large unlabeled corpus
2. Fine-tune the LM on our corpus
3. Transfer it to another task

<img src="http://nlp.fast.ai/images/ulmfit_approach.png" width="600">
<img src="http://ruder.io/content/images/2018/07/ulmfit.png" width="600">


### Takeaways

* Less training data is needed
* Fast convergence

<img src="http://nlp.fast.ai/images/ulmfit_imdb.png" width="300">

## Practice

Let's try to solve the [competition](https://www.kaggle.com/c/60k-classes-text-classification) that was given as homework.

In [None]:
import pandas as pd
import numpy as np
import re

import tensorflow as tf
import tensorflow_hub as hub

from sklearn.metrics import f1_score
import seaborn as sns
import matplotlib.pyplot as plt

from tqdm import tqdm
from sklearn.model_selection import train_test_split

In [None]:
def plot_similarity(labels, features, rotation=90):
    corr = np.inner(features, features)
    sns.set(font_scale=1.2)
    g = sns.heatmap(
      corr,
      xticklabels=labels,
      yticklabels=labels,
      vmin=0,
      vmax=1,
      cmap="YlOrRd")
    g.set_xticklabels(labels, rotation=rotation)
    g.set_title("Semantic Textual Similarity")
    plt.show()
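`plot_similarity` treats the raw inner product as a similarity score. That matches cosine similarity only when each row has unit length, so the sketch below normalizes explicitly; whether the encoder's outputs are already close to unit norm is an assumption worth checking on real embeddings. The toy vectors are made up.

```python
import numpy as np

features = np.array([[3.0, 4.0], [4.0, 3.0], [-3.0, -4.0]])

# Scale every row to unit length; the inner product then equals cosine similarity.
unit = features / np.linalg.norm(features, axis=1, keepdims=True)
similarity = np.inner(unit, unit)

print(np.round(similarity, 2))
```

Note that the heatmap above uses `vmin=0`, so negative similarities (such as the one between the first and third toy vectors) would be drawn in the same color as zero.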

In [None]:
def tokenize_string(string):
    string = re.sub(r"[^A-Za-z0-9]", " ", string)  
    return string.strip().lower()
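A quick check of what this normalization does, using a copy of the function above and a made-up input string:

```python
import re

def tokenize_string(string):
    # Replace every character that is not a Latin letter or digit with a space.
    string = re.sub(r"[^A-Za-z0-9]", " ", string)
    return string.strip().lower()

example = tokenize_string("Hello, World! It's 2019.")
print(repr(example))        # runs of spaces are kept as-is
print(example.split())      # splitting on whitespace yields the tokens
```

Two things follow from the regular expression: consecutive punctuation and spaces are not collapsed, and any non-Latin characters (for example Cyrillic) are removed entirely.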

In [None]:
train = pd.read_csv('../05_dssm/competition/train.csv')
test = pd.read_csv('../05_dssm/competition/test_with_answers.csv')

In [None]:
train['text'] = train['text'].apply(tokenize_string)
test['text'] = test['text'].apply(tokenize_string)

In [None]:
test_texts = test['text'].values
train_texts = train['text'].values

# TensorFlow Hub

[TensorFlow Hub](https://www.tensorflow.org/hub) is a library for reusable machine learning modules.

# Zero-Shot learning

As the feature extractor we will use the [Universal Sentence Encoder](https://arxiv.org/pdf/1803.11175.pdf).


<img src="https://www.gstatic.com/aihub/tfhub/universal-sentence-encoder/example-similarity.png" width="700">

In [None]:
# keep only the positive examples from train (labels != -1)

condition = train['labels'] != -1
train_labels = train.loc[condition, 'labels'].values
test_labels = test['labels'].values

In [None]:
# TF1-style API: hub.Module builds graph ops that are evaluated in a tf.Session below
embedder = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2", trainable=False)

In [None]:
train_embeddings = embedder(train.loc[condition, 'text'].tolist())
test_embeddings = embedder(test['text'].tolist())

In [None]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())

    train_embeds = sess.run(train_embeddings)
    test_embeds = sess.run(test_embeddings)

In [None]:
train_embeds.shape, test_embeds.shape

In [None]:
# np.save('train.npy', train_embeds)
# np.save('test.npy', test_embeds)

In [None]:
train_embeds = np.load('train.npy')
test_embeds = np.load('test.npy')

## Prediction

In [None]:
import nmslib

In [None]:
%%time

# build the index
index = nmslib.init()
index.addDataPointBatch(train_embeds)
index.createIndex(print_progress=True)

In [None]:
%%time

# predict: nearest train neighbor (index and distance) for every test embedding
neighbors_distances = np.array(index.knnQueryBatch(test_embeds, k=1))

neighbors = neighbors_distances[:, 0, :].astype(np.int32).flatten()
distances = neighbors_distances[:, 1, :].flatten()
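For intuition, the approximate search above can be compared with an exact brute-force version. The sketch assumes the index uses cosine distance (1 minus cosine similarity), which is believed to be the default space of `nmslib.init()` but should be verified; `nearest_neighbor` and the toy vectors are made up for illustration.

```python
import numpy as np

def nearest_neighbor(train_vecs, query_vecs):
    """Exact 1-NN under cosine distance (1 - cosine similarity)."""
    train_unit = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    query_unit = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    dist = 1.0 - query_unit @ train_unit.T      # shape: (n_queries, n_train)
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(idx)), idx]

toy_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_query = np.array([[0.9, 0.1], [0.1, 2.0]])
toy_idx, toy_dist = nearest_neighbor(toy_train, toy_query)

print(toy_idx, np.round(toy_dist, 4))
```

The brute-force version materializes a full queries-by-train distance matrix, which is why an approximate index is the practical choice for large train sets.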

In [None]:
distances.max(), distances.min(), distances.mean()

**Question:** How do we choose the threshold?

In [None]:
predicted_labels = train_labels[neighbors]

predicted_labels[distances > 0.15] = -1

In [None]:
f1 = f1_score(test_labels, predicted_labels, average='micro')

print(f'F1 score = {f1:.3f}')
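One possible answer to the threshold question is to sweep candidate values on held-out data and keep the best one. The sketch below is an assumption about how that could be done, not the notebook's method; `pick_threshold` and the toy validation arrays are made up. For single-label predictions, accuracy coincides with micro-averaged F1, so plain accuracy is used as the score.

```python
import numpy as np

def pick_threshold(distances, neighbor_labels, true_labels, candidates):
    """Return the first candidate threshold with the highest accuracy."""
    best_thr, best_score = None, -1.0
    for thr in candidates:
        # Neighbors farther than the threshold are rejected as "no class" (-1).
        pred = np.where(distances > thr, -1, neighbor_labels)
        score = float(np.mean(pred == true_labels))
        if score > best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

# Toy held-out data (made up for illustration).
val_distances = np.array([0.05, 0.10, 0.30, 0.40, 0.12])
val_neighbor_labels = np.array([3, 7, 5, 2, 9])
val_true_labels = np.array([3, 7, -1, -1, 9])

thr, score = pick_threshold(
    val_distances, val_neighbor_labels, val_true_labels,
    candidates=np.linspace(0.0, 0.5, 11))

print(f"best threshold = {thr:.2f}, accuracy = {score:.2f}")
```

The held-out data should be split off from train; tuning the threshold directly on the test set would make the reported score optimistic.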

Let's look at some examples

In [None]:
some_embeddings = test_embeds[:5]

its_texts = test['text'].tolist()[:5]

plot_similarity(its_texts, some_embeddings)

In [None]:
some_embeddings = test_embeds[test_labels == 9273]

its_texts = test.loc[test_labels == 9273, 'text'].tolist()

plot_similarity(its_texts, some_embeddings)

# Transfer learning

In [None]:
# build the dataset of text pairs

pos_texts = train['text'][condition].values
neg_texts = train['text'][~condition].values

pos_pairs = pos_texts.reshape(-1, 2)
neg_pairs = np.array(list(zip(pos_texts, np.random.choice(neg_texts, size=len(pos_texts)))))
pairs = np.append(pos_pairs, neg_pairs, axis=0)
labels = np.array([1] * len(pos_pairs) + [0] * len(neg_pairs))
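The same construction on toy data makes the resulting shapes easier to see. As far as can be read from the code, `reshape(-1, 2)` pairs up consecutive positive texts, so it appears to assume an even number of positives with matching texts stored next to each other; that reading, and the toy arrays below, are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: consecutive positive texts are assumed to belong together.
toy_pos = np.array(["a1", "a2", "b1", "b2"])
toy_neg = np.array(["n1", "n2", "n3"])

# Positive pairs: neighboring positives, two per row.
toy_pos_pairs = toy_pos.reshape(-1, 2)
# Negative pairs: every positive text matched with a random negative text.
toy_neg_pairs = np.array(list(zip(toy_pos, rng.choice(toy_neg, size=len(toy_pos)))))

toy_pairs = np.append(toy_pos_pairs, toy_neg_pairs, axis=0)
toy_labels = np.array([1] * len(toy_pos_pairs) + [0] * len(toy_neg_pairs))

print(toy_pairs.shape, toy_labels)
```

With this scheme there are twice as many negative pairs as positive ones, which is worth keeping in mind when reading accuracy numbers.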

In [None]:
pairs.shape, labels.shape

In [None]:
num_samples = 10000

indexes = np.random.permutation(len(pairs))[:num_samples]

pairs = pairs[indexes]
labels = labels[indexes]

In [None]:
pairs.shape, labels.shape

In [None]:
data = pd.DataFrame(pairs, columns=['q1', 'q2'])
data['labels'] = labels

In [None]:
data.head()

In [None]:
train, test = train_test_split(data, stratify=data['labels'], test_size=0.1, random_state=24)

In [None]:
train.shape, test.shape

In [None]:
train_input_fn = tf.estimator.inputs.pandas_input_fn(train, train["labels"], num_epochs=3, shuffle=True)

train_input_fn_pred = tf.estimator.inputs.pandas_input_fn(train, train["labels"], shuffle=False)
test_input_fn_pred = tf.estimator.inputs.pandas_input_fn(test, test["labels"], shuffle=False)

In [None]:
# use a simple DNN classifier

hub_module = 'https://tfhub.dev/google/universal-sentence-encoder/2'

def train_and_evaluate_with_module(hub_module=hub_module, train_module=False):
    q1 = hub.text_embedding_column(key="q1", module_spec=hub_module, trainable=train_module)
    q2 = hub.text_embedding_column(key="q2", module_spec=hub_module, trainable=train_module)

    estimator = tf.estimator.DNNClassifier(
      hidden_units=[500, 100],
      feature_columns=[q1, q2],
      n_classes=2,
      optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

    estimator.train(input_fn=train_input_fn)

    train_eval_result = estimator.evaluate(input_fn=train_input_fn_pred)
    test_eval_result = estimator.evaluate(input_fn=test_input_fn_pred)

    training_set_accuracy = train_eval_result["accuracy"]
    test_set_accuracy = test_eval_result["accuracy"]

    metrics = {
      "Training accuracy": training_set_accuracy,
      "Test accuracy": test_set_accuracy
    }
    
    return estimator, metrics

In [None]:
# training

estimator, metrics = train_and_evaluate_with_module()

## What instead of kNN?

In [None]:
# prediction: pair every test text with every positive train text

train_pos_texts = train_texts[condition]

num_examples = 10

df = []

for t in test_texts[:num_examples]:
    df.extend([[t, k] for k in train_pos_texts])

df = pd.DataFrame(df, columns=['q1', 'q2'])

test_pred = tf.estimator.inputs.pandas_input_fn(df, shuffle=False)

In [None]:
preds = estimator.predict(test_pred)

predictions = []

for p in tqdm(preds):
    predictions.append(p['probabilities'])
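The collected probabilities form one flat list, ordered query by query, because the pairs were generated with the test text in the outer loop. A possible next step, sketched here under the assumption that each entry has the form `[P(label=0), P(label=1)]`, is to pick the best-matching train text for every query. `best_candidates` and the toy numbers are made up for illustration.

```python
import numpy as np

def best_candidates(probabilities, num_queries):
    """Pick, for each query, the candidate with the highest P(label == 1)."""
    probs = np.asarray(probabilities).reshape(num_queries, -1, 2)
    match_prob = probs[:, :, 1]                 # shape: (num_queries, n_candidates)
    best = match_prob.argmax(axis=1)
    return best, match_prob[np.arange(num_queries), best]

toy_probabilities = [
    [0.9, 0.1], [0.2, 0.8], [0.6, 0.4],     # query 0 vs candidates 0..2
    [0.3, 0.7], [0.5, 0.5], [0.95, 0.05],   # query 1 vs candidates 0..2
]
best_idx, best_prob = best_candidates(toy_probabilities, num_queries=2)

print(best_idx, best_prob)
```

The label of the selected train text would then play the role of the kNN prediction, and a threshold on `best_prob` could mark queries with no confident match as `-1`.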


## References

* [Neural Transfer Learning for Natural Language Processing](http://ruder.io/thesis/neural_transfer_learning_for_nlp.pdf)
* [CS231n, Transfer Learning](http://cs231n.github.io/transfer-learning/)
* [Introducing state of the art text classification](http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html)
* [Transfer Learning - Machine Learning's Next Frontier](http://ruder.io/transfer-learning/)
* [NLP's ImageNet moment has arrived](http://ruder.io/nlp-imagenet/index.html)
* [Feature Visualization](https://distill.pub/2017/feature-visualization/)