<a href="https://colab.research.google.com/github/zhestyatsky/abbyy-nlp-course/blob/main/2sem/2hw/06_GoalOriented_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# shorturl.at/iOY35

Основано на: https://github.com/DanAnastasyev/DeepNLP-Course Week 12

In [1]:
!git clone https://github.com/MiuLab/SlotGated-SLU.git
!wget -qq https://raw.githubusercontent.com/yandexdataschool/nlp_course/master/week08_multitask/conlleval.py

fatal: destination path 'SlotGated-SLU' already exists and is not an empty directory.


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
DEVICE = torch.device('cpu')

# Диалоговые системы

Диалоговые системы делятся на два типа - *goal-orientied* и *general conversation*.

**General conversation** - это болталка, разговор на свободную тему:  
<img src="https://i.ibb.co/bFwwGpc/alice.jpg" width="200"/>

Сегодня будем говорить не про них, а про **goal-orientied** системы:

<img src="https://hsto.org/webt/gj/3y/xl/gj3yxlqbr7ujuqr9r2akacxmkee.jpeg" width="600"/>

*From [Как устроена Алиса](https://habr.com/company/yandex/blog/349372/)*

Пользователь говорит что-то, это что-то распознается. По распознанному определяется - что, где и когда он хотел. Дальше диалоговый движок решает, действительно ли пользователь знает, чего хотел попросить. Происходит поход в источники - узнать информацию, которую (кажется) запросил пользователь. Исходя из всего этого генерируется некоторый ответ:

<img src="https://i.ibb.co/8XcdpJ7/goal-orientied.png" width="600"/>

*From [Как устроена Алиса](https://habr.com/company/yandex/blog/349372/)*

Будем учить ту часть, которая посередине - классификатор и теггер. Всё остальное обычно - эвристики и захардкоженные ответы.

## Данные

Есть условно стандартный датасет - atis, который неприлично маленький, на самом деле.

К нему можно взять еще датасет snips - он больше и разнообразнее.

Оба датасета возьмем из репозитория статьи [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://aclweb.org/anthology/N18-2118).

Начнем с atis.

In [3]:
import os 

def read_dataset(path):
    with open(os.path.join(path, 'seq.in')) as f_words, \
            open(os.path.join(path, 'seq.out')) as f_tags, \
            open(os.path.join(path, 'label')) as f_intents:
        
        return [
            (words.strip().split(), tags.strip().split(), intent.strip()) 
            for words, tags, intent in zip(f_words, f_tags, f_intents)
        ]

In [4]:
train_data = read_dataset('SlotGated-SLU/data/atis/train/')
val_data = read_dataset('SlotGated-SLU/data/atis/valid/')
test_data = read_dataset('SlotGated-SLU/data/atis/test/')

In [5]:
intent_to_example = {example[2]: example for example in train_data}
#for example in intent_to_example.values():
#    print('Intent:\t', example[2])
#    print('Text:\t', '\t'.join(example[0]))
#    print('Tags:\t', '\t'.join(example[1]))
#    print()

In [9]:
!pip install torchtext==0.10.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [6]:
from torchtext.legacy.data import Field, LabelField, Example, Dataset, BucketIterator

tokens_field = Field()
tags_field = Field(unk_token=None)
intent_field = LabelField()

fields = [('tokens', tokens_field), ('tags', tags_field), ('intent', intent_field)]

train_dataset = Dataset([Example.fromlist(example, fields) for example in train_data], fields)
val_dataset = Dataset([Example.fromlist(example, fields) for example in val_data], fields)
test_dataset = Dataset([Example.fromlist(example, fields) for example in test_data], fields)

tokens_field.build_vocab(train_dataset)
tags_field.build_vocab(train_dataset)
intent_field.build_vocab(train_dataset)

print('Vocab size =', len(tokens_field.vocab))
print('Tags count =', len(tags_field.vocab))
print('Intents count =', len(intent_field.vocab))

train_iter, val_iter, test_iter = BucketIterator.splits(
    datasets=(train_dataset, val_dataset, test_dataset), batch_sizes=(32, 128, 128), 
    shuffle=True, sort=False
)

Vocab size = 869
Tags count = 121
Intents count = 21


То же самое со snips

In [7]:
snips_train_data = read_dataset('SlotGated-SLU/data/snips/train/')
snips_val_data = read_dataset('SlotGated-SLU/data/snips/valid/')
snips_test_data = read_dataset('SlotGated-SLU/data/snips/test/')
snips_intent_to_example = {example[2]: example for example in snips_train_data}
#for example in snips_intent_to_example.values():
#    print('Intent:\t', example[2])
#    print('Text:\t', '\t'.join(example[0]))
#    print('Tags:\t', '\t'.join(example[1]))
#    print()

In [8]:
from torchtext.legacy.data import Field, LabelField, Example, Dataset, BucketIterator

snips_tokens_field = Field()
snips_tags_field = Field(unk_token=None)
snips_intent_field = LabelField()

fields = [('tokens', snips_tokens_field), ('tags', snips_tags_field), ('intent', snips_intent_field)]

snips_train_dataset = Dataset([Example.fromlist(example, fields) for example in snips_train_data], fields)
snips_val_dataset = Dataset([Example.fromlist(example, fields) for example in snips_val_data], fields)
snips_test_dataset = Dataset([Example.fromlist(example, fields) for example in snips_test_data], fields)

snips_tokens_field.build_vocab(snips_train_dataset)
snips_tags_field.build_vocab(snips_train_dataset)
snips_intent_field.build_vocab(snips_train_dataset)

print('Vocab size =', len(snips_tokens_field.vocab))
print('Tags count =', len(snips_tags_field.vocab))
print('Intents count =', len(snips_intent_field.vocab))

train_iter, val_iter, test_iter = BucketIterator.splits(
    datasets=(snips_train_dataset, snips_val_dataset, snips_test_dataset), batch_sizes=(32, 128, 128), 
    shuffle=True, device = 'cpu', sort=False
)

Vocab size = 11420
Tags count = 73
Intents count = 7


## Классификатор интентов

Начнем с классификатора: к какому интенту относится данный запрос.

Ничего умного - берём rnn'ку и учимся предсказывать метки-интенты.

In [9]:
class IntentClassifierModel(nn.Module):
    def __init__(self, vocab_size, intents_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.out_layer = nn.Linear(lstm_hidden_dim * 2, intents_count)

    def forward(self, inputs):
        projections = self.embeddings_layer.forward(inputs)
        projections = projections.reshape(projections.size(0), projections.size(1), -1)
        output, (final_hidden_state, _) = self.lstm_layer(projections)
        hidden = self.dropout(torch.cat((final_hidden_state[0], final_hidden_state[1]), dim=1))
        output = self.out_layer.forward(hidden)
        return output

In [10]:
class ModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.correct_count, self.total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count
        )
        
    def on_batch(self, batch):
        logits = self.model(batch.tokens.transpose(0, 1))
        loss = self.criterion(logits, batch.intent)
        predicted_intent = torch.max(logits, axis=1)[1]
        self.total_count += predicted_intent.size(0)
        self.correct_count += torch.sum(predicted_intent == batch.intent).item()
        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [11]:
import math
from tqdm import tqdm
tqdm.get_lock().locks = []


def do_epoch(trainer, data_iter, is_train, name=None):
    trainer.on_epoch_begin(is_train, name, batches_count=len(data_iter))
    
    with torch.autograd.set_grad_enabled(is_train):
        with tqdm(total=trainer.batches_count) as progress_bar:
            for i, batch in enumerate(data_iter):
                batch_progress = trainer.on_batch(batch)

                progress_bar.update()
                progress_bar.set_description(batch_progress)
                
            epoch_progress = trainer.on_epoch_end()
            progress_bar.set_description(epoch_progress)
            progress_bar.refresh()

            
def fit(trainer, train_iter, epochs_count=1, val_iter=None):
    best_val_loss = None
    for epoch in range(epochs_count):
        name_prefix = '[{} / {}] '.format(epoch + 1, epochs_count)
        do_epoch(trainer, train_iter, is_train=True, name=name_prefix + 'Train:')
        
        if not val_iter is None:
            do_epoch(trainer, val_iter, is_train=False, name=name_prefix + '  Val:')

In [13]:
model = IntentClassifierModel(vocab_size=len(snips_tokens_field.vocab), intents_count=len(snips_intent_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = ModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter = train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 0.39493, Accuracy = 88.70%: 100%|██████████| 409/409 [00:19<00:00, 20.81it/s]
[1 / 30]   Val: Loss = 0.17411, Accuracy = 94.86%: 100%|██████████| 6/6 [00:00<00:00, 22.63it/s]
[2 / 30] Train: Loss = 0.09135, Accuracy = 97.13%: 100%|██████████| 409/409 [00:17<00:00, 22.92it/s]
[2 / 30]   Val: Loss = 0.12398, Accuracy = 96.57%: 100%|██████████| 6/6 [00:00<00:00, 23.69it/s]
[3 / 30] Train: Loss = 0.05052, Accuracy = 98.43%: 100%|██████████| 409/409 [00:17<00:00, 22.77it/s]
[3 / 30]   Val: Loss = 0.10799, Accuracy = 96.71%: 100%|██████████| 6/6 [00:00<00:00, 23.33it/s]
[4 / 30] Train: Loss = 0.02993, Accuracy = 99.08%: 100%|██████████| 409/409 [00:18<00:00, 22.50it/s]
[4 / 30]   Val: Loss = 0.16122, Accuracy = 96.14%: 100%|██████████| 6/6 [00:00<00:00, 23.59it/s]
[5 / 30] Train: Loss = 0.02248, Accuracy = 99.38%: 100%|██████████| 409/409 [00:17<00:00, 22.82it/s]
[5 / 30]   Val: Loss = 0.12737, Accuracy = 97.00%: 100%|██████████| 6/6 [00:00<00:00, 25.63it/s]
[6 / 30] T

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.24093, Accuracy = 95.71%: 100%|██████████| 6/6 [00:00<00:00, 22.97it/s]


## Теггер

![](https://commons.bmstu.wiki/images/0/00/NER1.png)  
*From [NER](https://ru.bmstu.wiki/NER_(Named-Entity_Recognition)*

#### **Задание 1.1**
Напишите простой теггер

In [12]:
class TokenTaggerModel(nn.Module):
    def __init__(self, vocab_size, tags_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.out_layer = nn.Linear(2*lstm_hidden_dim, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer.forward(inputs)
        last_lstm_layer_output, _ = self.lstm_layer(projections)
        output = self.dropout(last_lstm_layer_output)
        output = self.out_layer.forward(output)
        return output

#### **Задание 1.2**
Обновите `ModelTrainer`: считать нужно всё те же лосс и accuracy, только теперь немного по-другому.

In [13]:
class TagModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.correct_count, self.total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count
        )
        
    def on_batch(self, batch):
      # batch_size x n_tokens
      tokens = batch.tokens.t()
      tag_labels = batch.tags.t()

      # batch_size x n_tokens x n_tags
      outputs = self.model(tokens)

      # batch_size x n_tags x n_tokens
      logits = outputs.transpose(1, 2)
      loss = self.criterion(logits, tag_labels)

      # batch_size x n_tokens
      tag_prediction_indices = outputs.max(axis=2)[1]

      self.correct_count += torch.sum(tag_labels == tag_prediction_indices).item() - torch.sum(tag_labels == 0).item()
      self.total_count += torch.sum(tag_labels > 0).item()

      if self.is_train:
          loss.backward()
          self.optimizer.step()
          self.optimizer.zero_grad()
      self.epoch_loss += loss.item()

In [14]:
model = TokenTaggerModel(vocab_size=len(snips_tokens_field.vocab), tags_count=len(snips_tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = TagModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=10, val_iter=val_iter)


[1 / 10] Train: Loss = 0.87282, Accuracy = 62.10%: 100%|██████████| 409/409 [00:23<00:00, 17.41it/s]
[1 / 10]   Val: Loss = 0.35810, Accuracy = 79.93%: 100%|██████████| 6/6 [00:00<00:00, 17.54it/s]
[2 / 10] Train: Loss = 0.29251, Accuracy = 84.43%: 100%|██████████| 409/409 [00:22<00:00, 17.93it/s]
[2 / 10]   Val: Loss = 0.21084, Accuracy = 87.22%: 100%|██████████| 6/6 [00:00<00:00, 17.84it/s]
[3 / 10] Train: Loss = 0.18526, Accuracy = 89.75%: 100%|██████████| 409/409 [00:22<00:00, 17.83it/s]
[3 / 10]   Val: Loss = 0.16562, Accuracy = 89.80%: 100%|██████████| 6/6 [00:00<00:00, 18.25it/s]
[4 / 10] Train: Loss = 0.13427, Accuracy = 92.47%: 100%|██████████| 409/409 [00:22<00:00, 18.53it/s]
[4 / 10]   Val: Loss = 0.13980, Accuracy = 91.71%: 100%|██████████| 6/6 [00:00<00:00, 18.84it/s]
[5 / 10] Train: Loss = 0.10253, Accuracy = 94.21%: 100%|██████████| 409/409 [00:23<00:00, 17.42it/s]
[5 / 10]   Val: Loss = 0.11380, Accuracy = 92.67%: 100%|██████████| 6/6 [00:00<00:00, 19.78it/s]
[6 / 10] T

In [15]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.10671, Accuracy = 93.36%: 100%|██████████| 6/6 [00:00<00:00, 17.26it/s]


In [16]:
!pip install conlleval

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [17]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1)).transpose(1, 2).max(dim=1)[1].cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in batch.tags.transpose(0, 1).cpu().tolist()])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 89.36%, Recall = 89.22%, F1 = 89.29%


## Multi-task learning

Реализуем модель, которая умеет сразу и предсказывать теги и интенты. Идея в том, что в этом всем есть общая информация, которая должна помочь как одной, так и другой задаче: зная интент, можно понять, какие слоты вообще могут быть, а зная слоты, можно угадать и интент.

#### **Задание 2.1**
Реализуйте объединенную модель.

In [28]:
class SharedModel(nn.Module):
    def __init__(self, vocab_size, intents_count, tags_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embedding_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.out_layer_intent = nn.Linear(2*lstm_hidden_dim, intents_count)
        self.out_layer_tags = nn.Linear(2*lstm_hidden_dim, tags_count)

    def forward(self, inputs):
        projection = self.embedding_layer.forward(inputs)
        output, (hidden, _) = self.lstm_layer(projection)
        hidden = torch.cat((hidden[0], hidden[1]), dim=1)
        
        output = self.dropout(output)
        hidden = self.dropout(hidden)

        tags_output = self.out_layer_tags.forward(output)
        intent_output = self.out_layer_intent.forward(hidden)
        
        return tags_output, intent_output

#### **Задание 2.2**
Допишите SharedModelTrainer

In [41]:
class SharedModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.tag_correct_count, self.tag_total_count = 0, 0
        self.intent_correct_count, self.intent_total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.tag_correct_count / self.tag_total_count, 
            self.intent_correct_count / self.intent_total_count
        )
        
    def on_batch(self, batch):
      tokens = batch.tokens.t()
      tag_labels = batch.tags.t()
      intent_labels = batch.intent

      tags_outputs, intent_outputs = self.model(tokens)
      tags_logits = tags_outputs.transpose(1, 2)

      tags_loss = self.criterion(tags_logits, tag_labels)
      intent_loss = self.criterion(intent_outputs, intent_labels)
      loss = tags_loss + intent_loss

      tag_prediction_indices = tags_outputs.max(axis=2)[1]
      intent_prediction_indices = intent_outputs.max(axis=1)[1]

      self.tag_correct_count += torch.sum(tag_labels == tag_prediction_indices).item() - torch.sum(tag_labels == 0).item()
      self.tag_total_count += torch.sum(tag_labels > 0).item()
      
      self.intent_correct_count += torch.sum(intent_labels == intent_prediction_indices).item()
      self.intent_total_count += intent_labels.size(0)

      if self.is_train:
          loss.backward()
          self.optimizer.step()
          self.optimizer.zero_grad()

      self.epoch_loss += loss.item()

In [42]:
model = SharedModel(vocab_size=len(snips_tokens_field.vocab), intents_count=len(snips_intent_field.vocab),
                    tags_count=len(snips_tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = SharedModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=10, val_iter=val_iter)

[1 / 10] Train: Loss = 1.36672, Tags accuracy = 60.62%, Intents accuracy = 86.53%: 100%|██████████| 409/409 [00:25<00:00, 16.07it/s]
[1 / 10]   Val: Loss = 0.49822, Tags accuracy = 78.13%, Intents accuracy = 97.00%: 100%|██████████| 6/6 [00:00<00:00, 19.63it/s]
[2 / 10] Train: Loss = 0.42029, Tags accuracy = 82.26%, Intents accuracy = 97.27%: 100%|██████████| 409/409 [00:22<00:00, 18.00it/s]
[2 / 10]   Val: Loss = 0.34908, Tags accuracy = 85.32%, Intents accuracy = 96.71%: 100%|██████████| 6/6 [00:00<00:00, 20.39it/s]
[3 / 10] Train: Loss = 0.27444, Tags accuracy = 87.59%, Intents accuracy = 98.48%: 100%|██████████| 409/409 [00:22<00:00, 18.28it/s]
[3 / 10]   Val: Loss = 0.25962, Tags accuracy = 88.41%, Intents accuracy = 97.43%: 100%|██████████| 6/6 [00:00<00:00, 17.91it/s]
[4 / 10] Train: Loss = 0.20393, Tags accuracy = 90.48%, Intents accuracy = 98.99%: 100%|██████████| 409/409 [00:23<00:00, 17.05it/s]
[4 / 10]   Val: Loss = 0.21875, Tags accuracy = 89.41%, Intents accuracy = 98.29%

In [43]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.23148, Tags accuracy = 92.97%, Intents accuracy = 97.00%: 100%|██████████| 6/6 [00:00<00:00, 15.87it/s]


In [44]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1))[0].transpose(1, 2).max(dim=1)[1].cpu().tolist()
            true = batch.tags.transpose(0, 1).cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in true])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 88.70%, Recall = 88.47%, F1 = 88.58%


 ## Асинхронное обучение

Идея описана в статье [A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling](http://aclweb.org/anthology/N18-2050).

<img src="https://i.ibb.co/qrgVSqF/2018-11-27-2-11-17.png" width="600"/>

Основное отличие от того, что уже реализовали в том, в каком порядке все оптимизируется. Вместо объединенного обучения всех слоев, сети для теггера и для классификатора обучаются отдельно.

На каждом шаге обучения генерируются последовательности скрытых состояний $h^1$ и $h^2$ - для классификатора и для теггера.

Дальше сначала считаются потери от предсказания интента и делается шаг оптимизатора, а затем потери от предсказания теггов - и опять шаг оптимизатора.

#### **Задание 3.1**
Реализуйте асинхронное обучение совместной модели

In [None]:
class AsyncSharedModel(nn.Module):
    def __init__(self, vocab_size, intents_count, tags_count, emb_dim=65,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()
        # YOUR CODE HERE

    def forward(self, inputs):
        # YOUR CODE HERE
        return tag_output, intent_output

In [None]:
class AsyncSharedModelTrainer():
    def __init__(self, model, criterion, tags_optimizer, intent_optimizer):
        self.model = model
        self.criterion = criterion
        self.tags_optimizer = tags_optimizer
        self.intent_optimizer = intent_optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.tag_correct_count, self.tag_total_count = 0, 0
        self.intent_correct_count, self.intent_total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.tag_correct_count / self.tag_total_count, 
            self.intent_correct_count / self.intent_total_count
        )
        
    def on_batch(self, batch):
        # YOUR CODE HERE

Затем их нужно передать в отдельные оптимизаторы и учить отдельно.

*Еще, может быть, пригодится retain_graph параметр метода backward()*.

In [None]:
model = AsyncSharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                         tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
tags_parameters = [param for name, param in model.named_parameters() if not 'intent' in name]
intent_parameters = [param for name, param in model.named_parameters() if not 'tags' in name]
tags_optimizer = optim.Adam(tags_parameters)
intent_optimizer = optim.Adam(intent_parameters)
trainer = AsyncSharedModelTrainer(model, criterion, tags_optimizer, intent_optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

In [None]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1))[0].transpose(1, 2).max(dim=1)[1].cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in batch.tags.transpose(0, 1).cpu().tolist()])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

#### **Задание 3.2**
Посмотрите на параметры в статье и попробуйте добиться похожего качества.

#### **Задание 4**
Посмотрите результаты на SNIPS

## Async Multi-task Learning for POS Tagging

Ещё одна статья: [Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings](https://arxiv.org/pdf/1805.08237.pdf)

Архитектура там такая:

<img src="https://i.ibb.co/0nSX6CC/2018-11-27-9-26-15.png" width="400"/>

Multi-task задача - обучение отдельных классификаторов более низкого уровня (над символами и словами) для предсказания тегов отдельными оптимизаторами.

## DeepPavlov go_bot

http://docs.deeppavlov.ai/en/master/features/skills/go_bot.html

In [None]:
!pip install deeppavlov
!python -m deeppavlov install gobot_dstc2

Collecting deeppavlov
  Downloading deeppavlov-0.17.2-py3-none-any.whl (880 kB)
[K     |████████████████████████████████| 880 kB 5.1 MB/s 
[?25hCollecting fastapi==0.47.1
  Downloading fastapi-0.47.1-py3-none-any.whl (43 kB)
[K     |████████████████████████████████| 43 kB 1.5 MB/s 
[?25hCollecting filelock==3.0.12
  Downloading filelock-3.0.12-py3-none-any.whl (7.6 kB)
Collecting pyopenssl==19.1.0
  Downloading pyOpenSSL-19.1.0-py2.py3-none-any.whl (53 kB)
[K     |████████████████████████████████| 53 kB 1.6 MB/s 
[?25hCollecting sacremoses==0.0.35
  Downloading sacremoses-0.0.35.tar.gz (859 kB)
[K     |████████████████████████████████| 859 kB 41.2 MB/s 
[?25hCollecting Cython==0.29.14
  Downloading Cython-0.29.14-cp37-cp37m-manylinux1_x86_64.whl (2.1 MB)
[K     |████████████████████████████████| 2.1 MB 47.0 MB/s 
[?25hCollecting tqdm==4.62.0
  Downloading tqdm-4.62.0-py2.py3-none-any.whl (76 kB)
[K     |████████████████████████████████| 76 kB 3.8 MB/s 
[?25hCollecting pymor

2022-03-01 10:55:34.188 INFO in 'deeppavlov.core.common.file'['file'] at line 32: Interpreting 'gobot_dstc2' as '/usr/local/lib/python3.7/dist-packages/deeppavlov/configs/go_bot/gobot_dstc2.json'
Collecting tensorflow==1.15.5
  Downloading tensorflow-1.15.5-cp37-cp37m-manylinux2010_x86_64.whl (110.5 MB)
[K     |████████████████████████████████| 110.5 MB 1.1 kB/s 
Collecting keras-applications>=1.0.8
  Downloading Keras_Applications-1.0.8-py3-none-any.whl (50 kB)
[K     |████████████████████████████████| 50 kB 4.7 MB/s 
Collecting tensorflow-estimator==1.15.1
  Downloading tensorflow_estimator-1.15.1-py2.py3-none-any.whl (503 kB)
[K     |████████████████████████████████| 503 kB 49.0 MB/s 
Collecting gast==0.2.2
  Downloading gast-0.2.2.tar.gz (10 kB)
Collecting tensorboard<1.16.0,>=1.15.0
  Downloading tensorboard-1.15.0-py3-none-any.whl (3.8 MB)
[K     |████████████████████████████████| 3.8 MB 32.3 MB/s 
Building wheels for collected packages: gast
  Building wheel for gast (setup.

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 
from deeppavlov import build_model, configs

bot1 = build_model(configs.go_bot.gobot_dstc2, download=True)

bot1(['hi, i want restaurant in the cheap pricerange'])
bot1(['bye'])

2022-03-01 10:56:07.732 INFO in 'deeppavlov.core.data.utils'['utils'] at line 95: Downloading from http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt to /root/.deeppavlov/downloads/embeddings/glove.6B.100d.txt
347MB [00:07, 46.8MB/s]
2022-03-01 10:56:16.549 INFO in 'deeppavlov.core.data.utils'['utils'] at line 95: Downloading from http://files.deeppavlov.ai/deeppavlov_data/dstc_slot_vals.tar.gz to /root/.deeppavlov/downloads/dstc_slot_vals.tar.gz
100%|██████████| 1.62k/1.62k [00:00<00:00, 173kB/s]
2022-03-01 10:56:17.242 INFO in 'deeppavlov.core.data.utils'['utils'] at line 272: Extracting /root/.deeppavlov/downloads/dstc_slot_vals.tar.gz archive into /root/.deeppavlov/downloads/dstc2
2022-03-01 10:56:17.924 INFO in 'deeppavlov.core.data.utils'['utils'] at line 95: Downloading from http://files.deeppavlov.ai/deeppavlov_data/slotfill_dstc2.tar.gz to /root/.deeppavlov/slotfill_dstc2.tar.gz
100%|██████████| 641k/641k [00:00<00:00, 1.29MB/s]
2022-03-01 10:56:19.145 INFO in 'deeppavlov

KeyboardInterrupt: ignored

Поддробные туториалы:

Simple: https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_tutorial.ipynb

Extended: https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_extended_tutorial.ipynb

# Дополнительные материалы

## Статьи
A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling, 2018 [[pdf]](http://aclweb.org/anthology/N18-2050)

Slot-Gated Modeling for Joint Slot Filling and Intent Prediction, 2018 [[pdf]](http://aclweb.org/anthology/N18-2118) 

Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings, 2018 [[pdf]](https://arxiv.org/pdf/1805.08237.pdf)

BERT for Joint Intent Classification and Slot Filling
 [[pdf]](https://arxiv.org/pdf/1902.10909.pdf)

## Блоги
[Как устроена Алиса](https://habr.com/company/yandex/blog/349372/)  