In [205]:
import json
import copy

In [206]:
# pretty prints
from pprint import pprint

In [207]:
# creating directory for json configs
import os

if not os.path.isdir("gobot"):
    os.mkdir("gobot")

<img src="https://static.tildacdn.com/tild6461-6138-4365-a139-383131346165/ipavlov_logo__.png" alt="teacher_forcing" width=50%/>

In [208]:
import deeppavlov


**NOTE**: "go_bot" model trains faster on a CPU, so let's ignore existing GPUs:

In [209]:
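# `!export` would only affect a temporary subshell, so set the variable in the kernel process instead
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
!echo "cuda visible devices = '"$CUDA_VISIBLE_DEVICES"'"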

cuda visible devices = ''


# Hybrid goal-oriented bot

Dialog bots are categorized into two types:

1. **goal-oriented models**

    (those that have to achieve some goal by the end of the conversation: 
     - restaurant and flight booking,
     - customer support service,
     - etc.);
 
2. **chit-chat models**

   (those that chat just for fun; the longer the bot keeps talking with you, the better, for example:
    - the "Replika" mobile application).

We will focus only on goal-oriented bots.

![go bot architecture 0](scheme000.png)

A classical dialog system consists of:

1. **Natural Language Understanding component (NLU)**

 that is intended to "understand" the human and represent its "understanding" in a machine-readable format. 

 It takes an utterance text as input and converts it to a dialog "frame".

 A "frame" may consist of:
   - domain value (a domain is a type of dialog);
   - intent value (the intent of the current human utterance: "welcome_message", "asking_weather", etc.);
   - entity slots (entities mentioned by the human: "location", "time", etc.).

2. **Dialogue Manager component (DM)**

 that is intended to decide what to respond. 
    
 It takes the frame filled by the NLU and outputs an action (not the final text, just a label). 
    
 For example, there may be actions: "say_welcome", "say_goodbye", "ask_location", "give_weather", etc.

3. **Natural Language Generation component (NLG)**

 that is intended to convert an action to an actual text response.
 
 For example, "say_goodbye" -> "You are welcome!".
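
The three stages above can be sketched as a toy pipeline. This is a minimal illustration with hand-written rules, not DeepPavlov code: the function names `nlu`, `dm`, `nlg`, `respond`, the keyword rules and the `TEMPLATES` table are all assumptions made up for this example.

```python
# Toy NLU -> DM -> NLG pipeline. Every rule, intent and template below is
# hypothetical and only illustrates how the three components hand data over.

def nlu(utterance):
    """Convert raw text into a frame with an intent and entity slots."""
    text = utterance.lower()
    frame = {"intent": "unknown", "slots": {}}
    if "weather" in text:
        frame["intent"] = "asking_weather"
    elif "bye" in text:
        frame["intent"] = "goodbye"
    elif "hello" in text:
        frame["intent"] = "welcome_message"
    for city in ("london", "moscow"):
        if city in text:
            frame["slots"]["location"] = city
    return frame


def dm(frame):
    """Choose an action label from the frame; no text is produced here."""
    intent = frame["intent"]
    if intent == "goodbye":
        return "say_goodbye"
    if intent == "asking_weather":
        return "give_weather" if "location" in frame["slots"] else "ask_location"
    return "say_welcome"


TEMPLATES = {
    "say_welcome": "Hello, how may I help you?",
    "say_goodbye": "You are welcome!",
    "ask_location": "Which city are you interested in?",
    "give_weather": "Here is the weather for {location}.",
}


def nlg(action, frame):
    """Turn an action label into text, filling in slot values if needed."""
    return TEMPLATES[action].format(**frame["slots"])


def respond(utterance):
    frame = nlu(utterance)
    return nlg(dm(frame), frame)


print(respond("what is the weather in london"))
print(respond("thanks, bye"))
```

Note that `dm` never sees or produces text: it maps a frame to a label, which is what makes it possible to train it as a classifier later in the tutorial.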

## NLU 

![go bot architecture 0](scheme001.png)

Let's consider a dialog system whose NLU component consists of a single Named Entity Recognition (NER) component.

One of the previous tutorials introduced the DeepPavlov NER model and showed how to use it.

## DM & NLG

The tutorial is focused on how to implement
 - Dialogue Manager and
 - Natural Language Generator.

### Dataset

We will train the chatbot on the [Dialog State Tracking Challenge 2](http://camdial.org/~mh521/dstc/) data.

Let's download it first.

In [210]:
from deeppavlov.dataset_readers.dstc2_reader import DSTC2Version2DatasetReader

data = DSTC2Version2DatasetReader().read(data_path="tmp/my_download_of_dstc2")

2018-07-05 17:09:48.125 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from tmp/my_download_of_dstc2/dstc2-trn.jsonlist]
2018-07-05 17:09:48.645 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from tmp/my_download_of_dstc2/dstc2-val.jsonlist]
2018-07-05 17:09:48.793 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from tmp/my_download_of_dstc2/dstc2-tst.jsonlist]


`DSTC2Version2DatasetReader` downloaded the needed data and saved it to disk.

`DialogDatasetIterator` takes the data as input and transforms it into batches.

In [211]:
from deeppavlov.dataset_iterators.dialog_iterator import DialogDatasetIterator

batches_generator = DialogDatasetIterator(data, seed=1443, shuffle=True)\
                                         .gen_batches(batch_size=4, data_type='train')

-------------

Let's take a closer look at the batch content:

In [212]:
batch = next(batches_generator)

Each batch is a tuple of two elements:
  - list of x's and
  - list of y's

In [213]:
x_batch, y_batch = batch


`x_batch` (and `y_batch`) consists of 4 samples. This is because `batch_size` was 4.

In [214]:
len(x_batch)

4
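
To make the batch layout concrete, here is a minimal sketch of a batch generator over dialogs. It is not the `DialogDatasetIterator` implementation: the function `gen_batches`, the `toy_dialogs` data and the shuffling strategy are assumptions chosen only to reproduce the same shape (a tuple of a list of x's and a list of y's, one entry per dialog).

```python
import random


def gen_batches(dialogs, batch_size, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) tuples; each dialog is a list of (x, y) turns."""
    order = list(range(len(dialogs)))
    if shuffle:
        # A seeded generator makes the order of dialogs reproducible
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        chunk = [dialogs[i] for i in order[start:start + batch_size]]
        x_batch = [[x for x, _ in dialog] for dialog in chunk]
        y_batch = [[y for _, y in dialog] for dialog in chunk]
        yield x_batch, y_batch


# Hypothetical data: 6 dialogs with 2 turns each
toy_dialogs = [
    [({"text": f"user {d}-{t}"}, {"text": f"bot {d}-{t}"}) for t in range(2)]
    for d in range(6)
]

toy_batches = gen_batches(toy_dialogs, batch_size=4, seed=1443)
toy_x, toy_y = next(toy_batches)
print(len(toy_x), len(toy_x[0]))  # prints: 4 2
```

With 6 dialogs and `batch_size=4`, the second (and last) batch of this sketch holds the remaining 2 dialogs.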

One sample is a dialog. This is what one turn of a dialog looks like:

In [215]:
dialog_id = 0
dialog = list(zip(x_batch[dialog_id], y_batch[dialog_id]))

turn_id = 0
print("----{}th turn----".format(turn_id)) 
pprint(dialog[turn_id], indent=8)

----0th turn----
(       {'intents': [], 'prev_resp_act': None, 'text': ''},
        {       'act': 'welcomemsg',
                'text': 'Hello, welcome to the Cambridge restaurant system. '
                        'You can ask for restaurants by area, price range or '
                        'food type. How may I help you?'})


This is how the whole dialog looks (printing `'text'` parts of `x_batch` and `y_batch`):

In [216]:
for turn in dialog:
    x, y = turn
    print('::', x['text']) 
    print('>>', y['text'], '\n')

:: 
>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you? 

:: traditional
>> api_call area="dontcare" food="traditional" pricerange="dontcare" 

:: traditional
>> I am sorry but there is no traditional restaurant that matches your request. 

:: italian food
>> What part of town do you have in mind? 

:: south
>> api_call area="south" food="italian" pricerange="dontcare" 

:: south
>> Pizza hut cherry hinton is a nice place in the south of town serving tasty italian food. 

:: phone number
>> The phone number of pizza hut cherry hinton is 01223 323737. 

:: post code
>> The post code of pizza hut cherry hinton is C.B 1, 7 D.Y. 

:: thank you good bye
>> You are welcome! 



### HowTo: DeepPavlov configs

DeepPavlov uses JSON configs to configure data pipelines.

A config has the following sections:
     
 - **dataset_reader**
   - configuration of the dataset reader component (responsible for downloading data and saving it to disk);
  
 - **dataset_iterator**
   - configuration of the dataset iterator component (responsible for making batches (sequences) of data that are then fed to the pipe components);
  
 - **metadata**
   - extra info (URLs for data download and Telegram configuration);

 - **train**
   - training process configuration (batch size, number of training epochs, etc.);
   
 - **chainer**
   - specifies the data flow (which components are run and in what order).

Let's construct a simple config that builds a dictionary of input sample tokens.

In [217]:
vocab_config = {}

- **dataset_reader** -- configuration of the dataset reader component (responsible for downloading data and saving it to disk)

In [218]:
dstc2_reader_comp_config = {
    'name': 'dstc2_v2_reader',
    'data_path': 'dstc2_v2'
}

In [219]:
vocab_config['dataset_reader'] = dstc2_reader_comp_config

- **dataset_iterator** -- configuration of the dataset iterator component (responsible for making batches (sequences) of data that are then fed to the pipe components)

In [220]:
dialog_iterator_comp_config = {
    'name': 'dialog_iterator'
}

In [221]:
vocab_config['dataset_iterator'] = dialog_iterator_comp_config

- **metadata** -- some extra info
     - **metadata.download** -- a list of data that must be downloaded for the config to work

In [222]:
dstc2_download_config = {
    'url': 'http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz',
    'subdir': 'dstc2_v2'
}

In [223]:
vocab_config['metadata'] = {}
vocab_config['metadata']['download'] = [
    dstc2_download_config
]

- **train** -- training process configuration
     
We don't need to train anything yet; we only build a dictionary (fit once on the whole dataset), so the "train" section is empty.

In [224]:
vocab_config['train'] = {}

 - **chainer** specifies data flow:
 
     - **chainer.in** -- a list of input sample names (one data sample might consist of several variables);
     - **chainer.in_y** -- a list of input label names (each sample might have labels of different kinds);
     - **chainer.out** -- a list of output prediction names (usually has the same length as "chainer.in_y").
     
X is only an utterance here.

Y is empty (unlike a neural network, the dictionary needs no labels to be fitted).

There are no predictions in this config: there is nothing to predict.

In [225]:
vocab_config['chainer'] = {}
vocab_config['chainer']['in'] = ['utterance']
vocab_config['chainer']['in_y'] = []
vocab_config['chainer']['out'] = []

- **chainer**
     - **chainer.pipe** -- a list of components that are run sequentially. This is the place where you specify in which order and what kind of data will be fed to the components. 
     
Our pipe consists of one component -- "default_vocab".
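
The idea of a pipe can be sketched with a tiny stand-in for the chainer. This is an illustrative toy, not DeepPavlov's `Chainer` class: `ToyChainer`, `lowercase`, `split` and the field names `lowered` and `tokens` are all hypothetical. It only shows how named fields could flow from `in`, through each component's inputs and outputs, to `out`.

```python
class ToyChainer:
    """Runs components in order, passing named fields through a shared dict."""

    def __init__(self, in_names, out_names):
        self.in_names = in_names
        self.out_names = out_names
        self.pipe = []  # list of (component, input names, output names)

    def append(self, component, in_names, out_names):
        self.pipe.append((component, in_names, out_names))

    def __call__(self, *batches):
        # Start with the fields named in the chainer's "in" section
        memory = dict(zip(self.in_names, batches))
        for component, in_names, out_names in self.pipe:
            result = component(*[memory[name] for name in in_names])
            if len(out_names) == 1:
                result = (result,)
            # Store the component outputs under their declared names
            memory.update(zip(out_names, result))
        outputs = [memory[name] for name in self.out_names]
        return outputs[0] if len(outputs) == 1 else outputs


def lowercase(batch):
    return [sample.lower() for sample in batch]


def split(batch):
    return [sample.split() for sample in batch]


toy_chain = ToyChainer(in_names=["utterance"], out_names=["tokens"])
toy_chain.append(lowercase, ["utterance"], ["lowered"])
toy_chain.append(split, ["lowered"], ["tokens"])

print(toy_chain(["Hi there", "Cheap food"]))
# prints: [['hi', 'there'], ['cheap', 'food']]
```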

##### HowTo: Component config

Component configs are always just a part of a global model config (described above).

Config for any component contains the following **_required_** parameters:
 - **name** -- registered name of a component (it is a link to the Python implementation of the component)
 - **save_path** -- path to save the component (sometimes not needed, for example, for tokenizers)
 - **load_path** -- path to load the component (sometimes not needed, for example, for tokenizers)

and the following **_optional_** parameters:
 - **id** -- reference name for the component
 - **ref** -- "id" of a component that was previously initialized. It can be used instead of "name".
 - **fit\_on** -- a list of data fields to fit on (it calls the `fit` method of the component)
 - **in** -- a list of data fields that are inputs during inference (prediction)
 - **out** -- a list of data fields that are outputs during inference (prediction)
 
 
The "default_vocab" component also has its own unique parameters:
 - level -- the level to operate on ('token' and 'char' (character) levels are available)
 - tokenizer -- if the input is a string, it will be tokenized by this tokenizer, _optional parameter_
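
What such a vocabulary does can be sketched in a few lines. The class below is a toy written for this tutorial, not the `default_vocab` implementation: the name `ToyVocab`, the frequency-based index order and the use of one extra index for unknown tokens are assumptions.

```python
from collections import Counter


class ToyVocab:
    """Maps tokens (or characters) to integer indices after being fitted."""

    def __init__(self, level="token", tokenizer=str.split):
        self.level = level
        self.tokenizer = tokenizer
        self.freqs = Counter()
        self.items = []
        self.index_of = {}

    def _units(self, sample):
        if self.level == "char":
            return list(sample)
        # Strings are tokenized; already tokenized samples are used as they are
        return self.tokenizer(sample) if isinstance(sample, str) else list(sample)

    def fit(self, samples):
        for sample in samples:
            self.freqs.update(self._units(sample))
        # Most frequent first, ties broken alphabetically (an arbitrary choice)
        self.items = sorted(self.freqs, key=lambda unit: (-self.freqs[unit], unit))
        self.index_of = {unit: i for i, unit in enumerate(self.items)}

    def __call__(self, batch):
        unknown = len(self.items)  # index reserved here for unseen units
        return [[self.index_of.get(unit, unknown) for unit in self._units(sample)]
                for sample in batch]


toy_vocab = ToyVocab(level="token")
toy_vocab.fit(["cheap italian food", "cheap food in the north"])
print(toy_vocab.items)
print(toy_vocab(["cheap thai food"]))  # prints: [[0, 6, 1]]
```

Fitting happens once over the data (the role of `fit_on`), while calling the object on a batch corresponds to inference (the role of `in` and `out`).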

In [226]:
vocab_comp_config = {
    'name': 'default_vocab',
    'save_path': 'vocabs/token.dict',
    'load_path': 'vocabs/token.dict',
    'fit_on': ['utterance'],
    'level': 'token',
    'tokenizer': {'name': 'split_tokenizer'},
    'main': True
}

In [227]:
vocab_config['chainer']['pipe'] = [
    vocab_comp_config
]

In [228]:
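# use a context manager so the file is flushed and closed before it is read back
with open("gobot/vocab_config.json", 'wt') as config_file:
    json.dump(vocab_config, config_file)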

To download the "dstc2_v2" dataset, use the `deeppavlov.download.deep_download` function (you only have to do it once):

In [229]:
from deeppavlov.download import deep_download # it is called "deep" in honor of "Deep Pavlov"

deep_download(['--config', 'gobot/vocab_config.json'])

2018-07-05 17:09:53.977 INFO in 'deeppavlov.download'['download'] at line 142: Downloading...
2018-07-05 17:09:53.983 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 208: Starting new HTTP connection (1): lnsigo.mipt.ru
2018-07-05 17:09:54.338 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 396: http://lnsigo.mipt.ru:80 "GET /export/datasets/dstc2_v2.tar.gz HTTP/1.1" 200 506300
2018-07-05 17:09:54.345 INFO in 'deeppavlov.core.data.utils'['utils'] at line 65: Downloading from http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2.tar.gz
100%|██████████| 506k/506k [00:02<00:00, 187kB/s]  
2018-07-05 17:09:57.65 INFO in 'deeppavlov.core.data.utils'['utils'] at line 149: Extracting /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2.tar.gz archive into /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2
2018-07-05 17:09:57.129 INFO in 'deeppavlov.download'['download'] at line 144: 
D

All data and models are saved relative to the root of the deeppavlov module, in `../download` (`DEEPPAVLOV_ROOT/../download/`).

In [230]:
dstc2_v2_path = deeppavlov.__path__[0] + '/../download/dstc2_v2'

Data was downloaded to `dstc2_v2_path`:

In [231]:
# The commands below only work on Linux; if they fail, don't worry, they aren't crucial.
# You can simply comment out the bash commands.
# Note: the path is unquoted, so `ls` breaks when it contains a space (as in the
# output below); writing !ls "$dstc2_v2_path" avoids that.
!echo "> ls $dstc2_v2_path"
!ls $dstc2_v2_path

> ls /home/temkahap/Рабочий стол/CISS/DeepPavlov/deeppavlov/../download/dstc2_v2
ls: cannot access '/home/temkahap/Рабочий': No such file or directory
ls: cannot access 'стол/CISS/DeepPavlov/deeppavlov/../download/dstc2_v2': No such file or directory
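
A portable alternative to the shell commands is to list the directory from Python, which works on any OS and is not affected by spaces in the path. The helper `list_dir` below is a hypothetical convenience function written for this tutorial, and the temporary directory only stands in for the real download folder.

```python
import os
import tempfile


def list_dir(path):
    """Return the sorted entry names of `path`, or an empty list if it is missing."""
    path = os.path.normpath(path)  # collapses segments such as "pkg/../download"
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))


# Demo on a throwaway directory whose name contains a space
with tempfile.TemporaryDirectory(prefix="with space ") as demo_dir:
    open(os.path.join(demo_dir, "a.txt"), "w").close()
    roundabout = os.path.join(demo_dir, "..", os.path.basename(demo_dir))
    print(list_dir(roundabout))  # prints: ['a.txt']
```

In the notebook, the same helper could be called as `list_dir(dstc2_v2_path)`.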


Let's build our vocabulary.

In [232]:
from deeppavlov.core.commands.train import train_evaluate_model_from_config

train_evaluate_model_from_config("gobot/vocab_config.json")

2018-07-05 17:10:00.886 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-trn.jsonlist]
2018-07-05 17:10:01.394 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-val.jsonlist]
2018-07-05 17:10:01.854 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-tst.jsonlist]
2018-07-05 17:10:02.41 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:10:02.75 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 150: [saving vocabulary to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:10:02.77 I

{"valid": {"eval_examples_count": 575, "metrics": {"accuracy": 0.0}, "time_spent": "0:00:01"}}
{"test": {"eval_examples_count": 576, "metrics": {"accuracy": 0.0}, "time_spent": "0:00:01"}}


The vocabulary was built on the data and saved to disk.

Since `save_path = 'vocabs/token.dict'`, the component files are saved to `DEEPPAVLOV_ROOT/../download/vocabs/token.dict`.

In [233]:
vocabs_path = deeppavlov.__path__[0] + '/../download/vocabs'

In [234]:
!echo "> ls $vocabs_path"
!ls $vocabs_path

> ls /home/temkahap/Рабочий стол/CISS/DeepPavlov/deeppavlov/../download/vocabs
ls: cannot access '/home/temkahap/Рабочий': No such file or directory
ls: cannot access 'стол/CISS/DeepPavlov/deeppavlov/../download/vocabs': No such file or directory


This is the content of the saved "token.dict":

In [235]:
!echo "> head $vocabs_path/token.dict"
!head $vocabs_path/token.dict

> head /home/temkahap/Рабочий стол/CISS/DeepPavlov/deeppavlov/../download/vocabs/token.dict
head: cannot open '/home/temkahap/Рабочий' for reading: No such file or directory
head: cannot open 'стол/CISS/DeepPavlov/deeppavlov/../download/vocabs/token.dict' for reading: No such file or directory


##### Using trained component

We can use the built vocabulary by initializing it with `build_model_from_config`.

We need to add `in` and `out` to the component configuration (to specify the inputs and outputs during prediction):
 - **in** -- a list of data fields that are inputs during inference (prediction)
 - **out** -- a list of data fields that are outputs during inference (prediction)

In [236]:
vocab_comp_config['in'] = ['utterance']
vocab_comp_config['out'] = ['utterance_token_indices']

vocab_config['chainer']['pipe'] = [
    vocab_comp_config
]
vocab_config['chainer']['out'] = ['utterance_token_indices']

In [237]:
from deeppavlov.core.commands.infer import build_model_from_config

model = build_model_from_config(vocab_config)

2018-07-05 17:10:02.637 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]


The model expects a list of samples (a batch) as input.

In [238]:
model(['hi'])
model

<deeppavlov.core.common.chainer.Chainer at 0x7f989db3f0b8>

### Model `gobot_dstc2_simple`

Now let's train a simple goal-oriented bot:

In [239]:
from deeppavlov.download import deep_download
from deeppavlov.core.commands.train import train_evaluate_model_from_config
from deeppavlov.core.commands.infer import build_model_from_config

The "dataset_reader", "dataset_iterator" and "metadata" sections will be the same as in the vocabulary-only config.

In [240]:
simple_config = {}

simple_config['dataset_reader'] = dstc2_reader_comp_config
simple_config['dataset_iterator'] = dialog_iterator_comp_config
simple_config['metadata'] = {}
simple_config['metadata']['download'] = [
    dstc2_download_config
]

X here is a dict `'x'` containing the context: 'text', 'intents', 'db_result', 'prev_resp_act'.

Y here is a dict `'y'` containing the response 'act' and 'text'.

The prediction `'y_predicted'` here will be only 'text'.

In [241]:
simple_config['chainer'] = {}
simple_config['chainer']['in'] = ['x']
simple_config['chainer']['in_y'] = ['y']
simple_config['chainer']['out'] = ['y_predicted']

The bot consists (`pipe` section) of two components:
- **`default_vocab`** (or DefaultVocabulary) component that 

    - remembers all tokens from user utterances. 
    - `DefaultVocabulary.__call__` method takes a batch of tokens and outputs their indices.

The vocabulary component will be the same as before, but let's add a reference to the component using `id`: 
- **id** -- reference name for the component

In [242]:
vocab_comp_config = {
    'name': 'default_vocab',
    'id': 'token_vocab',
    'load_path': 'vocabs/token.dict',
    'save_path': 'vocabs/token.dict',
    'fit_on': ['x'],
    'level': 'token',
    'tokenizer': {'name': 'split_tokenizer'}
}

Adding vocabulary to chainer:

In [243]:
simple_config['chainer']['pipe'] = []
simple_config['chainer']['pipe'].append(vocab_comp_config)

- **`go_bot`** (or GoalOrientedBot) component that
    - calls `slot_filler`, which outputs the slots mentioned in the user utterance 
        (for example, "i want cheap food" -> {'pricerange': 'cheap'})
    - updates dialog state with `tracker` (DialogStateTracker)
        
          (for example, if old state was {'location': 'north'}, 
          and current slots are {'pricerange': 'cheap'}, 
          then new dialog state will be {'location': 'north', 'pricerange': 'cheap'})
    - converts user utterance in string format (`x`) to tokens with `tokenizer`

          (for example, "hi, i want some cheap food" -> ['hi', ',', 'i', 'want', 'some', 'cheap', 'food'])
    - then embeds the tokens with bag-of-words using `bow_embedder` (if not None) and `word_vocab`

          (for example, "cheap" -> [1, 0, 0, 0, .., 0])
    - embeds the utterance with continuous `embedder` (if not None) as a mean of embeddings of utterance tokens
        
          (for example, "i" -> [0.1231, 0.23423, .., 0.03489])
    - concatenates the embeddings and passes them as input to a recurrent neural network (RNN)
    - trains the RNN (with a Long Short-Term Memory (LSTM) cell at its core), which outputs an action label
    - loads templates (a mapping from labels to strings) using `template_path` and `template_type` and converts the action label to a string
        
          (for example, "bye_msg" -> "You are welcome!")
    - fills result string with slot values from dialog state
        
          (for example, if
           dialog state is equal to {'pricerange': 'cheap'}
           and output string is "There are no restaurants in a #pricerange pricerange"
           then the result response will be "There are no restaurants in a cheap pricerange")
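
Three of the steps listed above (state tracking, bag-of-words embedding and template filling) are simple enough to sketch in plain Python. The functions below are illustrative assumptions, not the `go_bot` internals: `update_state`, `bow_vector` and `fill_template` are hypothetical names, the bag-of-words vector is taken to be binary, and the `#slot` placeholder syntax is inferred from the example above.

```python
import re


def update_state(state, slots):
    """Merge the slots found in the current utterance into the dialog state."""
    new_state = dict(state)
    new_state.update(slots)
    return new_state


def bow_vector(tokens, vocab):
    """Binary bag-of-words: 1 at the index of every known token that occurs."""
    vector = [0] * len(vocab)
    for token in tokens:
        if token in vocab:
            vector[vocab[token]] = 1
    return vector


def fill_template(template, state):
    """Replace each #slot placeholder with its value; unknown slots stay as they are."""
    return re.sub(r"#(\w+)",
                  lambda match: str(state.get(match.group(1), match.group(0))),
                  template)


state = update_state({"location": "north"}, {"pricerange": "cheap"})
print(state)
print(bow_vector(["cheap"], {"cheap": 0, "food": 1, "italian": 2}))  # prints: [1, 0, 0]
print(fill_template("There are no restaurants in a #pricerange pricerange", state))
```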

In [244]:
bot_comp_config = {
    'name': 'go_bot',
    'in': ['x'],
    'in_y': ['y'],
    'out': ['y_predicted'],
    'word_vocab': None,
    'bow_embedder': {"name": "bow"},
    'embedder': None,
    'slot_filler': None,
    'template_path': 'dstc2_v2/dstc2-templates.txt',
    'template_type': 'DualTemplate',
    'database': None,
    'api_call_action': 'api_call',
    'network_parameters': {
      'load_path': 'gobot_dstc2_simple/model',
      'save_path': 'gobot_dstc2_simple/model',
      'dense_size': 64,
      'hidden_size': 128,
      'learning_rate': 0.002,
      'attention_mechanism': None
    },
    'tokenizer': {'name': 'stream_spacy_tokenizer',
                  'lowercase': False},
    'tracker': {'name': 'featurized_tracker',
                'slot_names': ['pricerange', 'this', 'area', 'food', 'name']},
    'main': True,
    'debug': False
}

This is how we use vocabulary by reference:

In [245]:
bot_comp_config['word_vocab'] = '#token_vocab'
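
How a reference such as `'#token_vocab'` might be resolved can be sketched as follows. This is not DeepPavlov's actual resolution code: `build_pipe`, `ToyVocabulary`, `ToyBot` and the `factories` mapping are hypothetical, and the sketch only assumes that a string starting with `#` names the `id` of a component built earlier in the pipe.

```python
def build_pipe(pipe_config, factories):
    """Instantiate components in order, resolving '#id' strings to earlier ones."""
    by_id = {}
    built = []
    for comp_config in pipe_config:
        params = {}
        for key, value in comp_config.items():
            if key in ("name", "id"):
                continue
            if isinstance(value, str) and value.startswith("#"):
                value = by_id[value[1:]]  # reuse the already built component
            params[key] = value
        component = factories[comp_config["name"]](**params)
        if "id" in comp_config:
            by_id[comp_config["id"]] = component
        built.append(component)
    return built


class ToyVocabulary:
    def __init__(self, level):
        self.level = level


class ToyBot:
    def __init__(self, word_vocab):
        self.word_vocab = word_vocab


toy_pipe = build_pipe(
    [
        {"name": "toy_vocab", "id": "token_vocab", "level": "token"},
        {"name": "toy_bot", "word_vocab": "#token_vocab"},
    ],
    factories={"toy_vocab": ToyVocabulary, "toy_bot": ToyBot},
)
print(toy_pipe[1].word_vocab is toy_pipe[0])  # prints: True
```

The point of the reference is that both entries share one object, so the bot sees exactly the vocabulary that was fitted earlier in the pipe.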

Declaring the slot filler component.
We assume that the slot filler is already trained and use it by referencing its config.

In [246]:
slot_filler_comp_config = {
    'config_path': deeppavlov.__path__[0] + '/../deeppavlov/configs/ner/slotfill_dstc2.json'
}

Adding slot filler to bot component:

In [247]:
bot_comp_config['slot_filler'] = slot_filler_comp_config

Adding `bot_comp_config` to `pipe`:

In [248]:
simple_config['chainer']['pipe'].append(bot_comp_config)

The neural network (in the bot) is trained in epochs and needs data in the form of batches.

That is why we now fill the "train" section with training parameters.

- **train** -- training process configuration
     - **train.batch_size** is the number of samples in a batch (fed to the network during one training step)
     - **train.epochs** is the number of passes over the dataset during training
     - **train.log_every_n_batches** and **train.log_every_n_epochs** control the frequency of logging messages
     - **train.metrics** is a list of metrics used to validate our performance
     - **train.val_every_n_batches** and **train.val_every_n_epochs** control how often we calculate metrics on the `valid` data split
     - **train.validation_patience** is the number of epochs without metric improvement on `valid` data that we are willing to endure =)
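
The effect of `validation_patience` can be illustrated with a small simulation. The function below is a sketch, not the DeepPavlov training loop: `train_with_patience` is a hypothetical name, the validation scores are made up, and the stopping rule (stop once the number of consecutive validations without improvement reaches the patience) is an assumption.

```python
def train_with_patience(valid_scores, validation_patience):
    """Simulate one validation per epoch and stop when patience runs out.

    valid_scores: metric on the valid split after each epoch (higher is better).
    Returns (best_score, epochs_done).
    """
    best = None
    impatience = 0
    epochs_done = 0
    for score in valid_scores:
        epochs_done += 1
        if best is None or score > best:
            best = score      # a real trainer would save the model here
            impatience = 0
        else:
            impatience += 1
            if impatience >= validation_patience:
                break         # no improvement for too long
    return best, epochs_done


made_up_scores = [0.46, 0.49, 0.48, 0.47, 0.50]
print(train_with_patience(made_up_scores, validation_patience=2))   # prints: (0.49, 4)
print(train_with_patience(made_up_scores, validation_patience=20))  # prints: (0.5, 5)
```

With a small patience the run stops before the late improvement in the fifth epoch, which is why a generous value such as 20 is a safe default for short experiments.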

In [249]:
simple_bot_train_config = {
    'batch_size': 4,
    'epochs': 2,
    'log_every_n_batches': -1,
    'log_every_n_epochs': 1,
    'metrics': ['per_item_dialog_accuracy'],
    'val_every_n_epochs': 1,
    'validation_patience': 20
}

In [250]:
simple_config['train'] = simple_bot_train_config

In [251]:
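# use a context manager so the file is flushed and closed before training reads it
with open("gobot/simple_config.json", 'wt') as config_file:
    json.dump(simple_config, config_file)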

`train.epochs` is set to 2 for now; if you intend to train a smarter model, you should increase it (10 to 200 epochs is recommended).

In [252]:
deep_download(['--config', slot_filler_comp_config['config_path']])

2018-07-05 17:10:03.441 INFO in 'deeppavlov.download'['download'] at line 142: Downloading...
2018-07-05 17:10:03.445 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 208: Starting new HTTP connection (1): lnsigo.mipt.ru
2018-07-05 17:10:03.615 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 396: http://lnsigo.mipt.ru:80 "GET /export/deeppavlov_data/slotfill_dstc2.tar.gz HTTP/1.1" 200 640674
2018-07-05 17:10:03.617 INFO in 'deeppavlov.core.data.utils'['utils'] at line 65: Downloading from http://lnsigo.mipt.ru/export/deeppavlov_data/slotfill_dstc2.tar.gz to /home/temkahap/Рабочий стол/CISS/DeepPavlov/slotfill_dstc2.tar.gz
100%|██████████| 641k/641k [00:01<00:00, 495kB/s] 
2018-07-05 17:10:04.913 INFO in 'deeppavlov.core.data.utils'['utils'] at line 149: Extracting /home/temkahap/Рабочий стол/CISS/DeepPavlov/slotfill_dstc2.tar.gz archive into /home/temkahap/Рабочий стол/CISS/DeepPavlov/download
2018-07-05 17:10:04.928 INFO in 'deeppavlov.download'['download'] at l

In [253]:
train_evaluate_model_from_config("gobot/simple_config.json")

2018-07-05 17:10:04.936 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-trn.jsonlist]
2018-07-05 17:10:05.539 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-val.jsonlist]
2018-07-05 17:10:05.682 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-tst.jsonlist]
2018-07-05 17:10:06.144 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:10:06.177 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 150: [saving vocabulary to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:10:06.18

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 17:10:06.944 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 17:10:07.450 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 17:10:07.451 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 17:10:07.452 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 511
2018-07-05 17:10:08.189 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 52: [initializing `GoalOrientedBotNetwork` from saved]
2018-07-05 17:10:08.190 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 303: [loading parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model.json]
2018-07-05 17:10:08.191 INFO in 'deeppavlov.core.models.

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model


2018-07-05 17:10:08.202 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model
2018-07-05 17:11:05.750 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 315: Updating global step, learning rate = 0.002000.


{"train": {"epochs_done": 1, "batches_seen": 242, "train_examples_seen": 967, "metrics": {"per_item_dialog_accuracy": 0.614}, "time_spent": "0:00:58"}}


2018-07-05 17:11:25.941 INFO in 'deeppavlov.core.commands.train'['train'] at line 343: New best per_item_dialog_accuracy of 0.4635
2018-07-05 17:11:25.942 INFO in 'deeppavlov.core.commands.train'['train'] at line 345: Saving model
2018-07-05 17:11:25.942 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 49: [saving model to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model]
2018-07-05 17:11:26.396 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 297: [saving parameters to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model.json]


{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.4635}, "time_spent": "0:01:18", "epochs_done": 1, "batches_seen": 242, "train_examples_seen": 967, "impatience": 0, "patience_limit": 20}}


2018-07-05 17:12:21.362 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 315: Updating global step, learning rate = 0.002000.


{"train": {"epochs_done": 2, "batches_seen": 484, "train_examples_seen": 1934, "metrics": {"per_item_dialog_accuracy": 0.6239}, "time_spent": "0:02:14"}}


2018-07-05 17:12:41.180 INFO in 'deeppavlov.core.commands.train'['train'] at line 343: New best per_item_dialog_accuracy of 0.4877
2018-07-05 17:12:41.181 INFO in 'deeppavlov.core.commands.train'['train'] at line 345: Saving model
2018-07-05 17:12:41.181 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 49: [saving model to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model]
2018-07-05 17:12:41.289 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 297: [saving parameters to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model.json]
2018-07-05 17:12:41.295 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:12:41.298 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/word.dict]
2018-0

{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.4877}, "time_spent": "0:02:33", "epochs_done": 2, "batches_seen": 484, "train_examples_seen": 1934, "impatience": 0, "patience_limit": 20}}


2018-07-05 17:12:42.58 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 40: [loading model from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model]


INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 17:12:42.77 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 17:12:42.497 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 17:12:42.498 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 17:12:42.499 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 511
2018-07-05 17:12:43.147 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 52: [initializing `GoalOrientedBotNetwork` from saved]
2018-07-05 17:12:43.148 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 303: [loading parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model.json]
2018-07-05 17:12:43.149 INFO in 'deeppavlov.core.models.t

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model


2018-07-05 17:12:43.162 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model
2018-07-05 17:12:43.207 INFO in 'deeppavlov.core.commands.train'['train'] at line 174: Testing the best saved model


{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.4877}, "time_spent": "0:00:20"}}
{"test": {"eval_examples_count": 576, "metrics": {"per_item_dialog_accuracy": 0.4813}, "time_spent": "0:00:20"}}
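
To read the `per_item_dialog_accuracy` numbers above, it helps to see what a per-turn accuracy over dialogs could look like. The function below is a hedged sketch rather than the library's metric: `toy_per_item_dialog_accuracy` is a hypothetical name, and it assumes a turn counts as correct when the predicted text equals the reference text after stripping whitespace and lowercasing.

```python
def toy_per_item_dialog_accuracy(y_true, y_predicted):
    """Share of turns, over all dialogs, whose predicted text matches the reference."""
    true_texts = [turn["text"] for dialog in y_true for turn in dialog]
    pred_texts = [text for dialog in y_predicted for text in dialog]
    if not true_texts:
        return 0.0
    correct = sum(true.strip().lower() == pred.strip().lower()
                  for true, pred in zip(true_texts, pred_texts))
    return correct / len(true_texts)


# Made-up references and predictions: two dialogs, three turns in total
toy_true = [
    [{"text": "Hello"}, {"text": "You are welcome!"}],
    [{"text": "What part of town do you have in mind?"}],
]
toy_pred = [
    ["hello", "Goodbye"],
    ["What part of town do you have in mind?"],
]
print(toy_per_item_dialog_accuracy(toy_true, toy_pred))  # 2 of 3 turns match
```

Under this reading, a validation score of 0.4877 would mean that slightly less than half of the bot's turns reproduce the reference response.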


Let's communicate with the resulting bot. An "exit" message ends the dialogue.

In [254]:
model = build_model_from_config(simple_config)

2018-07-05 17:13:22.618 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:13:22.624 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/word.dict]
2018-07-05 17:13:22.629 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/tag.dict]
2018-07-05 17:13:23.737 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 40: [loading model from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model]


INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 17:13:23.752 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 17:13:24.165 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 17:13:24.166 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 17:13:24.167 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 511
2018-07-05 17:13:24.781 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 52: [initializing `GoalOrientedBotNetwork` from saved]
2018-07-05 17:13:24.782 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 303: [loading parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model.json]
2018-07-05 17:13:24.783 INFO in 'deeppavlov.core.models.

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model


2018-07-05 17:13:24.796 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_simple/model


In [255]:
model(['hi, i want some cheap italian food in the north of town'])

2018-07-05 17:13:24.939 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'cheap', 'food': 'italian', 'area': 'north'}, got 0 results.


['Sorry there is no italian restaurant in the north of town.']

In [256]:
model(['thanks, bye'])

2018-07-05 17:13:24.964 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'cheap', 'food': 'italian', 'area': 'north'}, got 0 results.


['Sorry there is no italian restaurant in the north of town.']

In [257]:
model.reset() # resetting dialog context to start a new one

In [258]:
# if the cell is running, please do not run other cells in parallel -- there is a possibility of a hangup

utterance = ""
while utterance != 'exit':
    print(">> " + model([utterance])[0])
    utterance = input(':: ')

2018-07-05 17:13:24.991 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'cheap', 'food': 'italian', 'area': 'north'}, got 0 results.


>> Sorry there is no italian restaurant in the north of town.
:: exit


The model couldn't fill some slots. For example, a restaurant's #address, #phone and #postcode cannot be inferred from the user's utterances.

To fill them, the bot needs a list of available restaurants.
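To see why a missing value matters, here is a minimal sketch of template filling. It assumes that response templates mark slots with a leading `#` (as in `#address` above); the `fill_template` helper, the sample template and the sample values are made up for illustration and are not DeepPavlov's implementation.

```python
import re

def fill_template(template, slots):
    """Replace each #slot placeholder with its value; keep it if unknown."""
    def substitute(match):
        name = match.group(1)
        # Assumed behaviour for this sketch: unknown slots stay untouched.
        return str(slots.get(name, match.group(0)))
    return re.sub(r"#(\w+)", substitute, template)

state = {"name": "example bistro", "area": "north"}
template = "#name is in the #area part of town, their phone is #phone."
response = fill_template(template, state)
print(response)
```

The `#phone` placeholder survives because nothing in the dialog state provides it, which is the gap a restaurant database can close.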

### Model `gobot_dstc2_db`

Now let's add a database of restaurants and train a new model:

Initializing new config:

In [259]:
db_config = copy.deepcopy(simple_config)

db_config['chainer']['pipe'] = []

Creating database component config:

In [260]:
db_comp_config = {
    'name': 'sqlite_database',
    'id': 'restaurant_database', 
    'save_path': 'dstc2_v2/resto.sqlite',
    'primary_keys': ['name'],
    'table_name': 'mytable'
}

Adding vocab and database components to pipe:

In [261]:
db_config['chainer']['pipe'].append(vocab_comp_config)
db_config['chainer']['pipe'].append(db_comp_config)

Initializing bot component config:

In [262]:
bot_with_db_comp_config = copy.deepcopy(bot_comp_config)

**WARNING:** Do not forget to change `load_path` and `save_path` in the neural network configuration when training a new modification. Otherwise the previous model's files will be overwritten.

In [263]:
bot_with_db_comp_config['network_parameters']['load_path'] = 'gobot_dstc2_db/model'
bot_with_db_comp_config['network_parameters']['save_path'] = 'gobot_dstc2_db/model'
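One way to make this harder to forget is a small helper that derives a new bot config with its own model directory. `derive_bot_config` is a hypothetical helper, not part of DeepPavlov; it only assumes the `network_parameters` layout used above, and the `base` dictionary is a toy stand-in for `bot_comp_config`.

```python
import copy

def derive_bot_config(base_config, model_dir):
    """Return a deep copy of base_config that loads/saves under model_dir."""
    new_config = copy.deepcopy(base_config)
    path = model_dir + "/model"
    new_config["network_parameters"]["load_path"] = path
    new_config["network_parameters"]["save_path"] = path
    return new_config

# Toy stand-in for bot_comp_config; only the relevant keys are shown.
base = {"name": "go_bot",
        "network_parameters": {"load_path": "gobot_dstc2_simple/model",
                               "save_path": "gobot_dstc2_simple/model"}}
derived = derive_bot_config(base, "gobot_dstc2_db")
print(derived["network_parameters"]["save_path"])
print(base["network_parameters"]["save_path"])  # the original is untouched
```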

Adding database to bot component config:

In [264]:
bot_with_db_comp_config['database'] = '#restaurant_database'
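The string `'#restaurant_database'` points at the component whose `id` is `restaurant_database`. The following is a rough sketch of a sanity check, assuming that any top-level string value starting with `#` must match the `id` of a component placed earlier in the pipe. `find_unresolved_refs` is a hypothetical helper, not a DeepPavlov function, and the pipe below is a toy one.

```python
def find_unresolved_refs(pipe):
    """Return (component name, key, ref) for '#id' values with no earlier id."""
    seen_ids = set()
    unresolved = []
    for component in pipe:
        for key, value in component.items():
            if isinstance(value, str) and value.startswith("#"):
                if value[1:] not in seen_ids:
                    unresolved.append((component.get("name"), key, value))
        if "id" in component:
            seen_ids.add(component["id"])
    return unresolved

# Toy pipe with the same shape as the configs in this notebook.
pipe = [
    {"id": "token_vocab", "name": "default_vocab"},
    {"id": "restaurant_database", "name": "sqlite_database"},
    {"name": "go_bot", "word_vocab": "#token_vocab",
     "database": "#restaurant_database"},
]
complete = find_unresolved_refs(pipe)
missing_db = find_unresolved_refs(pipe[:1] + pipe[2:])  # database component dropped
print(complete)
print(missing_db)
```

Running such a check before writing a config to disk can catch a referenced component that was never added to the pipe.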

Adding bot component to pipe:

In [265]:
db_config['chainer']['pipe'].append(bot_with_db_comp_config)

In [266]:
# use a context manager so the file is flushed and closed
with open("gobot/db_config.json", 'wt') as f:
    json.dump(db_config, f)

The new model now updates the dialog state not only with entity values mentioned by the user ("i want cheap food" -> {'pricerange': 'cheap'}), but also with restaurant info taken from an SQL database of restaurants.

The model has a special action, `api_call_action`, which sends a request to the SQL database with the current dialog state and receives the info of a single matching restaurant.

So now slots such as #address, #phone and #postcode can be filled in bot responses.
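As an illustration of such a lookup, the sketch below queries an in-memory SQLite table with the slots from a dialog state. The table columns, the sample row and the `find_restaurant` helper are invented for this example; the real `resto.sqlite` schema and DeepPavlov's query logic may differ.

```python
import sqlite3

ALLOWED_COLUMNS = {"name", "food", "area", "pricerange"}

def find_restaurant(conn, state):
    """Return the first row matching all known slots in state, or None."""
    slots = {k: v for k, v in state.items() if k in ALLOWED_COLUMNS}
    # Column names come from the whitelist; values are bound as parameters.
    where = " AND ".join(f"{column} = ?" for column in slots) or "1"
    cursor = conn.execute(f"SELECT * FROM mytable WHERE {where} LIMIT 1",
                          tuple(slots.values()))
    row = cursor.fetchone()
    return dict(row) if row is not None else None

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE mytable (name TEXT PRIMARY KEY, food TEXT, "
             "area TEXT, pricerange TEXT, phone TEXT)")
conn.execute("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
             ("example bistro", "french", "centre", "cheap", "00000 000000"))

found = find_restaurant(conn, {"food": "french", "pricerange": "cheap"})
not_found = find_restaurant(conn, {"food": "italian", "area": "north"})
print(found)
print(not_found)
```

A matching row carries fields such as `phone` that the user never mentioned, while an empty result corresponds to the "got 0 results" case seen in the logs.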

In [267]:
train_evaluate_model_from_config("gobot/db_config.json")

2018-07-05 17:13:48.895 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-trn.jsonlist]
2018-07-05 17:13:49.392 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-val.jsonlist]
2018-07-05 17:13:49.536 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-tst.jsonlist]
2018-07-05 17:13:50.38 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:13:50.75 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 150: [saving vocabulary to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:13:50.78 I

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 17:13:51.45 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 17:13:51.495 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 17:13:51.496 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 17:13:51.496 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 511
2018-07-05 17:13:52.152 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 52: [initializing `GoalOrientedBotNetwork` from saved]
2018-07-05 17:13:52.153 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 303: [loading parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model.json]
2018-07-05 17:13:52.157 INFO in 'deeppavlov.core.models.tf_mo

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model


2018-07-05 17:13:52.169 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model
2018-07-05 17:14:46.50 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 315: Updating global step, learning rate = 0.002000.


{"train": {"epochs_done": 1, "batches_seen": 242, "train_examples_seen": 967, "metrics": {"per_item_dialog_accuracy": 0.6032}, "time_spent": "0:00:54"}}


2018-07-05 17:15:05.624 INFO in 'deeppavlov.core.commands.train'['train'] at line 343: New best per_item_dialog_accuracy of 0.4882
2018-07-05 17:15:05.625 INFO in 'deeppavlov.core.commands.train'['train'] at line 345: Saving model
2018-07-05 17:15:05.626 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 49: [saving model to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model]
2018-07-05 17:15:05.749 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 297: [saving parameters to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model.json]


{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.4882}, "time_spent": "0:01:14", "epochs_done": 1, "batches_seen": 242, "train_examples_seen": 967, "impatience": 0, "patience_limit": 20}}


2018-07-05 17:15:59.705 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 315: Updating global step, learning rate = 0.002000.


{"train": {"epochs_done": 2, "batches_seen": 484, "train_examples_seen": 1934, "metrics": {"per_item_dialog_accuracy": 0.6167}, "time_spent": "0:02:08"}}


2018-07-05 17:16:20.68 INFO in 'deeppavlov.core.commands.train'['train'] at line 350: Did not improve on the per_item_dialog_accuracy of 0.4882
2018-07-05 17:16:20.70 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:16:20.73 INFO in 'deeppavlov.core.data.sqlite_database'['sqlite_database'] at line 57: Loading database from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/resto.sqlite.
2018-07-05 17:16:20.75 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/word.dict]
2018-07-05 17:16:20.79 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/tag.dict]


{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.4768}, "time_spent": "0:02:28", "epochs_done": 2, "batches_seen": 484, "train_examples_seen": 1934, "impatience": 1, "patience_limit": 20}}


2018-07-05 17:16:21.202 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 40: [loading model from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model]


INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 17:16:21.217 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 17:16:21.646 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 17:16:21.647 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 17:16:21.647 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 511
2018-07-05 17:16:22.321 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 52: [initializing `GoalOrientedBotNetwork` from saved]
2018-07-05 17:16:22.322 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 303: [loading parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model.json]
2018-07-05 17:16:22.325 INFO in 'deeppavlov.core.models.tf_m

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model


2018-07-05 17:16:22.342 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model
2018-07-05 17:16:22.396 INFO in 'deeppavlov.core.commands.train'['train'] at line 174: Testing the best saved model


{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.4882}, "time_spent": "0:00:21"}}
{"test": {"eval_examples_count": 576, "metrics": {"per_item_dialog_accuracy": 0.4861}, "time_spent": "0:00:21"}}
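The `impatience` and `patience_limit` fields in the validation reports above suggest an early-stopping counter. A minimal sketch of that idea follows, assuming the counter resets on every new best score and training stops once it reaches the limit. `track_patience` is a hypothetical function and not DeepPavlov's training loop.

```python
def track_patience(valid_scores, patience_limit):
    """Yield (epoch, best_score, impatience) until patience runs out."""
    best = float("-inf")
    impatience = 0
    for epoch, score in enumerate(valid_scores, start=1):
        if score > best:
            best, impatience = score, 0  # new best: a model would be saved here
        else:
            impatience += 1
        yield epoch, best, impatience
        if impatience >= patience_limit:
            break

# Validation accuracies of the two epochs reported above.
history = list(track_patience([0.4882, 0.4768], patience_limit=20))
print(history)
```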


In [268]:
model = build_model_from_config(db_config)

2018-07-05 17:17:03.586 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 17:17:03.589 INFO in 'deeppavlov.core.data.sqlite_database'['sqlite_database'] at line 57: Loading database from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/resto.sqlite.
2018-07-05 17:17:03.592 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/word.dict]
2018-07-05 17:17:03.597 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/tag.dict]
2018-07-05 17:17:04.475 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 40: [loading model from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model]


INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 17:17:04.495 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 17:17:04.912 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 17:17:04.913 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 17:17:04.914 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 511
2018-07-05 17:17:05.973 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 52: [initializing `GoalOrientedBotNetwork` from saved]
2018-07-05 17:17:05.974 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 303: [loading parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model.json]
2018-07-05 17:17:05.976 INFO in 'deeppavlov.core.models.tf_m

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model


2018-07-05 17:17:05.985 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_db/model


In [152]:
# if the cell is running, please do not run other cells in parallel -- there is a possibility of a hangup

model.reset() # starting new dialog

utterance = ""
while utterance != 'exit':
    print(">> " + model([utterance])[0])
    utterance = input(':: ')

>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?
:: hello
>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?
:: frecnh restaurant
>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?
:: Is there french restaurant?
>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?
:: 
>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?
:: 
>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?


KeyboardInterrupt: 

### Model `gobot_dstc2_emb`

Now let's train a goal-oriented bot with fasttext embeddings:

**NOTICE:** YOU NEED TO CONSTRUCT A NEW CONFIG YOURSELF

Initializing new config:

In [275]:
emb_config = copy.deepcopy(db_config)

emb_config['chainer']['pipe'] = []
emb_config

{'chainer': {'in': ['x'], 'in_y': ['y'], 'out': ['y_predicted'], 'pipe': []},
 'dataset_iterator': {'name': 'dialog_iterator'},
 'dataset_reader': {'data_path': 'dstc2_v2', 'name': 'dstc2_v2_reader'},
 'metadata': {'download': [{'subdir': 'dstc2_v2',
    'url': 'http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz'}]},
 'train': {'batch_size': 4,
  'epochs': 2,
  'log_every_n_batches': -1,
  'log_every_n_epochs': 1,
  'metrics': ['per_item_dialog_accuracy'],
  'val_every_n_epochs': 1,
  'validation_patience': 20}}

Adding vocab component to chainer pipe:

In [276]:
emb_config['chainer']['pipe'].append(vocab_comp_config)
emb_config['chainer']['pipe']

[{'fit_on': ['x'],
  'id': 'token_vocab',
  'level': 'token',
  'load_path': 'vocabs/token.dict',
  'name': 'default_vocab',
  'save_path': 'vocabs/token.dict',
  'tokenizer': {'name': 'split_tokenizer'}}]

Initializing embedder component:

In [277]:
embedder_comp_config = {
    'id': 'my_embedder',
    'name': 'fasttext',
    'load_path': 'embeddings/dstc2_fastText_model.bin',
    'save_path': 'embeddings/dstc2_fastText_model.bin',
    'dim': 100
}
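To make the role of `dim` concrete, here is a toy sketch of turning token vectors into one fixed-size utterance vector by averaging. It assumes mean pooling and uses a made-up 3-dimensional lookup table in place of the fastText model; how `go_bot` actually consumes the embedder is not shown in this notebook. The training logs further down report the network input size growing from 511 to 611, which is consistent with adding a 100-dimensional vector.

```python
def mean_embedding(tokens, table, dim):
    """Average the vectors of known tokens; zeros if none are known."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Toy lookup table (a real fastText model also handles unseen words).
toy_table = {
    "cheap": [1.0, 0.0, 2.0],
    "food": [3.0, 2.0, 0.0],
}
utterance_vector = mean_embedding("i want cheap food".split(), toy_table, dim=3)
print(utterance_vector)
```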

In [278]:
# TODO: add embedder component to chainer pipe
emb_config['chainer']['pipe'].append(embedder_comp_config)

In [281]:
emb_config

{'chainer': {'in': ['x'],
  'in_y': ['y'],
  'out': ['y_predicted'],
  'pipe': [{'fit_on': ['x'],
    'id': 'token_vocab',
    'level': 'token',
    'load_path': 'vocabs/token.dict',
    'name': 'default_vocab',
    'save_path': 'vocabs/token.dict',
    'tokenizer': {'name': 'split_tokenizer'}},
   {'dim': 100,
    'id': 'my_embedder',
    'load_path': 'embeddings/dstc2_fastText_model.bin',
    'name': 'fasttext',
    'save_path': 'embeddings/dstc2_fastText_model.bin'}]},
 'dataset_iterator': {'name': 'dialog_iterator'},
 'dataset_reader': {'data_path': 'dstc2_v2', 'name': 'dstc2_v2_reader'},
 'metadata': {'download': [{'subdir': 'dstc2_v2',
    'url': 'http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz'}]},
 'train': {'batch_size': 4,
  'epochs': 2,
  'log_every_n_batches': -1,
  'log_every_n_epochs': 1,
  'metrics': ['per_item_dialog_accuracy'],
  'val_every_n_epochs': 1,
  'validation_patience': 20}}

Initializing bot component config:

In [279]:
bot_with_embedder_comp_config = copy.deepcopy(bot_with_db_comp_config)

bot_with_embedder_comp_config['network_parameters']['load_path'] = 'gobot_dstc2_emb/model'
bot_with_embedder_comp_config['network_parameters']['save_path'] = 'gobot_dstc2_emb/model'

In [280]:
# TODO: add #my_embedder to bot_with_embedder_comp_config
bot_with_embedder_comp_config['embedder'] = '#' + embedder_comp_config['id']
bot_with_embedder_comp_config

{'api_call_action': 'api_call',
 'bow_embedder': {'name': 'bow'},
 'database': '#restaurant_database',
 'debug': False,
 'embedder': '#my_embedder',
 'in': ['x'],
 'in_y': ['y'],
 'main': True,
 'name': 'go_bot',
 'network_parameters': {'attention_mechanism': None,
  'dense_size': 64,
  'hidden_size': 128,
  'learning_rate': 0.002,
  'load_path': 'gobot_dstc2_emb/model',
  'save_path': 'gobot_dstc2_emb/model'},
 'out': ['y_predicted'],
 'slot_filler': {'config_path': '/home/temkahap/Рабочий стол/CISS/DeepPavlov/deeppavlov/../deeppavlov/configs/ner/slotfill_dstc2.json'},
 'template_path': 'dstc2_v2/dstc2-templates.txt',
 'template_type': 'DualTemplate',
 'tokenizer': {'lowercase': False, 'name': 'stream_spacy_tokenizer'},
 'tracker': {'name': 'featurized_tracker',
  'slot_names': ['pricerange', 'this', 'area', 'food', 'name']},
 'word_vocab': '#token_vocab'}

In [283]:
# TODO: add bot_with_embedder_comp_config to chainer pipe
emb_config['chainer']['pipe'].append(bot_with_embedder_comp_config)


{'chainer': {'in': ['x'],
  'in_y': ['y'],
  'out': ['y_predicted'],
  'pipe': [{'fit_on': ['x'],
    'id': 'token_vocab',
    'level': 'token',
    'load_path': 'vocabs/token.dict',
    'name': 'default_vocab',
    'save_path': 'vocabs/token.dict',
    'tokenizer': {'name': 'split_tokenizer'}},
   {'dim': 100,
    'id': 'my_embedder',
    'load_path': 'embeddings/dstc2_fastText_model.bin',
    'name': 'fasttext',
    'save_path': 'embeddings/dstc2_fastText_model.bin'},
   {'api_call_action': 'api_call',
    'bow_embedder': {'name': 'bow'},
    'database': '#restaurant_database',
    'debug': False,
    'embedder': '#my_embedder',
    'in': ['x'],
    'in_y': ['y'],
    'main': True,
    'name': 'go_bot',
    'network_parameters': {'attention_mechanism': None,
     'dense_size': 64,
     'hidden_size': 128,
     'learning_rate': 0.002,
     'load_path': 'gobot_dstc2_emb/model',
     'save_path': 'gobot_dstc2_emb/model'},
    'out': ['y_predicted'],
    'slot_filler': {'config_path': '/

These are download urls for new required data:

In [285]:
embedder_required_data = {
    'url': 'http://lnsigo.mipt.ru/export/deeppavlov_data/embeddings/dstc2_fastText_model.bin',
    'subdir': 'embeddings'
}

In [290]:
# TODO: add embedder download info to emb_config['metadata']
emb_config['metadata']['download'].append(embedder_required_data)
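Because `append` runs every time the cell is executed, re-running it would add the same entry twice. A small guard avoids that. `add_download` is a hypothetical helper, shown here on a toy config with placeholder URLs.

```python
def add_download(config, entry):
    """Append entry to config['metadata']['download'] unless already present."""
    downloads = config.setdefault("metadata", {}).setdefault("download", [])
    if entry not in downloads:
        downloads.append(entry)
    return downloads

toy_config = {"metadata": {"download": [{"url": "http://example.org/data.tar.gz",
                                         "subdir": "data"}]}}
entry = {"url": "http://example.org/model.bin", "subdir": "embeddings"}
add_download(toy_config, entry)
add_download(toy_config, entry)  # second call is a no-op
print(len(toy_config["metadata"]["download"]))
```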

In [291]:
# use a context manager so the file is flushed and closed
with open("gobot/emb_config.json", 'wt') as f:
    json.dump(emb_config, f)

Since we are now using embeddings, we added a file named `dstc2_fastText_model.bin` to the `metadata.download` section.

Let's run data loading again.

In [292]:
deep_download(['--config', 'gobot/emb_config.json'])

2018-07-05 17:30:35.515 INFO in 'deeppavlov.download'['download'] at line 142: Downloading...
2018-07-05 17:30:35.521 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 208: Starting new HTTP connection (1): lnsigo.mipt.ru
2018-07-05 17:30:35.778 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 396: http://lnsigo.mipt.ru:80 "GET /export/datasets/dstc2_v2.tar.gz HTTP/1.1" 200 506300
2018-07-05 17:30:35.787 INFO in 'deeppavlov.core.data.utils'['utils'] at line 65: Downloading from http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2.tar.gz
100%|██████████| 506k/506k [00:03<00:00, 137kB/s] 
2018-07-05 17:30:39.503 INFO in 'deeppavlov.core.data.utils'['utils'] at line 149: Extracting /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2.tar.gz archive into /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2
2018-07-05 17:30:39.575 DEBUG in 'urllib3.connectionpool'['connectionpool'] at li

In [293]:
train_evaluate_model_from_config("gobot/emb_config.json")

2018-07-05 18:00:10.429 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-trn.jsonlist]
2018-07-05 18:00:10.680 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-val.jsonlist]
2018-07-05 18:00:11.287 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-tst.jsonlist]
2018-07-05 18:00:11.493 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 18:00:11.525 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 150: [saving vocabulary to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 18:00:11.52

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 18:00:14.115 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 18:00:14.560 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 18:00:14.561 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 18:00:14.562 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 611
2018-07-05 18:00:15.278 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 55: [initializing `GoalOrientedBotNetwork` from scratch]
2018-07-05 18:01:14.976 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 315: Updating global step, learning rate = 0.002000.


{"train": {"epochs_done": 1, "batches_seen": 242, "train_examples_seen": 967, "metrics": {"per_item_dialog_accuracy": 0.3921}, "time_spent": "0:01:00"}}


2018-07-05 18:01:36.544 INFO in 'deeppavlov.core.commands.train'['train'] at line 343: New best per_item_dialog_accuracy of 0.4564
2018-07-05 18:01:36.544 INFO in 'deeppavlov.core.commands.train'['train'] at line 345: Saving model
2018-07-05 18:01:36.545 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 49: [saving model to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model]
2018-07-05 18:01:36.672 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 297: [saving parameters to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model.json]


{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.4564}, "time_spent": "0:01:22", "epochs_done": 1, "batches_seen": 242, "train_examples_seen": 967, "impatience": 0, "patience_limit": 20}}


2018-07-05 18:02:32.980 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 315: Updating global step, learning rate = 0.002000.


{"train": {"epochs_done": 2, "batches_seen": 484, "train_examples_seen": 1934, "metrics": {"per_item_dialog_accuracy": 0.5628}, "time_spent": "0:02:18"}}


2018-07-05 18:02:53.465 INFO in 'deeppavlov.core.commands.train'['train'] at line 343: New best per_item_dialog_accuracy of 0.4901
2018-07-05 18:02:53.466 INFO in 'deeppavlov.core.commands.train'['train'] at line 345: Saving model
2018-07-05 18:02:53.467 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 49: [saving model to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model]
2018-07-05 18:02:53.569 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 297: [saving parameters to /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model.json]
2018-07-05 18:02:53.575 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 18:02:53.577 INFO in 'deeppavlov.models.embedders.fasttext_embedder'['fasttext_embedder'] at line 69: [loading embeddings from `/home/temkahap/Рабочий стол/CISS/DeepPavlov/download/embeddings/dstc2_fas

{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.4901}, "time_spent": "0:02:39", "epochs_done": 2, "batches_seen": 484, "train_examples_seen": 1934, "impatience": 0, "patience_limit": 20}}


2018-07-05 18:02:54.564 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/word.dict]
2018-07-05 18:02:54.574 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/tag.dict]
2018-07-05 18:02:55.349 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 40: [loading model from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model]


INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 18:02:55.368 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 18:02:55.800 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 18:02:55.801 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 18:02:55.805 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 611
2018-07-05 18:02:56.858 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 52: [initializing `GoalOrientedBotNetwork` from saved]
2018-07-05 18:02:56.859 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 303: [loading parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model.json]
2018-07-05 18:02:56.862 INFO in 'deeppavlov.core.models.tf_

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model


2018-07-05 18:02:56.876 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model
2018-07-05 18:02:56.947 INFO in 'deeppavlov.core.commands.train'['train'] at line 174: Testing the best saved model


{"valid": {"eval_examples_count": 575, "metrics": {"per_item_dialog_accuracy": 0.49}, "time_spent": "0:00:21"}}
{"test": {"eval_examples_count": 576, "metrics": {"per_item_dialog_accuracy": 0.4881}, "time_spent": "0:00:21"}}


In [294]:
model = build_model_from_config(emb_config)

2018-07-05 18:03:37.910 INFO in 'deeppavlov.core.data.vocab'['vocab'] at line 162: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/vocabs/token.dict]
2018-07-05 18:03:37.914 INFO in 'deeppavlov.models.embedders.fasttext_embedder'['fasttext_embedder'] at line 69: [loading embeddings from `/home/temkahap/Рабочий стол/CISS/DeepPavlov/download/embeddings/dstc2_fastText_model.bin`]
2018-07-05 18:03:38.923 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/word.dict]
2018-07-05 18:03:38.928 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 94: [loading vocabulary from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/tag.dict]
2018-07-05 18:03:39.691 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 40: [loading model from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model]


INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model


2018-07-05 18:03:39.710 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/slotfill_dstc2/model
2018-07-05 18:03:40.132 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 69: [loading templates from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/dstc2_v2/dstc2-templates.txt]
2018-07-05 18:03:40.134 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 72: 46 templates loaded
2018-07-05 18:03:40.134 INFO in 'deeppavlov.skills.go_bot.bot'['bot'] at line 96: Calculated input size for `GoalOrientedBotNetwork` is 611
2018-07-05 18:03:40.792 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 52: [initializing `GoalOrientedBotNetwork` from saved]
2018-07-05 18:03:40.792 INFO in 'deeppavlov.skills.go_bot.network'['network'] at line 303: [loading parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model.json]
2018-07-05 18:03:40.795 INFO in 'deeppavlov.core.models.tf_

INFO:tensorflow:Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model


2018-07-05 18:03:40.806 INFO in 'tensorflow'['tf_logging'] at line 116: Restoring parameters from /home/temkahap/Рабочий стол/CISS/DeepPavlov/download/gobot_dstc2_emb/model


In [None]:
# if the cell is running, please do not run other cells in parallel -- there is a possibility of a hangup

model.reset() # starting new dialog

utterance = ""
while utterance != 'exit':
    print(">> " + model([utterance])[0])
    utterance = input(':: ')

>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?
:: Italian restaurant
>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?
:: hello
>> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?


## Appendix _(optional)_

### Additional materials

- [DataFest Presentation (RU)](https://docs.google.com/presentation/d/1PBPQp-wgQ6aRbm3MsuyGYB_TVg2c7Bf89lhdhbMtC2k)
- [Video Lecture "Hybrid dialog bot" (RU)](http://www.youtube.com/watch?v=JJCO7eWCy-M&t=331m19s)
- [Video Lecture "What's inside a dialog system?" (RU)](http://www.youtube.com/watch?v=JJCO7eWCy-M&t=259m55s)

### Model `gobot_dstc2_full`

Now let's train a more capable goal-oriented bot that uses an attention mechanism over input embeddings (see https://medium.com/syncedreview/a-brief-overview-of-attention-mechanism-13c578ba9129 for more details).

Initializing new config:

In [118]:
# reload the config saved earlier, closing the file afterwards
with open("gobot/emb_config.json", 'rt') as f:
    emb_config = json.load(f)

full_config = copy.deepcopy(emb_config)
full_config['chainer']['pipe'] = [
    vocab_comp_config,
    db_comp_config,
    embedder_comp_config
]

Initializing bot component config:

In [119]:
bot_with_emb_comp_config = emb_config['chainer']['pipe'][-1]
bot_with_attn_comp_config = copy.deepcopy(bot_with_emb_comp_config)

bot_with_attn_comp_config['network_parameters']['load_path'] = 'gobot_dstc2_full/model'
bot_with_attn_comp_config['network_parameters']['save_path'] = 'gobot_dstc2_full/model'
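
The `copy.deepcopy` call above matters because the component config is a nested dict. A minimal sketch with a made-up config (only the `network_parameters` and `load_path` keys mirror the cell above; the values and variable names are placeholders) shows what a shallow copy would do instead:

```python
import copy

# Toy stand-in for a bot component config; the values are placeholders.
base = {"network_parameters": {"load_path": "gobot_dstc2_emb/model"}}

shallow = copy.copy(base)    # copies only the outer dict, nested dicts are shared
deep = copy.deepcopy(base)   # copies the nested dicts as well

# Editing through the shallow copy also changes the original config.
shallow["network_parameters"]["load_path"] = "changed/by/shallow"
base_after_shallow = base["network_parameters"]["load_path"]

# Editing the deep copy leaves the original untouched.
deep["network_parameters"]["load_path"] = "gobot_dstc2_full/model"
base_after_deep = base["network_parameters"]["load_path"]

print(base_after_shallow)  # the original was modified through the shallow copy
print(base_after_deep)     # same value as before the deep copy was edited
```

With a shallow copy, setting the new `load_path` would silently redirect the embeddings-based config as well.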

Adding attention mechanism to bot:

In [120]:
attention_mechanism_config = {
    'action_as_key': True,
    'depth': 3,
    'hidden_size': 32,
    'max_num_tokens': 100,
    'projected_align': False,
    'type': 'cs_bahdanau'
}

In [121]:
bot_with_attn_comp_config['network_parameters']['attention_mechanism'] = attention_mechanism_config
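
To give a feeling for what an attention mechanism computes, here is a minimal pure-Python sketch of plain additive (Bahdanau-style) attention. It is an illustration only: the `cs_bahdanau` type used in the config is DeepPavlov's own variant and may differ in details, and every name below (`additive_attention`, `W_q`, `W_k`, `v`, the sizes) is an assumption made for this sketch. The value `hidden_size = 32` merely echoes the config above.

```python
import math
import random


def additive_attention(query, keys, W_q, W_k, v):
    """score_i = v . tanh(W_k k_i + W_q q); weights = softmax(scores)."""
    def matvec(matrix, vector):
        return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

    q_proj = matvec(W_q, query)
    scores = []
    for key in keys:
        k_proj = matvec(W_k, key)
        hidden = [math.tanh(a + b) for a, b in zip(k_proj, q_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # softmax with the maximum subtracted for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # context vector: attention-weighted sum of the key (token embedding) vectors
    dim = len(keys[0])
    context = [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(dim)]
    return weights, context


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_tokens, emb_dim, query_dim, hidden_size = 5, 8, 4, 32
keys = rand_matrix(n_tokens, emb_dim)    # one embedding per input token
query = rand_matrix(1, query_dim)[0]     # e.g. a state or action vector
W_q = rand_matrix(hidden_size, query_dim)
W_k = rand_matrix(hidden_size, emb_dim)
v = rand_matrix(1, hidden_size)[0]

weights, context = additive_attention(query, keys, W_q, W_k, v)
print(len(weights), len(context), round(sum(weights), 6))
```

The weights form a probability distribution over the input tokens, so the context vector is a weighted average of token embeddings in which the tokens judged most relevant to the query dominate.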

Adding bot component to pipe:

In [122]:
full_config['chainer']['pipe'].append(bot_with_attn_comp_config)

In [123]:
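# write the config to disk; `with` makes sure the file is flushed and closed before training reads it
with open("gobot/full_config.json", 'wt') as f:
    json.dump(full_config, f)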

In [None]:
train_evaluate_model_from_config("gobot/full_config.json")

In [None]:
model = build_model_from_config(full_config)

In [None]:
# While this cell is running, do not run other cells in parallel: the notebook may hang.
# Type 'exit' to stop the dialog.

model.reset()  # start a new dialog

utterance = ""
while utterance != 'exit':
    print(">> " + model([utterance])[0])
    utterance = input(':: ')
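
The interactive loop above blocks on `input`, which is inconvenient for quick checks. The same dialog logic can be sketched with a scripted list of utterances. `EchoBot` and `run_scripted_dialog` are hypothetical helpers written for this illustration; the only assumption about the real model is what the cell above already uses: a `reset()` method and a call that maps a batch of utterances to a batch of replies.

```python
class EchoBot:
    """Hypothetical stub with the same call shape as the bot: a batch in, a batch out."""

    def __init__(self):
        self.turns = 0

    def reset(self):
        self.turns = 0

    def __call__(self, batch):
        self.turns += 1
        return ["turn {}: you said '{}'".format(self.turns, u) for u in batch]


def run_scripted_dialog(bot, utterances, stop_word="exit"):
    """Feed utterances one by one until the stop word, collecting the replies."""
    bot.reset()  # start a new dialog
    replies = []
    for utterance in utterances:
        if utterance == stop_word:
            break
        replies.append(bot([utterance])[0])
    return replies


# The empty first utterance mirrors the interactive loop, which calls the bot before any input.
replies = run_scripted_dialog(EchoBot(), ["", "italian restaurant", "exit", "ignored"])
for reply in replies:
    print(">> " + reply)
```

Passing the trained `model` instead of `EchoBot()` should replay a fixed conversation, provided the model follows the call convention described above.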

### Another way of training and inferring a component

Let's build a response token vocabulary without using the DeepPavlov scripts (that is, without `train_evaluate_model_from_config` and `build_model_from_config`).

In [160]:
from deeppavlov.core.data.vocab import DefaultVocabulary
from deeppavlov.dataset_readers.dstc2_reader import DSTC2Version2DatasetReader
from deeppavlov.dataset_iterators.dialog_iterator import DialogDatasetIterator

Initializing a `DefaultVocabulary` instance:

In [161]:
y_vocab = DefaultVocabulary(level='token', 
                            load_path='vocabs/y_token.dict', # path is relative to DEEPPAVLOV_ROOT/../download/ 
                            save_path='vocabs/y_token.dict',
                            tokenizer=lambda s_batch: [s.split() for s in s_batch])

Important methods of any trainable component are:

   - **\_\_init\_\_(self, \*args, \*\*kwargs)**
     - initializes a class instance

   - **fit(self, data, \*args)** or **train_on_batch(self, batch, \*args)**
     - fits on the full data or makes one training step on a batch of data

   - **\_\_call\_\_(self, batch, \*\*kwargs)**
     - makes a prediction (runs inference) for each sample in a batch
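
The three methods listed above can be sketched with a toy component. `ToyVocabulary` is a hypothetical class written only for this illustration; it is not DeepPavlov's `DefaultVocabulary` and does not reproduce its options, file handling, or index assignment.

```python
from collections import Counter


class ToyVocabulary:
    """Hypothetical stand-in for a trainable component."""

    def __init__(self, unk_token="<UNK>"):
        # index 0 is reserved for unknown tokens
        self.unk_token = unk_token
        self.token_to_index = {unk_token: 0}

    def fit(self, utterances):
        # count whitespace-separated tokens and index them, most frequent first
        counts = Counter(tok for utt in utterances for tok in utt.split())
        for token, _ in counts.most_common():
            self.token_to_index.setdefault(token, len(self.token_to_index))
        return self

    def __call__(self, batch):
        # map every token of the batch to its index, unknown tokens to 0
        return [self.token_to_index.get(tok, 0) for tok in batch]


toy_vocab = ToyVocabulary()
toy_vocab.fit(["the restaurant is in the centre", "the food is cheap"])

indices = toy_vocab(["the", "is", "sushi"])
print(indices)  # [1, 2, 0]
print(toy_vocab(["the"]) == toy_vocab.__call__(["the"]))  # True
```

Here `fit` sees the whole dataset once, and `__call__` only looks up what was learned, which is the same split of responsibilities as in the list above.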

Getting batches of data:

In [162]:
data = DSTC2Version2DatasetReader().read(data_path="tmp/my_download_of_dstc2")
data_samples = DialogDatasetIterator(data, seed=1443, shuffle=True).get_instances(data_type='all')
x_list, y_list = data_samples

2018-07-05 16:53:42.967 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from tmp/my_download_of_dstc2/dstc2-trn.jsonlist]
2018-07-05 16:53:43.589 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from tmp/my_download_of_dstc2/dstc2-val.jsonlist]
2018-07-05 16:53:43.710 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 214: [loading dialogs from tmp/my_download_of_dstc2/dstc2-tst.jsonlist]


Building the vocabulary from the `y` data (the responses):

In [163]:
y_vocab.fit(y_list)

Inferring from (using) the built vocabulary:

In [164]:
y_vocab(['is', 'the', 'of', 'restaurant', 'hi'])

[43, 3, 26, 5, 0]

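Calling a component as `y_vocab(batch)` is the same as calling its \_\_call\_\_ method, `y_vocab.__call__(batch)`:

In [165]:
# compare the two call styles directly; a hard-coded index would depend on the fitted vocabulary
# (in the output above, 'hi' maps to index 0)
y_vocab(['hi']) == y_vocab.__call__(['hi'])

True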