# Model serving with DeepPavlov

DeepPavlov supports out-of-the-box serving for both pre-trained and custom models.
Models can be served through:
* [REST API](http://docs.deeppavlov.ai/en/master/intro/features.html#examples-of-some-components)
* [Telegram](http://docs.deeppavlov.ai/en/master/intro/features.html#examples-of-some-components)
* [Amazon Alexa](http://docs.deeppavlov.ai/en/master/devguides/amazon_alexa.html)
* [Microsoft Bot Framework](http://docs.deeppavlov.ai/en/master/devguides/ms_bot_integration.html)
  * Bing, Cortana, Email, Facebook Messenger, Slack, GroupMe, Microsoft Teams, Skype, Telegram, Twilio, Web Chat
* [Yandex Alice](http://docs.deeppavlov.ai/en/master/devguides/yandex_alice.html)


## Serving DeepPavlov pre-trained models


DeepPavlov has one-line commands to serve models:

Run a model in the CLI:
```
python -m deeppavlov interact model_config
```

Serve a model with the REST API:
```
python -m deeppavlov riseapi model_config
```
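
Once the REST server is running, it can be queried from Python. The sketch below is a minimal client, not part of DeepPavlov itself: the host, port `5000`, the `/model` endpoint and the argument name `x` are assumptions that depend on how the server and the model config are set up, and `build_payload` and `query_model` are hypothetical helper names.

```python
import json
from urllib import request

# Assumed address of the running server; adjust host, port and endpoint
# to match your own setup.
API_URL = "http://127.0.0.1:5000/model"


def build_payload(texts, arg_name="x"):
    """Wrap a batch of utterances under the (assumed) model input name."""
    return {arg_name: list(texts)}


def query_model(texts, url=API_URL, arg_name="x"):
    """POST a batch of utterances and return the decoded JSON reply."""
    body = json.dumps(build_payload(texts, arg_name)).encode("utf-8")
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `query_model(["Hello"])` only works while the server is up; `build_payload` can be checked on its own without a server.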

Serve a model with Telegram:
```
python -m deeppavlov interactbot model_config -t <TELEGRAM_TOKEN>
```


Let's try some of them with a goal-oriented bot trained on the DSTC 2 dataset. This bot is trained to suggest restaurants in the Cambridge area.


Install the DeepPavlov library:

In [None]:
! pip install deeppavlov

In [None]:
import deeppavlov

Install the requirements for the goal-oriented bot:

In [None]:
! python -m deeppavlov install gobot_dstc2

Download pre-trained model:

In [None]:
! python -m deeppavlov download gobot_dstc2

Run with CLI:

In [None]:
! python -m deeppavlov interact gobot_dstc2

Serving with Telegram:
```
python -m deeppavlov interactbot gobot_dstc2 -t <TELEGRAM_TOKEN>
```

A Telegram token can be created with the @BotFather bot. See the [Telegram documentation](https://core.telegram.org/bots#3-how-do-i-create-a-bot) for details.

Once you have a Telegram token, you can run the goal-oriented bot.

In [None]:
! python -m deeppavlov interactbot gobot_dstc2 -t <YOUR_TELEGRAM_TOKEN>

## Serving custom models

We have already discussed how to serve pre-trained DeepPavlov models. But how can DeepPavlov be used to serve custom ones?

### Say Hi Example

Let's consider a simple example:

In [None]:
class SayHiModel:
  def __init__(self, *args, **kwargs):
    pass
  
  def __call__(self, input_texts):
    '''
    __call__ method should return responses for each utterance in input_texts
    '''
    output_text = []
    for text in input_texts:
      output_text.append('Hi!')
    return output_text
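
The contract that matters here is batch in, batch out: the model receives a list of utterances and returns one response per utterance, in the same order. A minimal check of that contract, with the class repeated so the snippet runs on its own (the sample utterances are made up for illustration):

```python
class SayHiModel:
    def __init__(self, *args, **kwargs):
        pass

    def __call__(self, input_texts):
        # One response per input utterance, in the same order.
        return ['Hi!' for _ in input_texts]


model = SayHiModel()
replies = model(['Hello', 'How are you?'])
print(replies)
```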

Here we define a utility function that generates a configuration; DeepPavlov needs a configuration of this kind to build a model.

In [None]:
def generate_config(class_name):
  """generate minimal required DeepPavlov model configuration"""
  
  config = {
    'chainer': {
        'in': ['x'],
        'out': ['y'],
        'pipe': [
            {
                'class_name': f'__main__:{class_name}',
                'in': ['x'],
                'out': ['y']
            }
        ]
    }
  }
  return config
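
To see what the `in` and `out` names in this configuration do, here is a toy stand-in for the pipeline runner. It is not DeepPavlov's implementation, only a sketch of the idea that each step reads its inputs by name from a shared memory and writes its outputs back by name. `run_pipe` and the `components` registry are hypothetical and exist only for this illustration.

```python
class SayHiModel:
    def __init__(self, *args, **kwargs):
        pass

    def __call__(self, input_texts):
        return ['Hi!' for _ in input_texts]


# Same structure as the configuration produced by generate_config('SayHiModel').
config = {
    'chainer': {
        'in': ['x'],
        'out': ['y'],
        'pipe': [
            {'class_name': '__main__:SayHiModel', 'in': ['x'], 'out': ['y']}
        ]
    }
}


def run_pipe(config, components, **inputs):
    """Toy pipeline runner: pass named values through each pipe step."""
    chainer = config['chainer']
    memory = {name: inputs[name] for name in chainer['in']}
    for step in chainer['pipe']:
        # 'module:ClassName' -> 'ClassName', looked up in a plain dict
        class_name = step['class_name'].split(':')[-1]
        component = components[class_name]()
        results = component(*[memory[name] for name in step['in']])
        if len(step['out']) == 1:
            results = [results]
        memory.update(zip(step['out'], results))
    outputs = [memory[name] for name in chainer['out']]
    return outputs[0] if len(outputs) == 1 else outputs


print(run_pipe(config, {'SayHiModel': SayHiModel}, x=['Hello', 'Bye']))
```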

Serving with the Python API:

In [None]:
# to interact with CLI
from deeppavlov.core.commands.infer import interact_model
# to interact with Telegram
from deeppavlov.utils.telegram.telegram_ui import interact_model_by_telegram


In [None]:
interact_model(generate_config('SayHiModel'))

In [None]:
interact_model_by_telegram(generate_config('SayHiModel'), token='YOUR_TOKEN')

### Serving the BERT Generator from the Day 4 Tutorial

Install requirements and download model:

In [None]:
! pip install git+https://github.com/deepmipt/bert.git@feat/multi_gpu
! wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
! unzip uncased_L-12_H-768_A-12.zip

Define all the required code from the Day 4 tutorial in a single cell:

In [None]:
import deeppavlov
from deeppavlov.models.preprocessors.bert_preprocessor import BertPreprocessor

from bert_dp import modeling


BERT_MODEL_PATH = './uncased_L-12_H-768_A-12/'

bert_config = modeling.BertConfig.from_json_file(BERT_MODEL_PATH + 'bert_config.json')

import tensorflow as tf

# we should define placeholders for BERT model
input_ids_ph = tf.placeholder(shape=(None, None), dtype=tf.int32)
input_masks_ph = tf.placeholder(shape=(None, None), dtype=tf.int32)
token_types_ph = tf.placeholder(shape=(None, None), dtype=tf.int32)
is_train_ph = tf.placeholder_with_default(False, shape=[])

# this will build Tensorflow graph for BERT model
bert_model = modeling.BertModel(config=bert_config,
                                is_training=is_train_ph,
                                input_ids=input_ids_ph,
                                input_mask=input_masks_ph,
                                token_type_ids=token_types_ph,
                                use_one_hot_embeddings=False)

def gather_indexes(sequence_tensor, positions):
    """Gathers the vectors at the specific positions over a minibatch."""
    sequence_shape = modeling.get_shape_list(sequence_tensor, expected_rank=3)
    batch_size = sequence_shape[0]
    seq_length = sequence_shape[1]
    width = sequence_shape[2]

    flat_offsets = tf.reshape(
      tf.range(0, batch_size, dtype=tf.int32) * seq_length, [-1, 1])
    flat_positions = tf.reshape(positions + flat_offsets, [-1])
    flat_sequence_tensor = tf.reshape(sequence_tensor,
                                    [batch_size * seq_length, width])
    output_tensor = tf.gather(flat_sequence_tensor, flat_positions)
    return output_tensor

def get_masked_lm_output(bert_config, input_tensor, output_weights, positions):
    """Get probabilities for the masked LM.
    
    bert_config - instance of BertConfig
    input_tensor - output of bert_model.get_sequence_output()
    output_weights - projection matrix; here we use the embedding matrix and then transpose it
    positions - positions of MASKED tokens, i.e. the positions at which we want to make predictions
    """
    input_tensor = gather_indexes(input_tensor, positions)

    with tf.variable_scope("cls/predictions"):
        # We apply one more non-linear transformation before the output layer.
        with tf.variable_scope("transform"):
            input_tensor = tf.layers.dense(
              input_tensor,
              units=bert_config.hidden_size,
              activation=modeling.get_activation(bert_config.hidden_act),
              kernel_initializer=modeling.create_initializer(
                  bert_config.initializer_range))
            input_tensor = modeling.layer_norm(input_tensor)

        # The output weights are the same as the input embeddings, but there is
        # an output-only bias for each token.
        output_bias = tf.get_variable(
            "output_bias",
            shape=[bert_config.vocab_size],
            initializer=tf.zeros_initializer())
        logits = tf.matmul(input_tensor, output_weights, transpose_b=True)
        logits = tf.nn.bias_add(logits, output_bias)
        probs = tf.nn.softmax(logits, axis=-1)

    return probs
  
# define placeholder for MASKED tokens positions
masked_lm_positions_ph = tf.placeholder(shape=(None, None), dtype=tf.int32)

# define predictions for MASKED tokens 
masked_lm_probs = get_masked_lm_output(bert_config, 
                                       bert_model.get_sequence_output(),
                                       bert_model.get_embedding_table(),
                                       masked_lm_positions_ph)

# define TensorFlow session
sess_config = tf.ConfigProto(allow_soft_placement=True)
sess_config.gpu_options.allow_growth = True
sess = tf.Session(config=sess_config)

init_checkpoint = BERT_MODEL_PATH + 'bert_model.ckpt'

# load from checkpoint
tvars = tf.trainable_variables()
assignment_map, initialized_variable_names = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)

sess.run(tf.global_variables_initializer())

from bert_dp import tokenization

tokenizer = tokenization.FullTokenizer(
    vocab_file=BERT_MODEL_PATH + 'vocab.txt',
    do_lower_case=True,
)

MASK_TOKEN = '[MASK]'
MASK_ID = tokenizer.convert_tokens_to_ids([MASK_TOKEN])[0]

from copy import deepcopy
import numpy as np

def append_tokens(input_example, token=MASK_TOKEN, token_id=MASK_ID, n=3):
    """
    This function appends `token` to `input_example` `n` times.
    Also, it maintains correct values for `input_mask`, `input_ids`, `input_type_ids`.
    Don't forget that [SEP] token is always the last token.
    
    input_example - result of BertPreprocessor with tokens, input_ids, ...
    token - token to append
    token_id - token id to append
    n - how many times to append token to input_example
    """
    input_example = deepcopy(input_example)
    max_seq_len = len(input_example.input_mask)
    input_len = sum(input_example.input_mask)
    
    # new_tokens = YOUR CODE HERE
    new_tokens = (input_example.tokens[:input_len - 1] + [token] * n + input_example.tokens[input_len-1:])[:max_seq_len]
    input_example.tokens = new_tokens
    assert len(new_tokens) <= max_seq_len
    
    # new_input_mask = YOUR CODE HERE
    new_input_mask = (input_example.input_mask[:input_len - 1] + [1] * n + input_example.input_mask[input_len-1:])[:max_seq_len]
    input_example.input_mask = new_input_mask
    assert len(new_input_mask) <= max_seq_len
    
    # new_input_ids = YOUR CODE HERE
    new_input_ids = (input_example.input_ids[:input_len - 1] + [token_id] * n + input_example.input_ids[input_len-1:])[:max_seq_len]
    input_example.input_ids = new_input_ids
    assert len(new_input_ids) <= max_seq_len
    
    # new_input_type_ids = YOUR CODE HERE
    new_input_type_ids = (input_example.input_type_ids[:input_len - 1] + [1] * n + input_example.input_type_ids[input_len-1:])[:max_seq_len]
    input_example.input_type_ids = new_input_type_ids
    assert len(new_input_type_ids) <= max_seq_len
    
    return input_example, [i for i in range(len(input_example.tokens)) if input_example.tokens[i] == MASK_TOKEN]
  

def generate_text(input_example, sampling_method='greedy', mask_tokens_n=3, max_generated_tokens=15):
    """
    This function generates text using input_example as initial text.
    
    Text generation stops when one of the ['.', '?', '!'] symbols is predicted
    or when `max_generated_tokens` tokens have been generated.
    """
    generated_example = deepcopy(input_example)
    for i in range(max_generated_tokens):
        # Firstly, we append [MASK] tokens to the end of a text.
        # If mask_tokens_n is too small (e.g., 1) then model will predict "." and generation will stop.
        # It happens because BERT learned that the last token in sentences is usually ".".
        masked_input_example, masked_lm_positions = append_tokens(generated_example, n=mask_tokens_n)
        
        # get distribution over vocabulary for the first masked token
        probs = sess.run(masked_lm_probs, feed_dict={
            input_ids_ph: [masked_input_example.input_ids],
            input_masks_ph: [masked_input_example.input_mask],
            token_types_ph: [masked_input_example.input_type_ids],
            masked_lm_positions_ph: [masked_lm_positions],
        })[0]
        
        # sample token from vocabulary using probs
        if sampling_method == 'greedy':
            next_token_id = np.argmax(probs)
        else:
            next_token_id = sampling_method(probs)
        
        # append generated token to text
        next_token = tokenizer.convert_ids_to_tokens([next_token_id])[0]    
        generated_example, _ = append_tokens(generated_example, token=next_token, token_id=next_token_id, n=1)
        
        if generated_example.tokens[-2] in ['.', '?', '!']:
            break

    return generated_example
  

def top_k_sampling(probs, k=10):
    """
    Sample from k tokens with the highest probabilities.
    Don't forget to re-normalize top k probs.
    """
    #### YOUR CODE HERE START ####
    # get top k indices from probs
    top_k_tokens_ids = np.argsort(probs)[::-1][:k]
    # get top k probabilities using top_k_tokens_ids
    top_k_probs = probs[top_k_tokens_ids]
    # make sure that sum of top_k_probs == 1
    top_k_probs = top_k_probs / sum(top_k_probs)
    #### YOUR CODE HERE END ####
    return top_k_tokens_ids[np.argmax(np.random.multinomial(n=1, pvals=top_k_probs))]
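
The offset arithmetic in `gather_indexes` is easier to follow on a small array. The NumPy sketch below mirrors the same steps (flatten the batch, shift each row's positions by `row_index * seq_length`, then index) outside of TensorFlow. `gather_indexes_np` is a hypothetical name and the input values are made up for illustration.

```python
import numpy as np


def gather_indexes_np(sequence, positions):
    """NumPy mirror of gather_indexes: pick vectors at given positions per row."""
    batch_size, seq_length, width = sequence.shape
    # Row i of the batch starts at index i * seq_length in the flattened array.
    flat_offsets = (np.arange(batch_size) * seq_length).reshape(-1, 1)
    flat_positions = (positions + flat_offsets).reshape(-1)
    flat_sequence = sequence.reshape(batch_size * seq_length, width)
    return flat_sequence[flat_positions]


# Batch of 2 sequences, 3 tokens each, vectors of width 2.
sequence = np.arange(2 * 3 * 2).reshape(2, 3, 2)
# Take tokens 0 and 2 from the first sequence, token 1 twice from the second.
positions = np.array([[0, 2], [1, 1]])

gathered = gather_indexes_np(sequence, positions)
print(gathered)
```

The result has shape `(batch_size * positions_per_row, width)`, so the batch dimension is flattened away, which is why `sess.run(...)[0]` in `generate_text` yields the distribution for the first masked position.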


Define the BERT generator model:

In [None]:
class BertGenerator:
  def __init__(self, *args, **kwargs):
    self.bp = BertPreprocessor(vocab_file=BERT_MODEL_PATH + 'vocab.txt', do_lower_case=True, max_seq_length=32)
  
  def __call__(self, input_texts):
    '''
    __call__ method should return responses for each utterance in input_texts
    '''
    output_text = []
    for text in input_texts:
      input_example = self.bp(texts_a = [f'- {text}'], texts_b = ['- '])[0]

      top_k_10_sampling = lambda x: top_k_sampling(x, 10)
      generated_example = generate_text(input_example, sampling_method=top_k_10_sampling)
      sep_index = generated_example.tokens.index('[SEP]')
      response = ' '.join(generated_example.tokens[sep_index + 2:-1]).replace(' ##', '').replace('##', '')
      output_text.append(response)
    # return only after every utterance in the batch has been processed
    return output_text
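
The response extraction at the end of `__call__` does two things: it skips everything up to and including the `- ` prefix of the second segment, and it glues WordPiece continuation tokens (those starting with `##`) back onto the preceding token. A small standalone sketch of that step, using a made-up token list and a hypothetical helper name `join_wordpieces`:

```python
def join_wordpieces(tokens):
    """Join tokens with spaces, merging '##' continuation pieces into words."""
    return ' '.join(tokens).replace(' ##', '').replace('##', '')


# Illustrative token sequence; real tokens come from the BERT tokenizer.
tokens = ['[CLS]', '-', 'hello', '[SEP]',
          '-', 'i', 'am', 'play', '##ing', '.', '[SEP]']

sep_index = tokens.index('[SEP]')  # first [SEP] closes the user utterance
# +2 skips the [SEP] and the leading '-', -1 drops the final [SEP]
response = join_wordpieces(tokens[sep_index + 2:-1])
print(response)
```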

Interact with CLI:

In [None]:
interact_model(generate_config('BertGenerator'))

Interact with Telegram:

In [None]:
interact_model_by_telegram(generate_config('BertGenerator'), token='YOUR_TOKEN')