# Attention-Based Classification Codelab

**Recommended time: 30 minutes**

**Contributors: nthain, you?**


This codelab introduces text classification models in TensorFlow that use attention to provide insight into how classification decisions are made. We will build our TensorFlow graph following the Embed - Encode - Attend - Predict paradigm introduced by Matthew Honnibal. For more information about this approach, you can refer to:

Slides: https://goo.gl/BYT7au

Video: https://youtu.be/pzOzmxCR37I


Figure 1 below shows the full TensorFlow graph we will build in this codelab. We will implement each piece of this graph in a separate function, so the whole model simply calls these functions in turn.


![Figure 1](img/entire_model.png "Figure 1")




### Imports

In [1]:
%load_ext autoreload
%autoreload 2

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function


import pandas as pd
import tensorflow as tf
import numpy as np
import time
import os
from sklearn import metrics
from visualize_attention import attentionDisplay
from process_figshare import process_figshare

## Load & Explore Data

Let's begin by loading the data, which is available from [Figshare](https://figshare.com/articles/Wikipedia_Talk_Labels_Toxicity/4563973), into a pandas DataFrame. From the link, download toxicity_annotations.tsv and toxicity_annotated_comments.tsv to the data/ directory. We then run the process_figshare function to clean the data and split it into train, dev, and test sets.

In [2]:
process_figshare()

Processed files already exist.


In [2]:
SPLITS = ['train', 'dev', 'test']

wiki = {}
for split in SPLITS:
    wiki[split] = pd.read_csv('data/wiki_%s.csv' % split)

We display the top few rows of the DataFrame to see what we're dealing with. The key columns are 'comment', which contains the text of a comment from a Wikipedia talk page, and 'toxicity', which contains the fraction of annotators who found the comment toxic. The boolean 'is_toxic' column is the label we train on below. More information about the other fields and how this data was collected can be found on [this wiki](https://meta.wikimedia.org/wiki/Research:Detox/Data_Release) and in this [research paper](https://arxiv.org/abs/1610.08914).


In [3]:
wiki['train'].head(5)

| Unnamed: 0 | rev_id | toxicity | comment | year | logged_in | ns | sample | split | is_toxic |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2232.0 | 0.1 | This: :One can make an analogy in mathematical... | 2002 | True | article | random | train | False |
| 1 | 4216.0 | 0.0 | \` :Clarification for you (and Zundark's righ... | 2002 | True | user | random | train | False |
| 2 | 26547.0 | 0.0 | \`This is such a fun entry. Devotchka I once... | 2002 | True | article | random | train | False |
| 3 | 37330.0 | 0.3 | \` I fixed the link; I also removed \`\`homeopa... | 2002 | True | article | random | train | False |
| 4 | 37346.0 | 0.1 | \`If they are \`\`indisputable\`\` then why does th... | 2002 | True | article | random | train | False |


**TODO(nthain): Visualize the data (histogram)**

### Hyperparameters

**TODO(nthain): Move hyperparameters to a dict and put in an explanation**

In [4]:
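MAX_DOCUMENT_LENGTH = 200  # Tokens per comment; longer comments are trimmed, shorter ones padded
EMBEDDING_SIZE = 300       # Dimension of each word embedding
RNN_CELL_SIZE = 128        # Hidden units in each GRU (forward and backward)
BATCH_SIZE = 256           # Training examples per batch
ATTENTION_SIZE = 32        # Units in each hidden layer of the attention network
ATTENTION_DEPTH = 2        # Dense layers in the attention network: ATTENTION_DEPTH - 1 hidden ReLU layers plus one output layer

In [5]:
MAX_LABEL = 2              # Number of classes (not toxic / toxic)
WORDS_FEATURE = 'words'    # Key of the token-id feature in the features dict
NUM_STEPS = 300            # Number of training steps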

### Step 0: Text Preprocessing

Before we can build a neural network on comment strings, we first have to complete a number of preprocessing steps. In particular, we need to "tokenize" each string, splitting it into an array of tokens. In our case, each token will be a word, and tokens will be separated by spaces and punctuation. Many alternative tokenizers exist: some use characters as tokens, and others keep punctuation and emojis as tokens or even handle misspellings cleverly.

Once we've tokenized the sentences, each word will be replaced with an integer ID. These integer arrays are the input format that the embedding step (Step 1) expects.

Happily, TensorFlow's VocabularyProcessor class (tf.contrib.learn.preprocessing.VocabularyProcessor) takes care of both the tokenization and the integer mapping. We only have to give it the max_document_length argument, which determines the length of the output arrays: sentences shorter than this length are padded with zeros, and longer ones are trimmed.
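To make these steps concrete, here is a minimal pure-Python sketch of tokenizing, mapping words to integer IDs, and padding or trimming to a fixed length. The regex tokenizer and the helper names (`tokenize`, `fit_vocab`, `transform`) are hypothetical and only approximate what VocabularyProcessor does. As we understand VocabularyProcessor's behavior, ID 0 is used both for padding and for words not seen during fitting, and the sketch follows that convention.

```python
import re

# Hypothetical simple tokenizer: runs of word characters, apostrophes and hyphens.
TOKEN_RE = re.compile(r"[\w'\-]+")

def tokenize(text):
    return TOKEN_RE.findall(text)

def fit_vocab(texts):
    """Assign an integer ID to every token seen in the training texts."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1  # ID 0 is reserved for padding / unknown words
    return vocab

def transform(texts, vocab, max_len):
    """Map texts to fixed-length lists of IDs, trimming or zero-padding to max_len."""
    out = []
    for text in texts:
        ids = [vocab.get(tok, 0) for tok in tokenize(text)][:max_len]
        ids += [0] * (max_len - len(ids))
        out.append(ids)
    return out

vocab = fit_vocab(["Thanks for your help", "you idiot"])
print(transform(["Thanks, you idiot!", "unseen words here"], vocab, max_len=5))
# [[1, 5, 6, 0, 0], [0, 0, 0, 0, 0]]
```

As with the real processor, the vocabulary is fit on the training set only, so words that appear only in the test set map to the unknown ID.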

In [6]:
# Initialize the vocabulary processor
vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(MAX_DOCUMENT_LENGTH)

def process_inputs(vocab_processor, df, train_label='train', test_label='test'):

    # For simplicity, we call our features x and our outputs y
    x_train = df[train_label].comment
    y_train = df[train_label].is_toxic
    x_test = df[test_label].comment
    y_test = df[test_label].is_toxic

    # Train the vocab_processor from the training set
    x_train = vocab_processor.fit_transform(x_train)
    # Transform our test set with the vocabulary processor
    x_test = vocab_processor.transform(x_test)

    # We need these to be np.arrays instead of generators
    x_train = np.array(list(x_train))
    x_test = np.array(list(x_test))
    y_train = np.array(y_train).astype(int)
    y_test = np.array(y_test).astype(int)

    n_words = len(vocab_processor.vocabulary_)
    print('Total words: %d' % n_words)

    # Return the transformed data and the number of words
    return x_train, y_train, x_test, y_test, n_words

x_train, y_train, x_test, y_test, n_words = process_inputs(vocab_processor, wiki)

Total words: 196427


### Step 1: Embed

Neural networks are, at their core, compositions of linear-algebra operations and non-linear activation functions. To apply these computations to our input sentences, we must first embed each token as a vector of numbers. There are two main approaches to this embedding:


1.   **Pre-trained:** It is often beneficial to initialize our embedding matrix with pre-trained embeddings like [Word2Vec](??) or [GloVe](??). These embeddings are trained on a huge general-purpose text corpus, so they capture syntactic and semantic properties of the words being embedded and transfer well to new problems. Once initialized, you can optionally fine-tune them for your specific problem by making the embedding matrix a trainable variable in the TensorFlow graph.
2.   **Random:** Alternatively, embeddings can be "trained from scratch" by initializing the embedding matrix randomly and then training it like any other parameter in the tensorflow graph.



In this notebook, we will use random initialization. To perform this embedding, we use the embed_sequence function from tf.contrib.layers. It takes our input features (the integer arrays we produced in Step 0), randomly initializes an n_words x EMBEDDING_SIZE embedding matrix, and looks up one row of that matrix for each token. The parameters of this matrix are then trained along with the rest of the graph.
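Conceptually, an embedding is a table lookup: row i of the embedding matrix is the vector for token ID i. The NumPy sketch below shows this with a tiny, randomly initialized matrix. The sizes and the uniform initializer are arbitrary choices for illustration and are not necessarily what `embed_sequence` uses internally. In the real graph, the matrix is a trainable variable updated by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4  # toy sizes; the codelab uses n_words and EMBEDDING_SIZE

# Random initialization (the uniform range is chosen for illustration only).
embedding_matrix = rng.uniform(-0.1, 0.1, size=(vocab_size, embed_dim))

# A batch of two padded token-ID sequences, shape (batch, seq_len).
token_ids = np.array([[1, 5, 6, 0, 0],
                      [2, 3, 0, 0, 0]])

# Integer-array indexing looks up one row per token: shape (batch, seq_len, embed_dim).
word_vectors = embedding_matrix[token_ids]
print(word_vectors.shape)  # (2, 5, 4)
```

Because every occurrence of the same ID reads the same row, all padding positions (ID 0) share one vector.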

**TODO(nthain): Add pre-trained initialization**

In [7]:
def embed(features):
    word_vectors = tf.contrib.layers.embed_sequence(
        features[WORDS_FEATURE], vocab_size=n_words, embed_dim=EMBEDDING_SIZE)
    
    return word_vectors

### Step 2: Encode

A recurrent neural network (RNN) is a deep learning architecture that is useful for encoding sequential information like sentences. It is built around a single cell, which can be one of several standard architectures (e.g. a simple RNN, GRU, or LSTM cell). We will not focus on the details of these architectures. What matters here is that, at each step of the sequence, the cell takes two inputs and produces two outputs. The inputs are the token for the current step and the state carried over from the previous steps. The outputs are the encoded vector for the current step and a state to pass on to the next step.

Figure 2 shows what this looks like for an unrolled RNN. Each cell (represented by a green square) has two input arrows and two output arrows. Note that all of the green squares represent the same cell and share parameters. One major advantage of this parameter sharing is that, at inference time, the model can handle inputs of arbitrary length rather than being restricted to the input sizes of our training set.

For our model, we will use a bidirectional RNN. This is simply the concatenation of two RNNs: one that processes the sequence from left to right (the "forward" RNN) and one that processes it from right to left (the "backward" RNN). By using both directions, each word is encoded using the context of its neighbors on both sides rather than on a single side, which generally gives a richer encoding. For our cells, we use gated recurrent units (GRUs).

![Figure 2](img/figure_2.png "Figure 2")
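To make the cell-and-state picture concrete, the NumPy sketch below implements one common formulation of a GRU cell. It runs the cell forward and backward over a sequence and concatenates the two outputs at each position. This is a hypothetical toy: `ToyGRUCell`, `run_rnn` and `bidirectional` are not TensorFlow APIs. `tf.contrib.rnn.GRUCell` may differ in details such as initialization and how the update gate is applied. Also, `tf.nn.bidirectional_dynamic_rnn` returns the forward and backward outputs as a tuple rather than concatenating them; our `attend` function does that concatenation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyGRUCell:
    """Minimal GRU cell: maps (input x, previous state h) -> (output, new state)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # Columns are [update gate | reset gate | candidate], each hidden_size wide.
        self.W = rng.normal(scale=0.1, size=(input_size, 3 * hidden_size))
        self.U = rng.normal(scale=0.1, size=(hidden_size, 3 * hidden_size))
        self.b = np.zeros(3 * hidden_size)

    def __call__(self, x, h):
        H = self.hidden_size
        gx = x @ self.W + self.b
        gh = h @ self.U[:, :2 * H]
        z = sigmoid(gx[:H] + gh[:H])          # update gate
        r = sigmoid(gx[H:2 * H] + gh[H:])     # reset gate
        h_tilde = np.tanh(gx[2 * H:] + (r * h) @ self.U[:, 2 * H:])  # candidate state
        h_new = (1.0 - z) * h + z * h_tilde
        return h_new, h_new                   # a GRU's output is its new state

def run_rnn(cell, inputs):
    """Apply the same cell (shared parameters) at every step of the sequence."""
    h = np.zeros(cell.hidden_size)
    outputs = []
    for x in inputs:
        out, h = cell(x, h)
        outputs.append(out)
    return np.stack(outputs)

def bidirectional(fw_cell, bw_cell, inputs):
    fw = run_rnn(fw_cell, inputs)              # left to right
    bw = run_rnn(bw_cell, inputs[::-1])[::-1]  # right to left, re-aligned to input order
    return np.concatenate([fw, bw], axis=-1)   # (seq_len, 2 * hidden_size)

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 4))  # 5 tokens with 4-dimensional embeddings
fw_cell, bw_cell = ToyGRUCell(4, 3, rng), ToyGRUCell(4, 3, rng)
outputs = bidirectional(fw_cell, bw_cell, seq)
print(outputs.shape)  # (5, 6)
```

The two cells have separate parameters, which mirrors the separate forward and backward GRUs created in `encode` below.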


In [8]:
def encode(word_vectors):
  # Create Gated Recurrent Unit cells with a hidden size of RNN_CELL_SIZE.
  # Since the forward and backward RNNs have different parameters, we instantiate two separate GRUs.
  with tf.variable_scope('forward'):
    rnn_fw_cell = tf.contrib.rnn.GRUCell(RNN_CELL_SIZE)
  with tf.variable_scope('backward'):
    rnn_bw_cell = tf.contrib.rnn.GRUCell(RNN_CELL_SIZE)

  # Run the bidirectional RNN over word_vectors, shaped [batch, MAX_DOCUMENT_LENGTH, EMBEDDING_SIZE].
  # outputs is a (forward_outputs, backward_outputs) tuple, each shaped
  # [batch, MAX_DOCUMENT_LENGTH, RNN_CELL_SIZE]; the final states are discarded.
  outputs, _ = tf.nn.bidirectional_dynamic_rnn(rnn_fw_cell, rnn_bw_cell, word_vectors, dtype=tf.float32, time_major=False)
  
  return outputs

### Step 3: Attend

There are a number of ways to use the encoded states of a recurrent neural network for prediction. One traditional approach is to simply use the final encoded state of the network, as seen in Figure 2. However, this could lose useful information encoded in the earlier steps of the sequence. To keep that information, one could instead average all of the encoded states output by the RNN. There is no reason to believe, though, that all encoded states are equally valuable. This leads to the idea of using a weighted sum of these encoded states to make our prediction.

We will call the weights of this weighted sum "attention weights". As we will see below, they correspond to how important the model considers each token of the sequence when making a prediction. We compute these attention weights by building a small fully connected neural network on top of each encoded state. Its final layer has a single unit that outputs a score for that step. A softmax over all steps of the sequence then turns these scores into attention weights that sum to 1. As with RNNs, the parameters of this network are the same for each step of the sequence, which lets us accommodate variable-length inputs. Figure 3 shows what the graph would look like if we applied attention to a unidirectional RNN.

![Figure 3](img/figure_3.png "Figure 3")
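The NumPy sketch below shows the attention computation for a single sequence with one hidden layer, which corresponds to ATTENTION_DEPTH = 2. A shared ReLU layer scores every encoded state, and a softmax over positions turns the scores into weights that sum to 1. The output is the weighted sum of the states. The function name and the random weights are hypothetical. If the scoring network outputs the same score everywhere, the weights are uniform and the result reduces to the plain average of the encoded states.

```python
import numpy as np

def attention_pool(states, hidden_w, hidden_b, out_w, out_b):
    """Attention-weighted sum of encoded states.

    states: (seq_len, d) encoded states for one sequence.
    Returns (pooled of shape (d,), weights of shape (seq_len,)).
    """
    hidden = np.maximum(0.0, states @ hidden_w + hidden_b)  # same ReLU layer at every position
    scores = hidden @ out_w + out_b                          # one scalar score per position
    scores = scores - scores.max()                           # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()          # softmax over positions
    pooled = weights @ states                                # weighted sum of the states
    return pooled, weights

rng = np.random.default_rng(1)
seq_len, d, attn_size = 6, 8, 4
states = rng.normal(size=(seq_len, d))
hidden_w, hidden_b = rng.normal(size=(d, attn_size)), np.zeros(attn_size)
out_w, out_b = rng.normal(size=attn_size), 0.0
pooled, weights = attention_pool(states, hidden_w, hidden_b, out_w, out_b)
print(weights.round(3), pooled.shape)
```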

In [9]:
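def attend(inputs, attention_size, attention_depth):
  # inputs is the (forward, backward) tuple from encode(); concatenate them along
  # the feature axis -> [batch, sequence_length, 2 * RNN_CELL_SIZE].
  inputs = tf.concat(inputs, axis=2)

  inputs_shape = inputs.shape
  sequence_length = inputs_shape[1].value
  final_layer_size = inputs_shape[2].value

  # Flatten so the same small network is applied to every encoded state.
  x = tf.reshape(inputs, [-1, final_layer_size])
  # attention_depth - 1 hidden ReLU layers of width attention_size...
  for _ in range(attention_depth - 1):
    x = tf.layers.dense(x, attention_size, activation=tf.nn.relu)
  # ...followed by a single-unit layer that produces one score per token.
  x = tf.layers.dense(x, 1, activation=None)
  logits = tf.reshape(x, [-1, sequence_length, 1])
  # Softmax over the sequence axis turns the scores into attention weights.
  alphas = tf.nn.softmax(logits, dim=1)

  # Weighted sum of the encoded states -> [batch, 2 * RNN_CELL_SIZE].
  output = tf.reduce_sum(inputs * alphas, 1)

  return output, alphas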

### Step 4: Predict

To generate a class prediction about whether a comment is toxic, the final part of our TensorFlow graph takes the weighted average of hidden states from the attention step. It passes that average through a fully connected layer followed by a softmax, which produces a probability for each of our prediction classes. During training, the model uses the cross-entropy loss to fit its parameters.

Because we will use the Estimator framework to train our model, we write an estimator_spec function that specifies how the model is trained and what values to return at prediction time. We also specify accuracy and AUC as evaluation metrics, which we will use to evaluate our model in Step 7.

In [10]:
def estimator_spec_for_softmax_classification(
    logits, labels, mode, alphas):
  """Returns EstimatorSpec instance for softmax classification."""
  predicted_classes = tf.argmax(logits, 1)
  if mode == tf.estimator.ModeKeys.PREDICT:
    export_outputs = {
        'predict_output': tf.estimator.export.PredictOutput({
            'class': predicted_classes,
            'prob': tf.nn.softmax(logits),
            'attention': alphas
        })}
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions={
            'class': predicted_classes,
            'prob': tf.nn.softmax(logits),
            'attention': alphas
        },
        export_outputs=export_outputs)

  onehot_labels = tf.one_hot(labels, MAX_LABEL, 1, 0)
  loss = tf.losses.softmax_cross_entropy(
      onehot_labels=onehot_labels, logits=logits)
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

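  eval_metric_ops = {
      'accuracy': tf.metrics.accuracy(
          labels=labels, predictions=predicted_classes),
      # Note: this AUC is computed from the hard 0/1 class predictions.
      # Passing the toxic-class probability, tf.nn.softmax(logits)[:, 1],
      # instead would rank examples by score and give a more informative AUC.
      'auc': tf.metrics.auc(
          labels=labels, predictions=predicted_classes),
  }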
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)


The predict component of our graph then just takes the output of our attention step, i.e. the weighted average of the bi-RNN hidden states, and adds one more fully connected layer to compute the logits. These logits are fed into our estimator_spec function, which applies a softmax to get the final class probabilities and uses softmax_cross_entropy to build the loss.
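The NumPy sketch below spells out what the softmax and the cross-entropy loss compute for our two classes. It averages the per-example loss over the batch, which we believe matches tf.losses.softmax_cross_entropy with its default unit weights. The helper names are hypothetical. This also gives a useful sanity check. An untrained model with near-equal logits predicts about 0.5 for each class, so its loss is about ln 2 ≈ 0.693. That is in line with the first training loss of 0.69085526 reported in Step 6.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def softmax_cross_entropy(labels, logits, num_classes=2):
    onehot = np.eye(num_classes)[labels]          # like tf.one_hot(labels, MAX_LABEL)
    per_example = -(onehot * log_softmax(logits)).sum(axis=1)
    return float(per_example.mean())              # average over the batch

logits = np.array([[2.0, -1.0],   # confident "not toxic"
                   [0.0, 0.0]])   # undecided
labels = np.array([0, 1])
probs = np.exp(log_softmax(logits))  # class probabilities; each row sums to 1
loss = softmax_cross_entropy(labels, logits)
print(probs.round(3), round(loss, 4))
```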

In [11]:
def predict(encoding, labels, mode, alphas):
    logits = tf.layers.dense(encoding, MAX_LABEL, activation=None)
    return estimator_spec_for_softmax_classification(
          logits=logits, labels=labels, mode=mode, alphas=alphas)

### Step 5: Complete Model Architecture

We are now ready to put it all together. As you can see from the bi_rnn_model function below, once you have the components for embed, encode, attend, and predict, putting the whole graph together is extremely simple!
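To see how the pieces fit, the small hypothetical helper below traces tensor shapes through embed, encode, attend and predict for a given set of hyperparameters. It only does the shape bookkeeping implied by the functions defined in Steps 1 to 4. It does not build a graph.

```python
def trace_shapes(batch_size, max_document_length, embedding_size,
                 rnn_cell_size, num_classes):
    """Return (stage, shape) pairs for each stage of bi_rnn_model."""
    return [
        ("features['words']", (batch_size, max_document_length)),
        ("embed", (batch_size, max_document_length, embedding_size)),
        # bidirectional_dynamic_rnn returns one output tensor per direction.
        ("encode (each direction)", (batch_size, max_document_length, rnn_cell_size)),
        # attend concatenates both directions along the last axis.
        ("attend: concatenated states", (batch_size, max_document_length, 2 * rnn_cell_size)),
        ("attend: alphas", (batch_size, max_document_length, 1)),
        ("attend: encoding", (batch_size, 2 * rnn_cell_size)),
        ("predict: logits", (batch_size, num_classes)),
    ]

# Values of BATCH_SIZE, MAX_DOCUMENT_LENGTH, EMBEDDING_SIZE, RNN_CELL_SIZE, MAX_LABEL.
for name, shape in trace_shapes(256, 200, 300, 128, 2):
    print(name, shape)
```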

In [12]:
def bi_rnn_model(features, labels, mode):
  """RNN model to predict from sequence of words to a class."""

  word_vectors = embed(features)
  outputs = encode(word_vectors)
  encoding, alphas = attend(outputs, ATTENTION_SIZE, ATTENTION_DEPTH)

  return predict(encoding, labels, mode, alphas)

![Figure 1](img/entire_model.png "Figure 1")

### Step 6: Train Model

We will use the estimator framework to train our model. To define our classifier, we just provide it with the complete model graph (i.e. the bi_rnn_model function) and a directory where the models will be saved.

In [13]:
current_time = str(int(time.time()))
model_dir = os.path.join('checkpoints', current_time)
classifier = tf.estimator.Estimator(model_fn=bi_rnn_model, model_dir=model_dir)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fde08fcedd0>, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': 'checkpoints/1521593965', '_save_summary_steps': 100}


The Estimator framework also requires an input function, which supplies the input data in batches during training. We use the provided numpy_input_fn, which takes numpy arrays as features and labels. We also specify the batch size, that the data should repeat indefinitely (num_epochs=None), and that it should be shuffled (shuffle=True).
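Roughly, such an input function repeatedly yields shuffled mini-batches. The pure-NumPy generator below is a hypothetical sketch of that idea. It reshuffles every epoch and loops forever when num_epochs is None, matching the arguments passed to numpy_input_fn in the next cell. It is not how numpy_input_fn is implemented internally, and the real function may handle epoch boundaries differently.

```python
import numpy as np

def iterate_batches(x, y, batch_size, num_epochs=None, shuffle=True, seed=0):
    """Yield (x_batch, y_batch) pairs; loop forever if num_epochs is None."""
    rng = np.random.default_rng(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        # A fresh permutation each epoch, so batches differ between epochs.
        order = rng.permutation(len(x)) if shuffle else np.arange(len(x))
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]
        epoch += 1

x = np.arange(10).reshape(10, 1)  # ten toy "documents"
y = np.arange(10) % 2
batches = list(iterate_batches(x, y, batch_size=4, num_epochs=1))
print([len(b[0]) for b in batches])  # [4, 4, 2]
```

Indexing x and y with the same permutation keeps each feature row paired with its label.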

In [14]:
# Train.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
  x={WORDS_FEATURE: x_train},
  y=y_train,
  batch_size=BATCH_SIZE,
  num_epochs=None,
  shuffle=True)


Now it's finally time to train our model! With an Estimator, this is as easy as calling its train function and specifying how many steps to train for.

In [15]:
classifier.train(input_fn=train_input_fn, steps=NUM_STEPS)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into checkpoints/1521593965/model.ckpt.
INFO:tensorflow:loss = 0.69085526, step = 1
INFO:tensorflow:global_step/sec: 2.45811
INFO:tensorflow:loss = 0.1331726, step = 101 (40.689 sec)
INFO:tensorflow:global_step/sec: 2.41729
INFO:tensorflow:loss = 0.136737, step = 201 (41.365 sec)
INFO:tensorflow:Saving checkpoints for 300 into checkpoints/1521593965/model.ckpt.
INFO:tensorflow:Loss for final step: 0.0813523.


<tensorflow.python.estimator.estimator.Estimator at 0x7fde09cd5d50>

### Step 7: Predict and Evaluate Model

To evaluate the model, we use it to predict labels for the examples in our test set. Again we define a numpy_input_fn, this time for the test data, and have the classifier run predictions with this input function.

In [17]:
# Predict.
test_input_fn = tf.estimator.inputs.numpy_input_fn(
  x={WORDS_FEATURE: x_test},
  y=y_test,
  num_epochs=1,
  shuffle=False)

predictions = classifier.predict(input_fn=test_input_fn)

These predictions are returned to us as a generator. The code below gives an example of how we can extract the class and attention weights for each prediction.

In [18]:
y_predicted = []
alphas_predicted = []
for p in predictions:
    y_predicted.append(p['class'])
    alphas_predicted.append(p['attention'])

INFO:tensorflow:Restoring parameters from checkpoints/1521593965/model.ckpt-300


To evaluate our model, we can use the Estimator's evaluate function to get the accuracy and AUC scores we defined in our estimator_spec.

In [19]:
scores = classifier.evaluate(input_fn=test_input_fn)
print('Accuracy: {0:f}'.format(scores['accuracy']))
print('AUC: {0:f}'.format(scores['auc']))

INFO:tensorflow:Starting evaluation at 2018-03-21-01:06:51
INFO:tensorflow:Restoring parameters from checkpoints/1521593965/model.ckpt-300
INFO:tensorflow:Finished evaluation at 2018-03-21-01:07:21
INFO:tensorflow:Saving dict for global step 300: accuracy = 0.95521873, auc = 0.874904, global_step = 300, loss = 0.12261739
Accuracy: 0.955219
AUC: 0.874904
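ROC AUC can be read as the probability that a randomly chosen toxic comment is scored higher than a randomly chosen non-toxic one, with ties counting one half. The NumPy sketch below computes it directly from that definition on made-up data. The `roc_auc_score` function in the scikit-learn `metrics` module imported above computes the same quantity. Note that tf.metrics.auc approximates the area using a fixed set of thresholds, so its value can differ slightly.

The example also shows why the scores matter. Using hard 0/1 class predictions instead of probabilities discards ranking information and typically lowers the AUC. This is worth keeping in mind because our eval_metric_ops pass predicted_classes to tf.metrics.auc.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC AUC = fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs. every negative
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

y_true = np.array([0, 0, 1, 1, 0, 1])
prob_toxic = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])  # made-up model scores
y_pred = (prob_toxic >= 0.5).astype(int)                 # hard class predictions

accuracy = float((y_pred == y_true).mean())
auc_from_probs = pairwise_auc(y_true, prob_toxic)
auc_from_classes = pairwise_auc(y_true, y_pred)
print(accuracy, auc_from_probs, auc_from_classes)
```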


### Step 8: Display Attention

Now that we have a trained attention-based toxicity model, let's use it to visualize how it makes its classification decisions. We use the attentionDisplay class from the visualize_attention module. Given any sentence, this class uses our trained classifier to determine whether the sentence is toxic and shows the attention weights. In the displayed arrays, the redder a word is, the more weight the classifier puts on that word's encoded state. Try it out on some sentences of your own and see what patterns you can find!

Note: If you are viewing this on GitHub, the colors in the cells won't display properly. We recommend viewing it locally or with [nbviewer](https://nbviewer.jupyter.org/github/conversationai/conversationai-models/blob/nthain-initial/attention-codelab/Attention_Model_Codelab.ipynb) to see the correct rendering of the attention weights.
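If you prefer to inspect the weights numerically rather than by color, a hypothetical helper like the one below ranks tokens by attention weight. Our attend function does not mask padding, so the softmax also assigns some weight to padded positions. Zipping with the unpadded token list simply ignores those positions. The weights in the example are made up.

```python
def top_attended_tokens(tokens, weights, k=3):
    """Return the k (token, weight) pairs with the largest attention weights.

    tokens: the tokenized sentence without padding.
    weights: one attention weight per position (may include padded positions).
    """
    pairs = list(zip(tokens, weights))  # zip stops at the shorter input, dropping padding
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]

tokens = ["Thanks", "for", "your", "help"]
weights = [0.40, 0.05, 0.10, 0.35, 0.06, 0.04]  # made-up; the last two are padding positions
print(top_attended_tokens(tokens, weights, k=2))  # [('Thanks', 0.4), ('help', 0.35)]
```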

In [19]:
display = attentionDisplay(vocab_processor, classifier)

In [22]:
input_string = "Fuck off, you idiot"

In [26]:
pred, attn = display._predict_sentence(input_string)
tokenized_string = display._resize_and_tokenize(input_string)
# Pair each token with its attention weight, then transpose so tokens run across columns.
word_df = pd.DataFrame(list(zip(tokenized_string, attn[0])), columns=['word', 'attention']).transpose()

INFO:tensorflow:Restoring parameters from checkpoints/1521574227/model.ckpt-300


In [21]:
display.display_prediction_attention("Fuck off, you idiot.")

INFO:tensorflow:Restoring parameters from checkpoints/1521574227/model.ckpt-300
Toxic


| 0 | 1 | 2 | 3 |
|---|---|---|---|
| Fuck | off | you | idiot |


In [33]:
display.display_prediction_attention("Thanks for your help editing this.")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Not toxic


| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Thanks | for | your | help | editing | this |


In [34]:
display.display_prediction_attention("You're such an asshole. But thanks anyway.")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Toxic


| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| You're | such | an | asshole | But | thanks | anyway |


In [35]:
display.display_prediction_attention("I'm going to shoot you!")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Not toxic


| 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| I'm | going | to | shoot | you |


In [36]:
display.display_prediction_attention("Oh shoot. Well alright.")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Not toxic


| 0 | 1 | 2 | 3 |
|---|---|---|---|
| Oh | shoot | Well | alright |


In [37]:
display.display_prediction_attention("First of all who the fuck died and made you the god.")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Toxic


| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| First | of | all | who | the | fuck | died | and | made | you | the | god |


In [38]:
display.display_prediction_attention("Gosh darn it!")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Toxic


| 0 | 1 | 2 |
|---|---|---|
| Gosh | darn | it |


In [39]:
display.display_prediction_attention("God damn it!")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Toxic


| 0 | 1 | 2 |
|---|---|---|
| God | damn | it |


In [40]:
display.display_prediction_attention("Or else I'll cure your virginity from the wrong hole")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Toxic


| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Or | else | I'll | cure | your | virginity | from | the | wrong | hole |


In [41]:
display.display_prediction_attention("You're not that smart are you?")

INFO:tensorflow:Restoring parameters from checkpoints/1520622701/model.ckpt-300
Not toxic


| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| You're | not | that | smart | are | you |


### Save and Load Model

In [20]:
def serving_input_receiver_fn():
    # At serving time the model expects batches of token-id arrays that are already
    # padded/trimmed to MAX_DOCUMENT_LENGTH by the vocabulary processor.
    inputs = {WORDS_FEATURE: tf.placeholder(shape=[None, MAX_DOCUMENT_LENGTH], dtype=tf.int64)}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

export_dir_base = "models/200_length_300_embeding/"

# Writes a timestamped SavedModel directory under export_dir_base and returns its path.
full_model_dir = classifier.export_savedmodel(export_dir_base=export_dir_base, serving_input_receiver_fn=serving_input_receiver_fn)

INFO:tensorflow:Restoring parameters from checkpoints/1521593965/model.ckpt-300
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: models/200_length_300_embeding/temp-1521594535/saved_model.pb


In [None]:
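import json

# Load a JSON list of raw comment strings and map them to padded token-id arrays
# with the vocabulary processor fit on the training data.
with open("tweets.json", "r") as f:
    tweets = json.load(f)
x_test = np.array(list(vocab_processor.transform(tweets)))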

In [43]:
full_model_dir = "models/200_length_300_embeding/1521594535/0"

# tf.contrib.predictor loads the SavedModel into its own graph and session,
# so no explicit tf.Session or loader call is needed here.
predictor = tf.contrib.predictor.from_saved_model(full_model_dir)
output_dict = predictor({WORDS_FEATURE: x_test})
predicted_label = output_dict["class"]  # 0 = not toxic, 1 = toxic
attention = output_dict["attention"]    # per-token attention weights
prob = output_dict["prob"]              # class probabilities