# Deploy for Online Prediction

Because our model requires some Python-level preprocessing to embed our requests, we will take advantage of [AI Platform's Custom Prediction Routines](https://cloud.google.com/ml-engine/docs/tensorflow/custom-prediction-routines), which allow us to execute custom Python code in response to every online prediction request. There are five steps to creating a custom prediction routine:

1. Upload Model Artifacts to GCS
2. Implement Predictor interface 
3. Package the prediction code and dependencies
4. Deploy
5. Invoke API


### AI Platform TF 2.0 Dependency

This code relies on TF 2.0, which AI Platform online prediction doesn't support yet. Prediction works locally and the model deploys, but online prediction fails because the AI Platform nodes are running TF 1.13.

**We need to re-test this notebook once TF 2.0 is supported for online prediction.** With any luck it will work without any further changes.
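
One way to surface this mismatch early is to check the TensorFlow major version before loading the model. The sketch below is only an illustration: the helper names `tf_major_version` and `supports_tf2` are hypothetical, and in practice the version string would come from `tf.__version__` in the serving environment.

```python
def tf_major_version(version_string):
    """Return the major version from a string such as '1.13.1' or '2.0.0-beta1'."""
    return int(version_string.split('.')[0])


def supports_tf2(version_string):
    """True when the given TensorFlow version is 2.x or later."""
    return tf_major_version(version_string) >= 2


# Hypothetical usage: in a notebook this would be supports_tf2(tf.__version__).
for version in ('1.13.1', '2.0.0-beta1'):
    print(version, supports_tf2(version))
```

A check like this could be called at the top of `from_path` to raise a clear error instead of failing later during prediction.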

In [1]:
PROJECT_ID = 'vijays-sandbox'
BUCKET = 'vijays-sandbox-ml'
MODEL_PATH = '.'
MODEL_NAME = 'headlines'
VERSION_NAME = 'v1'

## 1. Upload Model Artifacts to GCS

Here we upload our model to GCS so that AI Platform can access it.

In [2]:
!gsutil cp $MODEL_PATH/headline_classification_model.h5 gs://$BUCKET/headlines/model/

Copying file://./headline_classification_model.h5 [Content-Type=application/octet-stream]...
/ [1 files][617.9 KiB/617.9 KiB]                                                
Operation completed over 1 objects/617.9 KiB.                                    


## 2. Implement Predictor Interface

Interface Spec: https://cloud.google.com/ml-engine/docs/tensorflow/custom-prediction-routines#predictor-class

The Predictor class tells AI Platform how to load the model artifacts, and it is where we specify our custom prediction code.

In [3]:
%%writefile predictor.py
import os

import tensorflow as tf
import tensorflow_hub as hub

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_URL = "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128-with-normalization/1"


class HeadlinesPredictor(object):
    def __init__(self, model):
        self.model = model
        # Load the embedding module once, rather than on every request.
        self.embed = hub.load(EMBEDDING_URL)

    def predict(self, instances, **kwargs):
        # Split each sentence into a list of words.
        texts = [sentence.split() for sentence in instances]
        # Pad or truncate every sentence to a constant length.
        texts = [(words + MAX_SEQUENCE_LENGTH * ['<PAD>'])[:MAX_SEQUENCE_LENGTH]
                 for words in texts]
        # Embed each word of each sentence.
        texts = [self.embed(words) for words in texts]

        # Run all requests through the model as a single batch.
        dataset = tf.data.Dataset.from_tensor_slices(texts).batch(len(texts))
        return self.model.predict(dataset)

    @classmethod
    def from_path(cls, model_dir):
        # AI Platform calls this with the directory holding the model artifacts.
        model = tf.keras.models.load_model(
            os.path.join(model_dir, 'headline_classification_model.h5'))
        return cls(model)

Overwriting predictor.py
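
The split-and-pad step in `predict` can be tried on its own, without TensorFlow. The following is a minimal sketch of that step only; the helper name `pad_tokens` is hypothetical and does not appear in `predictor.py`.

```python
MAX_SEQUENCE_LENGTH = 50  # same value as in predictor.py
PAD_TOKEN = '<PAD>'


def pad_tokens(sentence, max_len=MAX_SEQUENCE_LENGTH, pad_token=PAD_TOKEN):
    """Split a sentence on whitespace, then pad or truncate to max_len tokens."""
    tokens = sentence.split()
    # Appending max_len pad tokens guarantees enough padding; the slice trims the rest.
    return (tokens + [pad_token] * max_len)[:max_len]


# A short max_len is used here only to keep the printed result readable.
print(pad_tokens('Uber shuts down self-driving trucks unit', max_len=8))

# Long inputs are truncated to the maximum length.
print(len(pad_tokens('word ' * 80)))
```

Every request therefore reaches the embedding step with exactly `MAX_SEQUENCE_LENGTH` tokens, which is what lets the embedded sentences be stacked into one batch.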


### Test Predictor Class Works Locally

In [4]:
techcrunch=[
  'Uber shuts down self-driving trucks unit',
  'Grover raises €37M Series A to offer latest tech products as a subscription',
  'Tech companies can now bid on the Pentagon’s $10B cloud contract'
]
nytimes=[
  '‘Lopping,’ ‘Tips’ and the ‘Z-List’: Bias Lawsuit Explores Harvard’s Admissions',
  'A $3B Plan to Turn Hoover Dam into a Giant Battery',
  'A MeToo Reckoning in China’s Workplace Amid Wave of Accusations'
]
github=[
  'Show HN: Moon – 3kb JavaScript UI compiler',
  'Show HN: Hello, a CLI tool for managing social media',
  'Firefox Nightly added support for time-travel debugging'
]
requests = (techcrunch+nytimes+github)

**Warning**: If the next cell raises a GPU-related error, the GPU is likely still bound to the previous notebook. Release it by shutting down that notebook's session, then restart the kernel of the current notebook and the error should resolve.

<img src='assets/shutdown_session.png' width=400>

In [5]:
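from predictor import HeadlinesPredictor

# Use a distinct name so the instance does not shadow the `predictor` module.
headlines_predictor = HeadlinesPredictor.from_path(MODEL_PATH)
headlines_predictor.predict(requests)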

W0815 19:14:57.433857 139649201391360 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


array([[6.4379624e-03, 3.5369781e-01, 6.3986421e-01],
       [1.9121055e-05, 1.7003361e-02, 9.8297751e-01],
       [2.0236766e-03, 1.2312700e-01, 8.7484932e-01],
       [1.7740212e-02, 5.8616209e-01, 3.9609775e-01],
       [7.3872650e-01, 6.9254123e-02, 1.9201937e-01],
       [5.9970737e-02, 4.9645030e-01, 4.4357905e-01],
       [9.9892384e-01, 3.9826475e-05, 1.0363412e-03],
       [9.2509478e-01, 3.8148116e-03, 7.1090408e-02],
       [9.7563666e-01, 6.0183660e-04, 2.3761475e-02]], dtype=float32)
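
Each row above holds one probability per class. A small sketch of turning a row into a label follows. The class order `['github', 'nytimes', 'techcrunch']` is an assumption inferred from the sample output (it is not stated in this notebook), and `top_class` is a hypothetical helper.

```python
# ASSUMPTION: column order inferred from the sample output, not confirmed by the notebook.
CLASSES = ['github', 'nytimes', 'techcrunch']


def top_class(probabilities, classes=CLASSES):
    """Return (label, probability) for the highest-scoring class in one row."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return classes[best], probabilities[best]


# Three rows copied from the local prediction output above.
sample_rows = [
    [6.4379624e-03, 3.5369781e-01, 6.3986421e-01],  # first techcrunch headline
    [7.3872650e-01, 6.9254123e-02, 1.9201937e-01],  # second nytimes headline
    [9.9892384e-01, 3.9826475e-05, 1.0363412e-03],  # first github headline
]

for row in sample_rows:
    label, probability = top_class(row)
    print('{}: {:.3f}'.format(label, probability))
```

Under the assumed class order, the second row would be a misclassification: the headline comes from the `nytimes` list but scores highest for `github`.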

## 3. Package Predictor Class and Dependencies

We must package the predictor as a tar.gz source distribution package.

In [None]:
%%writefile setup.py
from setuptools import setup

setup(
    name='headlines_custom_predict_code',
    version='0.1',
    scripts=['predictor.py'],
    install_requires=[
        'tensorflow_hub',
    ])

In [None]:
!python setup.py sdist --formats=gztar

In [None]:
!gsutil cp dist/headlines_custom_predict_code-0.1.tar.gz gs://$BUCKET/headlines/predict_code/
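
Before deploying, it can be worth confirming that the source distribution actually contains `predictor.py`. The sketch below uses only the standard library; `sdist_members` and `contains_file` are hypothetical helper names, and the demo builds a small stand-in archive so the block runs without the real `dist/` directory.

```python
import io
import os
import tarfile
import tempfile


def sdist_members(tarball_path):
    """List the member paths inside a .tar.gz archive."""
    with tarfile.open(tarball_path, 'r:gz') as tar:
        return tar.getnames()


def contains_file(tarball_path, filename):
    """True if any archive member has the given base name."""
    return any(os.path.basename(name) == filename
               for name in sdist_members(tarball_path))


# Demo on a stand-in archive (not the real package) so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo-0.1.tar.gz')
    payload = b'# placeholder\n'
    with tarfile.open(demo_path, 'w:gz') as tar:
        info = tarfile.TarInfo('demo-0.1/predictor.py')
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    print(contains_file(demo_path, 'predictor.py'))
```

For the package built above, the same check would be `contains_file('dist/headlines_custom_predict_code-0.1.tar.gz', 'predictor.py')`.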

## 4. Deploy

This is similar to how we deploy standard models to AI Platform, with a few extra command line arguments.

In [None]:
!gcloud beta ai-platform models create $MODEL_NAME --regions us-central1 --enable-logging --enable-console-logging

#Change --runtime-version to 2.0 when supported
!gcloud beta ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --runtime-version 1.14 \
  --python-version 3.5 \
  --origin gs://$BUCKET/headlines/model/ \
  --package-uris gs://$BUCKET/headlines/predict_code/headlines_custom_predict_code-0.1.tar.gz \
  --prediction-class predictor.HeadlinesPredictor

## 5. Invoke API (this will fail because AI Platform doesn't support TF 2.0 yet)

In [None]:
import googleapiclient.discovery

service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}/versions/{}'.format(PROJECT_ID, MODEL_NAME, VERSION_NAME)

response = service.projects().predict(
    name=name,
    body={'instances': requests}
).execute()

if 'error' in response:
    raise RuntimeError(response['error'])
else:
    print(response['predictions'])