# RISE Camp Capstone Exercise

In this exercise, you will see how many of the projects you've learned about over the last couple of days fit together. Those of you who attended last year's RISE Camp will remember the Pong integration exercise that trained an RL policy in Ray and deployed it in Clipper. Today, we're going to extend that version by tracking our experiments in Flor and encrypting our models with WAVE.

We will train three models to play Pong. The first two will use [imitation learning](https://blog.statsbot.co/introduction-to-imitation-learning-32334c3b1e7a) to learn how to play, and the third will train a reinforcement learning policy using RLlib and Ray. Flor will track the training processes for all three models. We will also encrypt each of these models with WAVE, then deploy and serve them in Clipper.

For those of you unfamiliar with imitation learning: we take the state of the game (location of the ball, location of the paddle, etc.) together with the action a human player took in that state, and train a classifier that maps the state of the game board to the action to take.
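
As a rough illustration of this idea, the sketch below builds a tiny imitation policy from a handful of made-up (state, action) pairs. It uses a 1-nearest-neighbor lookup rather than the classifier trained later in this notebook, and the two-feature state (`paddle_y`, `ball_y`), the sample values, and the name `imitation_policy` are assumptions chosen only for illustration.

```python
# Hypothetical demonstrations: (paddle_y, ball_y) -> action chosen by a human.
# The values are made up and assumed to be scaled to [0, 1].
DEMONSTRATIONS = [
    ((0.2, 0.8), "down"),
    ((0.8, 0.2), "up"),
    ((0.5, 0.5), "stay"),
    ((0.3, 0.9), "down"),
    ((0.9, 0.1), "up"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def imitation_policy(state, demonstrations=DEMONSTRATIONS):
    """Return the action of the demonstration whose state is closest to `state`."""
    _, nearest_action = min(
        demonstrations, key=lambda pair: squared_distance(pair[0], state)
    )
    return nearest_action


# The closest demonstration is ((0.2, 0.8), "down"), so the policy copies it.
print(imitation_policy((0.22, 0.8)))
```

A trained classifier plays the same role as this lookup, but it generalizes from the demonstrations instead of copying the single nearest one.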

Finally, you'll play a game (or more!) against each of the three models. We'll aggregate the results to see which agent performs the best.

In [None]:
# Python compatibility imports
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gym
import pong_py
import cloudpickle

# ray imports
import ray
from ray.tune.registry import register_env
from ray.rllib.agents import ppo

import flor

In [None]:
# set Flor metadata for the notebook
flor.setNotebookName('integration.ipynb')

In a separate notebook, we have set up a WAVE client and defined some helper functions that we'll use below. Feel free to look at the `wave-setup.ipynb` file in this directory if you'd like to dig in.

In [None]:
%run wave-setup.ipynb

In [None]:
# call Wave helper function to create granting and receiving entities
orgEntity, recipientEntity = createWaveEntities()

# Imitation Learning

## Model Training

First, we're going to define three functions, `preproc_imitation`, `train_imitation_model`, and `encrypt_model`, which clean the input data, train an imitation learning model, and encrypt that model using WAVE, respectively.

The preprocessing function reads an input CSV and converts the `up`, `down`, and `stay` labels into numerical values. It also normalizes all the numerical values (the location of the controlled paddle, the location and velocity of the ball, and the previous location of the ball) by dividing them by 500.
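
To make the transformation concrete, here is a minimal standalone sketch of the same two steps applied row by row, using only the standard library instead of pandas. The sample rows and the helper name `preprocess_row` are made up for illustration; the label codes and the divisor of 500 mirror the notebook function.

```python
import csv
import io

# Same label encoding as the notebook: "down" -> 1, "up" -> 2, anything else -> 0.
LABEL_CODES = {"down": 1, "up": 2}
SCALE = 500.0  # divisor applied to every numeric feature


def preprocess_row(row):
    """Encode the label and scale the numeric features of one CSV row."""
    # The last column (the user) is dropped because we do not train on it.
    label, features = row[0], row[1:-1]
    return [LABEL_CODES.get(label, 0)] + [float(v) / SCALE for v in features]


# Made-up rows in the column order:
# label, paddle_y, ball_x, ball_y, ball_dx, ball_dy, x_prev, y_prev, user
raw = io.StringIO(
    "up,250,100,400,5,-5,95,405,alice\n"
    "stay,250,300,250,5,0,295,250,bob\n"
)
for row in csv.reader(raw):
    print(preprocess_row(row))
```

Each printed list starts with the numeric label, followed by the seven scaled features.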

In [None]:
# PRE-DEFINED FLOR FUNCTION. PLEASE DO NOT CHANGE.

@flor.func
def preproc_imitation(imitation_data, procd_imitation_data, **kwargs):
    import pandas as pd
    df_data = pd.read_csv(imitation_data)
    df_data.columns = ["label", "paddle_y", "ball_x", "ball_y", "ball_dx", "ball_dy", "x_prev", "y_prev", "user"]
    
    # drop the user column because we don't want to train on it
    df_data = df_data.drop(labels="user", axis=1)

    # discretize the labels
    def convert_label(label):
        """Convert labels into numeric values"""
        if(label=="down"):
            return 1
        elif(label=="up"):
            return 2
        else:
            return 0

    df_data['label'] = df_data['label'].apply(convert_label)
    df_data.loc[:, "paddle_y":"y_prev"] = df_data.loc[:, "paddle_y":"y_prev"]/500.0
    df_data.to_json(procd_imitation_data)

The model training function reads the cleaned data from a JSON file and fits a scikit-learn logistic regression model that classifies the action to take based on the input features. The fitted model is pickled and written to a file.
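
The pickled file is what the later stages consume. The sketch below shows only the dump-and-reload round trip, using the standard-library `pickle` module and a trivial stand-in "model" (a dictionary of linear coefficients); the notebook itself uses `cloudpickle` and a scikit-learn `LogisticRegression`. The parameter values, the `predict` helper, and the assumption that a larger `y` means the paddle should move `down` are all hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: score = ball_y - paddle_y.
params = {"coef": [-1.0, 1.0], "intercept": 0.0}


def predict(model, row):
    """Map a (paddle_y, ball_y) row to an action code: 1 = down, 2 = up, 0 = stay."""
    score = sum(c * x for c, x in zip(model["coef"], row)) + model["intercept"]
    if score > 0:
        return 1
    if score < 0:
        return 2
    return 0


model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Serialize the model to disk, as the training function does.
with open(model_path, "wb") as f:
    pickle.dump(params, f)

# Any later stage can restore the model from the file alone.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

print([predict(restored, row) for row in [(0.2, 0.8), (0.8, 0.2), (0.5, 0.5)]])
```

Because the file is read back as raw bytes before encryption, the same bytes can be decrypted and unpickled on the serving side.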

In [None]:
# PRE-DEFINED FLOR FUNCTION. PLEASE DO NOT CHANGE.

@flor.func
def train_imitation_model(procd_imitation_data, model, **kwargs):
    import cloudpickle
    import pandas as pd
    from sklearn import linear_model
    df_data = pd.read_json(procd_imitation_data)
    
    labels = df_data['label']
    training_data= df_data.drop(['label'], axis=1)

    skmodel = linear_model.LogisticRegression()
    skmodel.fit(training_data, labels)
    with open(model, 'wb') as f:
        cloudpickle.dump(skmodel, f)

Finally, the `encrypt_model` function takes the model we trained above and a handle to a WAVE entity that has access to all models. It uses the WAVE entity to encrypt the model and writes the encrypted model to a file.

In [None]:
# PRE-DEFINED FLOR FUNCTION. PLEASE DO NOT CHANGE.

@flor.func
def encrypt_model(granting_entity, model, model_tag, encrypted_model, **kwargs):
    import wave3 as wv
    granting_entity = deserializeEntity(granting_entity)
    
    # read the model binary, so we can encrypt it
    with open(model, 'rb') as f:
        model = f.read()
    
    # NOTE: We are relying on a global handle to WAVE here. 
    # In practice, we would have to recreate this handle explicitly.
    encrypt_response = wave.EncryptMessage(
        wv.EncryptMessageParams(
            # the namespace is the organization
            namespace=granting_entity.hash,
            resource="models/pong/" + model_tag,
            content=model))
    
    with open(encrypted_model, 'wb') as f:
        f.write(encrypt_response.ciphertext)

Next, we define a Flor experiment called `pong-imitation` and link together the input data and the functions defined above. 

In [None]:
# CHANGE ME LATER
DATA_FILE = 'imitation-small.csv'

# get the small or large tag from the DATA_FILE variable
model_tag = DATA_FILE.split('.')[0].split('-')[1]

ENTITY_FILE = 'org_entity.bin'

with flor.Experiment('pong-imitation') as ex:
    # load data into an artifact
    imitation_data = ex.artifact(DATA_FILE, 'imitation_data')
    
    # call preprocessing function
    do_preproc_imitation = ex.action(preproc_imitation, [imitation_data])
    procd_imitation_data = ex.artifact('imitation_data.json', 'procd_imitation_data', do_preproc_imitation)
    
    # train the model 
    do_train_imitation_model = ex.action(train_imitation_model, [procd_imitation_data])
    model = ex.artifact('model.pkl', 'model', do_train_imitation_model)
    
    model_tag = ex.literal(name='model_tag', v=model_tag)
    
    # serialize the wave entity, so we can track it as an artifact
    serializeEntity(orgEntity, ENTITY_FILE)
    granting_entity = ex.artifact(ENTITY_FILE, 'granting_entity')
    
    do_encrypt_model = ex.action(encrypt_model, [granting_entity, model, model_tag])
    encrypted_model = ex.artifact('encrypted_model.bin', 'encrypted_model', do_encrypt_model)

In [None]:
encrypted_model.pull(utag=model_tag)
model_location = encrypted_model.resolve_location()

## Model Deployment

In [None]:
# Make logging work correctly in the Jupyter notebook and set up Clipper
import logging
import sys
import subprocess

from clipper_admin import DockerContainerManager, ClipperConnection
from clipper_admin.deployers import python as py_deployer
from clipper_util.auth_deployer import auth_deploy_python_model

logger = logging.getLogger()
logger.setLevel(logging.INFO)

clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.stop_all()
clipper_conn.start_clipper()

In [None]:
model_name = "pong-policy"
app_name = "pong-" + model_tag.v

# load the encrypted model into memory
with open(model_location, 'rb') as f:
    ciphered_model = f.read()

# deploy the encrypted model to Clipper, passing along the WAVE handle and the recipient entity
auth_deploy_python_model(
    clipper_conn,
    model_name,
    predict,
    wave,
    recipientEntity,
    ciphered_model,    
    version=1,
    input_type="doubles"
)

clipper_conn.register_application(name=app_name, default_output="0", input_type="doubles", slo_micros=100000)
clipper_conn.link_model_to_app(app_name=app_name, model_name=model_name)

In [None]:
clipper_addr = clipper_conn.get_query_addr()

import subprocess32 as subprocess
server_handle = subprocess.Popen(["./start_webserver.sh", clipper_addr], stdout=subprocess.PIPE)
print(server_handle.stdout.readline().strip())

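Once an application is registered and linked to a model, Clipper answers predictions over a REST endpoint of the form `http://<query address>/<app name>/predict`, with the input sent as JSON. The sketch below only builds such a request without sending it. The address `localhost:1337`, the app name `pong-small`, the helper name `build_predict_request`, and the assumption that the model expects the seven normalized features in training order are all illustrative.

```python
import json
from urllib.request import Request


def build_predict_request(query_addr, app_name, features):
    """Build (but do not send) a prediction request for a Clipper application."""
    url = "http://{}/{}/predict".format(query_addr, app_name)
    body = json.dumps({"input": [float(x) for x in features]}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: in the notebook, the address comes from clipper_conn.get_query_addr().
request = build_predict_request(
    "localhost:1337",
    "pong-small",
    [0.5, 0.2, 0.8, 0.01, -0.01, 0.19, 0.81],
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it, which requires a running Clipper instance.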
# Reinforcement Learning

In [None]:
@flor.func
def start_ray(**kwargs):
    try:
        # probe for a running Ray instance; this is expected to raise if Ray is not initialized
        ray.get([])
    except Exception:
        ray.init()
    return {'exit_code': 0}

Instantiate an agent that can be trained using Proximal Policy Optimization (PPO).

Train the `PPOAgent` for some number of iterations.

**EXERCISE:** You will need to experiment with the number of iterations as well as with the configuration to get the agent to learn something reasonable.

Checkpoint the agent so that the relevant model can be saved and deployed to Clipper. We save the name of the checkpoint file in `metadata.json` so the model container knows how to restore the policy checkpoint.
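
The training cell in this notebook does not show the `metadata.json` step itself. A minimal sketch of how it could look is given here, using only the standard library; the key name `checkpoint`, the helper names, and the example path are hypothetical.

```python
import json
import os
import tempfile


def write_checkpoint_metadata(model_dir, checkpoint_path):
    """Record the checkpoint file name so a loader can find it later."""
    metadata = {"checkpoint": os.path.basename(checkpoint_path)}  # hypothetical key
    metadata_path = os.path.join(model_dir, "metadata.json")
    with open(metadata_path, "w") as f:
        json.dump(metadata, f)
    return metadata_path


def read_checkpoint_path(model_dir):
    """Resolve the checkpoint location from metadata.json inside model_dir."""
    with open(os.path.join(model_dir, "metadata.json")) as f:
        metadata = json.load(f)
    return os.path.join(model_dir, metadata["checkpoint"])


model_dir = tempfile.mkdtemp()
write_checkpoint_metadata(model_dir, "/tmp/ray/checkpoint-2")  # made-up path
print(read_checkpoint_path(model_dir))
```

Storing only the file name keeps the metadata valid when the checkpoint is copied into a different directory, such as a model container's data directory.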

In [None]:
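@flor.func
def train_agent(env_config, num_iterations, **kwargs):
    # register the Pong environment under a name that RLlib can look up
    register_env("my_env", lambda ec: pong_py.PongJSEnv())
    # NOTE: env_config is not yet passed through to the agent (see the exercise above)
    agent = ppo.PPOAgent(env="my_env", config={"env_config": {}})
    # each call to train() runs one training iteration
    for i in range(num_iterations):
        result = agent.train()
    # save a checkpoint and return its path so Flor can track it
    checkpoint_path = agent.save()
    return {'checkpoint_path': checkpoint_path}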

In [None]:
with flor.Experiment('rl-pong') as ex:
    # make sure that Ray is running before attempting to train a model
    do_start_ray = ex.action(start_ray, [])
    exit_code = ex.literal(name='exit_code', parent=do_start_ray)
    
    # define configuration variables relevant to training the RL model
    env_config = ex.literal({}, 'env_config') # TODO: Fill env_config
    num_iterations = ex.literal(2, 'num_iterations')

    # set up the training action and the save location of the checkpoint
    do_train_agent = ex.action(train_agent, [env_config, num_iterations, exit_code])
    checkpoint = ex.literal(name='checkpoint_path', parent=do_train_agent)

In [None]:
checkpoint.plot()

In [None]:
checkpoint.pull()