# TrackML Kubeflow Pipeline

This notebook assumes that you have already set up a GKE cluster with Kubeflow installed. Currently, this notebook must be run from the Kubeflow JupyterHub installation.

In this notebook, we will show how to:

* Interactively define a Kubeflow Pipeline using the Pipelines Python SDK
* Submit and run the pipeline

## Setup

Import the Pipelines SDK and define the container images that the pipeline steps run. Each image is given as a name plus a version tag; the two are joined into a full image reference when the steps are defined.

In [None]:
import kfp  # the Pipelines SDK.  This library is included with the notebook image.
from kfp import compiler
import kfp.dsl as dsl
import kfp.gcp as gcp
import kfp.notebook

In [None]:
KUBECTL_IMAGE = "gcr.io/mcas-195423/trackml_master_kfp_kubectl"
KUBECTL_IMAGE_VERSION = "1"
TRACKML_TRAIN_IMAGE = "gcr.io/mcas-195423/trackml_master_trackml"
TRACKML_TRAIN_VERSION = "1"
TRACKML_RESULTSGEN_IMAGE = "gcr.io/mcas-195423/trackml_master_trackml"
TRACKML_RESULTSGEN_VERSION = "1"
TRACKML_SCORE_IMAGE = "gcr.io/mcas-195423/trackml_master_trackml"
TRACKML_SCORE_VERSION = "1"

## Create an *Experiment* in the Kubeflow Pipeline System

The Kubeflow Pipeline system requires an "Experiment" to group pipeline runs. You can create a new experiment, or call `client.list_experiments()` to get existing ones.

In [None]:
# Note that this notebook should be running in JupyterHub in the same cluster as the pipeline system.
# Otherwise, additional config would be required to connect.
client = kfp.Client()
client.list_experiments()

In [None]:
exp = client.create_experiment(name='trackml_notebook')
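
Depending on the SDK version, calling `create_experiment` again with a name that already exists may raise an error, so it can help to look the experiment up first when the notebook is re-run. The sketch below uses only the two client calls shown above. The helper name `find_or_create_experiment` is hypothetical, and the sketch assumes the response object exposes an `experiments` list whose items have a `name` attribute. It checks only the first page of results.

```python
def find_or_create_experiment(client, name):
    """Return the experiment called `name`, creating it only if it is missing.

    `client` is expected to be a kfp.Client(); only list_experiments() and
    create_experiment() are called on it.
    """
    # Assumption: the response has an `experiments` attribute that is a list,
    # or None when no experiment exists yet. Only the first page is checked.
    response = client.list_experiments()
    for experiment in (response.experiments or []):
        if experiment.name == name:
            return experiment
    return client.create_experiment(name=name)
```

In the notebook this could replace the cell above with `exp = find_or_create_experiment(client, 'trackml_notebook')`.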

## Define a Pipeline

Authoring a pipeline is like authoring a normal Python function. The pipeline function describes the topology of the pipeline: which steps run, and in what order.

Each step in the pipeline is typically a `ContainerOp`, a simple class describing how to run a Docker container image: which image to use, and which command and arguments to pass to it. All the container images referenced in this pipeline have already been built.

The pipeline starts by training a model. When training finishes, the step exports the model in a form suitable for serving by [TensorFlow Serving](https://github.com/tensorflow/serving/).

The next step deploys a TensorFlow Serving instance with that model.

The last step generates a results file.
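
The three steps form a simple chain. As a rough illustration of what "topology" means here, the sketch below models the steps as a dependency graph in plain Python and derives a valid execution order. It does not use the Pipelines SDK, and the `DEPENDENCIES` table and `execution_order` helper are illustrative names only.

```python
from collections import deque

# Each step maps to the steps that must finish before it can start.
DEPENDENCIES = {
    "train": [],
    "serve": ["train"],
    "resultsgen": ["serve"],
}

def execution_order(dependencies):
    """Return the steps in an order that respects every dependency (Kahn's algorithm)."""
    remaining = {step: set(deps) for step, deps in dependencies.items()}
    # Steps with no unmet dependencies can run immediately.
    ready = deque(sorted(step for step, deps in remaining.items() if not deps))
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        # Finishing `step` may unblock other steps.
        for other, deps in remaining.items():
            if step in deps:
                deps.remove(step)
                if not deps:
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("dependency cycle detected")
    return order

print(execution_order(DEPENDENCIES))  # ['train', 'serve', 'resultsgen']
```

In the pipeline definition, the same ordering is expressed with `.after()` calls between the steps.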

In [None]:
def train_op():
  return dsl.ContainerOp(
    name='train',
    image="{}:{}".format(TRACKML_TRAIN_IMAGE, TRACKML_TRAIN_VERSION),
    command=["python"],
    arguments=["train.py"],
  ).apply(gcp.use_gcp_secret())  # append .set_gpu_limit(1) to request a GPU

def serve_op():
  return dsl.ContainerOp(
    name='serve',
    image="{}:{}".format(KUBECTL_IMAGE, KUBECTL_IMAGE_VERSION),
    arguments=[
      "/src/set_kubectl.sh",
      "--namespace", "kubeflow",
      "--command", "apply -f /src/k8s/serve.yaml",
    ]
  ).apply(gcp.use_gcp_secret())

def resultsgen_op():
  return dsl.ContainerOp(
    name='resultsgen',
    image="{}:{}".format(TRACKML_RESULTSGEN_IMAGE, TRACKML_RESULTSGEN_VERSION),
    command=["python"],
    arguments=["resultsgen.py"],
  ).apply(gcp.use_gcp_secret())

In [None]:
@dsl.pipeline(
  name='trackml',
  description='A pipeline that predicts particle tracks'
)
def trackml():
  train = train_op()

  serve = serve_op()
  serve.after(train)

  resultsgen = resultsgen_op()
  resultsgen.after(serve)

## Submit an experiment *run*

In [None]:
compiler.Compiler().compile(trackml, 'trackml.tar.gz')
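
The compile step writes a gzipped tarball. To check what was produced before submitting it, the archive can be listed with the standard library, as in the sketch below. The helper name `list_package_files` is hypothetical. The package is expected to hold the generated workflow definition as a YAML file, but the exact member name depends on the SDK version and is not assumed here.

```python
import tarfile

def list_package_files(package_path):
    """Return the names of the files stored in a compiled pipeline package."""
    with tarfile.open(package_path, "r:gz") as archive:
        return archive.getnames()
```

For example, `list_package_files('trackml.tar.gz')` returns the member names of the package compiled above.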

The call below submits the compiled pipeline package as a run named `trackml` in the experiment created earlier.

In [None]:
run = client.run_pipeline(exp.id, 'trackml', 'trackml.tar.gz')
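
`run_pipeline` returns as soon as the run is submitted. To block until it finishes, something like the sketch below may work, assuming the installed SDK version provides `client.wait_for_run_completion(run_id, timeout)` and that the returned object exposes the final state as `.run.status`. The helper name `wait_for_run` and the one-hour default timeout are illustrative choices, not part of the original notebook.

```python
def wait_for_run(client, run_id, timeout_seconds=3600):
    """Block until the run finishes and return its final status.

    Assumes `client` is a kfp.Client() that provides
    wait_for_run_completion(run_id, timeout), and that the returned
    object exposes the status as `.run.status`.
    """
    result = client.wait_for_run_completion(run_id, timeout_seconds)
    return result.run.status
```

In the notebook this would be called as `status = wait_for_run(client, run.id)`.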