# Using MLRun functions locally, as a Kubernetes Job, and in a Workflow
  --------------------------------------------------------------------

#### **notebook how-to's**
* Write and test code in a notebook.
* Convert it to a containerized image.
* Run it on a Kubernetes cluster with shared file or object storage.
* Run it in an automated workflow.

<a id='top'></a>
#### **steps**
**[install mlrun](#install)**<br>
**[define a new function and its dependencies](#define-function)**<br>
**[test the function code and pipeline locally](#test-locally)**<br>
**[define cluster jobs and build images](#build)**<br>
**[deploy (build) the function container](#deploy-build)**<br>
**[run the function on the cluster](#run-on-cluster)**<br>
**[create and run a KubeFlow Pipeline](#create-pipeline)**<br>

<a id="install" ></a>
______________________________________________
### **install mlrun**

In [1]:
# Uncomment the line below to install the mlrun package, then restart the kernel

# !pip install -U mlrun

In [2]:
# set the UI external URL (will generate ui hyperlinks)
# %env MLRUN_UI_URL=http://<mlrun-ui-url>:<port>

______________________________________________

<a id='define-function'></a>
### **define a new function and its dependencies**

In [1]:
# nuclio: ignore
# do not remove the comment above (it is a directive to nuclio, ignore that cell during build)
# if the nuclio-jupyter package is not installed run !pip install nuclio-jupyter and restart the kernel 
import nuclio 

We use `%nuclio` magic commands to set package dependencies and configuration:

In [2]:
%nuclio cmd -c pip install pandas
%nuclio config spec.build.baseImage = "mlrun/mlrun"

%nuclio: setting spec.build.baseImage to 'mlrun/mlrun'


The ```DataItem```s and the ```context``` within which they are logged are described in the following ```mlrun``` modules (they are included here only for type clarity).

In [3]:
from mlrun.execution import MLClientCtx
from mlrun.datastore import DataItem
from mlrun.artifacts import get_model, update_model

In [4]:
import time
import pandas as pd

def training(
    context: MLClientCtx,
    p1: int = 1,
    p2: int = 2
) -> None:
    """Train a model.

    :param context: The runtime context object.
    :param p1: A model parameter.
    :param p2: Another model parameter.
    """
    # access input metadata, values, and inputs
    print(f'Run: {context.name} (uid={context.uid})')
    print(f'Params: p1={p1}, p2={p2}')
    context.logger.info('started training')
    
    # <insert training code here>
    
    # log the run results (scalar values)
    context.log_result('accuracy', p1 * 2)
    context.log_result('loss', p1 * 3)
    
    # add a label/tag to this run
    context.set_label('category', 'tests')
    
    # log a simple artifact + label the artifact 
    # If you want to upload a local file to the artifact repo add src_path=<local-path>
    context.log_artifact('somefile', 
                          body=b'abc is 123', 
                          local_path='myfile.txt')
    
    # create a dataframe artifact 
    df = pd.DataFrame([{'A':10, 'B':100}, {'A':11,'B':110}, {'A':12,'B':120}])
    context.log_dataset('mydf', df=df)
    
    # Log an ML Model artifact, add metrics, params, and labels to it
    # and place it in a subdir ('models') under artifacts path 
    context.log_model('mymodel', body=b'abc is 123', 
                      model_file='model.txt', 
                      metrics={'accuracy':0.85}, parameters={'xx':'abc'},
                      labels={'framework': 'xgboost'},
                      artifact_path=context.artifact_subpath('models'))


In [5]:
def validation(
    context: MLClientCtx,
    model: DataItem
) -> None:
    """Model validation.
    
    Dummy validation function.
    
    :param context: The runtime context object.
    :param model: The estimated model object.
    """
    # access input metadata, values, files, and secrets (passwords)
    print(f'Run: {context.name} (uid={context.uid})')
    context.logger.info('started validation')
    
    # get the model file, class (metadata), and extra_data (dict of key: DataItem)
    model_file, model_obj, _ = get_model(model)

    # update model object elements and data
    update_model(model_obj, parameters={'one_more': 5})

    print(f'path to local copy of model file - {model_file}')
    print('parameters:', model_obj.parameters)
    print('metrics:', model_obj.metrics)
    context.log_artifact('validation', 
                         body=b'<b> validated </b>', 
                         format='html')

The following end-code annotation tells ```nuclio``` to stop parsing the notebook from this cell. _**Please do not remove this cell**_:

In [6]:
# nuclio: end-code

______________________________________________

<a id='test-locally'></a>
### **test the function code and pipeline locally**
The functions above can be tested locally. Parameters, inputs, and outputs can be specified in the API or the `Task` object.

We create a ```function``` which defines the runtime environment (type, code, image, etc.) and ```run()``` a job or experiment using that function.

We use the ```local``` runtime by default. Later on we will use a ```job``` runtime for running containers; other distributed runtimes such as MpiJob, Spark, Dask, and Nuclio can also be used.

In each run we can specify the function, inputs, parameters/hyper-parameters, etc... For more details, see the [mlrun_basics notebook](mlrun_basics.ipynb).
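Before involving MLRun at all, the handler logic can be exercised with a stand-in context. The sketch below is only an illustration: `FakeContext` and `toy_training` are hypothetical names that are not part of `mlrun`, and the stub mimics only the `log_result` and `set_label` calls used by the `training` handler above.

```python
# Hypothetical stand-in for the MLRun context; NOT part of mlrun.
class FakeContext:
    """Records the calls a handler makes so they can be inspected in a test."""

    def __init__(self, name="unit-test"):
        self.name = name
        self.results = {}
        self.labels = {}

    def log_result(self, key, value):
        # keep scalar results in a plain dict
        self.results[key] = value

    def set_label(self, key, value):
        # keep run labels in a plain dict
        self.labels[key] = value


def toy_training(context, p1=1, p2=2):
    """Reduced copy of the `training` handler: same results, no artifacts."""
    context.log_result("accuracy", p1 * 2)
    context.log_result("loss", p1 * 3)
    context.set_label("category", "tests")


ctx = FakeContext()
toy_training(ctx, p1=5)
print(ctx.results)  # {'accuracy': 10, 'loss': 15}
print(ctx.labels)   # {'category': 'tests'}
```

A stub like this checks only the arithmetic and the logging calls; artifacts, datasets, and models still need a real context, which is what `run_local()` provides in the cells that follow.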

In [7]:
from mlrun import run_local, code_to_function, mlconf, NewTask
from mlrun.platforms.other import auto_mount
mlconf.dbpath = mlconf.dbpath or 'http://mlrun-api:8080'

<b> define the artifact location</b>

In [8]:
from os import path
out = mlconf.artifact_path or path.abspath('./data')
# {{run.uid}} will be substituted with the run id, so output will be written to a different directory per run
artifact_path = path.join(out, '{{run.uid}}')
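To make the effect of the `{{run.uid}}` placeholder concrete, here is a minimal sketch of the substitution in plain Python. `expand_artifact_path` is a hypothetical helper written for this illustration; it is not how `mlrun` implements the substitution, and the `/tmp/data` base path is an assumed example value.

```python
import uuid
from os import path


def expand_artifact_path(template: str, run_uid: str) -> str:
    """Hypothetical helper: replace the {{run.uid}} placeholder with a run id."""
    return template.replace("{{run.uid}}", run_uid)


# assumed example base directory, joined with the placeholder as in the cell above
template = path.join("/tmp/data", "{{run.uid}}")

# two runs with different ids end up in two different directories
first = expand_artifact_path(template, uuid.uuid4().hex)
second = expand_artifact_path(template, uuid.uuid4().hex)
print(first)
print(second)
```

Because every run id is different, the two printed paths differ, so the outputs of one run do not overwrite those of another.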

#### _running and linking multiple tasks_
In this example we run two functions, ```training``` and ```validation``` and we pass the result from one to the other.
We will see in the ```job``` example that linking works even when the tasks are run in a workflow on different processes or containers.

```run_local()``` will run our task on a local function:

Run the training function. Functions can have multiple handlers/methods, here we call the ```training``` handler:

In [9]:
train_run = run_local(NewTask(handler=training, params={'p1': 5}, artifact_path=out))

> 2020-07-28 19:08:17,721 [info] starting run mlrun-20fb64-training uid=b5c028b6a3ad4aa188ffd39a8dd6b4f9  -> http://mlrun-api:8080
Run: mlrun-20fb64-training (uid=b5c028b6a3ad4aa188ffd39a8dd6b4f9)
Params: p1=5, p2=2
> 2020-07-28 19:08:17,790 [info] started training


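| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...8dd6b4f9 | 0 | Jul 28 19:08:17 | completed | mlrun-20fb64-training | v3io_user=admin<br>kind=handler<br>owner=admin<br>host=jupyter-7cd6869d6d-kqrtd<br>category=tests | | p1=5 | accuracy=10<br>loss=15 | somefile<br>mydf<br>mymodel |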


to track results use .show() or .logs() or in CLI: 
!mlrun get run b5c028b6a3ad4aa188ffd39a8dd6b4f9 --project default , !mlrun logs b5c028b6a3ad4aa188ffd39a8dd6b4f9 --project default
> 2020-07-28 19:08:18,032 [info] run executed, status=completed


After the function runs, it generates the result widget. You can click the `mymodel` artifact to see its content.

In [10]:
train_run.outputs

{'accuracy': 10,
 'loss': 15,
 'somefile': '/User/workshop/mlrun/basics/data/myfile.txt',
 'mydf': 'store://default/mlrun-20fb64-training_mydf#b5c028b6a3ad4aa188ffd39a8dd6b4f9',
 'mymodel': 'store://default/mlrun-20fb64-training_mymodel#b5c028b6a3ad4aa188ffd39a8dd6b4f9'}
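The dataset and model outputs are returned as `store://` references rather than file paths. The sketch below splits one of them with the standard library to show its visible parts. `split_store_uri` and the field names `project`, `key`, and `ref` are hypothetical, chosen by reading the shape of the output above; this is not the `mlrun` parser.

```python
from urllib.parse import urlparse


def split_store_uri(uri: str) -> dict:
    """Hypothetical helper: split a store:// URI into its visible parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "store":
        raise ValueError(f"not a store URI: {uri}")
    return {
        "project": parsed.netloc,        # text between '//' and the next '/'
        "key": parsed.path.lstrip("/"),  # text up to the '#'
        "ref": parsed.fragment,          # text after the '#'
    }


parts = split_store_uri(
    "store://default/mlrun-20fb64-training_mymodel#b5c028b6a3ad4aa188ffd39a8dd6b4f9"
)
print(parts["project"])  # default
print(parts["key"])      # mlrun-20fb64-training_mymodel
print(parts["ref"])      # b5c028b6a3ad4aa188ffd39a8dd6b4f9
```

In this output the part after `#` matches the uid of the training run, which is how the reference ties the artifact to the run that produced it.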

The output from the training function is passed to the validation function. Let's run it:

In [11]:
model = train_run.outputs['mymodel']

validation_run = run_local(NewTask(handler=validation, inputs={'model': model}, artifact_path=out))

> 2020-07-28 19:08:20,067 [info] starting run mlrun-c1cc3c-validation uid=273dd1aea72d4ad0b2f1add49e800300  -> http://mlrun-api:8080
Run: mlrun-c1cc3c-validation (uid=273dd1aea72d4ad0b2f1add49e800300)
> 2020-07-28 19:08:20,164 [info] started validation
path to local copy of model file - /User/workshop/mlrun/basics/data/models/model.txt
parameters: {'xx': 'abc', 'one_more': 5}
metrics: {'accuracy': 0.85}


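| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...9e800300 | 0 | Jul 28 19:08:20 | completed | mlrun-c1cc3c-validation | v3io_user=admin<br>kind=handler<br>owner=admin<br>host=jupyter-7cd6869d6d-kqrtd | model | | | validation |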


to track results use .show() or .logs() or in CLI: 
!mlrun get run 273dd1aea72d4ad0b2f1add49e800300 --project default , !mlrun logs 273dd1aea72d4ad0b2f1add49e800300 --project default
> 2020-07-28 19:08:20,313 [info] run executed, status=completed


______________________________________________

<a id="build"></a>
### **define cluster jobs and build images**

In order to use our function in a cluster we need to package our code and dependencies.

The ```code_to_function``` call will automatically generate a ```function``` object from the current notebook (or a specified file) with its list of dependencies and runtime configuration.

In [12]:
# create an ML function from the notebook; storage is attached in the following cells
trainer = code_to_function(name='my-trainer', kind='job')

The functions need shared storage (file or object) media to pass and store artifacts.

You can add _**Kubernetes**_ resources like volumes, environment variables, secrets, cpu/mem/gpu, etc. to a function.

```mlrun``` uses _**KubeFlow**_ modifiers (apply) to configure resources. You can build your own or use predefined ones, e.g. for [AWS resources](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/aws.py).


##### _**Option 1: Using file volumes for artifacts**_
If you are using the [Iguazio data science platform](https://www.iguazio.com/), use the `mount_v3io()` auto-mount modifier.<br>
If you use other k8s PVC volumes, you can use the `mlrun.platforms.mount_pvc(..)` modifier with the required params.

We will use the `auto_mount()` modifier, which selects automatically between these options. You can set the PVC volume configuration through an environment variable:
```
    MLRUN_PVC_MOUNT=<pvc-name>:<mount-path>
```
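As a rough illustration of the `<pvc-name>:<mount-path>` format, the sketch below sets the variable from Python and splits it into its two fields. `parse_pvc_mount` is a hypothetical helper written for this example, not the code `auto_mount()` uses, and the `nfsvol` and `/home/jovyan/data` values are taken from the commented-out `mount_pvc()` example later in this notebook.

```python
import os


def parse_pvc_mount(value: str) -> tuple:
    """Hypothetical helper: split '<pvc-name>:<mount-path>' into its two fields."""
    pvc_name, sep, mount_path = value.partition(":")
    if not sep or not pvc_name or not mount_path:
        raise ValueError("expected '<pvc-name>:<mount-path>'")
    return pvc_name, mount_path


# example values only; replace them with your own PVC name and mount path
os.environ["MLRUN_PVC_MOUNT"] = "nfsvol:/home/jovyan/data"

pvc_name, mount_path = parse_pvc_mount(os.environ["MLRUN_PVC_MOUNT"])
print(pvc_name)    # nfsvol
print(mount_path)  # /home/jovyan/data
```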

Applying ```mount_v3io()``` will attach the function to Iguazio's real-time data fabric (mounted by default to _**home**_ of the current user).

**Note**: if the notebook is not on the managed platform (running remotely) you need to create and use a v3io secret, run:

`kubectl create -n <namespace> secret generic my-v3io --from-literal=accessKey=<your access key> --from-literal=username=<your user name> --type v3io/fuse`

and use: `trainer.apply(mount_v3io(user='admin', secret='my-v3io'))`.

So for our current ```training``` function, when using Iguazio data science platform run:

In [13]:
# for auto choice between Iguazio platform and k8s PVC
# should set the env var for PVC: MLRUN_PVC_MOUNT=<pvc-name>:<mount-path>, or use mount_pvc() 
trainer.apply(auto_mount())

<mlrun.runtimes.kubejob.KubejobRuntime at 0x7f99caf95240>

In [14]:
# Uncomment for use with shared PVC volumes, e.g. using the NFS Share in the local k8s install
# from mlrun.platforms import mount_pvc
# trainer.apply(mount_pvc('nfsvol', 'nfsvol', '/home/jovyan/data'))

##### _**Option 2: Using AWS S3 for artifacts**_

In AWS you can use S3 and need to have a `secret` with AWS credentials. An AWS secret can be created with the following command line:

`kubectl create -n <namespace> secret generic my-aws --from-literal=AWS_ACCESS_KEY_ID=<access key> --from-literal=AWS_SECRET_ACCESS_KEY=<secret key>`

To use the secret:

In [15]:
# from kfp.aws import use_aws_secret

In [16]:
# trainer.apply(use_aws_secret(secret_name='my-aws'))
# out = 's3://<your-bucket-name>/jobs/{{run.uid}}'

______________________________________________

<a id="deploy-build"></a>
### **deploy (build) the function container**

The `deploy()` command will build a custom container image (create a cluster build job) from the outlined function dependencies.

If a pre-built container image already exists, pass the `image` name instead. _**Note that the code and params can be updated per run without building a new image**_.

The image is stored in a container repository. By default, the repository configured on the MLRun API service is used. To use your own docker registry, first create a secret, then add that secret name to the build configuration:

`kubectl create -n <namespace> secret docker-registry my-docker --docker-server=https://index.docker.io/v1/ --docker-username=<your-user> --docker-password=<your-password> --docker-email=<your-email>`

and run this: `trainer.build_config(image='target/image:tag', secret='my-docker')`

In [17]:
trainer.deploy()

> 2020-07-28 19:08:38,188 [info] starting remote build, image: .mlrun/func-default-my-trainer-latest
INFO[0000] Resolved base name mlrun/mlrun:0.5.1-rc1 to mlrun/mlrun:0.5.1-rc1
INFO[0000] Resolved base name mlrun/mlrun:0.5.1-rc1 to mlrun/mlrun:0.5.1-rc1
INFO[0000] Retrieving image manifest mlrun/mlrun:0.5.1-rc1
INFO[0000] Retrieving image manifest mlrun/mlrun:0.5.1-rc1
INFO[0000] Built cross stage deps: map[]
INFO[0000] Retrieving image manifest mlrun/mlrun:0.5.1-rc1
INFO[0000] Retrieving image manifest mlrun/mlrun:0.5.1-rc1
INFO[0000] Unpacking rootfs as cmd RUN pip install pandas requires it.
INFO[0023] Taking snapshot of full filesystem...
INFO[0132] Resolving paths
INFO[0137] RUN pip install pandas
INFO[0137] cmd: /bin/sh
INFO[0137] args: [-c pip inst

True

<a id="run-on-cluster"></a>
### **run the function on the cluster**


In case we made changes to the code, ```with_code``` will inject the latest code into the function (it doesn't require a new build).

In [18]:
trainer.with_code()

<mlrun.runtimes.kubejob.KubejobRuntime at 0x7f99caf95240>

In [19]:
# create the base task (common to both steps), and set the output path and experiment label
base_task = NewTask(artifact_path=out).set_label('stage', 'dev')

In [20]:
# run our training task on the cluster with a single parameter value (p1=9)
train_task = NewTask(name='my-training', handler='training', params={'p1': 9}, base=base_task)
train_run = trainer.run(train_task)

> 2020-07-28 19:11:22,455 [info] starting run my-training uid=1c6513fdc0b244a69aac07c9e52c73aa  -> http://mlrun-api:8080
> 2020-07-28 19:11:22,613 [info] Job is running in the background, pod: my-training-nhzwt
Run: my-training (uid=1c6513fdc0b244a69aac07c9e52c73aa)
Params: p1=9, p2=2
> 2020-07-28 19:11:34,240 [info] started training
> 2020-07-28 19:11:34,450 [info] run executed, status=completed
final state: succeeded


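| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...e52c73aa | 0 | Jul 28 19:11:34 | completed | my-training | stage=dev<br>v3io_user=admin<br>kind=job<br>owner=admin<br>host=my-training-nhzwt<br>category=tests | | p1=9 | accuracy=18<br>loss=27 | somefile<br>mydf<br>mymodel |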


to track results use .show() or .logs() or in CLI: 
!mlrun get run 1c6513fdc0b244a69aac07c9e52c73aa --project default , !mlrun logs 1c6513fdc0b244a69aac07c9e52c73aa --project default
> 2020-07-28 19:11:41,809 [info] run executed, status=completed
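The run above uses a single value of `p1`. Since the `training` handler logs `accuracy` as `p1 * 2`, the idea of trying several values and keeping the best-scoring one can be sketched in plain Python. This is an illustration only: `toy_accuracy` and the candidate list are hypothetical, and the loop is not how `mlrun` schedules or selects hyperparameter runs.

```python
def toy_accuracy(p1: int) -> int:
    """Same formula the `training` handler logs as 'accuracy'."""
    return p1 * 2


# hypothetical candidate values for the p1 parameter
candidates = [2, 5, 9]

# evaluate every candidate, then keep the one with the highest accuracy
results = [{"p1": p1, "accuracy": toy_accuracy(p1)} for p1 in candidates]
best = max(results, key=lambda r: r["accuracy"])
print(best)  # {'p1': 9, 'accuracy': 18}
```

The value found for `p1=9` matches the `accuracy=18` result reported by the cluster run above.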


In [21]:
# run validation, using the model result from the previous step
model = train_run.outputs['mymodel']
trainer.run(base_task, handler='validation', inputs={'model': model}, watch=True)

> 2020-07-28 19:11:41,814 [info] starting run my-trainer-validation uid=e0201392626549d3b76b07d471a45468  -> http://mlrun-api:8080
> 2020-07-28 19:11:42,009 [info] Job is running in the background, pod: my-trainer-validation-cm87x
Run: my-trainer-validation (uid=e0201392626549d3b76b07d471a45468)
> 2020-07-28 19:11:46,670 [info] started validation
path to local copy of model file - /User/workshop/mlrun/basics/data/models/model.txt
parameters: {'xx': 'abc', 'one_more': 5}
metrics: {'accuracy': 0.85}
> 2020-07-28 19:11:46,807 [info] run executed, status=completed
final state: succeeded


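| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...71a45468 | 0 | Jul 28 19:11:46 | completed | my-trainer-validation | stage=dev<br>v3io_user=admin<br>kind=job<br>owner=admin<br>host=my-trainer-validation-cm87x | model | | | validation |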


to track results use .show() or .logs() or in CLI: 
!mlrun get run e0201392626549d3b76b07d471a45468 --project default , !mlrun logs e0201392626549d3b76b07d471a45468 --project default
> 2020-07-28 19:11:51,161 [info] run executed, status=completed


<mlrun.model.RunObject at 0x7f99caf87710>

______________________________________________

<a id="create-pipeline"></a>
### **create and run a KubeFlow pipeline**

KubeFlow pipelines are used for workflow automation. We compose a graph of functions and specify parameters, inputs, and outputs.

As illustrated below, we can chain the outputs and inputs of the pipeline steps.

In [45]:
import kfp
from kfp import dsl
from mlrun import run_pipeline

In [46]:
@dsl.pipeline(
    name = 'job test',
    description = 'demonstrating mlrun usage'
)
def job_pipeline(
   p1: int = 9
) -> None:
    """Define our pipeline.
    
    :param p1: A model parameter.
    """

    train = trainer.as_step(handler='training',
                            params={'p1': p1},
                            outputs=['mymodel'])
    
    validate = trainer.as_step(handler='validation',
                               inputs={'model': train.outputs['mymodel']},
                               outputs=['validation'])
    

The job pipeline can be compiled to a YAML file that can be used for debugging:

In [47]:
kfp.compiler.Compiler().compile(job_pipeline, 'jobpipe.yaml')

#### running the pipeline

Pipeline results are stored at the `artifact_path` location.

By adding ```/{{workflow.uid}}``` to the path, ```mlrun``` will generate a unique folder per workflow.

In [48]:
artifact_path = 'v3io:///users/admin/kfp/{{workflow.uid}}/'

In [49]:
arguments = {'p1': 8}
run_id = run_pipeline(job_pipeline, arguments, experiment='my-job', artifact_path=artifact_path)

[mlrun] 2020-06-08 21:21:01,934 Pipeline run id=a9243134-039a-4e38-9de3-dbcb90b54864, check UI or DB for progress


[top](#top)

In [51]:
from mlrun import wait_for_pipeline_completion, get_run_db
wait_for_pipeline_completion(run_id)
db = get_run_db().connect()
db.list_runs(project='default', labels=f'workflow={run_id}').show()

| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...06335fc8 | 0 | Jun 08 21:21:19 | completed | my-trainer-validation | v3io_user=admin<br>owner=admin<br>workflow=a9243134-039a-4e38-9de3-dbcb90b54864<br>kind=job<br>host=my-trainer-validation-gtbdf | model | | | validation |
| default | ...0121511c | 0 | Jun 08 21:21:09 | completed | my-trainer-training | v3io_user=admin<br>owner=admin<br>workflow=a9243134-039a-4e38-9de3-dbcb90b54864<br>kind=job<br>host=my-trainer-training-fr5fj<br>category=tests | | p1=8 | accuracy=16<br>loss=24 | somefile<br>mydf<br>mymodel |
