## Toy notebook for Kubeflow pipeline V1 on KF 1.7.0 manifests deployement

* Python function based componet https://www.kubeflow.org/docs/components/pipelines/v1/sdk/python-function-components/
* Build pipeline v1 https://www.kubeflow.org/docs/components/pipelines/v1/sdk/build-pipeline/


The default Kubeflow 1.7.0 Notebooks shall use the following KF SDKs libs

```console
kfp                      1.8.22
kfp-pipeline-spec        0.1.16
kfp-server-api           1.8.5
```

In [1]:
import sys

In [2]:
from platform import python_version
print (f"current platform python version: {python_version()}")

current platform python version: 3.8.10


In [58]:
# safely remove old kfp sdk dependency just in case update doesn't work
!{sys.executable} -m pip uninstall -y kfp-server-api kfp-kubernetes kfp-pipeline-spec

Found existing installation: kfp-server-api 1.8.5
Uninstalling kfp-server-api-1.8.5:
  Successfully uninstalled kfp-server-api-1.8.5
[0mFound existing installation: kfp-pipeline-spec 0.1.16
Uninstalling kfp-pipeline-spec-0.1.16:
  Successfully uninstalled kfp-pipeline-spec-0.1.16


In [59]:
# KF 1.7.0 uses kfp==1.8.22 kfp-server-api==1.8.5 kfp-pipeline-spec==0.1.16
# !{sys.executable} -m pip install --upgrade --user kfp==1.8.22 kfp-server-api==1.8.5 kfp-pipeline-spec==0.1.16
!{sys.executable} -m pip install --upgrade --user kfp==1.8.22



In [5]:
!{sys.executable} -m pip list | grep kfp

kfp                      1.8.22
kfp-pipeline-spec        0.1.16
kfp-server-api           1.8.5


## (Optional) Restart kernel

Restart kernel, after update the KFP SDK

In [6]:
# run kubectl command line to see the quota in the name space
# !kubectl describe quota

### Define ContinerOp Res Helper Function

To define a `set_res_limit` helper function to set the ContainerOp resource for the python function-component in a Quota activated kubernetes namespace

* Difference between 2Gi and 2G 
https://stackoverflow.com/questions/50804915/kubernetes-size-definitions-whats-the-difference-of-gi-and-g/50805048#50805048

In [45]:
from kfp.dsl import ContainerOp

def set_res_limit(op: ContainerOp, mem_req="200Mi", cpu_req="2000m", mem_lim="4000Mi", cpu_lim='4000m'):
    """
    this function helps to set the resource limit for container operators
    op.set_memory_limit('1000Mi') = 1GB
    op.set_cpu_limit('1000m') = 1 cpu core
    """
    return op.set_memory_request(mem_req)\
            .set_memory_limit(mem_lim)\
            .set_cpu_request(cpu_req)\
            .set_cpu_limit(cpu_lim)

### Create a "compiled" folder to save the .yaml

In [46]:
import os

# change this folder path, if necessary
pipeline_path_dir = "/home/jovyan/kf-examples/sdkV1/compiled"

if not os.path.exists(pipeline_path_dir):
    os.makedirs(pipeline_path_dir)

## Building Python function-based components

Reference:
* python function-based components https://www.kubeflow.org/docs/components/pipelines/v1/sdk/python-function-components/

In [47]:
from functools import partial
from kfp.components import create_component_from_func

'''
The kfp.components.create_component_from_func() create a ContainerOp obj, which is used in pipeline
'''
@partial(
    create_component_from_func,
    output_component_file=f"{pipeline_path_dir}/square_component.yaml",
    base_image='python:3.10.11', # a base image with python installed
    # packages_to_install=[
    #     "scipy==1.10.1",
    # ],
    # adding additional libs
)
def square(x: float) -> float:
    # need to cast to float otherwise will run into an error
    # return float(x ** 2)
    return x ** 2

@partial(
    create_component_from_func,
    output_component_file=f"{pipeline_path_dir}/add_component.yaml",
    base_image='python:3.10.11', # a base image with python installed
    # packages_to_install=[
    #     "scipy==1.10.1",
    # ],
    # adding additional libs
)
def add(x: float, z: float) -> float:
    # for some strange reason, the y param name is changed to "true", if i rename to z, it works
    # return float(x + z)
    return x + z

@partial(
    create_component_from_func,
    output_component_file=f"{pipeline_path_dir}/square_root_component.yaml",
    base_image='python:3.10.11', # a base image with python installed
    # packages_to_install=[
    #     "scipy==1.10.1",
    # ],
    # adding additional libs
)
def square_root(x: float) -> float:
    # return float(x ** .5)
    return x ** .5

## Define Pipeline
* Intro Kubeflow pipeline: https://v1-5-branch.kubeflow.org/docs/components/pipelines/introduction/
* Kubeflow pipeline SDK v1: https://v1-5-branch.kubeflow.org/docs/components/pipelines/sdk/sdk-overview/

In [49]:
from kfp.dsl import pipeline
from kfp import dsl

EXPERIMENT_NAME = "demo"
EXPERIMENT_DESC = 'toy pythagorean pipeline'

@pipeline(
    name = EXPERIMENT_NAME,
    description = EXPERIMENT_DESC
)
def pythagorean(a: float, b: float):
    '''local variable'''
    no_artifact_cache = "P0D"
    artifact_cache_today = "P1D"
    # cache_setting = artifact_cache_today
    cache_setting = no_artifact_cache
    
    a_sq_task = square(a)
    # if the limit is set to small, will be OOMKilled by kubernetes
    a_sq_task = set_res_limit(a_sq_task, cpu_req='500m', mem_req='200M')
    a_sq_task.execution_options.caching_strategy.max_cache_staleness = cache_setting
    
    b_sq_task = square(x=b)
    b_sq_task = set_res_limit(b_sq_task, cpu_req='500m', mem_req='200M')
    b_sq_task.execution_options.caching_strategy.max_cache_staleness = cache_setting

    sum_task = add(x=a_sq_task.output, z=b_sq_task.output)
    sum_task = set_res_limit(sum_task, cpu_req='1', mem_req='500M')
    sum_task.execution_options.caching_strategy.max_cache_staleness = cache_setting

    result_task = square_root(sum_task.output)
    result_task = set_res_limit(result_task, cpu_req='1', mem_req='500M')
    result_task.execution_options.caching_strategy.max_cache_staleness = cache_setting

my_pipeline = pythagorean

### (optional) pipeline compile step
use the following command to compile the pipeline to serialized yaml representation

In [50]:
from kfp import compiler

PIPE_LINE_FILE_NAME=f"{pipeline_path_dir}/pythagorean"
compiler.Compiler().compile(my_pipeline, f"{PIPE_LINE_FILE_NAME}.yaml")

### Create Experiment Run

create run label with current data time
```python
from datetime import datetime
from pytz import timezone as ptimezone
ts = datetime.strftime(datetime.now(ptimezone("Europe/Berlin")), "%Y-%m-%d %H-%M-%S")
print(ts)
```

Reference:
* https://stackoverflow.com/questions/25837452/python-get-current-time-in-right-timezone/25887393#25887393

In [51]:
from datetime import datetime
from pytz import timezone as ptimezone

def get_local_time_str(target_tz_str: str = "Europe/Berlin", format_str: str = "%Y-%m-%d %H-%M-%S") -> str:
    """
    this method is created since the local timezone is miss configured on the server
    @param: target timezone str default "Europe/Berlin"
    @param: "%Y-%m-%d %H-%M-%S" returns 2022-07-07 12-08-45
    """
    target_tz = ptimezone(target_tz_str) # create timezone, in python3.9 use standard lib ZoneInfo
    # utc_dt = datetime.now(datetime.timezone.utc)
    target_dt = datetime.now(target_tz)
    return datetime.strftime(target_dt, format_str)

### Config pipeline run
* Setting imagePullSecretes for Pipeline with SDK: https://github.com/kubeflow/pipelines/issues/5843#issuecomment-859799181

In [52]:
# from kubernetes import client as k8s_client
pipeline_config = dsl.PipelineConf()

# pipeline_config.set_image_pull_secrets([k8s_client.V1ObjectReference(name=K8_GIT_SECRET_NAME, namespace=NAME_SPACE)])
# pipeline_config.set_image_pull_policy("Always")
pipeline_config.set_image_pull_policy("IfNotPresent")

pipeline_args = {"a": 1, "b": 2},

In [53]:
import kfp

EXPERIMENT_NAME = "demo"

client = kfp.Client()

NAMESPACE = client.get_user_namespace()
print(NAMESPACE)

kubeflow-kindfor


In [54]:
RUN_NAME = f"pythagorean {get_local_time_str()}"
print(EXPERIMENT_NAME)
print(NAMESPACE)

# client = kfp.Client()
run = client.create_run_from_pipeline_func(
    pipeline_func=my_pipeline,
    arguments = pipeline_args,
    run_name = RUN_NAME,
    pipeline_conf=pipeline_config,
    experiment_name=EXPERIMENT_NAME,
    namespace=NAMESPACE,
)

demo
kubeflow-kindfor


AttributeError: 'tuple' object has no attribute 'items'