
How to propagate mlpipeline-metrics from custom Python function TFX component? #3094

Open
axeltidemann opened this issue Jan 20, 2021 · 22 comments

@axeltidemann
Contributor

I want to export mlpipeline-metrics from my custom Python function TFX component so that it is displayed in the KubeFlow UI, as described here: https://www.kubeflow.org/docs/pipelines/sdk/pipelines-metrics/

This is a minimal example of what I am trying to do:

import json

from tfx.dsl.component.experimental.annotations import OutputArtifact
from tfx.dsl.component.experimental.decorators import component
from tfx.types.standard_artifacts import Artifact

class Metric(Artifact):
    TYPE_NAME = 'Metric'

@component
def ShowMetric(MLPipeline_Metrics: OutputArtifact[Metric]):

    rmse_eval = 333.33

    metrics = {
        'metrics':[
            {
                'name': 'RMSE-validation',
                'numberValue': rmse_eval,
                'format': 'RAW'
            }
        ]
    }

    path = '/tmp/mlpipeline-metrics.json'
    
    with open(path, 'w') as _file:
        json.dump(metrics, _file)

    MLPipeline_Metrics.uri = path

In the KubeFlow UI, the "Run output" tab says "No metrics found for this run." However, the output artifact does show up in ML Metadata (see screenshot). Any help on how to accomplish this would be greatly appreciated. Thanks!

(Screenshot, 2021-01-20: the Metric output artifact shown in ML Metadata)
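One thing worth ruling out before patching anything: the Kubeflow metrics docs linked above constrain the metric name with a regex that, to my reading, allows only lowercase names, which would reject RMSE-validation even if the file were collected. A small validator sketch (the name pattern and the RAW/PERCENTAGE format values are assumptions taken from my reading of those docs; verify them against your KFP version):

```python
import re

# Assumed name pattern from the Kubeflow pipelines-metrics docs linked above;
# verify against the docs for your KFP version.
METRIC_NAME_RE = re.compile(r'^[a-z]([-a-z0-9]{0,62}[a-z0-9])?$')


def validate_metrics(metrics: dict) -> list:
    """Return a list of problems found in an mlpipeline-metrics dict."""
    problems = []
    for m in metrics.get('metrics', []):
        name = m.get('name', '')
        if not METRIC_NAME_RE.match(name):
            problems.append('invalid metric name: %r' % name)
        if not isinstance(m.get('numberValue'), (int, float)):
            problems.append('numberValue must be numeric: %r' % m.get('numberValue'))
        if m.get('format') not in (None, 'RAW', 'PERCENTAGE'):
            problems.append('unknown format: %r' % m.get('format'))
    return problems
```

Under this pattern, the RMSE-validation name used in the snippet above would be flagged, while rmse-validation would pass.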

@axeltidemann axeltidemann changed the title How to propagate mlpipeline-metrics from custom Python function component? How to propagate mlpipeline-metrics from custom Python function TFX component? Jan 20, 2021
@arghyaganguly arghyaganguly self-assigned this Jan 20, 2021
@arghyaganguly
Contributor

@axeltidemann, this issue seems more relevant to Kubeflow than to TFX.
Please raise it in the Kubeflow issue tracker, and confirm if that seems okay to you.
Thanks.

@axeltidemann
Contributor Author

axeltidemann commented Jan 21, 2021

@arghyaganguly But I see mlpipeline-ui-metadata showing up automatically in the KubeFlow UI, and it also comes from TFX (see https://github.com/tensorflow/tfx/blob/e0cb043ff5d3a9fc33f20b1ce6348518e68352ff/tfx/orchestration/kubeflow/base_component.py). Given that TFX is built on top of KubeFlow and I am using a TFX custom component, it must be a TFX-relevant issue, no? How would the KubeFlow team know the answers to TFX-specific questions? (There could be some overlap, of course; I am happy to stand corrected.)

@axeltidemann
Contributor Author

Sorry to bother you @jiyongjung0, but I'd really appreciate your input when you have the time. Thanks.

@jiyongjung0

I'm sorry for the late response. I'm not very familiar with the Kubeflow side and was trying to find a better person to respond. @neuromage, could you give some help on this issue?

@axeltidemann
Contributor Author

It seems mlpipeline-metrics does not get propagated at all; if it were, it would have been added to the output_artifact_paths dictionary:

'mlpipeline-ui-metadata': '/mlpipeline-ui-metadata.json',

In addition, it should have been dealt with in the container entry point, like mlpipeline-ui-metadata is:

with open('/mlpipeline-ui-metadata.json', 'w') as f:

Is there a specific reason for this omission? Or maybe something for a pull request?

@axeltidemann
Contributor Author

I tried to make changes to the source code of TFX itself (following the instructions here), where I basically implemented the changes above, i.e.

output_artifact_paths={
    'mlpipeline-ui-metadata': '/mlpipeline-ui-metadata.json',
    'mlpipeline-metrics': '/mlpipeline-metrics.json'
}

in tfx/tfx/orchestration/kubeflow/base_component.py and also hardcoded metrics and file output in tfx/tfx/orchestration/kubeflow/container_entrypoint.py like so:

metrics = {
    'metrics': [
        {
            'name': 'RMSE-validation',
            'numberValue': 777.77,
            'format': 'RAW'
        }
    ]
}

with open('/mlpipeline-metrics.json', 'w') as _file:
    json.dump(metrics, _file)

This was still not picked up by the KubeFlow UI. I assume there are some deeper changes needed, then. Maybe @neuromage can shed some light on this?
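For what it's worth, the same output_artifact_paths change can be sketched without forking TFX: KubeflowDagRunnerConfig accepts pipeline_operator_funcs, which are applied to every compiled ContainerOp. The wiring below assumes the KFP 1.x SDK (where ContainerOp.output_artifact_paths is a mutable dict) and uses names from TFX's kubeflow_dag_runner module; treat it as a sketch to verify, not a confirmed fix.

```python
# Sketch: inject the metrics artifact path from kubeflow_runner.py instead of
# forking TFX. Assumes the KFP 1.x SDK, where ContainerOp.output_artifact_paths
# is a mutable dict -- verify against your kfp version.
def add_mlpipeline_metrics(container_op):
    container_op.output_artifact_paths['mlpipeline-metrics'] = (
        '/mlpipeline-metrics.json')
    return container_op

# Hypothetical wiring in kubeflow_runner.py (names from TFX's
# kubeflow_dag_runner module; verify they exist in your TFX version):
#
# config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
#     pipeline_operator_funcs=(
#         kubeflow_dag_runner.get_default_pipeline_operator_funcs()
#         + [add_mlpipeline_metrics]),
#     ...)
```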

@neuromage

Hi @axeltidemann, those changes look correct to me.

/cc @numerology and @chensun, any ideas why the above may not be working?

@numerology

Changing output_artifact_paths in base_component.py should suffice. If that is not picked up by the UI, then it seems like a bug to me.

May I ask which KFP version you are using (both SDK and deployment)?

@axeltidemann
Contributor Author

Good question. I don't specify which KFP version to use in deployment; I use the tfx CLI. It was my assumption that it creates a Docker image from my local installation and uploads it to eu.gcr.io, and would therefore use my local KFP version, but I can't figure out how to determine which KFP version is actually used on the cluster. Is there a way to find that out?

These are my local versions, in any case:

>python
Python 3.7.9 (default, Nov 20 2020, 18:45:38) 
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tfx
>>> tfx.__version__
'0.28.0.dev'
>>> import kfp
>>> kfp.__version__
'1.3.0'

@chensun

chensun commented Feb 5, 2021

but I can't figure out how to determine which KFP version is actually used on the cluster. Is there a way to find that out?

If you have access to KFP UI, it's shown in the bottom left corner, for instance:
(screenshot of the KFP UI showing the version in the bottom left corner)

or if you have kubectl connected to your cluster, you can describe any KFP pod, for example:
kubectl describe pod ml-pipeline-76fddff986-h7hsh -n kubeflow

and the container image label (1.2.0 shown in the below output) is the KFP backend version.

Containers:
  ml-pipeline-api-server:
    Container ID:   docker://a84dc475d6b6fb6e9dc58204e58e6c606498239f38fa12145e93953458bdd045
    Image:          gcr.io/ml-pipeline/api-server:1.2.0
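The same tag can be pulled out of the describe output programmatically; a small sketch that extracts it from the first Image: line (pod names and the kubeflow namespace are deployment-specific):

```python
def kfp_backend_version(describe_output: str) -> str:
    """Extract the image tag from the first 'Image:' line of
    `kubectl describe pod` output."""
    for line in describe_output.splitlines():
        line = line.strip()
        if line.startswith('Image:'):
            # e.g. 'Image:  gcr.io/ml-pipeline/api-server:1.2.0' -> '1.2.0'
            image = line.split(None, 1)[1].strip()
            return image.rsplit(':', 1)[-1]
    raise ValueError('no Image: line found')
```

Fed the sample output above, this returns '1.2.0'.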

@axeltidemann
Contributor Author

Thanks, @chensun. The version displayed is indeed 1.0.4, and the container image label is in the YAML file in the KubeFlow UI: ml-pipeline-api-server: gcr.io/cloud-marketplace/google-cloud-ai-platform/kubeflow-pipelines/apiserver:1.0.4.

However, could it be that local changes I make to TFX are not packaged and uploaded to the KubeFlow cluster in any case?

@axeltidemann
Contributor Author

@numerology I suppose I should create a separate Docker image with my changes to TFX, push that to Docker Hub, and make the tfx CLI use that image. I see when running

tfx pipeline update --pipeline-path=kubeflow_runner.py --endpoint=$ENDPOINT

that the tensorflow/tfx:0.25.0 image is used:

[truncated]
[Skaffold] #3 [internal] load metadata for docker.io/tensorflow/tfx:0.25.0
[Skaffold] #3 sha256:0de1d35ca0abce93f6f1d57543269f062bb56777e77abd8be41593a801cd2d61
[Skaffold] #3 DONE 2.8s
[Skaffold]
[Skaffold] #7 [1/3] FROM docker.io/tensorflow/tfx:0.25.0@sha256:0700c27c6492b8b2998e7d543ca13088db8d40ef26bd5c6eec58245ff8cdec35
[Skaffold] #7 sha256:8e5e2c00eb5ed31ca14860fd9aa40e783fe78ad12be31dc9da89ddad19876dc9
[Skaffold] #7 DONE 0.0s
[truncated]

However, I cannot figure out where to set which Docker image to use. I have even tried searching the repository for "load metadata for", but no results came up. Any ideas?

@numerology

@axeltidemann Indeed, in order to do that I believe you'll need to specify the base image when running the CLI command. For example:

tfx pipeline create --pipeline-path=kubeflow_runner.py --endpoint=$ENDPOINT --build_base_image your-docker-hub-repo/your-tfx-image --build_target_image your-docker-hub-repo/your-image-for-this-pipeline

Also please refer to the help message for --build_target_image option in https://github.com/tensorflow/tfx/blob/HEAD/tfx/tools/cli/commands/pipeline.py for advanced image building options.

@easadler

I wanted to mention how important getting Kubeflow metrics support into TFX is for my team. I'm curious whether this is still an issue with the Kubeflow v2 runner? I haven't been able to try it out.

@numerology

@easadler

The Kubeflow v2 runner is still being developed. Currently it only compiles TFX DSL objects into the KFP IR spec. The visualization story for the Kubeflow v2 runner is still being discussed.

/cc @neuromage

@axeltidemann
Contributor Author

axeltidemann commented Feb 24, 2021

@numerology I was able to create a custom build-base-image of TFX with the changes I referenced above:

  1. Pulled the TFX code from GitHub.
  2. Made the changes described above.
  3. Built the image by following these instructions (in essence ./tfx/tools/docker/build_docker_image.sh).
  4. Tagged the image eu.gcr.io/my-project/custom-tfx-image and pushed it to GCR.
  5. Specified both build and target image:
    tfx pipeline create --engine kubeflow --build-target-image eu.gcr.io/my-project/my-tfx-pipeline --build-base-image eu.gcr.io/my-project/custom-tfx-image --endpoint $ENDPOINT --pipeline-path kubeflow_runner.py
  6. Verified that the changes above are applied, because I can see
    [Skaffold] Step 1/4 : FROM eu.gcr.io/my-project/custom-tfx-image
    when creating the pipeline.
  7. As another verification step, I print out the hardcoded JSON file in container_entrypoint.py after writing it, so I am sure it is successfully written.
  8. But alas, no mlpipeline-metrics in the KubeFlow UI.

The KFP SDK version is 1.4, and the KubeFlow deployment is 1.0.4; could this be an issue? It was my understanding that KubeFlow Pipelines (1.4) and the deployment of KubeFlow on Kubernetes (1.0.4) are two different things, and that comparing the version numbers is meaningless (but please correct me if I am wrong). Do you have any other ideas?

@axeltidemann
Contributor Author

@neuromage maybe you have some ideas why the above approach does not work?

@axeltidemann
Contributor Author

@neuromage @numerology sorry to bother you again, but do you have any thoughts on this?

@ConverJens
Contributor

@neuromage @numerology @axeltidemann
What is the status of this? Is it possible nowadays to export metrics and custom metadata with TFX in KubeFlow?

@axeltidemann
Contributor Author

No progress from my side; when I have time I'd like to re-try the suggestions I outlined above, just to verify.

@github-actions

github-actions bot commented Apr 6, 2023

This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Apr 6, 2023
@axeltidemann
Contributor Author

I still haven't had the time, but I'd very much like to keep this issue open.
