Getting run_id from Pipeline class or InteractiveContext and using it in components #2773

Closed
calvinleungyk opened this issue Nov 6, 2020 · 5 comments

Comments

@calvinleungyk

calvinleungyk commented Nov 6, 2020

For a Pipeline class, the run_id is generated when BeamDagRunner().run() is called. For InteractiveContext, a new run_id is generated each time context.run() is called, and it is not exposed as an instance attribute.

Is there a recommended way to use run_id within the components in the pipeline? We would like to use it as an identifier for some Artifacts (e.g. embeddings) that we push to a DB.

@casassg
Member

casassg commented Nov 6, 2020

Specifically, we would like to access it from inside a function decorated with @component. That would let us push data to BigQuery (which the Driver doesn't know how to version) and version our outputs there as well.

Any alternative approaches?

Our specific use case: we generate embedding vectors regularly as part of one pipeline. One of our push destinations is BigQuery/DB, but we would like to avoid replacing existing embeddings so that users can still query older versions of the embedding vectors. We currently use datetime and a few hacks to generate a unique id so that they are not replaced, but that is not ideal.
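For context, the datetime-based workaround described above could look roughly like the following. This is only a minimal sketch: `make_version_id` is a hypothetical helper name, and the timestamp-plus-random-suffix format is an assumption, not something TFX provides.

```python
import uuid
from datetime import datetime, timezone


def make_version_id(now=None):
    """Return a sortable, practically unique version id (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    # The UTC timestamp keeps ids sortable by creation time; the random
    # suffix avoids collisions when two runs start within the same second.
    return f"{now.strftime('%Y%m%dT%H%M%SZ')}-{uuid.uuid4().hex[:8]}"
```

The drawback is that an id generated this way is unrelated to the orchestrator's own run_id, so the pushed rows cannot easily be traced back to a specific pipeline run.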

Our idea is that if we could read the run_id for the current run from inside the decorated function, we could use it to version the outputs ourselves, similar to apache/airflow#8058.
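To illustrate the kind of versioning we have in mind, here is a rough sketch of an append-only push keyed by run_id. It assumes the run_id is somehow available as a plain string, and it uses an in-memory SQLite database as a stand-in for BigQuery/DB. The function names and the table layout are hypothetical and not part of TFX.

```python
import sqlite3


def push_embeddings(conn, run_id, embeddings):
    """Append embeddings tagged with run_id; rows from older runs are kept."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "run_id TEXT, item_id TEXT, vector TEXT, "
        "PRIMARY KEY (run_id, item_id))"
    )
    # Store each vector as a comma-separated string to keep the sketch simple.
    rows = [
        (run_id, item_id, ",".join(str(x) for x in vector))
        for item_id, vector in embeddings.items()
    ]
    conn.executemany("INSERT INTO embeddings VALUES (?, ?, ?)", rows)
    conn.commit()


def load_embeddings(conn, run_id):
    """Return {item_id: vector} for the embeddings pushed by one run."""
    cursor = conn.execute(
        "SELECT item_id, vector FROM embeddings WHERE run_id = ?", (run_id,)
    )
    return {
        item_id: [float(x) for x in vector.split(",")]
        for item_id, vector in cursor
    }
```

Because run_id is part of the primary key, a later run adds new rows instead of replacing earlier ones, and an older version can still be queried by its run_id.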

@singhniraj08 singhniraj08 self-assigned this Feb 22, 2023
@singhniraj08
Contributor

@calvinleungyk,

When defining the pipeline's components, you can use the .with_id() method to give a component a custom name, which can then be used as an identifier for that component later.
Thank you!

Example:

from tfx import v1 as tfx

# Normal component definition (the component id defaults to the class name):
stats_gen = tfx.components.StatisticsGen(...)
# Same component with a custom id assigned via with_id():
stats_gen = tfx.components.StatisticsGen(...).with_id("raw_stats_gen")

@github-actions
Contributor

github-actions bot commented Apr 6, 2023

This issue has been marked stale because it has had no activity for 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Apr 6, 2023
@github-actions
Contributor

This issue was closed due to lack of activity after being marked stale for the past 7 days.
