Verify output artifacts as found from cache before making execution decision #5093
Thanks!!
Note that the code here needs similar updates too.
base_driver is the old path; newer orchestrators such as LocalDagRunner use the portable code above.
Thanks for the review, will make those changes in a bit!
@1025KB I'm looking at the differences between base_driver and cache_utils. I was hoping that base_driver could be refactored to use cache_utils (in the same way it is used by tfx.orchestration.portable.launcher), but there seem to be some significant differences, such as the way they compute cache keys from context and the steps in get_cached_context. Unless you tell me otherwise (i.e. that I should look further into refactoring them together), I'm just going to copy/paste my new logic into cache_utils.
Yep, it's a different code path at this point, so logic duplication sounds good. Thanks!!
@1025KB ready for re-review!
note that the logic differs between |
LGTM, thanks!
Thank you for the great contribution!
It seems that we always assume that artifact payloads are in file systems. Should we?
At this point that's still the case.
PiperOrigin-RevId: 466239336
If caching is enabled and matching artifacts are found in the cache for a component's outputs, verify the output artifacts before making an execution decision. This avoids a situation in which the artifacts are missing (no object exists at the URI associated with the artifact because of a deletion outside the pipeline) and the downstream component that consumes this component's outputs fails.
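For illustration only, here is a minimal sketch of the idea, not the code in this PR. `Artifact`, `cached_outputs_exist`, and `resolve_cached_outputs` are hypothetical names, and `os.path.exists` stands in for whatever filesystem check the real driver performs, so it assumes local URIs.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Artifact:
    """Hypothetical stand-in for a pipeline artifact; only the URI matters here."""
    uri: str


def cached_outputs_exist(outputs: Dict[str, List[Artifact]]) -> bool:
    """Return True only if every cached output artifact has a payload at its URI."""
    return all(
        os.path.exists(artifact.uri)  # assumption: URIs are local paths
        for artifacts in outputs.values()
        for artifact in artifacts
    )


def resolve_cached_outputs(
    cached: Optional[Dict[str, List[Artifact]]],
) -> Optional[Dict[str, List[Artifact]]]:
    """Return cached outputs if they are usable, else None to signal re-execution."""
    if cached is None:
        return None  # ordinary cache miss
    if not cached_outputs_exist(cached):
        return None  # cache hit, but a payload was deleted outside the pipeline
    return cached
```

Returning `None` for a stale hit makes the caller treat it exactly like a cache miss, so the component re-executes instead of handing a dangling URI to the downstream component.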
BaseDriver.verify_input_artifacts is the method used to verify the artifacts; use the same method when checking the cache to "fail fast" (avoid using the cache).
cc @1025KB to follow up on our discussion on Wednesday