Skip to content

Conversation

micahjsmith
Copy link
Contributor

If cache is enabled, and if matching artifacts are found in the cache for a component's outputs, verify the output artifacts before making an execution decision. This avoids a situation in which the artifacts are missing (no object exists at the URI associated with the artifact due to a deletion process outside of the pipeline) and the downstream component that uses this component's outputs fails.

  • since the downstream consumer of the cached artifacts will use BaseDriver.verify_input_artifacts to verify the artifacts, use the same method when checking the cache to "fail fast" (avoid using the cache)
  • note that MLMD ArtifactState is not used at all in the BaseDriver logic. perhaps in a follow-up PR, we can add (1) checks when getting artifacts from cache that the objects are marked LIVE/PUBLISHED and (2) for artifacts from cache that are deemed to be missing given the logic in the current PR, update their state in MLMD to MISSING

cc @1025KB to follow up on our discussion on Wednesday

@gbaned gbaned self-assigned this Jul 29, 2022
@gbaned gbaned requested a review from rthadur July 29, 2022 14:40
@rthadur rthadur requested a review from 1025KB July 29, 2022 14:45
Copy link
Collaborator

@1025KB 1025KB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!

Copy link
Collaborator

@1025KB 1025KB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that here needs similar updates too.

base_driver is the old path, new orchestrator like LocalDagRunner uses above portable code

@micahjsmith
Copy link
Contributor Author

Note that here needs similar updates too.

base_driver is the old path, new orchestrator like LocalDagRunner uses above portable code

thanks for the review, will make those changes in a bit!

@micahjsmith
Copy link
Contributor Author

@1025KB I'm looking at the differences between base_driver and cache_utils. I was hoping that base_driver could be refactored to use cache_utils (in the same way it is used by tfx.orchestration.portable.launcher), but it seems there are some significant differences, such as the way they compute cache keys from context and the steps in get_cached_context. unless you tell me otherwise (i.e. that I should look further into refactoring them together), I'm just going to copy/paste my new logic into cache_utils

@1025KB
Copy link
Collaborator

1025KB commented Aug 3, 2022

yep, it's different code path at this point, so logic duplication sounds good. Thanks!!

@micahjsmith
Copy link
Contributor Author

@1025KB ready for re-review!

  • refactored the logic of checking whether artifacts exist into artifact_utils.verify_artifacts
  • use artifact_utils.verify_artifacts in three locations: BaseDriver.verify_input_artifacts, BaseDriver.pre_execution, and cache_utils.get_cached_outputs
  • add tests
  • fix all extant lint issues in touched files, sorry for the large diff as a result!

note that the logic differs between cache_utils.get_cached_outputs and BaseDriver.pre_execution/metadata.get_cached_outputs. in the former case, all the matching executions are checked for valid artifacts. in the latter case, only the most recent matching execution is checked for valid artifacts.

@1025KB 1025KB requested a review from jiyongjung0 August 4, 2022 16:56
@1025KB 1025KB requested a review from rcrowe-google August 4, 2022 22:26
Copy link
Contributor

@rcrowe-google rcrowe-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link

@jiyongjung0 jiyongjung0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great contribution!

@rcrowe-google
Copy link
Contributor

It seems that we always assume that artifact payloads are in file systems. Should we?

@1025KB
Copy link
Collaborator

1025KB commented Aug 5, 2022

It seems that we always assume that artifact payloads are in file systems. Should we?

at this point it's still the case

copybara-service bot pushed a commit that referenced this pull request Aug 10, 2022
copybara-service bot pushed a commit that referenced this pull request Aug 10, 2022
@copybara-service copybara-service bot merged commit 4c84135 into tensorflow:master Aug 10, 2022
@micahjsmith micahjsmith deleted the check-cache-hits branch August 11, 2022 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants