
Extract checksums to a common state file #2940

@dmpetrov

Description


Currently, checksums are scattered across DVC-files. This was a deliberate design decision: it simplified Git merges for ML experiments, because changes to a single data file or DVC stage stayed localized in one DVC-file. However, we have learned that in many cases the -X theirs strategy is the best way to bring ML experiments into another branch without manual merging, so this is a good time to revisit that decision.

There are three issues with keeping checksums in many DVC-files:

  1. It makes DVC-files hard for users to read.
  2. DVC (a tool) has to modify these files, which is not the best practice.
  3. Automation tools (such as CD4ML pipelines) usually cannot make a Git commit after dvc repro, so the changes in the repo (the modified DVC-files) have to be copied somewhere else (e.g. GitLab artifacts). It would be more convenient for such tools to have all the changes in a single file.

To solve the issues above, it might be worth extracting all the checksums into a separate "state" file, for example Dvc.state, <anyname>.dvcstate, or .dvc/state.
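A minimal sketch of the extraction, assuming a simplified, hypothetical stage layout. The STAGES dict, the placement of "md5" fields, the extract_state function, and the nested state layout are all illustrative assumptions, not DVC's actual file format. Each stage keeps its command, dependencies and outputs, while every checksum moves into one state mapping that could be written to a single committed file.

```python
import json
from copy import deepcopy

# Hypothetical, simplified stage layout loosely modeled on DVC-files: each
# dependency and output carries its checksum in an "md5" field.
STAGES = {
    "train.dvc": {
        "cmd": "python train.py",
        "deps": [{"path": "data.csv", "md5": "a1b2"}],
        "outs": [{"path": "model.pkl", "md5": "c3d4"}],
    },
    "eval.dvc": {
        "cmd": "python eval.py",
        "deps": [{"path": "model.pkl", "md5": "c3d4"}],
        "outs": [{"path": "metrics.json", "md5": "e5f6"}],
    },
}


def extract_state(stages):
    """Split stages into checksum-free definitions plus one state mapping.

    The state mapping is nested as state[stage_file][section][path] = md5,
    an assumed layout chosen only for this sketch.
    """
    definitions = {}
    state = {}
    for stage_file, stage in stages.items():
        clean = deepcopy(stage)  # leave the input untouched
        for section in ("deps", "outs"):
            for entry in clean.get(section, []):
                md5 = entry.pop("md5", None)
                if md5 is not None:
                    state.setdefault(stage_file, {}).setdefault(section, {})[
                        entry["path"]
                    ] = md5
        definitions[stage_file] = clean
    return definitions, state


definitions, state = extract_state(STAGES)
# The definitions are what users would read; the state is what a tool
# would write to a single committed file such as Dvc.state.
print(json.dumps(definitions, indent=2))
print(json.dumps(state, indent=2, sort_keys=True))
```

With this split, an automation tool that cannot commit after dvc repro would only need to export the one state file, and the stage definitions stay free of tool-written values.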

Note that this is not the same as the current .dvc/state, which is an ephemeral database file that is not committed to Git. The proposed state file would need to be committed to Git.
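Because the state file would be committed to Git, branches would be merged through it. The sketch below is a key-level analog of the -X theirs strategy applied to a flat {path: checksum} mapping. The merge_theirs function and the flat layout are hypothetical; this is not how Git or DVC actually merges files (Git's -X theirs operates on text hunks, not keys).

```python
# Sentinel marking "key absent" so deletions count as changes.
_MISSING = object()


def merge_theirs(base, ours, theirs):
    """Key-level three-way merge of flat {path: checksum} state mappings.

    Loosely analogous to git's "-X theirs" (which works on text hunks, not
    keys): a change made on only one side is kept, and when both sides
    changed the same key differently, the value from "theirs" wins.
    """
    merged = {}
    for key in set(base) | set(ours) | set(theirs):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            value = o  # both sides agree
        elif o == b:
            value = t  # only theirs changed the key
        elif t == b:
            value = o  # only ours changed the key
        else:
            value = t  # conflicting changes: theirs wins
        if value is not _MISSING:
            merged[key] = value
    return merged


base = {"data.csv": "a1", "model.pkl": "m1", "metrics.json": "x1"}
ours = {"data.csv": "a1", "model.pkl": "m2", "metrics.json": "x2"}
theirs = {"data.csv": "a2", "model.pkl": "m3", "metrics.json": "x1"}
merged = merge_theirs(base, ours, theirs)
# merged == {"data.csv": "a2", "model.pkl": "m3", "metrics.json": "x2"}
```

Note that a key-level merge like this can combine checksums from different branches (here model.pkl comes from theirs and metrics.json from ours), so a real design would still need to decide whether such mixed results are acceptable.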

Example: Terraform keeps all the infrastructure configuration in *.tf files but stores its state in a single, separate file, terraform.tfstate.

Related issues: this feature request (FR) might be related to the single-DAG FR #1871.

Labels: discussion (requires active participation to reach a conclusion), feature request (requesting a new feature)