track field instead of cache in dvc.yaml outputs #6549

@iesahin

Currently, cache: false in dvc.yaml has multiple meanings:

  • Don't track this file at all, I don't intend to track it, it will be discarded.
  • Don't track this file at all, I'm tracking it in Git.
  • Don't track this file, I'm using it in dvc exp as an intermediate file.

It seems we need a more granular way to express why a file is being kept out of the cache.

Instead of

extract:
    ...
    outs:
    - models/mymodel.h5:
        cache: false

we can have

extract:
    ...
    outs:
    - models/mymodel.h5:
        track: cache    # one of: cache, git, exp, lock, ignore

which defaults to cache (equivalent to cache: true in the current version).

For backward compatibility, cache: true can work as track: cache, and cache: false as track: git.
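The compatibility rule above could be sketched roughly as follows. This is only an illustration in Python: the function name resolve_track, the plain-dict input, and the choice to let an explicit track value win over a legacy cache flag are assumptions, not part of DVC or of this proposal.

```python
# Hypothetical sketch: map the options of one output entry to a track mode.
# Nothing here is an existing DVC API.

TRACK_MODES = ("cache", "git", "exp", "lock", "ignore")


def resolve_track(out_options: dict) -> str:
    """Return the effective track mode for one output's options."""
    if "track" in out_options:
        # Assumption: an explicit track value takes precedence over cache.
        mode = out_options["track"]
        if mode not in TRACK_MODES:
            raise ValueError(f"unknown track mode: {mode!r}")
        return mode
    if "cache" in out_options:
        # Legacy flag: cache: true -> track: cache, cache: false -> track: git
        return "cache" if out_options["cache"] else "git"
    # Neither key given: the proposed default is track: cache.
    return "cache"
```

With this sketch, resolve_track({"cache": False}) gives "git", and an entry with no options resolves to "cache".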

  • track: cache means use a .dvc file to track the changes and use the cache. This is the default.
  • track: git means don't use the cache and don't update the .gitignore file. dvc exp run does nothing with this file.
  • track: exp means don't use the cache, but dvc exp run will check out this file to .dvc/tmp/exps.
  • track: lock means no need to create a .dvc file, but the changes will be tracked in the dvc.lock file. This is for the intermediate files produced in the pipeline.
  • track: ignore means both DVC and Git will ignore this file. DVC will ensure that the file is in .gitignore and won't track it.
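To compare the five modes side by side, the list above can be written as a small lookup table. This is a hedged sketch: the TrackBehavior fields and the BEHAVIORS name are hypothetical, and any property the proposal does not state is left as None rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrackBehavior:
    """Properties stated in the proposal; None means 'not specified there'."""
    uses_cache: Optional[bool]
    in_gitignore: Optional[bool]
    exp_checkout: Optional[bool]
    note: str


# Hypothetical table, filled in only from the bullet list above.
BEHAVIORS = {
    "cache": TrackBehavior(True, None, None, "tracked with the cache; the default"),
    "git": TrackBehavior(False, False, False, "left to Git; .gitignore not updated"),
    "exp": TrackBehavior(False, None, True, "checked out to .dvc/tmp/exps by dvc exp run"),
    "lock": TrackBehavior(None, None, None, "no .dvc file; changes tracked in dvc.lock"),
    "ignore": TrackBehavior(False, True, None, "ignored by both DVC and Git"),
}


def modes_avoiding_cache() -> list:
    """Modes that the proposal explicitly says do not use the cache."""
    return sorted(m for m, b in BEHAVIORS.items() if b.uses_cache is False)
```

Keeping unspecified cells as None makes the open questions visible, for example whether track: lock outputs would still be stored in the cache.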

There may be more granular options in the future.

Labels: A: object-storage (Related to the object/content-addressable storage)
