
Future dvc.yaml 2.0 improvements #5312

@jorgeorpinel


UPDATE: Jump to #5312 (comment) for remaining discussions

Consolidates #5164, #5165, and #5180.
For context, see existing docs in https://dvc.org/doc/user-guide/dvc-files/advanced-dvc-yaml.

Some remaining concerns, in order of relevance (only 1 and 2.1 seem more or less worth considering in the immediate term).

The minimum purpose of this issue is to decide which of these, if any, to address, and to confirm that we can do so in the future without breaking backward compatibility.

1. Behavior: default params file & merging of values

We decided to always include * from params.yaml in vars. Trying to redefine values (via other file includes or write-in vars) fails except if they're objects that can be merged (no leaf node conflicts).
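
To make the current rule concrete, here is a minimal Python sketch of "merge unless a leaf node conflicts". It is not DVC's implementation; the function name `merge_no_conflict` and the error text are made up for illustration, and it assumes that redefining a leaf is an error even when the new value equals the old one.

```python
def merge_no_conflict(base: dict, new: dict, path: str = "") -> dict:
    """Return a merged copy of base and new; raise if a leaf would be redefined."""
    merged = dict(base)
    for key, value in new.items():
        here = f"{path}.{key}" if path else str(key)
        if key not in merged:
            # New key: no conflict possible.
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # Both sides are objects: merge them recursively.
            merged[key] = merge_no_conflict(merged[key], value, here)
        else:
            # At least one side is a leaf: this is a redefinition.
            raise ValueError(f"cannot redefine '{here}'")
    return merged


defaults = {"model": {"lr": 0.1}}   # stands in for params.yaml
extra = {"model": {"epochs": 5}}    # disjoint leaves, so this merges
print(merge_no_conflict(defaults, extra))
# {'model': {'lr': 0.1, 'epochs': 5}}

try:
    merge_no_conflict(defaults, {"model": {"lr": 0.2}})
except ValueError as err:
    print(err)
# cannot redefine 'model.lr'
```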

What about reconsidering the "merging" of vals/objects? Instead we could rename tree paths so there's no possibility of conflicts (@shcheklein's idea). E.g.:

# params.yaml
foo: "foo"

// params.json
{"foo": "bar"}

# dvc.yaml
vars:
- foo # use as ${foo}, ${params.yaml.foo}, or ${paramsyaml.foo}
- params.json:foo # use as ${params.json.foo} or ${paramsjson.foo}

# dvc.yaml (with a write-in value for foo)
vars:
- foo # use as ${params.yaml:foo} or ${paramsyaml.foo}
- params.json:foo # use as ${params.json.foo} or ${paramsjson.foo}
- foo: "baz" # use as ${foo} or ${globals.foo}
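
As a rough illustration of the renaming idea, the Python sketch below keeps each source under its own top-level key, so identical names never have to be merged. The helpers `build_context` and `lookup` are hypothetical, and only the dot-free aliases (`paramsyaml`, `paramsjson`, `globals`) are modelled, since a file name that itself contains a dot makes plain dotted lookups ambiguous.

```python
def build_context(sources: dict) -> dict:
    """Give every params file its own namespace, keyed by a dot-free alias."""
    context = {}
    for filename, values in sources.items():
        alias = filename.replace(".", "")  # "params.json" -> "paramsjson"
        context[alias] = values
    return context


def lookup(context: dict, dotted: str):
    """Resolve a dotted path such as 'paramsjson.foo' against the context."""
    node = context
    for part in dotted.split("."):
        node = node[part]
    return node


context = build_context({
    "params.yaml": {"foo": "foo"},
    "params.json": {"foo": "bar"},
})
context["globals"] = {"foo": "baz"}  # write-in values get their own namespace

print(lookup(context, "paramsyaml.foo"))  # foo
print(lookup(context, "paramsjson.foo"))  # bar
print(lookup(context, "globals.foo"))     # baz
```

All three `foo` values coexist, so there is nothing left to merge or to conflict.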

2. Syntax (minor)

Is it worth renaming certain keywords for accuracy? The suggestions below come from the terminology that we ended up using (so far) in docs:

2.1 vars: aren't really variables (we're not using that term in docs). How about include: or load: to invoke an action, or values: (we use that term a lot in docs), const:, or globals:/locals: as a descriptive term?

The most accurate term would be params:, but that is taken for the local scope... I wonder if we could overload the section for both things (when it assigns a value it's a SET, when it just lists the key it's a GET).

2.2 The $ sign in the ${} expression makes it extra tricky to use in cmd (at least on Linux). I understand this was discussed already so no strong opinion, but from our docs-related research, the most common syntax for this is {{ }}. Of course {} (or any other brackets) can also be problematic, so users will need to worry about escaping anyway...
UPDATE: {} has a meaning in YAML so we can't use that anyway.
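
To show why escaping comes up whichever delimiter is chosen, here is a small Python sketch of a `${}` substitution pass that leaves a backslash-escaped `\${...}` alone, so the shell still sees its own variable. The `render` helper, the regular expression, and the flat `values` dictionary are assumptions for illustration only, not DVC's parser.

```python
import re

# Optional leading backslash, then ${ name }, where name may contain dots.
PATTERN = re.compile(r"(\\)?\$\{\s*([\w.]+)\s*\}")


def render(cmd: str, values: dict) -> str:
    """Replace ${name} with values[name]; keep escaped \\${name} for the shell."""
    def substitute(match):
        if match.group(1):
            # Escaped expression: drop the backslash, keep the rest literally.
            return match.group(0)[1:]
        return str(values[match.group(2)])

    return PATTERN.sub(substitute, cmd)


print(render(r"echo ${foo} \${HOME}", {"foo": "bar"}))
# echo bar ${HOME}
```

Switching the delimiter to `{{ }}` would only change the pattern; a command that needs literal braces would still have to escape them.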

2.3 foreach: "for each {items} do {stage details}" is a great construct. My only concern is that the given order of the items is not respected (according to #5181 (comment)), so the "loop" analogy isn't that precise. My only alternative ideas are items:, set:, multi: (we currently use the term "multi-stage" in docs).

2.3.2 do: If we keep foreach, maybe gen/yield: could a) be slightly more accurate and b) hint that you're not using a typical imperative language loop.
If we depart from the loop analogy (use set/multi/expand:) do could just stay, or just be skipped to keep the YAML structure a bit shorter.
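
For reference, the expansion that foreach/do describes can be sketched in Python as below. This is a simplified model rather than DVC's code: `expand_foreach` is a hypothetical helper, it only handles a plain list of items, and it substitutes `${item}` in string fields only. The `name@item` naming follows how DVC names generated stages.

```python
def expand_foreach(name: str, definition: dict) -> dict:
    """Expand a foreach/do stage into one concrete stage per item."""
    items = definition["foreach"]
    template = definition["do"]
    stages = {}
    for item in items:
        stages[f"{name}@{item}"] = {
            key: value.replace("${item}", str(item)) if isinstance(value, str) else value
            for key, value in template.items()
        }
    return stages


stages = expand_foreach(
    "cleanup",
    {"foreach": ["raw1", "raw2"], "do": {"cmd": "clean.py ${item}"}},
)
print(stages)
# {'cleanup@raw1': {'cmd': 'clean.py raw1'}, 'cleanup@raw2': {'cmd': 'clean.py raw2'}}
```

The result is a mapping from stage names to stage definitions, and nothing in it encodes an execution order, which is why a term such as items: or multi: may describe the construct better than a loop keyword.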

A possible recombination:

include:
- foo
- params.json:foo,bar
- foo: "value"

stages:
  mystage:
    wdir: {{foo}}
    include:
      - qux
      - params.toml:quux
      - quuz: "eulav"
    cmd: echo {{qux}} {{globals.foo}}
    deps:
      - {{qux}}
      - {{globals.foo}}
  mystages:
    items:
      - {{paramsjson.foo}}
      - {{bar}}
    cmd: echo {{item}}

3. Current/Known limitations (future)

3.1 It's not possible to put vars (or any other field for that matter) before foreach. dvc repro gives the following error msg: format error: extra keys not allowed @ data['stages']['mystage']['foreach']
Perhaps we can improve the message so users get that they should either use foreach/do OR a regular stage structure.
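
A friendlier check could look roughly like the Python sketch below. The `check_stage` function and its message wording are hypothetical; they only illustrate telling the user which keys clash with foreach instead of reporting a generic schema error.

```python
def check_stage(name: str, definition: dict) -> None:
    """Raise a descriptive error when foreach is mixed with regular stage fields."""
    if "foreach" not in definition:
        return  # regular stage, nothing to check here
    extra = sorted(set(definition) - {"foreach", "do"})
    if extra:
        raise ValueError(
            f"stage '{name}': found {extra} next to 'foreach'; "
            "use either 'foreach' with 'do' or a regular stage definition"
        )


try:
    check_stage("mystage", {"vars": [{"x": 1}], "foreach": ["a"], "do": {"cmd": "echo"}})
except ValueError as err:
    print(err)
# stage 'mystage': found ['vars'] next to 'foreach'; use either 'foreach' with 'do' or a regular stage definition
```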

Depending on 2.3.2 this may not be relevant in the future.

3.2 wdir can't use ${values} from local write-in vars (because wdir is evaluated first, needed for file-based local vars)? But if we address 1 (no merging of objects) then maybe this can be implemented?

3.3 dvc run doesn't pre-process the commands sent to it (not compatible with vars). Should it? Probably not (stating here mainly for the record).

Labels: A: templating (Related to the templating feature), discussion (requires active participation to reach a conclusion), enhancement (Enhances DVC), product: VSCode (Integration with VSCode extension)
