Closed
Labels
A: templating (Related to the templating feature), feature request (Requesting a new feature)
Description
With the introduction of the new multiple-stage pipeline, we need a way to define variables in the pipeline. For example, the intermediate file name cleansed.csv is used by two stages in the following pipeline and should be defined as a variable:
stages:
  process:
    cmd: "./process.bin --input data --output cleansed.csv"
    deps:
      - path: data/
    outs:
      - path: cleansed.csv
  train:
    cmd: "python train.py"
    deps:
      - path: cleansed.csv
      - path: train.py
      - path: params.yaml
        params:
          lr: 0.042
          layers: 8
          classes: 4
    outs:
      - path: model.pkl
      - path: log.csv
        cache: true
      - path: summary.json

We need to solve two problems here:
- Define a variable in one place and reuse it from multiple places/stages.
- Users often prefer to read file names from config files (as in the train stage) rather than from the command line (as in the process stage).
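To illustrate the second point, here is a minimal sketch of how a script such as train.py might look up its input file name in a parameters file instead of taking it on the command line. The key name cleansed_file_name and the flat `key: value` parsing are assumptions made for this example; a real script would more likely use a YAML library.

```python
def read_flat_params(text):
    """Parse flat `key: value` lines (a tiny subset of YAML) into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        params[key.strip()] = value.strip().strip("\"'")
    return params


# Hypothetical contents of params.yaml; the key name is an assumption.
PARAMS_TEXT = """
cleansed_file_name: cleansed.csv
lr: 0.042
"""

params = read_flat_params(PARAMS_TEXT)
input_path = params["cleansed_file_name"]  # file name comes from config
print(input_path)
```

The script never sees the file name as an argument, which is why the pipeline definition needs another way to learn about the dependency.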
We can solve both problems with a single abstraction, a parameters file variable:
stages:
  process:
    cmd: ./process.bin
    outs:
      - path: "params.yaml:cleansed_file_name"
    ....
  train:
    cmd: "python train.py"
    deps:
      - path: "params.yaml:cleansed_file_name"

This feature is useful in the current DVC design as well. It is convenient to read file names from a params file and still define dependencies properly, like dvc run -d params.yaml:input_file -o params.yaml:model.pkl
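One way the proposed `params.yaml:key` references could be resolved is sketched below. This is an illustration only, not DVC's implementation: the function names `resolve_ref` and `resolve_stage` are hypothetical, and the parameters file is assumed to be already parsed into a dict.

```python
def resolve_ref(value, params_by_file):
    """Resolve a hypothetical "<file>:<key>" reference to the value stored
    under <key> in the parsed parameters of <file>.
    Plain paths (no known file prefix) are returned unchanged."""
    file_name, sep, key = value.partition(":")
    if sep and file_name in params_by_file:
        return params_by_file[file_name][key]
    return value


def resolve_stage(stage, params_by_file):
    """Return a copy of a stage with references in deps/outs resolved."""
    resolved = dict(stage)
    for section in ("deps", "outs"):
        if section in stage:
            resolved[section] = [
                {**entry, "path": resolve_ref(entry["path"], params_by_file)}
                for entry in stage[section]
            ]
    return resolved


# Assumed parsed contents of params.yaml.
params_by_file = {"params.yaml": {"cleansed_file_name": "cleansed.csv"}}

stages = {
    "process": {"cmd": "./process.bin",
                "outs": [{"path": "params.yaml:cleansed_file_name"}]},
    "train": {"cmd": "python train.py",
              "deps": [{"path": "params.yaml:cleansed_file_name"}]},
}

resolved = {name: resolve_stage(s, params_by_file) for name, s in stages.items()}
print(resolved["process"]["outs"][0]["path"])
print(resolved["train"]["deps"][0]["path"])
```

Both stages end up pointing at the same concrete path, so the file name is defined once in the parameters file and the dependency graph can still connect process to train.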