@casperdcl casperdcl (Contributor) commented Aug 8, 2020

  • modify schema
    • dvc.yaml
    • dvc.lock
  • support running the user cmd (read from dvc.yaml:stages.<stage>.deps.[].<path>.cmd)
  • add to dvc.lock
  • add a way to specify the user cmd filter via the CLI/API
    • maybe dvc run -d "utils.py:extract_function.py --name check_db"
  • add tests
  • fixes #3439 (Support function specific dependencies)

Note that this implementation

  • uses PARAM_FILTER = "cmd"
  • passes path as a positional argument to the user-defined cmd
  • computes the md5sum of the user-defined cmd output
  • only works on local files (not dirs & not remote paths)
  • only works for dependencies (not outputs)
  • reproduces the old behaviour (whole-file hash) when cmd is set to cat
  • was tested using https://github.com/casperdcl/dvc-udf
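
For illustration, a filter command of the kind described above could look roughly like the following. This is a hypothetical sketch, not the actual extract_function.py from the dvc-udf repository: it assumes the script takes a --name option plus the dependency path as a positional argument, and prints the source of one top-level function so that only changes to that function alter the hash. The names extract_function and main are made up for this example.

```python
import argparse
import ast
import sys


def extract_function(source: str, name: str) -> str:
    """Return the source text of the top-level function called `name`."""
    tree = ast.parse(source)
    for node in tree.body:
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            # get_source_segment (Python 3.8+) slices the original text
            # using the node's recorded start/end positions.
            return ast.get_source_segment(source, node)
    raise KeyError(f"function {name!r} not found")


def main(argv=None) -> int:
    # Assumed CLI shape: <script> --name <function> <path>
    parser = argparse.ArgumentParser()
    parser.add_argument("--name", required=True)
    parser.add_argument("path")
    args = parser.parse_args(argv)
    with open(args.path, encoding="utf-8") as f:
        source = f.read()
    sys.stdout.write(extract_function(source, args.name))
    return 0
```

Calling main(["--name", "check_db", "utils.py"]) would write only the check_db source to stdout, so edits elsewhere in utils.py leave the filtered output unchanged.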

schema:

# in dvc.yaml
utils.py:
  cmd: python extract_function.py --name check_db

# in dvc.lock
path: utils.py
cmd: python extract_function.py --name check_db
md5: s0m3h45h # computed via `{cmd} {path} | md5sum`
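
The `{cmd} {path} | md5sum` computation noted above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the code in this PR: the helper name filtered_md5 is hypothetical, and it splits the command with shlex and runs it without a shell, which may differ from how the PR invokes the command.

```python
import hashlib
import shlex
import subprocess


def filtered_md5(cmd: str, path: str) -> str:
    """Return the MD5 hex digest of the stdout of `cmd path`."""
    # The dependency path is appended as the last positional argument.
    argv = shlex.split(cmd) + [path]
    # check=True raises CalledProcessError if the filter command fails.
    result = subprocess.run(argv, check=True, stdout=subprocess.PIPE)
    return hashlib.md5(result.stdout).hexdigest()
```

With the schema above, the lock entry would correspond to filtered_md5("python extract_function.py --name check_db", "utils.py"). Passing a command that echoes the file unchanged, such as cat, yields the same digest as hashing the whole file.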

testing:

git clone https://github.com/casperdcl/dvc-udf
cd dvc-udf
pip install -r requirements.txt
dvc repro -f -v out  # assuming dvc installed from this PR

@casperdcl casperdcl marked this pull request as draft August 8, 2020 22:26
@casperdcl casperdcl self-assigned this Aug 8, 2020
@casperdcl casperdcl requested a review from efiop August 8, 2020 22:26
@casperdcl casperdcl added the enhancement (Enhances DVC), research, and ui (user interface / interaction) labels Aug 8, 2020
Contributor Author

not sure if there are aspects of stage.run.cmd_run which should be used here

Comment on lines +102 to +105
Contributor Author

not sure if this is required - it may be handled automatically elsewhere (i.e. the entire tmpdir is deleted before exit)

@casperdcl casperdcl force-pushed the user_filter branch 2 times, most recently from a469eb4 to aa4cd31 August 8, 2020 22:47
Contributor Author

maybe assert not required (should be handled by schema)?

Collaborator

Other CLI commands can create/load dependencies, skipping the schema. Good to have an assert.

@efiop efiop changed the title from "dependency: fine grained (user cmd filter)" to "[WIP] dependency: fine grained (user cmd filter)" Aug 9, 2020
@efiop efiop (Contributor) commented Mar 23, 2021

Closing for now; we'll get back to this after the dep/out refactor so it can be accommodated properly.
