option in dvc run to specify class or method within file as dependency #1572

@brbarkley

Thanks for your work on the dvc project. I've found it very useful!

I have a single file, make_dataset.py, that compiles data from disparate database sources, usually from multiple views within each database. For each database, I bundle the data extraction tasks for its views as methods of a single class. This organizes my code base logically and lets the primary methods in each class share data cleaning operations that are particular to that database or its views (the shared cleaning operations live in a single method at the top of the class).
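To make the layout concrete, here is a minimal sketch of a file organized this way. It is only an illustration: the class names (`Source1`, `Source2`), the method names, and the toy in-memory rows are hypothetical stand-ins, not the actual contents of make_dataset.py.

```python
import argparse


class Source1:
    """Extraction tasks for one database (hypothetical name)."""

    def _clean(self, rows):
        # Shared cleaning step used by every view of this database.
        return [r.strip().lower() for r in rows]

    def build_view_a(self):
        # Stand-in for a query against one view.
        return self._clean(["  Alpha ", "BETA"])

    def build_view_b(self):
        # Stand-in for a query against another view.
        return self._clean([" Gamma"])

    def build(self):
        return self.build_view_a() + self.build_view_b()


class Source2:
    """Extraction tasks for a second database (hypothetical name)."""

    def _clean(self, rows):
        # This database needs a different cleaning rule.
        return [r.strip().upper() for r in rows]

    def build(self):
        return self._clean([" delta "])


def main(argv=None):
    # One flag per data source, mirroring --build_source1 / --build_source2.
    parser = argparse.ArgumentParser()
    parser.add_argument("--build_source1", action="store_true")
    parser.add_argument("--build_source2", action="store_true")
    args = parser.parse_args(argv)

    built = {}
    if args.build_source1:
        built["source1"] = Source1().build()
    if args.build_source2:
        built["source2"] = Source2().build()
    return built


if __name__ == "__main__":
    print(main())
```

The point of the sketch is that `--build_source1` only ever touches `Source1`, so an edit inside `Source2` cannot change its output.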

I can then call make_dataset.py from the command line, specifying which data source I want to build, e.g., python make_dataset.py --build_source1 or python make_dataset.py --build_source2. I feed each of these commands to dvc run along with the relevant dependencies, one of them of course being make_dataset.py. However, the option --build_source1 does not depend on the entire contents of make_dataset.py, only on the contents of the specific class or method it references.

Feature request: Is it possible to add an option to dvc enabling a user to specify a class or method (or the like) within a file as a dependency instead of the entire contents of the file?

It seems this could make certain workflows more efficient. For example, if I run dvc status to determine whether the output of python make_dataset.py --build_source1 needs to be rebuilt, dvc will tell me a rebuild is required based on content changes in make_dataset.py, even if those changes were confined to the class associated with python make_dataset.py --build_source2.
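The kind of check being asked for can be sketched as a fingerprint computed over a single class rather than the whole file. This is not an existing dvc option, only an illustration of the idea: the helper name `class_fingerprint` and the toy source strings are hypothetical, and the sketch assumes the class is defined at the top level of the file.

```python
import ast
import hashlib


def class_fingerprint(source: str, class_name: str) -> str:
    """Hash the syntax tree of one top-level class in a Python source string.

    Hashing the AST dump (rather than the raw text) means comments and
    whitespace changes inside the class do not alter the fingerprint.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            dumped = ast.dump(node)  # no line numbers by default
            return hashlib.sha256(dumped.encode("utf-8")).hexdigest()
    raise ValueError(f"class {class_name!r} not found")


# Toy stand-ins for two versions of make_dataset.py.
before = '''
class Source1:
    def build(self):
        return 1

class Source2:
    def build(self):
        return 2
'''
# Only Source2 changes between the two versions.
after = before.replace("return 2", "return 22")

source1_unchanged = class_fingerprint(before, "Source1") == class_fingerprint(after, "Source1")
source2_unchanged = class_fingerprint(before, "Source2") == class_fingerprint(after, "Source2")
print(source1_unchanged, source2_unchanged)
```

A fingerprint like this ignores anything the class relies on outside its own body, such as module-level helpers, imports, or base classes, which is one of the complexities a real implementation would have to address.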

I suppose I could unbundle all the methods and classes into separate files or simply tolerate the lack of specificity within my pipeline, but that seems less than ideal in my case. There, of course, could be complexities I am overlooking in implementing such an option or other solutions that I'm not considering. I appreciate your thoughts either way.

Thanks again!
