Contributor

@daavoo daavoo commented Apr 4, 2023

List of paths to copy inside the temp directory. Only used if `--temp` or `--queue` is specified.

Closes #5800
Closes https://github.com/iterative/cse/issues/99

@daavoo daavoo added the enhancement (Enhances DVC) and A: experiments (Related to dvc exp) labels Apr 4, 2023
@daavoo daavoo requested review from dberenbaum and pmrowla April 4, 2023 10:37
@daavoo daavoo self-assigned this Apr 4, 2023
@daavoo daavoo linked an issue Apr 4, 2023 that may be closed by this pull request
@daavoo daavoo force-pushed the 5800-exp-run-copy-certain-git-ignored-files-to-tmp-folder-on-run-all branch 3 times, most recently from e82bb35 to d9d9a99 Compare April 4, 2023 16:20
for path in copy_paths:
    if os.path.isfile(path):
        shutil.copy(
            os.path.realpath(path), os.path.join(dvc.root_dir, path)
Contributor Author

Since defining the expected format of copy-paths is in our hands, I thought that making it relative to dvc.root_dir made sense for simplicity.
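To make the "relative to the root" convention concrete, here is a minimal sketch of copying a list of relative paths from one root into the same relative locations under another root. It is an illustration only, not the code in this PR: `copy_relative_paths`, `src_root`, and `dst_root` are hypothetical names, and creating missing parent directories is an assumption of the sketch.

```python
import os
import shutil
import tempfile


def copy_relative_paths(copy_paths, src_root, dst_root):
    """Copy each path (given relative to src_root) to the same relative
    location under dst_root. Hypothetical helper, not DVC's implementation."""
    copied = []
    for rel_path in copy_paths:
        # Resolve symlinks so that the real file or directory is copied.
        src = os.path.realpath(os.path.join(src_root, rel_path))
        dst = os.path.join(dst_root, rel_path)
        if os.path.isfile(src):
            # Assumption of this sketch: create missing parent directories.
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy(src, dst)
            copied.append(dst)
        elif os.path.isdir(src):
            # copytree creates dst (and its parents) itself.
            shutil.copytree(src, dst)
            copied.append(dst)
        # Paths that do not exist are silently skipped in this sketch.
    return copied
```

Because source and destination share the same relative layout, a path written as `data/raw.csv` means the same thing in the workspace and in the temp directory.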

Contributor

Not sure I follow. When will it matter?

Also, should we be using follow_symlinks=False (see https://github.com/iterative/cse/issues/99)?

Contributor Author

> Not sure I follow. When will it matter?

It would matter depending on how the paths are defined/used inside the dvc.yaml vs how they are provided as an argument to copy-paths.

> Also, should we be using follow_symlinks=False (see https://github.com/iterative/cse/issues/99)?

os.path.realpath is used so I think it should not be needed (https://github.com/iterative/cse/issues/99 works)

Contributor

> os.path.realpath is used so I think it should not be needed (iterative/cse#99 works)

Does it create an actual copy? I think the desired behavior would be to create another symlink to the same source.

Contributor

We can't use follow_symlinks=False. If the original link uses a relative path, the newly copied symlink will point to the wrong location

(shutil.copy(..., follow_symlinks=False) creates a new link using the same relative path, not a new link pointing to the resolved absolute path of the original target)
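The problem described above can be reproduced with the standard library alone. The sketch below assumes a POSIX system where symlinks can be created; `copy_link_as_link` and the file names are made up for the demonstration.

```python
import os
import shutil
import tempfile


def copy_link_as_link(workspace, temp_root):
    """Create data.txt plus a relative symlink to it in workspace, then copy
    the link itself (not its target) into temp_root. Returns the new link."""
    target = os.path.join(workspace, "data.txt")
    with open(target, "w") as f:
        f.write("payload\n")

    link = os.path.join(workspace, "link.txt")
    # Relative target: it is resolved against the directory holding the link.
    os.symlink("data.txt", link)

    new_link = os.path.join(temp_root, "link.txt")
    # With follow_symlinks=False, shutil.copy recreates the symlink with the
    # same (relative) target string instead of copying the file contents.
    shutil.copy(link, new_link, follow_symlinks=False)
    return new_link


with tempfile.TemporaryDirectory() as ws, tempfile.TemporaryDirectory() as tmp:
    new_link = copy_link_as_link(ws, tmp)
    print(os.readlink(new_link))     # target string stored in the copied link
    print(os.path.exists(new_link))  # whether that target exists under tmp
```

The copied link still stores `data.txt` as its target, but no `data.txt` exists next to it in the new directory, so the link is dangling.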

Contributor

I don't think we should be using links here anyways. If the file ends up being modified by the experiment pipeline, the tempdir executor should be modifying a completely independent copy of that file. (We should not be writing to the original file via a symlink)

Contributor Author

@daavoo daavoo Apr 5, 2023

> I don't think we should be using links here anyways. If the file ends up being modified by the experiment pipeline, the tempdir executor should be modifying a completely independent copy of that file. (We should not be writing to the original file via a symlink)

Unless I am missing something, this is already the behavior present in the PR:

  • os.path.realpath resolves all symlinks and gives the "true" absolute path.
  • shutil.copy / shutil.copytree make actual copies into the tempdir.
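The two points above can be checked with a short, self-contained sketch. It assumes a POSIX system where symlinks can be created; `copy_resolved` and `demo_independent_copy` are hypothetical names for illustration and are not part of the PR.

```python
import os
import shutil
import tempfile


def copy_resolved(path, dst):
    """Copy the file that `path` ultimately points to, so that dst is a
    regular file rather than a link."""
    shutil.copy(os.path.realpath(path), dst)
    return dst


def demo_independent_copy():
    with tempfile.TemporaryDirectory() as workspace, \
            tempfile.TemporaryDirectory() as temp_root:
        original = os.path.join(workspace, "data.txt")
        with open(original, "w") as f:
            f.write("original\n")

        link = os.path.join(workspace, "link.txt")
        os.symlink(original, link)

        copy = copy_resolved(link, os.path.join(temp_root, "link.txt"))

        # Simulate the experiment pipeline modifying its own copy.
        with open(copy, "a") as f:
            f.write("modified by experiment\n")

        with open(original) as f:
            original_text = f.read()

        # (is the copy a symlink?, contents of the untouched original)
        return os.path.islink(copy), original_text
```

Writing to the copy leaves the original untouched, which is the independence property discussed in this thread. The trade-off is that large files are duplicated rather than linked.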

Contributor

Yes, the PR should be fine as-is, I was just commenting in response to @dberenbaum's question

Contributor

Okay, thanks for the explanation. It doesn't fully resolve https://github.com/iterative/cse/issues/99 in that case since they want to create links to their large data dependency, but it's not a blocker.

Contributor Author

I also manually tested https://github.com/iterative/cse/issues/99. Not sure if it's worth adding a test for that specific scenario, though.

codecov bot commented Apr 4, 2023

Codecov Report

Patch coverage: 95.83% and no project coverage change.

Comparison is base (32bebc0) 92.96% compared to head (2795421) 92.96%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9302   +/-   ##
=======================================
  Coverage   92.96%   92.96%           
=======================================
  Files         461      461           
  Lines       37213    37270   +57     
  Branches     5359     5368    +9     
=======================================
+ Hits        34596    34649   +53     
- Misses       2085     2089    +4     
  Partials      532      532           
| Impacted Files | Coverage Δ |
|---|---|
| dvc/repo/experiments/run.py | 97.91% <ø> (ø) |
| tests/func/experiments/test_set_params.py | 100.00% <ø> (ø) |
| tests/unit/command/test_experiments.py | 99.61% <ø> (ø) |
| dvc/repo/experiments/queue/tasks.py | 90.74% <40.00%> (-5.34%) ⬇️ |
| dvc/commands/experiments/exec_run.py | 100.00% <100.00%> (ø) |
| dvc/commands/experiments/run.py | 83.33% <100.00%> (+0.40%) ⬆️ |
| dvc/repo/experiments/__init__.py | 86.21% <100.00%> (ø) |
| dvc/repo/experiments/executor/base.py | 85.13% <100.00%> (+1.02%) ⬆️ |
| dvc/repo/experiments/executor/ssh.py | 29.62% <100.00%> (ø) |
| dvc/repo/experiments/queue/base.py | 86.30% <100.00%> (ø) |

... and 5 more

... and 1 file with indirect coverage changes


@daavoo daavoo force-pushed the 5800-exp-run-copy-certain-git-ignored-files-to-tmp-folder-on-run-all branch 2 times, most recently from ff212f1 to 5864eec Compare April 5, 2023 11:54
@daavoo daavoo requested a review from dberenbaum April 5, 2023 11:55
@daavoo daavoo force-pushed the 5800-exp-run-copy-certain-git-ignored-files-to-tmp-folder-on-run-all branch 3 times, most recently from 3f2de39 to c46fa7c Compare April 5, 2023 17:08
@dberenbaum
Contributor

Could you also please open a docs issue/PR?

@daavoo daavoo enabled auto-merge (rebase) April 5, 2023 20:44
List of paths to copy inside the temp directory. Only used if `--temp` or `--queue` is specified.

Closes #5800
@daavoo daavoo force-pushed the 5800-exp-run-copy-certain-git-ignored-files-to-tmp-folder-on-run-all branch from c46fa7c to 2795421 Compare April 5, 2023 20:45
@daavoo daavoo merged commit 3312a90 into main Apr 5, 2023
@daavoo daavoo deleted the 5800-exp-run-copy-certain-git-ignored-files-to-tmp-folder-on-run-all branch April 5, 2023 21:17
daavoo added a commit to treeverse/dvc.org that referenced this pull request Apr 19, 2023
dberenbaum pushed a commit to treeverse/dvc.org that referenced this pull request Apr 19, 2023
* exp run: Add `--copy-paths`.

Per treeverse/dvc#9302

* reword

* Update usage
Labels

A: experiments Related to dvc exp enhancement Enhances DVC

Development

Successfully merging this pull request may close these issues.

exp run: copy (certain) git-ignored files to tmp folder on --run-all

3 participants