-
Notifications
You must be signed in to change notification settings - Fork 1.3k
exp run: Add --copy-paths arg.
#9302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exp run: Add --copy-paths arg.
#9302
Conversation
e82bb35 to
d9d9a99
Compare
| for path in copy_paths: | ||
| if os.path.isfile(path): | ||
| shutil.copy( | ||
| os.path.realpath(path), os.path.join(dvc.root_dir, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since defining copy-paths expected format is in our hands, I though that making it relative to the dvc.root_dir made sense for simplicity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow. When will it matter?
Also, should we be using follow_symlinks=False (see https://github.com/iterative/cse/issues/99)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow. When will it matter?
It would matter depending on how the paths are defined/used inside the dvc.yaml vs how they are provided as an argument to copy-paths.
Also, should we be using follow_symlinks=False (see https://github.com/iterative/cse/issues/99)?
os.path.realpath is used so I think it should not be needed (https://github.com/iterative/cse/issues/99 works)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.realpathis used so I think it should not be needed (iterative/cse#99 works)
Does it create an actual copy? I think the desired behavior would be to create another symlink to the same source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use follow_symlinks=False. If the original link uses a relative path, the newly copied symlink will point to the wrong location
(shutil.copy(..., follow_symlinks=False) creates a new link using the same relative path, not a new link pointing to the resolved absolute path of the original target)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be using links here anyways. If the file ends up being modified by the experiment pipeline, the tempdir executor should be modifying a completely independent copy of that file. (We should not be writing to the original file via a symlink)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be using links here anyways. If the file ends up being modified by the experiment pipeline, the tempdir executor should be modifying a completely independent copy of that file. (We should not be writing to the original file via a symlink)
Unless I am missing something, this is already the behavior present in the P.R:
os.path.realpathresolves all symlinks and give the "true" absolute path.shutil.copy/shutil.copytreemake actual copies into the tempdir.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the PR should be fine as-is, I was just commenting in response to @dberenbaum's question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for the explanation. It doesn't fully resolve https://github.com/iterative/cse/issues/99 in that case since they want to create links to their large data dependency, but it's not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also manually tested https://github.com/iterative/cse/issues/99 . Not sure if its worth adding a test for the specific scenario though
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #9302 +/- ##
=======================================
Coverage 92.96% 92.96%
=======================================
Files 461 461
Lines 37213 37270 +57
Branches 5359 5368 +9
=======================================
+ Hits 34596 34649 +53
- Misses 2085 2089 +4
Partials 532 532
... and 1 file with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
ff212f1 to
5864eec
Compare
3f2de39 to
c46fa7c
Compare
|
Could you also please open a docs issue/PR? |
List of paths to copy inside the temp directory. Only used if `--temp` or `--queue` is specified. Closes #5800
c46fa7c to
2795421
Compare
* exp run: Add `--copy-paths`. Per treeverse/dvc#9302 * reword * Update usage
List of paths to copy inside the temp directory. Only used if
--tempor--queueis specified.Closes #5800
Closes https://github.com/iterative/cse/issues/99