-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix file transfers from windows to linux #196
Conversation
Thanks for this. I'll test it out on Windows and get back. |
Vendored and tested against repro for moby/buildkit#4741 -- fixes 馃憤
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, this change is wrong (cc @crazy-max): https://github.com/tonistiigi/fsutil/pull/173/files#diff-7dd945edf522324729284de47c3bdb062155cc945eaec235fc564631ae4e91b5
The validator consumes data directly off of the wire, which is guaranteed to be in unix-path form (see moby/buildkit#4806 (review)). Therefore, we should avoid using the platform-specific
filepath
package, and instead use the unix-specificpath
package.
I picked these changes from https://github.com/tonistiigi/fsutil/pull/148/files#diff-7dd945edf522324729284de47c3bdb062155cc945eaec235fc564631ae4e91b5 but overlooked it was done on receiver side.
LGTM thanks
validator.go
Outdated
@@ -27,18 +27,18 @@ func (v *Validator) HandleChange(kind ChangeKind, p string, fi os.FileInfo, err | |||
if v.parentDirs == nil { | |||
v.parentDirs = make([]parent, 1, 10) | |||
} | |||
if p != filepath.Clean(p) { | |||
if p != path.Clean(p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we somehow need to make sure that there are no \
in this path (at least for wcow). Otherwise we do the validation only based on /
but then in diskwriter these are converted to local paths in FromSlash()
and and the previous \
become active.
Eg. lets say remote party sends:
dir foo
symlink foo/sym -> /
dir file foo\sym
file foo\sym/bar
I think that would pass validator atm. but would result in breakout in diskwriter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do - I guess a linux buildkit attempting to export a path like foo\sym/bar
to a windows client just isn't supported, since it's not possible to repesent the filename foo\sym
on windows.
I guess the reverse is also true? If a windows buildkit attempts to export foo/sym\bar
to a linux client, this should also fail for the same reason? Though because of the use of ToSlash
, I'm not actually sure if it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sym\bar
is a valid filename in linux but I guess can't be in windows. So if validator/diskwriter run on windows then the paths should not contain \
and it should cause a validator error if there is such path. In Linux, if there is \
then it should be just written out verbose without making a directory separator.
If there is a backwards compatibility issue then it would also be possible that in wcow we convert all \
to /
before we run the validator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check for this case now - the check is actually quite simple - we just convert from and to the slash format, and if there's a difference, the path can't be represented on the target.
This detects the path "lossiness".
Note: unlike I mentioned above, I don't think the reverse is an issue, since /
isn't a valid filename character in windows? So we would never be in a scenario of trying to send a file foo/bar\baz
from windows to linux (so the reverse issue doesn't apply).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^
PTAL @tonistiigi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still somewhat confusing to me how this works.
Afaics there is no guarantee that fs.Walk
returns unix-only paths. The conversion seems to be in send.go
. On the receiver side getWalkerFn
uses fs.Walk
that is then then sent to ComparePath
through differ that now with this implementation only understands /
separators.
The same seems to be true for the remote paths via newDynamicWalker
as well. The validator does do FromSlash
before validation etc conversion but once it is done, it seems that rest of the functions still use the original form and there is no guarantee what format the path is. This includes the hardlink validation as well as the differ where the format of the path seems to be undefined.
In diskwriter there is FromSlash
conversion. It is before the Filter
gets called. Is this intentional? If yes, it should be documented as such but I think unix-style makes more sense.
I think I'm starting to understand how the differ part works. There is extra Line 114 in 53090fe
doubleWalkDiff takes two walker functions, first one is expected to be unix style (although I don't think there is validation, per earlier comment) and second one is expected to be filepath style and gets converted internally. I think it would make more sense if the conversion for the second one was already part of walker.
For the first walker with files from the remote side, I think the |
Then, inside all our validations, and our double walking, we don't have to worry about translating between different path formats. Additionally, we ensure that all received paths can be represented on the target host system. A path like `foo/bar\baz` can only be stored on linux (a dir with `foo`, containing a file `bar\baz`). On windows, this would later be interpreted to be `foo\bar\baz` (a dir with `foo`, containing a dir `bar`, containing a file `baz`). Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>
53090fe
to
86c9caf
Compare
Okay, so I've changed this pretty substantially - now, we perform translations immediately off the wire, and then the rest of the validations/transformations apply in platform-specific style. This keeps validation the same, and means that both funcs in |
if fileCanRequestData(os.FileMode(p.Stat.Mode)) { | ||
r.mu.Lock() | ||
r.files[p.Stat.Path] = i | ||
r.mu.Unlock() | ||
} | ||
i++ | ||
cp := ¤tPath{path: p.Stat.Path, stat: p.Stat} | ||
|
||
cp := ¤tPath{path: path, stat: p.Stat} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the only thoughts I have left with this - I think we might be able to get rid of currentPath
- there's no longer any need for it, since stat.Path
should always be the same as path
?
But also, not going to fill up the diff with that - that's a follow-up potentially, I'd like to get this merged asap, and so don't want to increase the scope.
@jedevc Can you leave a comment on top of some file explaining where are all the conversions from filepath to unix and unix to filepath, and that the stat receiver guarantees conversion to unix if it is possible. |
Signed-off-by: Justin Chadwell <me@jedevc.com>
@tonistiigi have added a high-level description of the protocol (which I've also realized would be useful), and in that described where the path conversions are performed, right before/after sending/receiving. Additionally, I've explained some of the caveats of cross-platform file transfers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Also tested against moby/buildkit#4741 and fixes. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I feel about Stat
proto type being allowed to be both unix or windows style. I would have preferred if there was a guarantee that this type is only uni unix-style.
But functionally this looks correct, so LGTM.
Things that should be checked in follow-up:
- Filter now doesn't take unix-style path. I think this may be problematic.
- Hardlink detection uses
Linkname == filepath(path)
condition that is likely breaking as it can't be predicted from the client side.
So, some amount of investigation later, I don't actually think that anyone uses the first argument of this - so while this is a problem, I don't think it will actually have an effect. I suppose the correct thing to do though is to preserve the original path off the wire, and then make sure we can pass it into places like this though. |
Also for filter ... something seems weird. We pass the same I can't quite understand the logic that requires filter to be everywhere like that. |
Full diff: https://github.com/tonistiigi/fsutil/compare/7525a1af2bb5..497d33b Summary changes: - tonistiigi/fsutil#195 receive: ensure callback errors are propagated - tonistiigi/fsutil#196 Fix file transfers from windows to linux fixes moby#4741 - tonistiigi/fsutil#197 recv: translate linkname to wire format Signed-off-by: Anthony Nandaa <profnandaa@gmail.com>
Full diff: https://github.com/tonistiigi/fsutil/compare/7525a1af2bb5..497d33b Summary changes: - tonistiigi/fsutil#195 receive: ensure callback errors are propagated - tonistiigi/fsutil#196 Fix file transfers from windows to linux fixes moby#4741 - tonistiigi/fsutil#197 recv: translate linkname to wire format Signed-off-by: Anthony Nandaa <profnandaa@gmail.com>
Full diff: https://github.com/tonistiigi/fsutil/compare/7525a1af2bb5..497d33b Summary changes: - tonistiigi/fsutil#195 receive: ensure callback errors are propagated - tonistiigi/fsutil#196 Fix file transfers from windows to linux fixes moby#4741 - tonistiigi/fsutil#197 recv: translate linkname to wire format Signed-off-by: Anthony Nandaa <profnandaa@gmail.com>
Full diff: https://github.com/tonistiigi/fsutil/compare/7525a1af2bb5..497d33b Summary changes: - tonistiigi/fsutil#195 receive: ensure callback errors are propagated - tonistiigi/fsutil#196 Fix file transfers from windows to linux fixes moby#4741 - tonistiigi/fsutil#197 recv: translate linkname to wire format Signed-off-by: Anthony Nandaa <profnandaa@gmail.com> (cherry picked from commit 64ea9da)
Full diff: https://github.com/tonistiigi/fsutil/compare/7525a1af2bb5..497d33b Summary changes: - tonistiigi/fsutil#195 receive: ensure callback errors are propagated - tonistiigi/fsutil#196 Fix file transfers from windows to linux fixes moby#4741 - tonistiigi/fsutil#197 recv: translate linkname to wire format Signed-off-by: Anthony Nandaa <profnandaa@gmail.com> (cherry picked from commit 64ea9da)
This PR should fix file transfers from Windows to Linux (as is the case with docker-on-windows), and fixes moby/buildkit#4806 moby/buildkit#4741 (cc @profnandaa)
Also fixes dagger/dagger#6923.
There are two parts, split into separate commits. This also could probably do with some tests 馃槃
Reading unix paths as windows ones
Essentially, this change is wrong (cc @crazy-max): https://github.com/tonistiigi/fsutil/pull/173/files#diff-7dd945edf522324729284de47c3bdb062155cc945eaec235fc564631ae4e91b5
The validator consumes data directly off of the wire, which is guaranteed to be in unix-path form (see moby/buildkit#4806 (review)). Therefore, we should avoid using the platform-specific
filepath
package, and instead use the unix-specificpath
package.Keeping file map keys consistent
The change here https://github.com/tonistiigi/fsutil/pull/173/files#diff-7e712d320828705efdc283be20c75b6afc94b7327362cdd0e4e96fc6c72e645cR114 causes the
receiver.files
map to be indexed inasyncDataFunc
by a windows path, but we explicitly store the data using a unix path.The solution here is to make sure we propagate the correct path downwards, always using unix paths.