Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid cross-device: copy_file_range changed in Kernel 5.18 #3654

Closed
Xist12gh opened this issue Aug 16, 2022 · 4 comments · Fixed by #3756
Closed

Invalid cross-device: copy_file_range changed in Kernel 5.18 #3654

Xist12gh opened this issue Aug 16, 2022 · 4 comments · Fixed by #3756

Comments

@Xist12gh
Copy link
Contributor

Hit this recently (few weeks ago) with a Debian sid upgrade

Given
Load and Finish folder on different drives (ext4)

Getting
Unable to copy: Unable to read/write: Invalid cross-device link (18) (libtransmission/torrent.cc:2127)

From
copy_file_range in file-posix.cc:500

Seems to stem from recent kernel shenanigans
See qbittorrent/qBittorrent#17352

I took a look at file-posix.cc and noticed that the "Fallback" userspace copy is only entered when the kernel functions are unavailable.
How about replacing the #else in
https://github.com/transmission/transmission/blob/main/libtransmission/file-posix.cc#L519
with the #endif in
https://github.com/transmission/transmission/blob/main/libtransmission/file-posix.cc#L548

Maybe it would be a good idea to restore file_size to its original size when an error occurs in
https://github.com/transmission/transmission/blob/main/libtransmission/file-posix.cc#L508
so that the fallback copies the entire file in case it's not because of EXDEV

The buffer creation in
https://github.com/transmission/transmission/blob/main/libtransmission/file-posix.cc#L523
could be guarded with a file_size >0 check if necessary

Maybe it's smarter to check for EXDEV errno in L508 explicitly? But it will probably fix itself with a newer kernel.

For now I disabled USE_COPY_FILE_RANGE in my build.

@ckerr
Copy link
Member

ckerr commented Aug 17, 2022

@Xist12gh sounds reasonable. Thanks for the good job of describing the issue and context.

Even if this does get fixed in a newer kernel, you just know there will be someone out there with the wrong kernel, so better to fix it in Transmission anyway 😸

Want to make a PR?

@lukts30
Copy link

lukts30 commented Aug 31, 2022

The way I understand this the macro has always been a bit broken e.g if I compiled on a system with kernel >= 5.3 but ran the binary on a system with LTS kernel 4.19 it would have the same Invalid cross-device problem.
Maybe that is expected but usually compiling an application on a newer kernel and using it on an older but not ancient kernel is not unusual.

Either way, the return value of EXDEV is an intentional change and not "kernel shenanigans" that has also been backported (5.15 LTS also contains the patch).
So the "wrong kernel" will be in the majority and I would not expect a change in that regard in newer kernels.

Linus himself stated that reintroducing the return value EXDEV is ok since userspace should have always handled the pre-5.3 case where EXDEV was to be expected.

@Xist12gh
Copy link
Contributor Author

@ckerr Sry haven't gotten around making a PR yet, way busy atm. I'll maybe take a go at it this weekend if still necessary.

Yeah I agree that it should've always be handled by userspace, and it did .. just kinda in a bad way, The fallback is missing entirely here so nothing got copied.

While typing I even contemplated if the use in transmission is even useful. Sure FS coouulld implement a faster way to copy the file, but there are even advantages to not doing so, namely fragmentation when the file was not preallocated or at least fallocated. And in case of zfs that wouldn't even do anything because of CoW. Copying the file "the usual way" may even defragment it, well at least when enough unfragmented space is left to begin with.
So there are arguments for both keeping it as a feature (and handling eventual EXDEVs) or removing it alltogether.
For now I'll say keep it, might come in handy for someone with a supported FS, just handle the return value more gracefully.

Though, as you said, this call looks kinda broken from the get go. Changing a somewhat core dynamic of a call that many times (basically can't do at all to can do to can do maybe), though correct with the reasons clearly stated in the commits, looks kinda wimpy to me... fool me twice, make up your mind etc. Remind you the 5.3 state got into the documentation (taken from man page currently in debian sid):

A major rework of the kernel implementation occurred in 5.3. Areas of the API
that weren't clearly defined were clarified and the API bounds are much more
strictly checked than on earlier kernels. Applications should target the behav‐
iour and requirements of 5.3 kernels.

Now that didn't age too well ey? Took me a while to even figure out when and why it got changed again.
That one got on my blacklist now until further need arises and I didn't even know it existed 14 days ago.

I guess it has always be fine to use for simple copies in the same directory at least.

@lukts30
Copy link

lukts30 commented Aug 31, 2022

That the 5.3 state got into the documentation which implies the absence of EXDEV as a return value is for sure unfortunate.

Personally, I would also consider the reintroduction of EXDEV a violation of the "do not break userspace" rule IF the check for kernel >=5.3 happened at runtime.
But because this check is done at compile time this is already broken on pre-5.3 kernels and therefore I can understand Linus's argumentation.

Xist12gh added a commit to Xist12gh/transmission that referenced this issue Sep 4, 2022
…ssion#3654)

* allow fallback to other copy routines based on errno

* tiered kernel copy routines are tried in runtime now when available
Xist12gh added a commit to Xist12gh/transmission that referenced this issue Sep 5, 2022
…ssion#3654)

* allow fallback to other copy routines based on errno

* tiered kernel copy routines are tried in runtime now when available
ckerr pushed a commit to Xist12gh/transmission that referenced this issue Sep 6, 2022
…ssion#3654)

* allow fallback to other copy routines based on errno

* tiered kernel copy routines are tried in runtime now when available
ckerr pushed a commit to Xist12gh/transmission that referenced this issue Sep 6, 2022
…ssion#3654)

* allow fallback to other copy routines based on errno

* tiered kernel copy routines are tried in runtime now when available
ckerr pushed a commit that referenced this issue Sep 7, 2022
* Invalid cross-device: copy_file_range changed in Kernel 5.18 (#3654)

* allow fallback to other copy routines based on errno

* tiered kernel copy routines are tried in runtime now when available
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants