Update xrdcp to emit a warning if the server tries to deliver too much data #1492
Comments
Hi @paulmillar, thanks for your comment! In practice there's no way for the client to know whether the source would send extra data beyond the file size reported by the stat operation because, if necessary, the client trims the last chunk so that the total number of bytes read equals the file size. For example, let us assume we are copying a 12MB file and we are using the default chunk size of 8MB. The client will read a first data chunk of 8MB and a second data chunk of 4MB (12MB in total). The server won't send any extra data (past the end of the file) because the client asks for only 12MB. If the file size is expected to change (which in most cases it is not), there is the |
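The chunk trimming described in the comment above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not xrdcp's actual code; the name `plan_reads` and its parameters are hypothetical, and the 8MB default simply mirrors the chunk size mentioned in the comment.

```python
MB = 1024 * 1024


def plan_reads(file_size, chunk_size=8 * MB):
    """Return the (offset, length) read requests for a file of known size.

    The final request is trimmed so that the lengths sum to file_size,
    which is why the client never asks for bytes past the end of the file.
    """
    requests = []
    offset = 0
    while offset < file_size:
        # Never request more than what is left according to the stat result.
        length = min(chunk_size, file_size - offset)
        requests.append((offset, length))
        offset += length
    return requests
```

For the 12MB example, `plan_reads(12 * MB)` gives one 8MB request at offset 0 and one 4MB request at offset 8MB. For a 9MB file with 4MB chunks, the third request is for 1MB only.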
Hi @simonmichal , To be a bit more specific, we have the suspicion that the RAL server (under some currently unclear conditions) will respond to the final read request with an Using your example but targeting a 9MB file and with the server replying with 2MB chunks. The client requests 4MB at offset 0. The server delivers 4MB of data but split into two chunks, each 2MB in size. The first has the status The second read request (at offset 4MB) shows the same behaviour: two responses, each 2MB in size with The third read request (at offset 8MB) is for 4MB. The server replies with the remaining 1MB (as expected) but with the status In total, the server tried to deliver 11MB of data, despite the file being 9MB in size. I believe that, when I think it is fine for |
Hi @paulmillar Sticking to your example, in case of |
I think what Paul is asking for is for xrdcp to issue full-sized reads and
trim after the fact so that it can issue a warning. Right Paul?
Andy
…On Tue, 10 Aug 2021, simonmichal wrote:
Hi @paulmillar
Sticking to your example, in case of `xrdcp` the third read request won't be for 4MB. Since the file is 9MB big and we already requested 8MB (2x 4MB chunk), `xrdcp` will trim the size of the 3rd chunk to 1MB. What I mean here is that `xrdcp` will send a request for 1MB of data (and not trim the data it received from the server). That's why, `xrdcp` never sees the extra data that RAL/Ceph is sending.
|
Hi @abh3, To be honest, I think what I'm after is something a little higher-level: if a transfer fails (or succeeds with a warning) with dCache's embedded client then there is a similar behaviour with
However, I'm now wondering if we should simply adopt the Currently the dCache embedded client always makes fixed size read requests and stops once it has received enough bytes. This should work, but it relies on the server correctly delivering fewer bytes than were requested in the final read request -- which is (apparently) problematic. Perhaps there's an issue of making it easy for a site to test and diagnose problems with their endpoint (particularly if they are developing their own storage plugin). That might be addressed with some set of conformance tests that a site could run. One of those tests could try making repeated reads with a fixed (perhaps configurable) size and checking the server yields the expected number of bytes. More tests could be added when other problems surface. I'll talk with Al and see how he feels about updating the dCache embedded client so it limits the size of the final request. |
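A conformance test of the kind suggested above might look roughly like the following sketch. It assumes a hypothetical `read_at(offset, length)` callable standing in for a real client read (it is not an XRootD or dCache API), and `make_fake_server` is an in-memory stand-in used only to exercise the check.

```python
def check_fixed_size_reads(read_at, file_size, request_size):
    """Issue fixed-size reads and compare the byte count of each reply.

    read_at(offset, length) is a hypothetical callable that returns the
    bytes the server delivered for one read request.
    Returns a list of problem descriptions (empty if all reads matched).
    """
    problems = []
    offset = 0
    while offset < file_size:
        expected = min(request_size, file_size - offset)
        # Deliberately NOT trimmed: the final request may run past EOF.
        data = read_at(offset, request_size)
        if len(data) != expected:
            problems.append(
                f"offset {offset}: expected {expected} bytes, got {len(data)}"
            )
        offset += request_size
    return problems


def make_fake_server(content, extra=b""):
    """Build an in-memory read_at; `extra` simulates over-delivery at EOF."""
    def read_at(offset, length):
        data = content[offset:offset + length]
        if offset + length > len(content):
            data += extra  # broken endpoint: bytes past the end of file
        return data
    return read_at
```

With a 9-byte file and 4-byte requests, a well-behaved fake server yields no problems, while `make_fake_server(content, extra=b"xx")` is reported at offset 8. The request size is a parameter, in line with the "perhaps configurable" size mentioned above.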
@paulmillar : IMHO you guys will need to adopt the |
@paulmillar : can we close this one? As discussed before, |
Hi guys, Sorry for the delay in getting back. Yes, we can close this ticket. After a (short) dCache-internal discussion we decided to follow the same strategy as xrdcp: ensure we know the file's expected size (which we already do) and truncate the final read to the expected size. |
... and thanks again for all your help in resolving this issue. |
When asked to copy a file from a remote server, the `xrdcp` client first does a "stat" to establish the file's size (see this comment). If the server tries to deliver too much data, the client will silently ignore any excess data. The transfer will complete successfully with no indication of the problem.
If the remote server is delivering too much data then it is broken. By limiting the downloaded data to the expected file size, `xrdcp` is making an assumption about the failure mode; specifically, that data delivered within the expected byte-range is correct and that data outside of this range may be rejected.
While this failure mode is quite likely, it isn't guaranteed. For example, the final (short) frame may be built by inserting data at the wrong offset, so the final frame would contain some incorrect data followed by the true data. This would result in data corruption if the transferred data were simply cropped to the expected size.
The current behaviour is (probably) the best choice; however, the user should be warned that the remote endpoint is broken and tried to deliver too much data. The user may then take extra steps to verify the data integrity (e.g., comparing the checksum against a known value).
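As a rough illustration of the requested behaviour, the sketch below crops excess data, emits a warning, and then checks the result against a known checksum. It is not xrdcp code: `read_at(offset, length)` is a hypothetical stand-in for a client read, full-size requests are issued on purpose so that any excess becomes visible, and Adler-32 is chosen only as an example checksum.

```python
import warnings
import zlib


def copy_with_excess_warning(read_at, expected_size, chunk_size):
    """Copy expected_size bytes, warning if the server over-delivers.

    read_at(offset, length) is a hypothetical callable returning the bytes
    the server delivered for one read request.
    """
    received = bytearray()
    excess = 0
    offset = 0
    while offset < expected_size:
        data = read_at(offset, chunk_size)
        if not data:
            raise IOError(f"no data returned at offset {offset}")
        remaining = expected_size - offset
        if len(data) > remaining:
            excess += len(data) - remaining
            data = data[:remaining]  # crop to the expected size
        received += data
        offset += len(data)
    if excess:
        warnings.warn(
            f"server delivered {excess} byte(s) beyond the expected size; "
            "verify the checksum of the copy"
        )
    return bytes(received)


def adler32_matches(data, known_checksum):
    """Compare the Adler-32 of data against a known value."""
    return (zlib.adler32(data) & 0xFFFFFFFF) == known_checksum
```

The checksum step matters because cropping only helps if the bytes inside the expected range are correct. In the wrong-offset scenario described above, the cropped copy would have the right length but the wrong content, and only the checksum comparison would reveal it.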