Update xrdcp to emit a warning if the server tries to deliver too much data #1492

Closed
paulmillar opened this issue Aug 9, 2021 · 9 comments

@paulmillar
Contributor

When asked to copy a file from a remote server, the xrdcp client first does a "stat" to establish the file's size (see this comment). If the server tries to deliver too much data, the client silently ignores the excess. The transfer then completes successfully with no indication of the problem.

If the remote server is delivering too much data then it is broken. By limiting the downloaded data to the expected file size, xrdcp is making an assumption about the failure mode; specifically, that data delivered within the expected byte-range is correct and that data outside of this range may be rejected.

While this failure mode is quite likely, it isn't guaranteed. For example, the final (short) frame may be built by inserting data at the wrong offset, so the final frame would contain some incorrect data followed by the true data. This would result in data corruption if the transferred data were simply cropped to the expected size.

The current behaviour is (probably) the best choice; however, the user should be warned that the remote endpoint is broken and tried to deliver too much data. The user may then take extra steps to verify the data integrity (e.g., comparing a checksum against a known value).
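
To illustrate the integrity check mentioned above, here is a minimal Python sketch that computes a checksum of the downloaded file and compares it with a known value. The choice of Adler-32 and the function names `adler32_of_file` and `matches_known_checksum` are assumptions made for this example; the thread does not say which checksum a site would use or where the known value comes from.

```python
import zlib


def adler32_of_file(path, block_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as 8 lowercase hex digits."""
    checksum = 1  # Adler-32 starts from 1
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            # Feed the running value back in so the file is checksummed incrementally.
            checksum = zlib.adler32(block, checksum)
    return f"{checksum & 0xFFFFFFFF:08x}"


def matches_known_checksum(path, known_hex):
    """Compare the file's checksum with a known value (hypothetical helper)."""
    return adler32_of_file(path) == known_hex.strip().lower()
```

A mismatch would indicate that the data within the expected byte range is itself wrong, which simple cropping cannot detect.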

@simonmichal
Contributor

simonmichal commented Aug 10, 2021

Hi @paulmillar, thanks for your comment!

In practice there's no way for the client to know whether the source would send extra data beyond the file size reported by the stat operation, because, if necessary, the client trims the last chunk so that the total number of bytes read equals the file size.

For example, assume we are copying a 12MB file using the default chunk size of 8MB. The client will read a first data chunk of 8MB and a second data chunk of 4MB (12MB in total). The server won't send any extra data (past the end of file) because the client asks for only 12MB.

If the file size is expected to change (which in most cases it is not), the -Z | --dynamic-src option allows reading the file until EOF is reached.
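
The chunk arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the idea of trimming the final request, not xrdcp's actual code; the function name `plan_read_requests` is hypothetical.

```python
MB = 1024 * 1024


def plan_read_requests(file_size, chunk_size=8 * MB):
    """Return (offset, length) pairs that together cover exactly file_size bytes."""
    requests = []
    offset = 0
    while offset < file_size:
        # The final request is trimmed so that nothing past the
        # size reported by stat is ever asked for.
        length = min(chunk_size, file_size - offset)
        requests.append((offset, length))
        offset += length
    return requests


# 12MB file, 8MB chunks: one 8MB request followed by one 4MB request.
print([(offset // MB, length // MB) for offset, length in plan_read_requests(12 * MB)])
# [(0, 8), (8, 4)]
```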

@paulmillar
Contributor Author

Hi @simonmichal ,

To be a bit more specific, we suspect that the RAL server (under some currently unclear conditions) responds to the final read request with a kXR_oksofar status and then delivers an additional response. This causes the dCache embedded client to accept data that is too long; however, because xrdcp truncates the file to the expected size, a file of the correct size is delivered to the local filesystem.

Take your example, but with a 9MB file and the server replying in 2MB chunks. The client requests 4MB at offset 0. The server delivers 4MB of data but split into two chunks, each 2MB in size. The first has the status kXR_oksofar while the second is kXR_ok.

The second read request (at offset 4MB) shows the same behaviour: two responses, each 2MB in size with kXR_oksofar and kXR_ok respectively.

The third read request (at offset 8MB) is for 4MB. The server replies with the remaining 1MB (as expected) but with the status kXR_oksofar, indicating that further data will be delivered. It then replies with an additional response which is 2MB with the status kXR_ok.

In total, the server tried to deliver 11MB of data, despite the file being 9MB in size.
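
The 11MB total can be checked by tallying the responses described above. The sketch below models the exchange as plain Python data; the structure and names (`exchange`, `total_delivered`) are made up for illustration and are not part of any XRootD API.

```python
MB = 1024 * 1024
FILE_SIZE = 9 * MB

# One entry per read request: (offset, requested length, [(status, payload length), ...])
exchange = [
    (0 * MB, 4 * MB, [("kXR_oksofar", 2 * MB), ("kXR_ok", 2 * MB)]),
    (4 * MB, 4 * MB, [("kXR_oksofar", 2 * MB), ("kXR_ok", 2 * MB)]),
    # The final request: 1MB marked kXR_oksofar, then an unexpected extra 2MB.
    (8 * MB, 4 * MB, [("kXR_oksofar", 1 * MB), ("kXR_ok", 2 * MB)]),
]


def total_delivered(exchange):
    """Sum the payload lengths of every response in the exchange."""
    return sum(
        payload_length
        for _offset, _requested, responses in exchange
        for _status, payload_length in responses
    )


print(total_delivered(exchange) // MB)                # 11
print((total_delivered(exchange) - FILE_SIZE) // MB)  # 2
```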

I believe that, when xrdcp sees this behaviour from the server, it will write a file of the expected file size. It will also not issue a warning to the user that the server tried to deliver too much data.

I think it is fine for xrdcp to truncate the file; however, I also think that xrdcp should emit a warning if the server includes the kXR_oksofar status when sending what should be the final response and/or otherwise tries to deliver too much data.
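
As a rough sketch of the kind of check being proposed, the Python function below inspects the responses to a single read request and reports both conditions: a kXR_oksofar status on a response that already reaches the expected end of file, and any data past that point. The function name and the (status, length) representation are assumptions for this example; this is not how xrdcp is implemented.

```python
MB = 1024 * 1024


def check_read_responses(file_size, offset, responses):
    """Return warning strings for one read request.

    responses is a list of (status, payload_length) tuples in arrival order.
    """
    warnings = []
    position = offset
    for index, (status, payload_length) in enumerate(responses):
        position += payload_length
        if position > file_size:
            excess = position - file_size
            warnings.append(
                f"response {index}: {excess} bytes past the expected end of file"
            )
        elif position == file_size and status == "kXR_oksofar":
            warnings.append(
                f"response {index}: reaches the expected end of file "
                "but is marked kXR_oksofar"
            )
    return warnings


# The third request from the example: 9MB file, read at offset 8MB.
for warning in check_read_responses(
    9 * MB, 8 * MB, [("kXR_oksofar", 1 * MB), ("kXR_ok", 2 * MB)]
):
    print("warning:", warning)
```

For the first two requests in the example the function returns an empty list, since neither reaches the end of the file.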

@simonmichal
Contributor

Hi @paulmillar

Sticking to your example, in the case of xrdcp the third read request won't be for 4MB. Since the file is 9MB in size and we have already requested 8MB (2x 4MB chunks), xrdcp will trim the size of the third chunk to 1MB. What I mean here is that xrdcp will send a request for 1MB of data (and not trim the data it received from the server). That's why xrdcp never sees the extra data that RAL/Ceph is sending.

@abh3
Member

abh3 commented Aug 10, 2021 via email

@paulmillar
Contributor Author

Hi @abh3,

To be honest, I think what I'm after is something a little higher-level: if a transfer fails (or succeeds with a warning) with dCache's embedded client, then xrdcp should show similar behaviour. This is for two reasons:

  1. It gives a clearer impression that the problem is with the endpoint, rather than dCache.
  2. It gives a site (e.g., RAL) an easier way to test their endpoint (when trying to fix the problem).

However, I'm now wondering if we should simply adopt the xrdcp strategy in dCache's embedded client: ensure that we know the file's size and only request the outstanding data for the final read.

Currently the dCache embedded client always makes fixed size read requests and stops once it has received enough bytes. This should work, but it relies on the server correctly delivering fewer bytes than were requested in the final read request -- which is (apparently) problematic.

Perhaps there's an issue of making it easy for a site to test and diagnose problems with their endpoint (particularly if they are developing their own storage plugin). That might be addressed with some set of conformance tests that a site could run. One of those tests could try making repeated reads with a fixed (perhaps configurable) size and checking the server yields the expected number of bytes. More tests could be added when other problems surface.
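
One such conformance test might look like the following Python sketch. It assumes a hypothetical callable `read_at(offset, length)` that performs a single read against the endpoint and returns all the bytes the server delivered for it; how that callable talks to the server is deliberately left out.

```python
def check_fixed_size_reads(read_at, file_size, request_size):
    """Issue fixed-size reads and compare delivered byte counts with expectations.

    Returns a list of (offset, expected_bytes, received_bytes) for every
    request where the server delivered the wrong amount of data.
    """
    failures = []
    offset = 0
    while offset < file_size:
        # A correct server delivers the full request, or only the
        # remaining bytes when the request runs past the end of file.
        expected = min(request_size, file_size - offset)
        received = len(read_at(offset, request_size))
        if received != expected:
            failures.append((offset, expected, received))
        offset += request_size
    return failures


# Usage with an in-memory stand-in for a well-behaved server.
data = bytes(9)
print(check_fixed_size_reads(lambda offset, length: data[offset:offset + length], 9, 4))
# []
```

The request size is a parameter because the problem described here only shows up when the final request asks for more than the file has left.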

I'll talk with Al and see how he feels about updating the dCache embedded client so it limits the size of the final request.

@simonmichal
Contributor

@paulmillar : IMHO you guys will need to adopt the xrdcp behavior. AFAIK the problem RAL has is due to a bug in the Ceph rados striper, and fixing something that has been pushed to the Ceph codebase (like the rados striper) is difficult and time-consuming.

@simonmichal
Contributor

@paulmillar : can we close this one? As discussed before, xrdcp, due to its design, will never receive additional data, and I would rather not change that.

@simonmichal simonmichal self-assigned this Sep 2, 2021
@paulmillar
Contributor Author

Hi guys,

Sorry for the delay in getting back.

Yes, we can close this ticket.

After a (short) dCache-internal discussion we decided to follow the same strategy as xrdcp: ensure we know the file's expected size (which we already do) and truncate the final read to the expected size.

@paulmillar
Contributor Author

... and thanks again for all your help in resolving this issue.

@paulmillar paulmillar reopened this Sep 2, 2021
3 participants