python bindings: passing a checksum with a File object write #2071

Closed
jackleland opened this issue Aug 17, 2023 · 11 comments

Comments

@jackleland

I'm using the Python bindings. Is there a way to pass a checksum value for a file written with a File() object, so that the xrootd server can validate it against corruption, as the -C flag on xrdcp does? The only checksum feature I can find documented is the checksumprint kwarg on the CopyProcess.add() method, and this doesn't allow a value to be passed; it merely outputs the auto-calculated value.

Apologies if there is a better place to ask these kinds of questions (a discord or slack channel maybe?), I could not find one.

@adriansev
Contributor

Hi! I think what you want to look at is https://github.com/xrootd/xrootd/blob/master/bindings/python/libs/client/copyprocess.py#L73

I'm using it this way for uploads:
https://github.com/adriansev/jalien_py/blob/master/alienpy/xrd_core.py#L820
then in the add_job
https://github.com/adriansev/jalien_py/blob/master/alienpy/xrd_core.py#L835

But for a generic check I think you need checkSumMode = 'end2end' and checkSumType = 'auto' with checkSumPreset = ''.
This way a common hash will be negotiated with the server (EOS knows only adler32).

HTH,
Adrian
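
A rough sketch of the settings described above, not a tested recipe. The lower-case keyword names (`checksummode`, `checksumtype`, `checksumpreset`) and the `add_job` method are assumptions based on the linked copyprocess.py and should be checked against the installed bindings; the helper names are hypothetical.

```python
def end2end_checksum_options(checksum_type="auto", preset=""):
    """Build the checksum-related keyword arguments for a copy job.

    The key names are assumed to match CopyProcess.add_job in the
    XRootD Python bindings; verify them against your installed version.
    """
    return {
        "checksummode": "end2end",
        "checksumtype": checksum_type,
        "checksumpreset": preset,
    }


def copy_with_checksum(source, target):
    """Copy source to target with end-to-end checksum verification."""
    # Imported lazily so the helper above works without the bindings.
    from XRootD import client

    process = client.CopyProcess()
    process.add_job(source, target, **end2end_checksum_options())
    process.prepare()
    return process.run()
```

This only applies to the CopyProcess workflow, where both ends of the copy are addressable by URL.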

@jackleland
Author

Ooh thanks for the example. Is there any way to do this with the File object workflow that you know of? I can't use CopyProcess as the data is being streamed and never touches the filesystem

@jackleland
Author

Also it looks like there's a load of additional kwargs to CopyProcess.add() that are not in the documentation

@adriansev
Contributor

> Ooh thanks for the example. Is there any way to do this with the File object workflow that you know of? I can't use CopyProcess as the data is being streamed and never touches the filesystem

I can search for one, but as far as I know there cannot be a concept of checksumming for streaming, as the checksum is only defined at the end of the transfer (which is always unknown a priori for streaming). If you are worried about in-flight integrity, I was told at some point that per-chunk checksumming is part of the protocol, so for streaming you have nothing to do.

@jackleland
Author

So I have a checksum precalculated for the file being streamed. The point of passing in a checksum is to compare it with the one that xrootd calculates chunk-wise and uses for error checking, as far as I understand it anyway. I agree that if I didn't have a precalculated checksum this would not be a useful endeavour.

I can do without passing it to xrootd and just query the checksum after the fact, was just wondering if the option is available. Seems like a no.
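
For the case where the data never sits in memory as a whole, a reference checksum can still be accumulated chunk by chunk. The sketch below uses the standard library's `zlib.adler32`; the function name is hypothetical, and the 8-digit lower-case hex form is an assumption about how the value will later be compared.

```python
import zlib


def adler32_hex(chunks):
    """Return the Adler-32 of an iterable of bytes chunks as 8 hex digits."""
    value = 1  # Adler-32 starting value, same as zlib's default
    for chunk in chunks:
        # Feed the running value back in so the result covers all chunks.
        value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")
```

The result is the same as hashing the concatenated data in one call, so `adler32_hex([b"Wiki", b"pedia"])` equals the Adler-32 of `b"Wikipedia"`.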

@adriansev
Contributor

Hi @jackleland. When you do stream access you cannot have a whole-file precomputed checksum. I mean, you can compute it at the source (where you can see the whole file), but streaming is just remote access to ranges of bytes. There is no way to pass a whole-file precomputed checksum when the access not only reads just some ranges of bytes, but those ranges can also be non-contiguous (like reading only some branches of a TTree, so only certain baskets are actually transferred). Also, when streaming, no data is written at the destination, so again there is no concept of a checksum at the destination. I apologize if I'm being redundant or not getting what you are trying to do.
As for documentation, as in many cases, the best documentation is the source :)
https://github.com/xrootd/xrootd/blob/master/bindings/python/libs/client/file.py#L30
https://github.com/xrootd/xrootd/blob/master/bindings/python/src/PyXRootDFile.hh
https://github.com/xrootd/xrootd/blob/master/src/XrdCl/XrdClFile.hh

@jackleland
Author

I think we're getting quite distracted by the fact that I'm streaming the file.

The crux of it is: if I am creating a new file that I write to xrootd with the File() object, using File.open with OpenFlags.NEW, and I know what the checksum of the file should be, it seems reasonable to be able to specify that checksum so that xrootd can do the same verification it would during an xrdcp command. From the man page: "[the -C flag] obtains the checksum of type (i.e. adler32, crc32, or md5) from the source, computes the checksum at the destination, and verifies that they are the same. If a value is specified, it is used as the source checksum". There is no option to do this at the moment.

As you mention, for the case where you're not creating a new file it would not make sense, but for my case I believe it does - there's no functional difference between (a) copying a file from a source to a location and (b) writing known data to a new file at a location. The fact I am streaming the data from source as part of (b) is merely because it is too big to guarantee fitting into memory, it is irrelevant to the rest of the argument.
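
Case (b) can be sketched as a loop that writes each chunk at the next offset. This is a minimal illustration, assuming `f` is an already opened object whose `write(data, offset=...)` returns a `(status, response)` pair with `status.ok` and `status.message`, as the bindings' File object is understood to do; the function name `write_stream` is hypothetical.

```python
def write_stream(f, chunks):
    """Write chunks back to back into an open file-like object.

    Returns the total number of bytes written, which is also the
    offset at which the next chunk would go.
    """
    offset = 0
    for chunk in chunks:
        # Each write states its own offset; nothing is appended implicitly.
        status, _ = f.write(chunk, offset=offset)
        if not status.ok:
            raise IOError(f"write failed at offset {offset}: {status.message}")
        offset += len(chunk)
    return offset
```

With the bindings, `f` would presumably be a `client.File()` opened with `OpenFlags.NEW` before the call and closed afterwards; only one chunk needs to be in memory at a time.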

If my use case is not general enough that's fine, we can just close the issue.

And you're definitely right, reading the source has indeed proved to be more useful in many cases than reading the documentation :)

@smithdh
Contributor

smithdh commented Aug 21, 2023

Hello @jackleland

I don't believe it's possible with the xroot protocol to supply a file checksum at file close (or open) and request that the server report/ensure that the file at time of close matches that checksum. The python File object options are reflecting that.

For xrdcp, the '-C' option causes the xrdcp application to add a step after the file is created, written and closed. It sends a 'query' command to the destination requesting the checksum of the file (so it has to be calculated, or possibly fetched from the file's extended attributes on the server), which is then returned to the xrdcp application. The server has to be configured to allow that and to support the given checksum algorithm. xrdcp then compares the returned value to whatever reference value it has; it has a few different ways of obtaining that reference value. If the values mismatch it reports an error, but xrdcp takes no other action on the new file. One can do a similar query from python like this, e.g. python3:

from XRootD import client

fs = client.FileSystem("root://the.server.edu:1094")
# Ask the server for the file's checksum; cks.type is only a request.
status, resp = fs.query(client.flags.QueryCode.CHECKSUM, "/filename?cks.type=adler32")

if status.ok:
    # The reply is bytes of the form "<type> <value>".
    resp = str(resp, "utf-8")
    vals = resp.split()
    print('checksum type=' + vals[0], 'checksum value=' + vals[1])

(Selecting the checksum type with 'cks.type' has some subtleties: the server may not use this value, so you have to check the 'type' returned in the result.) The python program would have to do the comparison and take some action if the checksum is wrong.

(This is all separate from block-level integrity checking, where for certain operations the client/server can supply the checksum of each individually small block of data being transmitted and expect that the peer will verify it.)

@jackleland
Author

Cool, so manually querying the checksum and comparing it yourself would be an adequate recreation of the -C option of xrdcp. Good to know, happy to close the issue :)

@jackleland
Author

Just FYI, doing the above I get a trailing "\x00" on the end of vals[1], which I have to strip off.
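
Putting the pieces of this thread together, the client-side comparison might look like the sketch below. It assumes the query reply has the form "<type> <value>" with possible NUL padding, as reported above; the function names are hypothetical, and the case-insensitive comparison is a defensive choice rather than documented behaviour.

```python
def parse_checksum_response(resp):
    """Split a checksum query reply into (type, value).

    Accepts bytes or str; NUL characters are treated as whitespace so a
    trailing "\x00" does not end up in the value.
    """
    if isinstance(resp, bytes):
        resp = resp.decode("utf-8")
    fields = resp.replace("\x00", " ").split()
    if len(fields) < 2:
        raise ValueError(f"unexpected checksum reply: {resp!r}")
    return fields[0], fields[1]


def checksum_matches(resp, expected_type, expected_value):
    """True only if the server used the expected type and the values agree."""
    cks_type, cks_value = parse_checksum_response(resp)
    # The server may ignore cks.type, so the returned type is checked too.
    return (cks_type.lower() == expected_type.lower()
            and cks_value.lower() == expected_value.lower())
```

What to do on a mismatch (retry, delete the file, raise) is left to the caller, since xrdcp itself only reports the error.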

@abh3
Member

abh3 commented Oct 12, 2023

I think we can close this ticket.

@abh3 abh3 closed this as completed Oct 12, 2023