python bindings: passing a checksum with a File object write #2071
Comments
Hi! I think what you want to look at is https://github.com/xrootd/xrootd/blob/master/bindings/python/libs/client/copyprocess.py#L73 I'm using it this way for uploads: but for a generic check I think that you need HTH,
Ooh thanks for the example. Is there any way to do this with the File object workflow that you know of? I can't use CopyProcess as the data is being streamed and never touches the filesystem.
Also it looks like there's a load of additional kwargs to
I can search for one, but AFAIK there cannot be a concept of checksumming for streaming, as the checksum is relative to the end of the transfer (which is always unknown a priori for streaming). If by any chance you are worried about in-flight integrity, I was told at some point that per-chunk checksumming is part of the protocol, so for streaming you have nothing to do.
So I have a checksum precalculated for the file being streamed, the option to pass in a checksum is to compare to the one that's calculated chunk-wise by xrootd and used to error check – as far as I understand it anyway. I agree if I didn't have a pre-calculated checksum this would not be a useful endeavour. I can do without passing it to xrootd and just query the checksum after the fact, was just wondering if the option is available. Seems like a no.
Hi @jackleland. When you do stream access you cannot have a pre-computed whole-file checksum. I mean, you can compute it at the source (where you can see the whole file), but streaming is just remote access to ranges of bytes. There is no way to pass a pre-computed whole-file checksum when the access not only reads just some ranges of bytes, but those ranges can also be non-contiguous (like reading only some branches of a TTree, so only certain baskets are actually transferred). Also, when streaming, no data is written at the destination, so again there is no concept of a checksum at the destination. I apologize if I'm being redundant or not getting what you are trying to do.
I think we're getting quite distracted by the fact that I'm streaming the file. The crux of it is, if I am making a new file that I am writing to xrootd with the File() object, using the As you mention, for the case where you're not creating a new file it would not make sense, but for my case I believe it does - there's no functional difference between (a) copying a file from a source to a location and (b) writing known data to a new file at a location. The fact I am streaming the data from source as part of (b) is merely because it is too big to guarantee fitting into memory, it is irrelevant to the rest of the argument. If my use case is not general enough that's fine, we can just close the issue. And you're definitely right, reading the source has indeed proved to be more useful in many cases than reading the documentation :)
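For case (b), where the reference checksum is not already at hand, it can be accumulated while the chunks are streamed, so the data never has to fit in memory. The following is only a sketch under stated assumptions: `stream_with_adler32` and `write_chunk` are hypothetical names, adler32 is assumed to be the checksum type the server reports, and the 8-digit lowercase hex formatting is an assumption about how the value will later be compared.

```python
import zlib


def stream_with_adler32(chunks, write_chunk):
    """Write chunks in order while keeping a running adler32 of the data.

    chunks      -- any iterable of bytes objects (e.g. a generator over the source)
    write_chunk -- hypothetical callable taking (data, offset) that performs the write
    Returns (total_bytes_written, checksum_as_8_digit_hex).
    """
    checksum = 1  # adler32 is defined to start from 1
    offset = 0
    for chunk in chunks:
        write_chunk(chunk, offset)
        # Feed the previous value back in so the checksum covers all chunks so far.
        checksum = zlib.adler32(chunk, checksum)
        offset += len(chunk)
    return offset, format(checksum & 0xFFFFFFFF, "08x")
```

With the python bindings, `write_chunk` could be a thin wrapper around the write method of an opened File object; that wiring is left out here because it needs a live server.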
Hello @jackleland I don't believe it's possible with the xroot protocol to supply a file checksum at file close (or open) and request that the server report or ensure that the file at time of close matches that checksum. The python File object options reflect that. For xrdcp, the '-C' option causes the xrdcp application to add a step after the file is created, written and closed. It sends a 'query' command to the destination to request the checksum of the file (so it has to be calculated, or possibly fetched from the file's extended attributes on the server), which is returned to the xrdcp application. The server has to be configured to allow that and to support a given type of checksum algorithm. xrdcp then compares the returned value to whatever reference value it has - it has several different ways of getting the value to match against. If there is a mismatch it reports an error, but xrdcp doesn't take any other action on the new file. One can do a similar query from python like this, e.g. python3:
(selecting the checksum type with 'cks.type' has some subtleties: the server may not use this value, so you have to check the 'type' which is returned in the result). The python program would have to do the compare and take some action if the checksum is wrong. (This is all separate from block-level integrity checking, where for certain operations the client/server can supply the checksum of each individually small block of data being transmitted and expect that the peer will verify them).
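The query-and-compare step described above might look roughly like the sketch below. It is not the snippet from the original comment: the helper names, the example reply format (`type value` with an optional trailing NUL byte) and the decision to raise on a type mismatch are assumptions. The only XRootD calls used are `client.FileSystem(...).query(QueryCode.CHECKSUM, ...)` and the `ok` / `message` attributes of the returned status.

```python
def parse_checksum_response(response):
    """Split a checksum query reply of the form 'type value' into (type, value).

    Accepts bytes or str and drops any trailing NUL byte or whitespace.
    """
    if isinstance(response, bytes):
        response = response.decode("ascii", errors="replace")
    fields = response.rstrip("\x00\n ").split()
    if len(fields) != 2:
        raise ValueError(f"unexpected checksum response: {response!r}")
    return fields[0], fields[1].lower()


def verify_remote_checksum(server_url, path, expected, cks_type="adler32"):
    """Ask the server for the checksum of path and compare it to expected.

    server_url, path and expected are supplied by the caller; nothing here
    is specific to a particular site. Returns True when the values match.
    """
    # Imported here so the parsing helper above works without the bindings installed.
    from XRootD import client
    from XRootD.client.flags import QueryCode

    fs = client.FileSystem(server_url)
    status, response = fs.query(QueryCode.CHECKSUM, f"{path}?cks.type={cks_type}")
    if not status.ok:
        raise RuntimeError(f"checksum query failed: {status.message}")

    got_type, got_value = parse_checksum_response(response)
    # The server may ignore cks.type, so check what it actually returned.
    if got_type != cks_type:
        raise RuntimeError(f"server returned {got_type}, expected {cks_type}")
    return got_value == expected.lower()
```

What to do on a mismatch (delete the file, retry the upload, raise) is left to the calling program, since xrdcp itself only reports the error.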
Cool, so manually doing a query for the checksum and comparing it yourself would be an adequate recreation of the -C option of xrdcp. Good to know, happy to close the issue :)
Just FYI, doing the above I get a trailing "\x00" on the end of vals[1], which I have to strip off.
I think we can close this ticket.
I'm using the python bindings. Is there a way to pass a value for the checksum of a file you are writing with a File() object to the xrd server for corruption validation, as per the -C flag on xrdcp? The only checksum feature I can find documented is the checksumprint kwarg on the CopyProcess.add() method, and this doesn't allow a value to be passed but merely outputs the auto-calculated value.
Apologies if there is a better place to ask these kinds of questions (a discord or slack channel maybe?), I could not find one.