metalink :: cksum should parse hash declaration #1172

Closed
adriansev opened this issue Apr 1, 2020 · 9 comments

Comments

@adriansev
Contributor

Hi! It would be great if cksum, for both the Python bindings and xrdcp, could parse the hash field of the metalink. I think the behaviour could be like this: if no type is specified, just use the first hash field; if a type is specified, use the specified hash (if it does not exist, throw an error about the wrong hash type). I think that if cksum is enabled but no hash is found in the metalink, it would be enough to print a warning that "no hash found, checking cannot be done".
Thank you!
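
The selection rule proposed above can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not the XRootD implementation: the function name `pick_hash`, the sample document, and its hash values are made up, and the sketch assumes a Metalink 4 (RFC 5854) file and looks only at the first `<file>` entry.

```python
import warnings
import xml.etree.ElementTree as ET

# Metalink 4 (RFC 5854) XML namespace.
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

# Made-up sample; the hash values are placeholders, not real checksums.
SAMPLE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="file.txt">
    <hash type="md5">0123456789abcdef0123456789abcdef</hash>
    <hash type="adler32">0a1b2c3d</hash>
    <url>root://example.org//path/file.zip</url>
  </file>
</metalink>"""


def pick_hash(metalink_xml, wanted_type=None):
    """Return (type, value) for the first <file> entry, or None if it has no hash."""
    root = ET.fromstring(metalink_xml)
    file_el = root.find("ml:file", NS)
    hash_els = file_el.findall("ml:hash", NS) if file_el is not None else []
    hashes = [(h.get("type", "").lower(), (h.text or "").strip()) for h in hash_els]

    if not hashes:
        # Checksumming requested but nothing to compare against: warn only.
        warnings.warn("no hash found, checking cannot be done")
        return None
    if wanted_type is None:
        # No type requested: use the first hash declared in the metalink.
        return hashes[0]
    for hash_type, value in hashes:
        if hash_type == wanted_type.lower():
            return hash_type, value
    # A type was requested but the metalink does not declare it.
    raise ValueError(f"hash type {wanted_type!r} not declared in metalink")
```

For example, `pick_hash(SAMPLE)` picks the md5 entry because it is declared first, while `pick_hash(SAMPLE, "sha-1")` raises `ValueError`.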

@simonmichal
Contributor

@adriansev : I pushed today: 769182c 7c5e272 eb0e7be

This should allow you to use the checksum present in a metalink while extracting a file from a ZIP archive.

From the command line (assuming file.meta4 points to a ZIP archive containing file.txt and an md5 checksum for the file):

  • xrdcp --cksum md5 -f --zip=file.txt --zip-mtln-cksum file.meta4 .
    or with an environment variable
  • XRD_ZIPMTLNCKSUM=1 xrdcp --cksum md5 -f --zip=file.txt file.meta4 .
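
For scripts that drive xrdcp from Python, the two invocations above could be assembled as follows. This is a minimal sketch: the helper name `xrdcp_zip_command` and its parameters are hypothetical, and only the flags and the `XRD_ZIPMTLNCKSUM` variable quoted above are used.

```python
import os


def xrdcp_zip_command(metalink, member, cksum_type="md5", target=".", use_env=False):
    """Build (argv, env) matching the two xrdcp invocations quoted above."""
    argv = ["xrdcp", "--cksum", cksum_type, "-f", f"--zip={member}"]
    env = dict(os.environ)
    if use_env:
        # Variant 2: enable the feature through the environment.
        env["XRD_ZIPMTLNCKSUM"] = "1"
    else:
        # Variant 1: enable the feature with the command-line switch.
        argv.append("--zip-mtln-cksum")
    argv += [metalink, target]
    return argv, env
```

The result can be handed to `subprocess.run(argv, env=env)`; the return code then tells whether the copy (including the checksum) succeeded.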

You can also enable the functionality from Python by setting ZipMtlnCksum in the environment, e.g.:
EnvPutInt( 'ZipMtlnCksum', 1 )
for more details see: https://github.com/xrootd/xrootd/blob/master/bindings/python/libs/client/env.py#L41

Could you give it a try?

@adriansev
Contributor Author

@simonmichal well, it seems that it is not doing anything ...
xrd_log.txt
I attached the dump. I also checked the status of the variable:

XRD_LOGLEVEL=Dump XRD_LOGFILE="xrd_log.txt" j cp -cksum /alice/cern.ch/user/a/alitrain/PWGCF/CF_pPb/3221_20200319-0029/merge/AnalysisResults.root file:
ZipMtlnCksum = 1
jobID: 1/1 >>> Start
jobID: 1/1 >>> ERRNO/CODE/XRDSTAT 0/0/0 >>> STATUS OK >>> SPEED 56.67 MiB/s MESSAGE: [SUCCESS]

XRootD version: v20200403-eb0e7bea7

@adriansev
Contributor Author

Also, what will happen when I have multiple files in the copy queue, some from ZIPs and others not?

@simonmichal
Contributor

simonmichal commented Apr 3, 2020

@adriansev : well, you still need to ask the copy job to do the checksum:

[2020-04-03 22:00:39.992457 +0300][Dump ][Utility ] Adding job with properties: 'checkSumMode' = 'none', 'checkSumPreset' = '', 'checkSumType' = '', 'chunkSize' = '8388608', 'coerce' = '0', 'dynamicSource' = '0', 'force' = '0', 'initTimeout' = '600', 'makeDir' = '1', 'parallelChunks' = '4', 'posc' = '1', 'preserveXAttr' = '0', 'source' = '/home/adrian/tmp/f26e8b0c-c2c1-5b2e-809d-f703ba4f61f6.meta4?xrdcl.unzip=AnalysisResults.root', 'target' = '/home.hdd/adrian/work-GRID/jalien_py/md5_test/AnalysisResults.root', 'thirdParty' = 'none', 'tpcTimeout' = '1800', 'xcp' = '0', 'xcpBlockSize' = '134217728', 'xrate' = '0', 'zipArchive' = '0'

please set checkSumMode to end2end and checkSumType to md5 (I suppose ;-)

Maybe it would be a good idea to first run a test with plain xrdcp.

To explain the mechanics in more detail: if you want the client to do the checksumming, you always need to give it the checkSumMode and checkSumType. For normal files (not extracted from ZIP), using the checksum from a metalink was already supported.
Files extracted from ZIP are special because one would expect the checksum in the metalink to refer to the whole ZIP archive and not to the extracted file. That's why by default we say that checksumming for those files is not supported (except zcrc32, but that's a different story).
The ZipMtlnCksum setting allows you to override the default behaviour and assume that the checksum provided in the metalink is actually for the extracted file (which I believe is true in your case). I hope it's clearer now.

To answer your second question: the ZipMtlnCksum has no effect on normal files (not extracted from ZIP archives).
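
Put together for the Python bindings, the settings described above might look like the sketch below. The helper names are hypothetical, and the `checksummode` / `checksumtype` keyword names for `CopyProcess.add_job` are an assumption based on the bindings' documented interface; please check them against the installed version. The `?xrdcl.unzip=` suffix follows the `source` property shown in the log line above.

```python
def checksum_job_options(cksum_type="md5"):
    """Keyword arguments asking the copy job for an end-to-end checksum."""
    return {"checksummode": "end2end", "checksumtype": cksum_type}


def copy_from_zip_metalink(source, target, cksum_type="md5"):
    """Copy one ZIP member via a metalink, verifying it against the metalink hash."""
    # Imported lazily so the helper above works without XRootD installed.
    from XRootD.client import CopyProcess
    from XRootD.client.env import EnvPutInt

    # Treat the metalink checksum as belonging to the extracted file.
    EnvPutInt("ZipMtlnCksum", 1)

    process = CopyProcess()
    # Assumed keyword names; without them the job runs with checkSumMode 'none'.
    process.add_job(source, target, **checksum_job_options(cksum_type))
    process.prepare()
    # The returned status is what reports a checksum error, if any.
    return process.run()
```

A call would then look like `copy_from_zip_metalink("file.meta4?xrdcl.unzip=file.txt", "/tmp/file.txt")`, with both paths being placeholders.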

@adriansev
Contributor Author

@simonmichal sorry about that, I keep expecting the software to read my mind :)
So, it seems to work, but there is no positive message like "check successful" or similar, so I can only assume that the check is successful.
xrd_log.txt
I see that even if the check fails, the file is kept. Is that intentional?
Thank you!!!

@simonmichal
Contributor

@adriansev : that's correct, there's no cleanup functionality. However, in the case of xrdcp the return code will indicate that there was a failure, and in the case of CopyProcess the returned status will indicate that there was a checksum error.

I'll add more verbose logs so it's clear what's happening.

Regarding the automatic cleanup on failure, if you're interested in this kind of functionality please create a separate issue and indicate whether you would like this to be included in the next release.
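
Until such a feature exists, a caller can do the cleanup itself by acting on the return code. The wrapper below is a minimal sketch and not part of XRootD: the name `run_and_cleanup` is hypothetical, and it assumes, as stated above, that the copy command exits with a non-zero code when the checksum fails.

```python
import os
import subprocess


def run_and_cleanup(argv, target):
    """Run a copy command; delete the target file if the command reports failure."""
    result = subprocess.run(argv)
    if result.returncode != 0 and os.path.isfile(target):
        # The copy (or its checksum) failed: do not keep a possibly corrupt file.
        os.remove(target)
    return result.returncode
```

Note that this removes the target on any failure, not only on checksum errors, which may or may not be what a given workflow wants.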

@adriansev
Contributor Author

@simonmichal Thanks!!! I added the issue about deleting the target if the checksum fails: #1173

@simonmichal
Contributor

@adriansev : I suppose we can close this one?

@adriansev
Contributor Author

@simonmichal sure, thanks a lot!!!
