
[xrdcp] option to continue after errors? #305

Closed
cshimmin opened this issue Nov 23, 2015 · 4 comments
@cshimmin commented Nov 23, 2015

Hi,
I'm using xrdcp client v4.2.3. Is it possible to continue running in the case of errors (specifically, when running in --recursive mode)? The two scenarios where I find this would be useful are:

Corrupted data

You have 1000+ large files in some directory that you'd like to copy. So you spin up xrdcp in a tmux session and let it go overnight.
If there is an error for some reason (e.g. [ERROR] Received corrupted data) on, say, file number 5, you're hosed (and not very happy in the morning). This is because currently xrdcp just calls it quits and doesn't even try to transfer the remaining 995 files.

Existing data

Sometimes I add new data to a directory somewhere and want to sync it to xrootd, again using the --recursive option. Currently xrdcp will quit at the first file it encounters that already exists at the destination. It would be useful to be able to ask it to simply skip these files and only transfer the new ones.
This could even be handled as a special case of tolerating errors in general, e.g. by adding a --skipexisting option.
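
A minimal shell sketch of the per-file behaviour being asked for, driving plain xrdcp from a loop so that one failure does not stop the rest. The host, paths, and failed.txt log name are made up for illustration, and the destination directory tree is assumed to already exist:

```bash
#!/bin/bash
# Copy files one at a time; log failures and keep going instead of aborting.
SRC=/data/mydataset                              # hypothetical local source
DEST=root://xrootd.example.org//store/mydataset  # hypothetical xrootd destination

find "$SRC" -type f | while read -r f; do
    rel=${f#"$SRC"/}                 # path relative to SRC
    if ! xrdcp "$f" "$DEST/$rel"; then
        echo "$f" >> failed.txt      # remember the failed file and continue
    fi
done
```

Since xrdcp by default refuses to overwrite an existing destination file, an already-present file just makes that one copy fail and the loop moves on, which approximates the --skipexisting behaviour suggested above.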

@abh3 (Member) commented Nov 25, 2015

No, we don't have such an option. I will tag this as a requested enhancement.

@cshimmin (Author)

Note that I found a partial workaround: with the --parallel N option, it works as desired. If a file already exists (or if there's any other error, for that matter), the parallel job slot simply advances to the next job.

However, --parallel 1 seems to be detected by xrdcp as a special case and is run the normal way (without job slots), so it will still halt on any error. Hence this is only a partial workaround, since parallel jobs are sometimes not appropriate (e.g. when disk I/O bound).
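
For reference, the workaround described above would look roughly like this (host and paths are placeholders):

```bash
# --parallel must be >= 2: as noted above, --parallel 1 is treated as the
# ordinary single-job case and still aborts on the first error.
xrdcp --recursive --parallel 2 /data/mydataset root://xrootd.example.org//store/
```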

@simonmichal (Contributor)

Should this be an option or the default behaviour? The current behaviour is a bit inconsistent (with --parallel, xrdcp tries to copy all the files; without it, it aborts after the first failure), so it would be good to standardize.

@simonmichal (Contributor)

Fixed by 92bdf14
