xrdcp with single socket vs parallel sockets vs extreme copy mode #1938

Closed

apeters1971 opened this issue Mar 1, 2023 · 7 comments

@apeters1971 (Contributor)

I did some 100GE benchmarks and I noticed the following when comparing single sockets, parallel sockets, and extreme copy.

The single-stream copy:

[root@node]# time xrdcp -y 1 e.meta4 /dev/null -f 
[9.766GB/9.766GB][100%][==================================================][2.441GB/s]  

real	0m4.887s
user	0m1.523s
sys	0m4.945s

The parallel socket implementation using 10 sockets:


[root@node]# time xrdcp -S 10 e.meta4 /dev/null -f 
[9.766GB/9.766GB][100%][==================================================][1.953GB/s]  

real	0m4.964s
user	0m1.688s
sys	0m4.998s

An extreme copy using 10 named connections to the same xrootd server:

[root@node]# time xrdcp -y 10 e.meta4 /dev/null -f 
[9.766GB/9.766GB][100%][==================================================][9.766GB/s]  

real	0m0.947s
user	0m2.685s
sys	0m7.357s

It is fantastic that I can run a single copy at 10 GB/s, but I have to do some gymnastics to get there, while the easy defaults with a single socket or parallel sockets hit a much lower limit.

Maybe the same switch could also trigger an implicit extreme copy mode, specifying the number of connections, when the source is not a metalink file?
E.g.

xrdcp -y 10 root://server//10G /dev/null 

would do an implicit extreme copy over 10 named connections:

root://1@server//10G 
root://2@server//10G
root://3@server//10G
...
root://10@server//10G

Or fix the -S switch so that it gives the same result?
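
Today the same effect can be scripted; here is a minimal sketch that emulates the proposed implicit mode by generating a metalink4 file with N user-aliased URLs for a plain root:// URL. The helper name extreme_cp and the example URL are placeholders, and it assumes e.meta4 is a plain metalink4 file with user-aliased URLs to the same server:

# Sketch only: emulate the proposed "implicit extreme copy" for a plain URL by
# writing a metalink4 file with N user-aliased URLs to the same server, so
# that "-y N" opens N independent connections. Helper name and URL are
# illustrative, not part of xrdcp.
extreme_cp() {                       # usage: extreme_cp <root-URL> <N> <dest>
  src=$1; n=$2; dst=$3
  tmp=$(mktemp --suffix=.meta4)
  {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<metalink xmlns="urn:ietf:params:xml:ns:metalink">'
    echo " <file name=\"$(basename "$src")\">"
    for i in $(seq 1 "$n"); do
      # root://server//path -> root://<i>@server//path
      echo "  <url>$(echo "$src" | sed "s|root://|root://${i}@|")</url>"
    done
    echo ' </file>'
    echo '</metalink>'
  } > "$tmp"
  xrdcp -y "$n" -f "$tmp" "$dst"
  rm -f "$tmp"
}

# e.g.: extreme_cp root://server//10G 10 /dev/null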

@xrootd-dev commented Mar 1, 2023 via email

@apeters1971 (Contributor, Author)

Thanks, I did that, varying the number of chunks in flight and the chunk size:

1 standard socket:

[root@node]# for chunks in 4 8 16; do for chunksize in 4194304 16777216 67108864 134217728; do echo chunks:$chunks size:$chunksize;  env XRD_CPPARALLELCHUNKS=$chunks XRD_CPCHUNKSIZE=$chunksize time -f%es xrdcp e.meta4 /dev/null -f --nopbar; done; done
chunks:4 size:4194304
5.13s
chunks:4 size:16777216
4.65s
chunks:4 size:67108864
5.03s
chunks:4 size:134217728
5.14s
chunks:8 size:4194304
5.14s
chunks:8 size:16777216
4.56s
chunks:8 size:67108864
4.78s
chunks:8 size:134217728
5.42s
chunks:16 size:4194304
4.55s
chunks:16 size:16777216
4.52s
chunks:16 size:67108864
5.26s
chunks:16 size:134217728
5.54s

4 parallel sockets:

[root@node]# for chunks in 4 8 16; do for chunksize in 4194304 16777216 67108864 134217728; do echo chunks:$chunks size:$chunksize;  env XRD_CPPARALLELCHUNKS=$chunks XRD_CPCHUNKSIZE=$chunksize time -f%es xrdcp -S 4 e.meta4 /dev/null -f --nopbar; done; done
chunks:4 size:4194304
4.55s
chunks:4 size:16777216
3.92s
chunks:4 size:67108864
4.58s
chunks:4 size:134217728
4.87s
chunks:8 size:4194304
4.45s
chunks:8 size:16777216
4.03s
chunks:8 size:67108864
4.78s
chunks:8 size:134217728
5.09s
chunks:16 size:4194304
4.26s
chunks:16 size:16777216
5.25s
chunks:16 size:67108864
5.00s
chunks:16 size:134217728
5.22s
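
A possible follow-up (not run here) would be the same sweep on top of the extreme copy, to check whether chunk tuning still matters once the 10 independent connections are in play:

for chunks in 4 8 16; do for chunksize in 4194304 16777216 67108864; do echo chunks:$chunks size:$chunksize; env XRD_CPPARALLELCHUNKS=$chunks XRD_CPCHUNKSIZE=$chunksize time -f%es xrdcp -y 10 e.meta4 /dev/null -f --nopbar; done; done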

@xrootd-dev commented Mar 1, 2023 via email

@apeters1971 (Contributor, Author)

I tried client/server combinations of 5.5.3 and our build of last week's master branch; the results are the same.
I also tried the defaults, xrootd.async off, and xrootd.async off nosf ... the results are almost always the same. There must be some bottleneck on the server side for multiplexed/parallel connections, because individual connections are so much faster. Granted, when my target can write 'only' 1 GB/s I don't actually see a difference between the three ways to copy, but with NVMes as the target it already makes a difference.
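
For reference, the server-side variants tried correspond to config lines of this form (a sketch of the relevant directive only, the rest of the config omitted):

# async I/O disabled
xrootd.async off
# async I/O disabled and sendfile disabled
xrootd.async off nosf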

@apeters1971 (Contributor, Author)

Isn't it probably just that, for a single connection or parallel sockets, requests are serialized in the server, while independent connections can run in parallel?
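
One way to probe that (a sketch, not something measured in this thread) would be to compare the -S 10 client against 10 fully independent xrdcp processes, each with its own connection; the URL below is a placeholder:

# 10 separate processes = 10 independent connections; note each process copies
# the full file, so the aggregate volume is 10x the file size
time ( for i in $(seq 1 10); do
  xrdcp -f --nopbar root://server//10G /dev/null &
done; wait )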

@xrootd-dev commented Mar 3, 2023 via email

abh3 self-assigned this Mar 8, 2023
abh3 added the Discussion label Mar 8, 2023
@abh3 (Member) commented Oct 12, 2023

Yes, this has been observed by many people and is simply something that naturally happens in these kinds of scenarios. The only exception is that bound sockets should have performed much better than shown here, and we will be reviewing the client-side implementation as a possible source of the performance problem.

abh3 closed this as completed Oct 12, 2023