xrdcp with single socket vs parallel sockets vs extreme copy mode #1938
Likely the largest effect for the 100GE connection is the chunk size.
There simply is not enough data in the pipe to drive that connection. I
would increase the chunk size to the largest possible value and see how
that goes with 1 and up to 4 connections.
Andy
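Andy's suggestion can be scripted with the XrdCl client environment variables (`XRD_CPCHUNKSIZE`, `XRD_CPPARALLELCHUNKS`, the same knobs used later in this thread). A sketch, with the server name `root://server` and the file `//10G` purely illustrative; it prints the commands rather than running them, so drop the leading `echo` to execute for real:

```shell
# Try the largest chunk size from this thread (128 MiB) with 1 to 4 sockets.
# Server/path are placeholders; remove "echo" to actually run the copies.
chunksize=$((128 * 1024 * 1024))
n=0
for sockets in 1 2 3 4; do
  echo "XRD_CPCHUNKSIZE=$chunksize XRD_CPPARALLELCHUNKS=8" \
       "xrdcp -S $sockets root://server//10G /dev/null -f --nopbar"
  n=$((n + 1))
done
```

Comparing wall-clock times across the four runs shows directly whether added sockets help once the chunk size is no longer the bottleneck.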
On Wed, 1 Mar 2023, Andreas-Joachim Peters wrote:
I did some **100GE** benchmarks and I noticed the following when comparing parallel sockets vs extreme copy.
The single-stream copy:
```
***@***.***# time xrdcp -y 1 e.meta4 /dev/null -f
[9.766GB/9.766GB][100%][==================================================][2.441GB/s]
real 0m4.887s
user 0m1.523s
sys 0m4.945s
```
The parallel socket implementation using 10 sockets:
```
***@***.***# time xrdcp -S 10 e.meta4 /dev/null -f
[9.766GB/9.766GB][100%][==================================================][1.953GB/s]
real 0m4.964s
user 0m1.688s
sys 0m4.998s
```
An extreme copy using 10 named connections to the same xrootd server:
```
***@***.***# time xrdcp -y 10 e.meta4 /dev/null -f
[9.766GB/9.766GB][100%][==================================================][9.766GB/s]
real 0m0.947s
user 0m2.685s
sys 0m7.357s
```
It is fantastic that I can run a single copy at 10 GB/s, but I have to do some gymnastics to get there, while the easy defaults with single or parallel sockets have a much lower limit.
Maybe one could use the same switch to enable implicit extreme-copy mode, specifying the number of connections when the source is not a metalink file?
E.g.
```
xrdcp -y 10 root://server//10G /dev/null
```
does an implicit extreme copy over 10 named connections:
```
***@***.***//10G
***@***.***//10G
***@***.***/10G
...
***@***.***//10G
```
Or provide a fix so that the -S switch gives the same result?
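For reference, the same effect can already be reproduced by hand-writing a metalink (`.meta4`) that lists the same file several times under different connection names (the redacted `e.meta4` above presumably looks similar). A minimal sketch, with the server name, user tags, and path all illustrative; the distinct `user@` prefixes are what force distinct physical connections:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="10G">
    <!-- each distinct user@host pair gets its own connection -->
    <url>root://u1@server//10G</url>
    <url>root://u2@server//10G</url>
    <url>root://u3@server//10G</url>
  </file>
</metalink>
```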
Looks like it is marginally faster (I didn't compute the transfer rate).
What version of the server are you using, and is async enabled on the
server?
Andy
On Wed, 1 Mar 2023, Andreas-Joachim Peters wrote:
Thanks, I did that, varying the number of chunks in flight and the chunk size:
**1 standard socket:**
```
***@***.***# for chunks in 4 8 16; do for chunksize in 4194304 16777216 67108864 134217728; do echo chunks:$chunks size:$chunksize; env XRD_CPPARALLELCHUNKS=$chunks XRD_CPCHUNKSIZE=$chunksize time -f%es xrdcp e.meta4 /dev/null -f --nopbar; done; done
chunks:4 size:4194304
5.13s
chunks:4 size:16777216
4.65s
chunks:4 size:67108864
5.03s
chunks:4 size:134217728
5.14s
chunks:8 size:4194304
5.14s
chunks:8 size:16777216
4.56s
chunks:8 size:67108864
4.78s
chunks:8 size:134217728
5.42s
chunks:16 size:4194304
4.55s
chunks:16 size:16777216
4.52s
chunks:16 size:67108864
5.26s
chunks:16 size:134217728
5.54s
```
**4 parallel sockets:**
```
***@***.***# for chunks in 4 8 16; do for chunksize in 4194304 16777216 67108864 134217728; do echo chunks:$chunks size:$chunksize; env XRD_CPPARALLELCHUNKS=$chunks XRD_CPCHUNKSIZE=$chunksize time -f%es xrdcp -S 4 e.meta4 /dev/null -f --nopbar; done; done
chunks:4 size:4194304
4.55s
chunks:4 size:16777216
3.92s
chunks:4 size:67108864
4.58s
chunks:4 size:134217728
4.87s
chunks:8 size:4194304
4.45s
chunks:8 size:16777216
4.03s
chunks:8 size:67108864
4.78s
chunks:8 size:134217728
5.09s
chunks:16 size:4194304
4.26s
chunks:16 size:16777216
5.25s
chunks:16 size:67108864
5.00s
chunks:16 size:134217728
5.22s
```
I tried client/server combinations from 5.5.3 and our build of the master branch from last week. The results are the same.
Isn't it probably just that, for a single/parallel connection, requests are serialized in the server, while for independent connections they can run in parallel?
Well, you may be correct in the way that parallel streams are dispatched.
Indeed, they are not dispatched asynchronously. They are put into a queue
and run over sequentially. The reason is that this is the only way we have
to prevent a client from overwhelming a server. Believe me, clients (i.e.
users) really have a habit of doing just that to get their work done
ASAP. So, yes, multiple connections will be more attuned to what you want,
but in the end it's not what you want: you really want the I/O to be
fairly shared, and you don't get that with multiple connections. Plus,
parallel connections allow you to opt out of TLS while the main connection
does not. So, choose your poison.
Andy
Yes, this has been observed by many people and is simply something that naturally happens in these kinds of scenarios. The only exception is that bound sockets should have performed much better than shown here, and we will be reviewing the client-side implementation as a possible source of the performance problem.