New directive request for http internal buffer sizes #1532
Comments
Hi,
I think that the places that may matter for what you mention are just the first two, i.e. inside XrdTPC (the other places are just related to the request header). Maybe Brian can comment on these two inside XrdTPC? Maybe just one would need to be changed.
Cheers
Fabrizio
(In reply to snafus's original message of 12/10/2021, 13:12 -- the issue text, reproduced in full at the end of this thread.)
Hi James -- I think this sounds like a great idea.
For TPC, you found the correct location for increasing the buffer size. Note `TPCHandler::m_block_size` is the size of the HTTP request generated when HTTP-TPC is doing multistreams; the buffer must be a multiple of this block size (otherwise you may hit deadlocks). My recommendation is to make it no smaller than 16MB.
I never did much exploration into optimal block sizes; without it, `libcurl` would generate extremely small writes (16KB or smaller, depending on how often you poll the TCP socket) and the Ceph RADOS integration would outright fail with unaligned writes. So, I'd welcome your work!
The hard decision point is the buffering in XrdHttp. Right now, the size of the buffer for PUT appears to simply be whatever comes off the socket. You _may_ put the buffer in the XrdHttp layer ... or it might make more sense at the filesystem layer. We already "paid the price" in XrdTpc in terms of code complexity, but it's not immediately clear the same decision makes sense for XrdHttp.
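(A minimal sketch of the multiple-of-block-size constraint described above, purely illustrative: only `TPCHandler::m_block_size` is an actual XRootD name; the helper below is hypothetical.)

```cpp
#include <cstddef>

// The multistream buffer must be a whole multiple of the TPC block size
// (TPCHandler::m_block_size), otherwise multistream transfers may deadlock.
// round_up_to_block() is a hypothetical helper showing how a configurable
// buffer size could be snapped to that invariant.
static size_t round_up_to_block(size_t requested, size_t block_size) {
    return ((requested + block_size - 1) / block_size) * block_size;
}

// e.g. with the recommended minimum 16 MiB block size:
//   round_up_to_block(20u << 20, 16u << 20) == 32 MiB
```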
It's not clear what is meant by "filesystem layer". I do not like mucking around with a generic interface simply to satisfy one protocol in a way that may not satisfy another. So, I'd like to keep the complexity level to a minimum. We have a hard enough time with http as it is.
Hi,
AFAIU (Brian) you are proposing a buffering layer that accumulates sequential writes before flushing to the fs layer. I think that the behaviour of such a buffer would be dependent on the needs of the underlying fs implementation, and in this case (correct me please if I am wrong) the underlying implementation (and the only one that would need it) is XrdCeph. So I would implement it in XrdCeph as a private write buffer...
f
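(To make the suggestion concrete: a minimal sketch of such a private write buffer, under the assumption of mostly-sequential PUT traffic. All names here are hypothetical, not actual XrdCeph code.)

```cpp
#include <sys/types.h>  // off_t
#include <cstddef>
#include <vector>

// Hypothetical write coalescer: accumulate sequential writes and flush
// them to the underlying store in large, aligned chunks.
class WriteCoalescer {
    std::vector<char> buf_;
    off_t base_ = 0;      // file offset corresponding to buf_[0]
    size_t flush_size_;   // e.g. a multiple of the RADOS stripe size
public:
    explicit WriteCoalescer(size_t flush_size) : flush_size_(flush_size) {
        buf_.reserve(flush_size);
    }
    // Returns false for a non-sequential write; the caller should flush
    // the pending buffer first (or fall through to a direct write).
    bool append(off_t off, const char *data, size_t len) {
        if (!buf_.empty() && off != base_ + static_cast<off_t>(buf_.size()))
            return false;
        if (buf_.empty()) base_ = off;
        buf_.insert(buf_.end(), data, data + len);
        return true;
    }
    bool needs_flush() const { return buf_.size() >= flush_size_; }
    // flush() would hand buf_ to the store in one large write, then clear it.
};
```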
If you plan to do that you really need to coordinate with RAL, as they are the current maintainers of that plugin. I don't think it would be wise to change it without going through it with them, as they have been making active changes.
I see the points. Trying to find the best solution.
Well, the issue is that the buffers are fixed for TPC and that is causing them some issues. I can see that, but @bbockelm would need to address it, as that part of the code is rather complicated and it's not exactly clear you can just change the buffer size without side effects. As for the http buffer size, I agree with you: there is no reason to change that. They should use the xrd.buffers directive to enable maxi-buffers via a special option (which I will relay to them), and then use the xrootd.async segsize option to get the size they want for async I/O. So, for normal reads/writes we already have a solution. What we need is a solution for HTTP TPC, and @bbockelm has to weigh in on that. But even then, they still need a memcache layer to handle small application reads. All this will do is speed up file copies (i.e. xrdcp and TPC).
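(A rough sketch of that tuning path, for illustration only: the values are placeholders, and the special maxi-buffer option mentioned above is deliberately undocumented, so it is elided here. Consult the XRootD configuration reference before applying anything like this.)

```
# Illustrative sketch only -- placeholder values, not verified syntax:
xrd.buffers ...              # buffer pool tuning; the maxi-buffer option is undocumented, so elided
xrootd.async segsize 8m      # async I/O segment size (8m is a placeholder value)
```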
I agree that a buffering layer has to be added somewhere to work around the slowness of the ofs being used (Ceph?).
Hi, on the side of changing things in XrdTpc/XrdHttp it could still be interesting; at some point I tried hard-coded values with reasonable results. If there are tuning/optimisation studies that might still be useful, I'd be interested in helping. I also observed some multistream/out-of-order issues in TPC, but haven't yet had a chance to see exactly where they appear; I will report back when I find out more.
Hi, I see, so I understand that you implemented aio in the XrdCeph layer. Correct? Do you think it would still be beneficial to receive larger chunks in the write call?
FWIW - I'm still fine with making the existing buffering in XrdTpc configurable. As I said, we already paid the price on that one, and we're talking about a very modest config knob only for RAL. However, for the XrdHttp components, it seems we're all in agreement that XrdCeph is really the best place to land this.
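(Purely illustrative: a sketch of what such a modest knob could look like. The directive name `tpc.blocksize` and the parsing helper are hypothetical, not actual XRootD syntax; only the 1024*1024 default and the 16MB recommendation come from this thread.)

```cpp
#include <cstddef>
#include <string>

// Hypothetical directive:  tpc.blocksize 16m
// Sketch of parsing it into the value currently hard-coded as 1024*1024
// in XrdTpcTPC.cc / XrdTpcStream.cc.
static size_t g_tpc_block_size = 1024 * 1024;   // current hard-coded default

static bool ParseBlockSize(const std::string &val, size_t &out) {
    if (val.empty()) return false;
    size_t mult = 1;
    switch (val.back()) {               // optional k/m suffix
        case 'k': case 'K': mult = 1024; break;
        case 'm': case 'M': mult = 1024 * 1024; break;
    }
    out = std::stoull(val) * mult;      // stoull stops at the suffix (error handling elided)
    return out >= 1024 * 1024;          // refuse anything below the current default
}
```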
Yes, but (there is always a but). Since http hands off the reads/writes to the xroot layer, we already have tuning knobs there that allow for very large buffers. We don't document that feature because we've seen problems with people not understanding what that tuning knob is for -- and it's for very specific use cases like CTA and now RAL. I will start an offline thread on this. Since TPC bypasses the xroot layer, there is no knob. So, yes, we would need one there.
Original issue text:
Hi @abh3, @bbockelm, @ffurano,
I'd like to ask if a new (esoteric?) directive could be included to set the size of the internal HTTP buffers for 'normal' and TPC davs transfers.
Currently it appears that there are hard-coded 1MiB buffers in the XrdHttp and XrdTpc locations.
For RAL (with Echo, its Ceph object store), it would be very helpful to be able to increase this to some sensible, configurable multiple of that size.
For read requests we can mitigate to some extent with a memcache proxy and appropriate async sizes. However, for writes I don't think we can make improvements without code changes.
From tests privately updating the code, this does show significant improvements, but there may be other implications (side effects, etc.) I wouldn't be aware of. Having this as a configurable option in the official codebase would give demonstrable improvements for us.
Some of the locations I spotted (essentially referencing 1024*1024) are:
- src/XrdTpc/XrdTpcStream.cc, line 83 (commit 0e460d7)
- src/XrdTpc/XrdTpcTPC.cc, line 29 (commit 70066fe)
- src/XrdHttp/XrdHttpProtocol.cc, line 283 (commit 9b03713)
- src/XrdHttp/XrdHttpReq.cc, line 1291 (commit db9bf87)
- src/XrdHttp/XrdHttpReq.cc, line 1296 (commit db9bf87)
Do you think this is something that can be accommodated?
Kind regards,
James