
New directive request for http internal buffer sizes #1532

Open
snafus opened this issue Oct 12, 2021 · 12 comments
@snafus
Contributor

snafus commented Oct 12, 2021

Hi @abh3, @bbockelm, @ffurano,

I'd like to ask whether a new (esoteric?) directive could be added to set the size of the internal HTTP buffers used for 'normal' and TPC davs transfers.
Currently there appear to be hard-coded 1 MiB buffers in the XrdHttp and XrdTpc code.
For RAL (with Echo and its Ceph object store), it would be very helpful to be able to increase this to some configurable, sensible multiple of that size.
For read requests we can mitigate this to some extent with a memcache proxy and appropriate async sizes.
For writes, however, I don't think we can make improvements without code changes.

From tests with a privately patched build, this does show significant improvements, but there may be other implications (side effects, etc.) that I wouldn't be aware of. Having this as a configurable option in the official codebase would give demonstrable improvements for us.

Some of the locations I spotted (essentially, references to 1024*1024) are listed below, followed by a rough sketch of the kind of change intended:

if (offset == m_offset && (force || (size && !(size % (1024*1024))))) {

size_t TPCHandler::m_small_block_size = 1*1024*1024;

hp->myBuff = BPool->Obtain(1024 * 1024);

l = (long)min(filesize-writtenbytes, (long long)1024*1024);

l = min(rwOps[0].byteend - rwOps[0].bytestart + 1 - writtenbytes, (long long)1024*1024);
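
(As mentioned above, a rough sketch of the kind of change intended; the directive name and variable below are purely illustrative, not an actual patch:)

// Illustrative only: replace the 1 MiB literal with a value that a new directive could set.
static size_t httpBufSize = 1024 * 1024;              // current default

// during configuration parsing, something along the lines of:
//   if (!strcmp(var, "http.bufsize")) httpBufSize = parsedSize;   // "http.bufsize" is hypothetical

// the buffer acquisition above would then become:
hp->myBuff = BPool->Obtain(httpBufSize);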

Do you think this is something that can be accommodated?
Kind regards,
James

@ffurano
Contributor

ffurano commented Oct 12, 2021 via email

@bbockelm
Contributor

Hi James -- I think this sounds like a great idea.

For TPC, you found the correct location for increasing the buffer size. Note TPCHandler::m_block_size is the size of the HTTP request generated when HTTP-TPC is doing multistreams and must be a multiple of the block size (otherwise you may hit deadlocks). My recommendation is to make it no smaller than 16MB.
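
(A minimal sketch of that constraint, assuming the multiple refers to the 1 MiB m_small_block_size quoted above; the helper name is hypothetical:)

#include <algorithm>
#include <cstddef>

// Hypothetical helper: clamp a configured HTTP-TPC block size so it is no
// smaller than the recommended 16 MB and stays a multiple of the 1 MiB block.
size_t clampTpcBlockSize(size_t requested) {
    const size_t small   = 1024 * 1024;        // TPCHandler::m_small_block_size today
    const size_t minimum = 16 * 1024 * 1024;   // recommended lower bound
    size_t value = std::max(requested, minimum);
    if (value % small) value += small - (value % small);   // round up to the next multiple
    return value;
}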

I never did much exploration into optimal block sizes; without that buffering, libcurl would generate extremely small writes (16 KB or smaller, depending on how often you poll the TCP socket) and the Ceph RADOS integration would fail outright on unaligned writes. So, I'd welcome your work!

The hard decision point is the buffering in XrdHttp. Right now, the size of the buffer for PUT appears to simply be whatever comes off the socket. You may put the buffer in the XrdHttp layer ... or it might make more sense at the filesystem layer. We already "paid the price" in XrdTpc in terms of code complexity but it's not immediately clear the same decision makes sense for XrdHttp.

@xrootd-dev

xrootd-dev commented Oct 12, 2021 via email

@ffurano
Contributor

ffurano commented Oct 13, 2021 via email

@abh3
Member

abh3 commented Oct 13, 2021 via email

@ffurano
Contributor

ffurano commented Jan 7, 2022

I see the points. Trying to find the best solution.

@abh3
Member

abh3 commented Jan 8, 2022

Well, the issue is that the buffers are fixed for TPC and that is causing them some issues. I can see that, but @bbockelm would need to address it, as that part of the code is rather complicated and it's not exactly clear you can just change the buffer size without side effects. As for the http buffer size, I agree with you: there is no reason to change that. They should use the xrd.buffers directive to enable maxi-buffers via a special option (which I will relay to them) and then use the xrootd.async segsize option to get the size they want for async I/O.

So, for normal reads/writes we already have a solution. What we need is a solution for HTTP TPC, and @bbockelm has to weigh in on that. But even then, they still need a memcache layer to handle small application reads. All this will do is speed up file copies (i.e. xrdcp and TPC).
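
(For orientation, the rough shape of that existing tuning; the values are placeholders and the special, undocumented xrd.buffers option for maxi-buffers is deliberately not spelled out here:)

# placeholder sketch, not a working configuration
xrd.buffers <memory-size> [<options>]     # plus the undocumented maxi-buffer option
xrootd.async segsize <size>               # async I/O segment size actually used per request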

@ffurano
Contributor

ffurano commented Jan 10, 2022

I agree that a buffering layer has to be added somewhere to work around the slowness of the ofs being used (Ceph?).
Changing the XrdHttp buffers can be good for a quick and dirty test, but I don't consider it a solution, as the drawbacks can be quite heavy.
I am confident that a short discussion will clarify the best place to buffer this data before flushing.

@snafus
Contributor Author

snafus commented Jan 10, 2022

Hi,
Thanks for the updated comments. We have been working on a buffering component for the XrdCeph plugin that sits between Ceph (and libradosstriper) and the XRootD calls. It still requires some further AIO functionality, but in a lightly loaded system it appears to recover good performance.
Once the system is reasonably loaded, we still observe a drop in the per-file transfer speed (say to O(5) MB/s) and some increased variation in transfer speeds. Identifying if and where that bottleneck lies is hopefully the final step in the performance improvements.
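
(To make the idea concrete, a hypothetical illustration of such a coalescing layer, not the actual RAL code; the names are invented and the flush would call into libradosstriper in practice:)

#include <sys/types.h>
#include <cstdint>
#include <vector>

// Hypothetical illustration: coalesce small sequential XRootD writes into
// chunk-sized pieces before handing them to libradosstriper.
class WriteCoalescer {
public:
    explicit WriteCoalescer(size_t chunk) : chunkSize(chunk) { buf.reserve(chunk); }

    // Append an incoming write (offsets assumed sequential here); flush whenever
    // a full chunk has accumulated.
    ssize_t Write(const char *data, size_t len) {
        buf.insert(buf.end(), data, data + len);
        while (buf.size() >= chunkSize) flushChunk(chunkSize);
        return static_cast<ssize_t>(len);
    }

    // Flush whatever remains when the file is closed.
    void Close() { if (!buf.empty()) flushChunk(buf.size()); }

private:
    void flushChunk(size_t n) {
        // The real component would issue an aligned (A)IO write to the striper here.
        flushedBytes += n;
        buf.erase(buf.begin(), buf.begin() + n);
    }

    size_t chunkSize;
    uint64_t flushedBytes = 0;
    std::vector<char> buf;
};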

On the side of changing things in XrdTpc/XrdHttp, it could still be interesting; at some point I tried hard-coded values with reasonable results. If there are tuning / optimisation studies that might still be useful, I'd be interested in helping. I also observed some multistream / out-of-order issues in TPC, but have not yet had a chance to see where exactly they appear, and will report back when I find out more.
Thanks,
James

@ffurano
Contributor

ffurano commented Jan 10, 2022

Hi,

I see, so I understand that you implemented AIO in the XrdCeph layer, correct? Do you think it would still be beneficial to receive larger chunks in the write call?

@bbockelm
Contributor

FWIW - I'm still fine with making the existing buffering in XrdTpc configurable. As I said, we already paid the price on that one and we're talking about a very modest config knob only for RAL.

However, for the XrdHttp components, it seems we're all in agreement that XrdCeph is really the best place to land this.

@abh3
Member

abh3 commented Jan 10, 2022

Yes, but (there is always a but). Since http hands off the reads/writes to the xroot layer, we already have tuning knobs there that allow for very large buffers. We don't document that feature because we've seen problems with people not understanding what that tuning knob is for -- and it's for very specific use cases like CTA and now RAL. I will start an offline thread on this. Since TPC bypasses the xroot layer, there is no knob. So, yes, we would need one there.
