pgrw configuration server side #1740

Closed
Jo-stfc opened this issue Jul 18, 2022 · 9 comments

Jo-stfc (Collaborator) commented Jul 18, 2022

I've tested an upgrade from xrootd 5.3.3 to 5.4.3 on a machine with a Ceph storage backend. This caused an immediate drop in performance, which I've tracked down to the fact that on recent xrootd versions pgread/pgwrite (pgrw) is the default transfer operation on the client side. Changes in the XrdCeph plugin (such as the features it advertises) didn't seem to be picked up correctly: by the time the plugin converts the request into an aio_read, it has already been split into 64 kB chunks, which is the main reason for the slowdown.
Since small read sizes may not be optimal for some storage types (Ceph included), would it be possible to make this configurable server side, with a configuration parameter determining whether the server supports pgrw, rather than it being tied to the protocol version as it is currently?
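For illustration only, the kind of server-side switch being asked for might look like the sketch below; the directive is hypothetical and does not exist in xrootd today. The point is simply that the server, rather than the negotiated protocol version, would decide whether pgread/pgwrite is offered.

```
# Hypothetical sketch of the requested option -- no such directive exists today.
# The server, not the protocol version, would decide whether pgread/pgwrite
# is advertised to clients.
xrootd.pgrw off
```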

abh3 (Member) commented Jul 18, 2022

It would seem that the issue here is that async I/O is allowed, and in the pgread case it was meant to be used in that context for streaming across the network. So I see the problem here: you don't want to turn off async I/O to Ceph (note that in 5.4 we now turn async off unless the I/O is going across the network). So, I think what you want is to be able to turn off async pgread but allow normal reads to use async I/O. Right?

Jo-stfc (Collaborator, Author) commented Jul 19, 2022

Preferably, I'd like an option to turn off pgread/pgwrite altogether; non-async pg I/O is still significantly slower than normal read/write operations. I've managed to get this mode enabled by downgrading kXR_PROTOCOLVERSION and recompiling the xrootd packages, but this is not a sustainable solution long term. A config option that achieves the same result would be preferable.
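For reference, the workaround described above boils down to advertising a lower protocol level so that pgread/pgwrite is never negotiated, roughly along these lines (the constant lives in XProtocol/XProtocol.hh, but the numeric value below is a placeholder, not the real one):

```cpp
// Placeholder sketch of the workaround only; this is NOT the real value from
// XProtocol/XProtocol.hh. Advertising a pre-pgrw protocol level keeps clients
// from issuing kXR_pgread/kXR_pgwrite requests.
#define kXR_PROTOCOLVERSION 0x00000400  // downgraded, pre-pgrw level (placeholder)
```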

Jo-stfc closed this as completed Jul 19, 2022
abh3 (Member) commented Jul 19, 2022

OK, so we have a conflict in what we want to accomplish here. We try to detect transmission errors, and obviously that affects how we interact with the storage. However, the brute-force "turn it off" approach disables transmission-error detection, which will be a big deal for HL-LHC, so that's not a good solution. I won't say we got it right, but the proposed solution isn't right either. I appreciate your immediate concern, but we need to find a good transition, because we need to address an issue that will plague HL-LHC. So, let's figure out what the right solution is for Ceph-based storage.

Jo-stfc reopened this Jul 19, 2022
Jo-stfc (Collaborator, Author) commented Jul 19, 2022

The main problem is the small individual read size. Either increasing the read size or buffering multiple requests into a single read request to Ceph might improve things, as in the sketch below.
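As a rough illustration of the buffering idea (this is not code from XrdCeph; ReadReq, BackendRead, and all sizes are made up for the sketch), contiguous 64 kB chunks could be served by a single larger backend read and then scattered back into the per-chunk buffers:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative sketch only: coalesce small, contiguous read requests into one
// larger backend read. ReadReq and BackendRead are hypothetical stand-ins,
// not actual XrdCeph or XrdOss interfaces.
struct ReadReq {
  int64_t  offset;   // file offset requested by the client
  uint32_t length;   // typically 64 kB per pgread chunk
  char    *buffer;   // destination buffer for this chunk
};

// Hypothetical backend call standing in for the real striper/aio read.
int64_t BackendRead(int fd, char *buf, int64_t off, uint32_t len);

// Serve a batch of contiguous chunk requests with a single large read.
int64_t CoalescedRead(int fd, std::vector<ReadReq> &reqs)
{
  if (reqs.empty()) return 0;

  // Assumes the requests are sorted and contiguous, which is the common case
  // when one client pgread gets split into fixed-size chunks.
  int64_t  start = reqs.front().offset;
  uint32_t total = 0;
  for (const ReadReq &r : reqs) total += r.length;

  // One backend read instead of one per 64 kB chunk.
  std::vector<char> big(total);
  int64_t got = BackendRead(fd, big.data(), start, total);
  if (got < 0) return got;

  // Scatter the data back into the per-chunk buffers.
  int64_t copied = 0;
  for (const ReadReq &r : reqs) {
    if (copied >= got) break;
    uint32_t n = static_cast<uint32_t>(std::min<int64_t>(r.length, got - copied));
    std::memcpy(r.buffer, big.data() + copied, n);
    copied += n;
  }
  return got;
}
```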

abh3 (Member) commented Jul 19, 2022

Well, that is what I was talking about. If you can't support a 64K read size you need to turn it off, which in this case means turning off pgread. The evil thing about this is that it will revert to TLS, which still uses 64K reads, so you will be sunk nonetheless. We are trying to verify the data that was sent, so we need to find the compromise between network integrity and file-system integrity. In general we thought Ceph solved that problem, but apparently we missed it.

Jo-stfc (Collaborator, Author) commented Jul 19, 2022

Buffering should fix this issue. I tried some preliminary testing and there are a few things going wrong... Non-async pgwrite seems to be doing fine, achieving about the same transfer speed as the normal write operation (~33 mbps). pgreads, on the other hand, show unexpected errors:

Run: [ERROR] Server responded with an error: [3005] Unable to pgRead dteam:test449.txt; operation now in progress (source)

Tracking this down server side, the size of the read is 4 MB, instead of the 16 MB specified in the buffer or the 8 MB usually requested by the client. I believe this might come from PgrwBuff's maxKeep, but it's unclear where the resource conflict comes from. I'll update with any progress on this.

Jo-stfc (Collaborator, Author) commented Jul 25, 2022

I've found the issue: the clients were performing ::readv calls with iovcnt = 2044, which exceeds the system limit of 1024 (IOV_MAX). I've made a PR that fixes it by bringing the maximum dlen down to ~2 MB, which makes the clients calculate the right size. For future proofing, some error catching in XrdCl/XrdClAsyncPageReader.hh::InitIOV when iovcnt goes over the limit would be nice; currently it just returns from the initialization with no error in that case, as the exit conditions are tied to having dleft = 0.
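The kind of guard suggested here could look roughly like the sketch below. This is a standalone illustration, not the actual InitIOV code from XrdClAsyncPageReader.hh; the function name and parameters are made up, and only the limit check is the point:

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/uio.h>   // struct iovec
#include <unistd.h>    // sysconf(_SC_IOV_MAX)
#include <vector>

// Standalone sketch of the suggested guard; this is NOT the actual
// XrdCl AsyncPageReader::InitIOV code, just the shape of the check.
// Returns 0 on success, or -EINVAL when the vector would exceed the kernel's
// per-call readv() limit (1024 on Linux, which the 2044-entry case exceeded).
static int BuildIOV(std::vector<iovec> &iov, char *base, size_t dlen, size_t chunk)
{
  size_t iovcnt = (dlen + chunk - 1) / chunk;

  long iovmax = sysconf(_SC_IOV_MAX);
  if (iovmax <= 0) iovmax = 1024;                 // fall back to the usual Linux limit

  if (iovcnt > static_cast<size_t>(iovmax))
    return -EINVAL;                               // fail loudly instead of returning silently

  iov.resize(iovcnt);
  size_t left = dlen;
  for (size_t i = 0; i < iovcnt; ++i) {
    size_t n = left < chunk ? left : chunk;
    iov[i].iov_base = base + i * chunk;
    iov[i].iov_len  = n;
    left -= n;
  }
  return 0;
}
```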

simonmichal (Contributor) commented

@Jo-stfc: thanks a lot for reporting this problem, for your investigation, and for identifying the root cause of it.

It has been fixed in 8319e53

snafus (Contributor) commented Sep 13, 2022

Hi,
Should we expect a 5.4.4 release that includes this commit: 8319e53?
My understanding is that this was fixed in the client (in 5.5.0), so clients at 5.4.2 and below should be unaffected, since (non-Xcache) clients don't issue pgreads by default, and 5.5.0 has the issue resolved.

I am concerned that there is an increasing number of 5.4.3 clients (presumably because it is the last release in the 5.4.x series), and that when we switch to 5.5.0 on the server side, these clients will all presumably start to fail against our site for root-protocol transfers?
Do you have any advice here?
Thanks a lot, James
