pgrw configuration server side #1740

Closed
Jo-stfc opened this issue Jul 18, 2022 · 9 comments

Jo-stfc (Collaborator) commented Jul 18, 2022

I've tested an upgrade from xrootd 5.3.3 to 5.4.3 on a machine with a Ceph storage backend. This caused an immediate drop in performance, which I've tracked down to the fact that on recent xrootd versions pgread/pgwrite (pgrw) is the default transfer operation on the client side. Changes in the XrdCeph plugin (such as the features it advertises) didn't seem to be picked up correctly: by the time the plugin converts the request into an aio_read, it has already been split into 64 kB chunks, which is the main reason for the slowdown.
Since small read sizes may not be optimal for some storage types (Ceph included), would it be possible to make this configurable server side, with a configuration parameter determining whether the server supports pgrw, rather than it being tied to the protocol version as it is currently?
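For illustration only, the kind of server-side switch being asked for might look like the sketch below; the directive is hypothetical and does not exist in xrootd today. The point is simply that the server, rather than the negotiated protocol version, would decide whether pgread/pgwrite is offered.

```
# Hypothetical sketch of the requested option -- no such directive exists today.
# The server, not the protocol version, would decide whether pgread/pgwrite
# is advertised to clients.
xrootd.pgrw off
```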

abh3 (Member) commented Jul 18, 2022

It would seem that the issue here is that async I/O is allowed, and in the pgread case it was meant to be used in that context for streaming across the network. So I see the problem here: you don't want to turn off async I/O to Ceph (note that in 5.4 we now turn async off unless the I/O is going across the network). So, I think what you want is to be able to turn off async pgread but allow normal reads to use async I/O. Right?

Jo-stfc (Collaborator, Author) commented Jul 19, 2022

Preferably, I'd like an option to turn off pgread/pgwrite altogether; non-async pg I/O is still significantly slower than normal read/write operations. I've managed to get this mode enabled by downgrading kXR_PROTOCOLVERSION and recompiling the xrootd packages, but this is not a sustainable solution long term. A config option that achieves the same result would be preferable.
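For reference, the workaround described above boils down to advertising a lower protocol level so that pgread/pgwrite is never negotiated, roughly along these lines (the constant lives in XProtocol/XProtocol.hh, but the numeric value below is a placeholder, not the real one):

```cpp
// Placeholder sketch of the workaround only; this is NOT the real value from
// XProtocol/XProtocol.hh. Advertising a pre-pgrw protocol level keeps clients
// from issuing kXR_pgread/kXR_pgwrite requests.
#define kXR_PROTOCOLVERSION 0x00000400  // downgraded, pre-pgrw level (placeholder)
```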

Jo-stfc closed this as completed Jul 19, 2022
abh3 (Member) commented Jul 19, 2022

OK, so we have a conflict in what we want to accomplish here. We try to detect transmission errors, and obviously that affects how we interact with the storage. However, the brute-force "turn it off" approach disables transmission-error detection, which will be a big deal for HL-LHC, so that's not a good solution. I won't say we got it right, but the proposed solution isn't right either. I appreciate your immediate concern, but we need to find a good transition, because we need to address an issue that will plague HL-LHC. So, let's figure out what the right solution is for Ceph-based storage.

Jo-stfc reopened this Jul 19, 2022
Jo-stfc (Collaborator, Author) commented Jul 19, 2022

The main problem is the small individual read size. Either increasing the read size or buffering multiple requests into a single read request to Ceph might improve things, as in the sketch below.
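As a rough illustration of the buffering idea (this is not code from XrdCeph; ReadReq, BackendRead, and all sizes are made up for the sketch), contiguous 64 kB chunks could be served by a single larger backend read and then scattered back into the per-chunk buffers:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative sketch only: coalesce small, contiguous read requests into one
// larger backend read. ReadReq and BackendRead are hypothetical stand-ins,
// not actual XrdCeph or XrdOss interfaces.
struct ReadReq {
  int64_t  offset;   // file offset requested by the client
  uint32_t length;   // typically 64 kB per pgread chunk
  char    *buffer;   // destination buffer for this chunk
};

// Hypothetical backend call standing in for the real striper/aio read.
int64_t BackendRead(int fd, char *buf, int64_t off, uint32_t len);

// Serve a batch of contiguous chunk requests with a single large read.
int64_t CoalescedRead(int fd, std::vector<ReadReq> &reqs)
{
  if (reqs.empty()) return 0;

  // Assumes the requests are sorted and contiguous, which is the common case
  // when one client pgread gets split into fixed-size chunks.
  int64_t  start = reqs.front().offset;
  uint32_t total = 0;
  for (const ReadReq &r : reqs) total += r.length;

  // One backend read instead of one per 64 kB chunk.
  std::vector<char> big(total);
  int64_t got = BackendRead(fd, big.data(), start, total);
  if (got < 0) return got;

  // Scatter the data back into the per-chunk buffers.
  int64_t copied = 0;
  for (const ReadReq &r : reqs) {
    if (copied >= got) break;
    uint32_t n = static_cast<uint32_t>(std::min<int64_t>(r.length, got - copied));
    std::memcpy(r.buffer, big.data() + copied, n);
    copied += n;
  }
  return got;
}
```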

abh3 (Member) commented Jul 19, 2022

Well, that is what I was talking about. If you can't support a 64K read size you need to turn it off, which in this case means turning off pgread. The evil thing about this is that it will revert to TLS, which still uses 64K reads, so you will be sunk nonetheless. We are trying to verify the data that was sent, so we need to find the compromise between network integrity and file-system integrity. In general we thought Ceph solved that problem, but apparently we missed it.

Jo-stfc (Collaborator, Author) commented Jul 19, 2022

Buffering should fix this issue. I tried some preliminary testing and there are a few things going wrong... Non-async pgwrite seems to be doing fine, achieving about the same transfer speed as the normal write operation (~33 mbps). pgreads, on the other hand, show unexpected errors:

Run: [ERROR] Server responded with an error: [3005] Unable to pgRead dteam:test449.txt; operation now in progress (source)

Tracking this down server side, the size of the read is 4 MB, instead of the 16 MB specified in the buffer or the 8 MB usually requested by the client. I believe this might come from PgrwBuff's maxKeep, but it's unclear where the resource conflict comes from. I'll update with any progress on this.

Jo-stfc (Collaborator, Author) commented Jul 25, 2022

I've found the issue: the clients were performing ::readv calls with iovcnt = 2044, which exceeds the system limit of 1024 (IOV_MAX). I've made a PR that fixes it by bringing the maximum dlen down to ~2 MB, which makes the clients calculate the right size. For future proofing, some error catching in XrdCl/XrdClAsyncPageReader.hh::InitIOV when iovcnt goes over the limit would be nice; currently it just returns from the initialization with no error in that case, as the exit conditions are tied to having dleft = 0.
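The kind of guard suggested here could look roughly like the sketch below. This is a standalone illustration, not the actual InitIOV code from XrdClAsyncPageReader.hh; the function name and parameters are made up, and only the limit check is the point:

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/uio.h>   // struct iovec
#include <unistd.h>    // sysconf(_SC_IOV_MAX)
#include <vector>

// Standalone sketch of the suggested guard; this is NOT the actual
// XrdCl AsyncPageReader::InitIOV code, just the shape of the check.
// Returns 0 on success, or -EINVAL when the vector would exceed the kernel's
// per-call readv() limit (1024 on Linux, which the 2044-entry case exceeded).
static int BuildIOV(std::vector<iovec> &iov, char *base, size_t dlen, size_t chunk)
{
  size_t iovcnt = (dlen + chunk - 1) / chunk;

  long iovmax = sysconf(_SC_IOV_MAX);
  if (iovmax <= 0) iovmax = 1024;                 // fall back to the usual Linux limit

  if (iovcnt > static_cast<size_t>(iovmax))
    return -EINVAL;                               // fail loudly instead of returning silently

  iov.resize(iovcnt);
  size_t left = dlen;
  for (size_t i = 0; i < iovcnt; ++i) {
    size_t n = left < chunk ? left : chunk;
    iov[i].iov_base = base + i * chunk;
    iov[i].iov_len  = n;
    left -= n;
  }
  return 0;
}
```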

simonmichal (Contributor) commented

@Jo-stfc: thanks a lot for reporting this problem, for your investigation, and for identifying the root cause of it.

It has been fixed in 8319e53

snafus (Contributor) commented Sep 13, 2022

Hi,
Should we expect a 5.4.4 release that includes this commit: 8319e53?
My understanding is that this was fixed in the client (in 5.5.0), so clients at 5.4.2 and below should be unaffected, since (non-Xcache) clients don't issue pgreads by default, and 5.5.0 has the issue resolved.

I am concerned that there is an increasing number of 5.4.3 clients (presumably because it is the last release in the 5.4.x series), and that when we switch to 5.5.0 on the server side, these clients will all presumably start to fail against our site for root-protocol transfers?
Do you have any advice here?
Thanks a lot, James
