pgrw configuration server side #1740
It would seem that the issue here is that async is allowed, and for the pgread case it was meant to be used in that context for streaming across the network. So, I see the problem: you don't want to turn off async I/O to Ceph (note that in 5.4 we now turn async off unless the I/O is going across the network). So, I think what you want is to be able to turn off async pgread but allow normal reads to use async I/O. Right?
Preferably, I'd like an option to turn off pgread/pgwrite altogether. Non-async pg I/O is still significantly slower than normal read/write operations. I've managed to enable this mode by downgrading kXR_PROTOCOLVERSION and recompiling the xrootd packages, but this is not a sustainable solution long term. A config option that attains the same result would be preferable.
OK, so we have a conflict in what we want to accomplish here. We try to detect transmission errors, and obviously that affects how we interact with the storage. However, the brute-force "turn it off" approach disables transmission-error detection, which in HL-LHC will be a big deal. So that's not a good solution. I won't say we got it right, but the proposed solution isn't right either. I appreciate your immediate concern, but we need to find a good transition, because we need to address this issue that will plague HL-LHC. So, let's figure out what the right solution is for Ceph-based storage.
The main problem is the small individual read size. Either increasing the read size or buffering multiple requests into a single read request to Ceph might improve things.
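As a rough illustration of the buffering idea above, adjacent small requests could be merged into fewer, larger backend reads before they reach Ceph. This is a minimal sketch only; `ReadReq` and `CoalesceReads` are hypothetical names, not XrdCeph API:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch: coalesce adjacent (or near-adjacent) read requests
// into fewer, larger backend reads. Not actual XrdCeph code.
struct ReadReq { uint64_t off; uint64_t len; };

std::vector<ReadReq> CoalesceReads(std::vector<ReadReq> reqs,
                                   uint64_t maxGap = 0) {
  if (reqs.empty()) return reqs;
  std::sort(reqs.begin(), reqs.end(),
            [](const ReadReq& a, const ReadReq& b) { return a.off < b.off; });
  std::vector<ReadReq> out{reqs.front()};
  for (size_t i = 1; i < reqs.size(); ++i) {
    ReadReq& last = out.back();
    if (reqs[i].off <= last.off + last.len + maxGap) {
      // Contiguous or close enough: extend the previous request.
      uint64_t end = std::max(last.off + last.len, reqs[i].off + reqs[i].len);
      last.len = end - last.off;
    } else {
      // Too far apart: keep it as a separate backend read.
      out.push_back(reqs[i]);
    }
  }
  return out;
}
```

With this, a run of back-to-back 64K reads would collapse into a single large read against the storage, which is exactly the access pattern Ceph handles better.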
Well, that is what I was talking about: if you can't support a 64K read size, you need to turn it off, which in this case means turning off pgread. The evil thing is that it will then revert to TLS, which still uses 64K reads, and you will be sunk nonetheless. Hey, we are trying to verify the data that was sent, so we need to find the compromise between network integrity and file-system integrity. In general we thought Ceph solved that problem, but apparently we missed it.
Buffering should fix this issue. I tried some preliminary testing and there are a few things going wrong. Non-async pgwrite seems to be doing fine, getting about the same magnitude in transfer speed as the normal write operation (~33 mbps). pgreads, on the other hand, show unexpected errors:
Tracking this down server-side, the size of the read is 4MB, instead of the 16 specified in the buffer or the 8 usually requested by the client. I believe this might come from PgrwBuff's maxKeep, but it's unclear where the resource conflict comes from. I'll update with any progress on this.
I've found the issue: the clients were performing ::readv calls with iovcnt = 2044, which exceeds the system maximum of 1024. I've made a PR that fixes it by bringing the max dlen down to ~2MB, which makes the clients calculate the right size. For future-proofing, some error catching in XrdCl/XrdClAsyncPageReader.hh::InitIOV when iovcnt goes over the limit would be nice. Currently it just returns from the initialization with no error in that case, as the clauses are tied to having dleft = 0.
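The arithmetic behind the overflow can be sketched as follows, assuming each 4 KiB pgread page contributes two iovec entries (one for its checksum, one for the page data); the function names are illustrative, not XrdCl API, and the exact count may differ by a few header/trailer entries from the observed 2044:

```cpp
#include <cstddef>

// Sketch of the iovcnt arithmetic, under the assumption that each 4 KiB
// pgread page costs two iovec entries (checksum + data). Illustrative only.
constexpr size_t kPageSize = 4096;

size_t PgIovCount(size_t dlen) {
  size_t pages = (dlen + kPageSize - 1) / kPageSize;  // round up to pages
  return 2 * pages;
}

// 1024 is a typical IOV_MAX on Linux; a real check should query the system.
bool FitsIovLimit(size_t dlen, size_t iovMax = 1024) {
  return PgIovCount(dlen) <= iovMax;
}
```

Under these assumptions a 4 MiB dlen yields 2048 iovecs, well over a 1024 limit, while capping dlen near 2 MiB yields 1024 and just fits, which is consistent with the fix described above.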
Hi, I am concerned that there is an increasing number of 5.4.3 clients (presumably as the last in the 5.4.X series), and that when we switch to 5.5.0 on the server side, these clients will presumably all start to fail against our site for root protocol transfers?
I've tested an upgrade from xrootd 5.3.3 to 5.4.3 on a machine with a Ceph storage backend. This caused an immediate drop in performance, which I've tracked to the fact that on recent xrootd versions pgrw is the default transfer operation on the client side. Changes in the XrdCeph plugin (like features) didn't seem to be picked up correctly: by the time the plugin converts the request into aio_read, the requests are already split into 64kb chunks, which is the main reason for the slowdown.
Since small read sizes might not be optimal for some storage types (Ceph included), would it be possible to make this configurable server-side, with a configuration parameter determining whether the server supports pgrw, rather than it being tied to the protocol version as it is currently?
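To make the feature request above concrete, here is a minimal sketch of how such a hypothetical server-side directive (e.g. `xrootd.pgrw off`, which does not exist in XRootD today) could be parsed; the directive name and function are purely illustrative:

```cpp
#include <sstream>
#include <string>

// Hypothetical sketch only: parses an imagined config directive of the form
//   xrootd.pgrw on|off
// and returns the resulting setting. No such directive exists in XRootD;
// this just illustrates the requested server-side toggle.
bool ParsePgrwDirective(const std::string& line, bool current) {
  std::istringstream in(line);
  std::string key, val;
  if (in >> key >> val && key == "xrootd.pgrw") {
    if (val == "off") return false;
    if (val == "on")  return true;
  }
  return current;  // unrelated or malformed lines leave the setting unchanged
}
```

The server could then decline to advertise pgread/pgwrite support during the protocol handshake when the toggle is off, so clients fall back to plain read/write without any client-side downgrade.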