Rising number of open files in xcache #975
Hi Nikolai,

Actually we just figured this one out with Andy ... this is due to the relatively short default timeout (180s) given to the cache to close a file after a client disconnect. When the system is busy, the cache can not finish writing to the file in time and the xrootd layer then decides to "leak" it. We are working on a proper fix but in the meantime, please use:

pss.ciosync 60 900

to increase the time given to the cache to close the file. The above line means try every 60s for a total of 900s -- this works for us at UCSD where we also had this problem.

See: http://xrootd.org/doc/dev49/pss_config.htm#_Toc525070685

Cheers,
Matevz
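For reference, a minimal sketch of where this directive sits in an XCache configuration (apart from the pss.ciosync line itself, the directives and the origin host below are illustrative, not taken from this thread):

```
# basic proxy/cache setup (xrootd 4.x style, illustrative)
ofs.osslib libXrdPss.so
pss.cachelib libXrdFileCache.so
pss.origin some-origin.example.org:1094

# retry the deferred close every 60s, for up to 900s in total
pss.ciosync 60 900
```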
What Matevz says is completely correct. However, even after exhausting file descriptors, xrootd should neither crash nor give bad data. So, at least for the crash, could we get a traceback or (hoping for the impossible) access to the core file?
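In case it helps for the next occurrence, a rough sketch of one way to capture a core file and backtrace (all paths and the pid are placeholders; if xrootd runs under systemd or in a container, the core limit has to be raised there instead):

```sh
# allow core files in the shell that starts xrootd, and give cores a predictable name
ulimit -c unlimited
echo '/var/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern

# after a crash, extract a full backtrace from the core with gdb
gdb /usr/bin/xrootd /var/tmp/core.xrootd.12345
# at the (gdb) prompt: thread apply all bt
```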
Hi again,

I still don't have any further information on the crashes unfortunately, as the production queue that processes files via xcache is currently offline.

However, I can provide 2 new pieces of feedback:

- The `ciosync` workaround seems to partially solve the problem - but not 100%: sometimes we still observed the number of open files starting to rise steadily again, as seen in this plot: [plot_openfiles_05-11_05_18.pdf](https://github.com/xrootd/xrootd/files/3314664/plot_openfiles_05-11_05_18.pdf)
- This might be a different issue, but I suspect it also happens when the file limit is reached: we see some files that end up corrupted in the cache (wrong checksum). These files are marked as "complete" in the `.cinfo` files, and when downloading them via the xcache server the client receives the corrupted file. Out of 200k "complete" files in our cache we saw 91 such cases. From a quick check of 2 of these files, they seem to have these things in common:
  - the number of bytes is correct
  - the `.cinfo` file contains a certain number of `bytesMissed`, but not matching the size of the empty block
  - the wrong checksum originates from missing parts in the file (blocks of 1 MiB size filled with zeros)

Cheers,
Nikolai
Hi Nikolai,
Just for completeness, which version are you running?
Andy
This is still 4.9.1, precisely the versions `xrootd-4.9.1-0.rc3.el7.x86_64` since April 1 and `xrootd-4.9.1-1.el7.x86_64` since May 9.
Hi Nikolai,
It would be good if we could find out which file descriptors are being accumulated. So, once the server has been running for a while, could you do an lsof on it (it's `lsof -p <pid>`) and count up via grep how many TCP connections are open and how many files are open (what to grep for will be obvious from the output). Do this on a periodic basis to see which ones are monotonically increasing.
Andy
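As an illustration, a small sketch of the kind of periodic check Andy describes (the pid lookup and the /data/xcache path are assumptions; adjust to the actual cache directory):

```sh
#!/bin/bash
# Periodically count open TCP sockets vs. open cache files for the xrootd process.
# Assumes a single xrootd process and that cache data lives under /data/xcache.
PID=$(pidof xrootd)
while true; do
    ts=$(date '+%F %T')
    sockets=$(lsof -p "$PID" 2>/dev/null | grep -c 'TCP')
    files=$(lsof -p "$PID" 2>/dev/null | grep -c '/data/xcache')
    echo "$ts sockets=$sockets files=$files"
    sleep 300
done
```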
Hi Andy,

That is essentially what the plots show. What's called "sockets" is when I grep for "TCP" and what's called "files" is when I grep for the folder where the xcache data (and also log, spool) is stored.

Cheers,
Nikolai
Hi Nikolai,

I don't think the fd growth and the file content errors are related.

About the file errors: people doing ATLAS tests in the US saw exactly the same symptom in about the same time-frame - certain blocks of a file being "all zeros". It was traced down to some network/storage/proxy SNAFU at ATLAS SW T2. Is it possible you pulled the files from that site, too? Nevertheless, we are working on providing a way to detect such errors in the caching proxy. Mind you, you would see the same problem were you to transfer the file using xrdcp ... so we really need either a protocol-level check or a way for the caching proxy to retrieve checksums from some service providing them.

About the rising number of fds: this will only happen when the caching proxy is seriously overloaded, i.e., it is not able to write data to disk and those writes are also competing with reads for data that is already on the disk. Can you please describe your setup (xrootd config, disk configuration being used) and the expected number of jobs and their read rates? Also, can you please show machine load and network in/out plots for the same time interval, say, 17.5. to 19.5.

[ Of course, there is also a possibility that something else is going wrong, that's why Andy was asking about details of what kind of fds are leaked ... however, the ratio of files to sockets of 2 : 1 is indicative of the fd leak related to ciosync (2 files (data + cinfo) and 1 socket to the remote). ]

The `bytesMissed` simply means that XCache has its write queue full and so served that many bytes to local clients by directly forwarding the request to the remote, without trying to write it to disk.

Cheers,
Matevz
One more thing ... even when Andy and I rework the file-close protocol so that it eventually closes out all the fds, the situation from the overload perspective won't change much: the caching proxy will still struggle to write out the last remaining blocks and then close the file. So, if you are consistently hitting this issue it means you either need a beefier machine (correctly configured to allow O(1000) simultaneous read/write streams - raid 5/6 or zfs setups are known to choke on this rather badly) or a caching cluster.

There is another solution for this case, immediate local-client redirect to the origin ... and this is almost ready to go. But it won't work if local clients can not connect to the WAN.
Hi Nikolai,
Ah, I was looking for a literal graph :-( Anyway, the graph is fascinating. Is the horizontal axis by date? The big question is what happened near midnight May 12th. Could you send me the log file (if you still have it) for the two days in question? Please don't post it. If you don't have the one from the graph, could you send one with a corresponding graph for another period? I need to understand what kind of load triggered this. As Matevz says, it may simply be that once the server is totally overloaded, things go downhill. In that case, we need to recognize the situation much earlier, when we still have time to recover.
Andy
Unfortunately I don't have the log file anymore. And currently we don't have such a high load running on the server that this issue occurs. When we see it again I'll save the log. I just remember that at that time I didn't find anything special happening. Also, the load wasn't higher than in the period before.
Hi Nikolai,
Usually, there are subtle "hints" in the log that indicate things are going south. Additional file descriptors wouldn't be allocated unless clients ask for them.
Andy
Hi Nikolai, Would you mind sharing config/hardware details of your cache setup? We can also take it offline, if you prefer. Matevz
The setup is described on slide 3 here (xrootd version is 4.9.1 now):
RAID-6 is probably killing you under heavy multi-file load. This is the case for both hardware and software raids. XRootd/XCache can work with a set of independent disks and this scales much better, see page 12 of this:
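For illustration, the independent-disk layout would look roughly like this in the cache configuration, with every physical disk mounted on its own and added to the data space (paths are hypothetical):

```
# one mount point per physical disk, all belonging to the cache's data space
oss.space data /xcache-data/disk1
oss.space data /xcache-data/disk2
oss.space data /xcache-data/disk3
oss.space data /xcache-data/disk4
```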
Hi, Guenter
Hi Nikolai, Guenter, Did you try: Cheers,
Hi Matevz, We have not tried that yet. But we continued running jobs via the xcache server, so we can proceed to investigate the problems when they occur again.
Cheers,
Hi Nikolai,

Maybe a version/build mismatch? Installing the xrootd-debuginfo rpm directly always worked for me (but I do my own builds). You could also try installing them with debuginfo-install (in yum-utils) ... this should take care of dependencies, too, IIRC.

Can you please try with 4.10 that was just released? The fd leak under overload is still there (so don't reduce the timeouts just yet) but there were some other fixes in the xrootd client that address some of the crashes seen in the wild.

I didn't realize you were running in a container. Do you guys use host networking?

Also, you should try to use separate disks, not raid ... you are leaving 10x performance on the floor there.

Cheers,
Matevz
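For what it's worth, a sketch of the debuginfo route mentioned above (package names are assumed to match the rpms actually installed):

```sh
# debuginfo-install ships with yum-utils and pulls in matching -debuginfo packages
sudo yum install -y yum-utils
sudo debuginfo-install -y xrootd-server xrootd-client-libs
```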
Hi Matevz, cheers,
Hi Guenter,

You'll probably want to put the metadata (where cinfo files are stored) and the root-fs (basically sym-links into the data disks) on a non-data disk.

After a data disk failure, you replace the disk (or comment it out in the xrootd.cfg) and restart. XRootd will refuse to start if a configured target directory for oss.space does not exist. If you lose the disk with the meta-data, you have to clear the cache. You can remove the stale links after data disk replacement but actually don't have to ... each lfn will get cleared when its time comes (when it would be purged or when an open is attempted).

Thinking about this, I could add a full data-space scan during the startup purge -- normally the purge only scans the meta-data files to determine the "age" of a file.

Cheers,
Matevz
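A sketch of the layout Matevz describes, as I understand it (directive usage and paths here are assumptions, not taken from this thread):

```
# cache namespace (sym-link tree into the data spaces) kept off the data disks
oss.localroot /var/xcache/namespace

# names of the oss spaces used for cached data and for .cinfo metadata
pfc.spaces data meta

# .cinfo metadata on a small, reliable non-data disk
oss.space meta /var/xcache/meta

# data disks; after a disk failure, comment out the affected line and restart
oss.space data /xcache-data/disk1
# oss.space data /xcache-data/disk2   (failed, taken out of service)
oss.space data /xcache-data/disk3
```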
Hi Nikolai,
One small question. Does the OS used to build the container match the OS being used to run the container? We've had other sites trying to run with a mismatch, with pretty bad results.
Andy
Hi Andy, The containers were built on the same server where we also run them. Cheers,
I am closing this as there has been no activity and no reports from other sites of similar issues. Please reopen if this is still a problem.
Hi,
We are noticing a steady increase in the number of open files on our xcache server in Munich, in particular when the server is under heavy load:
plot_openfiles_xcache_20190506_1719.pdf
It seems the files are not closed anymore, even when the connections are closed and the files are fully downloaded, as suggested by the network traffic monitoring for the same time periods:
plot_openfiles_xcache_20190506_1719_ganglia_overlay.pdf
We are running with prefetch mode enabled with xrootd 4.9.1 (0.rc3.el7).
When the number of open files hits the limit (in our case ~16k), clients start receiving empty or corrupt files (checksum errors). Sometimes the xrootd process then also crashes with SIGSEGV. In most cases the files can be received correctly after a restart of the xrootd process.
Cheers,
Nikolai