
Close file handle for simple HTTP reads. #664

Merged
ffurano merged 1 commit into xrootd:master on Mar 16, 2018

Conversation

bbockelm
Contributor

@bbockelm bbockelm commented Mar 5, 2018

If a client requested a single byte range that covered less than the full file size, then the file handle was never explicitly closed.

When the TCP connection is closed, all open file handles on the bridge are automatically closed too. Hence, this seems like a moot point - a transient "leak". However, we have observed some clients (particularly CVMFS) that hold open TCP connections for hours or days, resulting in the "leak" of one file handle per read operation. In some cases, the number of leaked file handles was in the tens of thousands.

In addition to the obvious memory issues, this was observable in the f-stream monitoring as the near-simultaneous closing of 10k file handles caused UDP buffer overflows and dropped monitoring packets.
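
For illustration, here is a minimal standalone sketch of the control flow being fixed. It uses plain POSIX I/O and hypothetical names rather than the actual XrdHttp/bridge interfaces, and only shows where the missing close fits in:

```cpp
// Minimal sketch of the leak pattern, using plain POSIX I/O instead of the
// real XrdHttp/bridge API. All names here are hypothetical illustrations.
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Serve a single byte range [offset, offset + length) from `path`.
bool serve_single_range(const char* path, off_t offset, size_t length) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    std::vector<char> buf(length);
    ssize_t n = pread(fd, buf.data(), length, offset);

    // ... send the n bytes that were read back to the client ...

    // Previously the single-range path returned here without an explicit
    // close, so `fd` stayed open until the TCP connection itself was torn
    // down. The fix is to release the handle once the range has been served.
    close(fd);
    return n >= 0;
}
```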

@abh3
Member

abh3 commented Mar 5, 2018

I have a question on this patch. My understanding is that the original intent was to keep the file open as long as possible so that subsequent opens would avoid going through the full open path (i.e. the underlying xrootd reuses open file descriptors). Will this patch violate that intent? If so, then the solution may be even worse for the common case, where each HTTP GET requires a full file open operation, which is very expensive.

@bbockelm
Contributor Author

bbockelm commented Mar 5, 2018

So, there are three distinct cases:

  1. File is read in total. In this case, it does open-read*-close.
  2. Read request against a part of the file. In this case, it does open-read*. This patch adds a close afterward.
  3. Vector read request. In this case, it does open-read*-close.

So, in the end, we're making things behave consistently (in addition to again being able to monitor via the f-stream).

I'm not opposed to adding another layer of file descriptor reuse, but we should probably fix this first.
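
As a rough sketch of that end state (again with plain POSIX I/O and hypothetical names, not the actual bridge code), all three request shapes would finish with an explicit close:

```cpp
// Sketch of the three request shapes above, each ending with an explicit
// close. Plain POSIX I/O and hypothetical names, not XRootD code.
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

struct Range { off_t offset; size_t length; };

// An empty `ranges` means "read the whole file" (case 1); one entry is a
// single byte range (case 2); several entries form a vector read (case 3).
bool serve(const char* path, const std::vector<Range>& ranges) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    bool ok = true;
    if (ranges.empty()) {
        char buf[64 * 1024];
        ssize_t n;
        while ((n = read(fd, buf, sizeof(buf))) > 0) {
            // ... send buf[0..n) to the client ...
        }
        ok = (n == 0);
    } else {
        for (const Range& r : ranges) {
            std::vector<char> buf(r.length);
            ssize_t n = pread(fd, buf.data(), r.length, r.offset);
            if (n < 0) { ok = false; break; }
            // ... send buf[0..n) to the client ...
        }
    }

    close(fd);  // every path now releases the handle explicitly
    return ok;
}
```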

@bbockelm
Contributor Author

@abh3 @ffurano - any update on this? We'd really like to see this go in before 4.8.2 so we don't have to carry an out-of-tree patch.

It's been deployed on almost all of our XCache servers in StashCache, and the monitoring now looks quite good.

@ffurano ffurano merged commit f486090 into xrootd:master Mar 16, 2018
@ffurano
Contributor

ffurano commented Mar 16, 2018

Hi, what you say makes a lot of sense to me, and so do the fixes. I find it important that you tested it in production. Thank you for sharing.

@ffurano
Contributor

ffurano commented Mar 16, 2018

Thinking about it again, I'd say it's correct that the server defends itself against crappy clients. The definitive solution to this issue, IMO, is a timeout.
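
A sketch of what such a defense could look like: an idle timeout that reclaims handles from clients that keep their TCP connection open indefinitely. The types and names below are hypothetical, not part of XRootD.

```cpp
// Hypothetical idle-timeout defense: file handles that have not been used
// within `idle_limit` are closed even if the client's TCP connection stays
// open. Names and types are illustrative, not XRootD code.
#include <chrono>
#include <unistd.h>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

struct OpenHandle {
    int fd;
    Clock::time_point last_used;
};

class HandleTable {
public:
    explicit HandleTable(std::chrono::seconds idle_limit)
        : idle_limit_(idle_limit) {}

    // Register a newly opened descriptor under a per-request id.
    void add(int id, int fd) { handles_[id] = OpenHandle{fd, Clock::now()}; }

    // Record activity on an existing handle.
    void touch(int id) {
        auto it = handles_.find(id);
        if (it != handles_.end()) it->second.last_used = Clock::now();
    }

    // Called periodically, e.g. from a maintenance thread: close anything
    // that has been idle longer than the limit.
    void sweep() {
        const auto now = Clock::now();
        for (auto it = handles_.begin(); it != handles_.end();) {
            if (now - it->second.last_used > idle_limit_) {
                close(it->second.fd);
                it = handles_.erase(it);
            } else {
                ++it;
            }
        }
    }

private:
    std::chrono::seconds idle_limit_;
    std::unordered_map<int, OpenHandle> handles_;
};
```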

simonmichal pushed a commit that referenced this pull request on Mar 20, 2018
Close file handle for simple HTTP reads.