CLOSE_WAIT, too many open files #277
After easy.perform, if you run easy.close do you still see the open file issue? The close should run with GC, but if you're not having any GC runs then it would not close those connections... perhaps? |
I do have GC, I even do run it manually at some points. Will try |
I think I just ran into this issue as well (using Curl::Easy). Initially, just running my script would immediately start piling up sockets in the ESTAB and CLOSE-WAIT states (as reported by the 'ss' utility on my Ubuntu box). After adding an explicit call to 'curl.close' after being done with my curl object, I got my script to run in constant file descriptor space. But this only lasted a few minutes, then it started piling up sockets in the CLOSE-WAIT state again. Finally, I added a 'GC.start' call after calling 'close' on my curl object, and it seems to have fixed it for good. It was an ugly fix, though. I just want to keep using curl/curb in my library, but I'm afraid switching may be easier for me than fixing this all by myself (I have never read curl code, or native ruby extensions for that matter). I can certainly help diagnose this; I know I can reliably reproduce the issue by just removing those fixes. Cheers! |
Well, it's good to hear that close + GC.start works around the issue. It makes sense, because the socket is only closed in a GC cycle. I think there is also a keepalive setting in curl we can set to resolve this, which would be less "ugly" to your point. Doing a little google search however, it looks like the issue is related to this bug report: http://sourceforge.net/p/curl/bugs/1010/ |
I use Curl::Easy for large amounts of traffic, and am also getting a pile-up of sockets in the CLOSE_WAIT state. I'm solving this by closing, and opening a new, connection after every |
It looks like this might be related to a bad version of libcurl? Did you check the bug report, or also see this: https://curl.haxx.se/mail/lib-2009-03/0009.html |
Thanks for the tip @taf2 ! I'll look into that! |
I'm running into this as well since we basically upgraded to Ruby 2.3.0 and curb from 0.8.5 to 0.9.1. We use Ubuntu LTS 14.04 with libcurl 7.35 (latest Ubuntu 14.04 packaged version). Is the suggested solution to compile libcurl manually on latest Ubuntu LTS? Or is it possible a Ruby downgrade would help (because probably Ruby 2.3 has changed their GC behavior etc)? To be honest I'm a bit hesitant to install libcurl manually instead of using the default Ubuntu apt-repos. ;) |
BTW: the issue described here https://curl.haxx.se/mail/lib-2009-03/0009.html should have been fixed in libcurl 7.20 at least? Hence I guess this is not the problem here? |
@stayhero Yeah, I'm having this problem on Ruby v2.2.3p173 and I'm sure our libcurl is patched. |
Could this be a problem with ruby garbage collection, rather than curb/curl? |
Ruby 2.2.1p85 when this happened. |
We've solved this problem in our codebase. We used to just grab a new Curl::Easy instance for each request, resulting in the aforementioned issue. Our workaround was to always reuse and reset the Curl::Easy instance, something along these lines:

```ruby
def self.get(url = nil, params = nil, &block)
  url, params = url_and_params_from_args(url, params, &block)
  return with_curl do |easy|
    easy.url = url_with_params(url, params)
    easy.http_get
  end
end

def self.with_curl(&block)
  easy = Thread.current[:pebblebed_curb_easy] ||= Curl::Easy.new
  easy.reset
  yield easy
  return handle_http_errors(Response.new(easy))
end

def self.handle_http_errors(response)
  if response.status == 404
    errmsg = "Resource not found: '#{response.url}'"
    errmsg << extract_error_summary(response.body)
    raise HttpNotFoundError.new(ActiveSupport::SafeBuffer.new(errmsg), response.status)
  elsif response.status >= 400
    errmsg = "Service request to '#{response.url}' failed (#{response.status}):"
    errmsg << extract_error_summary(response.body)
    raise HttpError.new(ActiveSupport::SafeBuffer.new(errmsg), response.status, response)
  end
  response
end
``` |
@thomax you should look at using the following methods as they internally use Thread.current to reuse a curl handle: Curl.get |
@taf2 Thanks for the suggestion, I'll look into that! Would it be unexpected behaviour if |
Possibly Curl::Easy allows you to get a new libcurl easy handle. I'm not exactly sure what is causing your issue... Do you have sample code to recreate the issue? |
I also have this issue with code like this:

```ruby
def post_request(url, request_body)
  request = Curl::Easy.new(url)
  request.headers['Content-type'] = 'application/json'
  request.certtype = 'PEM'
  request.cert = "<certificate-string>"
  request.cacert = "<root-cert-string>"
  request.cert_key = "<private-key-string>"
  request.http_post(request_body)
  return request.body_str
end
```

I tried @nubis' suggestion with

```ruby
def post_request(url, request_body)
  request = Curl::Easy.new(url)
  request.headers['Content-type'] = 'application/json'
  request.certtype = 'PEM'
  request.cert = "<certificate-string>"
  request.cacert = "<root-cert-string>"
  request.cert_key = "<private-key-string>"
  request.http_post(request_body)
  response_body = request.body_str
  request.close
  GC.start
  return response_body
end
```

System:
|
+1 I also have this problem using Curb inside a thin service. Even if I use Curl.post/get AND explicitly close the connections AND run GC.start occasionally, that only helps reduce the problem, it seems to still occur eventually. Symptoms of failure are weird. Either a "Couldn't resolve host name (Curl::Err::HostResolutionError)" or sometimes it causes a Segmentation fault. |
@mikaelhm the GC.start is probably not necessary, only request.close - to ensure curb tells libcurl to close the handle, which would eventually be closed by GC anyway. A GC.start right after a call to close will immediately free up your existing ruby objects, which might be good or might not. In unicorn, for example, you can put that GC.start outside the request, which might be better...

@levilansing mixing Curl.post/Curl.get with close may not be what you want. Curl.post/Curl.get put a single Curl::Easy handle into Thread.current - meaning the handle is shared/re-used between requests. The main benefit of using them is to share the existing connection.

Thinking about this issue, I suspect we might need to do something internal to force the handles to be freed, or I wonder if the handles are piling up in the multi handle. It might be interesting to get the value of handle.multi.idle?

I've made a commit here: 2509064..e3126a8 That clears out the multi request handle early. I think this might be the solution, please let me know. |
@taf2 I tried with just I will try to remove it again and upgrade curb to 0.9.2 and see if your attempt fixed it. |
@mikaelhm I bet the GC.start is triggering the multi handle to close - that could be the leak. If so, 0.9.2 will fix it, unless of course there are other leaks :( |
I will let you know. Thanks for attempting to fix the leak. |
After a days use, I feel confident that @taf2 fixed the memory/socket leak. 0.9.3 fixed my issues. |
0.9.3. solved our issues as well. |
We ran into this issue today as well with curb 0.9.4. The Environment:
Code to reproduce:

```ruby
100.times { curl = Curl::Easy.new("http://google.com"); curl.get }
```

After a few minutes: |
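When chasing a leak like this, it can also help to watch the descriptor count from inside the process itself; a Linux-only sketch (`open_fd_count` is just an illustrative name, and it reads /proc, so on osx the lsof/netstat approach shown in this thread is the way to go):

```ruby
# Count this process's open file descriptors by listing /proc/self/fd
# (Linux only). Sampling this before and after a batch of requests makes
# a descriptor leak visible without shelling out to lsof or netstat.
def open_fd_count
  Dir.children("/proc/self/fd").size
end

before = open_fd_count
io = File.open(File::NULL) # open one fd on purpose
after = open_fd_count
io.close
# `after` should exceed `before`, reflecting the fd we just opened
```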
Hi. For the past two weeks, I've been chasing down randomly growing memory on my Rails Puma server, where file descriptors are held by connections left in CLOSE_WAIT from curb requests, after upgrading from
I've bought myself some time with the

Attempting to roll back to version 0.8.8 to see if the issue still manifests. |
@ta I was unable to reproduce any CLOSE_WAIT with version 0.9.4 on my vagrant box with almost the same environment as you:
I did observe that running the code block 100 times creates about 100 ESTABLISHED connections that won't go away until GC is called, regardless of whether you call

So |
@pts-owentran The problem only surfaced in our environment after upgrading from ruby 2.2.x to 2.4.x (we skipped 2.3.x). |
Thanks @robuye ! |
Okay, I need to test this more but maybe adding:
|
argh, we already do that... so never mind... also running the example test provided... and running netstat I'm not seeing any connection leaks... maybe I'm not testing this correctly? While running the test script:
|
What version of libcurl are you guys testing on? I ran my test on ruby-2.5.3 and libcurl 7.53.1 |
I didn't have time to sit on it since my last comment. Here's the configuration I was using:
This is from docker, and I also tested it locally in the following configuration:
Something to bear in mind: you need to recompile and reload irb when you check out a different commit for testing. @taf2 I mentioned I used an example from the readme, but I also added a loop there to run it 100 times. Not sure you noticed, but without it 2 FDs are expected. |
Yup, I increased the loop to 1000 and monitored open FDs, and it never exceeded 2 for me. I used ruby-2.5.3 for testing... and libcurl 7.53.1, and was not able to leak any FDs... |
alright, let me check it out on ruby 2.5, I will be back shortly. |
For us, the server where we see the leaks runs ruby 2.5.1 and libcurl 7.22.0. |
It's the same on @taf2 are you on osx perhaps? |
@robuye ec2 aws amazon linux - so a flavor of centos... |
I could test it only on ubuntu 16.04 on EC2, and I see leaked FDs there. That's on

I will see if I can debug this bug further over the weekend. |
yeah, i must not be testing this correctly then... Here's a gist I created to help me run the loop and also monitor netstat... maybe i'm not filtering netstat correctly? Or maybe something about how I'm running it... https://gist.github.com/taf2/683ca2f9cec226de44c8f992b1ca5cc2 |
Okay, typical me! I was running 0.9.2 🤯 Anyway, the test script in the gist should be helpful then in fixing this. |
hah, that's happened too often to me. I noticed calling |
We can use this to add it as a regression test... should figure out how to get netstat to work for BSD systems too, for osx users. |
When modifying the test script to call GC.start after m.perform, the connection counts stay in the 2 - 3 range... so when GC runs, connections are cleaned up. The issue here is we can either wait and let GC do the connection close, or we keep the connection around because maybe the end user will want to use the connection again... I'm thinking we might need an explicit option to keep the connection open so it can be re-used, or to close it right away... |
Yea, I was thinking something along those lines, but I don't understand why this has changed as of |
Restoring these lines from e3126a fixes the problem:
the FDs are going down to zero over ~ 30 seconds. This delay is interesting, it looks like a timeout is kicking in? |
Bear in mind it's not a real fix, because if it stops curb from reusing connections, that's a pretty serious regression. |
Actually, looking at the description of this commit, this seems to be exactly what you were looking for @taf2 - to close FDs without waiting for GC. Calling it at the end of

The commit you added included more changes, and perhaps this would prevent curb from reusing connections. The test code we were using creates 100 multi handles and each has 2 easy handles, so it doesn't show if the connections are reused. Also, a high number of FDs is expected.

I will try to come up with another test to see if we reuse connections or not. |
@robuye yeah, I think the case of re-use is when doing something simple like:
|
I think this would create separate multis, each with its own pool, and they wouldn't be able to reuse connections. At least that's my understanding of libcurl. I was thinking we'd need multiple requests in a single multi and to the same host (ideally HTTPS, so we pay massive overhead here). And those should be slow to respond, so the pool fills up and reusing begins. |
I think I have figured it out, and I also have some food for thought.

libcurl will try to hold a connection open for as long as possible, and it will reuse cached connections if possible too. See this thread from the libcurl mailing list; it goes into detail about this behaviour and Daniel explains it much better than I could. In the case of Ruby, we will keep the FDs open until the multi handle is GCed. GC calls

Note that once

The code I used to test this behaviour is significantly longer, so I will just link a gist here. It demonstrates how connections are reused and shows interesting results, so give it a shot. I have enabled

I pointed curb to my local server to minimize any network latency and get better insight into the connections. Here's the script output:
And here's what happened according to the server:
All 10 requests were made by curb, but only 1 reused a connection. When I removed
This is somewhat irrelevant to the FDs leak we discuss here, but I wanted to make sure we're not introducing another regression.

Regarding the FDs leak, I think the best fix is to expose

```ruby
URL = 'http://google.com'

multi = Curl::Multi.new
multi.add(Curl::Easy.new(URL))
multi.perform

# connections & FDs remain open and will be reused by next handle
multi.add(Curl::Easy.new(URL))
multi.perform

multi.close
# all connections & FDs are closed

multi.add(Curl::Easy.new(URL))
# => Curl::Err::MultiBadHandle: CURLError: Invalid multi handle
multi.perform
# => Curl::Err::MultiBadHandle: CURLError: Invalid multi handle
```

Each multi manages its own pool of connections and it won't reuse FDs from another multi, so leaving sockets hanging until GC runs feels quite wasteful. We could provide an option to autoclose a handle when

I think it should be a fairly simple and safe fix, so let me know if it's feasible. |
@robuye great analysis, I like the idea of a I agree we need some regression tests for connection keep-alive and connection leaking. |
Thanks! Your implementation is better than what I came up with. Cool idea with re-initializing

I ran my gist on your branch and it seems to be working exactly as expected. Nice work 👍 |
Great work, Rob and Todd. Thank you both so much for investing the time to hunt this down. Happy to test whenever the fix is in. I was looking at the behavior from a year back, but it's still odd I couldn't make it reproduce the issue at will. It must have been some GC timing issue and the way my app uses the Parallel gem to create these connections in threads. |
@pts-owentran thanks, we've pushed 0.9.8 to rubygems. If you have a chance to run this, let me know. I think with the updates to include a close method and autoclose, we're actually close to a 1.x release 😄 |
This code:
leaks connections.
After some time I am getting errors from sidekiq:
Error fetching job: Too many open files - getaddrinfo
$ netstat -an | awk '/tcp/ {print $6}' | sort | uniq -c:
$ lsof -p 6245 | grep CLOSE_WAIT | wc -l
A part of lsof output: