high memory utilization in RequestSemaphoreFilter #241

Closed
azenkov opened this Issue Feb 10, 2014 · 4 comments

Contributor

azenkov commented Feb 10, 2014

I was profiling a Finagle server to find the reason for an OOM exception we saw on our production servers. We saw a big jump in memory utilization, and after profiling I was able to narrow it down to RequestSemaphoreFilter. We are using .maxConcurrentRequests(4) on our ServerBuilder.
It looks like the reason for the high memory consumption is that RequestSemaphoreFilter holds references to request objects for far too long. Our requests can be very large, hundreds of MB, and when RequestSemaphoreFilter keeps holding hundreds of request objects in its waitq ArrayDeque, the server crashes with an OOM.
Is this expected behavior or a bug?

Contributor

evnm commented Feb 10, 2014

Am I correct in assuming that you're capping your service at four concurrent requests because of the large request size? If your servers can take it, this problem may be alleviated by increasing your request concurrency.

Barring that, you could try composing your own RequestSemaphoreFilter around your service rather than relying on the one baked into com.twitter.finagle.server.DefaultServer. By default DefaultServer creates a semaphore with no maximum number of waiters, which I suspect is leading to your blowout of pending requests.

If ServerBuilder.maxConcurrentRequests isn't set, then a RequestSemaphoreFilter isn't used. Instead, try building a filter with a realistic number set for the maxWaiters argument to AsyncSemaphore. This should stop requests from backing up, but note that it will result in RejectedExecutionExceptions being returned by the service in cases of excess request buildup.
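
Roughly something like the following untested sketch (the `limitConcurrency` helper and its parameter names are just for illustration; it assumes you can construct RequestSemaphoreFilter directly with an AsyncSemaphore):

```scala
import com.twitter.concurrent.AsyncSemaphore
import com.twitter.finagle.Service
import com.twitter.finagle.filter.RequestSemaphoreFilter

// Wrap an existing service with a concurrency limit and a *bounded* wait
// queue. Once maxWaiters requests are already queued, additional requests
// fail with RejectedExecutionException instead of piling up in memory.
def limitConcurrency[Req, Rep](
  service: Service[Req, Rep],
  maxConcurrent: Int,
  maxWaiters: Int
): Service[Req, Rep] = {
  val sem = new AsyncSemaphore(maxConcurrent, maxWaiters)
  new RequestSemaphoreFilter[Req, Rep](sem).andThen(service)
}
```

You'd then build the server against `limitConcurrency(yourService, 4, someSaneLimit)` and leave `maxConcurrentRequests` unset on the ServerBuilder, so the built-in filter with its unbounded wait queue isn't installed.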

Contributor

azenkov commented Feb 11, 2014

Thank you, Evan!
It looks like my problem sits a little deeper than RequestSemaphoreFilter. Even when I limit the number of connections like this:

ServerBuilder.openConnectionsThresholds(new OpenConnectionsThresholds(4, 10, Duration.apply(100, TimeUnit.MILLISECONDS)))

I still get the OOM during stress tests. I kept profiling, and it looks like IdleConnectionFilter is only applied after the socket is accepted and the request is read from the network. In my case the server receives thousands of big requests and is unable to apply backpressure to clients, since all of those connections are accepted and read by Netty/Finagle into RAM, which crashes the server. Is there a way out of such situations, where I need to prevent requests from being read from the network in the first place?

Contributor

mosesn commented Feb 11, 2014

There's more discussion on the finaglers mailing list.

Contributor

mosesn commented Feb 4, 2015

We still don't have a great solution for this, but this ticket is mostly a red herring. I'm going to close it for now. Thanks for the great discussion, @azenkov!

@mosesn mosesn closed this Feb 4, 2015
