
Warp exits after accepting too many simultaneous connections on Linux #825

Open

cuklev opened this issue Nov 24, 2020 · 8 comments

cuklev commented Nov 24, 2020

Might be related to #603

Here is a sample code:

{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Concurrent
import Data.IORef
import Network.HTTP.Types
import Network.Wai
import Network.Wai.Handler.Warp

main :: IO ()
main = do
  counter <- newIORef (0 :: Int) -- just for keeping count of accepted connections
  run 3003 $ \_ respond -> do
    print =<< atomicModifyIORef' counter (\x -> (x+1, x+1))
    threadDelay 3000000 -- simulate something that is slow to process
    print =<< atomicModifyIORef' counter (\x -> (x-1, x-1))
    respond $ responseLBS status200 [] "works\n"

When I run something like while :; do curl -s http://localhost:3003 > /dev/null & done, the Haskell program receives
Network.Socket.accept: resource exhausted (Too many open files) and then, after all connections close, exits with a success status.
For me it always happens after printing 1011. This is because each accepted connection is a new open file, and there is a per-process limit on open files.
On my system this limit seems to be 1024 (it can be seen or changed with ulimit -Sn).

I am not sure how this should be solved.
Should Warp stop accepting connections when too many are already open?
Should accepting be allowed to fail, followed by a retry? (A rough sketch of this option follows below.)
Should the server respond with something like 429 Too Many Requests?
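
For illustration, the fail-and-retry option could look roughly like this in a hand-rolled accept loop. This is only a sketch, not Warp's actual internals; acceptLoop and handleConn are made-up names.

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import Network.Socket

-- Keep accepting; if accept fails (e.g. with EMFILE), log, back off
-- briefly, and retry instead of letting the exception end the loop.
acceptLoop :: Socket -> ((Socket, SockAddr) -> IO ()) -> IO ()
acceptLoop listenSock handleConn = do
  result <- try (accept listenSock)
  case result of
    Left (e :: IOException) -> do
      putStrLn ("accept failed: " ++ show e)
      threadDelay 1000000 -- wait 1 second before retrying
    Right conn -> handleConn conn -- a real server would fork a worker here
  acceptLoop listenSock handleConn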

snoyberg (Member) commented

The server can't respond with a 429 in that case, since it cannot accept the new connection at all. I'd strongly advise bumping the FD limit; 1024 is far too low for a busy server.
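
If raising it system-wide is awkward, the process can also lift its own soft limit up to the hard limit at startup. A minimal sketch, assuming the unix package (POSIX only); raiseFdSoftLimit is a made-up name, to be called before run:

import System.Posix.Resource

-- Raise the soft open-files limit to the hard limit. Only privileged
-- processes may raise the hard limit itself.
raiseFdSoftLimit :: IO ()
raiseFdSoftLimit = do
  limits <- getResourceLimit ResourceOpenFiles
  setResourceLimit ResourceOpenFiles limits { softLimit = hardLimit limits }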

cuklev (Author) commented Nov 27, 2020

Well, it is not necessarily a busy server; it could just be someone trying to abuse it.
In my case, I was surprised that my server process exited.
I feel like bumping the FD limit is only a temporary solution.

swamp-agr commented

@cuklev Could you please provide the client-side code you're invoking?

cuklev (Author) commented Feb 8, 2021

while :; do curl -s http://localhost:3003 > /dev/null & done in bash.

swamp-agr commented

Seems that you're running out of sockets/FDs. ulimit -Sn shows the current FD limit.

The application cannot allocate more than ulimit -Sn sockets, and it simply stops responding, since you're forcing it to wait for 3 seconds on every single query. Warp throws an error because it could not allocate more.

I do not know the best fault-tolerance strategy for socket exhaustion here. Maybe add an allocation counter, a threshold, and/or a queue, and change strategy once the threshold is reached: schedule responses into the queue and process them separately. A rough cap along these lines is sketched below.

As of now, you could set soft/hard limits per user/application at the system level, based on the expected/predicted RPS from clients/proxy.
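
For the counter-and-threshold part, Warp's Settings already expose connection open/close hooks, so a hard cap on concurrent connections can be sketched like this. It is only a sketch: the cap of 512 is an arbitrary number I picked (it should stay well below ulimit -Sn), and it refuses rather than queues; returning False from the setOnOpen hook makes Warp close the connection immediately.

{-# LANGUAGE OverloadedStrings #-}
import Data.IORef
import Network.HTTP.Types
import Network.Wai
import Network.Wai.Handler.Warp

main :: IO ()
main = do
  open <- newIORef (0 :: Int)
  let cap = 512 -- arbitrary threshold, well below the FD limit
      settings =
          setPort 3003
        -- Admit the connection only while fewer than cap are open.
        $ setOnOpen  (\_ -> atomicModifyIORef' open (\n -> (n + 1, n < cap)))
        -- Warp runs the close hook for refused connections too, so the
        -- counter stays balanced (as far as I can tell from its bracketing).
        $ setOnClose (\_ -> atomicModifyIORef' open (\n -> (n - 1, ())))
        $ defaultSettings
  runSettings settings $ \_ respond ->
    respond $ responseLBS status200 [] "works\n"

Note that the socket is still accepted before the hook runs, so this bounds how many FDs slow handlers can hold open, but it does not by itself make the accept call failure-proof.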

cuklev (Author) commented Feb 9, 2021

Yes, increasing the FD limit will improve the situation, but it will not solve it. Warp should definitely catch that error and not exit.
I tested the same setup with nginx in the middle, using proxy_pass to the Haskell server. In that case my application never crashes; nginx responds with 500 for half of the requests.

swamp-agr commented Feb 9, 2021

curl

Consider the curl case for simplicity.

  • So, the application is listening on port 3003 (1 FD).
  • It is trying to accept incoming connections from a lot of curl clients.
  • By curl's defaults, each "client" will wait up to 60 seconds for the server to accept and up to 300 seconds to connect.
  • If both happen, it will wait indefinitely for a response from the application.
  • I.e. neither curl nor the application will close the socket until a response is sent by the application and delivered to curl.
  • All 1023 available sockets/FDs will soon be exhausted.
  • At that point, the application throws something like Network.Socket.accept: resource exhausted (No file descriptors available).

Given the current Warp implementation, a proper design fix is needed for connections leaking when they are accepted. I am currently investigating the leaking side of the story.

Let's return to nginx.

NGINX

With nginx there are a lot of variables that should be taken into account:

  • nginx soft/hard limits;
  • worker parameters;
  • various timeout parameters;
  • nginx server parameters;
  • (multiple) nginx site configuration(s);
  • application soft/hard limits;
  • sysctl TCP/IP/socket parameters.

NGINX + Warp + /etc/sysctl.conf must be configured extremely carefully: there should be no contradictions among any of the possible combinations of the parameters mentioned above.

E.g. decreasing proxy_read_timeout and proxy_send_timeout on the NGINX side could fix Warp's availability in one particular use case.
Another example is removing keepalive from your upstream configuration, which could help in a different use case.

Vlix (Contributor) commented Jul 25, 2022

I think it should be possible not to let the application crash: just print to stdout/stderr that no file descriptors were available and continue with the loop?

The Network.Socket error is just an IOError with OtherError and a string, so it should be easy, although pretty frail; let's hope Network.Socket doesn't change its exception's wording 🙃
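
Something along these lines, perhaps. The names are illustrative, not Warp's actual accept loop, and the string match is exactly the frail part:

import Control.Concurrent (threadDelay)
import Control.Exception (catch)
import Data.List (isInfixOf)
import System.IO (hPutStrLn, stderr)

-- Wrap an accept action: on FD exhaustion, log to stderr and tell the
-- caller to retry; rethrow everything else unchanged.
retryOnFdExhaustion :: IO a -> IO (Maybe a)
retryOnFdExhaustion doAccept = (Just <$> doAccept) `catch` handler
  where
    handler e
      | "resource exhausted" `isInfixOf` show e = do
          hPutStrLn stderr ("accept: " ++ show e)
          threadDelay 100000 -- brief pause so the loop doesn't spin
          pure Nothing
      | otherwise = ioError e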
