
Warp exits after accepting too many simultaneous connections on Linux #825

Open

cuklev opened this issue Nov 24, 2020 · 8 comments

cuklev commented Nov 24, 2020

Might be related to #603

Here is a sample code:

{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Concurrent
import Data.IORef
import Network.HTTP.Types
import Network.Wai
import Network.Wai.Handler.Warp

main :: IO ()
main = do
  counter <- newIORef (0 :: Int) -- just for keeping count of accepted connections
  run 3003 $ \_ respond -> do
    print =<< atomicModifyIORef' counter (\x -> (x+1, x+1))
    threadDelay 3000000 -- simulate something that is slow to process
    print =<< atomicModifyIORef' counter (\x -> (x-1, x-1))
    respond $ responseLBS status200 [] "works\n"

When I run something like while :; do curl -s http://localhost:3003 > /dev/null & done, the Haskell program receives
Network.Socket.accept: resource exhausted (Too many open files) and then, after all connections close, exits with a success status.
For me it always happens after printing 1011. This is because each accepted connection is a new open file, and there is a per-process limit on open files.
On my system this limit seems to be 1024 (it can be seen or changed with ulimit -Sn).

I am not sure how this should be solved.
Should Warp stop accepting connections when too many are already open?
Should accepting be allowed to fail, followed by a retry? (A rough sketch of this option follows below.)
Should the server respond with something like 429 Too Many Requests?
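
For illustration, the fail-and-retry option could look roughly like this in a hand-rolled accept loop. This is only a sketch, not Warp's actual internals; acceptLoop and handleConn are made-up names.

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import Network.Socket

-- Keep accepting; if accept fails (e.g. with EMFILE), log, back off
-- briefly, and retry instead of letting the exception end the loop.
acceptLoop :: Socket -> ((Socket, SockAddr) -> IO ()) -> IO ()
acceptLoop listenSock handleConn = do
  result <- try (accept listenSock)
  case result of
    Left (e :: IOException) -> do
      putStrLn ("accept failed: " ++ show e)
      threadDelay 1000000 -- wait 1 second before retrying
    Right conn -> handleConn conn -- a real server would fork a worker here
  acceptLoop listenSock handleConn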

snoyberg (Member) commented

The server can't respond with a 429 in that case, since it cannot accept the new connection at all. I'd strongly advise bumping the FD limit; 1024 is far too low for a busy server.
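
If raising it system-wide is awkward, the process can also lift its own soft limit up to the hard limit at startup. A minimal sketch, assuming the unix package (POSIX only); raiseFdSoftLimit is a made-up name, to be called before run:

import System.Posix.Resource

-- Raise the soft open-files limit to the hard limit. Only privileged
-- processes may raise the hard limit itself.
raiseFdSoftLimit :: IO ()
raiseFdSoftLimit = do
  limits <- getResourceLimit ResourceOpenFiles
  setResourceLimit ResourceOpenFiles limits { softLimit = hardLimit limits }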

cuklev (Author) commented Nov 27, 2020

Well, it is not necessarily a busy server; it could just be someone trying to abuse it.
In my case, I was surprised that my server process exited.
I feel like bumping the FD limit is only a temporary solution.

swamp-agr commented

@cuklev Could you please provide the client-side code you're invoking?

cuklev (Author) commented Feb 8, 2021

while :; do curl -s http://localhost:3003 > /dev/null & done in bash.

swamp-agr commented

Seems that you're running out of sockets/FDs. ulimit -Sn shows the current FD limit.

The application cannot allocate more than ulimit -Sn sockets, and it simply stops responding, since you're forcing it to wait for 3 seconds on every single query. Warp throws an error because it could not allocate more.

I do not know the best fault-tolerance strategy for socket exhaustion here. Maybe add an allocation counter, a threshold, and/or a queue, and change strategy once the threshold is reached: schedule responses into the queue and process them separately. A rough cap along these lines is sketched below.

As of now, you could set soft/hard limits per user/application at the system level, based on the expected/predicted RPS from clients/proxy.
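
For the counter-and-threshold part, Warp's Settings already expose connection open/close hooks, so a hard cap on concurrent connections can be sketched like this. It is only a sketch: the cap of 512 is an arbitrary number I picked (it should stay well below ulimit -Sn), and it refuses rather than queues; returning False from the setOnOpen hook makes Warp close the connection immediately.

{-# LANGUAGE OverloadedStrings #-}
import Data.IORef
import Network.HTTP.Types
import Network.Wai
import Network.Wai.Handler.Warp

main :: IO ()
main = do
  open <- newIORef (0 :: Int)
  let cap = 512 -- arbitrary threshold, well below the FD limit
      settings =
          setPort 3003
        -- Admit the connection only while fewer than cap are open.
        $ setOnOpen  (\_ -> atomicModifyIORef' open (\n -> (n + 1, n < cap)))
        -- Warp runs the close hook for refused connections too, so the
        -- counter stays balanced (as far as I can tell from its bracketing).
        $ setOnClose (\_ -> atomicModifyIORef' open (\n -> (n - 1, ())))
        $ defaultSettings
  runSettings settings $ \_ respond ->
    respond $ responseLBS status200 [] "works\n"

Note that the socket is still accepted before the hook runs, so this bounds how many FDs slow handlers can hold open, but it does not by itself make the accept call failure-proof.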

cuklev (Author) commented Feb 9, 2021

Yes, increasing the FD limit will improve the situation, but it will not solve it. Warp should definitely catch that error and not exit.
I tested the same setup with nginx in the middle, using proxy_pass to the Haskell server. In that case my application never crashes; nginx responds with 500 for half of the requests.

swamp-agr commented Feb 9, 2021

curl

Consider the curl case for simplicity.

  • So, the application is listening on port 3003 (1 FD).
  • It is trying to accept incoming connections from a lot of curl clients.
  • By curl's defaults, each "client" will wait up to 60 seconds for the server to accept and up to 300 seconds to connect.
  • If both happen, it will wait indefinitely for a response from the application.
  • I.e. neither curl nor the application will close the socket until a response is sent by the application and delivered to curl.
  • All 1023 available sockets/FDs will soon be exhausted.
  • At that point, the application throws something like Network.Socket.accept: resource exhausted (No file descriptors available).

Given the current Warp implementation, a proper design fix is needed for connections leaking when they are accepted. I am currently investigating the leaking side of the story.

Let's return to nginx.

NGINX

With nginx there are a lot of variables that should be taken into account:

  • nginx soft/hard limits;
  • worker parameters;
  • various timeout parameters;
  • nginx server parameters;
  • (multiple) nginx site configuration(s);
  • application soft/hard limits;
  • sysctl TCP/IP/socket parameters.

NGINX + Warp + /etc/sysctl.conf must be configured extremely carefully: there should be no contradictions among any of the possible combinations of the parameters mentioned above.

E.g. decreasing proxy_read_timeout and proxy_send_timeout on the NGINX side could fix Warp's availability in one particular use case.
Another example is removing keepalive from your upstream configuration, which could help in a different use case.

Vlix (Contributor) commented Jul 25, 2022

I think it should be possible not to let the application crash: just print to stdout/stderr that no file descriptors were available and continue with the loop?

The Network.Socket error is just an IOError with OtherError and a string, so it should be easy, although pretty frail; let's hope Network.Socket doesn't change its exception's wording 🙃
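
Something along these lines, perhaps. The names are illustrative, not Warp's actual accept loop, and the string match is exactly the frail part:

import Control.Concurrent (threadDelay)
import Control.Exception (catch)
import Data.List (isInfixOf)
import System.IO (hPutStrLn, stderr)

-- Wrap an accept action: on FD exhaustion, log to stderr and tell the
-- caller to retry; rethrow everything else unchanged.
retryOnFdExhaustion :: IO a -> IO (Maybe a)
retryOnFdExhaustion doAccept = (Just <$> doAccept) `catch` handler
  where
    handler e
      | "resource exhausted" `isInfixOf` show e = do
          hPutStrLn stderr ("accept: " ++ show e)
          threadDelay 100000 -- brief pause so the loop doesn't spin
          pure Nothing
      | otherwise = ioError e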
