Reduce number of Puma processes and threads #630
This looks good to me. It seems like we can keep a higher number of threads while still comfortably fitting in a dyno, although if you'd rather have suspenders be lighter by default I'm not against that. 👍
I'm not sure that it is explained in the code comment, so maybe my message or comment isn't clear. Increasing processes is very good for concurrency but very bad for memory usage. Increasing threads helps somewhat with concurrency but isn't as bad for memory usage. This is because processes don't efficiently share memory (even though Ruby 2.1+ purportedly has copy-on-write), and threads (in Ruby) don't efficiently share CPU, because of the global interpreter lock. I think 2x2 is a good combination.
More than two threads on those processes increases the risk of simultaneously serving multiple bloated requests. If the worst request bloats a process by 30MB, the worst-case scenario for a dyno in a Suspenders app will bloat past the 512MB limit. Two threads feels like a safe maximum while still providing at least some concurrency for IO-bound requests. Does that make more sense?
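As a sketch of what the 2x2 default being discussed could look like, here is a minimal `config/puma.rb` along those lines. The ENV variable names, fallbacks, and hooks here are illustrative assumptions, not necessarily Suspenders' actual template:

```ruby
# Illustrative Puma config for a 2-worker, 2-thread setup; not the actual
# Suspenders template.
workers Integer(ENV["PUMA_WORKERS"] || 2)    # processes: good concurrency, poor memory sharing
min_threads = Integer(ENV["PUMA_MIN_THREADS"] || 2)
max_threads = Integer(ENV["PUMA_MAX_THREADS"] || 2)
threads min_threads, max_threads             # threads: cheap on memory, limited by the GIL

preload_app!                                 # load the app before forking, to benefit from copy-on-write

on_worker_boot do
  # Each forked worker needs its own database connections.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end
```

Keeping both numbers small bounds the worst case: at most 2 workers × 2 threads = 4 bloated requests in flight per dyno.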
It does, thank you! The code comment did explain the relationship between workers and threads, but I appreciate the longer answer. Are this PR and #627 all we have in mind for now to improve memory usage for suspenders? If so, when we merge them I'll do a release.
@tute I'll take a quick look today to see if there's anything else we can do to slim it down and get closer to Rails bootup size. Most of the dependencies we add are development/test dependencies, so it seems like Suspenders size should be pretty close. |
@tute I don't think there are any other major quick wins left to remove. The startup for a vanilla Rails app using Postgres is ~78M for the master process and ~72M for each cold worker process. The startup for Suspenders is ~90M for the master process and ~83M for each worker process. If we removed all the gems Suspenders uses that vanilla Rails doesn't, we'd gain around 35M of extra ceiling for each dyno. This isn't significant, since Rails apps under load will easily bloat more than that already. If we were going to remove something, the biggest win would be to remove NewRelic, which would reduce our usage to 84M in the master and 77M for workers, saving 18M for each dyno. For things that not every app uses, changing it so that ActionMailer and the …
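As a back-of-envelope check on the NewRelic figure above, assuming the two-worker default this PR proposes, the per-dyno saving works out as follows (the worker count is my assumption; the comment doesn't state which count the 18M is based on):

```ruby
# Sanity check of the quoted 18M-per-dyno NewRelic saving, assuming 2 workers.
master_saving = 90 - 84   # Suspenders master with vs. without NewRelic
worker_saving = 83 - 77   # per worker process
workers       = 2         # default proposed in this PR
total_saving  = master_saving + workers * worker_saving
puts total_saving         # => 18
```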
I don't totally understand the relationship between the number of threads and the database pool, but should the default production pool size be adjusted to match these changes? https://github.com/thoughtbot/suspenders/blob/master/templates/postgresql_database.yml.erb#L18 Otherwise, this looks great; thanks for diving deep on it! 👍
@bernerdschaefer The Heroku guide to concurrency and database connections recommends using a pool equal to the number of threads.
Ours is similar to Heroku's recommended configuration, but not quite the same.
I'm not sure why it's a little different, but it seems like we'll have at least one connection for each thread on the Puma server, so I think we're okay; we might be using more database connections for each process than we technically need to. It looks like @calebthompson introduced this line in the commit that introduced Puma. It probably makes sense to use the Heroku-recommended settings for the pool. Any thoughts?
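For reference, sizing the pool to the per-process thread count could look something like the sketch below in the `database.yml` template. The ENV variable name and fallback are illustrative assumptions, not necessarily what the linked `postgresql_database.yml.erb` actually uses:

```yaml
# Illustrative: one connection per Puma thread in each worker process.
production:
  pool: <%= ENV.fetch("PUMA_MAX_THREADS") { 2 } %>
```

Since each worker is a separate process with its own pool, a dyno running 2 workers × 2 threads would hold at most 4 application connections this way.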
I think tackling the questions about database pool size indeed makes sense in a separate PR.
Using a simple Suspenders application, I profiled memory usage for our default configuration, as well as a few others. I found the following:

* A Puma cluster uses a master process and multiple worker processes. The amount of memory used by a cluster is equal to the memory usage of the master process plus the possible bloated size of a worker process times the number of worker processes.
* At boot, a simple Suspenders application uses about 117MB for the master process and 109MB for each worker.
* After the first request is served, a worker process increases to around 117MB, like the master process.
* The amount of potential bloat increases with each thread, because it's possible for every thread to be handling a bloated request at once.
* Using [siege], I determined that the expected bloat in a simple scenario is around 10MB per thread. This will be much worse in some applications.

This provides the following formula for maximum memory usage under load:

    master_usage + worker_count * (worker_usage + bloat * thread_count)

For this simple Suspenders application, this formula provides the following worst-case usage:

    117 + 3 * (117 + 10 * 5) = 618

This is over the 512MB limit for a 1x Heroku dyno, and the application is very simple. I recommend changing to a default of two worker processes and two threads per worker, changing the usage to:

    117 + 2 * (117 + 10 * 2) = 391

This provides reasonable performance with a high memory ceiling. When applications begin to show troublesome performance characteristics under load, developers can tune the application's process and thread count according to its real-world memory usage, possibly upgrading the dyno size as appropriate.

[siege]: https://www.joedog.org/siege-home/
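The formula above can be sketched as a quick script, using the measurements from this message (117MB warmed master/worker, ~10MB bloat per thread):

```ruby
# Worst-case memory usage per dyno, per the formula in the commit message:
#   master_usage + worker_count * (worker_usage + bloat * thread_count)
def worst_case(master_mb, worker_mb, bloat_mb, workers, threads)
  master_mb + workers * (worker_mb + bloat_mb * threads)
end

puts worst_case(117, 117, 10, 3, 5)  # => 618, over the 512MB 1x dyno limit
puts worst_case(117, 117, 10, 2, 2)  # => 391, the proposed 2x2 default
```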