
Database Pool Exhaustion #862

Closed
rickychilcott opened this issue May 15, 2020 · 19 comments

@rickychilcott
Contributor

I'm intermittently experiencing `ActionView::Template::Error: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use` errors.

I'm fairly certain I've tracked it down to Thread creation/execution in

```ruby
  # ...inside a block that maps over slices of the collection
  # (the start of the outer block is elided in this quote):
  Thread.start do
    # `ActionView::PartialRenderer` mutates the contents of `opts[:locals]`, `opts[:locals][:as]` in particular:
    # https://github.com/rails/rails/blob/v6.0.2.1/actionview/lib/action_view/renderer/partial_renderer.rb#L379
    # https://github.com/rails/rails/blob/v6.0.2.1/actionview/lib/action_view/renderer/partial_renderer.rb#L348-L356
    opts[:locals] = opts[:locals].dup if opts[:locals]
    render_partials_serial(view_context.dup, slice, opts)
  end
end.flat_map(&:value)
```

I think I understand why the partials are rendered this way -- an attempt to parallelize view rendering for very expensive or large-collection partials? -- but it doesn't protect against exceeding the connection pool limit, which triggers errors like the one I'm experiencing.
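To illustrate the failure mode outside of Rails, here's a toy sketch (`FakePool` and its timeout are made-up illustrations, not Thredded or ActiveRecord internals): a fixed-size pool, more render threads than connections, and a checkout timeout, mirroring the error above.

```ruby
require "timeout"

# Toy stand-in for a fixed-size connection pool (illustrative only).
class FakePool
  def initialize(size)
    @connections = SizedQueue.new(size)
    size.times { |i| @connections << "conn-#{i}" }
  end

  # Fail if no connection frees up in time, mirroring the
  # "could not obtain a connection from the pool" error above.
  def checkout(timeout: 0.2)
    Timeout.timeout(timeout) { @connections.pop }
  rescue Timeout::Error
    raise "could not obtain a connection from the pool"
  end

  def checkin(conn)
    @connections << conn
  end
end

pool = FakePool.new(2) # small pool, like a constrained DB_POOL
errors = Queue.new

# Five "render threads" each hold a connection for the whole render,
# like the unwrapped Thread.start above.
threads = 5.times.map do
  Thread.new do
    conn = pool.checkout
    sleep 0.5 # simulate a slow partial render while holding the connection
    pool.checkin(conn)
  rescue RuntimeError => e
    errors << e.message
  end
end
threads.each(&:join)

puts "exhausted checkouts: #{errors.size}" # 2 threads succeed, 3 time out
```

With 2 connections and 5 threads that each hold a connection for the full "render", the 3 late threads time out -- the same shape as 5 render threads against a small Rails pool.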

For now, I've set `Thredded::CollectionToStringsWithCacheRenderer.render_threads = 1`, which appears to have resolved the issue, albeit with slightly lower performance. It took some digging to figure that out, so I'm not sure whether this is well documented yet or whether others have experienced it before.
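For anyone else searching, that workaround can go in an initializer (a sketch; `config/initializers/thredded.rb` is just the conventional path, not something the gem mandates):

```ruby
# config/initializers/thredded.rb
# Render collection partials on a single thread, so rendering never needs
# extra DB connections from the pool (at a modest performance cost).
Thredded::CollectionToStringsWithCacheRenderer.render_threads = 1
```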

I also wonder whether I can increase the number of rendering threads beyond 1 while still preventing DB exhaustion. I spent a few minutes testing the concept of wrapping each serial partial render in `ActiveRecord::Base.connection_pool.with_connection`, so each render grabs a connection and returns it when done. Perhaps I'm just not fully exercising the system, but it seemed to resolve my issues -- though I'd think it could still cause pool exhaustion: with 10 connections in your pool and a large collection that spawns 25 threads, some threads could take too long to get a connection.

```ruby
def render_partials_serial(view_context, collection, opts)
  ActiveRecord::Base.connection_pool.with_connection do
    partial_renderer = ActionView::PartialRenderer.new(@lookup_context)
    collection.map { |object| render_partial(partial_renderer, view_context, opts.merge(object: object)) }
  end
end
```

Any thoughts or ways I can assist? I recognize that if I increased my database pool to allow more connections, I might never hit this issue, but resources are somewhat constrained, and it was surprising to find a `Thread.start` in Thredded without more documentation around it or issues being raised.

Thanks for everyone's work on this fine gem.

@Velora

Velora commented Jun 12, 2020

Hey @rickychilcott I wanted to chime in to say that we are experiencing the same issue as you.

I plan on looking into this more in the near future, but I wanted to share what I have found so far in case it is helpful to anyone.

First, `Thredded::CollectionToStringsWithCacheRenderer.render_threads = 1` does resolve the issue for us as well.

I believe in our situation we could narrow it down to something possibly related to URLs in posts. We've seen this issue on topics with between 10-17 posts where almost every post contains a URL; other topics with many more posts and no URLs in their content have not had this issue.

We haven't tested enough to come up with a case where we can reliably reproduce the issue ourselves, but the topics that ran into it were simply impossible to view unless we set `Thredded::CollectionToStringsWithCacheRenderer.render_threads = 1`, and those topics generally had many URLs spread across 10-17 posts when they broke.

@glebm
Collaborator

glebm commented Jun 13, 2020

Interesting insight about URLs. The only place I can think of that might hit the database there concurrently is `Rails.cache` (Thredded caches the results of Onebox renders by default). What's your `Rails.cache` set to?

@Velora

Velora commented Jun 15, 2020

We're using memcachier. The posts shouldn't be expiring from the cache from what I can see (unless they aren't accessed in long enough that they are cleared from running out of cache space). I will have to test some more, but I don't think we have ever seen this error on a topic that didn't have a number of URLs in its posts.

@glebm
Collaborator

glebm commented Jun 15, 2020

Does the stack trace indicate where it's trying to obtain a connection?

@timdiggins
Collaborator

@rickychilcott @Velora is this now fixed on master, or are you still having to set `render_threads` to 1?

@Velora

Velora commented Feb 18, 2022

@timdiggins thanks for following up. I haven't tried master, but with 0.16.16 it was an issue. Was there a commit to master that should fix this?

@timdiggins
Collaborator

@Velora I don't think there was a commit since 0.16.16 specifically addressing this.

@rickychilcott
Contributor Author

I'm deploying a change now that will make it easy for me to test this. I'll try it out in the next few days.

@rickychilcott
Contributor Author

I just checked this in my production system, setting it to 10 threads. No good -- `could not obtain a connection from the pool within 5.000 seconds`. So I've set it back to 1 thread.

@rickychilcott
Contributor Author

rickychilcott commented Feb 21, 2022

And to clarify, this was tested with master revision 9a7158d, so the very latest.

@timdiggins
Collaborator

It would be nice to have some kind of resolution to this issue before releasing v1.0 -- at minimum have some clarity as to when and why this happens.

I'm not really following what's happening, i.e. what the underlying problem is.

Is it just that each thread requires a separate DB connection from the pool, and that this quickly exhausts the total when there are enough concurrent renders? Or are connections not being released at the end, or some similar issue? If so, is there a fix for this?

Alternatively, should we just change the default, treat a larger thread count as more of an experimental setting, and state in the README that you should set the threads to 1 unless you have a very large pool of DB connections?
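For context on the second question, the key property of `ActiveRecord::Base.connection_pool.with_connection` is that it checks a connection out, yields, and always checks it back in when the block exits, so threads queue for connections rather than holding one for their whole lifetime. A framework-free sketch of that pattern (`TinyPool` is illustrative, not ActiveRecord):

```ruby
# Toy stand-in for a fixed-size pool. The property mirrored here:
# with_connection blocks until a connection is free, yields it, and
# always returns it afterwards -- even if the block raises.
class TinyPool
  def initialize(size)
    @free = SizedQueue.new(size)
    size.times { |i| @free << "conn-#{i}" }
  end

  def with_connection
    conn = @free.pop # block until a connection is available
    begin
      yield conn
    ensure
      @free << conn # returned even on error
    end
  end
end

pool = TinyPool.new(2)
results = Queue.new

# Ten threads contend for two connections; they queue instead of erroring,
# because each connection is released as soon as its block finishes.
10.times.map do |i|
  Thread.new { pool.with_connection { |c| results << "task #{i} ran on #{c}" } }
end.each(&:join)

puts results.size # => 10
```

So wrapping each render in `with_connection` bounds how long any one thread monopolises a connection; it doesn't shrink the number of threads wanting one, which is why a thread can still time out waiting if the pool is far smaller than the thread count.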

@timdiggins
Collaborator

timdiggins commented Feb 22, 2022

@rickychilcott can you post a stacktrace of the database pool exhaustion in case it gives any insight into what's going on?

@timdiggins
Collaborator

FYI slightly related discussion: #770

@rickychilcott
Contributor Author

> @rickychilcott can you post a stacktrace of the database pool exhaustion in case it gives any insight into what's going on?

Sure. The log is at https://gist.github.com/rickychilcott/49c98899a1689697aa3ed637f1160a61

@rickychilcott
Contributor Author

It's been a while since I've looked at this, but the hard part is getting it to be reproducible locally.

I think if you set something like this in your database.yml you might be able to repro it more easily. I normally have it be something like:

```yaml
default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV["DB_POOL"] || ENV['RAILS_MAX_THREADS'] || 5 %>
```

Then set `DB_POOL` to a low number like 2 and view a Thredded topic with a larger number of posts. This is of course an artificially constrained setup, but it more easily forces the app to exhaust the connection limit.

@timdiggins
Collaborator

> Sure. The log is at https://gist.github.com/rickychilcott/49c98899a1689697aa3ed637f1160a61

Thanks @rickychilcott -- very useful! I'll look a bit more, but any chance you can tell me what's going on at app/models/user.rb:83? It's the only line in the stacktrace that comes from the app itself rather than from gems. Probably some kind of DB access?

@rickychilcott
Contributor Author

rickychilcott commented Feb 23, 2022

You're right. Something basic. That line is...

```ruby
default_scope -> { order(:name) }
```

Pretty boring.

@timdiggins
Collaborator

@rickychilcott your responses have been super helpful.

The cause of DB accesses when rendering posts is typically user mentions, and currently our demo content in the database seeder generates no user mentions, so it has been hard to repro this straightforwardly. Cutting `DB_POOL` down to a ridiculously low number (as you suggested) and also adding user mentions to the demo content makes it simple to repro.

It's also then easy to see that wrapping `render_partials_serial` in `ActiveRecord::Base.connection_pool.with_connection` fixes this (though this is only relevant in a multi-threaded environment).

The drawback is that this grabs a DB connection from the pool for each render thread, whether it's needed (e.g. has a user mention) or not; the upside is that DB pool exhaustion is avoided. An alternative is to wrap only the users_provider fetch (lib/thredded/users_provider.rb:28) in `connection_pool.with_connection`, but I'm less convinced about that.

I'll put up a draft PR with a possible solution (and the code I used to generate the sample data) for comment.

@timdiggins
Collaborator

timdiggins commented Feb 23, 2022

potentially fixed by #926
