makara connection pool management is not thread-safe #151
The connection pool management is not thread-safe. So, when running makara in thread-intensive code, such as sidekiq, subtle errors creep in.
The fix is not entirely straightforward: AR 4 and 5 have reorganized the connection pool management to be thread-safe, with lots of mutex usage. Makara has none of that, and its connection pool management is not thread-safe.
In fact, makara uses an array of connections for each pool, which is traversed each time a connection decision is made. Connections are added up to the maximum connection count, but there are no protections against simultaneous access and changes across threads.
In addition, in the master and replica (slave) connection pools, there are connections with different states in each array: blacklisted or not.
We could protect the connection pool array with a semaphore, but traversing an array behind a semaphore is a Bad Idea because it blocks all other threads from traversing that same array. If the semaphore supported read-locking in addition to write-locking then multi-thread traversals would work in parallel, except for adding or removing connections.
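To make the read-lock versus write-lock idea concrete, here is a minimal sketch using only Ruby's stdlib `Mutex` and `ConditionVariable`. `ReadWriteLock` and `GuardedPool` are hypothetical names made up for illustration; nothing here is makara's or ActiveRecord's actual code, and connections are stood in for by plain values.

```ruby
# Minimal read-write lock. Hypothetical helper, not part of makara or AR.
class ReadWriteLock
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @readers = 0
    @writing = false
  end

  # Many threads may hold the read lock at the same time.
  def with_read_lock
    @mutex.synchronize do
      @cond.wait(@mutex) while @writing
      @readers += 1
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @readers -= 1
        @cond.broadcast if @readers.zero?
      end
    end
  end

  # Only one thread may hold the write lock, and only while no reader is active.
  def with_write_lock
    @mutex.synchronize do
      @cond.wait(@mutex) while @writing || @readers > 0
      @writing = true
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @writing = false
        @cond.broadcast
      end
    end
  end
end

# Hypothetical pool wrapper around a plain array of connections.
class GuardedPool
  def initialize(connections = [])
    @connections = connections.dup
    @lock = ReadWriteLock.new
  end

  # Traversals take the read lock, so they do not block each other.
  def find_connection(&block)
    @lock.with_read_lock { @connections.find(&block) }
  end

  # Adding or removing takes the write lock and excludes all readers.
  def add(connection)
    @lock.with_write_lock { @connections << connection }
  end

  def remove(connection)
    @lock.with_write_lock { @connections.delete(connection) }
  end

  def size
    @lock.with_read_lock { @connections.size }
  end
end
```

This sketch is not reentrant (taking the write lock while holding the read lock on the same thread would deadlock), and a steady stream of readers can starve writers. A production version would need to handle both.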
However, IMHO, it would be better to have multiple arrays of connection pools, with each kind of connection being queued into a separate pool (array) of connections. So, blacklisted connections would be maintained in a blacklisted connection pool, and the remaining set of available connections would be in the available connection pool. Then, each pool could be managed as a thread-safe
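One possible reading of the separate-pools idea, sketched with Ruby's built-in thread-safe `Queue`. `SplitPool` and its method names are hypothetical, and the checkout/checkin style is an assumption: it gives each connection to one thread at a time, which is not how makara currently shares connections.

```ruby
# Sketch only: SplitPool is a hypothetical name, not makara's API.
class SplitPool
  def initialize(connections)
    @available = Queue.new   # thread-safe FIFO from Ruby's core library
    @blacklisted = Queue.new
    connections.each { |c| @available << c }
  end

  # Check out a connection, run the block, then return the connection to the
  # available queue. If the block raises, the connection is blacklisted instead.
  def with_connection
    conn = @available.pop(true) # non-blocking; raises ThreadError when empty
    begin
      result = yield conn
    rescue StandardError
      @blacklisted << conn
      raise
    end
    @available << conn
    result
  end

  # Move every currently blacklisted connection back into rotation.
  # Returns the number of connections restored.
  def restore_blacklisted
    restored = 0
    until @blacklisted.empty?
      begin
        @available << @blacklisted.pop(true)
        restored += 1
      rescue ThreadError
        break # another thread drained the queue first
      end
    end
    restored
  end

  def available_count
    @available.size
  end

  def blacklisted_count
    @blacklisted.size
  end
end
```

Because each queue does its own locking, no thread ever traverses a shared array, and moving a connection between states is a pop from one queue followed by a push onto the other.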
Makara should be updated to make best use of the latest AR connection pool code, becoming thread-safe in the process.
I've already forked the makara repo, and have added a hijack on the
However, that change was insufficient. There are still AR timeouts occurring in our sidekiq processes, which are thread-full, and the stacktraces on those timeouts do not include makara code anywhere.
Whether or not the timeouts and lack-of-thread-awareness in makara are related is still TBD.
So, there appear to be two issues:
I've reviewed the connection adapter and connection handling code in ActiveRecord 4.2.6 and 5, to get an idea of how hard it would be to update or rewrite makara to become an alternative connection handler for AR.
In the current AR 4.2.6+ or 5 code, each model has a connection handler, which is supposed to determine the appropriate connection using a connection spec. Once a connection is selected, it is cached.
In order to support dynamic query balancing, each model would have to use a dynamic connection handler, which would choose a connection on each query, instead of using a static, cached connection. This appears to require a change to AR itself.
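To illustrate the difference between a cached choice and a per-query choice, here is a deliberately simplified sketch in plain Ruby. `Connection`, `CachedHandler`, and `DynamicHandler` are hypothetical stand-ins, not ActiveRecord classes, and the `SELECT` regexp is a naive assumption about how reads might be detected.

```ruby
# Hypothetical stand-ins; none of these classes exist in ActiveRecord or makara.
Connection = Struct.new(:name)

# Resolves a connection once, then reuses the cached result for every query.
class CachedHandler
  def initialize(connections)
    @connections = connections
    @cached = nil
  end

  def connection_for(_sql)
    @cached ||= @connections.first
  end
end

# Chooses on each query: writes go to the master, reads rotate across replicas.
class DynamicHandler
  def initialize(master:, replicas:)
    @master = master
    @replicas = replicas
    @next_replica = 0
    @mutex = Mutex.new # guards the round-robin counter across threads
  end

  def connection_for(sql)
    return @master if @replicas.empty?
    return @master unless sql.match?(/\A\s*select\b/i)

    @mutex.synchronize do
      conn = @replicas[@next_replica % @replicas.size]
      @next_replica += 1
      conn
    end
  end
end
```

The cached handler never sees the SQL again after the first lookup, which is why per-query balancing would need a hook at the point where each query is dispatched.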
Would you happen to be aware of anyone seriously looking into these AR issues? Are you or anyone on your team doing so?
The alternative to using an AR "insert" like makara is a proxy service, like
Since we are already using
As you might guess, right now I'm looking for the Path of Least Effort, much as the cat in Heinlein's book was always looking for the "Door Into Summer". I'm really hopeful that you might have a good answer to help us find that door.
I kind of like how switch_point does it (https://github.com/eagletmt/switch_point/blob/master/lib/switch_point/proxy.rb#L22-L33): it makes an Active Record subclass just to get access to a connection_pool, similar to https://github.com/customink/secondbase/blob/master/lib/second_base/base.rb
Alternatively, https://github.com/instructure/shackles/blob/master/lib/shackles/connection_handler.rb extends the connection handler.
@aks how have you guys mitigated this? We're evaluating makara with https://github.com/ankane/distribute_reads + sidekiq, and in our staging environment seeing that no Postgres data is being collected. My hunch is that this is related to makara's lack of thread safety.
This seems like a deal breaker for anyone using makara, right?
EDIT: I think this is just a reporting issue, what used to show up as
@aks @jwg2s @bleonard We are evaluating makara for read-write splitting and came across this thread. Do you have updates, suggestions, or mitigation options for this issue?
I don't see other well-maintained gems for read-write splitting in production. Does anyone have other suggestions that worked well in a production environment?
FYI, we use AWS Aurora MySQL as our backing store, and our application is hosted on Heroku.
Check out the fresh_connection gem, or the Pgpool-II proxy.
@rajagopals we recently went through the same thing: we needed to migrate off of pgpool/pgbouncer, and the thread-safety issue ultimately prevented us from adopting Makara.
We wound up using active_record_slave on top of an Aurora Postgres cluster. No issues yet in a multi-threaded production environment, handling ~40k queries per minute. YMMV
@patrykk21 Hmm, that would be a matter of getting lucky to see an issue. We'd need to examine the code to see how to get concurrent threads into a state that causes specific problems. Considering you're using Sidekiq and there are other libraries that do suggest they're thread-safe (possibly by leaning more heavily on Active Record's connection handlers), I'd go with an alternative for now.