Raise a specific exception when acquiring the commit lock fails. #18

jamadden · 2014-09-03T15:04:09Z

Like other issues and pull requests have noted, the global commit lock can sometimes become a bottleneck. Because this issue is most often due to transient issues like high write loads which in certain scenarios may be recoverable (e.g., by backing off and retrying), it can be useful to specifically catch the exception thrown in that case. A plain StorageError can have many meanings, and the different text messages used between databases means it's not necessarily portably reliable to try to distinguish this case based on the message alone.

This PR adds a specific exception that the ILocker instances throw when getting a commit lock fails. For backwards compatibility, it is a subclass of StorageError. It does not mixin transaction.interfaces.TransientError (even though it is probably transient) because that would probably result in behaviour changes as middleware transaction managers would start retrying on this case, which may actually increase database load and commit lock failures. Therefore, users would still need to account for this case explicitly.

Internally, we've been using a monkey-patched version of this technique and it has been useful for us to be able to distinguish this case. Although we haven't had a use case for them, I can add specific exception subclasses for the other errors thrown by locker.py in this or another PR. I'm very open to any comments or suggestions.

…quireCommitLockError, when acquiring the commit lock fails.

hathawsh · 2014-09-05T05:01:10Z

Which database type are you using? I'm surprised this error ever occurs. If RelStorage can't acquire a lock, it's normally supposed to just block until it can.

jamadden · 2014-09-05T13:03:10Z

Good question. We are currently using MySQL, which along with Oracle, according to README.txt and locker.py, is one of the two databases to support the commit-lock-timeout setting (while PostgreSQL does not). Likewise, according to the documentation and options.py, that value defaults to 30 seconds.

We have actually increased that timeout to 60 seconds but can still sometimes see this. Because conflict resolution occurs with the commit lock held, in the past we've found that inefficient or expensive conflict resolution, such as the kind that can happen with very large zc.queue objects, can easily take much longer than that. In that case, the solutions were to adjust our code to avoid those queues or conflict resolution. But sadly we haven't yet been able to identify hotspots in every case.

I've considered trying to move to PostgreSQL, but the fact that it doesn't support the commit-lock-timeout may be a deal breaker as long as we do see these long commit-lock times. Our application is an interactive web application, and even if the write ratio is low, if locks can get held forever, a few long-running commits may eventually stack up other commits behind them (blocking their request threads) and quickly exhaust all the available request threads, resulting in an effective DoS of the site.

We actually mitigate this by simply aborting read-only GET requests so that they never have to acquire the commit lock at all, only true write requests do---plus 60 seconds is still a long time to wait to receive GET results. In the early days, that made quite a difference. I'd be grateful for any suggestions about reducing or eliminating this lock time/contention.

seanupton · 2014-10-10T17:39:04Z

@jamadden FWIW, I have implemented commmit-lock-timeout for PostgreSQL 9.3+ in another pull request: #20

Have you considered setting RELSTORAGE_ABORT_EARLY in your environment? I suspect, but do not know that might work-around what you are seeing in MySQL in a similar way to what I am seeing in PostgreSQL [1].

[1] https://mail.zope.org/pipermail/zodb-dev/2014-July/015230.html

jamadden · 2016-06-09T12:58:14Z

From what I could determine, the ABORT_EARLY option did not have any impact (but I can't remember for sure if we actually tried it or if that was just based on code inspection.)

This still looks like a good idea to me.

Update of #18

Raise a specific exception, relstorage.adapters.interfaces.UnableToAc…

55912ed

…quireCommitLockError, when acquiring the commit lock fails.

Merge https://github.com/zodb/relstorage

94facc3

jamadden added the enhancement label Jun 9, 2016

jamadden mentioned this pull request Jun 9, 2016

Final Release? #28

Closed

jamadden added this to the 1.7.0 milestone Jun 10, 2016

jamadden added a commit that referenced this pull request Jun 15, 2016

Merge pull request #48 from zodb/commit-lock-ex

e20d8b9

Update of #18

jamadden closed this Jun 15, 2016

jamadden mentioned this pull request Jun 15, 2016

Should abort_early be the default? #50

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raise a specific exception when acquiring the commit lock fails. #18

Raise a specific exception when acquiring the commit lock fails. #18

jamadden commented Sep 3, 2014

hathawsh commented Sep 5, 2014

jamadden commented Sep 5, 2014

seanupton commented Oct 10, 2014

jamadden commented Jun 9, 2016

Raise a specific exception when acquiring the commit lock fails. #18

Raise a specific exception when acquiring the commit lock fails. #18

Conversation

jamadden commented Sep 3, 2014

hathawsh commented Sep 5, 2014

jamadden commented Sep 5, 2014

seanupton commented Oct 10, 2014

jamadden commented Jun 9, 2016