Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise a specific exception when acquiring the commit lock fails. #18

Closed
wants to merge 2 commits into from
Closed

Raise a specific exception when acquiring the commit lock fails. #18

wants to merge 2 commits into from

Conversation

jamadden
Copy link
Member

@jamadden jamadden commented Sep 3, 2014

Like other issues and pull requests have noted, the global commit lock can sometimes become a bottleneck. Because this issue is most often due to transient issues like high write loads which in certain scenarios may be recoverable (e.g., by backing off and retrying), it can be useful to specifically catch the exception thrown in that case. A plain StorageError can have many meanings, and the different text messages used between databases means it's not necessarily portably reliable to try to distinguish this case based on the message alone.

This PR adds a specific exception that the ILocker instances throw when getting a commit lock fails. For backwards compatibility, it is a subclass of StorageError. It does not mixin transaction.interfaces.TransientError (even though it is probably transient) because that would probably result in behaviour changes as middleware transaction managers would start retrying on this case, which may actually increase database load and commit lock failures. Therefore, users would still need to account for this case explicitly.

Internally, we've been using a monkey-patched version of this technique and it has been useful for us to be able to distinguish this case. Although we haven't had a use case for them, I can add specific exception subclasses for the other errors thrown by locker.py in this or another PR. I'm very open to any comments or suggestions.

…quireCommitLockError, when acquiring the commit lock fails.
@hathawsh
Copy link
Member

hathawsh commented Sep 5, 2014

Which database type are you using? I'm surprised this error ever occurs. If RelStorage can't acquire a lock, it's normally supposed to just block until it can.

@jamadden
Copy link
Member Author

jamadden commented Sep 5, 2014

Good question. We are currently using MySQL, which along with Oracle, according to README.txt and locker.py, is one of the two databases to support the commit-lock-timeout setting (while PostgreSQL does not). Likewise, according to the documentation and options.py, that value defaults to 30 seconds.

We have actually increased that timeout to 60 seconds but can still sometimes see this. Because conflict resolution occurs with the commit lock held, in the past we've found that inefficient or expensive conflict resolution, such as the kind that can happen with very large zc.queue objects, can easily take much longer than that. In that case, the solutions were to adjust our code to avoid those queues or conflict resolution. But sadly we haven't yet been able to identify hotspots in every case.

I've considered trying to move to PostgreSQL, but the fact that it doesn't support the commit-lock-timeout may be a deal breaker as long as we do see these long commit-lock times. Our application is an interactive web application, and even if the write ratio is low, if locks can get held forever, a few long-running commits may eventually stack up other commits behind them (blocking their request threads) and quickly exhaust all the available request threads, resulting in an effective DoS of the site.

We actually mitigate this by simply aborting read-only GET requests so that they never have to acquire the commit lock at all, only true write requests do---plus 60 seconds is still a long time to wait to receive GET results. In the early days, that made quite a difference. I'd be grateful for any suggestions about reducing or eliminating this lock time/contention.

@seanupton
Copy link
Contributor

@jamadden FWIW, I have implemented commmit-lock-timeout for PostgreSQL 9.3+ in another pull request: #20

Have you considered setting RELSTORAGE_ABORT_EARLY in your environment? I suspect, but do not know that might work-around what you are seeing in MySQL in a similar way to what I am seeing in PostgreSQL [1].

[1] https://mail.zope.org/pipermail/zodb-dev/2014-July/015230.html

@jamadden
Copy link
Member Author

jamadden commented Jun 9, 2016

From what I could determine, the ABORT_EARLY option did not have any impact (but I can't remember for sure if we actually tried it or if that was just based on code inspection.)

This still looks like a good idea to me.

@jamadden jamadden mentioned this pull request Jun 9, 2016
@jamadden jamadden added this to the 1.7.0 milestone Jun 10, 2016
jamadden added a commit that referenced this pull request Jun 15, 2016
@jamadden jamadden closed this Jun 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants