
Revert #469 and return to taking shared locks first #471

Merged

merged 1 commit into master from revert-lock-order on May 21, 2021

Conversation

jamadden
Member

Stress testing a large, busy Zope application indicated that performance did not improve with the #469 lock ordering and was actually slightly worse.

The following is part of an internal discussion of why that might have been the case.

In the test run, there were only 8 UnableToLockRowsToModifyError recorded as triggering retries, vs 66 UnableToLockRowsToReadCurrentError. In at least one case, a transaction got both, first Modify then ReadCurrent on the retry.

In the other thread about BTree sizes, we ran through an analysis of all the reported ReadCurrent errors and came to the conclusion that they're all due to modifications to the BTrees involved in the IntId utility.

The hypothesis is that many transactions wind up waiting on an exclusive lock for one of those objects, and only after waiting for some period do they discover the ReadCurrent failure and doom the transaction.

If they check for the ReadCurrent failure first, and either find one or detect a deadlock, they doom the transaction quickly without waiting, and hopefully a retry succeeds.
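Concretely, the ordering this PR returns to takes the shared (readCurrent) locks first and refuses to wait on them, so a doomed transaction fails immediately instead of queuing behind an exclusive lock. The following is only an illustrative sketch in PostgreSQL terms, using a hypothetical `object_state` table and placeholder OID-array parameters, not the actual RelStorage stored procedures:

```sql
-- Illustrative sketch of the shared-locks-first ordering (hypothetical
-- object_state table and $1/$2 OID-array parameters; not the real procs).

-- 1. Lock readCurrent rows in shared mode without waiting: if another
--    transaction holds an exclusive lock here, this transaction is doomed
--    anyway, so fail fast (SQLSTATE 55P03) and let the retry machinery run.
SELECT zoid
  FROM object_state
 WHERE zoid = ANY ($1)
 ORDER BY zoid
   FOR SHARE NOWAIT;

-- 2. Only then take exclusive locks on the rows being modified; waiting
--    here is acceptable because a conflict is usually just contention,
--    not a guaranteed conflict error.
SELECT zoid
  FROM object_state
 WHERE zoid = ANY ($2)
 ORDER BY zoid
   FOR UPDATE;
```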

I think what we didn't pay close enough attention to in the original discussion about changing the lock order is that any time there was a deadlock, it always really represented a doomed transaction --- there would be no way to recover without retrying anyway. The hope was that (1) instead of letting the RDBMS kill an arbitrary transaction, we would be in control of which transaction died, thus letting the one with the most work already invested proceed and commit; and (2) maybe we could do more conflict resolution. (1) didn't pan out because it turns out that if two or more processes are sitting waiting on locks and the RDBMS kills one of them, neither one really had much of an advantage over the other --- they were basically both in the same place. (2) didn't pan out because one of the transactions was doomed anyway.

This also fixes two bugs in the PostgreSQL implementation (a sketch of the corrected pattern follows the list):

  • Selecting the right kind of lock error to raise.
  • Fixing a syntax error when raising an exception in the stored procedure.
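
For the second bullet, the fix amounts to getting the RAISE syntax right and attaching the lock-related SQLSTATE so the driver can map it to the proper exception class. Below is a hedged sketch of that pattern, not the actual procedure body:

```sql
-- Sketch only: valid PL/pgSQL RAISE syntax with an explicit errcode.
-- 'lock_not_available' (55P03) is what a NOWAIT lock failure raises and
-- what should map to the UnableToLockRows* exceptions; a server-detected
-- deadlock is the separate 'deadlock_detected' (40P01) condition.
DO $$
BEGIN
    RAISE EXCEPTION 'readCurrent rows locked by another transaction'
        USING ERRCODE = 'lock_not_available';
END
$$;
```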

@jamadden jamadden merged commit 1c37cee into master May 21, 2021
@jamadden jamadden deleted the revert-lock-order branch May 21, 2021 20:22