[YSQL] Explicit row-level locking should only lock what is returned to the client #9929

pkj415 · 2021-09-05T21:08:16Z

"When a locking clause appears at the top level of a SELECT query, the rows that are locked are exactly those that are returned by the query; in the case of a join query, the rows locked are those that contribute to returned join rows." - this is from Pg docs - https://www.postgresql.org/docs/12/sql-select.html

The below example shows this semantic in action. In YB, we lock more rows than what is returned to the user. Though this won't result in a correctness issue, to match Pg usability, we should lock only what is returned.

create table employees (k int primary key, profession text, email text);
create table physician (k int primary key, email text);
insert into employees values (1, 'sales', 'salesman1@xyz.com') ;
insert into employees values (2, 'sales', 'salesman2@xyz.com') ;
insert into employees values (3, 'physician', 'phy1@xyz.com') ;
insert into employees values (4, 'physician', 'phy2@xyz.com') ;
insert into physician values (1, 'phy1@xyz.com');
insert into physician values (2, 'phy2@xyz.com');

session1> begin transaction isolation level repeatable read;
session2> begin transaction isolation level repeatable read;

session1> select * from physician, employees where employees.profession = 'sales' and
employees.email = physician.email for update;
-- no rows will be returned to user, hence nothing locked in Pg. In YB we lock all rows since seq scan reads all rows.
Filtering removes rows later, but by that time we have already locked rows in YB.

session2> select * from employees for update; -- doesn't block in Pg (blocks in YB).

The cause of this is that even in REPEATABLE READ isolation level, we take coarser locks than just locks for specific rows. This should not be done in any isolation level except SERIALIZABLE (where coarser locks are required to ensure predicate locking to block new rows to be inserted that match the predicate of an ongoing txn's read).

NOTE: Locking only rows returned to user as compared to coarser chunks (i.e., by locking a prefix of the pk) might seem to be inefficient due to the many locks that we now have to take. But note coarser locks can also hurt performance by locking what isn't to be locked. So, both have their disadvantages based on the specific workload; but locking only rows returned to user gives us behaviour same as Postgres. This will serve as the motivation to switch to only locking rows that are returned to user rather than coarser locks.

The text was updated successfully, but these errors were encountered:

…ing semantics + support SKIP LOCKED for REPEATABLE READ isolation level Summary: Part 1 (for first 2 issues): ---------------------------- Currently in YB, to lock rows explicitly, YSQL sends a marker with read requests to tserver processes. A lock is taken on the longest possible prefix of the DocKey that will contain the data matching the query. Example: Consider a table with pk (h1, h2, r1, r2). If a read request sets h1, h2, and r1, a lock is taken on everything that matches that prefix (this is done via a single intent). Similarly, if the read request sets just h1, a lock is taken on everything that matches that value for h1. And if a read request doesn't specify h1, a lock is taken on the whole tablet. This is the case for both REPEATABLE READ and SERIALIZABLE. This results in 2 issues for REPEATABLE READ isolation in YSQL: 1. Even data that is filtered out later by YSQL is locked. 2. New tuples also can't be inserted that fall in the locked prefix range. So, we are locking future non-existant tuples as well. For any isolation other than SERIALIZABLE, in Pg, locks are only taken on the rows that are "returned" to the client or parent plan node by the LockRows plan node. Rows that are filtered out by child plan nodes of LockRows are not locked. In YSQL, LockRows is just a dummy placeholder and we instead pass down locking information to leaf nodes that scan docdb and hence lock larger chunks of the tablet. This mismatch happens because explicit locking was implemented (https://phabricator.dev.yugabyte.com/D7007) by just returning early from ExecLockRows() in YSQL and hence logic in the LockRows node was not used. This diff performs explicit locking tuple by tuple in the LockRows node similar to Postgres. This has two benefits - 1. We match postgres locking semantics in REPEATABLE READ (and also will match in the READ COMMITED isolation that we plan to add support for in near future). 2. This makes implementation of other Pg locking features like SKIP LOCKED stragiht-forward (since we now lock 1 tuple at a time and not whole chunks - which allows others txns to see what rows are still available for locking). Without this, such existing Pg features would require lot of code changes. Locking only rows returned to user as compared to coarser chunks (i.e., by locking a prefix of the pk) might seem to be inefficient due to the many locks that we now have to take. But note coarser locks can also hurt performance by locking what isn't to be locked. So, both methods have their disadvantages based on the specific workload; but locking only rows returned to user gives us behaviour same as Postgres. This serves as the motivation to switch to only locking rows that are returned to user rather than coarser locks. NOTE: For SERIALIZABLE level, explicit locking is same as before since we need to lock whole predicates to block insertion of new rows matching the predicate (which the current prefix-locking achieves). Part 2: Support for SKIP LOCKED -------------------------------- After Part 1, SKIP LOCKED becomes straight-forward (since we now lock 1 tuple at a time and not whole chunks - which allows others txns to see what rows are still available for locking). The implementation plumbs the wait policy for locking from postgres to tserver process which uses it in conflict_resolution.cc. If the policy for a certain intent is to skip, in case of conflict we return a SkipLocking error. Following the wait_policy in YBCLockTuple() will make the implementation clear. Test Plan: 1. New test method TestRowLockInJoin: to ensure we lock only those tuples which are returned to user (in REPEATABLE READ isolation). 2. New test method TestSkipLocked: for SKIP LOCKED functionality. 3. TestRowKeyShareLock: This test existed already and assertions in it were based on the assumption that both SERIALIZABLE and REPEATABLE READ take blanket/predicate locks (e.g.: full table if no prefix of the pk is specified). The behaviour in test is corrected to match Pg behaviour. Also some more cases are added to it to test predicate locking in SERIALIZABLE level. 4. TestInsertSelectRowLock, TestDeleteSelectRowLock: Modified tests since yugabyte#2809 is fixed with this. Reviewers: mbautin, sergei, neil, jason Subscribers: yql, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D13352

…ort SKIP LOCKED for REPEATABLE READ isolation level Summary: **Part 1 (for first 2 issues):** Currently in YB, to lock rows explicitly, YSQL sends a marker with read requests to tserver processes. A lock is taken on the longest possible prefix of the DocKey that will contain the data matching the query. Example: Consider a table with pk (h1, h2, r1, r2). If a read request sets h1, h2, and r1, a lock is taken on everything that matches that prefix (this is done via a single intent). Similarly, if the read request sets just h1, a lock is taken on everything that matches that value for h1. And if a read request doesn't specify h1, a lock is taken on the whole tablet. This is the case for both REPEATABLE READ and SERIALIZABLE. **This results in 2 issues for REPEATABLE READ isolation in YSQL:** 1. Even data that is filtered out later by YSQL is locked. 2. New tuples also can't be inserted that fall in the locked prefix range. With a "prefix" intent, we are locking future non-existent tuples as well. For any isolation other than SERIALIZABLE, in Pg, locks are only taken on the rows that are "returned" to the client or parent plan node by the LockRows plan node. Rows that are filtered out by child plan nodes of LockRows are not locked. In YSQL, LockRows is just a dummy placeholder and we instead pass down locking information to leaf nodes that scan docdb and hence lock larger chunks of the tablet. This mismatch happens because explicit locking was implemented (https://phabricator.dev.yugabyte.com/D7007) by just returning early from ExecLockRows() in YSQL and hence logic in the LockRows node was not used. This diff performs explicit locking tuple by tuple in the LockRows node similar to Postgres. **This has two benefits -** i) We match postgres locking semantics in REPEATABLE READ (and also will match in the READ COMMITED isolation that we plan to add support for in near future). ii) For REPEATABLE READ/ READ COMMITTED, this makes implementation of other Pg locking features easier - a) SKIP LOCKED becomes straight-forward since we now lock 1 tuple at a time and not whole chunks - which allows others txns to see what rows are still available for locking). Without this, such existing Pg features would require lot of code changes. b) Special handling of conflicts that use "EvalPlanQual" in Postgres will be readily available for YB use as we implement READ COMMITTED. Locking only rows returned to user as compared to coarser chunks (i.e., by locking a prefix of the pk) might seem to be inefficient due to the many locks that we now have to take. But note coarser locks can also hurt performance by locking what isn't to be locked. So, both methods have their disadvantages based on the specific workload; but locking only rows returned to user gives us behaviour same as Postgres. This serves as the motivation to switch to only locking rows that are returned to user rather than coarser locks. To follow the flow of logic, start with //nodeLockRows.c// This diff is a result of conversation from this slack thread - https://yugabyte.slack.com/archives/CAR5BCH29/p1631036276226800 ``` NOTE: i) For SERIALIZABLE level, explicit locking is same as before since we need to lock whole predicates to block insertion of new rows matching the predicate (which the current prefix-locking achieves). ii) We can optimize the case of explicit locking on a SELECT with full pk specified by using just 1 RPC call. The will follow after this diff lands (#10429) ``` **Part 2: Support for SKIP LOCKED** After Part 1, SKIP LOCKED becomes straight-forward (since we now lock 1 tuple at a time and not whole chunks - which allows others txns to see what rows are still available for locking). The implementation plumbs the wait policy for locking from postgres to tserver process which uses it in conflict_resolution.cc. If the policy for a certain intent is to skip, in case of conflict we return a SkipLocking error. Following the wait_policy in YBCLockTuple() will make the implementation clear. Test Plan: **New tests:** 1. TestRowLockInJoin: to ensure we lock only those tuples which are returned to user (in REPEATABLE READ isolation). 2. TestSkipLocked: for SKIP LOCKED functionality. **Modified existing tests:** 1. TestInsertSelectRowLock, TestDeleteSelectRowLock: Modified tests since #2809 is fixed with this. 2. TestRowKeyShareLock: This test existed already and assertions in it were based on the assumption that both SERIALIZABLE and REPEATABLE READ take blanket/predicate locks (e.g.: full table if no prefix of the pk is specified). The behaviour in test is corrected to match Pg behaviour. Also some more cases are added to it to test predicate locking in SERIALIZABLE level. 3. TestPgExplicitLocks, TestPgForeignKeyOptimization and TestPgTransactions.testExplicitLocking Jenkins: urgent Reviewers: mbautin, neil, jason, smishra, mihnea, dmitry, sergei Reviewed By: dmitry, sergei Subscribers: yql, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D13352

pkj415 self-assigned this Sep 5, 2021

pkj415 added the area/ysql Yugabyte SQL (YSQL) label Sep 5, 2021

pkj415 mentioned this issue Sep 5, 2021

[YSQL] Postgres compatible transaction semantics #5683

Open

pkj415 mentioned this issue Oct 27, 2021

[YSQL] Lock row using single RPC in REPEATABLE READ & READ COMMITTED level if pk is specified #10429

Closed

pkj415 closed this as completed Dec 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL] Explicit row-level locking should only lock what is returned to the client #9929

[YSQL] Explicit row-level locking should only lock what is returned to the client #9929

pkj415 commented Sep 5, 2021 •

edited

Loading

[YSQL] Explicit row-level locking should only lock what is returned to the client #9929

[YSQL] Explicit row-level locking should only lock what is returned to the client #9929

Comments

pkj415 commented Sep 5, 2021 • edited Loading

pkj415 commented Sep 5, 2021 •

edited

Loading