Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Explicit row-level locking should only lock what is returned to the client #9929

Closed
pkj415 opened this issue Sep 5, 2021 · 0 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL)

Comments

@pkj415
Copy link
Contributor

pkj415 commented Sep 5, 2021

"When a locking clause appears at the top level of a SELECT query, the rows that are locked are exactly those that are returned by the query; in the case of a join query, the rows locked are those that contribute to returned join rows." - this is from Pg docs - https://www.postgresql.org/docs/12/sql-select.html

The below example shows this semantic in action. In YB, we lock more rows than what is returned to the user. Though this won't result in a correctness issue, to match Pg usability, we should lock only what is returned.

create table employees (k int primary key, profession text, email text);
create table physician (k int primary key, email text);
insert into employees values (1, 'sales', 'salesman1@xyz.com') ;
insert into employees values (2, 'sales', 'salesman2@xyz.com') ;
insert into employees values (3, 'physician', 'phy1@xyz.com') ;
insert into employees values (4, 'physician', 'phy2@xyz.com') ;
insert into physician values (1, 'phy1@xyz.com');
insert into physician values (2, 'phy2@xyz.com');

session1> begin transaction isolation level repeatable read;
session2> begin transaction isolation level repeatable read;

session1> select * from physician, employees where employees.profession = 'sales' and
employees.email = physician.email for update;
-- no rows will be returned to user, hence nothing locked in Pg. In YB we lock all rows since seq scan reads all rows.
Filtering removes rows later, but by that time we have already locked rows in YB.

session2> select * from employees for update; -- doesn't block in Pg (blocks in YB).

The cause of this is that even in REPEATABLE READ isolation level, we take coarser locks than just locks for specific rows. This should not be done in any isolation level except SERIALIZABLE (where coarser locks are required to ensure predicate locking to block new rows to be inserted that match the predicate of an ongoing txn's read).

NOTE: Locking only rows returned to user as compared to coarser chunks (i.e., by locking a prefix of the pk) might seem to be inefficient due to the many locks that we now have to take. But note coarser locks can also hurt performance by locking what isn't to be locked. So, both have their disadvantages based on the specific workload; but locking only rows returned to user gives us behaviour same as Postgres. This will serve as the motivation to switch to only locking rows that are returned to user rather than coarser locks.

@pkj415 pkj415 self-assigned this Sep 5, 2021
@pkj415 pkj415 added the area/ysql Yugabyte SQL (YSQL) label Sep 5, 2021
pkj415 added a commit to pkj415/yugabyte-db that referenced this issue Nov 30, 2021
…ing semantics + support SKIP LOCKED for REPEATABLE READ isolation level

Summary:
Part 1 (for first 2 issues):
----------------------------

Currently in YB, to lock rows explicitly, YSQL sends a marker with read requests to tserver processes.
A lock is taken on the longest possible prefix of the DocKey that will contain the data matching the query.
Example: Consider a table with pk (h1, h2, r1, r2). If a read request sets h1, h2, and r1, a lock is
taken on everything that matches that prefix (this is done via a single intent). Similarly, if the read
request sets just h1, a lock is taken on everything that matches that value for h1. And if a read request
doesn't specify h1, a lock is taken on the whole tablet. This is the case for both REPEATABLE READ
and SERIALIZABLE.

This results in 2 issues for REPEATABLE READ isolation in YSQL:
1. Even data that is filtered out later by YSQL is locked.
2. New tuples also can't be inserted that fall in the locked prefix range. So, we are locking future
   non-existant tuples as well.

For any isolation other than SERIALIZABLE, in Pg, locks are only taken on the rows that are "returned" to the
client or parent plan node by the LockRows plan node. Rows that are filtered out by child plan nodes of LockRows
are not locked. In YSQL, LockRows is just a dummy placeholder and we instead pass down locking information to
leaf nodes that scan docdb and hence lock larger chunks of the tablet.

This mismatch happens because explicit locking was implemented (https://phabricator.dev.yugabyte.com/D7007)
by just returning early from ExecLockRows() in YSQL and hence logic in the LockRows node was not used.

This diff performs explicit locking tuple by tuple in the LockRows node similar to Postgres.
This has two benefits -
1. We match postgres locking semantics in REPEATABLE READ (and also will match in the READ COMMITED isolation
   that we plan to add support for in near future).
2. This makes implementation of other Pg locking features like SKIP LOCKED stragiht-forward (since we now
   lock 1 tuple at a time and not whole chunks - which allows others txns to see what rows are still available
   for locking). Without this, such existing Pg features would require lot of code changes.

Locking only rows returned to user as compared to coarser chunks (i.e., by locking a prefix of the pk) might
seem to be inefficient due to the many locks that we now have to take. But note coarser locks can also hurt
performance by locking what isn't to be locked. So, both methods have their disadvantages based on the specific
workload; but locking only rows returned to user gives us behaviour same as Postgres. This serves as the
motivation to switch to only locking rows that are returned to user rather than coarser locks.

NOTE: For SERIALIZABLE level, explicit locking is same as before since we need to lock whole predicates to
block insertion of new rows matching the predicate (which the current prefix-locking achieves).

Part 2: Support for SKIP LOCKED
--------------------------------

After Part 1, SKIP LOCKED becomes straight-forward (since we now lock 1 tuple at a time and not whole
chunks - which allows others txns to see what rows are still available for locking).

The implementation plumbs the wait policy for locking from postgres to tserver process which uses
it in conflict_resolution.cc. If the policy for a certain intent is to skip, in case of conflict we return
a SkipLocking error.

Following the wait_policy in YBCLockTuple() will make the implementation clear.

Test Plan:
1. New test method TestRowLockInJoin: to ensure we lock only those tuples which are returned to user (in REPEATABLE READ isolation).
2. New test method TestSkipLocked: for SKIP LOCKED functionality.
3. TestRowKeyShareLock: This test existed already and assertions in it were based on the assumption that both SERIALIZABLE and REPEATABLE READ take blanket/predicate locks (e.g.: full table if no prefix of the pk is specified). The behaviour in test is corrected to match Pg behaviour. Also some more cases are added to it to test predicate locking in SERIALIZABLE level.
4. TestInsertSelectRowLock, TestDeleteSelectRowLock: Modified tests since yugabyte#2809 is fixed with this.

Reviewers: mbautin, sergei, neil, jason

Subscribers: yql, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D13352
pkj415 added a commit that referenced this issue Dec 3, 2021
…ort SKIP LOCKED for REPEATABLE READ isolation level

Summary:
**Part 1 (for first 2 issues):**

Currently in YB, to lock rows explicitly, YSQL sends a marker with read requests to tserver processes.
A lock is taken on the longest possible prefix of the DocKey that will contain the data matching the query.
Example: Consider a table with pk (h1, h2, r1, r2). If a read request sets h1, h2, and r1, a lock is
taken on everything that matches that prefix (this is done via a single intent). Similarly, if the read
request sets just h1, a lock is taken on everything that matches that value for h1. And if a read request
doesn't specify h1, a lock is taken on the whole tablet. This is the case for both REPEATABLE READ
and SERIALIZABLE.

**This results in 2 issues for REPEATABLE READ isolation in YSQL:**
1. Even data that is filtered out later by YSQL is locked.
2. New tuples also can't be inserted that fall in the locked prefix range. With a "prefix" intent, we are locking
    future non-existent tuples as well.

For any isolation other than SERIALIZABLE, in Pg, locks are only taken on the rows that are "returned" to the
client or parent plan node by the LockRows plan node. Rows that are filtered out by child plan nodes of LockRows
are not locked. In YSQL, LockRows is just a dummy placeholder and we instead pass down locking information to
leaf nodes that scan docdb and hence lock larger chunks of the tablet.

This mismatch happens because explicit locking was implemented (https://phabricator.dev.yugabyte.com/D7007)
by just returning early from ExecLockRows() in YSQL and hence logic in the LockRows node was not used.

This diff performs explicit locking tuple by tuple in the LockRows node similar to Postgres.

**This has two benefits -**

i) We match postgres locking semantics in REPEATABLE READ (and also will match in the READ COMMITED isolation
that we plan to add support for in near future).

ii) For REPEATABLE READ/ READ COMMITTED, this makes implementation of other Pg locking features easier -
    a) SKIP LOCKED becomes straight-forward since we now lock 1 tuple at a time and not whole chunks - which allows others
        txns to see what rows are still available for locking). Without this, such existing Pg features would require lot of code changes.
    b) Special handling of conflicts that use "EvalPlanQual" in Postgres will be readily available for YB use as we implement READ
        COMMITTED.

Locking only rows returned to user as compared to coarser chunks (i.e., by locking a prefix of the pk) might
seem to be inefficient due to the many locks that we now have to take. But note coarser locks can also hurt
performance by locking what isn't to be locked. So, both methods have their disadvantages based on the specific
workload; but locking only rows returned to user gives us behaviour same as Postgres. This serves as the
motivation to switch to only locking rows that are returned to user rather than coarser locks.

To follow the flow of logic, start with //nodeLockRows.c//

This diff is a result of conversation from this slack thread - https://yugabyte.slack.com/archives/CAR5BCH29/p1631036276226800

```
NOTE:

i) For SERIALIZABLE level, explicit locking is same as before since we need to lock whole predicates to
block insertion of new rows matching the predicate (which the current prefix-locking achieves).
ii) We can optimize the case of explicit locking on a SELECT with full pk specified by using just 1 RPC call. The will follow after this diff lands (#10429)

```
**Part 2: Support for SKIP LOCKED**

After Part 1, SKIP LOCKED becomes straight-forward (since we now lock 1 tuple at a time and not whole
chunks - which allows others txns to see what rows are still available for locking).

The implementation plumbs the wait policy for locking from postgres to tserver process which uses
it in conflict_resolution.cc. If the policy for a certain intent is to skip, in case of conflict we return
a SkipLocking error.

Following the wait_policy in YBCLockTuple() will make the implementation clear.

Test Plan:
**New tests:**

1. TestRowLockInJoin: to ensure we lock only those tuples which are returned to user (in REPEATABLE READ isolation).
2. TestSkipLocked: for SKIP LOCKED functionality.

**Modified existing tests:**
1. TestInsertSelectRowLock, TestDeleteSelectRowLock: Modified tests since #2809 is fixed with this.
2. TestRowKeyShareLock: This test existed already and assertions in it were based on the assumption that both SERIALIZABLE and REPEATABLE READ take blanket/predicate locks (e.g.: full table if no prefix of the pk is specified). The behaviour in test is corrected to match Pg behaviour. Also some more cases are added to it to test predicate locking in SERIALIZABLE level.
3. TestPgExplicitLocks, TestPgForeignKeyOptimization and TestPgTransactions.testExplicitLocking

Jenkins: urgent

Reviewers: mbautin, neil, jason, smishra, mihnea, dmitry, sergei

Reviewed By: dmitry, sergei

Subscribers: yql, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D13352
@pkj415 pkj415 closed this as completed Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL)
Projects
None yet
Development

No branches or pull requests

1 participant