DetermineKeysToLock take row level kStrongSnapshotWrite on every update request and will block the next one? #5036

iswarezwp · 2020-07-10T12:38:44Z

According to the Transaction isolation levels document said:

Multiple SI transactions could be modifying different columns in the same row concurrently. They acquire weak SI locks on the row key, and strong SI locks on the individual columns they are writing to. The weak SI locks on the row do not conflict with each other.

But DetermineKeysToLock seems to get only the whole doc_key of a DocOperation

    // here the GetDocPaths will only return the whole doc_key 
    RETURN_NOT_OK(doc_op->GetDocPaths(GetDocPathsMode::kLock, &doc_paths, &level));
    if (isolation_level != IsolationLevel::NON_TRANSACTIONAL) {
      level = isolation_level;
    }
    ......

and take strong intent locks on the doc_key path：

      RETURN_NOT_OK(ApplyIntent(doc_path, strong_intent_types, &result.lock_batch));

Did I get it wrong?

The text was updated successfully, but these errors were encountered:

mbautin · 2020-07-29T18:17:30Z

@iswarezwp DetermineKeysToLock is supposed to go through both weak and strong locks for a the given set of operations. The function DecodePrefixLengths computes the list of all possible prefixes from an encoded DocKey, invoked from this line in DetermineKeysToLock:

RETURN_NOT_OK(SubDocKey::DecodePrefixLengths(doc_path.as_slice(), &key_prefix_lengths));

When you said that DetermineKeysToLock seems to get only the whole doc_key of DocOperation, did you observe that in practice, perhaps by adding logging? In that case, I would be curious to see the exact test case because that would be a bug.

Full code of DetermineKeysToLock for context:

Result<DetermineKeysToLockResult> DetermineKeysToLock(
    const std::vector<std::unique_ptr<DocOperation>>& doc_write_ops,
    const google::protobuf::RepeatedPtrField<KeyValuePairPB>& read_pairs,
    const IsolationLevel isolation_level,
    const OperationKind operation_kind,
    const RowMarkType row_mark_type,
    bool transactional_table,
    PartialRangeKeyIntents partial_range_key_intents) {
  DetermineKeysToLockResult result;
  boost::container::small_vector<RefCntPrefix, 8> doc_paths;
  boost::container::small_vector<size_t, 32> key_prefix_lengths;
  result.need_read_snapshot = false;
  for (const unique_ptr<DocOperation>& doc_op : doc_write_ops) {
    doc_paths.clear();
    IsolationLevel level;
    RETURN_NOT_OK(doc_op->GetDocPaths(GetDocPathsMode::kLock, &doc_paths, &level));
    if (isolation_level != IsolationLevel::NON_TRANSACTIONAL) {
      level = isolation_level;
    }
    IntentTypeSet strong_intent_types = GetStrongIntentTypeSet(level, operation_kind,
                                                               row_mark_type);
    if (isolation_level == IsolationLevel::SERIALIZABLE_ISOLATION &&
        operation_kind == OperationKind::kWrite &&
        doc_op->RequireReadSnapshot()) {
      strong_intent_types = IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite});
    }

    for (const auto& doc_path : doc_paths) {
      key_prefix_lengths.clear();
      RETURN_NOT_OK(SubDocKey::DecodePrefixLengths(doc_path.as_slice(), &key_prefix_lengths));
      // At least entire doc_path should be returned, so empty key_prefix_lengths is an error.
      if (key_prefix_lengths.empty()) {
        return STATUS_FORMAT(Corruption, "Unable to decode key prefixes from: $0",
                             doc_path.as_slice().ToDebugHexString());
      }
      // We will acquire strong lock on entire doc_path, so remove it from list of weak locks.
      key_prefix_lengths.pop_back();
      auto partial_key = doc_path;
      // Acquire weak lock on empty key for transactional tables,
      // unless specified key is already empty.
      if (doc_path.size() > 0 && transactional_table) {
        partial_key.Resize(0);
        RETURN_NOT_OK(ApplyIntent(
            partial_key, StrongToWeak(strong_intent_types), &result.lock_batch));
      }
      for (size_t prefix_length : key_prefix_lengths) {
        partial_key.Resize(prefix_length);
        RETURN_NOT_OK(ApplyIntent(
            partial_key, StrongToWeak(strong_intent_types), &result.lock_batch));
      }

      RETURN_NOT_OK(ApplyIntent(doc_path, strong_intent_types, &result.lock_batch));
    }

    if (doc_op->RequireReadSnapshot()) {
      result.need_read_snapshot = true;
    }
  }

  if (!read_pairs.empty()) {
    RETURN_NOT_OK(EnumerateIntents(
        read_pairs,
        [&result](IntentStrength strength, Slice value, KeyBytes* key, LastKey) {
          RefCntPrefix prefix(key->AsSlice());
          auto intent_types = strength == IntentStrength::kStrong
              ? IntentTypeSet({IntentType::kStrongRead})
              : IntentTypeSet({IntentType::kWeakRead});
          return ApplyIntent(prefix, intent_types, &result.lock_batch);
        }, partial_range_key_intents));
  }

  return result;
}

iswarezwp · 2020-07-30T02:22:32Z

@mbautin I add some log at the end of ApplyIntent function：

CHECKED_STATUS ApplyIntent(RefCntPrefix key,
                           const IntentTypeSet intent_types,
                           LockBatchEntries *keys_locked) {
  // Have to strip kGroupEnd from end of key, because when only hash key is specified, we will
  // get two kGroupEnd at end of strong intent.
  size_t size = key.size();
  if (size > 0) {
    if (key.data()[0] == ValueTypeAsChar::kGroupEnd) {
      if (size != 1) {
        return STATUS_FORMAT(Corruption, "Key starting with group end: $0",
            key.as_slice().ToDebugHexString());
      }
      size = 0;
    } else {
      while (key.data()[size - 1] == ValueTypeAsChar::kGroupEnd) {
        --size;
      }
    }
  }
  key.Resize(size);
  keys_locked->push_back({key, intent_types});
  LOG(INFO) << "ApplyIntent to " << key.as_slice().ToDebugString() << ", IntentTypeSet: " << intent_types.ToUIntPtr();
  return Status::OK();
}

and create a table with 4 columns and insert some rows for test：

yugabyte=# create table test(id int primary key, col1 text, col2 text, col3 text);
CREATE TABLE
yugabyte=# insert into test values(1, 'a1', 'b1', 'c1');
INSERT 0 1
yugabyte=# insert into test values(2, 'a2', 'b2', 'c2');
INSERT 0 1
yugabyte=# insert into test values(3, 'a3', 'b3', 'c3');
INSERT 0 1

test 1: update the same raw with different column:

yugabyte=# update test set col1='a11' where id=1;
yugabyte=# update test set col2='b11' where id=1;

logs:

# update col1:
I0730 02:05:57.276625 116829 docdb.cc:163] ApplyIntent to , IntentTypeSet: 3
I0730 02:05:57.276651 116829 docdb.cc:163] ApplyIntent to 4712104880000001, IntentTypeSet: 12

# update col2:
I0730 02:08:56.951122 116825 docdb.cc:163] ApplyIntent to , IntentTypeSet: 3
I0730 02:08:56.951149 116825 docdb.cc:163] ApplyIntent to 4712104880000001, IntentTypeSet: 12

These tow update query apply same IntentTypeSet to the same doc_path: 4712104880000001。
IntentTypeSet: 12 stand for 0b1100，means：IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite}，I think this two IntentTypeSet(12) will be conflicted on the same doc_path and SharedLockManager will block the later update query until the first update release the lock。

test 2: also I test another row's update on col1:

yugabyte=# update test set col1='a21' where id=2;

logs:

I0730 02:11:05.549399 116826 docdb.cc:163] ApplyIntent to , IntentTypeSet: 3
I0730 02:11:05.549427 116826 docdb.cc:163] ApplyIntent to 47C0C44880000002, IntentTypeSet: 12

the doc_path value is different now.

…erations Summary: **Item 1:** An `UPDATE` operation is a `read-modify-write` operation as DocDB finds the row first and then performs the update. In a `SERIALIZABLE ISOLATION` transaction, DocDB takes a read lock for the row that is being updated to prevent this row from being removed by a concurrent transaction. This is done by creating a `strong read` intent for the row. But this kind of an intent conflicts with updates to that row's columns by another transaction. ``` CREATE TABLE t(k INT PRIMARY KEY, v1 INT, v2 INT); INSERT INTO t VALUES(1, 2, 3); --Connection 1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v1 = 20 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v1), weak write for (1) --Connection 2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v2 = 30 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v2), weak write for (1) -- Where strong read for (1) conflicts with weak write for (1) in Connection 1 ``` To avoid conflict in the described scenario, we must use a `weak read` intent for row locking (to prevent row from being removed) instead of a `strong read`. The `DocOperation::GetDocPaths` function called with a `GetDocPathsMode::kLock` argument determines what intents to use for row locking. `Strong read` intents will be created for all the paths returned by this method call. Also `weak read` intents will be created for all the prefixes of returned paths. As a result, to make `weak read` intents for a row, `GetDocPaths` method may return the path for an arbitrary column of that row instead of the row itself. (And a strong read intent will be created on that column.) The path of the liveness column can be used for this purpose. **Note:** In case `UPDATE` command uses an expression instead of an explicit value for some of the columns the `GetDocPaths` method will create `strong read` intent for the whole row because such the expressions may read the column values. And determining of exact set of read columns may be too expensive: ``` CREATE TABLE t(k INT PRIMARY KEY, v INT); INSERT INTO t values(1, 1); -- Connection 1 START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v = v + 1 WHERE k = 1; -- Connection 2 START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v = v * 3 WHERE k = 1; ``` **Item 2:** Pggate sends a write operation that only includes the modified columns in case of a `single row UPDATE`. In case of a `non-single row UPDATE` pggate sends a write operation with all columns (including unchanged). This causes DocDB to take redundant locks on unchanged columns. The reason why pggate works this way is that `BEFORE UPDATE` triggers may change additional columns in the row. A single-row UPDATE guarantees that the current table has no triggers at all, so no extra columns are changed. To avoid sending unchanged columns to DocDB in case of `non-single row UPDATE` (actually when row has `BEFORE UPDATE` trigger) the comparison of columns values in `old` and `new` tuple is performed. Extra columns are added into write operation only in case their values in `old` and `new` tuple don't match. **Note**: This part of diff fixes the problem with key column changes by the `BEFORE UPDATE` trigger. Unit test for this case are added. **Item 3:** To prevent row from being deleted by another transaction, a foreign key check sends a read command with a `ROW_MARK_KEYSHARE` row mark. But `ROW_MARK_KEYSHARE` creates a `strong read` intent (same as `ROW_MARK_SHARE`). As a result, updating of columns in a referenced table by another transaction conflicts with the current transaction. To avoid this conflict, `ROW_MARK_KEYSHARE` has to create a `weak read` intent instead of a `strong read`. But without extra changes the scenario with `FOR KEY SHARE` and an incomplete row doc key will be broken: ``` CREATE TABLE t(h INT, r1 INT, r2 INT, v INT, PRIMARY KEY(h, r1 ASC, r2 ASC)); INSERT INTO t VALUES(1, 2, 3, 4); -- Connection 1: BEGIN; SELECT * FROM t WHERE h = 1 FOR KEY SHARE; -- Another transaction should not be able to remove any of returned rows -- Connection 2: DELETE FROM t WHERE h = 1 AND r1 = 2 AND r2 = 3; -- Creates strong write intent for (1, 2, 3) and weak write intents for (1, 2) and (1), but Connection 1 created weak read intent for (1), so no conflict here and row will be deleted ``` To avoid such behavior, the internal client should use `ROW_MARK_SHARE` instead of `ROW_MARK_KEYSHARE` in scenarios where not all key components are specified in read operation. **item 4** To fix the #2922 issue completely the following mapping between row marks and intents is implemented ``` switch (row_mark) { case RowMarkType::ROW_MARK_EXCLUSIVE: // FOR UPDATE: strong read + strong write lock on the DocKey, // as if we're replacing or deleting the entire row in DocDB. return IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite}); case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE: // FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading // the entire row and then writing only a subset of columns in DocDB. return IntentTypeSet({IntentType::kStrongRead, IntentType::kWeakWrite}); case RowMarkType::ROW_MARK_SHARE: // FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB. return IntentTypeSet({IntentType::kStrongRead}); case RowMarkType::ROW_MARK_KEYSHARE: // FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being // replaced / deleted, as if we're simply reading some of the column. // This is the type of locking that is used by foreign keys, so this will // prevent the referenced row from disappearing. The reason it does not // conflict with the FOR NO KEY UPDATE above is conceptually the following: // an operation that reads the entire row and then writes a subset of columns // (FOR NO KEY UPDATE) does not have to conflict with an operation that could // be reading a different subset of columns (FOR KEY SHARE). return IntentTypeSet({IntentType::kWeakRead}); default: break; } ``` Test Plan: New test cases were introduced ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTrigger' ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.ReferencedTableUpdate* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SameColumnUpdate* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowKeyShareLock* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowLockConflictMatrix* ``` Reviewers: mihnea, alex, hbhanawat, mbautin, sergei Reviewed By: mbautin, sergei Subscribers: kannan, yql Differential Revision: https://phabricator.dev.yugabyte.com/D11239

…ase of update operations Summary: **Item 1:** An `UPDATE` operation is a `read-modify-write` operation as DocDB finds the row first and then performs the update. In a `SERIALIZABLE ISOLATION` transaction, DocDB takes a read lock for the row that is being updated to prevent this row from being removed by a concurrent transaction. This is done by creating a `strong read` intent for the row. But this kind of an intent conflicts with updates to that row's columns by another transaction. ``` CREATE TABLE t(k INT PRIMARY KEY, v1 INT, v2 INT); INSERT INTO t VALUES(1, 2, 3); --Connection 1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v1 = 20 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v1), weak write for (1) --Connection 2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v2 = 30 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v2), weak write for (1) -- Where strong read for (1) conflicts with weak write for (1) in Connection 1 ``` To avoid conflict in the described scenario, we must use a `weak read` intent for row locking (to prevent row from being removed) instead of a `strong read`. The `DocOperation::GetDocPaths` function called with a `GetDocPathsMode::kLock` argument determines what intents to use for row locking. `Strong read` intents will be created for all the paths returned by this method call. Also `weak read` intents will be created for all the prefixes of returned paths. As a result, to make `weak read` intents for a row, `GetDocPaths` method may return the path for an arbitrary column of that row instead of the row itself. (And a strong read intent will be created on that column.) The path of the liveness column can be used for this purpose. **Note:** In case `UPDATE` command uses an expression instead of an explicit value for some of the columns the `GetDocPaths` method will create `strong read` intent for the whole row because such the expressions may read the column values. And determining of exact set of read columns may be too expensive: ``` CREATE TABLE t(k INT PRIMARY KEY, v INT); INSERT INTO t values(1, 1); -- Connection 1 START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v = v + 1 WHERE k = 1; -- Connection 2 START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v = v * 3 WHERE k = 1; ``` **Item 2:** Pggate sends a write operation that only includes the modified columns in case of a `single row UPDATE`. In case of a `non-single row UPDATE` pggate sends a write operation with all columns (including unchanged). This causes DocDB to take redundant locks on unchanged columns. The reason why pggate works this way is that `BEFORE UPDATE` triggers may change additional columns in the row. A single-row UPDATE guarantees that the current table has no triggers at all, so no extra columns are changed. To avoid sending unchanged columns to DocDB in case of `non-single row UPDATE` (actually when row has `BEFORE UPDATE` trigger) the comparison of columns values in `old` and `new` tuple is performed. Extra columns are added into write operation only in case their values in `old` and `new` tuple don't match. **Note**: This part of diff fixes the problem with key column changes by the `BEFORE UPDATE` trigger. Unit test for this case are added. **Item 3:** To prevent row from being deleted by another transaction, a foreign key check sends a read command with a `ROW_MARK_KEYSHARE` row mark. But `ROW_MARK_KEYSHARE` creates a `strong read` intent (same as `ROW_MARK_SHARE`). As a result, updating of columns in a referenced table by another transaction conflicts with the current transaction. To avoid this conflict, `ROW_MARK_KEYSHARE` has to create a `weak read` intent instead of a `strong read`. But without extra changes the scenario with `FOR KEY SHARE` and an incomplete row doc key will be broken: ``` CREATE TABLE t(h INT, r1 INT, r2 INT, v INT, PRIMARY KEY(h, r1 ASC, r2 ASC)); INSERT INTO t VALUES(1, 2, 3, 4); -- Connection 1: BEGIN; SELECT * FROM t WHERE h = 1 FOR KEY SHARE; -- Another transaction should not be able to remove any of returned rows -- Connection 2: DELETE FROM t WHERE h = 1 AND r1 = 2 AND r2 = 3; -- Creates strong write intent for (1, 2, 3) and weak write intents for (1, 2) and (1), but Connection 1 created weak read intent for (1), so no conflict here and row will be deleted ``` To avoid such behavior, the internal client should use `ROW_MARK_SHARE` instead of `ROW_MARK_KEYSHARE` in scenarios where not all key components are specified in read operation. **item 4** To fix the yugabyte#2922 issue completely the following mapping between row marks and intents is implemented ``` switch (row_mark) { case RowMarkType::ROW_MARK_EXCLUSIVE: // FOR UPDATE: strong read + strong write lock on the DocKey, // as if we're replacing or deleting the entire row in DocDB. return IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite}); case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE: // FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading // the entire row and then writing only a subset of columns in DocDB. return IntentTypeSet({IntentType::kStrongRead, IntentType::kWeakWrite}); case RowMarkType::ROW_MARK_SHARE: // FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB. return IntentTypeSet({IntentType::kStrongRead}); case RowMarkType::ROW_MARK_KEYSHARE: // FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being // replaced / deleted, as if we're simply reading some of the column. // This is the type of locking that is used by foreign keys, so this will // prevent the referenced row from disappearing. The reason it does not // conflict with the FOR NO KEY UPDATE above is conceptually the following: // an operation that reads the entire row and then writes a subset of columns // (FOR NO KEY UPDATE) does not have to conflict with an operation that could // be reading a different subset of columns (FOR KEY SHARE). return IntentTypeSet({IntentType::kWeakRead}); default: break; } ``` Test Plan: New test cases were introduced ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTrigger' ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.ReferencedTableUpdate* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SameColumnUpdate* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowKeyShareLock* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowLockConflictMatrix* ``` Reviewers: mihnea, alex, hbhanawat, mbautin, sergei Reviewed By: mbautin, sergei Subscribers: kannan, yql Differential Revision: https://phabricator.dev.yugabyte.com/D11239

yugabyte-ci added the community/request Issues created by external users label Jul 10, 2020

kmuthukk added the area/docdb YugabyteDB core features label Jul 10, 2020

kmuthukk assigned ddorian Jul 10, 2020

ddorian assigned mbautin and unassigned ddorian Aug 10, 2020

bmatican added this to To do in Performance via automation Oct 28, 2020

d-uspenskiy assigned d-uspenskiy and unassigned mbautin Jan 22, 2021

m-iancu added this to the 2.7.x milestone Apr 27, 2021

m-iancu added this to Backlog in YSQL via automation Apr 27, 2021

m-iancu removed this from Backlog in YSQL Apr 27, 2021

d-uspenskiy closed this as completed Aug 10, 2021

Performance automation moved this from To do to Done Aug 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DetermineKeysToLock take row level kStrongSnapshotWrite on every update request and will block the next one? #5036

DetermineKeysToLock take row level kStrongSnapshotWrite on every update request and will block the next one? #5036

iswarezwp commented Jul 10, 2020 •

edited by daniel-yb

Loading

mbautin commented Jul 29, 2020

iswarezwp commented Jul 30, 2020 •

edited

Loading

DetermineKeysToLock take row level kStrongSnapshotWrite on every update request and will block the next one? #5036

DetermineKeysToLock take row level kStrongSnapshotWrite on every update request and will block the next one? #5036

Comments

iswarezwp commented Jul 10, 2020 • edited by daniel-yb Loading

mbautin commented Jul 29, 2020

iswarezwp commented Jul 30, 2020 • edited Loading

iswarezwp commented Jul 10, 2020 •

edited by daniel-yb

Loading

iswarezwp commented Jul 30, 2020 •

edited

Loading