-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DetermineKeysToLock take row level kStrongSnapshotWrite on every update request and will block the next one? #5036
Comments
@iswarezwp DetermineKeysToLock is supposed to go through both weak and strong locks for a the given set of operations. The function RETURN_NOT_OK(SubDocKey::DecodePrefixLengths(doc_path.as_slice(), &key_prefix_lengths)); When you said that Full code of Result<DetermineKeysToLockResult> DetermineKeysToLock(
const std::vector<std::unique_ptr<DocOperation>>& doc_write_ops,
const google::protobuf::RepeatedPtrField<KeyValuePairPB>& read_pairs,
const IsolationLevel isolation_level,
const OperationKind operation_kind,
const RowMarkType row_mark_type,
bool transactional_table,
PartialRangeKeyIntents partial_range_key_intents) {
DetermineKeysToLockResult result;
boost::container::small_vector<RefCntPrefix, 8> doc_paths;
boost::container::small_vector<size_t, 32> key_prefix_lengths;
result.need_read_snapshot = false;
for (const unique_ptr<DocOperation>& doc_op : doc_write_ops) {
doc_paths.clear();
IsolationLevel level;
RETURN_NOT_OK(doc_op->GetDocPaths(GetDocPathsMode::kLock, &doc_paths, &level));
if (isolation_level != IsolationLevel::NON_TRANSACTIONAL) {
level = isolation_level;
}
IntentTypeSet strong_intent_types = GetStrongIntentTypeSet(level, operation_kind,
row_mark_type);
if (isolation_level == IsolationLevel::SERIALIZABLE_ISOLATION &&
operation_kind == OperationKind::kWrite &&
doc_op->RequireReadSnapshot()) {
strong_intent_types = IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite});
}
for (const auto& doc_path : doc_paths) {
key_prefix_lengths.clear();
RETURN_NOT_OK(SubDocKey::DecodePrefixLengths(doc_path.as_slice(), &key_prefix_lengths));
// At least entire doc_path should be returned, so empty key_prefix_lengths is an error.
if (key_prefix_lengths.empty()) {
return STATUS_FORMAT(Corruption, "Unable to decode key prefixes from: $0",
doc_path.as_slice().ToDebugHexString());
}
// We will acquire strong lock on entire doc_path, so remove it from list of weak locks.
key_prefix_lengths.pop_back();
auto partial_key = doc_path;
// Acquire weak lock on empty key for transactional tables,
// unless specified key is already empty.
if (doc_path.size() > 0 && transactional_table) {
partial_key.Resize(0);
RETURN_NOT_OK(ApplyIntent(
partial_key, StrongToWeak(strong_intent_types), &result.lock_batch));
}
for (size_t prefix_length : key_prefix_lengths) {
partial_key.Resize(prefix_length);
RETURN_NOT_OK(ApplyIntent(
partial_key, StrongToWeak(strong_intent_types), &result.lock_batch));
}
RETURN_NOT_OK(ApplyIntent(doc_path, strong_intent_types, &result.lock_batch));
}
if (doc_op->RequireReadSnapshot()) {
result.need_read_snapshot = true;
}
}
if (!read_pairs.empty()) {
RETURN_NOT_OK(EnumerateIntents(
read_pairs,
[&result](IntentStrength strength, Slice value, KeyBytes* key, LastKey) {
RefCntPrefix prefix(key->AsSlice());
auto intent_types = strength == IntentStrength::kStrong
? IntentTypeSet({IntentType::kStrongRead})
: IntentTypeSet({IntentType::kWeakRead});
return ApplyIntent(prefix, intent_types, &result.lock_batch);
}, partial_range_key_intents));
}
return result;
} |
@mbautin I add some log at the end of ApplyIntent function:
and create a table with 4 columns and insert some rows for test:
test 1: update the same raw with different column:
logs: # update col1:
I0730 02:05:57.276625 116829 docdb.cc:163] ApplyIntent to , IntentTypeSet: 3
I0730 02:05:57.276651 116829 docdb.cc:163] ApplyIntent to 4712104880000001, IntentTypeSet: 12
# update col2:
I0730 02:08:56.951122 116825 docdb.cc:163] ApplyIntent to , IntentTypeSet: 3
I0730 02:08:56.951149 116825 docdb.cc:163] ApplyIntent to 4712104880000001, IntentTypeSet: 12 These tow update query apply same IntentTypeSet to the same doc_path: 4712104880000001。 test 2: also I test another row's update on col1:
logs:
the doc_path value is different now. |
…erations Summary: **Item 1:** An `UPDATE` operation is a `read-modify-write` operation as DocDB finds the row first and then performs the update. In a `SERIALIZABLE ISOLATION` transaction, DocDB takes a read lock for the row that is being updated to prevent this row from being removed by a concurrent transaction. This is done by creating a `strong read` intent for the row. But this kind of an intent conflicts with updates to that row's columns by another transaction. ``` CREATE TABLE t(k INT PRIMARY KEY, v1 INT, v2 INT); INSERT INTO t VALUES(1, 2, 3); --Connection 1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v1 = 20 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v1), weak write for (1) --Connection 2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v2 = 30 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v2), weak write for (1) -- Where strong read for (1) conflicts with weak write for (1) in Connection 1 ``` To avoid conflict in the described scenario, we must use a `weak read` intent for row locking (to prevent row from being removed) instead of a `strong read`. The `DocOperation::GetDocPaths` function called with a `GetDocPathsMode::kLock` argument determines what intents to use for row locking. `Strong read` intents will be created for all the paths returned by this method call. Also `weak read` intents will be created for all the prefixes of returned paths. As a result, to make `weak read` intents for a row, `GetDocPaths` method may return the path for an arbitrary column of that row instead of the row itself. (And a strong read intent will be created on that column.) The path of the liveness column can be used for this purpose. **Note:** In case `UPDATE` command uses an expression instead of an explicit value for some of the columns the `GetDocPaths` method will create `strong read` intent for the whole row because such the expressions may read the column values. And determining of exact set of read columns may be too expensive: ``` CREATE TABLE t(k INT PRIMARY KEY, v INT); INSERT INTO t values(1, 1); -- Connection 1 START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v = v + 1 WHERE k = 1; -- Connection 2 START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v = v * 3 WHERE k = 1; ``` **Item 2:** Pggate sends a write operation that only includes the modified columns in case of a `single row UPDATE`. In case of a `non-single row UPDATE` pggate sends a write operation with all columns (including unchanged). This causes DocDB to take redundant locks on unchanged columns. The reason why pggate works this way is that `BEFORE UPDATE` triggers may change additional columns in the row. A single-row UPDATE guarantees that the current table has no triggers at all, so no extra columns are changed. To avoid sending unchanged columns to DocDB in case of `non-single row UPDATE` (actually when row has `BEFORE UPDATE` trigger) the comparison of columns values in `old` and `new` tuple is performed. Extra columns are added into write operation only in case their values in `old` and `new` tuple don't match. **Note**: This part of diff fixes the problem with key column changes by the `BEFORE UPDATE` trigger. Unit test for this case are added. **Item 3:** To prevent row from being deleted by another transaction, a foreign key check sends a read command with a `ROW_MARK_KEYSHARE` row mark. But `ROW_MARK_KEYSHARE` creates a `strong read` intent (same as `ROW_MARK_SHARE`). As a result, updating of columns in a referenced table by another transaction conflicts with the current transaction. To avoid this conflict, `ROW_MARK_KEYSHARE` has to create a `weak read` intent instead of a `strong read`. But without extra changes the scenario with `FOR KEY SHARE` and an incomplete row doc key will be broken: ``` CREATE TABLE t(h INT, r1 INT, r2 INT, v INT, PRIMARY KEY(h, r1 ASC, r2 ASC)); INSERT INTO t VALUES(1, 2, 3, 4); -- Connection 1: BEGIN; SELECT * FROM t WHERE h = 1 FOR KEY SHARE; -- Another transaction should not be able to remove any of returned rows -- Connection 2: DELETE FROM t WHERE h = 1 AND r1 = 2 AND r2 = 3; -- Creates strong write intent for (1, 2, 3) and weak write intents for (1, 2) and (1), but Connection 1 created weak read intent for (1), so no conflict here and row will be deleted ``` To avoid such behavior, the internal client should use `ROW_MARK_SHARE` instead of `ROW_MARK_KEYSHARE` in scenarios where not all key components are specified in read operation. **item 4** To fix the #2922 issue completely the following mapping between row marks and intents is implemented ``` switch (row_mark) { case RowMarkType::ROW_MARK_EXCLUSIVE: // FOR UPDATE: strong read + strong write lock on the DocKey, // as if we're replacing or deleting the entire row in DocDB. return IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite}); case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE: // FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading // the entire row and then writing only a subset of columns in DocDB. return IntentTypeSet({IntentType::kStrongRead, IntentType::kWeakWrite}); case RowMarkType::ROW_MARK_SHARE: // FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB. return IntentTypeSet({IntentType::kStrongRead}); case RowMarkType::ROW_MARK_KEYSHARE: // FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being // replaced / deleted, as if we're simply reading some of the column. // This is the type of locking that is used by foreign keys, so this will // prevent the referenced row from disappearing. The reason it does not // conflict with the FOR NO KEY UPDATE above is conceptually the following: // an operation that reads the entire row and then writes a subset of columns // (FOR NO KEY UPDATE) does not have to conflict with an operation that could // be reading a different subset of columns (FOR KEY SHARE). return IntentTypeSet({IntentType::kWeakRead}); default: break; } ``` Test Plan: New test cases were introduced ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTrigger' ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.ReferencedTableUpdate* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SameColumnUpdate* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowKeyShareLock* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowLockConflictMatrix* ``` Reviewers: mihnea, alex, hbhanawat, mbautin, sergei Reviewed By: mbautin, sergei Subscribers: kannan, yql Differential Revision: https://phabricator.dev.yugabyte.com/D11239
…ase of update operations Summary: **Item 1:** An `UPDATE` operation is a `read-modify-write` operation as DocDB finds the row first and then performs the update. In a `SERIALIZABLE ISOLATION` transaction, DocDB takes a read lock for the row that is being updated to prevent this row from being removed by a concurrent transaction. This is done by creating a `strong read` intent for the row. But this kind of an intent conflicts with updates to that row's columns by another transaction. ``` CREATE TABLE t(k INT PRIMARY KEY, v1 INT, v2 INT); INSERT INTO t VALUES(1, 2, 3); --Connection 1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v1 = 20 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v1), weak write for (1) --Connection 2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v2 = 30 WHERE k = 1; -- created intents: strong read for (1), strong write for (1, v2), weak write for (1) -- Where strong read for (1) conflicts with weak write for (1) in Connection 1 ``` To avoid conflict in the described scenario, we must use a `weak read` intent for row locking (to prevent row from being removed) instead of a `strong read`. The `DocOperation::GetDocPaths` function called with a `GetDocPathsMode::kLock` argument determines what intents to use for row locking. `Strong read` intents will be created for all the paths returned by this method call. Also `weak read` intents will be created for all the prefixes of returned paths. As a result, to make `weak read` intents for a row, `GetDocPaths` method may return the path for an arbitrary column of that row instead of the row itself. (And a strong read intent will be created on that column.) The path of the liveness column can be used for this purpose. **Note:** In case `UPDATE` command uses an expression instead of an explicit value for some of the columns the `GetDocPaths` method will create `strong read` intent for the whole row because such the expressions may read the column values. And determining of exact set of read columns may be too expensive: ``` CREATE TABLE t(k INT PRIMARY KEY, v INT); INSERT INTO t values(1, 1); -- Connection 1 START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v = v + 1 WHERE k = 1; -- Connection 2 START TRANSACTION ISOLATION LEVEL SERIALIZABLE; UPDATE t SET v = v * 3 WHERE k = 1; ``` **Item 2:** Pggate sends a write operation that only includes the modified columns in case of a `single row UPDATE`. In case of a `non-single row UPDATE` pggate sends a write operation with all columns (including unchanged). This causes DocDB to take redundant locks on unchanged columns. The reason why pggate works this way is that `BEFORE UPDATE` triggers may change additional columns in the row. A single-row UPDATE guarantees that the current table has no triggers at all, so no extra columns are changed. To avoid sending unchanged columns to DocDB in case of `non-single row UPDATE` (actually when row has `BEFORE UPDATE` trigger) the comparison of columns values in `old` and `new` tuple is performed. Extra columns are added into write operation only in case their values in `old` and `new` tuple don't match. **Note**: This part of diff fixes the problem with key column changes by the `BEFORE UPDATE` trigger. Unit test for this case are added. **Item 3:** To prevent row from being deleted by another transaction, a foreign key check sends a read command with a `ROW_MARK_KEYSHARE` row mark. But `ROW_MARK_KEYSHARE` creates a `strong read` intent (same as `ROW_MARK_SHARE`). As a result, updating of columns in a referenced table by another transaction conflicts with the current transaction. To avoid this conflict, `ROW_MARK_KEYSHARE` has to create a `weak read` intent instead of a `strong read`. But without extra changes the scenario with `FOR KEY SHARE` and an incomplete row doc key will be broken: ``` CREATE TABLE t(h INT, r1 INT, r2 INT, v INT, PRIMARY KEY(h, r1 ASC, r2 ASC)); INSERT INTO t VALUES(1, 2, 3, 4); -- Connection 1: BEGIN; SELECT * FROM t WHERE h = 1 FOR KEY SHARE; -- Another transaction should not be able to remove any of returned rows -- Connection 2: DELETE FROM t WHERE h = 1 AND r1 = 2 AND r2 = 3; -- Creates strong write intent for (1, 2, 3) and weak write intents for (1, 2) and (1), but Connection 1 created weak read intent for (1), so no conflict here and row will be deleted ``` To avoid such behavior, the internal client should use `ROW_MARK_SHARE` instead of `ROW_MARK_KEYSHARE` in scenarios where not all key components are specified in read operation. **item 4** To fix the yugabyte#2922 issue completely the following mapping between row marks and intents is implemented ``` switch (row_mark) { case RowMarkType::ROW_MARK_EXCLUSIVE: // FOR UPDATE: strong read + strong write lock on the DocKey, // as if we're replacing or deleting the entire row in DocDB. return IntentTypeSet({IntentType::kStrongRead, IntentType::kStrongWrite}); case RowMarkType::ROW_MARK_NOKEYEXCLUSIVE: // FOR NO KEY UPDATE: strong read + weak write lock on the DocKey, as if we're reading // the entire row and then writing only a subset of columns in DocDB. return IntentTypeSet({IntentType::kStrongRead, IntentType::kWeakWrite}); case RowMarkType::ROW_MARK_SHARE: // FOR SHARE: strong read on the DocKey, as if we're reading the entire row in DocDB. return IntentTypeSet({IntentType::kStrongRead}); case RowMarkType::ROW_MARK_KEYSHARE: // FOR KEY SHARE: weak read lock on the DocKey, preventing the entire row from being // replaced / deleted, as if we're simply reading some of the column. // This is the type of locking that is used by foreign keys, so this will // prevent the referenced row from disappearing. The reason it does not // conflict with the FOR NO KEY UPDATE above is conceptually the following: // an operation that reads the entire row and then writes a subset of columns // (FOR NO KEY UPDATE) does not have to conflict with an operation that could // be reading a different subset of columns (FOR KEY SHARE). return IntentTypeSet({IntentType::kWeakRead}); default: break; } ``` Test Plan: New test cases were introduced ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTrigger' ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.ReferencedTableUpdate* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.SameColumnUpdate* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowKeyShareLock* ./yb_build.sh --cxx-test pgwrapper_pg_mini-test --gtest_filter PgMiniTest.RowLockConflictMatrix* ``` Reviewers: mihnea, alex, hbhanawat, mbautin, sergei Reviewed By: mbautin, sergei Subscribers: kannan, yql Differential Revision: https://phabricator.dev.yugabyte.com/D11239
According to the Transaction isolation levels document said:
But DetermineKeysToLock seems to get only the whole doc_key of a DocOperation
and take strong intent locks on the doc_key path:
Did I get it wrong?
The text was updated successfully, but these errors were encountered: