storage, lock_manager: Use the new lock waiting queue instead of WaiterManager to handle pessimistic lock waking up #13447
Conversation
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
/release
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
…ager into a directory
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Some test results:
- Hot update
- Sysbench common
- TPCC

(WIP...)
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
```diff
 }

 fn is_empty(&self) -> bool {
-    self.wait_table.is_empty()
+    self.waiter_pool.is_empty()
 }

 /// Returns the duplicated `Waiter` if there is.
```
The comment is stale, and the return value of this function is confusing now.
The rest looks okay to me.
Since this is such a large pull request, I'm not confident that code review can find most of the problems. We probably need to rely on further integration tests in this aspect.
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
LGTM
```diff
@@ -60,13 +61,6 @@ lazy_static! {
         exponential_buckets(0.0001, 2.0, 20).unwrap() // 0.1ms ~ 104s
     )
     .unwrap();
-    pub static ref WAIT_TABLE_STATUS_GAUGE: WaitTableStatusGauge = register_static_int_gauge_vec!(
```
Do any related panels need to be removed accordingly?
I didn't remove that panel, but I added `LOCK_WAIT_QUEUE_ENTRIES_GAUGE_VEC` to the same panel.
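For context, here is a rough sketch of how such a gauge vector might be declared with the prometheus crate. The metric name, help text, and label are my assumptions, not TiKV's actual definition:

```rust
use lazy_static::lazy_static;
use prometheus::{register_int_gauge_vec, IntGaugeVec};

lazy_static! {
    // Tracks how many entries currently sit in the lock wait queues, so the
    // value can share a dashboard panel with the old wait-table metric.
    static ref LOCK_WAIT_QUEUE_ENTRIES_GAUGE_VEC: IntGaugeVec =
        register_int_gauge_vec!(
            "tikv_lock_wait_queue_entries", // assumed metric name
            "Number of entries in the lock wait queues",
            &["type"]
        )
        .unwrap();
}

fn main() {
    LOCK_WAIT_QUEUE_ENTRIES_GAUGE_VEC
        .with_label_values(&["waiters"])
        .set(3);
    println!(
        "current: {}",
        LOCK_WAIT_QUEUE_ENTRIES_GAUGE_VEC
            .with_label_values(&["waiters"])
            .get()
    );
}
```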
src/server/lock_manager/deadlock.rs (outdated diff)
```diff
             }
         }
         DetectType::CleanUpWaitFor => {
-            detect_table.clean_up_wait_for(txn_ts, lock.ts, lock.hash)
+            let wait_info = wait_info.unwrap();
+            detect_table.clean_up_wait_for(txn_ts, wait_info.lock_digest)
```
How about merging them into one line?
```diff
@@ -12,6 +12,7 @@ make_auto_flush_static_metric! {
         detect,
         clean_up_wait_for,
         clean_up,
+        update_wait_for,
```
Is it intentionally unused?
Yes, it's currently unused, and so is the `update_wait_for` method of `LockManager`.
src/storage/mvcc/txn.rs (outdated diff)
```diff
-    pub(crate) fn unlock_key(&mut self, key: Key, pessimistic: bool) -> Option<ReleasedLock> {
-        let released = ReleasedLock::new(&key, pessimistic);
+    /// Append a modify that unlocks the key. If the lock is removed due to
+    /// committing, a non-zero `commit_ts` need to be provided; otherwise if
```
```suggestion
/// committing, a non-zero `commit_ts` needs to be provided; otherwise if
```
```diff
-        // requests to detect deadlock, clean up its wait-for entries in the
-        // deadlock detector.
-        if is_pessimistic_txn && self.remove_from_detected(lock_ts) {
-            self.detector_scheduler.clean_up(lock_ts);
```
Is there an equivalent part of the clean up logic now?
Here the old logic wakes up all waiters waiting for the lock with `lock_ts` on the specified key, and cleans up all edges that wait for the `lock_ts` from the detector. In the new logic, waking up happens in the new lock waiting queue, and when a lock-waiting request finishes (either canceled, or resumed and successfully acquired, which is not yet supported), the `remove_lock_wait` function in `LockManager` is invoked, which then leads to a call to `clean_up_wait_for` (here).
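To make that path concrete, here is a small self-contained sketch of the flow described above. All types, fields, and signatures are simplified stand-ins I made up around the names in this thread (`remove_lock_wait`, `clean_up_wait_for`), not TiKV's actual API:

```rust
use std::collections::HashMap;

type LockWaitToken = u64;

struct Waiter {
    start_ts: u64,    // ts of the waiting transaction
    lock_digest: u64, // identifies the lock being waited for
}

#[derive(Default)]
struct WaiterManager {
    waiters: HashMap<LockWaitToken, Waiter>,
}

struct DetectorScheduler;

impl DetectorScheduler {
    fn clean_up_wait_for(&self, txn_ts: u64, lock_digest: u64) {
        // Remove the edge `txn_ts -> lock_digest` from the wait-for graph.
        println!("clean up wait-for edge: {} -> {}", txn_ts, lock_digest);
    }
}

struct LockManager {
    waiter_manager: WaiterManager,
    detector_scheduler: DetectorScheduler,
}

impl LockManager {
    /// Called when a lock-waiting request finishes (canceled, or resumed and
    /// successfully acquired); replaces the old release-time clean-up.
    fn remove_lock_wait(&mut self, token: LockWaitToken) {
        if let Some(w) = self.waiter_manager.waiters.remove(&token) {
            self.detector_scheduler
                .clean_up_wait_for(w.start_ts, w.lock_digest);
        }
    }
}

fn main() {
    let mut lm = LockManager {
        waiter_manager: WaiterManager::default(),
        detector_scheduler: DetectorScheduler,
    };
    lm.waiter_manager
        .waiters
        .insert(1, Waiter { start_ts: 100, lock_digest: 42 });
    lm.remove_lock_wait(1);
}
```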
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
.remove("wake_up_delay_duration") | ||
.map(ReadableDuration::from) | ||
{ | ||
info!( |
Why does only the config `wake_up_delay_duration` need to print a log?
The logs about the two config items mentioned here were previously printed in waiter_manager.rs when handling the `Task::ChangeConfig` message. Now `wake_up_delay_duration` is handled somewhere else, so I print its log here. When changing `wait_for_lock_timeout`, the log will still be printed at the old place.
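As an illustration, here is a minimal sketch of what applying and logging this dynamic update could look like; the struct and method names are hypothetical, only `wake_up_delay_duration` comes from the PR:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical holder for the dynamically updatable config value.
struct LockManagerConfig {
    wake_up_delay_duration_ms: AtomicU64,
}

impl LockManagerConfig {
    // Applies a dynamic update and logs it at its new home, since the
    // waiter_manager.rs Task::ChangeConfig path no longer owns this value.
    fn update_wake_up_delay_duration(&self, new_ms: u64) {
        println!(
            "lock manager config changed: wake_up_delay_duration = {}ms",
            new_ms
        );
        self.wake_up_delay_duration_ms.store(new_ms, Ordering::Relaxed);
    }
}

fn main() {
    let cfg = LockManagerConfig {
        wake_up_delay_duration_ms: AtomicU64::new(20),
    };
    cfg.update_wake_up_delay_duration(50);
}
```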
```rust
        }
        self.waiter_mgr_scheduler.wait_for(
```
Is there any special meaning in moving `self.waiter_mgr_scheduler.wait_for` behind `self.detector_scheduler.detect`?
No. It seems to be an accident. I'll change it back.
```diff
-    /// `Notify` consumes the `Waiter` to notify the corresponding transaction
-    /// going on.
+    /// Consumes the `Waiter` to notify the corresponding transaction `going on.
     fn notify(self) {
```
```suggestion
/// Consumes the `Waiter` to notify the corresponding transaction going on.
```
```rust
    }

    fn cancel_for_timeout(self, _skip_resolving_lock: bool) -> KeyLockWaitInfo {
        let lock_info = self.wait_info.lock_info.clone();
```
Why do we need to clone `lock_info`?
If we move it, we will not be able to call another method `cancel` on `self`, since `self` would be partially moved. Also, the function needs to return a complete `KeyLockWaitInfo` to be used by the caller.
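A standalone example of the partial-move issue being described; the types here are simplified stand-ins for the real `Waiter`:

```rust
struct WaitInfo {
    lock_info: String,
}

struct Waiter {
    wait_info: WaitInfo,
}

impl Waiter {
    fn cancel(self) {
        // Consumes the whole waiter.
    }

    fn cancel_for_timeout(self) -> String {
        // Moving `self.wait_info.lock_info` out here would partially move
        // `self`, so the `self.cancel()` call below would not compile;
        // cloning keeps `self` intact.
        let lock_info = self.wait_info.lock_info.clone();
        self.cancel();
        lock_info
    }
}

fn main() {
    let w = Waiter {
        wait_info: WaitInfo {
            lock_info: "lock on key k1".to_string(),
        },
    };
    println!("{}", w.cancel_for_timeout());
}
```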
I don't see other problems.
```diff
             detector_scheduler,
             default_wait_for_lock_timeout: cfg.wait_for_lock_timeout,
-            wake_up_delay_duration: cfg.wake_up_delay_duration,
+            // wake_up_delay_duration: cfg.wake_up_delay_duration,
```
Shall we just remove it?
Yes. I forgot it.
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
/merge
@MyonKeminta: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests
If you have any questions about the PR merge process, please refer to pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
This pull request has been accepted and is ready to merge. Commit hash: 77caef7
@MyonKeminta: Your PR was out of date, I have automatically updated it for you. At the same time I will also trigger all tests for you: /run-all-tests
If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
What is changed and how it works?
Issue Number: ref #13298
What's Changed:
This PR refactors the implementation of lock waiting and waking up by introducing a new lock waiting queue, without changing the current lock-waiting behavior. This will be part of the work of introducing the new lock waiting model (#13298).
Note that this PR doesn't introduce any optimization to the lock-waiting model. It's a refactoring that serves as the basis of the optimization.
Requires:

- `WriteResultLockInfo` (returned by `AcquirePessimisticLock::process_write`) carries parameters, which can be used for resuming the request in the future.
- `WriteResultLockInfo` will be converted into `LockWaitContext` and `LockWaitEntry`, and then sent to both `LockManager` and the new `LockWaitQueues`.
- On releasing locks, `Scheduler::process_write` will call `on_release_locks` to pop lock waiting entries from the queues and wake them up asynchronously (to avoid adding too much latency to the current command); see the sketch after this list.
- `LockManager` (and its inner module `WaiterManager`) no longer has the responsibility for waking up waiters, but keeps its functionality of handling timeouts and performing deadlock detection. Instead, it has a new `remove_lock_wait` method to remove a waiter from it.
- Waiters in `WaiterManager` can now be uniquely identified by a `LockWaitToken`, and the data structure in `WaiterManager` is therefore changed. Accessing by lock hash and transaction ts is still necessary to handle the result of deadlock detection.
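The sketch below (referenced in the list above) models the queue-based wake-up flow. It is a simplified model under my own naming assumptions; `LockWaitEntry`, `LockWaitQueues`, and `on_release_locks` are stand-ins shaped after the names in this PR, not the real implementations:

```rust
use std::collections::{HashMap, VecDeque};

type Key = Vec<u8>;

struct LockWaitEntry {
    start_ts: u64, // the waiting transaction
}

#[derive(Default)]
struct LockWaitQueues {
    queues: HashMap<Key, VecDeque<LockWaitEntry>>,
}

impl LockWaitQueues {
    // Queue a waiter on the key it is blocked on.
    fn push(&mut self, key: Key, entry: LockWaitEntry) {
        self.queues.entry(key).or_default().push_back(entry);
    }

    // Pop the foremost waiter of the key, if any. In the flow described
    // above, the popped entry is woken up asynchronously so the releasing
    // command's latency is not increased.
    fn pop_for_waking_up(&mut self, key: &Key) -> Option<LockWaitEntry> {
        self.queues.get_mut(key)?.pop_front()
    }
}

// Invoked after a command releases locks in Scheduler::process_write.
fn on_release_locks(queues: &mut LockWaitQueues, released_keys: &[Key]) {
    for key in released_keys {
        if let Some(entry) = queues.pop_for_waking_up(key) {
            // Hand the entry back to the scheduler to retry its request.
            println!("waking up waiter of txn {}", entry.start_ts);
        }
    }
}

fn main() {
    let mut queues = LockWaitQueues::default();
    queues.push(b"k1".to_vec(), LockWaitEntry { start_ts: 10 });
    on_release_locks(&mut queues, &[b"k1".to_vec()]);
}
```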
Related changes
- pingcap/docs / pingcap/docs-cn:

Check List
- Tests (WIP)

Side effects

Release note