
*: add resource group for the read path #14001

Merged: 16 commits merged into tikv:master on Jan 13, 2023

Conversation

@glorv (Contributor) commented Dec 28, 2022

Signed-off-by: glorv <glorvs@163.com>

What is changed and how it works?

Issue Number: Ref #13730

What's Changed:

This PR introduces changes to support resource control and priority-based task scheduling. It only covers task scheduling for the unified read pool; a similar change for the write path will come in a separate PR. The logic for keeping the resource group metadata up to date will also be in a separate PR.

Resource control is based on resource groups. A resource group maintains the resource quota (CPU time, IO, ...) for a specific collection of requests. These requests are marked with the same `ResourceGroupName` in the request context. Typically, a resource group represents one user or tenant.

### Resource Control
This PR introduces a new component called `resource_control`, which maintains the metadata and running state of each resource group. TiKV fetches the resource group metadata from PD and uses it to drive task scheduling.

### Config
We introduce a new config submodule `resource_control`; currently it contains a single config item that indicates whether the resource control feature is enabled. This config does not support online change yet.

### Task Scheduling
As tasks with different resource group tags share the same unified read pool, the new scheduler orders tasks by virtual time: tasks from a resource group with a smaller virtual time are scheduled before those from a group with a bigger one. After each task finishes, we calculate the resources it consumed (as an integer value) and increase the group's virtual time by that amount multiplied by a per-group factor. The factor is the inverse of the resource group's quota config, so groups with larger quotas accumulate virtual time more slowly. In this way, tasks from all resource groups are scheduled fairly, in proportion to their quotas.
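To make this concrete, here is a minimal sketch of the virtual-time bookkeeping (the names `GroupTracker`, `consume`, and `current_vt` are illustrative, not the PR's actual types):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One tracker per resource group. `weight` is the inverse-quota factor
// (roughly total_quota / group_quota), so groups with small quotas
// accumulate virtual time faster and are scheduled later.
struct GroupTracker {
    weight: u64,
    virtual_time: AtomicU64,
}

impl GroupTracker {
    // Called when a task finishes: convert the consumed resource (CPU time
    // in the read path) into integer units and scale by the group's weight.
    fn consume(&self, resource_units: u64) {
        self.virtual_time
            .fetch_add(resource_units * self.weight, Ordering::Relaxed);
    }

    // The pool schedules the pending task whose group currently has the
    // smallest virtual time.
    fn current_vt(&self) -> u64 {
        self.virtual_time.load(Ordering::Relaxed)
    }
}
```

A group that has consumed less than its quota share keeps a smaller virtual time and therefore naturally moves to the front of the queue.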

Currently, in the read path we only track the CPU time of read tasks, but we may take other resources into account later, such as IO read bytes.
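A common way to capture per-task CPU time in an async pool is to wrap each task's future and charge the duration of every poll to the task's group; the PR does this with `ControlledFuture` (seen in the review below). A rough, self-contained sketch of the idea, with hypothetical types:

```rust
use std::{
    future::Future,
    pin::Pin,
    sync::{
        atomic::{AtomicU64, Ordering},
        Arc,
    },
    task::{Context, Poll},
    time::Instant,
};

struct CpuTrackedFuture<F> {
    inner: Pin<Box<F>>,
    group_vt: Arc<AtomicU64>, // the group's virtual-time counter
    weight: u64,              // the group's inverse-quota factor
}

impl<F: Future> Future for CpuTrackedFuture<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // A well-behaved future never blocks inside poll, so the wall time of
        // a single poll is a reasonable approximation of the CPU time used.
        let start = Instant::now();
        let res = self.inner.as_mut().poll(cx);
        let elapsed_us = start.elapsed().as_micros() as u64;
        self.group_vt
            .fetch_add(elapsed_us * self.weight, Ordering::Relaxed);
        res
    }
}
```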

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

Support priority-based scheduling for the read path

Signed-off-by: glorv <glorvs@163.com>
@ti-chi-bot (Member) commented Dec 28, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Connor1996
  • nolouch

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@glorv (Contributor Author) commented Dec 28, 2022

@BornChanger @HuSharp PTAL

@glorv (Contributor Author) commented Dec 28, 2022

Depends on tikv/yatp#72

Signed-off-by: glorv <glorvs@163.com>
@HuSharp (Member) left a comment:

Do we still need to keep the HTTP interface? If we do, should we put it in the PR that provides the watcher mechanism?
BTW, @nolouch will change resource_manager.proto; some changes may be needed after that.

Comment on lines 51 to 63
```rust
fn get_ru_setting(setting: &GroupSettings, is_read: bool) -> f64 {
    if setting.get_mode() == GroupMode::RuMode {
        if is_read {
            setting.get_r_u_settings().get_r_r_u().get_tokens()
        } else {
            setting.get_r_u_settings().get_w_r_u().get_tokens()
        }
    } else if is_read {
        setting.get_resource_settings().get_cpu().get_tokens()
    } else {
        setting.get_resource_settings().get_io_write().get_tokens()
    }
}
```
Member:

How about `match (setting.get_mode(), is_read)`?

Contributor:

+1

```rust
let group1 = resouce_ctl.resource_group("test".as_bytes());
assert_eq!(group1.weight, 500);
let group2 = resouce_ctl.resource_group("test2".as_bytes());
assert_eq!(group1.weight, 250);
```
Member:

seems like group2

Contributor Author:

fixed

Signed-off-by: glorv <glorvs@163.com>
Signed-off-by: glorv <glorvs@163.com>
@glorv (Contributor Author) commented Dec 29, 2022

/test

```rust
}

pub fn get_resource_group(&self, name: &str) -> Option<Ref<'_, String, GroupSettings>> {
    self.resource_groups.get(&name.to_ascii_lowercase())
```
Member:

How about restricting or converting on the TiDB side? It's wasteful to convert to lowercase every time.

Contributor:

TiDB converts the name to lowercase before putting it into the resource manager (PD). That means the group name is case-insensitive.

Contributor Author:

This method should only be used to update or get the resource group meta, so it is not called very often. None of the methods under `ResourceController` do this conversion, because they are on the hot path.

```rust
fn gen_group_priority_factor(&self, setting: &GroupSettings, is_read: bool) -> u64 {
    let ru_settings = Self::get_ru_setting(setting, is_read);
    // TODO: ensure the result is a valid positive integer
    (self.total_ru_quota / ru_settings * 10.0) as u64
```
Contributor:

Suggested change:

```diff
- (self.total_ru_quota / ru_settings * 10.0) as u64
+ u64::max((self.total_ru_quota / ru_settings * 10.0) as u64, 1_u64)
```

Signed-off-by: glorv <glorvs@163.com>
```diff
@@ -750,6 +752,11 @@ impl<E: Engine, L: LockManager, F: KvFormat> Storage<E, L, F> {
     const CMD: CommandKind = CommandKind::batch_get_command;
     // all requests in a batch have the same region, epoch, term, replica_read
```
Contributor:

Are we sure requests in a batch have the same resource group?

Contributor Author:

This depends on the caller/use case, but in TiDB I think it holds in most cases. It is unusual for two resource groups to access the same table region at the same time from the same TiDB server. In theory this can happen, but I don't see a concrete use case for it.

```diff
@@ -1657,6 +1673,11 @@ impl<E: Engine, L: LockManager, F: KvFormat> Storage<E, L, F> {
     const CMD: CommandKind = CommandKind::raw_batch_get_command;
     // all requests in a batch have the same region, epoch, term, replica_read
```
Contributor:

ditto.

```rust
#[serde(rename_all = "kebab-case")]
pub struct Config {
    #[online_config(skip)]
    pub enabled: bool,
```
Contributor:

Please also add a sample in config-template.toml.

Contributor Author:

@Connor1996 PTAL, do you feel OK with this new config?

Member:

looks good

Member:

Bad name. Should be `enable`.

```rust
(GroupMode::RuMode, false) => setting.get_r_u_settings().get_w_r_u().get_tokens(),
// TODO: currently we only consider the cpu usage in the read path, we may also take
// io read bytes into account later.
(GroupMode::NativeMode, true) => setting.get_resource_settings().get_cpu().get_tokens(),
```
Contributor:

I updated kvproto and changed NativeMode to RawMode, which comes from you. :)

Member:

Also needs to change `GroupSettings` to `ResourceGroup` :)

Signed-off-by: glorv <glorvs@163.com>
```rust
impl ResourceGroupManager {
    fn get_ru_setting(rg: &ResourceGroup, is_read: bool) -> f64 {
        match (rg.get_mode(), is_read) {
            (GroupMode::RuMode, true) => rg.get_r_u_settings().get_r_r_u().get_tokens(),
```
Contributor:

I think it should use `fill_rate`; the tokens field is a state of the token bucket.

```rust
fn gen_group_priority_factor(&self, rg: &ResourceGroup, is_read: bool) -> u64 {
    let ru_settings = Self::get_ru_setting(rg, is_read);
    // TODO: ensure the result is a valid positive integer
    (self.total_ru_quota / ru_settings * 10.0) as u64
```
Contributor:

Should the total RU quota be the sum of all resource group settings? That would be more reasonable at the global level.

Contributor Author (@glorv, Jan 10, 2023):

We just need the total RU quota to be a reasonably large value, so that after casting `self.total_ru_quota / ru_settings` to an integer we don't lose much precision. I haven't found a good way to set this value, so we just use a big enough default.
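To illustrate the precision point with concrete (made-up) numbers:

```rust
fn main() {
    let total_ru_quota = 10_000.0_f64; // "big enough" default
    let ru_settings = 300.0_f64;       // this group's RU quota
    // With the x10 scaling the truncating cast keeps an extra digit:
    // 10_000 / 300 = 33.33..., x10 = 333.33... -> 333 instead of 33.
    let factor = (total_ru_quota / ru_settings * 10.0) as u64;
    assert_eq!(factor, 333);
}
```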


```rust
impl GroupPriorityTracker {
    fn get_priority(&self, level: usize) -> u64 {
        // let level = match priority {
```
Contributor:

remove the code?

```rust
let mode = if is_ru_mode {
    GroupMode::RuMode
} else {
    GroupMode::NativeMode
```
Contributor:

Suggested change:

```diff
- GroupMode::NativeMode
+ GroupMode::RawMode
```

```rust
let group = GroupPriorityTracker {
    weight: priority_factor,
    virtual_time: AtomicU64::new(self.last_min_vt.load(Ordering::Acquire)),
    vt_delta_for_get,
```
Contributor:

Will the write path use this field? Otherwise there is no need for the `is_read` field. So will there be two resource controllers for one resource group?

Contributor Author:

Write and read use different `ResourceController`s (and `GroupPriorityTracker`s); only their configs are the same.


```rust
// TODO: use different thresholds for different resource types
// no need to update if the virtual time difference is less than 100ms/100KB.
if min_vt + 100_000 >= max_vt {
```
Contributor:

Why choose 100ms?

Contributor Author:

This was chosen by intuition. Since we adjust the vt of all groups every second, we leave a group's value unchanged if it is near the biggest value, so we keep the small scheduling advantage between groups. This makes the tail latency a bit more stable.

```rust
let mut extras1 = Extras::single_level();
extras1.set_metadata("test".as_bytes().to_owned());
assert_eq!(resouce_ctl.priority_of(&extras1), 25_000);
assert_eq!(group1.current_vt(), 25_000);
```
Contributor:

Does that mean the vt increased by 25ms? Won't the weight be too large?

Contributor Author:

We don't care much about the vt's absolute scale; we only care about the proportion between different groups.


```rust
fn calculate_factor(max_quota: u64, quota: u64) -> u64 {
    if quota > 0 {
        (max_quota as f64 * 10.0 / quota as f64).round() as u64
```
Member:

why

Contributor Author:

added a comment
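To make the x10 rounding concrete, a small self-contained check (the quota values and the zero-quota fallback below are assumptions, not the PR's code):

```rust
fn calculate_factor(max_quota: u64, quota: u64) -> u64 {
    if quota > 0 {
        // Scale by 10 so one extra digit of the quota ratio survives rounding.
        (max_quota as f64 * 10.0 / quota as f64).round() as u64
    } else {
        1 // fallback for groups without a quota (assumed)
    }
}

fn main() {
    assert_eq!(calculate_factor(1000, 500), 20); // half the max quota
    assert_eq!(calculate_factor(1000, 250), 40); // quarter quota, double factor
}
```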

@Connor1996 (Member) left a comment:

rest LGTM

Signed-off-by: glorv <glorvs@163.com>
@glorv (Contributor Author) commented Jan 12, 2023

@sticnarf Could you please take a look?

```rust
extras.set_metadata(group_meta.clone());
let task_cell = if let Some(resource_ctl) = resource_ctl {
    TaskCell::new(
        TrackedFuture::new(ControlledFuture::new(
```
Contributor:

What do you think about integrating the resource controller into the Runner? Then we wouldn't need another group_meta clone for each future.

Contributor Author:

I haven't found a good way to do so. Typically, when scheduling a task, the system needs to generate a priority value for it and then update some internal state based on the resources the task consumed. It's hard to do the second step in the Runner. In this implementation, since we only track CPU time, it would be possible to support it in the Runner; but if we want to track more resources in the future, e.g. read bytes, that seems hard to do in the Runner.

Contributor:

> In this implementation, since we only track CPU time, it would be possible to support it in the Runner; but if we want to track more resources in the future, e.g. read bytes, that seems hard to do in the Runner.

Makes sense.

@nolouch (Contributor) left a comment:

lgtm

ti-chi-bot added the status/LGT1 (Status: PR - There is already 1 approval) label on Jan 12, 2023
```rust
}

fn add_resource_group(&self, name: Vec<u8>, ru_quota: u64) {
    let mut max_ru_quota = self.max_ru_quota.lock().unwrap();
```
Member:

Why not use an atomic?

Contributor Author:

Because when triggering `adjust_all_resource_group_factors` we need to ensure the max RU quota does not change. Since we don't expect resource group CRUD to happen very often, a mutex keeps the logic simple.
With the mutex, if two new resource groups call add at the same time and each triggers adjusting all the factors, we can ensure the later (larger) value takes effect for all groups.

Contributor Author:

Added a comment to explain this mutex.
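A sketch of that locking rationale (structure and names are illustrative, not the PR's actual code): holding the mutex across both the max-quota update and the global factor adjustment makes the pair atomic, so concurrent calls cannot leave some groups scaled against a stale maximum.

```rust
use std::sync::Mutex;

struct ResourceGroupManager {
    // Guards the "update max quota + adjust all factors" critical section;
    // resource group CRUD is rare, so a mutex is simpler than atomics here.
    max_ru_quota: Mutex<u64>,
}

impl ResourceGroupManager {
    fn add_resource_group(&self, ru_quota: u64) {
        let mut max = self.max_ru_quota.lock().unwrap();
        if ru_quota > *max {
            *max = ru_quota;
            // Recompute every group's factor from the new maximum while the
            // lock is still held, so a concurrent add cannot interleave.
            self.adjust_all_resource_group_factors(*max);
        }
        // ... register the new group itself ...
    }

    fn adjust_all_resource_group_factors(&self, _max_quota: u64) {
        // Walk all groups and recompute weight = calculate_factor(max, quota).
    }
}
```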

Signed-off-by: glorv <glorvs@163.com>
@HuSharp (Member) commented Jan 12, 2023

/test

@Connor1996 (Member) left a comment:

LGTM

ti-chi-bot added the status/LGT2 (Status: PR - There are already 2 approvals) label and removed the status/LGT1 label on Jan 12, 2023
@HuSharp (Member) commented Jan 12, 2023

/test

@glorv (Contributor Author) commented Jan 12, 2023

/test

@Connor1996 (Member):

/merge

@ti-chi-bot (Member):

@Connor1996: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.


@ti-chi-bot (Member):

This pull request has been accepted and is ready to merge.

Commit hash: f866d91

ti-chi-bot added the status/can-merge (Status: Can merge to base branch) label on Jan 13, 2023
ti-chi-bot merged commit 2daa168 into tikv:master on Jan 13, 2023
ti-chi-bot added this to the Pool milestone on Jan 13, 2023
glorv deleted the resource_group branch on April 25, 2023
Labels: release-note, size/XXL, status/can-merge (Status: Can merge to base branch), status/LGT2 (Status: PR - There are already 2 approvals)