*: add resource group for the read path #14001
Conversation
Signed-off-by: glorv <glorvs@163.com>
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. A reviewer can indicate their review by submitting an approval review.
@BornChanger @HuSharp PTAL
Depends on tikv/yatp#72
Signed-off-by: glorv <glorvs@163.com>
Do we still need to keep the HTTP interface? If we do, should we put it in the PR that provides the watcher mechanism?
BTW, @nolouch will change resource_manager.proto; some changes may need to be made after it.
```rust
fn get_ru_setting(setting: &GroupSettings, is_read: bool) -> f64 {
    if setting.get_mode() == GroupMode::RuMode {
        if is_read {
            setting.get_r_u_settings().get_r_r_u().get_tokens()
        } else {
            setting.get_r_u_settings().get_w_r_u().get_tokens()
        }
    } else if is_read {
        setting.get_resource_settings().get_cpu().get_tokens()
    } else {
        setting.get_resource_settings().get_io_write().get_tokens()
    }
}
```
How about `match (setting.get_mode(), is_read)`?
+1
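The suggested rewrite would look roughly like the sketch below. The stub types are hypothetical stand-ins for the kvproto-generated `GroupSettings` getters; only the shape of the `match` is the point.

```rust
// Sketch of the match-based rewrite suggested above; stub types stand in
// for the kvproto-generated GroupSettings accessors (an assumption, not
// the real protobuf API).
#[derive(Clone, Copy)]
enum GroupMode {
    RuMode,
    NativeMode,
}

struct GroupSettings {
    mode: GroupMode,
    r_ru_tokens: f64,
    w_ru_tokens: f64,
    cpu_tokens: f64,
    io_write_tokens: f64,
}

fn get_ru_setting(setting: &GroupSettings, is_read: bool) -> f64 {
    // One exhaustive match replaces the nested if/else branches.
    match (setting.mode, is_read) {
        (GroupMode::RuMode, true) => setting.r_ru_tokens,
        (GroupMode::RuMode, false) => setting.w_ru_tokens,
        (GroupMode::NativeMode, true) => setting.cpu_tokens,
        (GroupMode::NativeMode, false) => setting.io_write_tokens,
    }
}

fn main() {
    let s = GroupSettings {
        mode: GroupMode::RuMode,
        r_ru_tokens: 100.0,
        w_ru_tokens: 50.0,
        cpu_tokens: 0.0,
        io_write_tokens: 0.0,
    };
    assert_eq!(get_ru_setting(&s, true), 100.0);
    assert_eq!(get_ru_setting(&s, false), 50.0);
    println!("ok");
}
```

The exhaustive match makes the compiler flag any new `GroupMode` variant that is added later, which the if/else chain would silently miss.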
```rust
let group1 = resouce_ctl.resource_group("test".as_bytes());
assert_eq!(group1.weight, 500);
let group2 = resouce_ctl.resource_group("test2".as_bytes());
assert_eq!(group1.weight, 250);
```
Seems like this should be `group2`.
fixed
Signed-off-by: glorv <glorvs@163.com>
/test
```rust
pub fn get_resource_group(&self, name: &str) -> Option<Ref<'_, String, GroupSettings>> {
    self.resource_groups.get(&name.to_ascii_lowercase())
}
```
How about restricting or converting on the TiDB side? It's wasteful to convert to lowercase every time.
TiDB converts the name to lowercase and then puts it to the resource manager (PD). That means the group name is case-insensitive.
This method should only be used to update or get the resource group meta, so it is not called very often. The methods under ResourceController don't do this conversion because they are on the hot path.
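The trade-off described above can be sketched as follows. This is a simplified illustration, assuming a plain `HashMap` in place of the concurrent map the real code uses: names are stored lowercased, and only the infrequent meta get/update path pays for the conversion.

```rust
use std::collections::HashMap;

// Simplified sketch of the case-insensitive lookup discussed above.
// HashMap is an assumption standing in for the real concurrent map.
struct ResourceGroupManager {
    resource_groups: HashMap<String, u64>, // lowercased name -> some setting
}

impl ResourceGroupManager {
    // Meta path: called rarely, so the lowercase conversion is acceptable here.
    fn get_resource_group(&self, name: &str) -> Option<&u64> {
        self.resource_groups.get(&name.to_ascii_lowercase())
    }
}

fn main() {
    let mut m = ResourceGroupManager { resource_groups: HashMap::new() };
    m.resource_groups.insert("test".to_string(), 100);
    // Lookups succeed regardless of the caller's casing.
    assert_eq!(m.get_resource_group("TEST"), Some(&100));
    assert_eq!(m.get_resource_group("other"), None);
    println!("ok");
}
```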
```rust
fn gen_group_priority_factor(&self, setting: &GroupSettings, is_read: bool) -> u64 {
    let ru_settings = Self::get_ru_setting(setting, is_read);
    // TODO: ensure the result is a valid positive integer
    (self.total_ru_quota / ru_settings * 10.0) as u64
}
```
```diff
- (self.total_ru_quota / ru_settings * 10.0) as u64
+ u64::max((self.total_ru_quota / ru_settings * 10.0) as u64, 1_u64)
```
Signed-off-by: glorv <glorvs@163.com>
```diff
@@ -750,6 +752,11 @@ impl<E: Engine, L: LockManager, F: KvFormat> Storage<E, L, F> {
     const CMD: CommandKind = CommandKind::batch_get_command;
     // all requests in a batch have the same region, epoch, term, replica_read
```
Are we sure requests in a batch have the same resource group?
This depends on the caller/use case, but in TiDB I think it holds in most cases. It is unusual for two resource groups to access the same table region at the same time from the same TiDB server. In theory this can happen, but I don't see a concrete use case for it.
```diff
@@ -1657,6 +1673,11 @@ impl<E: Engine, L: LockManager, F: KvFormat> Storage<E, L, F> {
     const CMD: CommandKind = CommandKind::raw_batch_get_command;
     // all requests in a batch have the same region, epoch, term, replica_read
```
ditto.
```rust
#[serde(rename_all = "kebab-case")]
pub struct Config {
    #[online_config(skip)]
    pub enabled: bool,
```
Please also add a sample in config-template.toml.
@Connor1996 PTAL, do you feel OK with this new config?
looks good
Bad name. Should be `enable`.
```rust
(GroupMode::RuMode, false) => setting.get_r_u_settings().get_w_r_u().get_tokens(),
// TODO: currently we only consider the cpu usage in the read path, we may also take
// io read bytes into account later.
(GroupMode::NativeMode, true) => setting.get_resource_settings().get_cpu().get_tokens(),
```
I updated the kvproto and changed NativeMode to RawMode, which comes from you. :)
Also needs to change GroupSettings to ResourceGroup :)
Signed-off-by: glorv <glorvs@163.com>
```rust
impl ResourceGroupManager {
    fn get_ru_setting(rg: &ResourceGroup, is_read: bool) -> f64 {
        match (rg.get_mode(), is_read) {
            (GroupMode::RuMode, true) => rg.get_r_u_settings().get_r_r_u().get_tokens(),
```
I think it should use fill_rate; the tokens value is a state of the token bucket.
```rust
fn gen_group_priority_factor(&self, rg: &ResourceGroup, is_read: bool) -> u64 {
    let ru_settings = Self::get_ru_setting(rg, is_read);
    // TODO: ensure the result is a valid positive integer
    (self.total_ru_quota / ru_settings * 10.0) as u64
}
```
Should the total RU quota sum all resource group settings? Computing it at the global level seems more reasonable.
We just need the total RU quota to be a reasonably big value, so that after casting (self.total_ru_quota / ru_settings) to an integer it doesn't lose much precision. I didn't find a good way to set this value, so we just use a big enough default.
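The precision concern above can be shown with a quick sketch (the quota numbers are made up for illustration): with a large total quota, the 2:1 ratio between two groups survives the integer cast; with a small total quota, the ratio is distorted.

```rust
// Sketch of the priority-factor computation discussed above:
// total_ru_quota / group_ru * 10, truncated to u64. Values are illustrative.
fn gen_priority_factor(total_ru_quota: f64, group_ru: f64) -> u64 {
    (total_ru_quota / group_ru * 10.0) as u64
}

fn main() {
    // Large total quota: the 2:1 ratio between the groups is preserved.
    assert_eq!(gen_priority_factor(10_000.0, 500.0), 200);
    assert_eq!(gen_priority_factor(10_000.0, 1_000.0), 100);
    // Small total quota: truncation turns the 2:1 ratio into 3:1.
    assert_eq!(gen_priority_factor(150.0, 500.0), 3); // 3.0 -> 3
    assert_eq!(gen_priority_factor(150.0, 1_000.0), 1); // 1.5 -> 1
    println!("ok");
}
```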
```rust
impl GroupPriorityTracker {
    fn get_priority(&self, level: usize) -> u64 {
        // let level = match priority {
```
Remove the commented-out code?
```rust
let mode = if is_ru_mode {
    GroupMode::RuMode
} else {
    GroupMode::NativeMode
```
```diff
- GroupMode::NativeMode
+ GroupMode::RawMode
```
```rust
let group = GroupPriorityTracker {
    weight: priority_factor,
    virtual_time: AtomicU64::new(self.last_min_vt.load(Ordering::Acquire)),
    vt_delta_for_get,
```
Will the write path use this field? Otherwise there is no need for the is_read field. So will there be two resource controllers for one resource group?
Write and read use different ResourceControllers (and GroupPriorityTrackers); only their configs are the same.
```rust
// TODO: use different threshold for different resource type
// needn't do update if the virtual time difference is less than 100ms/100KB.
if min_vt + 100_000 >= max_vt {
```
why choose 100ms?
This was chosen by intuition. As we adjust the vt of all groups each second, we leave some groups' values unchanged if they are near the biggest value, so we can keep the small scheduling advantage between them. This makes the tail latency a bit more stable.
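The per-second adjustment described above can be sketched like this. It is a simplified illustration, not the TiKV implementation: groups within the 100ms (100,000µs) threshold of the maximum vt keep their value, and the "pull lagging groups halfway toward the maximum" update rule is an assumption for the sake of the example.

```rust
// Minimal sketch of the once-per-second vt adjustment discussed above.
// Groups near the max vt keep their small scheduling advantage; the
// halfway catch-up rule below is illustrative, not the real algorithm.
fn adjust_group_vts(vts: &mut [u64]) {
    let max_vt = vts.iter().copied().max().unwrap_or(0);
    for vt in vts.iter_mut() {
        // needn't update if the difference is below the 100ms threshold
        if *vt + 100_000 >= max_vt {
            continue;
        }
        // pull lagging groups halfway up toward the maximum
        *vt += (max_vt - *vt) / 2;
    }
}

fn main() {
    let mut vts = [1_000_000, 950_000, 400_000];
    adjust_group_vts(&mut vts);
    // 950_000 is within 100ms of the max and stays put; 400_000 is pulled up.
    assert_eq!(vts, [1_000_000, 950_000, 700_000]);
    println!("ok");
}
```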
```rust
let mut extras1 = Extras::single_level();
extras1.set_metadata("test".as_bytes().to_owned());
assert_eq!(resouce_ctl.priority_of(&extras1), 25_000);
assert_eq!(group1.current_vt(), 25_000);
```
Does that mean vt increased by 25ms? Won't the weight be too large?
We don't care much about the vt's absolute factor; we just care about the proportion between different groups.
```rust
fn calculate_factor(max_quota: u64, quota: u64) -> u64 {
    if quota > 0 {
        (max_quota as f64 * 10.0 / quota as f64).round() as u64
```
why
added a comment
rest LGTM
Signed-off-by: glorv <glorvs@163.com>
@sticnarf Could you please take a look?
```rust
extras.set_metadata(group_meta.clone());
let task_cell = if let Some(resource_ctl) = resource_ctl {
    TaskCell::new(
        TrackedFuture::new(ControlledFuture::new(
```
What do you think about integrating the resource controller into the Runner? Then we wouldn't need another group_meta clone for each future.
I didn't find a good way to do so. Typically, when scheduling a task, the system may need to generate a priority value for it and update some internal state based on the resources the task consumes. It's hard to do that second step in the runner. In this implementation, as we only track CPU time usage, it would be possible to support it in the runner; but in the future, if we want to track more resources, e.g. read bytes, it seems hard to do in the runner.
In this implementation, as we only track CPU time usage, it would be possible to support it in the runner. In the future, if we want to track more resources, e.g. read bytes, it seems hard to do in the runner.
Makes sense.
lgtm
```rust
fn add_resource_group(&self, name: Vec<u8>, ru_quota: u64) {
    let mut max_ru_quota = self.max_ru_quota.lock().unwrap();
```
Why not use an atomic?
Because when triggering adjust_all_resource_group_factors we need to ensure the max RU quota is not changed. Since we don't assume resource-group CRUD happens very often, using a mutex simplifies the logic: when two new resource groups call add at the same time and each of them triggers adjusting all factors, we can ensure the later (larger) max takes effect for all groups.
Added a comment to explain this mutex.
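The argument above can be sketched as follows. The struct shape and the `adjust_all_resource_group_factors` helper are hypothetical simplifications; the point is that the read of the current max and the factor re-adjustment form one critical section under the mutex.

```rust
use std::sync::Mutex;

// Simplified, hypothetical sketch of why a Mutex (not an atomic) guards
// the max RU quota: checking the max and re-adjusting all group factors
// must happen atomically as one critical section.
struct Controller {
    max_ru_quota: Mutex<u64>,
}

impl Controller {
    fn add_resource_group(&self, ru_quota: u64) {
        let mut max = self.max_ru_quota.lock().unwrap();
        if ru_quota > *max {
            *max = ru_quota;
            // Still holding the lock: a concurrent add with an even larger
            // quota must wait, so the last adjustment always runs with the
            // largest max seen so far.
            self.adjust_all_resource_group_factors(*max);
        }
    }

    // Hypothetical helper; the real recomputation is elided.
    fn adjust_all_resource_group_factors(&self, _new_max: u64) {}
}

fn main() {
    let c = Controller { max_ru_quota: Mutex::new(0) };
    c.add_resource_group(500);
    c.add_resource_group(200); // a smaller quota doesn't change the max
    assert_eq!(*c.max_ru_quota.lock().unwrap(), 500);
    println!("ok");
}
```

With a bare atomic, a thread could read a stale max after another thread had already raised it, and adjust the factors with the wrong value; the lock makes that interleaving impossible.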
Signed-off-by: glorv <glorvs@163.com>
/test
LGTM
/test
/test
/merge
@Connor1996: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests You only need to trigger If you have any questions about the PR merge process, please refer to pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
/merge
@Connor1996: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests You only need to trigger If you have any questions about the PR merge process, please refer to pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
This pull request has been accepted and is ready to merge. Commit hash: f866d91
Signed-off-by: glorv glorvs@163.com
What is changed and how it works?
Issue Number: Ref #13730
What's Changed:
Related changes
pingcap/docs / pingcap/docs-cn

Check List
Tests
Side effects
Release note