use tokio threadpool and thread local metrics for readpool #4486
Conversation
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for running tests automatically.
-        // Keep running stream producer
-        cpu_future.forget();
+        // cpu_future.forget();
hicqu
Apr 11, 2019
Contributor
You can remove the line directly. It's OK because spawn has polled it internally. BTW I prefer to write

self.read_pool.spawn(...)?;
Ok(rx.then(|r| r.unwrap()))

to make it clearer that spawn returns a Result<()>.
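For illustration, a minimal self-contained sketch of that pattern in futures 0.1 style; the ReadPool stub, the Full error, and everything besides the spawn-then-Ok(rx.then(...)) shape are assumptions, not TiKV's actual code:

// Sketch of the suggested pattern (futures 0.1 era). `spawn` returns a
// Result, so `?` propagates a pool-is-full error to the caller instead
// of silently dropping it, while the oneshot receiver carries the output.
use futures::{sync::oneshot, Future};

#[derive(Debug)]
struct Full; // stand-in for the "pool is full" error type

struct ReadPool;

impl ReadPool {
    // A real pool would enqueue the closure; this stub runs it inline.
    fn spawn<F: FnOnce() + Send + 'static>(&self, f: F) -> Result<(), Full> {
        f();
        Ok(())
    }
}

fn execute(pool: &ReadPool) -> Result<impl Future<Item = u64, Error = ()>, Full> {
    let (tx, rx) = oneshot::channel::<Result<u64, ()>>();
    pool.spawn(move || {
        let _ = tx.send(Ok(7)); // produce the read result on the pool
    })?; // a Full error surfaces here via `?`
    Ok(rx.then(|r| r.unwrap()))
}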
-        assert_eq!(rx.recv().unwrap(), Ok(7));
-        assert_eq!(rx.recv().unwrap(), Ok(4));
+        // the recv order may be: "Ok(2)Ok(4)Ok(7)Ok(3)" or "Ok(2)Ok(3)Ok(4)Ok(7)" or "Ok(2)Ok(4)Ok(3)Ok(7)"
+        print!("{:?}", rx.recv().unwrap());
fredchenbj
Apr 5, 2019
Author
Contributor
Before, the recv order was always Ok(2)Ok(3)Ok(7)Ok(4), but now the order changes on every run. So I am not sure whether it is a problem.
breeswish
Apr 5, 2019
Member
Then let's not check the recv order any more. Let's only check whether or not full is returned, since futurepool already has complete tests. This is a work-stealing pool and the scheduling order is not as predictable as the previous one.
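A hedged sketch of what the relaxed test could look like; the pool is stubbed with plain threads for illustration, and the point is only that results are compared as a sorted set rather than in arrival order:

// Completion order on a work-stealing pool is unspecified, so sort the
// received results before comparing instead of asserting a fixed order.
use std::sync::mpsc::channel;
use std::thread;

fn main() {
    let (tx, rx) = channel();
    for v in [7i32, 4, 3, 2].iter() {
        let tx = tx.clone();
        let v = *v;
        thread::spawn(move || tx.send(Ok::<i32, ()>(v)).unwrap());
    }
    drop(tx); // close the channel so the iterator below terminates
    let mut got: Vec<_> = rx.iter().collect();
    got.sort();
    assert_eq!(got, vec![Ok(2), Ok(3), Ok(4), Ok(7)]);
}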
@@ -35,21 +35,21 @@ mod endpoint;
 mod error;
 pub mod local_metrics;
 mod metrics;
-mod readpool_context;
+mod read_pool_impl;
}

#[inline]
fn thread_local_flush(pd_sender: &FutureScheduler<PdTask>) {
siddontang
Apr 5, 2019
Contributor
I think using thread_local everywhere is long and redundant. If we want to represent thread_local, we can mostly use tls instead.
    }
}

/// Tries to trigger a tick in the current thread.
Thanks @fredchenbj It is a very cool feature.
I think we should do more benchmarks. /cc @breeswish please help @fredchenbj do some.
After this, I even think we can remove another threadpool: we can use future::lazy to wrap the task, and so we can unify the thread pools. But we should also do the benchmark; IMO, tokio thread pool has better performance than our thread pool @breeswish

Another thing is to support dynamically changing the thread number in the pool, but we must be careful about this, because now we collect thread metrics and use the thread ID as a label value. Dynamic threads mean we may send too many label values to Prometheus. So maybe for the thread pool, we can use the thread name instead of the thread ID. /cc @overvenus
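To make the future::lazy idea concrete, here is a minimal sketch using futures 0.1 and tokio-threadpool 0.1 (the crate versions TiKV used at the time); this illustrates the unification idea only and is not code from the PR:

// `future::lazy` defers a plain closure until the pool polls it, so an
// ordinary (non-future) task can run on the same tokio-threadpool as
// real futures, removing the need for a second pool.
use futures::{future, Future};
use tokio_threadpool::ThreadPool;

fn main() {
    let pool = ThreadPool::new();
    pool.spawn(future::lazy(|| {
        println!("plain task executed on the unified pool");
        Ok::<(), ()>(())
    }));
    pool.shutdown_on_idle().wait().unwrap();
}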
Thanks a lot! Mostly fine. How about the metrics? Have you checked that they are working as intended?
-            .future_execute(priority, move |ctxd| {
-                tracker.attach_ctxd(ctxd);
+            .spawn_handle(priority, move || {
+                tracker.init_current_stage();
breeswish
Apr 5, 2019
Member
We can now mark the state as initialized when the tracker is built, so this line isn't needed any more.
-        ReadPoolContext::new(pd_worker.scheduler())
-    });
+    let pool =
+        coprocessor::ReadPoolImpl::build_read_pool(read_pool_cfg, pd_worker.scheduler(), "cop-fix");
breeswish
Apr 5, 2019
Member
Is the name really important? I guess most of the time the default name should be enough, because the rest of the usages are in tests.
        storage::ReadPoolContext::new(pd_worker.scheduler())
    });
    let pd_worker = FutureWorker::new("test-pd-worker");
    let storage_read_pool = storage::ReadPoolImpl::build_read_pool(
breeswish
Apr 5, 2019
Member
You may try to replace it using Builder::build_for_test as well. It may work.
-    let read_pool = ReadPool::new(
-        "readpool",
+    let read_pool = ReadPoolImpl::build_read_pool(
breeswish
Apr 5, 2019
Member
For this one maybe we can use Builder::from_config(..).build() (because we don't need on_tick or before_stop in this test). Similar for others.
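A short sketch of that suggested construction; Builder::from_config(..).build() is quoted from the comment above, while the config value shown is an assumption for illustration:

// Assumed usage per the suggestion: no on_tick/before_stop hooks are
// needed in this test, so the plain builder is enough.
let read_pool = readpool::Builder::from_config(&readpool::Config::default()).build();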
    static LOCAL_KV_COMMAND_SCAN_DETAILS: RefCell<LocalIntCounterVec> =
        RefCell::new(KV_COMMAND_SCAN_DETAILS.local());

    static LOCAL_PD_SENDER: RefCell<Option<FutureScheduler<PdTask>>> =
    LOCAL_COPR_EXECUTOR_COUNT.with(|m| m.borrow_mut().flush());
}

pub fn collect(region_id: u64, type_str: &str, metrics: ExecutorMetrics) {
breeswish
Apr 5, 2019
Member
Let's rename it to make it clearer. Maybe thread_local_collect_executor_metrics?
struct Context;

impl futurepool::Context for Context {}

#[test]
fn test_future_execute() {
use crate::coprocessor::dag::executor::ExecutorMetrics;

thread_local! {
    pub static LOCAL_COPR_REQ_HISTOGRAM_VEC: RefCell<LocalHistogramVec> =
siddontang
Apr 8, 2019
Contributor
Oh, we have so many metrics; is it better to use a structure to wrap them all, so we can use only one thread-local var instead? /cc @breeswish
siddontang
Apr 9, 2019
Contributor
Em, maybe we can do a benchmark: one local struct vs. multiple local vars.
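For reference, a self-contained micro-benchmark sketch of that comparison; the field names and iteration count are made up for illustration:

// Compares updating two counters through one thread-local struct (a
// single TLS lookup and borrow check) versus two separate thread-local
// variables (two lookups and two borrow checks).
use std::cell::RefCell;
use std::time::Instant;

struct Metrics {
    a: u64,
    b: u64,
}

thread_local! {
    static ONE_STRUCT: RefCell<Metrics> = RefCell::new(Metrics { a: 0, b: 0 });
    static VAR_A: RefCell<u64> = RefCell::new(0);
    static VAR_B: RefCell<u64> = RefCell::new(0);
}

fn main() {
    const N: u32 = 10_000_000;

    let t = Instant::now();
    for _ in 0..N {
        ONE_STRUCT.with(|m| {
            let mut m = m.borrow_mut();
            m.a += 1;
            m.b += 1;
        });
    }
    println!("one struct: {:?}", t.elapsed());

    let t = Instant::now();
    for _ in 0..N {
        VAR_A.with(|a| *a.borrow_mut() += 1);
        VAR_B.with(|b| *b.borrow_mut() += 1);
    }
    println!("multi vars: {:?}", t.elapsed());
}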
Signed-off-by: fredchenbj <cfworking@163.com>
Good job! I'm fine with this PR, as long as the metrics are working as intended.
Please paste your benchmark results too.
…-local-metrics-for-readpool
/run-integration-tests
use prometheus::local::*;

use crate::coprocessor::dag::executor::ExecutorMetrics;

pub struct TlsCop {
pub struct ReadPoolImpl;

impl ReadPoolImpl {
    #[inline]
Thanks @fredchenbj Great work!!! PTAL @breeswish @hicqu
/run-integration-tests
| ReadPool::new("store-read", &cfg.readpool.storage.build_config(), || { | ||
| storage::ReadPoolContext::new(pd_sender.clone()) | ||
| }); | ||
| let storage_read_pool = storage::ReadPoolImpl::build_read_pool( |
hicqu
Apr 11, 2019
Contributor
Personally I prefer storage::ReadPool. Impl looks like a private thing.
fredchenbj
Apr 11, 2019
Author
Contributor
ReadPool has already been used; maybe use ReadPoolProducer. Is that OK?
hicqu
Apr 11, 2019
Contributor
Or ReadPoolContext? It can only build a ReadPool and handle some metrics; it's not really a ReadPool.
breeswish
Apr 11, 2019
Member
It just "derive"s the common ReadPool to create a specialized ReadPool that attached some name, some lifetime hook functions (like on_tick). That's why it was called ReadPoolImpl. Producer or Builder might not be a very good name because it will be confusing for functions like Producer:: tls_collect_executor_metrics.
hicqu
Apr 12, 2019
Contributor
Agreed that Producer and Builder are not good enough. How about removing the struct?
    pub local_copr_rocksdb_perf_counter: RefCell<LocalIntCounterVec>,
    local_copr_executor_count: RefCell<LocalIntCounterVec>,
    local_copr_get_or_scan_count: RefCell<LocalIntCounterVec>,
    local_cop_flow_stats: RefCell<HashMap<u64, crate::storage::FlowStatistics>>,
hicqu
Apr 11, 2019
Contributor
There are too many RefCells. How about putting the struct in a RefCell or Mutex? I think it's clearer.
breeswish
Apr 11, 2019
Member
Nice catch. You should arrange them as:

pub struct TlsCop {
    field: LocalIntCounter,
    field_2: LocalIntCounter,
    // ...
}

thread_local! {
    pub static TLS_COP_METRICS: RefCell<TlsCop> = ...;
}

In this way, we only need to check the borrow once when updating multiple fields.
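A hypothetical usage of that arrangement (field names follow the sketch above, not TiKV's real metric names):

// One borrow_mut() covers updates to several local metrics at once.
TLS_COP_METRICS.with(|m| {
    let mut cop = m.borrow_mut();
    cop.field.inc();
    cop.field_2.inc();
});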
Rest LGTM. Thank you very much!
Friendly ping @siddontang @breeswish @hicqu
LGTM.
Signed-off-by: fredchenbj <cfworking@163.com>
PTAL @breeswish
90c8280 to 4338a7c
ping @siddontang @breeswish @hicqu, please take a look.
LGTM PTAL @hicqu @breeswish
*: use tokio-threadpool and thread local metrics in Storage
Signed-off-by: Breezewish <breezewish@pingcap.com>
fredchenbj commented Apr 5, 2019
What have you changed? (mandatory)
Before, futures-cpupool was used to implement ReadPool, but tokio-threadpool is faster and more stable under high contention or heavy workload, so replace it to improve the performance of storage reads and coprocessor requests. Meanwhile, use thread-local variables to replace the context struct for metrics.

What are the types of the changes? (mandatory)
Improvement (change which is an improvement to an existing feature).
How has this PR been tested? (mandatory)
Unit tests, integration tests, and partial manual tests.
Does this PR affect documentation (docs) update? (mandatory)
No.
Does this PR affect tidb-ansible update? (mandatory)
No.
Refer to a related PR or issue link (optional)
No.
Benchmark result if necessary (optional)
From the pic above, under high workload and stable QPS, the p99 latency of reads was reduced by about 14%, and the p999 latency by about 20%.
Add a few positive/negative examples (optional)