Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch (raw) get requests to same region across batch commands #5598

Merged
merged 45 commits into from Oct 29, 2019

Conversation

@tabokie
Copy link
Member

tabokie commented Oct 9, 2019

What have you changed?

  • Batch get / raw-get requests with same priority to the same region.

To avoid request timeout due to batch too large or short of timeslice, add one thread of interval timer and periodically issues a forced submit.

This change will reduce readpool tasking pressure and in turn cpu usage. The effect is relevent to gRPC connection count (contribute to qps in one stream) and batch wait duration.

What is the type of the changes?

  • Improvement (a change which is an improvement to an existing feature)

How is the PR tested?

  • Unit test

Does this PR affect documentation (docs) or should it be mentioned in the release notes?

No

Does this PR affect tidb-ansible?

Will do

Refer to a related PR or issue link (optional)

This PR is split from #5382

Benchmark result if necessary (optional)

improve 17% qps for oltp_point_select bench (grpc-connection-count=3, request-batch-wait-duration="1ms") without compromising other workloads.

improve 20% qps for oltp_point_select (grpc-connection-count=1 or 2, request-batch-wait-duration="1ms"), with a notable side effect on other workloads.

This PR has seen some major performance speed-ups on tikv master since it first started (#5363, #5654 ). Now request batch is only feasible under higher perssure.

Here is a result from sysbench oltp_point_select 1024 thread, on 1 tidb 1 tikv each on seperate server with 40 core. (with tidb point get grpc conn of 4)

before batch, tikv cpu ~ 900%

SQL statistics:
queries performed:
read: 82163942
write: 0
other: 0
total: 82163942
transactions: 82163942 (228121.25 per sec.)
queries: 82163942 (228121.25 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)

General statistics:
total time: 360.1744s
total number of events: 82163942

Latency (ms):
min: 0.23
avg: 4.49
max: 49.41
95th percentile: 7.17
sum: 368591486.94

Threads fairness:
events (avg/stddev): 80238.2246/61.32
execution time (avg/stddev): 359.9526/0.00

after batch, tikv cpu ~ 550%

SQL statistics:
queries performed:
read: 91619058
write: 0
other: 0
total: 91619058
transactions: 91619058 (254370.46 per sec.)
queries: 91619058 (254370.46 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)

General statistics:
total time: 360.1777s
total number of events: 91619058

Latency (ms):
min: 0.33
avg: 4.02
max: 45.34
95th percentile: 7.30
sum: 368587682.66

Threads fairness:
events (avg/stddev): 89471.7363/113.70
execution time (avg/stddev): 359.9489/0.00

Any examples? (optional)

Signed-off-by: tabokie <xy.tao@outlook.com>
@tabokie tabokie requested a review from zhangjinpeng1987 Oct 9, 2019
@tabokie tabokie added the C: Perf label Oct 9, 2019
Signed-off-by: tabokie <xy.tao@outlook.com>
@tabokie tabokie force-pushed the tabokie:batch-get branch from 38ae3d2 to 200993b Oct 9, 2019
cleanup
Signed-off-by: tabokie <xy.tao@outlook.com>
@tabokie tabokie force-pushed the tabokie:batch-get branch from 488332c to 4eb0c21 Oct 9, 2019
@tabokie

This comment has been minimized.

Copy link
Member Author

tabokie commented Oct 9, 2019

/run-all-tests/bench

@mahjonp

This comment has been minimized.

Copy link
Contributor

mahjonp commented Oct 9, 2019

/bench tidb=pr/12569

Err(e) => return future::err(e),
};
for get in gets {
callback(

This comment has been minimized.

Copy link
@zhangjinpeng1987

zhangjinpeng1987 Oct 9, 2019

Member

Can we only use one callback for these get requests to reduce the cost?

This comment has been minimized.

Copy link
@tabokie

tabokie Oct 9, 2019

Author Member

The update roughly achieves -50% gRPC poll CPU and +9% qps in oltp_point_select.

src/storage/mod.rs Outdated Show resolved Hide resolved
let minibatcher = MiniBatcher::new(tx.clone(), self.minibatch_wait_millis);
let stopped = Arc::new(AtomicBool::new(false));
let minibatcher = Arc::new(Mutex::new(minibatcher));
if self.minibatch_wait_millis > 0 {

This comment has been minimized.

Copy link
@zhangjinpeng1987
@BusyJay

This comment has been minimized.

Copy link
Contributor

BusyJay commented Oct 9, 2019

Can you add some comments to help reviewer better understand the code?

Signed-off-by: tabokie <xy.tao@outlook.com>
Copy link
Contributor

BusyJay left a comment

Why doing batch inside the kv service instead of storage?

src/server/service/kv.rs Outdated Show resolved Hide resolved
src/server/service/kv.rs Show resolved Hide resolved
@zhangjinpeng1987

This comment has been minimized.

Copy link
Member

zhangjinpeng1987 commented Oct 9, 2019

/bench tidb=pr/12569

@tabokie

This comment has been minimized.

Copy link
Member Author

tabokie commented Oct 9, 2019

Why doing batch inside the kv service instead of storage?

@BusyJay Currently the batch is collected within the lifetime of one batch_commands stream, and batch response depends on the liveness of this stream. To batch at a lower level (and to batch cross threads), each command must held their own reference to the message channel, and we have to make a thread-safe batcher, which I think is a bit costly. The downside is batch effect is too dependent on grpc connection (high connection leads to scattered batch).

Signed-off-by: tabokie <xy.tao@outlook.com>
@sre-bot

This comment has been minimized.

Copy link
Collaborator

sre-bot commented Oct 9, 2019

tidb.grpc-connection-count=1, batch.timeout=0ms

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: be6163c823285eb7bd048c8d0377c7b05dfd464e
+++ tidb: a387fbf23e26f8f0dc6b140489d04860bc7ba942
--- tikv: 00b0234fecbe97f53700b125ce1f5793ba88531b
+++ tikv: 4eb0c21f0c91f1eefad69bbcc6fc743e3c3bf7a3
pd: 4acaa8c715d40a07f521dd85ef7dcd4118064289
================================================================================
test-1: < oltp_point_select >
    * QPS : 64453.41 ± 2.2211% (std=921.42) delta: -18.37% (p=0.000)
    * AvgMs : 3.97 ± 2.2670% (std=0.06) delta: 22.46%
    * PercentileMs99 : 8.28 ± 0.0000% (std=0.00) delta: 24.14%
            
test-2: < oltp_read_write >
    * QPS : 25687.45 ± 0.8507% (std=150.98) delta: -32.83% (p=0.000)
    * AvgMs : 200.13 ± 0.8644% (std=1.19) delta: 48.86%
    * PercentileMs99 : 312.21 ± 1.0781% (std=2.75) delta: 21.03%
            
test-3: < oltp_insert >
    * QPS : 9036.30 ± 1.8150% (std=119.24) delta: -59.13% (p=0.000)
    * AvgMs : 28.33 ± 1.8002% (std=0.37) delta: 144.70%
    * PercentileMs99 : 46.91 ± 1.1938% (std=0.40) delta: 97.28%
            
test-4: < oltp_update_index >
    * QPS : 7239.76 ± 2.6838% (std=156.80) delta: -58.45% (p=0.000)
    * AvgMs : 35.32 ± 2.8537% (std=0.72) delta: 143.50%
    * PercentileMs99 : 48.78 ± 0.8919% (std=0.43) delta: 53.81%
            
test-5: < oltp_update_non_index >
    * QPS : 12779.03 ± 1.0768% (std=100.24) delta: -57.16% (p=0.000)
    * AvgMs : 20.03 ± 1.0815% (std=0.16) delta: 133.54%
    * PercentileMs99 : 29.19 ± 0.0000% (std=0.00) delta: 38.28%
            

https://perf.pingcap.com

@tabokie

This comment has been minimized.

Copy link
Member Author

tabokie commented Oct 9, 2019

/run-all-tests

@siddontang

This comment has been minimized.

Copy link
Contributor

siddontang commented Oct 9, 2019

oh, seem the perf reduced. @tabokie

Signed-off-by: tabokie <xy.tao@outlook.com>
@tabokie

This comment has been minimized.

Copy link
Member Author

tabokie commented Oct 9, 2019

oh, seem the perf reduced. @tabokie

@siddontang Don't sweat, this PR only batch get request, should focus on oltp_point_select. As for oltp_point_select, the batch timeout is set to default 0ms in that run, I don't expect a performance gain considering gRPC conn is set to 1.

@tabokie

This comment has been minimized.

Copy link
Member Author

tabokie commented Oct 9, 2019

/release make dist_release/bench tidb=pr/12569

@tabokie tabokie force-pushed the tabokie:batch-get branch 3 times, most recently from 2270d98 to f9b9640 Oct 24, 2019
Signed-off-by: tabokie <xy.tao@outlook.com>
@tabokie tabokie force-pushed the tabokie:batch-get branch from f9b9640 to d88c1ed Oct 24, 2019
Copy link
Contributor

Little-Wallace left a comment

LGTM. Maybe we could use incremental_get instead of get in batch commands, after Improve Coprocessor Table Lookup Performance was merged.

Copy link
Member

breeswish left a comment

The rest looks fine to me

src/server/service/kv.rs Outdated Show resolved Hide resolved
src/server/service/kv.rs Outdated Show resolved Hide resolved
src/server/service/kv.rs Outdated Show resolved Hide resolved
src/server/service/kv.rs Outdated Show resolved Hide resolved
src/server/service/kv.rs Outdated Show resolved Hide resolved
src/server/service/kv.rs Show resolved Hide resolved
src/server/service/kv.rs Show resolved Hide resolved
src/server/service/kv.rs Show resolved Hide resolved
src/server/service/kv.rs Outdated Show resolved Hide resolved
Signed-off-by: tabokie <xy.tao@outlook.com>
Copy link
Contributor

Little-Wallace left a comment

LGTM

zhangjinpeng1987 and others added 2 commits Oct 25, 2019
Signed-off-by: tabokie <xy.tao@outlook.com>
@tabokie tabokie requested a review from breeswish Oct 28, 2019
@tikv tikv deleted a comment from sre-bot Oct 28, 2019
minor fix
Signed-off-by: tabokie <xy.tao@outlook.com>
tabokie added 4 commits Oct 29, 2019
Signed-off-by: tabokie <xy.tao@outlook.com>
Signed-off-by: tabokie <xy.tao@outlook.com>
Signed-off-by: tabokie <xy.tao@outlook.com>
zhangjinpeng1987 and others added 2 commits Oct 29, 2019
Signed-off-by: tabokie <xy.tao@outlook.com>
Copy link
Member

zhangjinpeng1987 left a comment

LGTM

Copy link
Contributor

Little-Wallace left a comment

LGTM

@Little-Wallace Little-Wallace merged commit a9c1081 into tikv:master Oct 29, 2019
4 checks passed
4 checks passed
DCO All commits are signed off!
Details
idc-jenkins-ci-tikv/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-copr-test Jenkins job succeeded.
Details
idc-jenkins-ci/test Jenkins job succeeded.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants
You can’t perform that action at this time.