Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coprocessor: return Chunk encoded data #5461

Merged
merged 28 commits into from Sep 26, 2019

Conversation

@wshwsh12
Copy link
Contributor

commented Sep 14, 2019

What have you changed?

Return chunk-encoded data when the requested EncodeType is TypeArrow.

What is the type of the changes?

  • New feature (a change which adds functionality)
  • Improvement (a change which is an improvement to an existing feature)

How is the PR tested?

Unit test

Does this PR affect documentation (docs) or should it be mentioned in the release notes?

No,

Does this PR affect tidb-ansible?

No.

Refer to a related PR or issue link (optional)

pingcap/tidb#12023

Benchmark result if necessary (optional)

Simple select data. (TPCH 10g , table lineitem)

Type            Data                        Default format      Arrow format        Time delta      Speed delta
Decimal(15,2)   60000000 rows * 4 cols      27.81 sec           10.95 sec           -60%            +153%
Bigint(20)      60000000 rows * 4 cols      10.57 sec           6.56 sec            -40%            +60%
varchar(44)     300000000 rows * 1 col      41.08 sec           41.28 sec           0%              0%

When select data, the TiKV's CPU usage 400%(Default) changes into 800%(Arrow).
The memory usage has no obvious change.

Any examples? (optional)

@wshwsh12 wshwsh12 force-pushed the wshwsh12:arrow_encode branch 2 times, most recently from 8d3545d to e58d501 Sep 14, 2019
@breeswish breeswish added C: Copr and removed C: Server labels Sep 16, 2019
Copy link
Member

left a comment

Thanks!

@@ -404,7 +419,14 @@ impl<SS: 'static> BatchExecutorsRunner<SS> {
.collect_exec_stats(&mut self.exec_stats);

let mut sel_resp = SelectResponse::default();
sel_resp.set_chunks(chunks.into());
match self.config.encode_type {

This comment has been minimized.

Copy link
@breeswish

breeswish Sep 16, 2019

Member

Why not use the same field? It is byte array anyway and it should be suitable for both formats.

This comment has been minimized.

Copy link
@wshwsh12

wshwsh12 Sep 17, 2019

Author Contributor

Now use the different fields can also check the EncodeType. (One of the fields is empty.) If the TiKV is not support the feature, It can also use the right DecodeType in TiDB.
In fact, it can use the same field, but need a flag to get the EncodeType. In future, i will change the protobuf(add a flag and remove the field row_batch_data), and change the code to use the same field.

This comment has been minimized.

Copy link
@breeswish

breeswish Sep 22, 2019

Member

Is row_batch_data a newly added field? Then maybe we can just add a response_format instead of row_batch_data?

This comment has been minimized.

Copy link
@wshwsh12

wshwsh12 Sep 22, 2019

Author Contributor

Yes.

components/tidb_query/src/codec/batch/lazy_column.rs Outdated Show resolved Hide resolved
components/tidb_query/src/codec/batch/lazy_column.rs Outdated Show resolved Hide resolved
components/tidb_query/src/codec/batch/lazy_column.rs Outdated Show resolved Hide resolved
components/tidb_query/src/codec/chunk/column.rs Outdated Show resolved Hide resolved
components/tidb_query/src/codec/data_type/vector.rs Outdated Show resolved Hide resolved
@@ -906,42 +906,47 @@ impl<T: std::io::Write> TimeEncoder for T {}
pub trait TimeEncoder: NumberEncoder {
fn encode_time(&mut self, v: &Time) -> Result<()> {
if !v.is_zero() {
self.encode_u16(v.time.year() as u16)?;
self.encode_u32_le(v.time.hour() as u32)?;

This comment has been minimized.

Copy link
@breeswish
@breeswish

This comment has been minimized.

Copy link
Member

commented Sep 16, 2019

Additionally have you run the integration tests and benchmarks?

wshwsh12 added 3 commits Sep 11, 2019
Signed-off-by: wshwsh12 <793703860@qq.com>
t
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
@wshwsh12 wshwsh12 force-pushed the wshwsh12:arrow_encode branch from c2df6ac to 2a69603 Sep 16, 2019
wshwsh12 added 6 commits Sep 17, 2019
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
@wshwsh12

This comment has been minimized.

Copy link
Contributor Author

commented Sep 17, 2019

@breeswish Update. PTAL again.

@wshwsh12 wshwsh12 requested a review from breeswish Sep 17, 2019
@breeswish

This comment has been minimized.

Copy link
Member

commented Sep 22, 2019

This PR changes a lot of codec part. Could you run integration test against a TiDB PR that enables chunk format by default? So that we can know the correctness of this PR.

Copy link
Member

left a comment

Could you also make some benchmarks? I guess this PR will cause regressions due to decoding to Datum first, which is a known drawback and is accepted, but I want to know how much regressions there will be.

components/tidb_query/src/codec/mysql/time/mod.rs Outdated Show resolved Hide resolved
);
let nanoseconds = 1000 * self.read_u32()?;
let _ = self.read_u16();

This comment has been minimized.

Copy link
@breeswish

breeswish Sep 22, 2019

Member

Why there are fields ignored? What is this field?

This comment has been minimized.

Copy link
@sticnarf

sticnarf Sep 26, 2019

Contributor

It's a padding in Go's struct. Maybe better to have a comment.

components/tidb_query/src/expr/ctx.rs Outdated Show resolved Hide resolved
let year = i32::from(self.read_u16()?);
let buf = self.read_bytes(5)?;
let (month, day, hour, minute, second) = (
let hour = self.read_u32_le()?;

This comment has been minimized.

Copy link
@breeswish

breeswish Sep 22, 2019

Member

Could you provide a reference of TiDB's implementation so that we can cross verify this logic?

@wshwsh12

This comment has been minimized.

Copy link
Contributor Author

commented Sep 22, 2019

This PR changes a lot of codec part. Could you run integration test against a TiDB PR that enables chunk format by default? So that we can know the correctness of this PR.

Of course. TiDB PR: pingcap/tidb#12023 And it passed all integration test.

Signed-off-by: wshwsh12 <793703860@qq.com>
@breeswish

This comment has been minimized.

Copy link
Member

commented Sep 23, 2019

Please resolve #5461 (comment) and provide a benchmark. The rest LGTM.

wshwsh12 added 3 commits Sep 23, 2019
Signed-off-by: wshwsh12 <793703860@qq.com>
fix ci
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
wshwsh12 added 3 commits Sep 25, 2019
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
@wshwsh12 wshwsh12 requested a review from sticnarf Sep 25, 2019
Signed-off-by: wshwsh12 <793703860@qq.com>
col.append_null().unwrap();
}
Some(val) => {
col.append_f64(f64::from(*val)).unwrap();

This comment has been minimized.

Copy link
@sticnarf

sticnarf Sep 26, 2019

Contributor

Don't we need to check field type to see if this is f32 or f64 here?

This comment has been minimized.

Copy link
@wshwsh12

wshwsh12 Sep 26, 2019

Author Contributor

the check is in the function append_f64

This comment has been minimized.

Copy link
@sticnarf

sticnarf Sep 26, 2019

Contributor

Ah, I see. But like the previous comment, I think we can also move the branch out of loop

Copy link
Contributor

left a comment

the rest lgtm

wshwsh12 added 3 commits Sep 26, 2019
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
@wshwsh12 wshwsh12 requested a review from sticnarf Sep 26, 2019
fix typo
Signed-off-by: wshwsh12 <793703860@qq.com>
@wshwsh12 wshwsh12 requested review from sticnarf and breeswish Sep 26, 2019
Copy link
Contributor

left a comment

Great work

let field_type = &schema[offset];
fields.push(field_type.clone());
}
let mut chunk = Chunk::new(&fields, logical_rows.as_ref().len());

This comment has been minimized.

Copy link
@breeswish

breeswish Sep 26, 2019

Member

TODO: We can avoid cloning the field type in next PR.

@breeswish

This comment has been minimized.

Copy link
Member

commented Sep 26, 2019

/run-all-tests tidb=pr/12023

@breeswish

This comment has been minimized.

Copy link
Member

commented Sep 26, 2019

/run-all-tests tidb-test=pr/898

@breeswish

This comment has been minimized.

Copy link
Member

commented Sep 26, 2019

/merge

@sre-bot sre-bot added the S: CanMerge label Sep 26, 2019
@sre-bot

This comment has been minimized.

Copy link
Collaborator

commented Sep 26, 2019

Your auto merge job has been accepted, waiting for 5546

@sre-bot

This comment has been minimized.

Copy link
Collaborator

commented Sep 26, 2019

/run-all-tests

@sre-bot sre-bot merged commit 5548b4e into tikv:master Sep 26, 2019
6 checks passed
6 checks passed
DCO All commits are signed off!
Details
idc-jenkins-ci-tikv/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-cop-push-down-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci/test Jenkins job succeeded.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.