Extract LazyBatchColumn #4211
Signed-off-by: Breezewish <breezewish@pingcap.com>
…tch_column_1 Signed-off-by: Breezewish <breezewish@pingcap.com>
…batch_column_2 Signed-off-by: Breezewish <breezewish@pingcap.com>
The rest LGTM.
Could we use Chunk (ported from TiDB) directly, like TiDB did? Do you have any performance results that compare Chunk and LazyBatchColumn?
pub enum LazyBatchColumn {
    /// Ensure that small datum values (i.e. Int, Real, Time) are stored compactly.
    /// Notice that there is an extra 1 byte for datum to store the flag, so there are 9 bytes.
    Raw(Vec<SmallVec<[u8; 9]>>),
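To make the lazy idea behind this enum concrete, here is a minimal sketch of a column that keeps raw bytes until a decoded view is requested. Everything in it is an assumption made for illustration: the `LazyColumn` type, the `Decoded` variant, the `encode`/`decode` helpers, and the one-byte flag values are hypothetical and are not taken from this PR. It uses plain `Vec<u8>` where the PR uses `SmallVec`, and only handles integers.

```rust
/// Hypothetical, simplified lazy column (not the PR's actual type).
#[derive(Debug, PartialEq)]
enum LazyColumn {
    /// Undecoded datums: 1 flag byte followed by an optional payload.
    Raw(Vec<Vec<u8>>),
    /// Values decoded on demand.
    Decoded(Vec<Option<i64>>),
}

// Illustrative flag values only; the real datum flags may differ.
const FLAG_NULL: u8 = 0;
const FLAG_INT: u8 = 1;

/// Encode a nullable integer as flag byte + 8 payload bytes (9 bytes total).
fn encode(v: Option<i64>) -> Vec<u8> {
    match v {
        None => vec![FLAG_NULL],
        Some(i) => {
            let mut out = vec![FLAG_INT];
            out.extend_from_slice(&i.to_be_bytes());
            out
        }
    }
}

/// Decode one datum produced by `encode`.
fn decode(raw: &[u8]) -> Result<Option<i64>, String> {
    match raw.split_first() {
        Some((&FLAG_NULL, _)) => Ok(None),
        Some((&FLAG_INT, rest)) if rest.len() == 8 => {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(rest);
            Ok(Some(i64::from_be_bytes(bytes)))
        }
        _ => Err(format!("malformed datum: {:?}", raw)),
    }
}

impl LazyColumn {
    /// Decode in place if the column is still raw; otherwise do nothing.
    fn ensure_decoded(&mut self) -> Result<(), String> {
        if let LazyColumn::Raw(rows) = self {
            let decoded = rows
                .iter()
                .map(|r| decode(r))
                .collect::<Result<Vec<_>, _>>()?;
            *self = LazyColumn::Decoded(decoded);
        }
        Ok(())
    }

    fn is_raw(&self) -> bool {
        matches!(self, LazyColumn::Raw(_))
    }
}

fn main() {
    let mut col = LazyColumn::Raw(vec![encode(None), encode(Some(-1)), encode(Some(12))]);
    // A column that is only passed through can stay raw and is never decoded.
    assert!(col.is_raw());
    // A column that an expression reads gets decoded once, then reused.
    col.ensure_decoded().unwrap();
    assert_eq!(col, LazyColumn::Decoded(vec![None, Some(-1), Some(12)]));
}
```

A column that is returned without being evaluated never pays for `ensure_decoded`, which is the case the `Raw` variant is meant to serve.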
Will Raw store big datum values like JSON? What if we need to return a JSON column without decoding it?
> Will Raw store big datum values like json?
Yes. Raw can store anything that doesn't need to be decoded during the process. SmallVec is just a Vec that stores small values inline but also supports storing large values on the heap. So it can hold data of any length (just like a normal Vec), but it is far more efficient than Vec when storing small values.
> What if need returns a JSON column while needn't decode it?
It won't be decoded, as answered above.
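The inline-or-heap behavior described above can be sketched with the standard library alone. The `InlineOrHeap` type below is a hypothetical stand-in written for illustration; it is not the `smallvec` crate's implementation, and the byte values are placeholders rather than real datum encodings.

```rust
/// Hypothetical stand-in for `SmallVec<[u8; 9]>`: values of up to 9 bytes
/// live inside the enum itself, longer values are stored on the heap.
#[derive(Debug, Clone, PartialEq)]
enum InlineOrHeap {
    Inline { len: u8, buf: [u8; 9] },
    Heap(Vec<u8>),
}

impl InlineOrHeap {
    const INLINE_CAP: usize = 9;

    fn from_slice(data: &[u8]) -> Self {
        if data.len() <= Self::INLINE_CAP {
            // Small value: copy into the fixed buffer, no allocation.
            let mut buf = [0u8; 9];
            buf[..data.len()].copy_from_slice(data);
            InlineOrHeap::Inline { len: data.len() as u8, buf }
        } else {
            // Large value: fall back to a normal heap-allocated Vec.
            InlineOrHeap::Heap(data.to_vec())
        }
    }

    fn as_slice(&self) -> &[u8] {
        match self {
            InlineOrHeap::Inline { len, buf } => &buf[..*len as usize],
            InlineOrHeap::Heap(v) => v.as_slice(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, InlineOrHeap::Inline { .. })
    }
}

fn main() {
    // 1 placeholder flag byte + 8 payload bytes fits inline.
    let small = InlineOrHeap::from_slice(&[1, 0, 0, 0, 0, 0, 0, 0, 12]);
    // A longer value, such as a JSON document, spills to the heap.
    let large = InlineOrHeap::from_slice(br#"{"key": "a longer JSON document"}"#);
    assert!(small.is_inline());
    assert!(!large.is_inline());
    println!("{} bytes inline, {} bytes on heap", small.as_slice().len(), large.as_slice().len());
}
```

Both variants expose the same `as_slice` view, so callers do not need to know where the bytes live.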
The performance difference is so obvious that you don't even need a benchmark to see it. Accessing an element in BatchColumn is a single memory access. Accessing an element in Chunk costs much more. Additionally, for DateTime-like types, every access pays a deserialization cost, which makes it much worse.
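As a rough illustration of the two access patterns compared here, the sketch below contrasts indexing an already-decoded slice with reading from a byte-oriented column that must check a null bitmap and rebuild the value on every access. The `ByteColumn` type and its layout are assumptions made up for this example; they do not reproduce the real Chunk format, and the sketch makes no timing claims.

```rust
/// Hypothetical byte-oriented column: a null bitmap plus fixed-width
/// little-endian values. This is NOT the real Chunk layout.
struct ByteColumn {
    null_bitmap: Vec<u8>,
    data: Vec<u8>,
    len: usize,
}

impl ByteColumn {
    fn new() -> Self {
        ByteColumn { null_bitmap: Vec::new(), data: Vec::new(), len: 0 }
    }

    fn push(&mut self, v: Option<i64>) {
        let idx = self.len;
        if idx % 8 == 0 {
            self.null_bitmap.push(0);
        }
        if v.is_some() {
            // A set bit marks a non-null value.
            self.null_bitmap[idx / 8] |= 1u8 << (idx % 8);
        }
        self.data.extend_from_slice(&v.unwrap_or(0).to_le_bytes());
        self.len += 1;
    }

    /// Each access checks the bitmap, slices the buffer and rebuilds the i64.
    fn get(&self, idx: usize) -> Option<i64> {
        if self.null_bitmap[idx / 8] & (1u8 << (idx % 8)) == 0 {
            return None;
        }
        let start = idx * 8;
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&self.data[start..start + 8]);
        Some(i64::from_le_bytes(bytes))
    }
}

fn main() {
    let values = [None, Some(-1), Some(12), Some(1024)];

    // Decoded column: values are stored in their final form.
    let decoded: Vec<Option<i64>> = values.to_vec();

    // Byte-oriented column: values are re-materialized on every read.
    let mut bytes = ByteColumn::new();
    for v in values.iter() {
        bytes.push(*v);
    }

    for i in 0..values.len() {
        // Same result, but the left side is a plain indexed load.
        assert_eq!(decoded[i], bytes.get(i));
    }
}
```

For wider or more structured types, the work done inside `get` grows, while the decoded slice access stays the same.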
@AndreMouche Here is the benchmark result:
test coprocessor::codec::chunk::column::tests::bench_chunk_batch_column_1 ... bench: 28 ns/iter (+/- 10)
There is simply a 2800% performance difference, even when no deserialization is involved.
#[bench]
fn bench_chunk_batch_column_1(b: &mut test::Bencher) {
    let data = vec![
        Datum::Null,
        Datum::I64(-1),
        Datum::I64(12),
        Datum::I64(1024),
    ];
    let field = field_type(FieldTypeTp::Long);
    let mut column = Column::new(&field, data.len());
    for v in &data {
        column.append_datum(v).unwrap();
    }
    // Measure reading one datum back out of the chunk column.
    b.iter(|| {
        let result =
            test::black_box(&column).get_datum(test::black_box(1), test::black_box(&field));
        result.unwrap();
    });
}
#[bench]
fn bench_chunk_batch_column_2(b: &mut test::Bencher) {
    let mut column = crate::coprocessor::codec::batch::VectorValue::with_capacity(
        4,
        cop_datatype::EvalType::Int,
    );
    column.push_int(None);
    column.push_int(Some(-1));
    column.push_int(Some(12));
    column.push_int(Some(1024));
    let slice = column.as_int_slice();
    // Measure reading one value from the decoded slice.
    b.iter(|| {
        let val = test::black_box(&slice)[test::black_box(1)];
        test::black_box(val);
    });
}
LGTM
This reverts commit 522b066.
I reviewed the bench code again, and I think the comparison is unfair. We should adjust the code like the following:
code
What have you changed? (mandatory)
This PR extracts LazyBatchColumn from rows.rs to lazy_column.rs. Extracted from #3898, based on #4208 and #4209.
What is the type of the changes? (mandatory)