Extract LazyBatchColumn #4211

Merged
merged 7 commits into from Feb 17, 2019


breezewish
Member

@breezewish breezewish commented Feb 14, 2019

What have you changed? (mandatory)

This PR extracts LazyBatchColumn from rows.rs to lazy_column.rs.

Extracted from #3898, based on #4208 and #4209.

What is the type of the changes? (mandatory)

  • Engineering

Signed-off-by: Breezewish <breezewish@pingcap.com>
Signed-off-by: Breezewish <breezewish@pingcap.com>
Signed-off-by: Breezewish <breezewish@pingcap.com>
breezewish and others added 3 commits February 15, 2019 12:15
Member

@rleungx rleungx left a comment

The rest LGTM.

src/coprocessor/codec/batch/lazy_column.rs
Member

@AndreMouche AndreMouche left a comment

Could we use Chunk (ported from TiDB) directly, like TiDB did? Do you have any performance results comparing Chunk and LazyBatchColumn?

    pub enum LazyBatchColumn {
        /// Ensure that small datum values (i.e. Int, Real, Time) are stored compactly.
        /// Notice that there is an extra 1 byte for datum to store the flag, so there are 9 bytes.
        Raw(Vec<SmallVec<[u8; 9]>>),
Member

Will Raw store large datum values such as JSON? What if we need to return a JSON column without decoding it?

Member Author

Will Raw store large datum values such as JSON?

Yes. Raw can store anything that doesn't need to be decoded during processing. SmallVec is just a Vec that stores small values inline but also supports storing large values on the heap. So it can hold data of any length (just like a normal Vec), but is much more efficient than Vec when storing small values.

What if we need to return a JSON column without decoding it?

It won't be decoded, as answered above.
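
To make the inline-versus-heap idea concrete, here is a minimal sketch using only the standard library. `InlineBytes` and `INLINE_CAP` are hypothetical names invented for this illustration; this is not the `smallvec` crate's implementation, only the general technique of keeping short values inside the container and spilling longer ones to the heap.

```rust
/// Hypothetical inline capacity, chosen to match the 9 bytes mentioned above
/// (1 flag byte + 8 payload bytes).
const INLINE_CAP: usize = 9;

/// Hypothetical small-buffer container, for illustration only.
/// Values of up to `INLINE_CAP` bytes live inside the enum itself;
/// longer values fall back to a heap-allocated `Vec<u8>`.
#[derive(Debug, Clone)]
pub enum InlineBytes {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Vec<u8>),
}

impl InlineBytes {
    /// Copies `data` into inline storage when it fits, otherwise onto the heap.
    pub fn from_slice(data: &[u8]) -> Self {
        if data.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..data.len()].copy_from_slice(data);
            InlineBytes::Inline { len: data.len() as u8, buf }
        } else {
            InlineBytes::Heap(data.to_vec())
        }
    }

    /// Returns the stored bytes regardless of where they live.
    pub fn as_slice(&self) -> &[u8] {
        match self {
            InlineBytes::Inline { len, buf } => &buf[..*len as usize],
            InlineBytes::Heap(v) => v.as_slice(),
        }
    }

    /// True when the value is stored without a heap allocation.
    pub fn is_inline(&self) -> bool {
        matches!(self, InlineBytes::Inline { .. })
    }
}
```

Under these assumptions, a 9-byte encoded integer stays inline, while a long encoded JSON value goes to the heap and is still returned as raw bytes without being decoded.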

@breezewish
Member Author

Could we use Chunk (ported from TiDB) directly, like TiDB did? Do you have any performance results comparing Chunk and LazyBatchColumn?

The performance difference is so obvious that you don't even need a benchmark to see it. Accessing an element in BatchColumn is a single memory access. Accessing an element in a Chunk costs much more. Additionally, for DateTime-like types there is a deserialization cost on every access, which makes it much worse.
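
As a rough illustration of that argument, the sketch below contrasts reading from an already-typed slice with reading from a byte-oriented column that has to locate and decode the value on each access. `ByteColumn` and `typed_get` are hypothetical and do not reproduce the actual layout of Chunk or of the batch column types; they only show the kind of extra per-access work being described.

```rust
/// Hypothetical byte-oriented column: every value occupies 8 little-endian
/// bytes in `data`, and `nulls[i]` records whether row `i` is NULL.
pub struct ByteColumn {
    nulls: Vec<bool>,
    data: Vec<u8>,
}

impl ByteColumn {
    pub fn new() -> Self {
        ByteColumn { nulls: Vec::new(), data: Vec::new() }
    }

    /// Appends a value, encoding it into the byte buffer.
    pub fn push(&mut self, v: Option<i64>) {
        self.nulls.push(v.is_none());
        self.data.extend_from_slice(&v.unwrap_or(0).to_le_bytes());
    }

    /// Each read checks the null flag, computes an offset, copies 8 bytes
    /// and decodes them.
    pub fn get_i64(&self, idx: usize) -> Option<i64> {
        if self.nulls[idx] {
            return None;
        }
        let start = idx * 8;
        let mut raw = [0u8; 8];
        raw.copy_from_slice(&self.data[start..start + 8]);
        Some(i64::from_le_bytes(raw))
    }
}

/// Reading from an already-decoded, typed slice is a plain indexed load.
pub fn typed_get(column: &[Option<i64>], idx: usize) -> Option<i64> {
    column[idx]
}
```

For fixed-width integers the decoding step is cheap; the gap described above would be expected to grow for types whose decoding is more involved, such as the DateTime-like types mentioned.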

@breezewish
Member Author

breezewish commented Feb 15, 2019

@AndreMouche Here is the benchmark result:

test coprocessor::codec::chunk::column::tests::bench_chunk_batch_column_1 ... bench: 28 ns/iter (+/- 10)
test coprocessor::codec::chunk::column::tests::bench_chunk_batch_column_2 ... bench: 1 ns/iter (+/- 0)

That is roughly a 28x performance difference (28 ns vs. 1 ns per iteration), even when no deserialization is involved.

    #[bench]
    fn bench_chunk_batch_column_1(b: &mut test::Bencher) {
        let data = vec![
            Datum::Null,
            Datum::I64(-1),
            Datum::I64(12),
            Datum::I64(1024),
        ];
        let field = field_type(FieldTypeTp::Long);
        let mut column = Column::new(&field, data.len());
        for v in &data {
            column.append_datum(v).unwrap();
        }
        b.iter(|| {
            let result =
                test::black_box(&column).get_datum(test::black_box(1), test::black_box(&field));
            result.unwrap();
        });
    }

    #[bench]
    fn bench_chunk_batch_column_2(b: &mut test::Bencher) {
        let mut column = crate::coprocessor::codec::batch::VectorValue::with_capacity(
            4,
            cop_datatype::EvalType::Int,
        );

        column.push_int(None);
        column.push_int(Some(-1));
        column.push_int(Some(12));
        column.push_int(Some(1024));

        let slice = column.as_int_slice();

        b.iter(|| {
            let val = test::black_box(&slice)[test::black_box(1)];
            test::black_box(val);
        });
    }

Member

@AndreMouche AndreMouche left a comment

LGTM

@breezewish breezewish merged commit 522b066 into tikv:master Feb 17, 2019
@breezewish breezewish deleted the ___batch_extract/batch_column_2 branch February 17, 2019 15:32
AndreMouche added a commit that referenced this pull request Feb 18, 2019
@AndreMouche
Member

I reviewed the bench code again, and I think the comparison is unfair. We should adjust the code as follows:
Result

running 2 tests
test coprocessor::codec::batch::lazy_column::benches::bench_chunk_batch_column_2                       ... bench:           1 ns/iter (+/- 0)
test coprocessor::codec::chunk::column::tests::bench_chunk_batch_column_1                              ... bench:           1 ns/iter (+/- 0)

code

    #[bench]
    fn bench_chunk_batch_column_1(b: &mut test::Bencher) {
        let data = vec![
            Datum::Null,
            Datum::I64(-1),
            Datum::I64(12),
            Datum::I64(1024),
        ];
        let field = field_type(FieldTypeTp::Long);
        let mut column = Column::new(&field, data.len());
        for v in &data {
            column.append_datum(v).unwrap();
        }
        b.iter(|| {
            // Previous code: test::black_box(&column).get_datum(test::black_box(1), test::black_box(&field))
            let result = test::black_box(&column).get_i64(1);
            result.unwrap();
        });
    }

    #[bench]
    fn bench_chunk_batch_column_2(b: &mut test::Bencher) {
        let mut column = crate::coprocessor::codec::batch::VectorValue::with_capacity(
            4,
            cop_datatype::EvalType::Int,
        );

        column.push_int(None);
        column.push_int(Some(-1));
        column.push_int(Some(12));
        column.push_int(Some(1024));

        b.iter(|| {
            let slice = column.as_int_slice();
            let val = test::black_box(&slice)[test::black_box(1)];
            test::black_box(val);
        });
    }
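
The benchmarks in this thread use the nightly-only `test::Bencher`. For readers who want to repeat a similar comparison on stable Rust, a very rough timing helper can be sketched with `std::hint::black_box` and `std::time::Instant`. `time_iters` is a hypothetical helper, not part of the TiKV code base, and it lacks the warm-up and statistics of a real benchmark harness, so its numbers are only indicative.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `f` exactly `iters` times and returns the total
/// elapsed wall-clock time. Each result is passed through `black_box` so the
/// optimizer cannot discard the work being measured.
pub fn time_iters<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}
```

Dividing the returned duration by `iters` gives an approximate per-iteration cost. As the two versions of the benchmark above show, what is placed inside the timed closure (for example, whether the slice is obtained inside or outside the loop, and which accessor is called) determines what is actually being compared.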

dcalvin pushed a commit to dcalvin/tikv that referenced this pull request Feb 22, 2019
Signed-off-by: Breezewish <breezewish@pingcap.com>
sticnarf pushed a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019
Signed-off-by: Breezewish <breezewish@pingcap.com>