Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Vectorized Evaluation framework #4322

Merged
merged 7 commits into from Apr 2, 2019

Conversation

Projects
None yet
4 participants
@breeswish
Copy link
Member

commented Mar 6, 2019

What have you changed? (mandatory)

This PR adds a vectorized evaluation framework, which is able to evaluate expressions column by column. Also this PR provides some dummy functions to show how function signatures can be written in this framework.

This is extracted from #3898.

What are the type of the changes? (mandatory)

  • New feature (change which adds functionality)

How has this PR been tested? (mandatory)

  • TODO: Add unit test for the framework

@breeswish breeswish added the C: Copr label Mar 6, 2019

Re-extract from upstream
Signed-off-by: Breezewish <breezewish@pingcap.com>

@breeswish breeswish force-pushed the breeswish:____batch_extract/t3/3 branch from 834b3f4 to 8020dd9 Mar 25, 2019

// Maybe we can remove this field in future.
field_type: FieldType,
},
ColumnRef { offset: usize },

This comment has been minimized.

Copy link
@breeswish

breeswish Mar 25, 2019

Author Member

A note for reviewers: We now pass FieldType in schema, so we don't need to store field_type any more.

Split single test function into multiples
Signed-off-by: Breezewish <breezewish@pingcap.com>
assert_eq!(rows, v.len());
for i in 0..rows {
result.push(f(context, payload, &v[i])?);
}

This comment has been minimized.

Copy link
@kennytm

kennytm Mar 25, 2019

Contributor
for value in v {
    result.push(f(context, payload, value)?);
}

This comment has been minimized.

Copy link
@breeswish

breeswish Mar 26, 2019

Author Member

Hmmm. I think this kind of style (including your comment in other unary / binary function) is not as easy to understand as current ones, because they use different syntax for functions serve as the same purpose? While if we access it using index, the syntax looks to be very unified.

This comment has been minimized.

Copy link
@kennytm

kennytm Mar 26, 2019

Contributor

I don't see what this syntax is unifying with, could you elaborate? The for loop with iterator (including .zip) is used quite a lot within TiKV, so I don't think it is really harder to understand.

This comment has been minimized.

Copy link
@breeswish

breeswish Mar 26, 2019

Author Member

The syntax now looks like:

unary function scalar value: for _ in ... { ... f(arg0) }
unary function vector value: for i in ... { ... f(arg0[i]) }
binary function scalar scalar value: for _ in ... { ... f(arg0, arg1) }
binary function scalar vector value: for i in ... { ... f(arg0, arg1[i]) }
binary function vector scalar value: for i in ... { ... f(arg0[i], arg1) }
binary function vector vector value: for i in ... { ... f(arg0[i], arg1[i]) }

which looks pretty much the same and is quite easy to understand the all by only looking at one.

This comment has been minimized.

Copy link
@rleungx

This comment has been minimized.

Copy link
@breeswish

breeswish Mar 27, 2019

Author Member

Actually we disabled this clippy rule for the reason that this rule does not always produce more readable code :)

let rhs = Arg1::borrow_vector_value(rhs);
for i in 0..rows {
result.push(f(context, payload, lhs, &rhs[i])?);
}

This comment has been minimized.

Copy link
@kennytm

kennytm Mar 25, 2019

Contributor

(ditto)

let rhs = Arg1::borrow_scalar_value(rhs);
for i in 0..rows {
result.push(f(context, payload, &lhs[i], rhs)?);
}

This comment has been minimized.

Copy link
@kennytm

kennytm Mar 25, 2019

Contributor

(ditto)

let rhs = Arg1::borrow_vector_value(rhs);
for i in 0..rows {
result.push(f(context, payload, &lhs[i], &rhs[i])?);
}

This comment has been minimized.

Copy link
@kennytm

kennytm Mar 25, 2019

Contributor
for (left, right) in lhs.iter().zip(rhs) {
fn as_ref(&self) -> &VectorValue {
match self {
RpnStackNodeVectorValue::Owned(ref value) => &value,
RpnStackNodeVectorValue::Ref(ref value) => *value,

This comment has been minimized.

Copy link
@kennytm

kennytm Mar 25, 2019

Contributor

Why are we still using the ref keyword 🤔

Show resolved Hide resolved src/coprocessor/dag/rpn_expr/function.rs
assert_eq!(rows, v.len());
for i in 0..rows {
result.push(f(context, payload, &v[i])?);
}

This comment has been minimized.

Copy link
@rleungx

breeswish added some commits Apr 1, 2019

@AndreMouche
Copy link
Member

left a comment

Could we split this PR into two PRs? one for expr_eval.rs one for functions?

Show resolved Hide resolved src/coprocessor/dag/rpn_expr/function.rs
Show resolved Hide resolved src/coprocessor/dag/rpn_expr/types/expr_eval.rs Outdated
},
{
let mut col = LazyBatchColumn::decoded_with_capacity_and_tp(5, EvalType::Int);
col.mut_decoded().push_int(Some(1));

This comment has been minimized.

Copy link
@AndreMouche

AndreMouche Apr 2, 2019

Member

Could we use the loop to make the code more clear?

This comment has been minimized.

Copy link
@breeswish

breeswish Apr 2, 2019

Author Member

What kind of loop are you expecting? Looks like current code is already the most readable one. No branches, no checks, no states.

This comment has been minimized.

Copy link
@AndreMouche

AndreMouche Apr 2, 2019

Member

How about

    for v1,v2 in data{
             col1.mut_decoded().push_real(v1);
             col2.mut_decoded().push_int(v2)
     }

And when we want to add more rows one day, just append in data?

This comment has been minimized.

Copy link
@breeswish

breeswish Apr 2, 2019

Author Member

According to my understanding, the real problem is that we don't have a utility function to push multiple values in VectorValue? If we have such function, then we achieve both simplicity and DRY. Let's do it later because we will have one when aggregation function is ready.

Split tests
Signed-off-by: Breezewish <breezewish@pingcap.com>
@breeswish

This comment has been minimized.

Copy link
Member Author

commented Apr 2, 2019

/run-integration-tests

@AndreMouche
Copy link
Member

left a comment

LGTM

@breeswish

This comment has been minimized.

Copy link
Member Author

commented Apr 2, 2019

/run-integration-tests

@rleungx

rleungx approved these changes Apr 2, 2019

];

// Col0, FnB, Col1, Const0, FnD, Const1, FnC, FnA

This comment has been minimized.

Copy link
@rleungx

rleungx Apr 2, 2019

Member

remove this line?

#[test]
fn test_eval_fail_2() {
/// FnFoo(v) performs v * 2.

This comment has been minimized.

Copy link
@rleungx

rleungx Apr 2, 2019

Member

ditto

}

// FnFoo only accepts 1 parameter but we will give 2.

This comment has been minimized.

Copy link
@rleungx

rleungx Apr 2, 2019

Member

ditto

#[test]
fn test_eval_fail_3() {
// Expects real argument, receives int argument.

This comment has been minimized.

Copy link
@rleungx

rleungx Apr 2, 2019

Member

ditto

@breeswish breeswish merged commit 03108b5 into tikv:master Apr 2, 2019

4 of 5 checks passed

idc-jenkins-ci-tikv/integration-common-test Jenkins job failed
Details
DCO All commits are signed off!
Details
idc-jenkins-ci-tikv/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci/test Jenkins job succeeded.
Details

@breeswish breeswish deleted the breeswish:____batch_extract/t3/3 branch Apr 2, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.