-
Notifications
You must be signed in to change notification settings - Fork 3.7k
GH-46708: [C++][Gandiva] Added zero return values for castDECIMAL_utf8 #46709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
…AL_utf8 if in value can not be parsed
|
||
int num_records = 1; | ||
auto invalid_in = MakeArrowArrayUtf8({"1.345"}, {true}); | ||
auto in_batch_1 = arrow::RecordBatch::Make(schema, num_records, {invalid_in}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto in_batch_1 = arrow::RecordBatch::Make(schema, num_records, {invalid_in}); | |
ASSERT_OK_AND_ASSIGN(auto in_batch_1, arrow::RecordBatch::Make(schema, num_records, {invalid_in})); |
BTW, why do we need _1
suffix here? It seems that there is only one record batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_1 was just copy paste, fixed.
arrow::RecordBatch::Make does not return Result<> so ASSERT_OK_AND_ASSIGN should not work, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry.
auto int_literal = TreeExprBuilder::MakeLiteral((int32_t)100); | ||
auto int_literal_multiply = TreeExprBuilder::MakeLiteral((int32_t)10); | ||
auto string_literal = TreeExprBuilder::MakeStringLiteral("foo"); | ||
auto cast_multiply_literal = | ||
TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal_multiply}, decimal_type_10_0); | ||
auto cast_int_literal = | ||
TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal}, decimal_type_38_30); | ||
auto cast_string_func = | ||
TreeExprBuilder::MakeFunction("castDECIMAL", {string_literal}, decimal_type_38_30); | ||
auto multiply_func = | ||
TreeExprBuilder::MakeFunction("multiply", {cast_multiply_literal, cast_int_literal}, decimal_type_38_27); | ||
auto equal_func = TreeExprBuilder::MakeFunction("equal", {multiply_func, cast_string_func}, arrow::boolean()); | ||
auto expr = TreeExprBuilder::MakeExpression(equal_func, res_bool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not really repro sigsegv without that complex expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we don't need to reproduce SIGSEGV here.
Can we check only that out_high
/out_low
are set on error instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Let me try that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like if function failed to parse data, corresponding element will not be set in output array.
@kou can you suggest how do I test that castDecimal returned 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange. Fix works on my machine. Also test was green in this PR and also tried in docker x86 env.
[ RUN ] TestDecimal.TestCastDecimalVarCharInvalidInputInvalidOutput
/arrow2/cpp/src/gandiva/cache.cc:58: Creating gandiva cache with capacity of 5000
/arrow2/cpp/src/gandiva/engine.cc:282: Detected CPU Name : znver2
/arrow2/cpp/src/gandiva/engine.cc:283: Detected CPU Features: [ +prfchw -cldemote +avx +aes +sahf +pclmul -xop +crc32 +xsaves -avx512fp16 -sm4 +sse4.1 -avx512ifma +xsave -avx512pf +sse4.2 -tsxldtrk -ptwrite -widekl -sm3 -invpcid +64bit +xsavec -avx512vpopcntdq +cmov -avx512vp2intersect -avx512cd +movbe -avxvnniint8 -avx512er -amx-int8 -kl -sha512 -avxvnni -rtm +adx +avx2 -hreset -movdiri -serialize -vpclmulqdq -avx512vl -uintr +clflushopt -raoint -cmpccxadd +bmi -amx-tile +sse -gfni -avxvnniint16 -amx-fp16 +xsaveopt +rdrnd -avx512f -amx-bf16 -avx512bf16 -avx512vnni +cx8 -avx512bw +sse3 -pku +fsgsbase +clzero -mwaitx -lwp +lzcnt +sha -movdir64b -wbnoinvd -enqcmd -prefetchwt1 -avxneconvert -tbm -pconfig -amx-complex +ssse3 +cx16 +bmi2 +fma +popcnt -avxifma +f16c -avx512bitalg -rdpru +clwb +mmx +sse2 +rdseed -avx512vbmi2 -prefetchi +rdpid -fma4 -avx512vbmi -shstk -vaes -waitpkg -sgx +fxsr -avx512dq +sse4a]
Thread 1 "gandiva-project" received signal SIGSEGV, Segmentation fault.
0x00007ffff786fdfc in ?? ()
(gdb) bt
#0 0x00007ffff786fdfc in ?? ()
#1 0x00007fffffffd700 in ?? ()
#2 0x0000000108b76b98 in ?? ()
#3 0x0000000c9f2c9cd0 in ?? ()
#4 0x4674edea40000000 in ?? ()
#5 0x4674edea40000000 in ?? ()
#6 0x0000000c9f2c9cd0 in ?? ()
#7 0x85acef8100000000 in ?? ()
#8 0x000004ee2d6d415b in ?? ()
#9 0x00007fffffffd6b0 in ?? ()
#10 0x0000000002064191 in std::_Tuple_impl<0ul, unsigned char**, std::default_delete<unsigned char* []> >::_M_head (__t=...)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/tuple:204
#11 0x000000000203d925 in gandiva::Projector::Evaluate (this=0x8d66a50, batch=..., selection_vector=0x0, pool=0x8768f40 <arrow::global_state+576>, output=0x7fffffffdaa0)
at /arrow2/cpp/src/gandiva/projector.cc:176
#12 0x000000000203d5e5 in gandiva::Projector::Evaluate (this=0x8d66a50, batch=..., pool=0x8768f40 <arrow::global_state+576>, output=0x7fffffffdaa0)
at /arrow2/cpp/src/gandiva/projector.cc:154
#13 0x0000000001f9ea88 in gandiva::TestDecimal_TestCastDecimalVarCharInvalidInputInvalidOutput_Test::TestBody (this=0x8aedb20)
at /arrow2/cpp/src/gandiva/tests/decimal_test.cc:1207
If you have repro even with my fix. Can you please share commands how do I repro it. What is the env?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated Issue with some additional information. #46708
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could reproduce this crash and confirm that this fix this crash by #46765.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry your comment is not clear. Do you say this fix is fine? Will you approve it then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I still don't understand why this case is crashed. But I'm OK with this change.
Fixed PR comments
@github-actions autotune |
fixed formatting
|
||
auto status = Projector::Make(schema, {expr}, TestConfiguration(), &projector); | ||
EXPECT_TRUE(status.ok()) << status.message(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto status = Projector::Make(schema, {expr}, TestConfiguration(), &projector); | |
EXPECT_TRUE(status.ok()) << status.message(); | |
ASSERT_OK(Projector::Make(schema, {expr}, TestConfiguration(), &projector)); |
auto int_literal = TreeExprBuilder::MakeLiteral((int32_t)100); | ||
auto int_literal_multiply = TreeExprBuilder::MakeLiteral((int32_t)10); | ||
auto string_literal = TreeExprBuilder::MakeStringLiteral("foo"); | ||
auto cast_multiply_literal = | ||
TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal_multiply}, decimal_type_10_0); | ||
auto cast_int_literal = | ||
TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal}, decimal_type_38_30); | ||
auto cast_string_func = | ||
TreeExprBuilder::MakeFunction("castDECIMAL", {string_literal}, decimal_type_38_30); | ||
auto multiply_func = | ||
TreeExprBuilder::MakeFunction("multiply", {cast_multiply_literal, cast_int_literal}, decimal_type_38_27); | ||
auto equal_func = TreeExprBuilder::MakeFunction("equal", {multiply_func, cast_string_func}, arrow::boolean()); | ||
auto expr = TreeExprBuilder::MakeExpression(equal_func, res_bool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I still don't understand why this case is crashed. But I'm OK with this change.
auto int_literal = TreeExprBuilder::MakeLiteral(static_cast<int32_t>(100)); | ||
auto int_literal_multiply = TreeExprBuilder::MakeLiteral(static_cast<int32_t>(10)); | ||
auto string_literal = TreeExprBuilder::MakeStringLiteral("foo"); | ||
auto cast_multiply_literal = TreeExprBuilder::MakeFunction( | ||
"castDECIMAL", {int_literal_multiply}, decimal_type_10_0); | ||
auto cast_int_literal = | ||
TreeExprBuilder::MakeFunction("castDECIMAL", {int_literal}, decimal_type_38_30); | ||
auto cast_string_func = | ||
TreeExprBuilder::MakeFunction("castDECIMAL", {string_literal}, decimal_type_38_30); | ||
auto multiply_func = TreeExprBuilder::MakeFunction( | ||
"multiply", {cast_multiply_literal, cast_int_literal}, decimal_type_38_27); | ||
auto equal_func = TreeExprBuilder::MakeFunction( | ||
"equal", {multiply_func, cast_string_func}, arrow::boolean()); | ||
auto expr = TreeExprBuilder::MakeExpression(equal_func, res_bool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment why we need to use this expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment in code.
equal(multiply(castDecimal(10), castDecimal(100)), castDECIMAL("foo"))
This is minimal expression I have found to be able to reproduce SIGSEGV. If I remove elements from it crash won't happen.
I would like to make short test with just castDECIMAL but in case of parse error I can't extract returned 0 values in test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit c97e21a. There were 68 benchmark results with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 10 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <denis.tarasyuk@dremio.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <denis.tarasyuk@dremio.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <denis.tarasyuk@dremio.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <denis.tarasyuk@dremio.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…AL_utf8 (apache#46709) ### Rationale for this change castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector. ### What changes are included in this PR? Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed. Added corresponding test that reproduces SIGSEGV in projector. ### Are these changes tested? Yes ### Are there any user-facing changes? No * GitHub Issue: apache#46708 Authored-by: DenisTarasyuk <denis.tarasyuk@dremio.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
castDECIMAL_utf8 has undefined behavior if input value can not be parsed, which causes SIGSEGV for some expressions in Projector.
What changes are included in this PR?
Setting 0 to out_high, out_low in function castDECIMAL_utf8 to have those values initialised if input value can not be parsed.
Added corresponding test that reproduces SIGSEGV in projector.
Are these changes tested?
Yes
Are there any user-facing changes?
No