Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix predicate pushdown for Parquet decimal columns #9338

Merged
merged 1 commit into from
Sep 24, 2021

Conversation

alexjo2144
Copy link
Member

@alexjo2144 alexjo2144 commented Sep 22, 2021

First four commits are from #9326

The overflow detection constructs a BigDecimal which was missing the decimal scale argument. This resulted in the statistics incorrectly detecting an overflow in many cases.

Example:
Decimal(5, 3) has a maximum value of 99.999. Before the fix, any value over 0.099 would result in an overflow. 0.099 would be represented as 99, which is still lower than the max value, but 0.100 is represented as 100, triggering the overflow logic.

@alexjo2144
Copy link
Member Author

alexjo2144 commented Sep 22, 2021

Marking this as a WIP until the base PR is merged

@martint
Copy link
Member

martint commented Sep 22, 2021

Can we separate that last commit from the previous ones? It seems like this is an independent issue and we should be able to fix it without waiting for the other commits.

@alexjo2144 alexjo2144 force-pushed the alexjo/parquet-decimal-pushdown branch from 16af5e6 to 5826ea5 Compare September 23, 2021 14:23
@alexjo2144 alexjo2144 removed the WIP label Sep 23, 2021
@alexjo2144 alexjo2144 force-pushed the alexjo/parquet-decimal-pushdown branch from 5826ea5 to 578b680 Compare September 23, 2021 14:35
@alexjo2144 alexjo2144 force-pushed the alexjo/parquet-decimal-pushdown branch from ca19425 to b4620bf Compare September 23, 2021 15:26
Comment on lines -226 to +271
Slice zero = encodeScaledValue(new BigDecimal("0"), type.getScale());
Slice hundred = encodeScaledValue(new BigDecimal("100"), type.getScale());
Slice zero = unscaledDecimal(BigInteger.valueOf(0L));
Slice hundred = unscaledDecimal(BigInteger.valueOf(100L));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expectations here didn't match the code. The encoding is done here using unscaledDecimal https://github.com/trinodb/trino/blob/master/lib/trino-parquet/src/main/java/io/trino/parquet/ParquetTypeUtils.java#L264

@alexjo2144 alexjo2144 force-pushed the alexjo/parquet-decimal-pushdown branch from b4620bf to 8e5ef18 Compare September 23, 2021 15:29
@alexjo2144
Copy link
Member Author

Updated to include tests for long decimals as well

@findepi
Copy link
Member

findepi commented Sep 24, 2021

The overflow detection constructs a BigDecimal which was missing the decimal scale argument. This resulted in the statistics detecting an overflow any time a decimal is present.

this I do not understand

i know the code was not correct, but i don't see yet why it was incorrect in every case.

other than that -- LGTM

@alexjo2144
Copy link
Member Author

every case was a bit of an overstatement. I updated the PR description to be clearer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants