Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve decoding of short decimals in parquet reader #14260

Merged
merged 3 commits into from Oct 4, 2022

Conversation

raunaqmorarka
Copy link
Member

@raunaqmorarka raunaqmorarka commented Sep 22, 2022

Description

Improves performance by using simpler dedicated decoding logic for each possible byte array length

Non-technical explanation

Improves performance of reading short decimal columns from parquet files

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive, Delta, Iceberg
* Improve performance of reading short decimal columns from parquet files. ({issue}`14260`)

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as for the other one? Do we need to introduce feature flag for optimized reader?

@raunaqmorarka
Copy link
Member Author

Same question as for the other one? Do we need to introduce feature flag for optimized reader?

Since this is a targeted improvement within existing code and not a big rewrite, it doesn't need to be behind a new flag.

Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this is a good change. It might be better than what's there today, but I think there other less complex ways to gain performance in that area.

To illustrate, here are some benchmarks using bulk warmup mode:

These are the results for the implementation in trunk:

Benchmark                               (byteArrayLength)   Mode  Cnt  Score   Error  Units
BenchmarkShortDecimalColumnReader.read                  1  thrpt    5  7.306 ± 0.138  ops/s
BenchmarkShortDecimalColumnReader.read                  2  thrpt    5  7.291 ± 0.044  ops/s
BenchmarkShortDecimalColumnReader.read                  3  thrpt    5  7.076 ± 0.102  ops/s
BenchmarkShortDecimalColumnReader.read                  4  thrpt    5  6.881 ± 0.029  ops/s
BenchmarkShortDecimalColumnReader.read                  5  thrpt    5  6.891 ± 0.084  ops/s
BenchmarkShortDecimalColumnReader.read                  6  thrpt    5  6.850 ± 0.091  ops/s
BenchmarkShortDecimalColumnReader.read                  7  thrpt    5  6.681 ± 0.100  ops/s
BenchmarkShortDecimalColumnReader.read                  8  thrpt    5  6.706 ± 0.073  ops/s

These are the results for this PR:

Benchmark                               (byteArrayLength)   Mode  Cnt  Score   Error  Units
BenchmarkShortDecimalColumnReader.read                  1  thrpt    5  8.631 ± 0.237  ops/s
BenchmarkShortDecimalColumnReader.read                  2  thrpt    5  8.681 ± 0.169  ops/s
BenchmarkShortDecimalColumnReader.read                  3  thrpt    5  8.491 ± 0.164  ops/s
BenchmarkShortDecimalColumnReader.read                  4  thrpt    5  8.536 ± 0.089  ops/s
BenchmarkShortDecimalColumnReader.read                  5  thrpt    5  8.428 ± 0.171  ops/s
BenchmarkShortDecimalColumnReader.read                  6  thrpt    5  8.465 ± 0.125  ops/s
BenchmarkShortDecimalColumnReader.read                  7  thrpt    5  8.386 ± 0.473  ops/s
BenchmarkShortDecimalColumnReader.read                  8  thrpt    5  8.796 ± 0.129  ops/s

These are the results for a much simpler change: fc84465

Benchmark                               (byteArrayLength)   Mode  Cnt  Score   Error  Units
BenchmarkShortDecimalColumnReader.read                  1  thrpt    5  9.259 ± 0.116  ops/s
BenchmarkShortDecimalColumnReader.read                  2  thrpt    5  9.363 ± 0.130  ops/s
BenchmarkShortDecimalColumnReader.read                  3  thrpt    5  9.040 ± 0.083  ops/s
BenchmarkShortDecimalColumnReader.read                  4  thrpt    5  9.092 ± 0.091  ops/s
BenchmarkShortDecimalColumnReader.read                  5  thrpt    5  8.971 ± 0.118  ops/s
BenchmarkShortDecimalColumnReader.read                  6  thrpt    5  8.739 ± 0.210  ops/s
BenchmarkShortDecimalColumnReader.read                  7  thrpt    5  8.711 ± 0.183  ops/s
BenchmarkShortDecimalColumnReader.read                  8  thrpt    5  8.602 ± 0.133  ops/s

Benchmark                               (byteArrayLength)   Mode  Cnt  Score Before   Score After    Units
BenchmarkShortDecimalColumnReader.read                  1  thrpt   45  4.979 ± 0.097  6.026 ± 0.084  ops/s
BenchmarkShortDecimalColumnReader.read                  2  thrpt   45  4.828 ± 0.187  5.868 ± 0.068  ops/s
BenchmarkShortDecimalColumnReader.read                  3  thrpt   45  4.033 ± 0.247  5.700 ± 0.132  ops/s
BenchmarkShortDecimalColumnReader.read                  4  thrpt   45  4.324 ± 0.147  5.735 ± 0.088  ops/s
BenchmarkShortDecimalColumnReader.read                  5  thrpt   45  4.402 ± 0.156  5.598 ± 0.089  ops/s
BenchmarkShortDecimalColumnReader.read                  6  thrpt   45  4.226 ± 0.178  5.033 ± 0.231  ops/s
BenchmarkShortDecimalColumnReader.read                  7  thrpt   45  4.070 ± 0.148  5.055 ± 0.300  ops/s
BenchmarkShortDecimalColumnReader.read                  8  thrpt   45  3.960 ± 0.198  5.226 ± 0.197  ops/s

Co-authored-by: Martin Traverso <mtraverso@gmail.com>
@raunaqmorarka raunaqmorarka merged commit 7b4079a into trinodb:master Oct 4, 2022
@raunaqmorarka raunaqmorarka deleted the pqr-short branch October 4, 2022 04:56
@github-actions github-actions bot added this to the 399 milestone Oct 4, 2022
@martint martint mentioned this pull request Oct 5, 2022
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

6 participants