
Optimize PLAIN values decoders in parquet reader #15308

Merged: raunaqmorarka merged 7 commits into trinodb:master on Dec 21, 2022


@raunaqmorarka (Member) commented on Dec 6, 2022

Description

Optimize PLAIN values decoders in parquet reader
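The description does not say which technique the new decoders use. As background only: in Parquet's PLAIN encoding, fixed-width values such as INT32 are stored back to back as 4-byte little-endian integers. A decoder can assemble them one value at a time, or fill a whole output array in a single bulk call. The sketch below contrasts the two approaches using only `java.nio`. The class and method names are hypothetical, and this is not Trino's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration of decoding PLAIN-encoded INT32 values (4-byte little-endian each)
public class PlainIntDecodeSketch
{
    // Per-value decoding: assemble each little-endian INT32 from four bytes
    static void decodePerValue(byte[] page, int offset, int[] out)
    {
        for (int i = 0; i < out.length; i++) {
            int p = offset + 4 * i;
            out[i] = (page[p] & 0xFF)
                    | (page[p + 1] & 0xFF) << 8
                    | (page[p + 2] & 0xFF) << 16
                    | (page[p + 3] & 0xFF) << 24;
        }
    }

    // Batch decoding: view the bytes as a little-endian IntBuffer and bulk-copy into the output
    static void decodeBatch(byte[] page, int offset, int[] out)
    {
        ByteBuffer.wrap(page, offset, 4 * out.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asIntBuffer()
                .get(out, 0, out.length);
    }

    public static void main(String[] args)
    {
        // Values 1, -2 and 0x12345678 encoded as PLAIN INT32
        byte[] page = {1, 0, 0, 0, (byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x78, 0x56, 0x34, 0x12};
        int[] out = new int[3];
        decodeBatch(page, 0, out);
        System.out.println(Arrays.toString(out)); // [1, -2, 305419896]
    }
}
```

Both methods produce the same values. The batch version hands the per-element work to a single library call instead of looping over individual bytes.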

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive, Hudi, Delta, Iceberg
* Improve performance of reading Parquet files. ({issue}`15308`)

@skrzypo987 (Member) left a comment

lgtm

@raunaqmorarka (Member, Author) commented

Benchmark results:
- Unpartitioned: TPC-H improved by 23%, TPC-DS improved by 16% (attached: Parquet PLAIN optimized decoders unpartitioned 1k.pdf)
- Partitioned: TPC-H improved by 16%, TPC-DS improved by 10% (attached: Parquet PLAIN optimized decoders partitioned 1k.pdf)

@martint (Member) left a comment

Just a couple of minor comments.

    Math.round(
            // number of base-10 digits
            Math.floor(Math.log10(
                    Math.pow(2, 8 * numBytes - 1) - 1)))); // max value stored in numBytes
@martint (Member):
Why `- 1` in `8 * numBytes - 1`? That deserves a comment.

@raunaqmorarka (Member, Author):
I don't recall why at the moment :)
I've added a comment and a separate commit clarifying that this code comes from org.apache.parquet.schema.Types.BasePrimitiveBuilder.maxPrecision. It hasn't been changed in this PR, only moved and reused in another test.
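Our reading of the `- 1` in the exponent: an unscaled decimal stored in `numBytes` bytes is a signed two's-complement integer with `8 * numBytes` bits. One of those bits is the sign, so the largest positive value is `2^(8 * numBytes - 1) - 1`. `floor(log10(max))` is then one less than that value's digit count. That gives the largest precision `p` for which every `p`-digit number fits. The sketch below reproduces the quoted expression and cross-checks it with exact `BigInteger` arithmetic. The class name and the `maxPrecisionExact` helper are hypothetical, added only for illustration.

```java
import java.math.BigInteger;

public class MaxPrecisionSketch
{
    // Same expression as quoted in the review. One bit of the 8 * numBytes bits holds
    // the sign, so the largest positive value is 2^(8 * numBytes - 1) - 1
    static long maxPrecision(int numBytes)
    {
        return Math.round(
                Math.floor(Math.log10(
                        Math.pow(2, 8 * numBytes - 1) - 1)));
    }

    // Hypothetical exact cross-check: the largest p such that 10^p - 1 <= 2^(8 * numBytes - 1) - 1
    static int maxPrecisionExact(int numBytes)
    {
        BigInteger maxValue = BigInteger.ONE.shiftLeft(8 * numBytes - 1).subtract(BigInteger.ONE);
        int precision = 0;
        while (BigInteger.TEN.pow(precision + 1).subtract(BigInteger.ONE).compareTo(maxValue) <= 0) {
            precision++;
        }
        return precision;
    }

    public static void main(String[] args)
    {
        for (int numBytes = 1; numBytes <= 8; numBytes++) {
            System.out.println(numBytes + " bytes -> max precision " + maxPrecision(numBytes));
        }
    }
}
```

For 1 to 8 bytes this prints precisions 2, 4, 6, 9, 11, 14, 16 and 18. For example, 4 bytes hold at most 2147483647, a 10-digit number. Not every 10-digit number fits in that range, so the maximum precision is 9.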

raunaqmorarka and others added 6 commits December 20, 2022 11:40
Benchmark                               (byteArrayLength)   Mode  Cnt  Score Before    Score After       Units
BenchmarkShortDecimalColumnReader.read                  1  thrpt   20  40.981 ± 0.507  615.721 ± 46.383  ops/s
BenchmarkShortDecimalColumnReader.read                  2  thrpt   20  37.989 ± 2.349  410.337 ± 22.499  ops/s
BenchmarkShortDecimalColumnReader.read                  3  thrpt   20  40.601 ± 2.036  315.921 ±  4.257  ops/s
BenchmarkShortDecimalColumnReader.read                  4  thrpt   20  32.151 ± 4.108  248.169 ±  6.893  ops/s
BenchmarkShortDecimalColumnReader.read                  5  thrpt   20  36.531 ± 0.672  218.322 ± 14.917  ops/s
BenchmarkShortDecimalColumnReader.read                  6  thrpt   20  35.485 ± 0.811  182.155 ±  8.835  ops/s
BenchmarkShortDecimalColumnReader.read                  7  thrpt   20  31.884 ± 2.041  158.451 ± 11.035  ops/s
BenchmarkShortDecimalColumnReader.read                  8  thrpt   20  34.153 ± 1.703  145.576 ± 15.406  ops/s

Benchmark                               Mode  Cnt  Score Before    Score After     Units
BenchmarkLongDecimalColumnReader.read  thrpt   20  32.412 ± 1.376  42.309 ± 2.181  ops/s

Co-authored-by: Raunaq Morarka <raunaqmorarka@gmail.com>
Decoding of short decimals from fixed-length byte arrays is
further optimized by reading longs from the input stream.
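Parquet stores a decimal in a FIXED_LEN_BYTE_ARRAY as a big-endian two's-complement unscaled value. A minimal sketch of the "read a long" idea follows, assuming a plain `byte[]` page with no framing. It does one 8-byte big-endian load starting at the value. An arithmetic right shift then discards the bytes that belong to the following values and sign-extends the result. Near the end of the buffer, fewer than 8 bytes remain, so the sketch falls back to a byte-by-byte loop. The class and method names are hypothetical, and Trino's actual reader may differ.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ShortDecimalDecodeSketch
{
    // Baseline: assemble a big-endian value of `length` bytes (1..8), then sign-extend it
    static long decodeBytewise(byte[] data, int offset, int length)
    {
        long value = 0;
        for (int i = 0; i < length; i++) {
            value = (value << 8) | (data[offset + i] & 0xFF);
        }
        int unusedBits = 64 - 8 * length;
        return (value << unusedBits) >> unusedBits; // arithmetic shift restores the sign
    }

    // Long-read variant: one 8-byte load per value; `>> shift` keeps only the top `length`
    // bytes (this value) and sign-extends them
    static void decodeAll(byte[] data, int length, long[] out)
    {
        ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        int shift = 64 - 8 * length;
        int offset = 0;
        int i = 0;
        // Fast path while a full 8-byte read stays inside the buffer
        while (i < out.length && offset + Long.BYTES <= data.length) {
            out[i++] = buffer.getLong(offset) >> shift;
            offset += length;
        }
        // Tail: not enough bytes left for an 8-byte read
        while (i < out.length) {
            out[i++] = decodeBytewise(data, offset, length);
            offset += length;
        }
    }

    public static void main(String[] args)
    {
        // Four 3-byte values: -1, 1, -8388608, 8388607
        byte[] data = {
                (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
                0x00, 0x00, 0x01,
                (byte) 0x80, 0x00, 0x00,
                0x7F, (byte) 0xFF, (byte) 0xFF};
        long[] out = new long[4];
        decodeAll(data, 3, out);
        System.out.println(Arrays.toString(out)); // [-1, 1, -8388608, 8388607]
    }
}
```

In this example the first two values take the fast path and the last two take the byte-wise tail. Both paths produce the same results.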

Benchmark                               (byteArrayLength)   Mode  Cnt  Score Before      Score After      Units
BenchmarkShortDecimalColumnReader.read                  1  thrpt   10  488.708 ± 34.142  835.392 ± 18.137 ops/s
BenchmarkShortDecimalColumnReader.read                  2  thrpt   10  349.453 ± 44.897  740.347 ± 26.838 ops/s
BenchmarkShortDecimalColumnReader.read                  3  thrpt   10  301.128 ± 16.122  628.967 ± 19.039 ops/s
BenchmarkShortDecimalColumnReader.read                  4  thrpt   10  228.342 ± 25.225  553.662 ± 46.802 ops/s
BenchmarkShortDecimalColumnReader.read                  5  thrpt   10  204.246 ± 21.218  461.545 ± 27.278 ops/s
BenchmarkShortDecimalColumnReader.read                  6  thrpt   10  188.714 ±  6.309  419.211 ± 15.646 ops/s
BenchmarkShortDecimalColumnReader.read                  7  thrpt   10  134.187 ± 17.130  382.849 ± 41.732 ops/s
BenchmarkShortDecimalColumnReader.read                  8  thrpt   10  121.228 ± 16.433  352.332 ±  5.156 ops/s

Co-authored-by: Krzysztof Skrzypczynski <krzysztof.skrzypczynski@starburstdata.com>
Benchmark                         (encoding)  (positionLength)                             (type)   Mode  Cnt  Score Before       Score After        Units
BenchmarkBinaryColumnReader.read       PLAIN    VARIABLE_0_100                          UNBOUNDED  thrpt   10     7.747 ±  0.898    26.172 ±   0.649 ops/s
BenchmarkBinaryColumnReader.read       PLAIN    VARIABLE_0_100          VARCHAR_ASCII_BOUND_EXACT  thrpt   10     7.789 ±  0.745    25.133 ±   2.639 ops/s
BenchmarkBinaryColumnReader.read       PLAIN    VARIABLE_0_100              CHAR_ASCII_BOUND_HALF  thrpt   10     7.288 ±  0.267    23.284 ±   0.305 ops/s
BenchmarkBinaryColumnReader.read       PLAIN    VARIABLE_0_100  CHAR_BOUND_HALF_PADDING_SOMETIMES  thrpt   10     7.013 ±  0.352    23.091 ±   0.332 ops/s
BenchmarkBinaryColumnReader.read       PLAIN   VARIABLE_0_1000                          UNBOUNDED  thrpt   10  1591.392 ± 34.505  6363.201 ± 530.036 ops/s
BenchmarkBinaryColumnReader.read       PLAIN   VARIABLE_0_1000          VARCHAR_ASCII_BOUND_EXACT  thrpt   10  1583.339 ± 21.816  6348.760 ± 269.550 ops/s
BenchmarkBinaryColumnReader.read       PLAIN   VARIABLE_0_1000              CHAR_ASCII_BOUND_HALF  thrpt   10  1364.905 ± 14.346  4033.757 ± 257.096 ops/s
BenchmarkBinaryColumnReader.read       PLAIN   VARIABLE_0_1000  CHAR_BOUND_HALF_PADDING_SOMETIMES  thrpt   10  1358.043 ± 16.595  4253.996 ± 185.930 ops/s
BenchmarkBinaryColumnReader.read       PLAIN          FIXED_10                          UNBOUNDED  thrpt   10    13.759 ±  0.255    80.423 ±   1.798 ops/s
BenchmarkBinaryColumnReader.read       PLAIN          FIXED_10          VARCHAR_ASCII_BOUND_EXACT  thrpt   10    13.577 ±  0.453    47.512 ±   2.130 ops/s
BenchmarkBinaryColumnReader.read       PLAIN          FIXED_10              CHAR_ASCII_BOUND_HALF  thrpt   10    11.307 ±  0.514    44.744 ±   1.693 ops/s
BenchmarkBinaryColumnReader.read       PLAIN          FIXED_10  CHAR_BOUND_HALF_PADDING_SOMETIMES  thrpt   10    11.467 ±  0.310    39.829 ±   0.967 ops/s
BenchmarkBinaryColumnReader.read       PLAIN         FIXED_100                          UNBOUNDED  thrpt   10   917.125 ± 47.660  4938.964 ± 332.846 ops/s
BenchmarkBinaryColumnReader.read       PLAIN         FIXED_100          VARCHAR_ASCII_BOUND_EXACT  thrpt   10   905.055 ±  9.226  1704.363 ±  51.928 ops/s
BenchmarkBinaryColumnReader.read       PLAIN         FIXED_100              CHAR_ASCII_BOUND_HALF  thrpt   10   746.498 ± 18.657  2644.755 ±  87.034 ops/s
BenchmarkBinaryColumnReader.read       PLAIN         FIXED_100  CHAR_BOUND_HALF_PADDING_SOMETIMES  thrpt   10   748.867 ± 15.252  2598.540 ± 181.460 ops/s
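For context on the binary benchmark: in Parquet's PLAIN encoding, each BYTE_ARRAY value is a 4-byte little-endian length followed by that many bytes. The commit does not describe its technique. One common way to decode such values without allocating an object per value is to copy them into a single contiguous byte array plus an offsets array. The sketch below does this in two passes: it first sums the lengths, then copies the bytes. The target layout and all names are assumptions, not Trino's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class PlainBinaryDecodeSketch
{
    // Decodes `count` PLAIN BYTE_ARRAY values into one data array.
    // offsets must have count + 1 entries; value i spans [offsets[i], offsets[i + 1]).
    static byte[] decode(byte[] page, int count, int[] offsets)
    {
        ByteBuffer buffer = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        // Pass 1: read the length prefixes so the output is allocated exactly once
        int position = 0;
        offsets[0] = 0;
        for (int i = 0; i < count; i++) {
            int length = buffer.getInt(position);
            offsets[i + 1] = offsets[i] + length;
            position += Integer.BYTES + length;
        }
        // Pass 2: copy each value's bytes into the shared output array
        byte[] data = new byte[offsets[count]];
        position = 0;
        for (int i = 0; i < count; i++) {
            int length = offsets[i + 1] - offsets[i];
            System.arraycopy(page, position + Integer.BYTES, data, offsets[i], length);
            position += Integer.BYTES + length;
        }
        return data;
    }

    public static void main(String[] args)
    {
        // Two values, "ab" and "xyz", each preceded by its little-endian length
        byte[] page = {2, 0, 0, 0, 'a', 'b', 3, 0, 0, 0, 'x', 'y', 'z'};
        int[] offsets = new int[3];
        byte[] data = decode(page, 2, offsets);
        for (int i = 0; i < 2; i++) {
            System.out.println(new String(data, offsets[i], offsets[i + 1] - offsets[i], StandardCharsets.US_ASCII));
        }
    }
}
```

This prints `ab` and `xyz`. Type-specific handling, such as the bounded VARCHAR and padded CHAR variants in the table, is not modeled here.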

Co-authored-by: Raunaq Morarka <raunaqmorarka@gmail.com>
@raunaqmorarka raunaqmorarka merged commit 1802a8f into trinodb:master Dec 21, 2022
@raunaqmorarka raunaqmorarka deleted the pqr-plain branch December 21, 2022 03:27
@github-actions github-actions bot added this to the 404 milestone Dec 21, 2022

4 participants