Optimize decoders for DELTA_BINARY_PACKED parquet encoding #15850
Merged
skrzypo987 reviewed on Jan 26, 2023, leaving comments (since resolved) on:
lib/trino-parquet/src/main/java/io/trino/parquet/ParquetReaderUtils.java
lib/trino-parquet/src/main/java/io/trino/parquet/reader/decoders/DeltaBinaryPackedDecoders.java
raunaqmorarka force-pushed the pqr-v2-int branch from 1cc8a52 to 31b0c5d on January 26, 2023 10:55
Added optimized delta decoders for INTEGER and BIGINT Trino types

Benchmark                      (bitWidth)   Mode  Cnt      Score before       Score after  Units
BenchmarkIntColumnReader.read           0  thrpt   20  117.071 ±  7.703  865.445 ± 20.106  ops/s
BenchmarkIntColumnReader.read           3  thrpt   20  104.166 ±  4.900  268.090 ± 42.779  ops/s
BenchmarkIntColumnReader.read           4  thrpt   20   98.350 ±  4.357  320.060 ± 45.229  ops/s
BenchmarkIntColumnReader.read           6  thrpt   20   64.612 ± 21.224  302.133 ± 43.548  ops/s
BenchmarkIntColumnReader.read           7  thrpt   20   99.094 ±  9.955  281.936 ± 43.971  ops/s
BenchmarkIntColumnReader.read           8  thrpt   20   95.858 ±  5.498  285.217 ± 41.485  ops/s
BenchmarkIntColumnReader.read          11  thrpt   20   88.263 ±  9.428  270.472 ± 44.151  ops/s
BenchmarkIntColumnReader.read          15  thrpt   20   81.247 ±  8.551  267.604 ± 44.837  ops/s
BenchmarkIntColumnReader.read          20  thrpt   20   66.282 ±  9.662  207.038 ± 35.746  ops/s
BenchmarkIntColumnReader.read          25  thrpt   20   77.153 ±  7.748  209.671 ± 36.137  ops/s
BenchmarkIntColumnReader.read          32  thrpt   20   92.607 ±  3.672  395.909 ± 47.766  ops/s

Benchmark                       (bitWidth)   Mode  Cnt      Score before       Score after  Units
BenchmarkLongColumnReader.read           0  thrpt   20  199.566 ± 15.208  806.087 ± 51.283  ops/s
BenchmarkLongColumnReader.read           4  thrpt   20  163.637 ± 10.026  579.107 ± 28.904  ops/s
BenchmarkLongColumnReader.read           8  thrpt   20  154.460 ±  4.185  538.973 ± 21.534  ops/s
BenchmarkLongColumnReader.read          10  thrpt   20  148.364 ±  2.676  513.435 ± 14.800  ops/s
BenchmarkLongColumnReader.read          15  thrpt   20  146.103 ±  6.923  514.479 ± 15.324  ops/s
BenchmarkLongColumnReader.read          20  thrpt   20  132.407 ±  6.520  442.656 ± 13.898  ops/s
BenchmarkLongColumnReader.read          25  thrpt   20  118.700 ±  7.232  421.344 ± 35.498  ops/s
BenchmarkLongColumnReader.read          30  thrpt   20  117.756 ±  1.767  404.390 ± 34.178  ops/s
BenchmarkLongColumnReader.read          35  thrpt   20  106.358 ±  1.739  318.364 ± 41.330  ops/s
BenchmarkLongColumnReader.read          40  thrpt   20   91.588 ±  4.196  346.890 ± 14.496  ops/s
BenchmarkLongColumnReader.read          45  thrpt   20   86.491 ±  3.393  322.405 ± 15.961  ops/s
BenchmarkLongColumnReader.read          50  thrpt   20   79.182 ±  1.353  308.200 ±  7.586  ops/s
BenchmarkLongColumnReader.read          55  thrpt   20   85.522 ±  1.183  296.408 ±  8.309  ops/s
BenchmarkLongColumnReader.read          60  thrpt   20   70.403 ±  1.877  276.162 ±  6.587  ops/s
BenchmarkLongColumnReader.read          64  thrpt   20   74.248 ±  3.132  524.803 ± 64.244  ops/s

Benchmark                                     (size)   Mode  Cnt   Score   Error  Units
BenchmarkReadUleb128Long.readUleb128Long        1000  thrpt   30  56.499 ± 0.621  ops/ms
BenchmarkReadUleb128Long.readUleb128Long       10000  thrpt   30   4.820 ± 0.260  ops/ms
BenchmarkReadUleb128Long.readUleb128LongLoop    1000  thrpt   30  38.991 ± 2.311  ops/ms
BenchmarkReadUleb128Long.readUleb128LongLoop   10000  thrpt   30   3.380 ± 0.278  ops/ms

Co-authored-by: Raunaq Morarka <raunaqmorarka@gmail.com>
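The first commit also benchmarks readUleb128Long. In the Parquet DELTA_BINARY_PACKED layout, the page header stores the block size, the number of miniblocks per block and the total value count as ULEB128 varints, and the first value (like each block's minimum delta) as a zigzag-encoded ULEB128 varint. The following is a minimal, loop-based sketch of those two decodings; the class and method names are hypothetical, and this is not the Trino implementation.

```java
// Hypothetical helper class (not Trino's API): decodes the unsigned LEB128 varints
// and zigzag-encoded integers used in DELTA_BINARY_PACKED headers.
final class Uleb128Sketch
{
    private Uleb128Sketch() {}

    // Reads one ULEB128 varint starting at position[0] and advances position[0] past it.
    // Each byte contributes its low 7 bits; a set high bit means "more bytes follow".
    static long readUleb128Long(byte[] buffer, int[] position)
    {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buffer[position[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift >= 64) {
                throw new IllegalArgumentException("ULEB128 value longer than 10 bytes");
            }
        }
    }

    // Zigzag decoding maps 0, 1, 2, 3, 4, ... back to 0, -1, 1, -2, 2, ...
    static long zigzagDecode(long n)
    {
        return (n >>> 1) ^ -(n & 1);
    }
}
```

For example, the bytes 0xAC 0x02 decode to 300 (leaving position[0] at 2), and zigzagDecode(3) returns -2.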
Benchmark                       (bitWidth)   Mode  Cnt            Before             After  Units
BenchmarkByteColumnReader.read           0  thrpt   30  171.680 ± 12.559  748.504 ±  6.711  ops/s
BenchmarkByteColumnReader.read           1  thrpt   30  193.714 ±  0.804  593.429 ± 28.570  ops/s
BenchmarkByteColumnReader.read           2  thrpt   30  182.886 ±  3.220  624.125 ± 56.425  ops/s
BenchmarkByteColumnReader.read           3  thrpt   30  185.313 ±  2.019  570.076 ± 41.381  ops/s
BenchmarkByteColumnReader.read           4  thrpt   30  175.380 ±  2.142  581.016 ± 28.967  ops/s
BenchmarkByteColumnReader.read           5  thrpt   30  172.296 ±  2.929  572.173 ± 31.615  ops/s
BenchmarkByteColumnReader.read           6  thrpt   30  168.721 ±  0.843  551.679 ± 35.310  ops/s
BenchmarkByteColumnReader.read           7  thrpt   30  180.839 ±  3.180  823.503 ± 15.124  ops/s
BenchmarkByteColumnReader.read           8  thrpt   30  162.636 ±  2.574  523.664 ± 31.919  ops/s

Co-authored-by: Raunaq Morarka <raunaqmorarka@gmail.com>
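The BenchmarkByteColumnReader numbers above presumably come from Parquet INT32 data annotated as 8-bit integers (Trino TINYINT). One property that can make such narrow decoders cheap: in two's-complement arithmetic, the low 8 bits of a sum depend only on the low 8 bits of its operands, so accumulating deltas directly in byte arithmetic gives the same result as accumulating in int and narrowing each value afterwards. The sketch below, with hypothetical names, demonstrates that equivalence; it is an illustration, not a description of the merged decoder.

```java
// Hypothetical illustration (not Trino code): two ways to turn a first value and
// a sequence of INT32 deltas into TINYINT (byte) values.
final class ByteDeltaSketch
{
    private ByteDeltaSketch() {}

    // Reference: prefix sum in 32-bit arithmetic, then narrow each value to byte.
    static byte[] decodeViaInt(int firstValue, int[] deltas)
    {
        byte[] out = new byte[deltas.length + 1];
        int current = firstValue;
        out[0] = (byte) current;
        for (int i = 0; i < deltas.length; i++) {
            current += deltas[i]; // wraps modulo 2^32, as the Parquet spec allows
            out[i + 1] = (byte) current;
        }
        return out;
    }

    // Narrow accumulator: adding in byte arithmetic keeps exactly the low 8 bits
    // that the int-based version would keep, so the outputs are identical.
    static byte[] decodeViaByte(int firstValue, int[] deltas)
    {
        byte[] out = new byte[deltas.length + 1];
        byte current = (byte) firstValue;
        out[0] = current;
        for (int i = 0; i < deltas.length; i++) {
            current += (byte) deltas[i]; // compound assignment narrows back to byte
            out[i + 1] = current;
        }
        return out;
    }
}
```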
Benchmark                        (bitWidth)   Mode  Cnt            Before             After  Units
BenchmarkShortColumnReader.read           0  thrpt   30  201.502 ± 14.020  976.715 ± 28.892  ops/s
BenchmarkShortColumnReader.read           1  thrpt   30  173.992 ± 10.272  624.690 ± 37.268  ops/s
BenchmarkShortColumnReader.read           2  thrpt   30  168.042 ±  5.116  556.310 ± 52.423  ops/s
BenchmarkShortColumnReader.read           3  thrpt   30  174.832 ±  4.403  577.811 ± 27.119  ops/s
BenchmarkShortColumnReader.read           4  thrpt   30  172.531 ±  3.270  582.771 ± 38.262  ops/s
BenchmarkShortColumnReader.read           8  thrpt   30  145.744 ± 12.112  490.298 ± 43.535  ops/s
BenchmarkShortColumnReader.read          10  thrpt   30  152.312 ±  3.486  506.218 ±  9.371  ops/s
BenchmarkShortColumnReader.read          11  thrpt   30  153.093 ±  5.410  503.974 ± 12.769  ops/s
BenchmarkShortColumnReader.read          14  thrpt   30  138.288 ±  5.873  438.434 ± 27.987  ops/s
BenchmarkShortColumnReader.read          16  thrpt   30  147.998 ±  1.930  410.992 ± 31.457  ops/s

Co-authored-by: Raunaq Morarka <raunaqmorarka@gmail.com>
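The reader benchmarks above are parameterized by bitWidth, which here appears to be the width of the bit-packed deltas in each miniblock. Parquet packs those deltas little-endian, least significant bit first (the same order as the bit-packed runs of the RLE/bit-packing hybrid encoding). The following deliberately unoptimized sketch unpacks values one bit at a time to show the layout; a common optimization is to use a specialized unpacker for each bit width instead. The class name is hypothetical.

```java
// Hypothetical, unoptimized sketch (not Trino code): unpacks `count` values of
// `bitWidth` bits each (0..64), starting at byte `offset`, from a little-endian,
// least-significant-bit-first bit stream.
final class BitUnpackSketch
{
    private BitUnpackSketch() {}

    static long[] unpack(byte[] input, int offset, int bitWidth, int count)
    {
        if (bitWidth < 0 || bitWidth > 64) {
            throw new IllegalArgumentException("bitWidth must be in [0, 64]");
        }
        long[] out = new long[count];
        if (bitWidth == 0) {
            // Nothing is stored: every delta equals the block's minimum delta.
            return out;
        }
        long bitPosition = (long) offset * 8;
        for (int i = 0; i < count; i++) {
            long value = 0;
            for (int bit = 0; bit < bitWidth; bit++, bitPosition++) {
                int byteIndex = (int) (bitPosition >>> 3);
                int bitInByte = (int) (bitPosition & 7);
                long b = (input[byteIndex] >>> bitInByte) & 1;
                value |= b << bit;
            }
            out[i] = value;
        }
        return out;
    }
}
```

For example, with bitWidth 3 the two bytes 0xD1 0x08 unpack to 1, 2, 3, 4. After unpacking, the block's minimum delta is added to each value to recover the actual deltas.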
Benchmark                                       (encoding)   Mode  Cnt            Before             After  Units
BenchmarkInt32ToLongColumnReader.read                PLAIN  thrpt   20  399.059 ± 34.474  504.970 ±  8.555  ops/s
BenchmarkInt32ToLongColumnReader.read  DELTA_BINARY_PACKED  thrpt   20  113.250 ±  4.856  339.272 ±  3.497  ops/s
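Judging by its name, BenchmarkInt32ToLongColumnReader reads Parquet INT32 data into 64-bit values. The Parquet specification requires overflow in the delta computation to wrap around in two's complement so that the original values are restored, so when the physical type is INT32 the running sum should wrap at 32 bits, not 64, before it is widened. A minimal sketch, assuming the deltas have already been unpacked and the block's minimum delta added; the names are hypothetical:

```java
// Hypothetical sketch (not Trino code): rebuilds the values of an INT32 column from
// its first value and already-unpacked deltas, widening each result to long.
final class Int32ToLongSketch
{
    private Int32ToLongSketch() {}

    static long[] decode(long firstValue, long[] deltas)
    {
        long[] out = new long[deltas.length + 1];
        int current = (int) firstValue;
        out[0] = current;
        for (int i = 0; i < deltas.length; i++) {
            // Add in 32-bit arithmetic so overflow wraps the same way it did in the
            // writer, then sign-extend the int result into the long output.
            current += (int) deltas[i];
            out[i + 1] = current;
        }
        return out;
    }
}
```

For example, the INT32 values Integer.MAX_VALUE, Integer.MIN_VALUE, 0 have 32-bit deltas 1 and Integer.MIN_VALUE. This sketch restores the original values, whereas accumulating in long arithmetic would produce 2147483648 as the second value.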
raunaqmorarka force-pushed the pqr-v2-int branch from 31b0c5d to 45428fd on January 26, 2023 21:12
sopel39 approved these changes on Jan 27, 2023
skrzypo987 approved these changes on Jan 30, 2023
raunaqmorarka changed the title from "Optimize decoders for integers in DELTA_BINARY_PACKED parquet encoding" to "Optimize decoders for numeric types in DELTA_BINARY_PACKED parquet encoding" on Jan 30, 2023
raunaqmorarka changed the title from "Optimize decoders for numeric types in DELTA_BINARY_PACKED parquet encoding" to "Optimize decoders for DELTA_BINARY_PACKED parquet encoding" on Jan 30, 2023
Description
Optimize decoders for DELTA_BINARY_PACKED parquet encoding
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: