Avoid multiple read requests on small parquet files #19127
Do you have any benchmarks, @raunaqmorarka?
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/MemoryParquetDataSource.java
As of now, the tests for the object storage connectors read Parquet content almost entirely through the in-memory implementation.
However, this doesn't reflect what happens in production.
I'd recommend using a session without the small file threshold by default (in the query runner setup) and enabling this feature only where a test explicitly needs it.
.../trino-hive/src/test/java/io/trino/plugin/hive/s3/TestTrinoS3FileSystemAccessOperations.java
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMergeSink.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Read files below parquet.small-file-threshold size in a single file system request to avoid multiple small read requests
The corresponding ORC feature
Description
Read files below parquet.small-file-threshold size in a single file system request to avoid multiple small read requests
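The idea can be illustrated with a minimal sketch. This is not the code from this PR: the class, interface, and method names below (SmallFileReadSketch, DataSource, CountingRemoteSource, InMemorySource, open) are hypothetical, and treating a file exactly at the threshold as "not small" is an assumption. It only shows the general pattern of fetching a small file once and serving all later reads from memory.

```java
import java.util.Arrays;

// Hypothetical illustration only; these are not Trino's actual classes.
final class SmallFileReadSketch {
    /** Minimal stand-in for a Parquet data source. */
    interface DataSource {
        long size();

        byte[] read(long position, int length);
    }

    /** Simulates a remote file and counts the read requests it serves. */
    static final class CountingRemoteSource implements DataSource {
        private final byte[] content;
        int requests;

        CountingRemoteSource(byte[] content) {
            this.content = content.clone();
        }

        @Override
        public long size() {
            return content.length;
        }

        @Override
        public byte[] read(long position, int length) {
            requests++; // each call models one file system request
            int start = Math.toIntExact(position);
            return Arrays.copyOfRange(content, start, start + length);
        }
    }

    /** Serves every read from a buffer that already holds the whole file. */
    static final class InMemorySource implements DataSource {
        private final byte[] content;

        InMemorySource(byte[] content) {
            this.content = content;
        }

        @Override
        public long size() {
            return content.length;
        }

        @Override
        public byte[] read(long position, int length) {
            int start = Math.toIntExact(position);
            return Arrays.copyOfRange(content, start, start + length);
        }
    }

    /**
     * Files below the threshold are fetched with one request and then read
     * from memory; larger files keep using the remote source directly.
     * Assumption: a file whose size equals the threshold is not "small".
     */
    static DataSource open(DataSource remote, long smallFileThresholdBytes) {
        if (remote.size() < smallFileThresholdBytes) {
            byte[] whole = remote.read(0, Math.toIntExact(remote.size()));
            return new InMemorySource(whole);
        }
        return remote;
    }
}
```

With this shape, reading the footer, column metadata, and pages of a small file costs a single request in total, while a large file still issues one request per read.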
Additional context and related issues
This is similar to hive.orc.tiny-stripe-threshold in the ORC reader.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: