Fix writing Hive array/map/struct timestamps from Trino to Parquet #6717
Re: this comment
The code currently in this PR should have the same performance as, but the code that converts
I have a cleaner version here that uses the same code to handle bare values and values in arrays, and, after stepping back for a bit, I'm not sure it would have a significant impact. Basically, it creates an extra object or two for each value being written (including boxing and then unboxing primitives). Another option would be to go through Hive's Another approach that might avoid both issues is to use objects similar to the ones in the original code, but to create a new one for each array/map/row entry. This would increase costs for those types, but not for regular values of any type. I'll take a look at this. (This looks promising, but runs into some issues with under-documented Hive code and some weird things we do in
I figured out the third approach I mentioned in my previous comment, and this PR now uses it. This will slightly lower performance for data written in arrays, maps, and structs, but should have no impact on regular data.
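The trade-off described above can be sketched in plain Java. This is a minimal, hypothetical illustration, not Trino's `FieldSetterFactory` or any Hive class: `MutableTimestamp` stands in for a reusable Hive writable, and all names here are invented for the example. Top-level values reuse one holder (no per-value allocation), while each array entry gets a fresh holder (one allocation per entry). The sketch also shows, in general terms, why one mutable holder cannot simply be shared across the entries of a single collection: every entry would alias the last value written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Trino's or Hive's actual API.
public class PerEntryHolderDemo {
    // Stand-in for a mutable Hive writable that a setter fills in.
    public static final class MutableTimestamp {
        public long epochMicros;

        public MutableTimestamp set(long value) {
            this.epochMicros = value;
            return this;
        }
    }

    // Top-level column: one holder is reused for every row, so regular
    // values pay no extra allocation.
    private final MutableTimestamp reused = new MutableTimestamp();

    public MutableTimestamp setTopLevel(long epochMicros) {
        return reused.set(epochMicros);
    }

    // Sharing one holder across the entries of a collection: every list
    // element refers to the same object, so all of them end up holding
    // the last value written.
    public static List<MutableTimestamp> arrayReusingHolder(long[] values) {
        MutableTimestamp shared = new MutableTimestamp();
        List<MutableTimestamp> out = new ArrayList<>(values.length);
        for (long v : values) {
            out.add(shared.set(v));
        }
        return out;
    }

    // Per-entry holders: each element gets its own object, at the cost of
    // one allocation per array/map/row entry.
    public static List<MutableTimestamp> arrayWithFreshHolders(long[] values) {
        List<MutableTimestamp> out = new ArrayList<>(values.length);
        for (long v : values) {
            out.add(new MutableTimestamp().set(v));
        }
        return out;
    }

    public static void main(String[] args) {
        long[] values = {1L, 2L, 3L};
        System.out.println(arrayReusingHolder(values).get(0).epochMicros);    // prints 3
        System.out.println(arrayWithFreshHolders(values).get(0).epochMicros); // prints 1
    }
}
```

The design choice mirrors the comment: the extra cost is confined to nested types, and scalar columns keep the reuse path unchanged.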
5c79f51 to 22445af
The first four commits in this PR ("Clean up in FieldSetterFactory" and "Fix writing long timestamps in map/array/row to Hive") should get timestamps in arrays et al. working in Parquet. The rest of the commits dramatically refactor I'm starting to think I should have made
a42b87e to f09a6d9
@jirassimok would it make sense to separate the fix and the potential improvement into their own PRs?
Oh, yeah, I had that in my previous comment, but I forgot to keep it when I updated the PR and changed the comment. I'll split it off.
d17afa8 to 54b33d6
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/FieldSetterFactory.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetFieldSetterFactory.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetFieldSetterFactory.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/FieldSetterFactory.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/FieldSetterFactory.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/FieldSetterFactory.java
testing/trino-product-tests/src/main/java/io/trino/tests/hive/TestHiveStorageFormats.java
e5620c9 to 758557d
- Clean up FieldSetterFactory.create's type conditions
- Rename FieldSetterFactory.BigintFieldBuilder to BigintFieldSetter
758557d to 2bd233a
2bd233a to 2928bca
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/HiveWriteUtils.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/HiveWriteUtils.java
62c9206 to 236bf10
236bf10 to bc16f70
Good job!
The test failure in Iceberg is #5758.
Merged, thanks!
The commits up to "Add tests for Hive struct timestamps inserted from Trino" are from #6622.
This addresses #6760.