[Feature](orc-reader) Implement new merge io facility for orc reader. #52085
Merged: morrySnow merged 3 commits into apache:branch-3.1 from kaka11chen:cherry-pick-45966_3.1 on Jun 25, 2025.
…apache#45966)

Related: apache/doris-thirdparty#270

Problem Summary:
The original merge IO mechanism, `MergeRangeFileReader`, requires ranges to be read in order. Ranges cannot be read out of order, so a range that has already been consumed cannot be read back.

Enabling late (delayed) materialization of ORC complex types, however, introduces a PRESENT-stream readback scenario, for example `select struct_element(info, 'age'), id from test_orc_struct where struct_element(info, 'name') = 'Alice'`. With late materialization on, the PRESENT stream of the parent node `info` is read first, when `name` is read. When `age` is read afterwards, the PRESENT stream of `info` has to be read again. As a result, late materialization of ORC complex types cannot currently be enabled.
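A minimal sketch contrasting the two access patterns, assuming the file is modeled as an in-memory string. `Range`, `SequentialMergedReader` and `CachedRangeReader` are hypothetical names for illustration, not Doris classes. The first reader enforces the in-order constraint described above; the second fetches each merged range once and serves any sub-range from it, so a stream such as the PRESENT stream of `info` can be read again later.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical byte range; not a Doris type.
struct Range {
    int64_t offset;
    int64_t length;
};

// Forward-only reader: models the in-order constraint.
// Any read that starts before the end of the previous read is rejected.
class SequentialMergedReader {
public:
    explicit SequentialMergedReader(std::string file) : file_(std::move(file)) {}

    std::string read(Range r) {
        if (r.offset < cursor_) {
            throw std::runtime_error("backward read not supported");
        }
        cursor_ = r.offset + r.length;
        return file_.substr(static_cast<size_t>(r.offset), static_cast<size_t>(r.length));
    }

private:
    std::string file_;
    int64_t cursor_ = 0;
};

// Random-access reader: each merged range is fetched once into a buffer keyed
// by its start offset; sub-ranges can then be read in any order, repeatedly.
class CachedRangeReader {
public:
    CachedRangeReader(const std::string& file, const std::vector<Range>& merged) {
        for (const Range& m : merged) {
            assert(m.offset >= 0 && m.length >= 0);
            // One simulated IO per merged range.
            cache_[m.offset] =
                    file.substr(static_cast<size_t>(m.offset), static_cast<size_t>(m.length));
        }
    }

    std::string read(Range r) const {
        // Find the last buffer that starts at or before r.offset.
        auto it = cache_.upper_bound(r.offset);
        if (it == cache_.begin()) {
            throw std::out_of_range("range was not prefetched");
        }
        --it;
        const int64_t rel = r.offset - it->first;
        if (rel + r.length > static_cast<int64_t>(it->second.size())) {
            throw std::out_of_range("range was not prefetched");
        }
        return it->second.substr(static_cast<size_t>(rel), static_cast<size_t>(r.length));
    }

private:
    std::map<int64_t, std::string> cache_;
};
```

With the sequential reader, reading offset 4 and then offset 0 throws; the cached reader returns the same bytes for a repeated read of offset 4, which is the readback that late materialization needs.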
… of orc-reader. (apache#51102)

### What problem does this PR solve?
Related PR: apache#45966

Fix unsorted merge ranges in the new merge IO facility of orc-reader. The ranges taken from `std::unordered_map<orc::StreamId, io::PrefetchRange>& ranges` are not sorted by offset, so merging adjacent ranges has very little effect.
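A minimal sketch of the fix's idea: copy the ranges out of the unordered map (whose iteration order is unspecified), sort them by start offset, and only then coalesce neighbours. `StreamId` and `PrefetchRange` below are simplified stand-ins for `orc::StreamId` and `io::PrefetchRange`, and `sort_and_merge` and its `max_gap` threshold are hypothetical, not the actual Doris code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for orc::StreamId / io::PrefetchRange.
using StreamId = int;
struct PrefetchRange {
    int64_t start_offset;
    int64_t end_offset; // exclusive
};

// Sort ranges by start offset, then merge neighbours whose gap is <= max_gap.
std::vector<PrefetchRange> sort_and_merge(
        const std::unordered_map<StreamId, PrefetchRange>& ranges, int64_t max_gap) {
    std::vector<PrefetchRange> sorted;
    sorted.reserve(ranges.size());
    for (const auto& entry : ranges) {
        assert(entry.second.start_offset <= entry.second.end_offset);
        sorted.push_back(entry.second);
    }
    // Without this sort, neighbours in file order may never be adjacent here.
    std::sort(sorted.begin(), sorted.end(), [](const PrefetchRange& a, const PrefetchRange& b) {
        return a.start_offset < b.start_offset;
    });

    std::vector<PrefetchRange> merged;
    for (const PrefetchRange& r : sorted) {
        if (!merged.empty() && r.start_offset - merged.back().end_offset <= max_gap) {
            // Close enough (or overlapping): extend the previous merged range.
            merged.back().end_offset = std::max(merged.back().end_offset, r.end_offset);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

For ranges [0, 50), [55, 90), [100, 150) and [1000, 1010) with `max_gap = 10`, the first three coalesce into [0, 150) and the last stays separate, regardless of the order in which the map yields them.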
…filtered by row group stats, despite stripe stats remaining unfiltered. (apache#50185)

Related PR: apache/doris-thirdparty#306

Problem Summary:
When all row groups of a stripe are filtered out by row group stats even though the stripe itself passes the stripe stats, the stream map is not cleared, which causes incorrect data to be read and fails with:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = (172.20.32.136)[INTERNAL_ERROR]Orc row reader nextBatch failed. reason = Buffer error in ZlibDecompressionStream::NextDecompress.
```
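A hypothetical sketch of the invariant this fix restores: per-stripe stream state is reset before any early exit, so a stripe whose row groups are all filtered out cannot leave the previous stripe's streams behind. `StripeStreamState` and `prepare_stripe` are illustrative names only; this is not the Doris or ORC reader code, and it does not claim to reproduce the exact code path that was fixed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-stripe reader state: stream id -> stream bytes.
class StripeStreamState {
public:
    // Returns true if the stripe still has rows to read after row-group filtering.
    bool prepare_stripe(const std::vector<bool>& row_group_selected,
                        const std::unordered_map<int, std::string>& stripe_streams) {
        // Reset first: if this happened only after the early return below,
        // streams from the previous stripe would survive and be decoded as if
        // they belonged to this stripe.
        streams_.clear();
        const bool any_selected = std::any_of(row_group_selected.begin(),
                                              row_group_selected.end(),
                                              [](bool selected) { return selected; });
        if (!any_selected) {
            return false; // stripe passed stripe stats, but every row group was filtered
        }
        // Hypothetical invariant: a stripe with selected row groups carries streams.
        assert(!stripe_streams.empty());
        streams_ = stripe_streams;
        return true;
    }

    const std::unordered_map<int, std::string>& streams() const { return streams_; }

private:
    std::unordered_map<int, std::string> streams_;
};
```

After a readable stripe followed by a fully filtered one, `streams()` is empty rather than still holding the first stripe's buffers.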
morrySnow approved these changes on Jun 25, 2025.
Cherry-pick of main PR #45966, together with bug-fix PRs #50185 and #51102.