[Feature](orc-reader) Implement new merge io facility for orc reader. #52085

Merged
merged 3 commits into from
Jun 25, 2025

kaka11chen
Contributor

@kaka11chen kaka11chen commented Jun 20, 2025

Cherry-pick of main PR #45966 and bug-fix PRs #50185 and #51102.

@kaka11chen kaka11chen requested a review from morrySnow as a code owner June 20, 2025 12:24
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen kaka11chen marked this pull request as draft June 20, 2025 12:25
@kaka11chen kaka11chen force-pushed the cherry-pick-45966_3.1 branch 2 times, most recently from 2c2efee to 1881c74 Compare June 20, 2025 12:53
…apache#45966)

related: apache/doris-thirdparty#270

Problem Summary:

The original merge IO mechanism, `MergeRangeFileReader`, requires ranges
to be read in order. Out-of-order reads are not supported, so a range
that has already been passed cannot be read back.
With late materialization of ORC complex types enabled, the present
stream may need to be read back, for example in `select
struct_element(info, 'age'), id from test_orc_struct where
struct_element(info, 'name') = 'Alice'`.
Here the present stream of the parent node `info` is read first, together
with `name`. When `age` is read afterwards, the present stream of `info`
has to be read back. For this reason, late materialization of ORC complex
types cannot currently be enabled.
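
The readback problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ForwardOnlyReader` and `replay_late_materialization` are hypothetical names, the byte offsets are made up, and the class is not Doris's `MergeRangeFileReader`. It only models the "ranges must be read in order" constraint.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a forward-only reader. NOT Doris's MergeRangeFileReader,
// only an illustration of the "read in order" constraint.
class ForwardOnlyReader {
public:
    // Returns true if [offset, offset + len) can be served, i.e. it does
    // not start before the position that was already consumed.
    bool read(uint64_t offset, uint64_t len) {
        if (offset < cursor_) {
            return false; // readback: this range was already passed
        }
        cursor_ = offset + len;
        return true;
    }

private:
    uint64_t cursor_ = 0; // end of the last range that was served
};

// Replays the access pattern of the example query with made-up offsets.
// Returns true only if every read succeeds.
bool replay_late_materialization() {
    ForwardOnlyReader reader;
    bool ok = true;
    ok &= reader.read(0, 16);   // info.present, read for the filter on `name`
    ok &= reader.read(100, 64); // name data
    ok &= reader.read(0, 16);   // info.present again, needed for `age`: rejected
    ok &= reader.read(200, 64); // age data
    return ok;
}
```

In this model the third read is rejected because it starts before the cursor, which is the situation a reader that tolerates out-of-order ranges has to handle.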
@kaka11chen kaka11chen force-pushed the cherry-pick-45966_3.1 branch from 1881c74 to 4474baa Compare June 24, 2025 08:45
@kaka11chen
Contributor Author

run buildall

… of orc-reader. (apache#51102)

### What problem does this PR solve?

Related PR: apache#45966

Fix unsorted merge ranges in the new merge IO facility of the ORC reader.
The ranges taken from `std::unordered_map<orc::StreamId, io::PrefetchRange>& ranges` are not sorted, so merging adjacent ranges is largely ineffective.
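
As a rough illustration of why sorting matters, the sketch below collects ranges from an unordered map, sorts them by start offset, and then merges neighbours. It is not the code from this PR: `Range` is a hypothetical stand-in for `io::PrefetchRange`, the map is keyed by a plain string instead of `orc::StreamId`, and `merge_ranges` and `max_gap` are names assumed for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for io::PrefetchRange: a half-open byte range [start, end).
struct Range {
    uint64_t start;
    uint64_t end;
};

// Collects the ranges from an unordered map, sorts them by start offset,
// then merges neighbours whose gap is at most max_gap bytes.
// The key type is a string here; the real map is keyed by orc::StreamId.
std::vector<Range> merge_ranges(const std::unordered_map<std::string, Range>& ranges,
                                uint64_t max_gap) {
    std::vector<Range> sorted;
    sorted.reserve(ranges.size());
    for (const auto& kv : ranges) {
        assert(kv.second.start <= kv.second.end);
        sorted.push_back(kv.second);
    }
    // Iteration order of an unordered_map is unspecified, so sort first.
    std::sort(sorted.begin(), sorted.end(),
              [](const Range& a, const Range& b) { return a.start < b.start; });

    std::vector<Range> merged;
    for (const Range& r : sorted) {
        if (!merged.empty() && r.start <= merged.back().end + max_gap) {
            // Adjacent or close enough: extend the previous merged range.
            merged.back().end = std::max(merged.back().end, r.end);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

Without the sort, a single linear pass like this only merges ranges that happen to be neighbours in the map's iteration order, so most adjacent ranges would stay separate.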
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen marked this pull request as ready for review June 24, 2025 09:18
…filtered by row group stats, despite stripe stats remaining unfiltered. (apache#50185)

Related PR: apache/doris-thirdparty#306

Problem Summary:
When all row groups are filtered out by row group stats while the stripe
itself is not filtered out by stripe stats, the stream map is not
cleared, which caused incorrect data to be read:

```
ERROR 1105 (HY000): errCode = 2, detailMessage = (172.20.32.136)[INTERNAL_ERROR]Orc row reader nextBatch failed. reason = Buffer error in ZlibDecompressionStream::NextDecompress.
```
@kaka11chen kaka11chen force-pushed the cherry-pick-45966_3.1 branch from 2d73404 to 2855cb7 Compare June 24, 2025 11:57
@kaka11chen
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 20.85% (44/211) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 41.39% (11020/26628) |
| Line Coverage | 32.23% (94577/293477) |
| Region Coverage | 31.30% (48733/155709) |
| Branch Coverage | 27.96% (25137/89912) |

@morrySnow morrySnow merged commit 2684cc0 into apache:branch-3.1 Jun 25, 2025
20 of 22 checks passed
4 participants