Fix reading from large Parquet files
This is a fix for #2730. When
merging small reads, if the first range and second range are more than 2
GB apart, mergeAdjacentDiskRanges() throws an ArithmeticException because
the span of those two ranges is too big to fit in a DiskRange. The correct
behavior is to not merge those ranges, because this implies the ranges
are farther apart than maxReadSizeBytes.
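
The failure mode described above can be reproduced in isolation. The sketch below is not the project's DiskRange class: `Range` is a hypothetical stand-in that assumes a `long` offset, an `int` length, and a `span()` that narrows the spanned length with a checked conversion (`Math.toIntExact`). Under those assumptions, spanning two ranges more than 2 GB apart throws rather than silently wrapping.

```java
public class SpanOverflowDemo {
    // Hypothetical stand-in for DiskRange: a long offset plus an int length.
    static final class Range {
        final long offset;
        final int length;

        Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        long getEnd() {
            return offset + length;
        }

        // Smallest range covering both inputs. The narrowing to int is
        // assumed to be checked, so it throws ArithmeticException when the
        // spanned length exceeds Integer.MAX_VALUE.
        Range span(Range other) {
            long start = Math.min(offset, other.offset);
            long end = Math.max(getEnd(), other.getEnd());
            return new Range(start, Math.toIntExact(end - start));
        }
    }

    // Returns true when spanning the two ranges overflows the int length.
    static boolean spanOverflows(Range a, Range b) {
        try {
            a.span(b);
            return false;
        }
        catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Range first = new Range(0, 1024);
        Range near = new Range(4096, 1024);
        Range far = new Range(3L * 1024 * 1024 * 1024, 1024); // starts 3 GB in

        System.out.println("near overflows: " + spanOverflows(first, near));
        System.out.println("far overflows: " + spanOverflows(first, far));
    }
}
```
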
awishnick authored and martint committed Mar 19, 2020
1 parent 7d20860 commit 6a590dc
Showing 1 changed file with 9 additions and 2 deletions.
@@ -241,8 +241,15 @@ public static List<DiskRange> mergeAdjacentDiskRanges(Collection<DiskRange> disk
         DiskRange last = ranges.get(0);
         for (int i = 1; i < ranges.size(); i++) {
             DiskRange current = ranges.get(i);
-            DiskRange merged = last.span(current);
-            if (merged.getLength() <= maxReadSizeBytes && last.getEnd() + maxMergeDistanceBytes >= current.getOffset()) {
+            DiskRange merged = null;
+            boolean blockTooLong = false;
+            try {
+                merged = last.span(current);
+            }
+            catch (ArithmeticException e) {
+                blockTooLong = true;
+            }
+            if (!blockTooLong && merged.getLength() <= maxReadSizeBytes && last.getEnd() + maxMergeDistanceBytes >= current.getOffset()) {
                 last = merged;
             }
             else {
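
For comparison, the same guard can be expressed without catching an exception, by computing the spanned length in `long` arithmetic before any range is constructed. This is an alternative sketch, not the code in this commit. `Range`, `merge`, and the class name are hypothetical; the input is assumed to be sorted by offset; and the body of the `else` branch (emit `last`, continue from `current`) is an assumption, since the hunk above is cut off at that point.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeRangesSketch {
    // Hypothetical stand-in for DiskRange: a long offset plus an int length.
    static final class Range {
        final long offset;
        final int length;

        Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        long getEnd() {
            return offset + length;
        }
    }

    // Merges neighbouring ranges when the merged read stays within
    // maxReadSizeBytes and the gap is at most maxMergeDistanceBytes.
    // Assumes the input is sorted by offset.
    static List<Range> merge(List<Range> sorted, long maxMergeDistanceBytes, long maxReadSizeBytes) {
        List<Range> result = new ArrayList<>();
        if (sorted.isEmpty()) {
            return result;
        }
        // A merged range must also fit in an int length.
        long lengthLimit = Math.min(maxReadSizeBytes, Integer.MAX_VALUE);

        Range last = sorted.get(0);
        for (int i = 1; i < sorted.size(); i++) {
            Range current = sorted.get(i);
            long start = Math.min(last.offset, current.offset);
            long end = Math.max(last.getEnd(), current.getEnd());
            long mergedLength = end - start; // long arithmetic, cannot overflow int

            if (mergedLength <= lengthLimit && last.getEnd() + maxMergeDistanceBytes >= current.offset) {
                last = new Range(start, (int) mergedLength); // safe: bounded by lengthLimit
            }
            else {
                // Assumed behaviour: emit the finished range and start a new one.
                result.add(last);
                last = current;
            }
        }
        result.add(last);
        return result;
    }

    public static void main(String[] args) {
        List<Range> input = new ArrayList<>();
        input.add(new Range(0, 1024));
        input.add(new Range(2048, 1024));
        input.add(new Range(3L * 1024 * 1024 * 1024, 1024)); // more than 2 GB away

        for (Range r : merge(input, 1024 * 1024, 8 * 1024 * 1024)) {
            System.out.println(r.offset + " +" + r.length);
        }
    }
}
```

The try/catch in the commit keeps the change local to the call site; the variant here trades that for duplicating the span arithmetic outside the range class.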