Fix rare crash when transforming sliced nested arrays #3171
This fixes a hard-to-find bug that caused transformation operators accessing specific nested arrays to produce oversized arrays when used after a slicing operation. This most commonly happened after `head` and `tail`, but could also occur after `where` or the hidden `rebatch` operator.

When we slice an array, we don't actually modify the array itself. Instead, we just modify the validity bitmap and adjust the internal offset and length values of the array. There were two related problems, one being a misunderstanding and the other a bug in Apache Arrow:

1. Use of `arrow::StructArray::Flatten()` instead of `arrow::StructArray::fields()`. The former performs a logical AND operation on the null bitmaps of the struct array and the nested array.
2. The `arrow::FieldPath::Get(...)` function returns an array pointing into the passed array or batch. The function recreates the array from the underlying child array data, thus returning the original nested array, as if the outer array were not sliced at all.

This PR fixes these two issues and, as a bonus, a bug that caused `tail` to return too many events when its input consisted of many very small slices.
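
To illustrate how a sliced outer array can yield an oversized nested array, here is a minimal toy model in plain C++. It does not use Arrow; `ToyInt64Array`, `ToyStructArray`, `make_struct`, `child_ignoring_slice`, and `child_respecting_slice` are hypothetical names made up for this sketch. It assumes that a slice is only an offset and a length on the outer array, while the nested child keeps describing all rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy model, not Arrow's real classes: an array is a view (offset, length)
// onto shared storage that slicing never copies or modifies.
struct ToyInt64Array {
  std::shared_ptr<const std::vector<std::int64_t>> values;
  std::size_t offset;
  std::size_t length;

  // Element access relative to the view's own offset.
  std::int64_t at(std::size_t i) const { return (*values)[offset + i]; }
};

// Toy struct array with a single nested child. The child is stored unsliced;
// only the parent's offset and length change when slicing.
struct ToyStructArray {
  ToyInt64Array child;
  std::size_t offset;
  std::size_t length;

  // Assumes off + len <= length; no bounds checking in this sketch.
  ToyStructArray slice(std::size_t off, std::size_t len) const {
    return ToyStructArray{child, offset + off, len};
  }
};

// Slice-unaware access: hands back the stored child as-is, so the result has
// as many rows as the original, unsliced array.
ToyInt64Array child_ignoring_slice(const ToyStructArray& s) {
  return s.child;
}

// Slice-aware access: applies the parent's offset and length to the child.
ToyInt64Array child_respecting_slice(const ToyStructArray& s) {
  return ToyInt64Array{s.child.values, s.child.offset + s.offset, s.length};
}

// Hypothetical helper that builds an unsliced struct array over `data`.
ToyStructArray make_struct(std::vector<std::int64_t> data) {
  const std::size_t n = data.size();
  std::shared_ptr<const std::vector<std::int64_t>> storage
    = std::make_shared<std::vector<std::int64_t>>(std::move(data));
  return ToyStructArray{ToyInt64Array{storage, 0, n}, 0, n};
}
```

In this model, taking `make_struct({10, 20, 30, 40, 50}).slice(3, 2)` gives a two-row outer array. `child_ignoring_slice` still returns a five-row child, which mirrors the oversized arrays described above, while `child_respecting_slice` returns two rows starting at the value 40.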