We implement our `compare` binary comparison operation using the arrow_ord kernels. These use the f32/f64 `total_cmp` ordering, which treats NaN as >= all positive normal numbers (a positive NaN sorts above every other value, including +infinity).
This causes inconsistent behavior when a comparison against NaN is pushed down to a file column that contains NaN.
Directly evaluating the following comparison results in `[false, false, true, true]`:

    PrimitiveArray([1.0, 2.0, NaN, NaN]) >= f32::NAN

However, the result differs when the filter is pushed down into a scan, because our min/max stats do not consider NaN. The pruning expression that gets pushed down is

    $.max < f32::NAN --> 2.0 < f32::NAN = true

which causes the whole array to get pruned, so the result becomes `[false, false, false, false]`. The stats skip NaN here:

    fn compute_min_max<'a, T>(iter: impl Iterator<Item = &'a T>, dtype: &DType) -> Option<MinMaxResult>
    where
        T: Into<ScalarValue> + NativePType,
    {
        // `total_compare` function provides a total ordering (even for NaN values).
        // However, we exclude NaNs from min max as they're not useful for any purpose where min/max would be used

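The mismatch can be reproduced with plain `std` floats, without any of the project's types. The following is a minimal sketch, not the actual kernel or stats code: `ge_total`, `max_ignoring_nan`, and `can_prune_ge` are hypothetical helper names, and the sketch assumes `f32::NAN` has a positive sign bit (as it does in practice) so that it sorts above every other value under `total_cmp`.

```rust
use std::cmp::Ordering;

/// Element-wise `value >= rhs` under the total order of `f32::total_cmp`.
fn ge_total(values: &[f32], rhs: f32) -> Vec<bool> {
    values
        .iter()
        .map(|v| v.total_cmp(&rhs) != Ordering::Less)
        .collect()
}

/// Maximum over the non-NaN values only, mirroring stats that skip NaN.
fn max_ignoring_nan(values: &[f32]) -> Option<f32> {
    values
        .iter()
        .copied()
        .filter(|v| !v.is_nan())
        .max_by(|a, b| a.total_cmp(b))
}

/// Pruning rule for `col >= rhs`: skip the chunk when `max < rhs`.
/// A missing max never prunes (an assumption made for this sketch).
fn can_prune_ge(max: Option<f32>, rhs: f32) -> bool {
    max.map_or(false, |m| m.total_cmp(&rhs) == Ordering::Less)
}
```

For `[1.0, 2.0, NaN, NaN]`, `ge_total` against `f32::NAN` marks the two NaN rows as matches, while `can_prune_ge` on the NaN-free max of `2.0` reports that the chunk can be skipped. The two paths disagree on the same data.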
We need to either:
- Make the stats contain NaN, and preserve the `total_cmp` ordering in pushdown, or
- Have a fallback `compare` kernel for float arrays with NaNCount > 0 that masks out the NaNs to false
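Both options can be sketched on plain slices to check that each one makes direct evaluation and pruning agree. This is only an illustration under assumptions: `max_with_nan`, `prunable_ge`, and `ge_nan_masked` are hypothetical names, not existing functions in the codebase, and `f32::NAN` is assumed to carry a positive sign bit.

```rust
use std::cmp::Ordering;

/// Option 1: max under `total_cmp` with NaN included in the stats.
fn max_with_nan(values: &[f32]) -> Option<f32> {
    values.iter().copied().max_by(|a, b| a.total_cmp(b))
}

/// Pruning rule for `col >= rhs`, evaluated under the same `total_cmp` order.
fn prunable_ge(max: Option<f32>, rhs: f32) -> bool {
    max.map_or(false, |m| m.total_cmp(&rhs) == Ordering::Less)
}

/// Option 2: compare as before, but force the result to false
/// wherever the value itself is NaN.
fn ge_nan_masked(values: &[f32], rhs: f32) -> Vec<bool> {
    values
        .iter()
        .map(|v| !v.is_nan() && v.total_cmp(&rhs) != Ordering::Less)
        .collect()
}
```

With option 1, the max of `[1.0, 2.0, NaN, NaN]` is NaN, so `max < f32::NAN` is false and the chunk is kept, matching the `[false, false, true, true]` result of direct evaluation. With option 2, direct evaluation against `f32::NAN` yields all false, which matches the pruned result while leaving the NaN-free stats unchanged.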
The `compute_min_max` excerpt above is from vortex/vortex-array/src/arrays/primitive/compute/min_max.rs, lines 43 to 48 in bba0e63.