@danking commented Nov 8, 2024

The code as written fails when the "other" `[min, max]` interval contains more than one value (i.e. `min < max`).

Call the left-hand side's interval `[lmin, lmax]` and the right-hand side's interval `[rmin, rmax]`. The current expression is:

    (lmin == rmin) && (rmin == lmax)
    ||
    (lmin == rmax) && (rmax == lmax)
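To make the discussion concrete, here is the expression above transcribed into Python over the four bounds. The function name is hypothetical and the actual pruning code is not shown in this thread; returning `True` stands for "skip this chunk when evaluating `lhs != rhs`".

```python
def current_can_prune_not_eq(lmin: int, lmax: int, rmin: int, rmax: int) -> bool:
    """Literal transcription of the current expression (hypothetical name).

    [lmin, lmax] bounds the left-hand side (the column) in one chunk and
    [rmin, rmax] bounds the right-hand side. True means the chunk is skipped.
    """
    # Python's `and` binds tighter than `or`, matching `&&` and `||`.
    return (lmin == rmin and rmin == lmax) or (lmin == rmax and rmax == lmax)


# Constant column equal to a literal: `!=` is false on every row, so skipping is right.
print(current_can_prune_not_eq(5, 5, 5, 5))  # True
# Column spanning [1, 9]: some row may differ from 5, so the chunk is kept.
print(current_can_prune_not_eq(1, 9, 5, 5))  # False
```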

By transitivity, `lmin == lmax` holds in either conjunction. That means the chunk is pruned only when the column has a single known value: `lmin` (equivalently, `lmax`).
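An exhaustive check over small integer intervals illustrates (but does not prove) the transitivity argument: every bound combination for which the expression holds has a point-sized column interval. The helper names are hypothetical.

```python
from itertools import product


def current_can_prune_not_eq(lmin, lmax, rmin, rmax):
    # The expression under discussion, transcribed literally.
    return (lmin == rmin and rmin == lmax) or (lmin == rmax and rmax == lmax)


def intervals(lo, hi):
    # Every integer interval [a, b] with lo <= a <= b <= hi.
    return [(a, b) for a in range(lo, hi + 1) for b in range(a, hi + 1)]


def pruning_cases(lo=0, hi=4):
    # All (lmin, lmax, rmin, rmax) combinations for which the expression holds.
    return [
        (lmin, lmax, rmin, rmax)
        for (lmin, lmax), (rmin, rmax) in product(intervals(lo, hi), repeat=2)
        if current_can_prune_not_eq(lmin, lmax, rmin, rmax)
    ]


cases = pruning_cases()
# In every pruning case the column interval collapses to a single value.
assert all(lmin == lmax for lmin, lmax, _, _ in cases)
print(len(cases) > 0)  # True
```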

If the right-hand side is only ever a literal or an expression over literals, its interval is always `[x, x]`. In that case, this expression is tantamount to asking: is the column constant and equal to `x`?
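Substituting `rmin = rmax = x` makes both conjunctions identical, which the following sketch confirms on small integer bounds. Both function names are hypothetical.

```python
from itertools import product


def current_can_prune_not_eq(lmin, lmax, rmin, rmax):
    # The expression under discussion, transcribed literally.
    return (lmin == rmin and rmin == lmax) or (lmin == rmax and rmax == lmax)


def column_is_constant_and_equal(lmin, lmax, x):
    # "Is the column constant and equal to x?"
    return lmin == lmax == x


# A literal right-hand side has the point interval [x, x].
agree = all(
    current_can_prune_not_eq(lmin, lmax, x, x)
    == column_is_constant_and_equal(lmin, lmax, x)
    for lmin, lmax, x in product(range(5), repeat=3)
    if lmin <= lmax
)
print(agree)  # True
```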

If the right-hand side is either another column or some non-deterministic expression, its interval could be, for example, `[10, 11]`. If the column is known to be the constant value `10` (i.e. its min and max are both `10`), the first conjunction `(lmin == rmin) && (rmin == lmax)` holds, so the current expression prunes the chunk. But we cannot prune this chunk! The rows where the right-hand side is `11` satisfy the inequality.
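The counterexample can be checked directly. Below, `current_can_prune_not_eq` transcribes the current expression and `sound_can_prune_not_eq` is one conservative alternative, assuming non-null values: skip only when both intervals are the same single point. Both names are hypothetical, and the alternative is a sketch of a correct condition, not necessarily the fix merged in this PR.

```python
def current_can_prune_not_eq(lmin, lmax, rmin, rmax):
    # The expression under discussion, transcribed literally.
    return (lmin == rmin and rmin == lmax) or (lmin == rmax and rmax == lmax)


def sound_can_prune_not_eq(lmin, lmax, rmin, rmax):
    # Ignoring nulls, `lhs != rhs` is false on every row whenever both sides are
    # the same constant, so skipping the chunk cannot drop a matching row.
    return lmin == lmax == rmin == rmax


# Column constant 10; right-hand side (e.g. another column) ranges over [10, 11].
column = [10, 10, 10]
rhs = [10, 11, 10]
rows_matching = [i for i, (a, b) in enumerate(zip(column, rhs)) if a != b]

lmin, lmax, rmin, rmax = min(column), max(column), min(rhs), max(rhs)
print(current_can_prune_not_eq(lmin, lmax, rmin, rmax))  # True: chunk would be skipped
print(rows_matching)                                     # [1]: yet row 1 satisfies `!=`
print(sound_can_prune_not_eq(lmin, lmax, rmin, rmax))    # False: chunk is kept
```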

I'm also open to `(lmin == lmax) && (lmin == rmin)`, with a comment indicating that, because we only have literals, `rmin` is necessarily equal to `rmax` and thus `rmin` _is_ the known right-hand-side value.
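A minimal sketch of that alternative, with the invariant the suggested comment would state (`rmin == rmax` for a literal) also encoded as an `assert`. The function name and the runtime check are illustrative choices, not taken from the codebase.

```python
def can_prune_not_eq_literal(lmin, lmax, rmin, rmax):
    """Alternative form, valid only when the right-hand side is a literal.

    A literal's interval is the point [x, x], so rmin == rmax and rmin is
    the known right-hand-side value.
    """
    assert rmin == rmax, "right-hand side must be a literal"
    return lmin == lmax and lmin == rmin


print(can_prune_not_eq_literal(7, 7, 7, 7))  # True: every row is 7, so `!= 7` never holds
print(can_prune_not_eq_literal(7, 8, 7, 7))  # False: some row may be 8
```

Passing a non-point interval such as `(10, 10, 10, 11)` raises `AssertionError`, which surfaces the misuse instead of silently pruning.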

@danking requested a review from robert3005 November 8, 2024 18:21
@danking marked this pull request as ready for review November 8, 2024 18:21

@robert3005 commented

One side in this transformation is a literal; the other can be an arbitrary expression. I was basically trying to prune `a != b`, but maybe that's not that useful.

@robert3005 commented

One thing I didn't add is simple evaluation tests. We should have a simple table to make sure the filtered table is what we expect. This can happen independently of this PR.
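A minimal sketch of such an evaluation test, assuming a toy table model (one integer column stored as a list of chunks, with per-chunk min/max stats) rather than the project's actual table and statistics types. All names are hypothetical. The test checks that filtering `col != x` with pruning returns exactly what a brute-force filter returns.

```python
def chunk_stats(chunk):
    # Hypothetical per-chunk statistics: min and max of the column values.
    return min(chunk), max(chunk)


def can_prune_not_eq(lmin, lmax, rmin, rmax):
    # Skip a chunk only if `col != rhs` is false on every row.
    return lmin == lmax == rmin == rmax


def filter_not_eq(chunks, x, prune):
    """Return the values v with v != x, optionally skipping pruned chunks."""
    out = []
    for chunk in chunks:
        lmin, lmax = chunk_stats(chunk)
        if prune and can_prune_not_eq(lmin, lmax, x, x):
            continue  # chunk skipped without scanning its rows
        out.extend(v for v in chunk if v != x)
    return out


# A tiny table with one integer column, stored as three chunks.
chunks = [[5, 5, 5], [1, 5, 9], [7, 7]]
for x in (1, 5, 7, 9, 42):
    expected = [v for chunk in chunks for v in chunk if v != x]
    assert filter_not_eq(chunks, x, prune=True) == expected
print(filter_not_eq(chunks, 5, prune=True))  # [1, 9, 7, 7]
```

Comparing against an unpruned scan keeps the test independent of how pruning is implemented: any predicate that skips a chunk containing a matching row makes the assertion fail.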

@lwwmanning merged commit 6c1d328 into develop Nov 8, 2024
@lwwmanning deleted the dk/prune-not-eq-refinement branch November 8, 2024 20:45