New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix crash when partializing agg with HAVING #1544
Fix crash when partializing agg with HAVING #1544
Conversation
fac4cf0
to
cfec1c4
Compare
Codecov Report
@@ Coverage Diff @@
## master #1544 +/- ##
==========================================
+ Coverage 91.59% 91.91% +0.31%
==========================================
Files 144 142 -2
Lines 21949 21005 -944
==========================================
- Hits 20104 19306 -798
+ Misses 1845 1699 -146
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm but want to have @svenklemm take a look too
d83c9ce
to
96f9efe
Compare
"target list."))); | ||
|
||
partialize_agg_paths(input_rel); | ||
partialize_agg_paths(output_rel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should only be called on either input_rel or output_rel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about that, since it is reasonable that only the output_rel
would actually have any AggRefs
given that this is the rel that is doing the aggregation. But I am just following what was there before (although I moved this code into a function to minimize duplication). Would be good to get some input from the person that wrote this code (looks like @cevian or @gayyappan according to git blame).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I changed this to only act on the output_rel
and also put a guard in planner.c to only call this in case of UPPERREL_GROUP_AGG
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JLockerman could you comment on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gayyappan I'd defer to sven on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look @JLockerman. looped you in as you wrote the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to try and clarify the reasoning behind this change according to how I understand this code.
The input_rel
is the rel that something was computed from and output_rel
is the result. Since we want to partialize aggregates, we should only be interested in the stage that applies aggregations to an input_rel
to produce an aggregate output_rel
, i.e., the UPPERREL_GROUP_AGG
stage. In this case, the input_rel
is typically a base or join rel (no aggregates) and the output_rel
the result of applying an aggregate on top of the base/join rel in input_rel
.
In any later stage (after aggregates), the input_rel
would already have aggregates applied (which we partialized in a previous call to this function), and no aggregates are computed in those stages. Therefore, there should never be a case where we'd have to partialize an input_rel
(unless we somehow missed to do it on the output_rel
in a previous step).
Please let me know if I am missing something.
96f9efe
to
eff2f0c
Compare
eff2f0c
to
e3bd03e
Compare
This change fixes an assertion-based crash that happened when using the `partialize_agg` function together with HAVING clauses. For instance, ``` SELECT time_bucket('1 day', time), device, __timescaledb_internal.partialize_agg(avg(temp)) GROUP BY 1, 2 HAVING avg(temp) > 3; ``` would crash because the HAVING clause's aggregate didn't have its `Aggref` node set to partial aggregate mode. Regular partial aggregations executed by the planner (i.e., those not induced by the `partialize_agg` function) have their HAVING aggs transparently moved to the target list during planning so that the finalize node can use it when applying the final filter on the resulting groups. However, it doesn't make much sense to transparently do that when partializing with `partialize_agg` since it would be odd to return more columns than requested by the user. Therefore, the caller would have to do that manually. This, in fact, is also done when materializing continuous aggregates. For this reason, HAVING clauses with `partialize_agg` are blocked, except in cases where the planner transparently reduces the HAVING expression to a simple filter (e.g., `HAVING device > 3`). Apart from fixing this issue, this change also refectors some of the related code and adds tests for some error cases.
e3bd03e
to
989a132
Compare
This change fixes an assertion-based crash that happened when using
the
partialize_agg
function together with HAVING clauses. For instance,would crash because the HAVING clause's aggregate didn't have its
Aggref
node set to partial aggregate mode.Regular partial aggregations executed by the planner (i.e., those not
induced by the
partialize_agg
function) have their HAVING aggstransparently moved to the target list during planning so that the
finalize node can use it when applying the final filter on the
resulting groups. However, it doesn't make much sense to transparently
do that when partializing with
partialize_agg
since it would be oddto return more columns than requested by the user. Therefore, the
caller would have to do that manually. This, in fact, is also done
when materializing continuous aggregates.
For this reason, HAVING clauses with
partialize_agg
are blocked,except in cases where the planner transparently reduces the HAVING
expression to a simple filter (e.g.,
HAVING device > 3
).Apart from fixing this issue, this change also refectors some of the
related code and adds tests for some error cases.