Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't add unneded null test in sorted query plan for minmax #6442

Merged
merged 1 commit into from
Jan 10, 2024

Conversation

akuzm
Copy link
Member

@akuzm akuzm commented Dec 18, 2023

If we use NULLS LAST, we don't need the null test. This saves some CPU.

Disable-check: force-changelog-file

Copy link

codecov bot commented Dec 18, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (5676ace) 87.36% compared to head (b93171a) 87.32%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6442      +/-   ##
==========================================
- Coverage   87.36%   87.32%   -0.04%     
==========================================
  Files         187      187              
  Lines       41895    41851      -44     
  Branches     9331     9312      -19     
==========================================
- Hits        36602    36548      -54     
- Misses       3624     3632       +8     
- Partials     1669     1671       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@akuzm akuzm marked this pull request as ready for review January 8, 2024 11:50
Copy link

github-actions bot commented Jan 8, 2024

@antekresic, @fabriziomello: please review this pull request.

Powered by pull-review

Comment on lines 632 to 636
/*
* Build "sort IS NOT NULL" expression. Not that target can still be NULL.
* We don't need it if the order is NULLS LAST.
*/
if (nulls_first)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dependent on if descending or ascending order is used. Do we have tests for all four combinations of ASC/DESC and NULLS FIRST/LAST?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to hit the desc nulls last case. ts_preprocess_first_last_aggregates tries the desc nulls first ordering first, and it always succeeds (no reason why it would not, because it just builds a sorted path). I tried to address this as well by trying both paths and making a cost-based decision, but Postgres has zero cost for is null clauses, so it doesn't work. I decided to stick to the most simple change in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you cannot hit this case, can you add an Assert (or Ensure) to make sure that we do not sort this incorrectly in some yet-to-be-found case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really want to change the ts_preprocess_first_last_aggregates, because it's an old function that is a copy of Postgres' preprocess_minmax_aggregates. As I mentioned, I tried to switch it to the cost-based decision, and it didn't work well because of the limitations of the PG cost model. If we limit the changes in this PR to this single if statement that it's adding, do you think it has correctness problems by itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the coverage problem that I mentioned exists in the current TimescaleDB code. I don't want to tacke it because it's a different change that is potentially problematic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I have understood something wrong, but this discussion is based on that you are doing a refactoring to avoid pruning NULL values when not necessary, which would be when you want the first value in an ascending order (with null values sorted after non-null values) and when you want the last value in descending order (with null values sorted before non-null values).

AFAICT, the current code will remove the IS NOT NULL when nulls_first is true, which is when reverse in the calling function is true, which in turn is when the ordering operator is ">" (descending order). This is correct when you want the last value (because null values are treated as larger than other values), but it is not correct when you want the first value (because then you would get a null value where you in the previous code would get a non-null value). The opposite holds when the ordering operator is "<" (ascending order).

If you can either explain to me where I misunderstood the change, add tests to make sure that we cannot pick the wrong value, or add assertions that capture a potential case where we could have picked the wrong value, it would be great.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path always takes the first value (LIMIT 1). NULLS LAST is independent of DESC/ASC, it means that nulls go last, so in this case we don't have to filter out the null values. Like this:

test=# select * from (values (1), (2), (null)) t(x) order by x nulls last;
 x 
───
 1
 2
 ¤

test=# select * from (values (1), (2), (null)) t(x) order by x desc nulls last;
 x 
───
 2
 1
 ¤

nulls last is specified in SortGroupClause explicitly, independent of ascending/descending order. So it doesn't matter if the sorting is desc or asc, for the purpose of removing the null test we only have to check if nulls go first or last.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coverage problem is that the calling code currently doesn't exercise the combinations of desc nulls last and asc nulls first, so effectively this optimization is only applied for asc nulls last, i.e. the first function. But it is still correct for desc nulls last.

Copy link
Member

@svenklemm svenklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the column is a partitioning column we can always skip the NULL check independent of sort attributes as those have NOT NULL constraint

src/planner/agg_bookend.c Outdated Show resolved Hide resolved
@akuzm
Copy link
Member Author

akuzm commented Jan 10, 2024

If the column is a partitioning column we can always skip the NULL check independent of sort attributes as those have NOT NULL constraint

Yeah, I also thought about this, but it is not very straightforward, let's merge this simple optimization for now.

If we use NULLS LAST, we don't need the null test. This saves some CPU.
@akuzm akuzm enabled auto-merge (rebase) January 10, 2024 14:40
@akuzm akuzm merged commit 9c75b03 into timescale:main Jan 10, 2024
40 of 41 checks passed
@akuzm akuzm deleted the nulls-last branch January 10, 2024 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants