Speed up chunk search by restriction clauses #4492
Conversation
For reviewing purposes, it would be good to understand what change introduced the regression.
Codecov Report

@@           Coverage Diff            @@
##             main    #4492    +/-   ##
==========================================
+ Coverage   90.82%   90.85%   +0.03%
==========================================
  Files         224      224
  Lines       42344    41654     -690
==========================================
- Hits        38457    37843     -614
+ Misses       3887     3811      -76

Continue to review full report at Codecov.
I updated the description.
Looks like this does improve some queries, although not all of them.
src/hypertable_restrict_info.c
Outdated

@@ -524,6 +549,18 @@ gather_restriction_dimension_vectors(const HypertableRestrictInfo *hri)
				open->lower_strategy,
				open->lower_bound);

/*
 * If we have a nontrivial condition for the second index column
What does nontrivial mean exactly?
Just any restrictions, actually. We represent the restrictions as intervals, and "no restriction" corresponds to the (-inf, +inf) interval, which is what I meant by "trivial". I'll reword the comment to make it clearer.
Hmm, that is weird. Why do you need a backwards scan for constraints on the 2nd index column? Scan direction shouldn't matter for that.
If you have <= for the 2nd column and the scan direction is forward, you can't use the 2nd column to find the starting point for the scan. So you use only the first column and go through the rest of the values sequentially. For a backwards scan, you know where the first record is that matches both the conditions on the 1st and the 2nd columns.
More about this in postgres code: https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/nbtsearch.c#L928
We have this optimization in several other places, so I decided to add it here as well.
src/hypertable_restrict_info.c
Outdated

 * starting point.
 * If not, prefer forward direction, because backwards scan is
 * slightly slower for some reason.
 * Ideally we need some other index type than btree for this.
Why do you need a non-btree index here? What would we gain from another index type?
The btree index is not so suitable for queries like "find the interval that contains the given point" and "find the intervals that intersect with the given interval". I don't know what it should be; we also have this dimension id column on which we should filter first.
I mean, in all the queries we issue, the third column (range_end) is only compared during a sequential scan over index entries, not used to find the starting/ending point.
Looks ok, but I left some comments you might want to review.
src/hypertable_restrict_info.c
Outdated

 * Filter out the dimensions for which we don't have a nontrivial
 * restriction.
It would be good, for the reader's understanding, to add why this is useful in addition to what is being done. Also, what is meant by "filter out"? Are the restrictions that are "filtered out" those we'd like to use or those we'd like to ignore?
For example:

- * Filter out the dimensions for which we don't have a nontrivial
- * restriction.
+ * Remove restrictions that "select" everything in their dimension, since
+ * they don't further restrict the result and instead increase scan time.
Btw, do we need to include "trivial" restrictions in HypertableRestrictInfo to begin with? Why not exclude them from the beginning so that we need not filter them out later?
Btw, do we need to include "trivial" restrictions in HypertableRestrictInfo to begin with? Why not exclude them from the beginning so that we need not filter them out later?
Originally I just didn't want to change the algorithm too much, because it would require different code for initializing the dimension restrict infos on demand.
But now I think this approach also has value for queries where there are multiple conditions on a dimension that end up adding up to the entire dimension. This isn't likely to arise in simple hand-written queries, but it is more likely in auto-generated queries or more complex queries that use views. So we can consider this approach of "removing the empty ones at the end" to be an additional optimization.
Force-pushed a932008 to 09fea64 (compare)
We don't have to look up the dimension slices for dimensions for which we don't have restrictions. Also sort chunks by ids before looking up the metadata, because this gives more favorable table access patterns (closer to sequential). This fixes a planning time regression introduced in 2.7.
Force-pushed 09fea64 to 87dfc27 (compare)
We don't have to look up the dimension slices for dimensions for which
we don't have restrictions.
This also fixes a planning time regression introduced in 2.7, specifically in commit 37190e8a8#diff-e488f45c83647e5d9415d438f2e79eb2780639b7888c7533890b6c577e02d03dL852-L857
There, we stopped using the fast path for the case where there are no restrictions, and the normal path turned out to be slower. This commit speeds it back up to the previous level.
The lookup logic this PR introduces is basically the same as the one introduced for chunk_point_find_chunk_id by #4390. The difference is that here the conditions may be underspecified, so multiple chunks match.