Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for fast pruning of inlined functions #1877

Merged
merged 1 commit into from Jun 4, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
28 changes: 28 additions & 0 deletions src/planner.c
Expand Up @@ -795,6 +795,33 @@ timescaledb_get_relation_info_hook(PlannerInfo *root, Oid relation_objectid, boo
switch (classify_relation(root, rel, &ht))
{
case TS_REL_HYPERTABLE:
{
#if PG12_GE
/* This only works for PG12 because for earlier versions the inheritance
* expansion happens too early during the planning phase
*/
RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
Query *query = root->parse;
/* Mark hypertable RTEs we'd like to expand ourselves.
* Hypertables inside inlineable functions don't get marked during the query
* preprocessing step. Therefore we do an extra try here. However, we need to
* be careful for UPDATE/DELETE as Postgres (in at least version 12) plans them
* in a complicated way (see planner.c:inheritance_planner). First, it runs the
* UPDATE/DELETE through the planner as a simulated SELECT. It uses the results
* of this fake planning to adapt its own UPDATE/DELETE plan. Then it's planned
* a second time as a real UPDATE/DELETE, but with requiredPerms set to 0, as it
* assumes permission checking has been done already during the first planner call.
* We don't want to touch the UPDATE/DELETEs, so we need to check all the regular
* conditions here that are checked during preprocess_query, as well as the
* condition that rte->requiredPerms is not requiring UPDATE/DELETE on this rel.
*/
if (ts_guc_enable_optimizations && ts_guc_enable_constraint_exclusion && inhparent &&
rte->ctename == NULL && !IS_UPDL_CMD(query) && query->resultRelation == 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, since resultRelation == 0 for INSERT, UPDATE, and DELETE it seems like the check !IS_UPDL_CMD(query) is redundant.

You mention that Postgres runs the queries through the planner as simulated selects, but I cannot see where that happens and attaching a debugger and trying some queries does not allow me to produce a query where query->resultRelation == 0 and query->commandType != CMD_SELECT.

I have probably missed where this happens, but could you please point me to where Postgres runs the UPDATE or DELETE as a simulated SELECT or show me a statement that will trigger this situation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens in planner.c in Postgres source, around line 1300:

		/*
		 * Make a deep copy of the parsetree for this planning cycle to mess
		 * around with, and change it to look like a SELECT.  (Hack alert: the
		 * target RTE still has updatedCols set if this is an UPDATE, so that
		 * expand_partitioned_rtentry will correctly update
		 * subroot->partColsUpdated.)
		 */
		subroot->parse = copyObject(root->parse);

		subroot->parse->commandType = CMD_SELECT;
		subroot->parse->resultRelation = 0;

Indeed that makes !IS_UPDL_CMD(query) check redundant, because that'll always evaluate to true on v12. However, since I'd say Postgres applies a bit of a hack for planning update/delete's in v12, I think it's good to check anyway and not rely on the Postgres hack completely. If you disagree I'd be happy to remove it though.

Queries that trigger this behavior can actually be found in TimescaleDb tests already. For example, when running the tests without the extra ACL permission check, regressions come up. Here's one for an UPDATE, but a similar test exists for DELETE.

-- ConstraintAwareAppend NOT applied for UPDATE
EXPLAIN (costs off)
UPDATE "one_Partition"
SET series_1 = 8
WHERE series_1 IN (SELECT series_1 FROM "one_Partition" WHERE series_1 > series_val());
                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Update on _hyper_1_1_chunk <<<<------- incorrect
   Update on _hyper_1_1_chunk
   Update on _hyper_1_2_chunk
   Update on _hyper_1_3_chunk
   ->  Hash Join
         Hash Cond: (_hyper_1_1_chunk.series_1 = "one_Partition".series_1)
         ->  Seq Scan on _hyper_1_1_chunk
         ->  Hash
               ->  HashAggregate
                     Group Key: "one_Partition".series_1
etc...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the pointer, it explained a lot.

From the code, it looks like commandType and resultRelation are both reset, so (as you point out) they are not both needed.

Being a little defensive is always good, but if all are checks in the condition part of the if-statement, it make it hard for future developer to understand what is necessary and what is redundant. I would suggest that you add assertions for the "extra conditions" that you expect to hold, which will allow them to be removed in release builds but trigger failures in debug builds if Postgres changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some experimenting it would seem I cannot move them from the if-statement to the assert unfortunately.
The Postgres planner code I gave you is only part of the full, rather complicated, inheritance planner. The first step of the planner is what I showed - where it simulates it as a SELECT to find out which partitions to process. However, it then uses the results of this to modify its own UPDATE/DELETE plan and then runs it through the planner again. This time, it actually resets the requiredPerms field to 0 though.
This means that:

  • In the first part of the planner, commandType==SELECT and requiredPerms are set to UPDATE/DELETE
  • In the second part, it's called again but now with commandType==UPDATE/DELETE and requiredPerms==0.
    The reset happens in inherit.c function expand_single_inheritance_child

These points together means we do need to check all conditions in the if-statement.
I will add some extra comments to this part though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then they need to be kept, but please add tests that make sure that nobody accidentally removes them and triggers a regression. That is, a test that will fail if any of the checks are removed, but not otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm adding extra tests now. Will post a new version soon.

query->rowMarks == NIL && (rte->requiredPerms & (ACL_UPDATE | ACL_DELETE)) == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the conditions earlier in this if-statement, it seems like there is no need to check for UPDATE or DELETE permissions (the statement is not an UPDATE nor a DELETE at this point).

Did I miss something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above

{
rte_mark_for_expansion(rte);
}
#endif
ts_create_private_reloptinfo(rel);
#if PG12_GE
/* in earlier versions this is done during expand_hypertable_inheritance() below */
Expand All @@ -809,6 +836,7 @@ timescaledb_get_relation_info_hook(PlannerInfo *root, Oid relation_objectid, boo
}
#endif
break;
}
case TS_REL_CHUNK:
case TS_REL_CHUNK_CHILD:
{
Expand Down
42 changes: 42 additions & 0 deletions test/expected/plan_hypertable_cache-12.out
@@ -0,0 +1,42 @@
-- This file and its contents are licensed under the Apache License 2.0.
-- Please see the included NOTICE for copyright information and
-- LICENSE-APACHE for a copy of the license.
-- test hypertable classification when hypertable is not in cache
-- https://github.com/timescale/timescaledb/issues/1832
\set PREFIX 'EXPLAIN (costs off)'
CREATE TABLE test (a int, time timestamptz NOT NULL);
SELECT create_hypertable('public.test', 'time');
create_hypertable
-------------------
(1,public,test,t)
(1 row)

INSERT INTO test SELECT i, '2020-04-01'::date-10-i from generate_series(1,20) i;
CREATE OR REPLACE FUNCTION test_f(_ts timestamptz)
RETURNS SETOF test LANGUAGE SQL STABLE PARALLEL SAFE
AS $f$
SELECT DISTINCT ON (a) * FROM test WHERE time >= _ts ORDER BY a, time DESC
$f$;
:PREFIX SELECT * FROM test_f(now());
QUERY PLAN
-------------------------------------------------
Unique
-> Sort
Sort Key: test.a, test."time" DESC
-> Custom Scan (ChunkAppend) on test
Chunks excluded during startup: 4
(5 rows)

-- create new session
\c
-- plan output should be identical to previous session
:PREFIX SELECT * FROM test_f(now());
QUERY PLAN
-------------------------------------------------
Unique
-> Sort
Sort Key: test.a, test."time" DESC
-> Custom Scan (ChunkAppend) on test
Chunks excluded during startup: 4
(5 rows)

149 changes: 149 additions & 0 deletions test/expected/plan_hypertable_inline.out
@@ -0,0 +1,149 @@
-- This file and its contents are licensed under the Apache License 2.0.
-- Please see the included NOTICE for copyright information and
-- LICENSE-APACHE for a copy of the license.
-- test hypertable classification when query is in an inlineable function
\set PREFIX 'EXPLAIN (costs off)'
CREATE TABLE test (a int, b bigint NOT NULL);
SELECT create_hypertable('public.test', 'b', chunk_time_interval=>10);
create_hypertable
-------------------
(1,public,test,t)
(1 row)

INSERT INTO test SELECT i, i FROM generate_series(1, 20) i;
CREATE OR REPLACE FUNCTION test_f(_ts bigint)
RETURNS SETOF test LANGUAGE SQL STABLE
as $f$
SELECT DISTINCT ON (a) * FROM test WHERE b >= _ts AND b <= _ts + 2
$f$;
-- plans must be the same in both cases
-- specifically, the first plan should not contain the parent hypertable
-- as that is a sign the pruning was not done successfully
:PREFIX SELECT * FROM test_f(5);
QUERY PLAN
------------------------------------------------------------------------------
Unique
-> Sort
Sort Key: _hyper_1_1_chunk.a
-> Index Scan using _hyper_1_1_chunk_test_b_idx on _hyper_1_1_chunk
Index Cond: ((b >= '5'::bigint) AND (b <= '7'::bigint))
(5 rows)

:PREFIX SELECT DISTINCT ON (a) * FROM test WHERE b >= 5 AND b <= 5 + 2;
QUERY PLAN
------------------------------------------------------------------------------
Unique
-> Sort
Sort Key: _hyper_1_1_chunk.a
-> Index Scan using _hyper_1_1_chunk_test_b_idx on _hyper_1_1_chunk
Index Cond: ((b >= 5) AND (b <= 7))
(5 rows)

-- test with FOR UPDATE
CREATE OR REPLACE FUNCTION test_f(_ts bigint)
RETURNS SETOF test LANGUAGE SQL STABLE
as $f$
SELECT * FROM test WHERE b >= _ts AND b <= _ts + 2 FOR UPDATE
$f$;
-- pruning should not be done by TimescaleDb in this case
-- specifically, the parent hypertable must exist in the output plan
:PREFIX SELECT * FROM test_f(5);
QUERY PLAN
------------------------------------------------------------------------------------
Subquery Scan on test_f
-> LockRows
-> Append
-> Seq Scan on test
Filter: ((b >= '5'::bigint) AND (b <= '7'::bigint))
-> Index Scan using _hyper_1_1_chunk_test_b_idx on _hyper_1_1_chunk
Index Cond: ((b >= '5'::bigint) AND (b <= '7'::bigint))
(7 rows)

:PREFIX SELECT * FROM test WHERE b >= 5 AND b <= 5 + 2 FOR UPDATE;
QUERY PLAN
------------------------------------------------------------------------------
LockRows
-> Append
-> Seq Scan on test
Filter: ((b >= 5) AND (b <= 7))
-> Index Scan using _hyper_1_1_chunk_test_b_idx on _hyper_1_1_chunk
Index Cond: ((b >= 5) AND (b <= 7))
(6 rows)

-- test with CTE
-- these cases are just to make sure we're everything is alright with
-- the way we identify hypertables to prune chunks - we abuse ctename
-- for this purpose. So double-check if we're not breaking plans
-- with CTEs here.
CREATE OR REPLACE FUNCTION test_f(_ts bigint)
RETURNS SETOF test LANGUAGE SQL STABLE
as $f$
WITH ct AS MATERIALIZED (
SELECT DISTINCT ON (a) * FROM test WHERE b >= _ts AND b <= _ts + 2
)
SELECT * FROM ct
$f$;
:PREFIX SELECT * FROM test_f(5);
QUERY PLAN
--------------------------------------------------------------------------------------
CTE Scan on ct
CTE ct
-> Unique
-> Sort
Sort Key: _hyper_1_1_chunk.a
-> Index Scan using _hyper_1_1_chunk_test_b_idx on _hyper_1_1_chunk
Index Cond: ((b >= '5'::bigint) AND (b <= '7'::bigint))
(7 rows)

:PREFIX
WITH ct AS MATERIALIZED (
SELECT DISTINCT ON (a) * FROM test WHERE b >= 5 AND b <= 5 + 2
)
SELECT * FROM ct;
QUERY PLAN
--------------------------------------------------------------------------------------
CTE Scan on ct
CTE ct
-> Unique
-> Sort
Sort Key: _hyper_1_1_chunk.a
-> Index Scan using _hyper_1_1_chunk_test_b_idx on _hyper_1_1_chunk
Index Cond: ((b >= 5) AND (b <= 7))
(7 rows)

-- CTE within CTE
:PREFIX
WITH ct AS MATERIALIZED (
SELECT * FROM test_f(5)
)
SELECT * FROM ct;
QUERY PLAN
----------------------------------------------------------------------------------------------
CTE Scan on ct
CTE ct
-> CTE Scan on ct ct_1
CTE ct
-> Unique
-> Sort
Sort Key: _hyper_1_1_chunk.a
-> Index Scan using _hyper_1_1_chunk_test_b_idx on _hyper_1_1_chunk
Index Cond: ((b >= '5'::bigint) AND (b <= '7'::bigint))
(9 rows)

-- CTE within NO MATERIALIZED CTE
:PREFIX
WITH ct AS NOT MATERIALIZED (
SELECT * FROM test_f(5)
)
SELECT * FROM ct;
QUERY PLAN
--------------------------------------------------------------------------------------
CTE Scan on ct
CTE ct
-> Unique
-> Sort
Sort Key: _hyper_1_1_chunk.a
-> Index Scan using _hyper_1_1_chunk_test_b_idx on _hyper_1_1_chunk
Index Cond: ((b >= '5'::bigint) AND (b <= '7'::bigint))
(7 rows)

1 change: 1 addition & 0 deletions test/sql/.gitignore
Expand Up @@ -14,6 +14,7 @@
/plan_expand_hypertable-*.sql
/plan_hashagg_optimized-*.sql
/plan_hashagg-*.sql
/plan_hypertable_cache-*.sql
/plan_ordered_append-*.sql
/query-*.sql
/sort_optimization-*.sql
Expand Down
3 changes: 2 additions & 1 deletion test/sql/CMakeLists.txt
Expand Up @@ -34,7 +34,6 @@ set(TEST_FILES
pg_dump.sql
pg_dump_unprivileged.sql
plain.sql
plan_hypertable_cache.sql
plan_ordered_append.sql
reindex.sql
relocate_extension.sql
Expand Down Expand Up @@ -107,6 +106,7 @@ if ((${PG_VERSION_MAJOR} GREATER_EQUAL "12"))
list(APPEND TEST_FILES
generated_columns.sql
misc.sql
plan_hypertable_inline.sql
tableam.sql
)
endif()
Expand All @@ -123,6 +123,7 @@ set(TEST_TEMPLATES
plan_expand_hypertable.sql.in
#hashagg is different in 9.6 and 10 because of hashagg parallelism
plan_hashagg.sql.in
plan_hypertable_cache.sql.in
query.sql.in
# Query tests differ in how PostgreSQL deals with projections on partitioned tables.
sort_optimization.sql.in
Expand Down
75 changes: 75 additions & 0 deletions test/sql/plan_hypertable_inline.sql
@@ -0,0 +1,75 @@
-- This file and its contents are licensed under the Apache License 2.0.
-- Please see the included NOTICE for copyright information and
-- LICENSE-APACHE for a copy of the license.

-- test hypertable classification when query is in an inlineable function

\set PREFIX 'EXPLAIN (costs off)'

CREATE TABLE test (a int, b bigint NOT NULL);
SELECT create_hypertable('public.test', 'b', chunk_time_interval=>10);
INSERT INTO test SELECT i, i FROM generate_series(1, 20) i;

CREATE OR REPLACE FUNCTION test_f(_ts bigint)
RETURNS SETOF test LANGUAGE SQL STABLE
as $f$
SELECT DISTINCT ON (a) * FROM test WHERE b >= _ts AND b <= _ts + 2
$f$;

-- plans must be the same in both cases
-- specifically, the first plan should not contain the parent hypertable
-- as that is a sign the pruning was not done successfully
:PREFIX SELECT * FROM test_f(5);

:PREFIX SELECT DISTINCT ON (a) * FROM test WHERE b >= 5 AND b <= 5 + 2;
Comment on lines +22 to +24
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few negative tests missing here: cases where the pruning should not be done.

  • This was discussed briefly above, but neither of these statements trigger a situation where resultRelation == 0 and commandType != CMD_SELECT so if that is a situation that can occur, it would be good to have a test that you do not mark it for expansion.

  • Since CTEs should not be expanded, please add that as a test case to make sure that it works.

  • Similarly for queries with row marks (SELECT ... FOR UPDATE and friends). Please add tests to make sure that they are not expanded even if they should without the row marks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to have these tests explicitly in this file? These code paths are actually triggered by existing tests in TimescaleDb (see my comment above for one example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps to elaborate a bit more - the code path added allows to do pruning of inlineable functions, but the logic is this a query where a function has been inlined is not used anywhere in the code. Rather, by just postponing the marking of which tables need pruning to a later stage, when inlining has taken place already, we can just do our pruning as usual. This is why existing negative tests actually touch all of these code paths already. They haven't been selected for pruning during the first try in the planner preprocess hook, so they'll get another try in this hook. Whether or not there's an inlineable function doesn't change anything for the negative tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have tests here that explicitly check when and how inlining takes place, both positive (for this query the function should be inlined) and negative (for this function, the query should not be inlined). It is true that this is already done elsewhere, but I think it is useful to have explicit tests for whatever changes you add both to make it clear that you've done the job correctly, but also to make sure that future changes does not cause regression.

It is not necessary to have a query that demonstrate that pruning is done because of the inlining of functions (this happens elsewhere), but having tests demonstrate that the inlining takes place when expected and not when not expected is useful in maintaining the code, so, something like this:

  • One reference query (as you already have added).
  • One query that demonstrate that inlining is done (you have this already).
  • One query that demonstrates that inlining is not done any more if FOR UPDATE is added. Best is probably to reuse the existing one so that it is clear what the difference is when you read the result output.
  • One query that demonstrates that when using a CTE it is not triggered. (I am not entirely sure what kind of query this would be. Superficially it seems like a function in a TE should still be inlined in the "main" query, but an example query would make that clear.)



-- test with FOR UPDATE
CREATE OR REPLACE FUNCTION test_f(_ts bigint)
RETURNS SETOF test LANGUAGE SQL STABLE
as $f$
SELECT * FROM test WHERE b >= _ts AND b <= _ts + 2 FOR UPDATE
$f$;


-- pruning should not be done by TimescaleDb in this case
-- specifically, the parent hypertable must exist in the output plan
:PREFIX SELECT * FROM test_f(5);

:PREFIX SELECT * FROM test WHERE b >= 5 AND b <= 5 + 2 FOR UPDATE;

-- test with CTE
-- these cases are just to make sure we're everything is alright with
-- the way we identify hypertables to prune chunks - we abuse ctename
-- for this purpose. So double-check if we're not breaking plans
-- with CTEs here.
CREATE OR REPLACE FUNCTION test_f(_ts bigint)
RETURNS SETOF test LANGUAGE SQL STABLE
as $f$
WITH ct AS MATERIALIZED (
SELECT DISTINCT ON (a) * FROM test WHERE b >= _ts AND b <= _ts + 2
)
SELECT * FROM ct
$f$;

:PREFIX SELECT * FROM test_f(5);

:PREFIX
WITH ct AS MATERIALIZED (
SELECT DISTINCT ON (a) * FROM test WHERE b >= 5 AND b <= 5 + 2
)
SELECT * FROM ct;

-- CTE within CTE
:PREFIX
WITH ct AS MATERIALIZED (
SELECT * FROM test_f(5)
)
SELECT * FROM ct;

-- CTE within NO MATERIALIZED CTE
:PREFIX
WITH ct AS NOT MATERIALIZED (
SELECT * FROM test_f(5)
)
SELECT * FROM ct;