
[Bug]: INSERT INTO distributed_hypertable SELECT * FROM another_distributed_hypertable; fails in PG15 #4983

Closed
lkshminarayanan opened this issue Nov 16, 2022 · 6 comments · Fixed by #4990

@lkshminarayanan
Contributor

lkshminarayanan commented Nov 16, 2022

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Data node, Distributed hypertable

What happened?

Given two distributed hypertables, hyper1 and hyper2, inserting the tuples fetched from hyper1 into hyper2 using an INSERT INTO hyper2 SELECT * FROM hyper1 query fails if it is preceded by an INSERT INTO hyper1 ... query.

The query returns the following error:

postgres=# INSERT INTO hyper2 SELECT * FROM hyper1;
ERROR:  request already in progress on port 5432

Instead, it should have succeeded and inserted the tuples from hyper1 into hyper2.

The data_node test case in the PG15 CI fails due to the same issue.

TimescaleDB version affected

2.9.0

PostgreSQL version used

15.0

What operating system did you use?

Ubuntu 22.04

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

-- Run the following SQL queries in Timescale 2.9.0 on PG15.

SELECT node_name, database, node_created, database_created, extension_created   
FROM add_data_node('data_node_1', host => 'localhost', database => 'TESTDB_1');
SELECT node_name, database, node_created, database_created, extension_created
FROM add_data_node('data_node_2', host => 'localhost', database => 'TESTDB_2');

CREATE TABLE hyper1 (time timestamptz, location int, temp float);               
CREATE TABLE hyper2 (LIKE hyper1); 

SELECT create_distributed_hypertable('hyper1', 'time', 'location');
SELECT create_distributed_hypertable('hyper2', 'time', 'location');

INSERT INTO hyper1 VALUES ('2022-01-01 00:00:00', 10, 23.45);
INSERT INTO hyper2 SELECT * FROM hyper1;
@lkshminarayanan
Contributor Author

lkshminarayanan commented Nov 16, 2022

As mentioned in the description, the data_node test case fails on PG15 due to the same issue, but it doesn't fail on PG14. Based on the test postmaster logs, I think the following is the reason:

  • PG14 implements INSERT INTO hyper2 SELECT * FROM hyper1; as multiple INSERT INTO public.hyper2("time", location, temp) VALUES ($1, $2, $3) statements, but
  • PG15 implements it as multiple COPY public.hyper1 ("time", location, temp) FROM STDIN WITH (FORMAT binary) statements.

The COPY then fails because some connection object has the wrong status.
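
The plan node chosen by the access node can be checked without executing the statement. A minimal sketch, assuming the hyper1/hyper2 setup from the reproduction steps in the issue description:

-- EXPLAIN without ANALYZE only plans the statement, it does not run it.
-- The Custom Scan node type in the output shows which path was chosen:
-- DataNodeCopy (binary COPY) on PG15 vs DataNodeDispatch (parameterized INSERT) on PG14.
EXPLAIN (COSTS OFF)
INSERT INTO hyper2 SELECT * FROM hyper1;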

@lkshminarayanan
Contributor Author

lkshminarayanan commented Nov 16, 2022

One possible reason is that the previous insert, i.e. INSERT INTO hyper1 VALUES ('2022-01-01 00:00:00', 10, 23.45);, is not resetting the connection status once it completes.

@lkshminarayanan
Contributor Author

lkshminarayanan commented Nov 16, 2022

> One possible reason is that the previous insert, i.e. INSERT INTO hyper1 VALUES ('2022-01-01 00:00:00', 10, 23.45);, is not resetting the connection status once it completes.

This is not the case; I have verified that the previous INSERT resets the status properly.

The issue happens because get_copy_connection_to_data_node gets an idle connection and then calls remote_connection_begin_copy to start a COPY on it, as part of which the connection's status is updated to CONN_COPY_IN. But later, for some reason, get_single_response_nonblocking is invoked on the same connection, leading to this error.

sb230132 self-assigned this Nov 17, 2022
@sb230132
Contributor

Thanks @lkshminarayanan for the insights into the problem. I gave it a try and found a few more details about this issue.

The query that fails is:

postgres=# INSERT INTO hyper2 SELECT * FROM hyper1;
ERROR:  request already in progress on port 5432

This issue is specific to PG15. As you said, on PG15 the query above creates a Custom Scan node of type DataNodeCopy, whereas on PG14 the same query generates a Custom Scan node of type DataNodeDispatch. This is the main difference between PG14 and PG15.

What I noticed is that COPY is not allowed for an INSERT .. SELECT query. In tsl/src/planner.c, the function tsl_create_distributed_insert_path() decides whether the planner should generate a COPY node or a DISPATCH node. On PG15 this function decides to generate a COPY node, because in the code below rte->subquery is NULL, so the walker never sees the distributed hypertable in the SELECT and copy_possible is never cleared.

if (distributed_rtes_walker((Node *) rte->subquery, &distributed) &&
							distributed)

I debugged further to find out why rte->subquery is NULL and found that PG15 includes a commit that improves the performance of planning subqueries, and as part of that commit rte->subquery is set to NULL. Once a subquery has been flattened there is no need to keep rte->subquery pointing at it, so it is set to NULL. Here is the PG15 commit that makes this change:

commit 64919aaab45076445051245c9bcd48dd84abebe7
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Jul 6 14:23:09 2021 -0400

    Reduce the cost of planning deeply-nested views.
    
    Joel Jacobson reported that deep nesting of trivial (flattenable)
    views results in O(N^3) growth of planning time for N-deep nesting.
    It turns out that a large chunk of this cost comes from copying around
    the "subquery" sub-tree of each view's RTE_SUBQUERY RTE.  But once we
    have successfully flattened the subquery, we don't need that anymore,
    because the planner isn't going to do anything else interesting with
    that RTE.  We already zap the subquery pointer during setrefs.c (cf.
    add_rte_to_flat_rtable), but it's useless baggage earlier than that
    too.  Clearing the pointer as soon as pull_up_simple_subquery is done
    with the RTE reduces the cost from O(N^3) to O(N^2); which is still
    not great, but it's quite a lot better.  Further improvements will
    require rethinking of the RTE data structure, which is being considered
    in another thread.
    
    Patch by me; thanks to Dean Rasheed for review.
    
    Discussion: https://postgr.es/m/797aff54-b49b-4914-9ff9-aa42564a4d7d@www.fastmail.com

To fix this issue, we can no longer rely on rte->subquery to decide whether the subquery contains distributed hypertables.

@sb230132
Contributor

The patch below, together with the patch from #4954, fixes the data_node test.

@@ -315,12 +315,37 @@ tsl_create_distributed_insert_path(PlannerInfo *root, ModifyTablePath *mtpath, I
                                        if (rte->rtekind == RTE_SUBQUERY)
                                        {
                                                distributed = false;
+#if PG15_GE
+                                               Node *jtnode = (Node *) root->parse->jointree;
+                                               if (IsA(jtnode, FromExpr))
+                                               {
+                                                       FromExpr *f = (FromExpr *) jtnode;
+                                                       ListCell *l;
+                                                       foreach (l, f->fromlist)
+                                                       {
+                                                               Node *n = (Node *) lfirst(l);
+                                                               if (IsA(n, RangeTblRef))
+                                                               {
+                                                                       RangeTblEntry *r =
+                                                                               planner_rt_fetch(((RangeTblRef *) n)->rtindex, root);
+                                                                       if (r->rtekind == RTE_RELATION &&
+                                                                               distributed_rtes_walker((Node *) r, &distributed) &&
+                                                                               distributed)
+                                                                       {
+                                                                               copy_possible = false;
+                                                                               break;
+                                                                       }
+                                                               }
+                                                       }
+                                               }
+#else
                                                if (distributed_rtes_walker((Node *) rte->subquery, &distributed) &&
                                                        distributed)
                                                {
                                                        copy_possible = false;
                                                        break;
                                                }
+#endif
                                        }
                                }
                        }
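
With the patch applied, the original repro can be re-run as a quick sanity check. A minimal sketch, assuming the hyper1/hyper2 setup from the issue description:

-- Should now succeed instead of failing with "request already in progress"
INSERT INTO hyper2 SELECT * FROM hyper1;
-- Expect the single row previously inserted into hyper1
SELECT * FROM hyper2;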

@lkshminarayanan
Contributor Author

@sb230132 Nice! The test case dist_hypertable-15 also fails with the same issue. In addition to that failure, it also has a few other explain output mismatches, but it looks like at least some of them are related to this issue.

sb230132 added a commit to sb230132/timescaledb that referenced this issue Nov 17, 2022
INSERT .. SELECT query containing distributed hypertables generates plan
with DataNodeCopy node which is not supported. Issue is in function
tsl_create_distributed_insert_path() where we decide if we should
generate DataNodeCopy or DataNodeDispatch node based on the kind of
query. In PG15 for INSERT .. SELECT query timescaledb planner generates
DataNodeCopy as rte->subquery is set to NULL. This is because of a commit
in PG15 where rte->subquery is set to NULL as part of a fix.

This patch checks if SELECT subquery has distributed hypertables or not
by looking into root->parse->jointree which represents subquery.

Fixes timescale#4983
sb230132 added a commit that referenced this issue Nov 17, 2022
SachinSetiya pushed a commit that referenced this issue Nov 28, 2022