
[Bug]: INSERT INTO distributed_hypertable SELECT * FROM another_distributed_hypertable; fails in PG15 #4983

Closed
lkshminarayanan opened this issue Nov 16, 2022 · 6 comments · Fixed by #4990

@lkshminarayanan
Contributor

lkshminarayanan commented Nov 16, 2022

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Data node, Distributed hypertable

What happened?

Given two distributed hypertables, hyper1 and hyper2, inserting the tuples fetched from hyper1 into hyper2 using an INSERT INTO hyper2 SELECT * FROM hyper1 query fails if it is preceded by an INSERT INTO hyper1 ... query.

The query returns the following error:

postgres=# INSERT INTO hyper2 SELECT * FROM hyper1;
ERROR:  request already in progress on port 5432

Instead, it should have succeeded and inserted the tuples from hyper1 into hyper2.

The data_node test case in the PG15 CI fails due to the same issue.

TimescaleDB version affected

2.9.0

PostgreSQL version used

15.0

What operating system did you use?

Ubuntu 22.04

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

-- Run the following SQL queries in Timescale 2.9.0 on PG15.

SELECT node_name, database, node_created, database_created, extension_created   
FROM add_data_node('data_node_1', host => 'localhost', database => 'TESTDB_1');
SELECT node_name, database, node_created, database_created, extension_created
FROM add_data_node('data_node_2', host => 'localhost', database => 'TESTDB_2');

CREATE TABLE hyper1 (time timestamptz, location int, temp float);               
CREATE TABLE hyper2 (LIKE hyper1); 

SELECT create_distributed_hypertable('hyper1', 'time', 'location');
SELECT create_distributed_hypertable('hyper2', 'time', 'location');

INSERT INTO hyper1 VALUES ('2022-01-01 00:00:00', 10, 23.45);
INSERT INTO hyper2 SELECT * FROM hyper1;
@lkshminarayanan
Contributor Author

lkshminarayanan commented Nov 16, 2022

As mentioned in the description, the data_node test case fails on PG15 due to the same issue, but it doesn't fail on PG14. Based on the test postmaster logs, I think the following is the reason:

  • PG14 implements INSERT INTO hyper2 SELECT * FROM hyper1; as multiple INSERT INTO public.hyper2("time", location, temp) VALUES ($1, $2, $3) statements, but
  • PG15 implements it as multiple COPY public.hyper1 ("time", location, temp) FROM STDIN WITH (FORMAT binary) statements.

The COPY then fails because some connection object has the wrong status.
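
The plan node chosen by the access node can be checked without executing the statement. A minimal sketch, assuming the hyper1/hyper2 setup from the reproduction steps in the issue description:

-- EXPLAIN without ANALYZE only plans the statement, it does not run it.
-- The Custom Scan node type in the output shows which path was chosen:
-- DataNodeCopy (binary COPY) on PG15 vs DataNodeDispatch (parameterized INSERT) on PG14.
EXPLAIN (COSTS OFF)
INSERT INTO hyper2 SELECT * FROM hyper1;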

@lkshminarayanan
Contributor Author

lkshminarayanan commented Nov 16, 2022

One possible reason is that the previous insert, i.e. INSERT INTO hyper1 VALUES ('2022-01-01 00:00:00', 10, 23.45);, is not resetting the connection status once it completes.

@lkshminarayanan
Contributor Author

lkshminarayanan commented Nov 16, 2022

> One possible reason is that the previous insert, i.e. INSERT INTO hyper1 VALUES ('2022-01-01 00:00:00', 10, 23.45);, is not resetting the connection status once it completes.

This is not the case; I have verified that the previous INSERT resets the status properly.

The issue happens because get_copy_connection_to_data_node gets an idle connection and then calls remote_connection_begin_copy to start a COPY on it, as part of which the connection's status is updated to CONN_COPY_IN. But later, for some reason, get_single_response_nonblocking is invoked on the same connection, leading to this error.

sb230132 self-assigned this Nov 17, 2022
@sb230132
Contributor

Thanks @lkshminarayanan for the insights into the problem. I gave it a try and found a few more details about this issue.

The query that fails is:

postgres=# INSERT INTO hyper2 SELECT * FROM hyper1;
ERROR:  request already in progress on port 5432

This issue is specific to PG15. As you said, on PG15 the query above creates a Custom Scan node of type DataNodeCopy, whereas on PG14 the same query generates a Custom Scan node of type DataNodeDispatch. This is the main difference between PG14 and PG15.

What I noticed is that COPY is not allowed for an INSERT .. SELECT query. In tsl/src/planner.c, the function tsl_create_distributed_insert_path() decides whether the planner should generate a COPY node or a DISPATCH node. On PG15 this function decides to generate a COPY node, because in the code below rte->subquery is NULL, so the walker never sees the distributed hypertable in the SELECT and copy_possible is never cleared.

if (distributed_rtes_walker((Node *) rte->subquery, &distributed) &&
							distributed)

I debugged further to find out why rte->subquery is NULL and found that PG15 includes a commit that improves the performance of planning subqueries, and as part of that commit rte->subquery is set to NULL. Once a subquery has been flattened there is no need to keep rte->subquery pointing at it, so it is set to NULL. Here is the PG15 commit that makes this change:

commit 64919aaab45076445051245c9bcd48dd84abebe7
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Jul 6 14:23:09 2021 -0400

    Reduce the cost of planning deeply-nested views.
    
    Joel Jacobson reported that deep nesting of trivial (flattenable)
    views results in O(N^3) growth of planning time for N-deep nesting.
    It turns out that a large chunk of this cost comes from copying around
    the "subquery" sub-tree of each view's RTE_SUBQUERY RTE.  But once we
    have successfully flattened the subquery, we don't need that anymore,
    because the planner isn't going to do anything else interesting with
    that RTE.  We already zap the subquery pointer during setrefs.c (cf.
    add_rte_to_flat_rtable), but it's useless baggage earlier than that
    too.  Clearing the pointer as soon as pull_up_simple_subquery is done
    with the RTE reduces the cost from O(N^3) to O(N^2); which is still
    not great, but it's quite a lot better.  Further improvements will
    require rethinking of the RTE data structure, which is being considered
    in another thread.
    
    Patch by me; thanks to Dean Rasheed for review.
    
    Discussion: https://postgr.es/m/797aff54-b49b-4914-9ff9-aa42564a4d7d@www.fastmail.com

To fix this issue, we can no longer rely on rte->subquery to decide whether the subquery contains distributed hypertables.

@sb230132
Contributor

The patch below, together with the patch from #4954, fixes the data_node test.

@@ -315,12 +315,37 @@ tsl_create_distributed_insert_path(PlannerInfo *root, ModifyTablePath *mtpath, I
                                        if (rte->rtekind == RTE_SUBQUERY)
                                        {
                                                distributed = false;
+#if PG15_GE
+                                               Node *jtnode = (Node *) root->parse->jointree;
+                                               if (IsA(jtnode, FromExpr))
+                                               {
+                                                       FromExpr *f = (FromExpr *) jtnode;
+                                                       ListCell *l;
+                                                       foreach (l, f->fromlist)
+                                                       {
+                                                               Node *n = (Node *) lfirst(l);
+                                                               if (IsA(n, RangeTblRef))
+                                                               {
+                                                                       RangeTblEntry *r =
+                                                                               planner_rt_fetch(((RangeTblRef *) n)->rtindex, root);
+                                                                       if (r->rtekind == RTE_RELATION &&
+                                                                               distributed_rtes_walker((Node *) r, &distributed) &&
+                                                                               distributed)
+                                                                       {
+                                                                               copy_possible = false;
+                                                                               break;
+                                                                       }
+                                                               }
+                                                       }
+                                               }
+#else
                                                if (distributed_rtes_walker((Node *) rte->subquery, &distributed) &&
                                                        distributed)
                                                {
                                                        copy_possible = false;
                                                        break;
                                                }
+#endif
                                        }
                                }
                        }
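
With the patch applied, the original repro can be re-run as a quick sanity check. A minimal sketch, assuming the hyper1/hyper2 setup from the issue description:

-- Should now succeed instead of failing with "request already in progress"
INSERT INTO hyper2 SELECT * FROM hyper1;
-- Expect the single row previously inserted into hyper1
SELECT * FROM hyper2;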

@lkshminarayanan
Contributor Author

@sb230132 Nice! The test case dist_hypertable-15 also fails with the same issue. In addition to that failure, it also has a few other explain output mismatches, but it looks like at least some of them are related to this issue.

sb230132 added a commit to sb230132/timescaledb that referenced this issue Nov 17, 2022
INSERT .. SELECT query containing distributed hypertables generates plan
with DataNodeCopy node which is not supported. Issue is in function
tsl_create_distributed_insert_path() where we decide if we should
generate DataNodeCopy or DataNodeDispatch node based on the kind of
query. In PG15 for INSERT .. SELECT query timescaledb planner generates
DataNodeCopy as rte->subquery is set to NULL. This is because of a commit
in PG15 where rte->subquery is set to NULL as part of a fix.

This patch checks if SELECT subquery has distributed hypertables or not
by looking into root->parse->jointree which represents subquery.

Fixes timescale#4983
sb230132 added a commit that referenced this issue Nov 17, 2022
SachinSetiya pushed a commit that referenced this issue Nov 28, 2022