INSERT on a distributed hypertable is significantly slower than COPY #3025

ryanbooz · 2021-03-12T21:49:27Z

Relevant system information:

OS: Timescale Forge (although non-Forge experiences similar performance)
PostgreSQL version: 12.5
TimescaleDB version: 2.0.1
Installation method: Timescale Forge

Describe the bug
Using INSERT on a distributed hypertable is significantly slower than inserting into a regular hypertable with INSERT. Based on feedback in the linked Slack conversation, I tested INSERT/COPY performance using TSBS on a regular and distributed hypertable. Across 8 different tests (results below), it is evident that INSERT actually performs worse on a distributed hypertable performs worse than a single-node. Even though we expect a batched INSERT statement to be slower than COPY, the expectation is that more nodes would still, in some way, improve ingest performance with INSERT in a similar fashion as COPY. That does not appear to be the case.

All tests inserted 36,000,000 rows, in batches of 4,000 rows, with TSBS from a pre-generated file (using FILEmode)

Single-node, regular hypertable (8cpu/32gb)

Number of Workers	Method	Performance
1	COPY	loaded 36000000 rows in 470.899sec with 1 workers (mean rate 76449.48 rows/sec)
1	INSERT	loaded 36000000 rows in 1144.999sec with 1 workers (mean rate 31441.09 rows/sec)
10	COPY	loaded 36000000 rows in 119.607sec with 10 workers (mean rate 300985.30 rows/sec)
10	INSERT	loaded 36000000 rows in 195.681sec with 10 workers (mean rate 183972.47 rows/sec)

Multi-node (1AN/4DN), distributed hypertable (4cpu/16gb)

Number of Workers	Method	Performance
1	COPY	loaded 36000000 rows in 403.772sec with 1 workers (mean rate 89159.29 rows/sec)
1	INSERT	loaded 36000000 rows in 2043.873sec with 1 workers (mean rate 17613.62 rows/sec)
10	COPY	loaded 36000000 rows in 95.850sec with 10 workers (mean rate 375585.24 rows/sec)
10	INSERT	loaded 36000000 rows in 272.150sec with 10 workers (mean rate 132280.02 rows/sec)

Upon further discussion, the working theory is that parameter binding on the Access Node is the bottleneck (understandable) and using something like Promscale does with array parameters instead could significantly improve performance in a distributed transaction.

Examples: INSERT INTO metric_table(time, value, series_id) SELECT * FROM unnest($1::TIMESTAMPTZ[], $2::DOUBLE PRECISION[], $3::BIGINT[]) a(t,v,s) ORDER BY s,t ON CONFLICT DO NOTHING

To Reproduce
There is sample code linked in the Slack conversation, although it was fairly inefficient locally to produce the results. Therefore I generated large files (36,000,000 rows each) for four different, consecutive hours. Using those four files, I tested the scenarios listed above on both a regular and distributed hypertable.

All ingests (tsbs_load) were done with a batch size of 4,000 rows because TSBS currently disallows batches above some ratio (for this test I couldn't go higher than ~4,300 rows/batch). To keep the tests even, I chose to do both COPY and INSERT with 4,000 rows.

Expected behavior
Even knowing that INSERT will carry a higher overhead than COPY in both regular and distributed hypertable inserts, the ratio of performance is expected to increase with additional data nodes (as we generally see with COPY performance).

Actual behavior
In this case, both INSERT tests were slower on multi-node than a single-node, even allowing for the smaller instance sizes.
The 1 worker tests, in particular, show a ~2x decrease in performance for multinode compared to single node. They also show a ~5x decrease in performance compared to copy whereas the same test on a regular hypertable only has a ~2x decrease in performance.

Even the 10 worker case shows a significant decrease for multinode vs singlenode, though it's harder to determine whether that's due to fewer cores on the AN than on the singlenode box.

Additional context
Original Slack conversation (it's a long thread with both @davidkohn88 and @k-rus involved): https://timescaledb.slack.com/archives/CPSD82EMU/p1614670969056700

Note that the situation here is a relatively common one, where workers are inserting from specific shards of a stream, the number of workers is based on the shards of the stream and can't be easily increased. The performance hit of going to multinode compared to singlenode in this case was so much (especially for wide tables) that the user had to stop using multinode and is looking for options other than Timescale, though they are otherwise happy. They need multinode for scaling out storage, and would have to implement some sort of two phase transactional inserts in order to use more than one worker per shard.

The text was updated successfully, but these errors were encountered:

davidkohn88 · 2021-03-16T18:00:20Z

I think this is most likely related to the parse/bind inefficiency:

timescaledb/tsl/src/data_node_dispatch.c

Line 195 in 10b3da0

* Tuples destined for a data node are batched in a tuple store until dispatched

Note that we generate predefined prepared statements and both generating them on the access and parsing them on the data node have significant overhead. We ran into this trouble with Promscale and got around it by using a trick with arrays where we sent arrays over the wire and had a prepared statement that unnested arrays rather than preparing for each input, you now have one variable length input per column: https://github.com/timescale/promscale/blob/d6a724027a90b69230aa4a77e93333845c0ca8e6/pkg/pgmodel/ingestor/insert.go#L357

The benefits for switching would be signficant, as all the code to determine whether we're in the fixed batch size case or if we're at the ends and just need to use the normal size would go away as the prepared statement would remain the same and the size of the array sent would change. Additionally, for prepared statements that are used only once (which will likely be the common case), the size of the prepared statement being sent over the wire can be as big or bigger than the data, this would significantly reduce the size of the prepared statement.

All types in postgres have arrays associated with them by default, so using this should be quite general-purpose.

cevian · 2021-03-24T14:35:42Z

Yes, in promscale we found that doing INSERT INTO metric_table(time, value, series_id) SELECT * FROM unnest($1::TIMESTAMPTZ[], $2::DOUBLE PRECISION[], $3::BIGINT[]) a(t,v,s) is much more efficient than INSERT INTO ... VALUES (...) there may be significant gains for switching to the former for the AN->DN communication.

erimatnor · 2021-04-13T12:28:35Z

Currently doing POC of the approach suggested by @davidkohn88 and @cevian.

A couple of issues arose so far with composite types and array types.

Composite/row types

When composite types are in an array and then unnested, they are unnested as separate columns. For instance:

# create type custom_type as (high int, low int);
CREATE TYPE

# select * from unnest(ARRAY[1, 2], ARRAY[(1,2)::custom_type,(3,5)::custom_type]) a;
 unnest | high | low 
--------+------+-----
      1 |    1 |   2
      2 |    3 |   5

So, if original table only had two columns, then unnested statement has three and the insert will fail. This can be handled by recreating the composite type in the target list:

# select x, (y,z) from unnest(ARRAY[1, 2], ARRAY[(1,2)::custom_type,(3,5)::custom_type]) a(x,y,z);
 x |  row  
---+-------
 1 | (1,2)
 2 | (3,5)
(2 rows)

However, this makes creating/deparsing the INSERT statement to send to the remote node a lot more complicated. We need to check for custom types and recreate them in the target list and we also need to ensure nested composite types works. It seems to work by itself:

# create type custom_type2 as (high int, low custom_type);
CREATE TYPE

# select * from unnest(ARRAY[1, 2], ARRAY[(1,(2, 3))::custom_type2,(3,(5,6))::custom_type2]) a;
 unnest | high |  low  
--------+------+-------
      1 |    1 | (2,3)
      2 |    3 | (5,6)

Array types

When a table has an array column, the unnest in the generated INSERT statement will unnest the arrays in arrays:

# select * from unnest(ARRAY[ARRAY[0,1], ARRAY[2,3]], ARRAY[ARRAY[4,5], ARRAY[6,7]]);
 unnest | unnest 
--------+--------
      0 |      4
      1 |      5
      2 |      6
      3 |      7
(4 rows)

This is obviously not the desired behavior as we expect the inner arrays to not be unnested. Currently don't have a good way to deal with this. It might not be an issue if the data types are serialized or parameterized:

# select * from unnest(ARRAY['{1,2}', '{2,3}'], ARRAY['{4,5}', '{6,7}']);
 unnest | unnest 
--------+--------
 {1,2}  | {4,5}
 {2,3}  | {6,7}
(2 rows)

erimatnor · 2021-04-16T12:18:28Z

Found a potential show-stopper for the array approach when using custom types. This is actually a more general issue that affects multi-node in general when custom types are used in arrays.

The issue is that the Oid of the array element type is serialized as part of send function of the array implementation. This leads to issues with custom types since the type's Oid typically differs between databases, given that CREATE TYPE has to be done in each database.

For instance, this simple test case fails on a distributed hypertable but works on a regular hypertable:

CREATE TABLE arrtable (time timestamptz, val mytype[]);
CREATE TABLE
SELECT create_distributed_hypertable('arrtable', 'time');
NOTICE:  adding not-null constraint to column "time"
DETAIL:  Time dimensions cannot have NULL values.
 create_distributed_hypertable 
-------------------------------
 (44,public,arrtable,t)
(1 row)

INSERT INTO arrtable VALUES (now(), ARRAY[(1,1)::mytype]);
ERROR:  [data_node_1]: wrong element type: 71230 vs 71091

IMO, this is actually a bug in PostgreSQL. A thread on psql-hackers also brings up this issue: https://www.postgresql.org/message-id/flat/20160426.132000.16866976.horiguchi.kyotaro%40lab.ntt.co.jp#761e7ec4f277b956232487091c8db8cd

davidkohn88 · 2021-04-16T18:13:08Z

Ugh. Well, that's annoying. I wonder if a custom serialization type that wasn't an array and an unpacking function that had the proper information would work? It'd follow the same basic approach (send things over as a blob rather than as deparsing the whole thing) and could basically just be a list of columns / types and the values list as a blob? It might be a lot simpler than the array approach anyway wouldn't have to do the per-column stuff etc...I don't know if that would run into SQL injection issues, but I don't think it would as we've already done the prepared statement etc on the AN side to do the routing.

I guess the real question is if during the POC were you able to get some numbers on more simple tables in similar situations to the one reported (ie a lot of columns / med-large number of rows per batch) to see what the impact of the approach would be on performance and whether it would be worth pursuing another way of accomplishing this? It seems like that might be accomplish-able with what you've got in order to make a go/no-go decision on a different approach with a similar philosophy...

One more thought:
What if we made a special node that produced rows by reading them from a socket/connection or whatever much like copy does when it's reading over the wire, so then you'd have a function that read waited to read rows and would produce them up the chain? (It seems like this also could be very useful for distributing certain kinds of work do the data nodes rather than doing it on the AN in the future...)

Something like:

INSERT INTO metric_table(time, value, series_id) SELECT * FROM row_producer(pid, context, whatever) a ( b, d, e) then row producer could just get a pipe of rows and go?

(I don't know how crazy this is to be honest, just spitballing here, I seem to remember that Greenplum has something like this for joins....where they can join to a table that exists on the access node equivalent by basically rewriting it as a SRF on the data node equivalent and shuffling the data over, moving the smaller table to the bigger data rather than the bigger to the smaller).

erimatnor · 2021-04-19T12:33:07Z

@davidkohn88 The POC revealed a good performance boost (~50%), but it is hard to know exactly if this is down only to the array-approach vs. other refactorings/improvements. For example, also found some improvements by simply doing more effecient async handling across multiple node. But I think we can assume the array/columnar approach is worthwhile if we can get it to work reliably.

The idea of a custom "container" type has come up as one potential solution. Another option would be to go with a hybrid approach where we fall back to the non-columnar insert for queries with custom types. But that's also messy and requires maintaining two code paths. We could also revert to "text" encoding in case of custom types.

erimatnor · 2021-04-19T12:34:48Z

Wanted to check if this is a problem for postgres_fdw, so created this test script:

DROP DATABASE IF EXISTS front;
DROP DATABASE IF EXISTS back;

CREATE DATABASE back;
\c back
DROP ROLE fuser;
CREATE ROLE fuser LOGIN PASSWORD 'foo';
CREATE TYPE mytype AS (low int, high int);
SET ROLE fuser;
CREATE TABLE farr (time timestamptz, val mytype[]);
RESET ROLE;
CREATE DATABASE front;
\c front
CREATE TYPE mytype AS (low int, high int);
CREATE EXTENSION postgres_fdw;
CREATE SERVER IF NOT EXISTS server1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'back', port '5432');
CREATE FOREIGN TABLE farr (time timestamptz, val mytype[]) SERVER server1;
CREATE USER MAPPING IF NOT EXISTS FOR PUBLIC SERVER server1 OPTIONS (user 'fuser', password 'foo');
INSERT INTO farr VALUES (now(), ARRAY[(1,1)::mytype]);

I realized, however, that postgres_fdw doesn't use binary encoding so doesn't suffer from this problem for that reason. I don't we want to use text encoding, at least not by default.

erimatnor · 2021-04-19T12:55:03Z

Looks like we can revert to text serialization just for the array that has a custom element type.

erimatnor · 2021-04-19T12:55:36Z

Closed by mistake, so reopened.

erimatnor · 2021-04-21T13:55:13Z

Array column types seems to pose a problem. For example:

CREATE TYPE cst AS (low int, high int);

CREATE TABLE single (time timestamptz, low cst, high cst[]);
SELECT create_hypertable('single', 'time');
PREPARE colinsert AS INSERT INTO single (time, low, high)
SELECT unnest($1::timestamptz[]), unnest($2::cst[]), unnest($3::cst[][]);

EXECUTE colinsert (ARRAY[now()], ARRAY[(1,2)::cst], ARRAY[ARRAY[(1,2)::cst]]);
ERROR:  column "high" is of type cst[] but expression is of type cst
LINE 2: ...ECT unnest($1::timestamptz[]), unnest($2::cst[]), unnest($3:...

This is because the unnest function unnests also arrays within arrays:

# select unnest(ARRAY[ARRAY[1,2]]);
 unnest 
--------
      1
      2
(2 rows)

@davidkohn88 Any ideas on this? Seems like a show stopper for this idea. I think we need to implement our own container type where we can properly roll up and unwind array elements.

davidkohn88 · 2021-04-21T14:51:11Z

Yea...those array functions leave a lot to be desired. And the behaviour of all multi-dimensional arrays is truly weird. It'd either be a container type, we could do this for scalars and fall back to the old behavior for arrays (which, while annoying, would cover like 99% of users anyway), we could use array types and write a special unnest-like function that just unnest-ed the first dimension of an array, which might be something we could contribute back upstream.

I don't know the scale of the problems with this approach, but I do wonder if the SRF that produces tuples from a stream from the AN is feasible? I think that one has major benefits in general and would speed a lot of things up and make for fewer memory management issues etc...but it might be such a PITA that it's not worth it...(though I do think that that would have applications well beyond the insert path, it would be reusable for joins on the DN and other bits that could be quite helpful for query optimization down the line).

erimatnor · 2021-04-22T07:34:06Z

I think using COPY between the AN and DNs, even in the case of INSERTs, makes sense and would essentially be the "stream" approach. There's an old issue for it. The only problem with using COPY is that it doesn't support things like ON CONFLICT, so we'd have to fallback to something else in any case.

ryanbooz · 2021-04-22T15:24:18Z

Just wanted to say that I enjoy watching you guys work this through. I'm learning a ton. Thanks for caring and working to find a great approach!

davidkohn88 · 2021-04-22T17:22:11Z

That's true, but the ON CONFLICT case is incredibly important, a lot of people use it and it's a huge benefit to the product. The other options are all much, much slower in many cases, so I'd be hesitant to make it so much slower than the normal INSERT path.
I do wonder whether you could reuse the copy infrastructure to make this streaming SRF thing,

The statement would look something like INSERT INTO foo (a, b, c) SELECT * FROM stream_srf('params') as x(a timestamptz, b int, c text) ON CONFLICT DO NOTHING the set of record means that you send over all the types as text and that should work in most cases, the arrays will probably still need to be sent in text rather than as binary depending on OID stuff, but I'm wondering if the stream_srf thing there could just be wired into the copy code essentially and just pull it out at the right time and produce tuples? I don't know if that makes any sense at all, but that was my sort of thought.

Is that a remotely feasible approach? I don't know what sort of roadblocks would be in the way for that one, but I guess some sort of communication between the front end PID and the back end PID is the problem?

Either way, I can see doing either the COPY approach, and fall back when doing ON CONFLICT or the ARRAY approach and fall back when doing inserts of other arrays as a stopgap for now and maybe moving to this approach later? If I were choosing between the two I'd probably choose the second as I think far more people use ON CONFLICT than insert arrays into hypertables...

erimatnor · 2021-04-23T11:30:53Z

Currently switching to pursue COPY approach. It is fairly simple approach and is a low-hanging fruit for good performance until we can invest more time in a custom function for columnar rollups.

cevian · 2021-04-23T15:23:56Z

@erimatnor the problem is this won't help with Promscale performance as Promscale needs ON CONFLICT. I think writing a unnest in C that undoes just one dimension shouldn't be too hard. see https://stackoverflow.com/questions/8137112/unnest-array-by-one-level and that will create a much more general solution.

Please keep in mind that we do want to benchmark Promscale on multinode fairly soon.

cevian · 2021-04-23T15:51:11Z

It seems like the low-hanging fruit is optimize either:

the inserting non-arrays but supporting ON CONFLICT.
inserts including arrays but without ON CONFLICT.

I'd argue for the former.

erimatnor · 2021-04-23T16:01:07Z

@cevian Even writing a new unnest function doesn't make it 100% general. There's a limit to the number of dimensions an array can have, so wrapping array types in another level of arrays is not possible if the column type already has the maximum number of dimensions. Ideally, we'd have to implement an entirely new container type and/or serialization format for "columnar data".

The dimension limit is somewhat of a corner-case and is likely a limitation we can live with. But it requires extra checks and validation to block such columns from distributed hypertables if we wrap columnar data in arrays.

I've already implemented the unnest approach, and the code is easy to adapt for a custom unnest function. The COPY approach is complementary (and will be faster still, I am sure).

So, I am not saying we are not going to optimize for the ON CONFLICT case. But when implementing it, we need to make sure that the code works for the general case, and not only Promscale.

cevian · 2021-04-23T16:20:11Z

FWIW i'd be super happy with having both optimizations the unnest approach for non-arrays and/or copy otherwise. I do understand the need for generality. I am just arguing against ONLY the copy optimization.

cevian · 2021-04-23T16:28:03Z

I did not realize the max dimensions was so small (only 6)

A new custom plan and executor node is added that implements `INSERT` using `COPY` in the backend (between access node and data nodes). `COPY` is significantly faster than executing an `INSERT` plan since tuples can be streamed to data nodes instead of dividing the original insert statement up in individual `INSERT` statements and batches for each data node. Thus `COPY` avoids the overhead of having to prepare statements, plan, and batch data for each prepared statement in the backend. Streaming tuples with `COPY` only works if there is no `ON CONFLICT` clause in the original `INSERT` statement, however. Neither is it possible to return the modified tuples in case of `ON CONFLICT DO UPDATE`, since there is no `RETURNING` clause support. Without `ON CONFLICT` we can support `RETURNING`, however, by simple returning the inserted tuples at the access node. Fixes timescale#3025 (for non-"ON CONFLICT" case).

A new custom plan and executor node is added that implements `INSERT` using `COPY` in the backend (between access node and data nodes). `COPY` is significantly faster than executing an `INSERT` plan since tuples can be streamed to data nodes instead of dividing the original insert statement into separate `INSERT` statements and batches for each data node. Thus `COPY` avoids the overhead of having to prepare statements, plan the query, and batch data for each data node. Streaming tuples with `COPY` only works if there is no `ON CONFLICT` clause in the original `INSERT` statement, however. Nor is it possible to return the modified tuples in case of `ON CONFLICT DO UPDATE`, since there is no `RETURNING` clause support. However, without `ON CONFLICT` we can support `RETURNING` by simply returning the inserted tuples at the access node since no modification of them occurs. In order to implement the new custom node, some refactoring has been done to the distributed `COPY` code. The basic `COPY` support functions have been moved to the connection module so that it is possible to "switch out" of `COPY_IN` mode in other parts of the code. For instance, when creating a remote chunk, the connection needs to be switched out of `COPY_IN` mode so that regular SQL statements can be sent again. It also makes sense wrap `libpq` calls within the connection module, including switching in and out of copy modes. Fixes timescale#3025 (for non-"ON CONFLICT" case).

binakot · 2021-04-27T06:33:52Z

Hi there. I was redirected from my question in slack: https://timescaledb.slack.com/archives/CPSD82EMU/p1619502753158000.

I subscribed to this issue and looking for news here.

Atm I working around distributed hypertable with time-dimension column as integer and did few experimental ingest with different chunk_time_interval. My test cluster in docker has 1 AN + 3 DN. And this is result on same setup:

SELECT * FROM create_distributed_hypertable('distirbuted_data', 'event_id', 'device_id', 
chunk_time_interval => 1000);
INSERT INTO distirbuted_data SELECT * FROM data;

INSERT 0 25869
Query returned successfully in 3 min 47 secs.

---

SELECT * FROM create_distributed_hypertable('distirbuted_data', 'event_id', 'device_id', 
chunk_time_interval => 10000);
INSERT INTO distirbuted_data SELECT * FROM data;

INSERT 0 25869
Query returned successfully in 1 min 57 secs.

---

SELECT * FROM create_distributed_hypertable('distirbuted_data', 'event_id', 'device_id', 
chunk_time_interval => 100000);
INSERT INTO distirbuted_data SELECT * FROM data;

INSERT 0 25869
Query returned successfully in 47 secs 300 msec.

---

SELECT * FROM create_distributed_hypertable('distirbuted_data', 'event_id', 'device_id', 
chunk_time_interval => 1000000);
INSERT INTO distirbuted_data SELECT * FROM data;

INSERT 0 25869
Query returned successfully in 10 secs 865 msec.

It's a dependency between chunk_time_interval and total ingest time with INSERT / SELECT.
Later I will try to do it with COPY and update my result here. It's interesting to know difference in performance for my case.

A new custom plan and executor node is added that implements `INSERT` using `COPY` in the backend (between access node and data nodes). `COPY` is significantly faster than executing an `INSERT` plan since tuples can be streamed to data nodes instead of dividing the original insert statement into separate `INSERT` statements and batches for each data node. Thus `COPY` avoids the overhead of having to prepare statements, plan the query, and batch data for each data node. Streaming tuples with `COPY` only works if there is no `ON CONFLICT` clause in the original `INSERT` statement, however. Nor is it possible to return the modified tuples in case of `ON CONFLICT DO UPDATE`, since there is no `RETURNING` clause support. However, without `ON CONFLICT` we can support `RETURNING` by simply returning the inserted tuples at the access node since no modification of them occurs. In order to implement the new custom node, some refactoring has been done to the distributed `COPY` code. The basic `COPY` support functions have been moved to the connection module so that it is possible to "switch out" of `COPY_IN` mode in other parts of the code. For instance, when creating a remote chunk, the connection needs to be switched out of `COPY_IN` mode so that regular SQL statements can be sent again. It also makes sense wrap `libpq` calls within the connection module, including switching in and out of copy modes. Fixes timescale#3025 (for non-"ON CONFLICT" case).

A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.

A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for #3025.

erimatnor · 2021-05-12T14:15:49Z

Merged PR to improve performance by using COPY in the backend. That doesn't solve the ON CONFLICT case, though, so leaving this issue open.

nikkhils · 2021-08-04T11:24:15Z

@erimatnor please note that #3413 is reporting degraded performance with 2.3 and possibly this optimization might not be helping (or even regressing) things.

nikkhils · 2023-03-01T06:00:56Z

The following test case now works.

CREATE TYPE mytype AS (low int, high int); -- on AN and and on all DNs
CREATE TABLE arrtable (time timestamptz, val mytype[]);
SELECT create_distributed_hypertable('arrtable', 'time');
INSERT INTO arrtable VALUES (now(), ARRAY[(1,1)::mytype]);
postgres=# SELECT * from arrtable ;
             time              |    val    
-------------------------------+-----------
 2023-02-28 21:56:40.990426-08 | {"(1,1)"}
(1 row)

@erimatnor should we close this now?

jfjoly · 2023-03-03T08:23:03Z

I'll go ahead and close given the results above. We can always re-open later if needed.

ryanbooz added the multinode label Mar 12, 2021

erimatnor self-assigned this Mar 29, 2021

erimatnor closed this as completed Apr 19, 2021

erimatnor reopened this Apr 19, 2021

erimatnor linked a pull request Apr 20, 2021 that will close this issue

Improve INSERT performance on distributed hypertables #3138

Closed

erimatnor mentioned this issue Apr 26, 2021

Use COPY when executing distributed INSERTs #3162

Merged

NunoFilipeSantos added the performance label May 11, 2021

akamensky mentioned this issue Jul 16, 2021

Throughput degradation with multi-node when upgrading 2.1 to 2.3 #3413

Open

NunoFilipeSantos added the distributed-hypertable label Oct 22, 2021

NunoFilipeSantos added the Query Experience Team label Mar 9, 2022

jfjoly added the q3-candidate label May 13, 2022

erimatnor removed their assignment Jul 6, 2022

jfjoly closed this as completed Mar 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

INSERT on a distributed hypertable is significantly slower than COPY #3025

INSERT on a distributed hypertable is significantly slower than COPY #3025

ryanbooz commented Mar 12, 2021 •

edited by davidkohn88

davidkohn88 commented Mar 16, 2021

cevian commented Mar 24, 2021

erimatnor commented Apr 13, 2021 •

edited

erimatnor commented Apr 16, 2021

davidkohn88 commented Apr 16, 2021 •

edited

erimatnor commented Apr 19, 2021 •

edited

erimatnor commented Apr 19, 2021 •

edited

erimatnor commented Apr 19, 2021

erimatnor commented Apr 19, 2021

erimatnor commented Apr 21, 2021 •

edited

davidkohn88 commented Apr 21, 2021

erimatnor commented Apr 22, 2021

ryanbooz commented Apr 22, 2021

davidkohn88 commented Apr 22, 2021

erimatnor commented Apr 23, 2021

cevian commented Apr 23, 2021

cevian commented Apr 23, 2021

erimatnor commented Apr 23, 2021 •

edited

cevian commented Apr 23, 2021

cevian commented Apr 23, 2021

binakot commented Apr 27, 2021 •

edited

erimatnor commented May 12, 2021

nikkhils commented Aug 4, 2021

nikkhils commented Mar 1, 2023

jfjoly commented Mar 3, 2023

INSERT on a distributed hypertable is significantly slower than COPY #3025

INSERT on a distributed hypertable is significantly slower than COPY #3025

Comments

ryanbooz commented Mar 12, 2021 • edited by davidkohn88

Single-node, regular hypertable (8cpu/32gb)

Multi-node (1AN/4DN), distributed hypertable (4cpu/16gb)

davidkohn88 commented Mar 16, 2021

cevian commented Mar 24, 2021

erimatnor commented Apr 13, 2021 • edited

Composite/row types

Array types

erimatnor commented Apr 16, 2021

davidkohn88 commented Apr 16, 2021 • edited

erimatnor commented Apr 19, 2021 • edited

erimatnor commented Apr 19, 2021 • edited

erimatnor commented Apr 19, 2021

erimatnor commented Apr 19, 2021

erimatnor commented Apr 21, 2021 • edited

davidkohn88 commented Apr 21, 2021

erimatnor commented Apr 22, 2021

ryanbooz commented Apr 22, 2021

davidkohn88 commented Apr 22, 2021

erimatnor commented Apr 23, 2021

cevian commented Apr 23, 2021

cevian commented Apr 23, 2021

erimatnor commented Apr 23, 2021 • edited

cevian commented Apr 23, 2021

cevian commented Apr 23, 2021

binakot commented Apr 27, 2021 • edited

erimatnor commented May 12, 2021

nikkhils commented Aug 4, 2021

nikkhils commented Mar 1, 2023

jfjoly commented Mar 3, 2023

ryanbooz commented Mar 12, 2021 •

edited by davidkohn88

erimatnor commented Apr 13, 2021 •

edited

davidkohn88 commented Apr 16, 2021 •

edited

erimatnor commented Apr 19, 2021 •

edited

erimatnor commented Apr 19, 2021 •

edited

erimatnor commented Apr 21, 2021 •

edited

erimatnor commented Apr 23, 2021 •

edited

binakot commented Apr 27, 2021 •

edited