INSERT on a distributed hypertable is significantly slower than COPY #3025
Comments
I think this is most likely related to the parse/bind inefficiency; see timescaledb/tsl/src/data_node_dispatch.c, line 195 (commit 10b3da0).
Note that we generate predefined prepared statements, and both generating them on the access node and parsing them on the data node have significant overhead. We ran into this trouble with Promscale and got around it with a trick using arrays: instead of preparing a statement for each input shape, we sent arrays over the wire to a prepared statement that unnests them, so there is one variable-length input per column: https://github.com/timescale/promscale/blob/d6a724027a90b69230aa4a77e93333845c0ca8e6/pkg/pgmodel/ingestor/insert.go#L357 The benefits of switching would be significant: all the code that determines whether we're in the fixed-batch-size case, or at the ends and just need to use the remaining size, would go away, since the prepared statement remains the same and only the size of the array sent changes. Additionally, for prepared statements that are used only once (which will likely be the common case), the prepared statement sent over the wire can be as big as or bigger than the data itself; this would significantly reduce that size. All types in PostgreSQL have array types associated with them by default, so this should be quite general-purpose.
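The contrast between the two statement shapes can be sketched roughly as follows (table and column names are hypothetical, not taken from the Promscale code):

```sql
-- Conventional batching needs a distinct prepared statement per batch
-- size, with num_rows * num_columns parameters:
--   INSERT INTO metrics VALUES ($1, $2), ($3, $4), ... ;
--
-- Array-based alternative: one prepared statement with one array
-- parameter per column; only the array contents vary per batch:
PREPARE insert_metrics AS
  INSERT INTO metrics (time, value)
  SELECT * FROM unnest($1::timestamptz[], $2::float8[]);

EXECUTE insert_metrics('{2021-01-01, 2021-01-02}', '{1.0, 2.0}');
```

The same `insert_metrics` statement serves a batch of 2 rows or 2,000 rows, which is what makes the fixed-vs-remainder batch-size logic unnecessary.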
Yes, in Promscale we found that doing
Currently doing a POC of the approach suggested by @davidkohn88 and @cevian. A couple of issues have arisen so far, with composite types and with array types.

Composite/row types: When composite types are in an array and then unnested, they are unnested into separate columns. For instance:
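A minimal reproduction of this behavior (the type and table names here are made up for illustration):

```sql
CREATE TYPE custom_type AS (a int, b text);
CREATE TABLE readings (time timestamptz, val custom_type);

-- In the FROM clause, unnest expands the composite elements into their
-- fields, so this SELECT produces three columns (time, a, b), not two:
SELECT *
FROM unnest(ARRAY['2021-01-01']::timestamptz[],
            ARRAY[ROW(1, 'x')::custom_type]);
```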
So, if the original table only had two columns, then the unnested statement has three and the insert will fail. This can be handled by recreating the composite type in the target list:
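A sketch of what such a rewritten statement could look like (names are hypothetical; assume a table `readings (time timestamptz, val custom_type)` with `custom_type AS (a int, b text)`):

```sql
-- unnest expands custom_type into columns (a, b); the target list
-- rebuilds the composite value before inserting:
INSERT INTO readings (time, val)
SELECT u.time, ROW(u.a, u.b)::custom_type
FROM unnest($1::timestamptz[], $2::custom_type[]) AS u(time, a, b);
```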
However, this makes creating/deparsing the INSERT statement to send to the remote node a lot more complicated. We need to check for custom types and recreate them in the target list, and we also need to ensure nested composite types work. That seems to work by itself:
Array types: When a table has an array column, the unnest in the generated INSERT statement will also unnest the arrays within the arrays:
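This is easy to demonstrate in isolation:

```sql
-- unnest flattens ALL dimensions, so rolling up int[] values into a
-- two-dimensional array and unnesting yields scalars, not the rows:
SELECT unnest(ARRAY[ARRAY[1, 2], ARRAY[3, 4]]);
-- returns 1, 2, 3, 4 rather than {1,2} and {3,4}
```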
This is obviously not the desired behavior, as we expect the inner arrays to remain intact. I currently don't have a good way to deal with this. It might not be an issue if the data types are serialized or parameterized:
Found a potential show-stopper for the array approach when using custom types. This is actually a more general issue that affects multi-node whenever custom types are used in arrays. For instance, this simple test case fails on a distributed hypertable but works on a regular hypertable:
IMO, this is actually a bug in PostgreSQL. A thread on pgsql-hackers also brings up this issue: https://www.postgresql.org/message-id/flat/20160426.132000.16866976.horiguchi.kyotaro%40lab.ntt.co.jp#761e7ec4f277b956232487091c8db8cd
Ugh. Well, that's annoying. I wonder if a custom serialization type that isn't an array, plus an unpacking function with the proper type information, would work? It'd follow the same basic approach (send things over as a blob rather than deparsing the whole thing) and could basically just be a list of columns/types plus the values as a blob. It might be a lot simpler than the array approach anyway, since we wouldn't have to do the per-column handling, etc. I don't know if that would run into SQL injection issues, but I don't think it would, as we've already done the prepared statement on the AN side to do the routing. I guess the real question is: during the POC, were you able to get some numbers on simpler tables in situations similar to the one reported (i.e., a lot of columns and a medium-to-large number of rows per batch), to see what the impact of the approach would be on performance and whether it would be worth pursuing another way of accomplishing this? It seems like that might be achievable with what you've got, in order to make a go/no-go decision on a different approach with a similar philosophy. One more thought: Something like:
(I don't know how crazy this is, to be honest; just spitballing here. I seem to remember that Greenplum has something like this for joins, where they can join to a table that exists on the access-node equivalent by rewriting it as an SRF on the data-node equivalent and shuffling the data over, moving the smaller table to the bigger data rather than the bigger to the smaller.)
@davidkohn88 The POC revealed a good performance boost (~50%), but it is hard to know exactly whether this is down only to the array approach vs. other refactorings/improvements. For example, I also found some improvements by simply doing more efficient async handling across multiple nodes. But I think we can assume the array/columnar approach is worthwhile if we can get it to work reliably. The idea of a custom "container" type has come up as one potential solution. Another option would be a hybrid approach where we fall back to the non-columnar insert for queries with custom types, but that's also messy and requires maintaining two code paths. We could also revert to "text" encoding in the case of custom types.
Wanted to check if this is a problem for
I realized, however, that |
Looks like we can revert to text serialization just for arrays that have a custom element type.
Closed by mistake, so reopened.
Array column types seem to pose a problem. For example:
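One way this shows up (a minimal reproduction, not the exact statement from the issue):

```sql
-- Rolling up array-typed column values with array_agg builds a
-- multi-dimensional array, which requires uniform inner lengths:
SELECT array_agg(v)
FROM (VALUES ('{1,2}'::int[]), ('{3}'::int[])) AS t(v);
-- ERROR:  multidimensional arrays must have array expressions
--         with matching dimensions
```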
This is because the
@davidkohn88 Any ideas on this? It seems like a show-stopper for this idea. I think we need to implement our own container type where we can properly roll up and unwind array elements.
Yea... those array functions leave a lot to be desired, and the behaviour of multi-dimensional arrays is truly weird. It'd either be a container type; or we could do this for scalars and fall back to the old behavior for arrays (which, while annoying, would still cover something like 99% of users); or we could use array types and write a special unnest-like function that unnests only the first dimension of an array, which might be something we could contribute back upstream. I don't know the scale of the problems with this approach, but I do wonder if the SRF that produces tuples from a stream from the AN is feasible? I think that one has major benefits in general and would speed a lot of things up and make for fewer memory-management issues, but it might be such a PITA that it's not worth it (though I do think it would have applications well beyond the insert path; it would be reusable for joins on the DN and other bits that could be quite helpful for query optimization down the line).
I think using
Just wanted to say that I enjoy watching you guys work this through. I'm learning a ton. Thanks for caring and working to find a great approach! |
That's true, but the
The statement would look something like
Is that a remotely feasible approach? I don't know what sort of roadblocks would be in the way for that one, but I guess some sort of communication between the front-end PID and the back-end PID is the problem? Either way, I can see doing either the COPY approach, falling back when doing ON CONFLICT, or the ARRAY approach, falling back when inserting other arrays, as a stopgap for now, and maybe moving to this approach later. If I were choosing between the two, I'd probably choose the second, as I think far more people use ON CONFLICT than insert arrays into hypertables...
Currently switching to pursue the COPY approach. It is a fairly simple approach and is low-hanging fruit for good performance until we can invest more time in a custom function for columnar rollups.
@erimatnor the problem is this won't help with Promscale performance, as Promscale needs ON CONFLICT. I think writing an unnest in C that undoes just one dimension shouldn't be too hard (see https://stackoverflow.com/questions/8137112/unnest-array-by-one-level), and that would create a much more general solution. Please keep in mind that we do want to benchmark Promscale on multi-node fairly soon.
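Before a C implementation, the one-level unnest idea from the linked Stack Overflow question can be sketched in plain SQL for the two-dimensional case (function name is made up):

```sql
-- Unnests only the first dimension of a 2-D array, returning the
-- inner 1-D arrays as rows; a C version could generalize this to
-- arbitrary dimensions and avoid the per-element aggregation cost.
CREATE FUNCTION unnest_2d_1d(anyarray)
RETURNS SETOF anyarray
LANGUAGE sql IMMUTABLE STRICT AS $$
  SELECT array_agg($1[d1][d2] ORDER BY d2)
  FROM generate_subscripts($1, 1) AS d1,
       generate_subscripts($1, 2) AS d2
  GROUP BY d1
  ORDER BY d1
$$;

SELECT unnest_2d_1d(ARRAY[ARRAY[1, 2], ARRAY[3, 4]]);
-- returns {1,2} and {3,4} as two rows
```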
It seems like the low-hanging fruit is to optimize either:
I'd argue for the former. |
@cevian Even writing a new unnest function doesn't make it 100% general. There's a limit to the number of dimensions an array can have, so wrapping array types in another level of arrays is not possible if the column type already has the maximum number of dimensions. Ideally, we'd implement an entirely new container type and/or serialization format for "columnar data". The dimension limit is somewhat of a corner case and is likely a limitation we can live with, but it requires extra checks and validation to block such columns from distributed hypertables if we wrap columnar data in arrays. I've already implemented the unnest approach, and the code is easy to adapt for a custom unnest function. The COPY approach is complementary (and will be faster still, I am sure). So, I am not saying we are not going to optimize for the ON CONFLICT case; but when implementing it, we need to make sure the code works for the general case, not only for Promscale.
FWIW, I'd be super happy with having both optimizations: the unnest approach for non-arrays and/or COPY otherwise. I do understand the need for generality; I am just arguing against ONLY the COPY optimization.
I did not realize the max dimensions was so small (only 6).
A new custom plan and executor node is added that implements `INSERT` using `COPY` in the backend (between access node and data nodes). `COPY` is significantly faster than executing an `INSERT` plan since tuples can be streamed to data nodes instead of dividing the original insert statement up into individual `INSERT` statements and batches for each data node. Thus `COPY` avoids the overhead of having to prepare statements, plan, and batch data for each prepared statement in the backend. Streaming tuples with `COPY` only works if there is no `ON CONFLICT` clause in the original `INSERT` statement, however. Neither is it possible to return the modified tuples in case of `ON CONFLICT DO UPDATE`, since there is no `RETURNING` clause support. Without `ON CONFLICT` we can support `RETURNING`, however, by simply returning the inserted tuples at the access node. Fixes timescale#3025 (for non-"ON CONFLICT" case).
A new custom plan and executor node is added that implements `INSERT` using `COPY` in the backend (between access node and data nodes). `COPY` is significantly faster than executing an `INSERT` plan since tuples can be streamed to data nodes instead of dividing the original insert statement into separate `INSERT` statements and batches for each data node. Thus `COPY` avoids the overhead of having to prepare statements, plan the query, and batch data for each data node. Streaming tuples with `COPY` only works if there is no `ON CONFLICT` clause in the original `INSERT` statement, however. Nor is it possible to return the modified tuples in case of `ON CONFLICT DO UPDATE`, since there is no `RETURNING` clause support. However, without `ON CONFLICT` we can support `RETURNING` by simply returning the inserted tuples at the access node, since no modification of them occurs. In order to implement the new custom node, some refactoring has been done to the distributed `COPY` code. The basic `COPY` support functions have been moved to the connection module so that it is possible to "switch out" of `COPY_IN` mode in other parts of the code. For instance, when creating a remote chunk, the connection needs to be switched out of `COPY_IN` mode so that regular SQL statements can be sent again. It also makes sense to wrap `libpq` calls within the connection module, including switching in and out of copy modes. Fixes timescale#3025 (for non-"ON CONFLICT" case).
Hi there. I was redirected from my question in Slack: https://timescaledb.slack.com/archives/CPSD82EMU/p1619502753158000. I subscribed to this issue and am looking for news here. At the moment I am working with a distributed hypertable whose time-dimension column is an integer, and did a few experimental ingests with different chunk_time_interval values. My test cluster in Docker has 1 AN + 3 DNs. These are the results on the same setup:

SELECT * FROM create_distributed_hypertable('distirbuted_data', 'event_id', 'device_id',
chunk_time_interval => 1000);
INSERT INTO distirbuted_data SELECT * FROM data;
INSERT 0 25869
Query returned successfully in 3 min 47 secs.
---
SELECT * FROM create_distributed_hypertable('distirbuted_data', 'event_id', 'device_id',
chunk_time_interval => 10000);
INSERT INTO distirbuted_data SELECT * FROM data;
INSERT 0 25869
Query returned successfully in 1 min 57 secs.
---
SELECT * FROM create_distributed_hypertable('distirbuted_data', 'event_id', 'device_id',
chunk_time_interval => 100000);
INSERT INTO distirbuted_data SELECT * FROM data;
INSERT 0 25869
Query returned successfully in 47 secs 300 msec.
---
SELECT * FROM create_distributed_hypertable('distirbuted_data', 'event_id', 'device_id',
chunk_time_interval => 1000000);
INSERT INTO distirbuted_data SELECT * FROM data;
INSERT 0 25869
Query returned successfully in 10 secs 865 msec.

There is a clear dependency between chunk_time_interval and total ingest time with INSERT ... SELECT.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
A new custom plan/executor node is added that implements distributed INSERT using COPY in the backend (between access node and data nodes). COPY is significantly faster than the existing method that sets up prepared INSERT statements on each data node. With COPY, tuples are streamed to data nodes instead of batching them in order to "fill" a configured prepared statement. A COPY also avoids the overhead of having to plan the statement on each data node. Using COPY doesn't work in all situations, however. Neither ON CONFLICT nor RETURNING clauses work since COPY lacks support for them. Still, RETURNING is possible if one knows that the tuples aren't going to be modified by, e.g., a trigger. When tuples aren't modified, one can return the original tuples on the access node. In order to implement the new custom node, some refactoring has been performed to the distributed COPY code. The basic COPY support functions have been moved to the connection module so that switching in and out of COPY_IN mode is part of the core connection handling. This allows other parts of the code to manage the connection mode, which is necessary when, e.g., creating a remote chunk. To create a chunk, the connection needs to be switched out of COPY_IN mode so that regular SQL statements can be executed again. Partial fix for timescale#3025.
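The chunk-creation interplay described above can be sketched as a small state machine. This is a toy Python model, not the actual C implementation, and all class, method, and SQL names here are made up for illustration: the point is only that regular SQL is legal on a connection solely outside COPY_IN mode, so the sender must end the COPY, create the chunk, and start a new COPY.

```python
from enum import Enum, auto

class ConnMode(Enum):
    SQL = auto()      # regular statements allowed
    COPY_IN = auto()  # only COPY data rows allowed

class DataNodeConnection:
    """Toy model of the connection-mode handling described above."""

    def __init__(self):
        self.mode = ConnMode.SQL
        self.log = []  # records what would go over the wire

    def exec_sql(self, stmt):
        # Regular SQL (e.g., chunk creation) is illegal in COPY_IN mode.
        if self.mode is not ConnMode.SQL:
            raise RuntimeError("must leave COPY_IN mode before running SQL")
        self.log.append(stmt)

    def begin_copy(self, target):
        if self.mode is not ConnMode.SQL:
            raise RuntimeError("already in COPY_IN mode")
        self.mode = ConnMode.COPY_IN
        self.log.append(f"COPY {target} FROM STDIN")

    def put_row(self, row):
        if self.mode is not ConnMode.COPY_IN:
            raise RuntimeError("not in COPY_IN mode")
        self.log.append("\t".join(map(str, row)))

    def end_copy(self):
        if self.mode is not ConnMode.COPY_IN:
            raise RuntimeError("not in COPY_IN mode")
        self.mode = ConnMode.SQL
        self.log.append("\\.")

def insert_with_chunk_creation(conn, rows, chunk_exists):
    """Stream rows via COPY, switching out of COPY mode whenever a row
    needs a chunk that doesn't exist yet (chunk_exists is a hypothetical
    predicate standing in for the access node's chunk lookup)."""
    conn.begin_copy("metrics_chunk")
    for row in rows:
        if not chunk_exists(row):
            conn.end_copy()                            # leave COPY_IN mode...
            conn.exec_sql("SELECT create_chunk(...)")  # ...run regular SQL...
            conn.begin_copy("metrics_chunk")           # ...and resume COPY
        conn.put_row(row)
    conn.end_copy()
```

The model enforces the invariant the refactoring is about: any attempt to run SQL while the connection is in COPY_IN mode fails, so callers are forced to manage the mode switch explicitly.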
Merged PR to improve performance by using COPY in the backend. That doesn't solve the ON CONFLICT case, though, so leaving this issue open. |
@erimatnor please note that #3413 is reporting degraded performance with 2.3 and possibly this optimization might not be helping (or even regressing) things. |
The following test case now works.
@erimatnor should we close this now? |
I'll go ahead and close given the results above. We can always re-open later if needed. |
Relevant system information:
Describe the bug
Using INSERT on a distributed hypertable is significantly slower than inserting into a regular hypertable with INSERT. Based on feedback in the linked Slack conversation, I tested INSERT/COPY performance using TSBS on both a regular and a distributed hypertable. Across 8 different tests (results below), it is evident that INSERT on a distributed hypertable actually performs worse than on a single node. Even though we expect a batched INSERT statement to be slower than COPY, the expectation is that more nodes would still improve INSERT ingest performance in a similar fashion to COPY. That does not appear to be the case.
All tests inserted 36,000,000 rows, in batches of 4,000 rows, with TSBS from a pre-generated file (using FILE mode).

Single-node, regular hypertable (8cpu/32gb)
Multi-node (1AN/4DN), distributed hypertable (4cpu/16gb)
Upon further discussion, the working theory is that parameter binding on the Access Node is the bottleneck (understandable) and using something like Promscale does with array parameters instead could significantly improve performance in a distributed transaction.
Examples:
INSERT INTO metric_table(time, value, series_id) SELECT * FROM unnest($1::TIMESTAMPTZ[], $2::DOUBLE PRECISION[], $3::BIGINT[]) a(t,v,s) ORDER BY s,t ON CONFLICT DO NOTHING
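The client-side half of that trick can be sketched as follows (a hedged Python illustration; the function names are made up): the row batch is transposed into one array per column, so the same prepared statement serves any batch size, and the number of bind parameters stays equal to the number of columns rather than rows × columns.

```python
def rows_to_column_arrays(rows):
    """Transpose a batch of (time, value, series_id) rows into one
    Postgres array parameter per column. The prepared statement stays
    identical regardless of batch size; only the array lengths change."""
    if not rows:
        return ([], [], [])
    times, values, series_ids = (list(col) for col in zip(*rows))
    return (times, values, series_ids)

def bind_param_count(n_rows, n_cols, use_arrays):
    """Bind parameters the access node must serialize per batch:
    rows x cols for a classic multi-row VALUES insert, but only
    n_cols with the unnest-arrays approach."""
    return n_cols if use_arrays else n_rows * n_cols
```

A driver would then execute the fixed statement once per batch, passing the three arrays as its only parameters. For the 4,000-row batches used in these tests, that is 3 bind parameters per batch instead of 12,000.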
To Reproduce
There is sample code linked in the Slack conversation, although running it locally proved fairly inefficient for reproducing the results. Therefore, I generated large files (36,000,000 rows each) for four different, consecutive hours. Using those four files, I tested the scenarios listed above on both a regular and a distributed hypertable.
All ingests (tsbs_load) were done with a batch size of 4,000 rows because TSBS currently disallows batches above some ratio (for this test I couldn't go higher than ~4,300 rows/batch). To keep the tests even, I chose to do both COPY and INSERT with 4,000 rows.
Expected behavior
Even knowing that INSERT carries a higher overhead than COPY on both regular and distributed hypertables, INSERT performance is still expected to scale with additional data nodes (as we generally see with COPY performance).
Actual behavior
In this case, both INSERT tests were slower on multi-node than a single-node, even allowing for the smaller instance sizes.
The 1 worker tests, in particular, show a ~2x decrease in performance for multinode compared to single node. They also show a ~5x decrease in performance compared to copy whereas the same test on a regular hypertable only has a ~2x decrease in performance.
Even the 10 worker case shows a significant decrease for multinode vs singlenode, though it's harder to determine whether that's due to fewer cores on the AN than on the singlenode box.
Additional context
Original Slack conversation (it's a long thread with both @davidkohn88 and @k-rus involved): https://timescaledb.slack.com/archives/CPSD82EMU/p1614670969056700
Note that the situation here is a relatively common one: workers insert from specific shards of a stream, and the number of workers is determined by the number of shards, so it can't easily be increased. The performance hit of moving from single node to multinode in this case (especially for wide tables) was severe enough that the user had to stop using multinode and is looking at options other than Timescale, though they are otherwise happy. They need multinode to scale out storage, and would otherwise have to implement some form of two-phase transactional inserts in order to use more than one worker per shard.