
Allow INSERTS/UPDATES/DELETES to compressed chunks #2857

Closed
srstsavage opened this issue Jan 24, 2021 · 33 comments

@srstsavage

Creating this issue to track a feature request for allowing INSERTS, UPDATES, and DELETES to compressed chunks within hypertables (i.e. performing those operations on hypertables with compressed chunks using data with times within the ranges of compressed chunks). Apologies if this is already covered by another issue, but I scanned for a while and didn't find one.

The current limitations are described in the docs and a blog post, which also mentions:

We also plan to weaken/remove this limitation in future releases.

I'm creating this issue mostly in hopes of hearing from the devs about the feasibility and/or timeline of implementing this functionality. If you're interested in this as a user, please just +1 instead of comment spamming, unless you have a relevant use case to describe.

Our use case is collecting long time series from many devices outputting observations at various frequencies. The vast majority of data ingest is "current" (now-ish) data, and access patterns are primarily wide aggregations of current conditions or full time series extraction of a single device; this makes using compression on older chunks very desirable.

However, we are also continuously adding new devices with existing historical time series to the data store, and certain devices periodically need older data deleted and replaced with quality-controlled data. Currently this requires manual intervention: either manually decompressing chunks, inserting data, and recompressing (which is complicated and temporarily requires extra disk space), or running the backfill script, which I haven't yet tested but which doesn't seem to be aware of secondary/space partitioning columns (i.e. it would decompress all chunks corresponding to a time range rather than only the specific chunks matching the hash of the secondary partitioning column).
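
For reference, a minimal sketch of the manual workflow described above, assuming a hypothetical hypertable named observations (device_id, tstamp, value) and a placeholder chunk name:

-- find the chunks covering the backfill window
select show_chunks('observations',
                   newer_than => '2020-01-01'::timestamptz,
                   older_than => '2020-02-01'::timestamptz);

-- decompress the affected chunk, insert the historical rows, then recompress
select decompress_chunk('_timescaledb_internal._hyper_1_42_chunk');

insert into observations (device_id, tstamp, value)
values (17, '2020-01-15T12:00:00+00', 42.0);

select compress_chunk('_timescaledb_internal._hyper_1_42_chunk');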

I'll note that our use case can tolerate somewhat sub-optimal performance for INSERTS/UPDATES/DELETES on compressed chunks. In other words, we can wait a bit longer for these operations to complete as long as they work as expected without manual intervention.

@srstsavage
Author

One more note, the backfill script seems to rely on the usage of compression policies, which aren't yet supported on distributed hypertables.

#1900

@mfreed
Member

mfreed commented Jan 28, 2021

Thanks for this tracking issue, @shane-axiom. The team indeed plans to mitigate some of these limitations.

As a start, we've recently merged a PR that allows one to ALTER the schema of a compressed hypertable (by supporting the ability to add a column). This will likely be released in v2.1. #2736

@gtskaushik

Eagerly waiting for this functionality

@q00Dree

q00Dree commented Mar 24, 2021

That would be nice. The current situation with backfilling is kind of uncomfortable.

@mfreed
Member

mfreed commented May 18, 2021

Update: We've been making great progress on this, and should soon release the ability to INSERT rows directly into compressed hypertables.

@wkopec-tt

Do you have any plans for DELETEs?
We would love to move our user event data out to TimescaleDB, and our tests show an amazing compression rate, but unfortunately we need to be able to delete user data to ensure GDPR compliance.
GDPR compliance is critical for any software operating in the European Union, so I'm sure it would be a very important feature for a lot of Timescale users.
Thanks!

@Pinacolada8

Hey guys,
I know it is annoying to ask about release times, but do you have any estimate?
Sorry for my bad English and for the annoying question.

@mgagliardo91

Would also love to see UPSERTS with unique constraints on compressed tables. With how much data we have per compressed chunk, it takes a really long time to use the backfill script - a single compressed chunk can take up to 2 hours to decompress even if we only need to upsert 2K records.

@fergus3000

We would also like to be able to upsert into compressed chunks where we have a uniqueness constraint. So far we have implemented the workaround of decompressing, upserting and re-compressing chunks, but have concerns about performance.
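
A rough sketch of that decompress/upsert/recompress workaround, assuming a hypothetical hypertable readings with a unique key on (device_id, tstamp) and a hypothetical staging table readings_staging holding the rows to upsert:

-- decompress every chunk overlapping the backfill window
select decompress_chunk(c, true)
from show_chunks('readings',
                 newer_than => '2022-06-01'::timestamptz,
                 older_than => '2022-07-01'::timestamptz) c;

-- upsert the corrected rows while the chunks are uncompressed
insert into readings (device_id, tstamp, value)
select device_id, tstamp, value from readings_staging
on conflict (device_id, tstamp) do update set value = excluded.value;

-- recompress the chunks afterwards
select compress_chunk(c, true)
from show_chunks('readings',
                 newer_than => '2022-06-01'::timestamptz,
                 older_than => '2022-07-01'::timestamptz) c;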

@fetchezar

Hi! I'd like to propose a limited form of DELETE (and maybe even UPDATE) that I think is rather viable. Note that I'm speculating here, since this is just what I've derived from looking at the TimescaleDB tables.

We start with a simple schema:

create table signal (
  id int8 not null primary key,
  name text not null unique
);
create table sample (
  id int8 not null,
  tstamp timestamptz not null,
  value float8 not null,
  primary key (id, tstamp),
  constraint fk_sample foreign key(id) references signal(id)
);
select create_hypertable('sample', 'tstamp', chunk_time_interval => interval '5 days');
alter table sample set (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'id'
);
select add_compression_policy('sample', interval '10 days');

So if we have a signal called "oven_temperature" with an id 23, we'd see samples like:

23, '2022-07-04T00:30:00', 120.0
23, '2022-07-04T00:35:00', 130.0
23, '2022-07-04T00:40:00', 155.0

While the normal uncompressed chunks just follow the base table's schema, the compression settings will generate tables in the background named compress_hyper_x_y_chunk in the following form:

create table compress_hyper_x_y_chunk (
  id int8,
  tstamp compressed_data,
  value compressed_data
  -- plus other internal timescaledb stuff
);

Now, eyeballing it, it seems that editing either the timestamp or value data is very complicated, since you'd need to manipulate the binary blobs in each column, which are compressed and so on.

However, it seems possible to delete the data of an entire signal easily, since you could just add a where id = 23 and delete all the signal's history.

In the same vein, it may be somewhat possible to update the table, replacing the id with something else. Conceptually you'd be "redirecting" the history of one signal to another. For instance, if you had "oven1_temperature" with id 23, "oven2_temperature" with id 24, and later you wanted to unify that history on "oven1_temperature", you could replace occurrences of id 24 with id 23.

The foreign key on sample means you can't delete signals at all while the hypertable is compressed, so at the very least "reorganizing" existing history by pointing it at other signal IDs could be useful.
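
Purely to illustrate the kind of operation being proposed (this pokes at TimescaleDB internals and is not a supported interface), reusing the compressed chunk sketched above:

-- hypothetical: drop signal 23's compressed history by deleting its segments
delete from compress_hyper_x_y_chunk where id = 23;

-- hypothetical: "redirect" signal 24's compressed history onto signal 23
update compress_hyper_x_y_chunk set id = 23 where id = 24;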

All of this is a far cry from a generic standard SQL delete or update, but I believe it'd be useful in a bunch of cases, probably exposed via specific TimescaleDB functions.

What do you think?

@xvaara
Contributor

xvaara commented Oct 12, 2022

I created a function to delete from compressed table using the segmentby col:
https://gist.github.com/xvaara/81990e8291019f931387492c1869fe84

It has a lot of debug output; just comment out the notices when running in production.

@Ozzard

Ozzard commented Nov 14, 2022

I created a function to delete from compressed table using the segmentby col: https://gist.github.com/xvaara/81990e8291019f931387492c1869fe84

You just made my life a whole load simpler, as this allows me to meet our data retention requirements by removing segmented data from a single hypertable.

@mgagliardo91

Any ongoing work on being able to issue UPSERTs/UPDATEs to a compressed chunk? We have been having to deal with merging out-of-order and in-order compressed data together for some time now, and we're hoping to see an update on this so we can simplify this aspect of how we use Timescale.

@WalkerWalker

WalkerWalker commented Dec 9, 2022

Looking forward to this feature!!!! I have been fighting with manually decompressing, upserting, and recompressing quite a lot, and dealing with deadlocks along the way. I am sure this feature can make my life a lot easier.

@iroussos

iroussos commented Dec 9, 2022

A quick status update for anyone interested in this feature:

We are actively working on supporting updates, deletes, and upserts on compressed chunks. To do so, we are revisiting the way we perform DML on compressed chunks, which will also speed up inserts.

We expect to start gradually releasing support for the aforementioned commands during Q1 of 2023 🎆

@WalkerWalker

Wonder if this feature covers this one or not?
#4113

@iroussos

iroussos commented Dec 9, 2022

Wonder if this feature covers this one or not? #4113

@WalkerWalker yes ON CONFLICT DO NOTHING will be covered by this update :-)
(it is a much simpler use case of upsert, so we'll start with that either way)

@mgagliardo91

A quick status update for anyone interested in this feature:

We are actively working on supporting updates, deletes, and upserts on compressed chunks. To do so, we are revisiting the way we perform DML on compressed chunks, which will also speed up inserts.

We expect to start gradually releasing support for the aforementioned commands during Q1 of 2023 🎆

@iroussos this is great to hear! We are very excited to simplify our current approach to upserting into compressed chunks. Will it support compressions that contain two segment_by columns?

@jsvisa

jsvisa commented Dec 13, 2022

/ping

@iroussos

Will it support compressions that contain two segment_by columns?

@mgagliardo91 Yes, all compression configurations will be supported

@jvanns

jvanns commented Jan 30, 2023

DELETE support would sure make the experience of managing a hypertable via regular statements a lot more streamlined ;)

@WeeJeWel

WeeJeWel commented Feb 2, 2023

@mgagliardo91 I'm very excited for this feature to arrive in TimescaleDB Cloud.

Can you share a rough ETA?

@mgagliardo91

@WeeJeWel I think you'd want to ask @iroussos

@WeeJeWel

WeeJeWel commented Feb 2, 2023

@WeeJeWel I think you'd want to ask @iroussos

You're right, haha!

@iroussos Hello ^.^

@iroussos

@WeeJeWel sorry for the late response, lots of snow in my tiny end of the world, so I was slow to catch up with my notifications.

We are actively working on this, and progress is looking good on our side. We are working in public, so for the more adventurous among you, you can even track our progress through our PRs (I'll link a few in case anyone is interested in what we have been doing).

With the usual disclaimer that until releases are finalized we can never make promises, here is a rough list to provide some transparency on our plans:

  1. We have completed our initial re-architecture of how compression works (#4926) - This is a required step to enable DML on compressed chunks, but it also provides a ~2x improvement in INSERT rate into compressed chunks. We expect this to be included in 2.10, which is just around the corner (before the end of the month)
  2. We are currently working on enabling unique constraints on compressed chunks (#5252), and on supporting the INSERT .. ON CONFLICT DO NOTHING and then the ON CONFLICT DO UPDATE (a.k.a. upsert) clauses - We expect those to land in the release that will follow 2.10
  3. Finally, support for UPDATES and DELETES will be released - release TBD, but we'll keep our promise and work on those once the previous ones are ready

Let us know what you believe we should be working on after the list above is in the can, by adding new issues with feature requests or by voting and commenting on existing ones :-)
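
As a rough illustration of the clauses mentioned in point 2, reusing the sample table from earlier in this thread (primary key on (id, tstamp)):

-- skip rows that already exist, even if they sit in a compressed chunk
insert into sample (id, tstamp, value)
values (23, '2022-07-04T00:30:00+00', 120.0)
on conflict do nothing;

-- upsert: overwrite the stored value when the row already exists
insert into sample (id, tstamp, value)
values (23, '2022-07-04T00:35:00+00', 131.0)
on conflict (id, tstamp) do update set value = excluded.value;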

@WeeJeWel

Thank you for the transparency @iroussos !

I'm looking forward to seeing this available in TimescaleDB Cloud.

@sb230132
Contributor

sb230132 commented Mar 6, 2023

Refer #5339

@iroussos

iroussos commented Jun 6, 2023

Closing successfully as this request is now supported in TimescaleDB 2.11 🎆

For more info check our release notes
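
As a quick sketch of what 2.11 enables, reusing the sample table from earlier in this thread, plain DML should now work even when the target rows live in compressed chunks:

-- update a value stored in a compressed chunk
update sample set value = 125.0
where id = 23 and tstamp = '2022-07-04T00:30:00+00';

-- delete part of a signal's history from compressed chunks
delete from sample
where id = 23 and tstamp < '2022-07-01T00:00:00+00';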

@iroussos iroussos closed this as completed Jun 6, 2023
@bmillwood

I'm using self-hosted TimescaleDB and just ran ALTER EXTENSION timescaledb UPDATE and confirmed with \dx I'm now on 2.11.0. However, I'm still seeing these messages:

ERROR:  cannot update/delete rows from chunk "_hyper_20_6086_chunk" as it is compressed

(etc.)

Is there anything else I need to do to make this possible? Do I need to restart PostgreSQL, recompress chunks, or anything like that?

@jsvisa

jsvisa commented Jun 28, 2023

@bmillwood try to update the timescaledb_toolkit too and then retry

@bmillwood

@jsvisa thanks, but I don't think I have that installed (it doesn't show up in \dx)

@jsvisa

jsvisa commented Jun 28, 2023

@bmillwood or maybe you need to specify the target database, otherwise the extension may not be updated as expected. ref timescale/docs.timescale.com-content#300

@bmillwood

bmillwood commented Jun 28, 2023

@jsvisa I don't use the Docker installation and don't have timescaledb installed on any other databases.

I looked at the code for the PR and I see this:

	/*
	 * We do not support UPDATE/DELETE operations on compressed hypertables
	 * on PG versions < 14, because Custom Scan (HypertableModify) node is
	 * not generated in the plan for UPDATE/DELETE operations on hypertables
	 */

Indeed, I'm on PostgreSQL 13. Maybe this should have been mentioned in the release notes?
