Allow INSERTS/UPDATES/DELETES to compressed chunks #2857
One more note: the backfill script seems to rely on the usage of compression policies, which aren't yet supported on distributed hypertables.
Thanks for this tracking issue, @shane-axiom. The team indeed plans to mitigate some of these limitations. As a start, we've recently merged a PR that allows one to ALTER the schema of a compressed hypertable (by supporting the ability to add a column). This will likely be released in v2.1. #2736
Eagerly waiting for this functionality
That would be nice. The current situation is kinda uncomfortable with backfilling.
Update: We've been making great progress on this, and should soon release the ability to INSERT rows directly into compressed hypertables.
Do you have any plans for DELETEs?
Hey guys,
Would also love to see UPSERTS with unique constraints on compressed tables. With how much data we have per compressed chunk, it takes a really long time to use the backfill script: a single compressed chunk can take up to 2 hours to decompress even if we only need to upsert 2K records.
We would also like to be able to upsert into compressed chunks where we have a uniqueness constraint. So far we have implemented the workaround of decompressing, upserting and re-compressing chunks, but have concerns about performance.
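For anyone implementing the same workaround, a minimal sketch looks roughly like the following. The hypertable name `sample`, the key `(id, tstamp)`, and the time window are placeholders to adapt to your schema; `decompress_chunk`, `compress_chunk`, and `show_chunks` are the standard TimescaleDB APIs.

```sql
-- 1. Decompress only the chunks that overlap the backfill window.
select decompress_chunk(c, if_compressed => true)
from show_chunks('sample',
                 newer_than => timestamptz '2021-01-01',
                 older_than => timestamptz '2021-01-06') c;

-- 2. Upsert against the now-uncompressed chunks; the unique
--    constraint on (id, tstamp) makes ON CONFLICT work normally.
insert into sample (id, tstamp, value)
values (23, '2021-01-02 00:00:00+00', 42.0)
on conflict (id, tstamp) do update set value = excluded.value;

-- 3. Recompress the same window.
select compress_chunk(c, if_not_compressed => true)
from show_chunks('sample',
                 newer_than => timestamptz '2021-01-01',
                 older_than => timestamptz '2021-01-06') c;
```

Note the performance and disk-space caveats mentioned above still apply: every overlapping chunk is fully decompressed even for a small upsert.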
Hi! I'd like to propose a limited form of DELETE (and maybe even UPDATE) that I think is rather viable. I note that I'm talking in the air here, since this is just what I have derived from looking at the TimescaleDB tables. We start with a simple schema:

```sql
create table signal (
    id int8 not null primary key,
    name text not null unique
);

create table sample (
    id int8 not null,
    tstamp timestamptz not null,
    value float8 not null,
    primary key (id, tstamp),
    constraint fk_sample foreign key (id) references signal (id)
);

select create_hypertable('sample', 'tstamp', chunk_time_interval => interval '5 days');

alter table sample set (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'id'
);

select add_compression_policy('sample', interval '10 days');
```

So if we have a signal called "oven_temperature" with an id 23, we'd see samples like `(23, '2021-03-01 10:00:00+00', 180.5)`.

While the normal uncompressed chunks just follow the base table's schema, the compression settings will generate tables in the background named compress_hyper_x_y_chunk in the following form:

```sql
create table compress_hyper_x_y_chunk (
    id int8,
    tstamp compressed_data,
    value compressed_data
    -- plus other internal timescaledb stuff
);
```

Now, eyeballing it, it seems editing either timestamp or value data is very complicated, since you'd need to manipulate the binary blobs in each column, which are compressed and so on. However, it seems possible to delete the data of an entire signal easily, since you could just add a `where id = 23` and delete all the signal's history.

In the same vein, it may be somewhat possible to update the table, replacing the id with something else. Conceptually you'd be "redirecting" the history of one signal to another. For instance, if you had "oven1_temperature" with id 23 and "oven2_temperature" with id 24, and later you wanted to unify that history under "oven1_temperature", you could replace occurrences of id 24 with id 23. The foreign key on sample makes it so you can't delete signals at all if the hypertable is compressed, so at the very least "reorganizing" existing history by pointing it at other signal IDs can be useful.

All of this is a far cry from a generic standard SQL delete or update, but it'd be useful in a bunch of cases I believe, probably exposed via specific TimescaleDB functions. What do you think?
I created a function to delete from a compressed table using the segmentby column. It has a lot of debug output; just comment out the notices when in production.
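The original function wasn't preserved in this thread, but a sketch of the idea might look like the following. The function name and the hardcoded `sample`/`id` names are hypothetical; it relies on segmentby columns being stored uncompressed in the internal chunk tables, and on the `_timescaledb_catalog` layout of TimescaleDB 2.x. Treat it as an untested illustration, not a supported API.

```sql
-- Hypothetical sketch: delete one segmentby value (a signal id) from a
-- compressed hypertable by deleting the matching batches from each
-- internal compressed chunk, then removing any uncompressed rows.
create or replace function delete_signal_history(p_id int8)
returns void language plpgsql as $$
declare
    comp_chunk text;
begin
    -- Find the internal compressed chunk tables backing 'sample'.
    for comp_chunk in
        select format('%I.%I', comp.schema_name, comp.table_name)
        from _timescaledb_catalog.chunk ch
        join _timescaledb_catalog.chunk comp on comp.id = ch.compressed_chunk_id
        join _timescaledb_catalog.hypertable ht on ht.id = ch.hypertable_id
        where ht.table_name = 'sample'
    loop
        -- Segmentby columns are stored in plain form, so a simple DELETE
        -- removes every compressed batch for that signal.
        execute format('delete from %s where id = $1', comp_chunk) using p_id;
        raise notice 'cleaned %', comp_chunk;  -- debug output
    end loop;

    -- Remove any still-uncompressed rows the normal way.
    delete from sample where id = p_id;
end;
$$;
```

Deleting from internal catalog-backed tables bypasses TimescaleDB's own bookkeeping, so verify row counts and backups before relying on something like this.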
You just made my life a whole load simpler, as this allows me to meet our data retention requirements by removing segmented data from a single hypertable.
Any ongoing work on being able to issue UPSERTS/UPDATES to a compressed chunk? We have been having to deal with merging out-of-order and in-order compressed data together for some time now, hoping we may see an update on this so we can simplify this aspect of how we use Timescale.
Looking forward to this feature! I have been fighting with manual decompressing, upserting and compressing quite a lot, and dealing with deadlocks along the way. I am sure this feature can make my life a lot easier.
A quick status update for anyone interested in this feature: We are actively working on supporting updates, deletes, and upserts on compressed chunks. To do so, we are revisiting the way we perform DML on compressed chunks, which will also speed up inserts. We expect to start gradually releasing support for the aforementioned commands during Q1 of 2023 🎆
Wonder if this feature covers this one or not?
@WalkerWalker yes
@iroussos this is great to hear! We are very excited to simplify our current approach to upserting into compressed chunks. Will it support compression configurations that use two segment_by columns?
/ping |
@mgagliardo91 Yes, all compression configurations will be supported
DELETE support would sure make the experience of managing a hypertable via regular statements a lot more streamlined ;)
@mgagliardo91 I'm very excited for this feature to arrive in TimescaleDB Cloud. Can you share a rough ETA?
@WeeJeWel sorry for the late response; lots of snow in my tiny end of the world, so I was slow to catch up with my notifications. We are actively working on this, and the progress is looking good on our side. We are working in public, so the more adventurous among you can even track our progress through our PRs (I'll link a few in case anyone is interested in what we have been doing). With the usual disclaimer that until releases are finalized we can never make promises, here is a rough list to provide some transparency on our plans:
Let us know what you believe we should be working on after the list above is in the can, by adding new issues with feature requests or voting and commenting on existing ones :-)
Thank you for the transparency @iroussos! I'm looking forward to seeing this available in TimescaleDB Cloud.
Refer to #5339
Closing successfully, as this request is now supported in TimescaleDB 2.11 🎆 For more info, check our release notes
I'm using self-hosted TimescaleDB and just ran
(etc.) Is there anything else I need to do to make this possible? Do I need to restart PostgreSQL, recompress chunks, or anything like that? |
@bmillwood try to update the |
@jsvisa thanks, but I don't think I have that installed (it doesn't show up in |
@bmillwood or maybe you need to specify the target database, or the extension may not be updated as expected. ref timescale/docs.timescale.com-content#300
@jsvisa I don't use the Docker installation and don't have it. I looked at the code for the PR and I see this:
Indeed, I'm on PostgreSQL 13. Maybe this should have been mentioned in the release notes?
Creating this issue to track a feature request for allowing INSERTS, UPDATES, and DELETES to compressed chunks within hypertables (i.e. performing those operations on hypertables with compressed chunks using data with times within the ranges of compressed chunks). Apologies if this is already covered by another issue, but I scanned for a while and didn't find one.
The current limitations are described in the docs and a blog post, which also mentions:
I'm creating this issue mostly in hopes of hearing from the devs about the feasibility and/or timeline of implementing this functionality. If you're interested in this as a user, please just +1 instead of comment spamming, unless you have a relevant use case to describe.
Our use case is collecting long time series from many devices outputting observations at various frequencies. The vast majority of data ingest is "current" (now-ish) data, and access patterns are primarily wide aggregations of current conditions or full time series extraction of a single device; this makes using compression on older chunks very desirable.
However, we are also continuously adding new devices with existing historical time series to the data store, and certain devices periodically need older data deleted and replaced with quality controlled data. Currently this would require manual intervention, either by manually decompressing chunks, inserting data, and recompressing (which is complicated and requires temporary usage of larger disk space) or running the backfill script, which I haven't yet tested but seems like it isn't aware of secondary/space partitioning columns (i.e. it would decompress all chunks corresponding to a time range rather than specific chunks matching the hash of the secondary partitioning column).
I'll note that our use case can tolerate somewhat sub-optimal performance for INSERTS/UPDATES/DELETES on compressed chunks. In other words, we can wait a bit longer for these operations to complete, as long as they work as expected without manual intervention.