
Feature request: delete compressed chunks if condition matches segmentby #2692

Open
alex88 opened this issue Nov 27, 2020 · 4 comments

alex88 commented Nov 27, 2020

This is a follow-up to a Slack discussion.
In our scenario we have IoT devices sending data into a raw_data table structured like this:

 - machine_id int4
 - metric_name varchar
 - value real
 - timestamp timestamptz

We also have compression enabled for chunks older than 2 weeks, which is segmented by machine_id,metric_name.
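For context, the setup described above might look roughly like this (a hedged sketch; the table and policy names are taken from the comment, and the exact DDL is assumed, not quoted from the reporter):

```sql
-- Hypertable matching the columns listed above
CREATE TABLE raw_data (
  machine_id  int4,
  metric_name varchar,
  value       real,
  "timestamp" timestamptz NOT NULL
);
SELECT create_hypertable('raw_data', 'timestamp');

-- Compression segmented by machine_id, metric_name,
-- compressing chunks older than 2 weeks
ALTER TABLE raw_data SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'machine_id, metric_name'
);
SELECT add_compression_policy('raw_data', INTERVAL '2 weeks');
```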
When we have to delete a machine there are three options:

  • use a foreign key
  • sequentially decompress, delete, compress each chunk
  • leave the data there

The first option works, however:

  • having a foreign key slowed down our initial data import so much that it would have taken days to complete
  • deleting all the data linked to a machine across multiple hypertables via ON DELETE CASCADE caused slowdowns and locks (after 20 minutes I had to cancel the query)

The second and third options are sub-optimal.
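The second option (sequentially decompress, delete, recompress) can be sketched with TimescaleDB's chunk functions; this is an illustrative outline only, and the machine id is a placeholder:

```sql
-- Decompress every compressed chunk in the affected range
SELECT decompress_chunk(c)
FROM show_chunks('raw_data', older_than => INTERVAL '2 weeks') c;

-- Delete the machine's rows (42 is a placeholder id)
DELETE FROM raw_data WHERE machine_id = 42;

-- Recompress the chunks again
SELECT compress_chunk(c)
FROM show_chunks('raw_data', older_than => INTERVAL '2 weeks') c;
```

The cost of this round trip on large hypertables is exactly why the feature request asks to delete whole segments without decompressing.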

From my understanding of compression, data is stored in a columnar format, but there is one compressed row per value of the segmentby columns. So deleting based on a condition on the segmentby columns wouldn't need to modify the internal value arrays; it would just delete those rows?

It's probably not that simple, though. What are your thoughts about it?


k-rus commented Nov 27, 2020

Thank you for the feature request and describing your use case!

@NunoFilipeSantos NunoFilipeSantos added feature-request Feature proposal and removed community-request labels Sep 28, 2021

lasseste commented Jan 6, 2022

Is there any news on this feature?
We have a very similar use case:
We have time-series data stored in a hypertable with a deviceId, and different retention periods per device id.
For deleting our data we have a script that deletes old data according to its retention period, chunk by chunk (we only ever delete all data for one deviceId within a chunk, never parts of it).
That way we manage to delete data relatively efficiently, even though we cannot drop complete chunks.
We sometimes have to run a VACUUM FULL on each chunk to actually free disk space from old, nearly empty chunks.
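A per-chunk deletion script like the one described could look roughly like this (an illustrative sketch only, not the commenter's actual script; the hypertable name, device column, and retention interval are assumptions):

```sql
-- Delete one device's rows chunk by chunk in old chunks
DO $$
DECLARE
  chunk regclass;
BEGIN
  FOR chunk IN
    SELECT show_chunks('measurements', older_than => INTERVAL '90 days')
  LOOP
    -- touch only this chunk, so locking stays localized
    EXECUTE format('DELETE FROM %s WHERE device_id = 7', chunk);
  END LOOP;
END $$;

-- VACUUM FULL cannot run inside a transaction block, so it has to be
-- issued separately per chunk afterwards, e.g.:
-- VACUUM FULL <chunk_name>;
```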

However, we would now like to compress the data, as it compresses very well. For our deletion script that means we would need to decompress and recompress each chunk in the process.

Since the architecture already allows segmenting the data by the device id, we were wondering whether it would be possible to delete segments without decompressing first. That's how I found this feature request.


xvaara commented Oct 12, 2022

I created a function to delete from a compressed table using the segmentby column:
https://gist.github.com/xvaara/81990e8291019f931387492c1869fe84

It has a lot of debug output; just comment out the notices in production.


mfreed commented Dec 24, 2022

The team has been making progress on supporting DELETEs on compressed chunks. Please see this issue for discussion:

#2857
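For readers finding this thread later: recent TimescaleDB releases (2.11 and later, to my understanding; verify against the release notes for your version) support DML directly on compressed chunks, so the original use case may reduce to a plain statement like:

```sql
-- On versions supporting DELETE on compressed chunks
-- (machine id is a placeholder):
DELETE FROM raw_data WHERE machine_id = 42;
```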
