This is a follow-up to a Slack discussion.
In our scenario, IoT devices send data into a raw_data table structured like this:
- machine_id int4
- metric_name varchar
- value real
- timestamp timestamptz
We also have compression enabled for chunks older than 2 weeks, segmented by machine_id, metric_name.
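
For reference, a minimal sketch of that setup (column names are taken from above; the constraints and the exact policy call are assumptions):

```sql
-- Hypothetical schema matching the description above.
CREATE TABLE raw_data (
    machine_id  int4        NOT NULL,
    metric_name varchar     NOT NULL,
    value       real,
    "timestamp" timestamptz NOT NULL
);
SELECT create_hypertable('raw_data', 'timestamp');

-- Compress chunks older than 2 weeks, segmented by machine_id, metric_name.
ALTER TABLE raw_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'machine_id, metric_name'
);
SELECT add_compression_policy('raw_data', INTERVAL '2 weeks');
```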
When we have to delete a machine, there are three options:
1. use a foreign key with ON DELETE CASCADE
2. sequentially decompress, delete from, and recompress each chunk
3. leave the data there
The first option works, however:
- having the foreign key slowed down our initial data import so much that it would have taken days to complete
- deleting all the data linked to a machine across multiple hypertables via the ON DELETE CASCADE caused slowdowns and locks (after 20 minutes I had to cancel the query)
The second and third options are sub-optimal.
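
For context, the second option amounts to something like the following (a sketch against TimescaleDB 2.x informational views; the machine_id literal is a placeholder):

```sql
-- Hypothetical per-chunk decompress/delete/recompress loop (option 2).
DO $$
DECLARE
    ch regclass;
BEGIN
    FOR ch IN
        SELECT format('%I.%I', chunk_schema, chunk_name)::regclass
        FROM timescaledb_information.chunks
        WHERE hypertable_name = 'raw_data' AND is_compressed
    LOOP
        PERFORM decompress_chunk(ch);
        EXECUTE format('DELETE FROM %s WHERE machine_id = 42', ch);
        PERFORM compress_chunk(ch);
    END LOOP;
END $$;
```

Every chunk gets fully rewritten twice just to drop one machine's rows, which is why this route is so slow.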
From my understanding of compression, data is stored in a columnar format, but there is one compressed row per value of the segmentby columns. So deleting based on a condition on the segmentby columns wouldn't need to change the internal compressed arrays; it would just delete those rows?
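
For illustration, the layout in question looks roughly like this (a conceptual sketch; internal column names such as _ts_meta_count are assumptions and may differ by version):

```sql
-- Conceptual layout of a compressed chunk: segmentby columns stay plain,
-- the remaining columns are packed into compressed arrays, one batch row
-- per (machine_id, metric_name) group:
--
--  machine_id | metric_name | _ts_meta_count | timestamp    | value
-- ------------+-------------+----------------+--------------+--------------
--          42 | temperature |           1000 | <compressed> | <compressed>
--          42 | pressure    |           1000 | <compressed> | <compressed>
--
-- A DELETE filtered only on segmentby columns would, in principle, just
-- drop whole batch rows without rewriting any compressed data.
```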
It's probably not that simple though. What are your thoughts about it?
Is there any news on this feature?
We have a very similar use case:
We have time-series data stored in a hypertable keyed by a deviceId, with different retention periods per device id.
To delete our data we have a script that removes old data chunk by chunk according to each device's retention period; see the sketch below. (Within a chunk we only ever delete all of one deviceId's data, never parts of it.)
That way we delete data relatively efficiently, even though we cannot drop complete chunks.
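
A condensed sketch of such a script (the measurements hypertable, the devices table, and their columns are hypothetical here):

```sql
-- Hypothetical chunk-by-chunk retention delete: for every chunk older than
-- 90 days, remove all rows belonging to devices with a 90-day retention.
DO $$
DECLARE
    ch regclass;
BEGIN
    FOR ch IN
        SELECT show_chunks('measurements', older_than => INTERVAL '90 days')
    LOOP
        EXECUTE format(
            'DELETE FROM %s WHERE device_id IN '
            '(SELECT id FROM devices WHERE retention_days <= 90)',
            ch);
    END LOOP;
END $$;
```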
We sometimes have to run a VACUUM FULL on individual chunks to actually free the disk space held by old, nearly empty chunks.
However, now we would like to compress the data, as it compresses very well. For our deletion script that means we would need to decompress and recompress each chunk in the process.
As the architecture already allows segmenting the data by the device id, we were wondering whether it would be possible to delete segments without decompressing first. That's how I found this feature request.