
Support 'ON CONFLICT DO NOTHING' for compressed chunks #4113

Closed
wiebeytec opened this issue Feb 21, 2022 · 7 comments
Assignees: sb230132
Labels: compression, ddl (Issues that involve DDL handling), enhancement (An enhancement to an existing feature for functionality), Team: Core Database

Comments

@wiebeytec

What type of enhancement is this?

User experience

What subsystems and features will be improved?

Compression

What does the enhancement do?

When there are compressed chunks and you do INSERT ... ON CONFLICT DO NOTHING, it says: "insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks".

Currently, if you have an ingress system that processes duplicate values while back-dating, you can't use compression, even if you just want to ignore the duplicates.

This is a lighter-weight version of #3323: when unique keys are present, supporting only DO NOTHING would be enough. Then again, it may essentially be the same problem; I don't know.
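For illustration, a minimal sketch of the kind of statement this is about (table and column names are hypothetical; assume a hypertable with a unique index on (sensor_id, time) and compressed chunks):

-- hypothetical back-dated ingest; duplicates should simply be skipped
INSERT INTO metrics (time, sensor_id, value)
VALUES ('2022-02-01 00:00:00+00', 42, 0.5)
ON CONFLICT DO NOTHING;
-- currently rejected with:
-- ERROR: insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks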

Implementation challenges

I tried patching the code as per the attached patch (or the GitHub compare, for as long as the link works). The problem is that, despite there being a unique index, duplicate values are inserted. I don't know whether fixing this (checking for the value in the compressed index) is essentially the same as #3323, or whether DO NOTHING can be implemented more easily.

timescaledb-compressed-do-nothing-fix.patch.txt

@wiebeytec wiebeytec added the enhancement label Feb 21, 2022
@gayyappan
Contributor

@wiebeytec This is essentially the same problem as #3323. We cannot support this enhancement without support for #3323.

@cocowalla

Also really keen to see a fix for this.

As with many (most?) systems, I can't guarantee there will be no duplicates, so without this enhancement (more of a bug fix TBH, since it was treated as such in #100 for non-compressed hypertables) I need to check if rows exist before inserting them, which absolutely decimates performance 😭
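To sketch the workaround being described (table names are illustrative, not from this issue): each batch has to be filtered against the existing data before it can be inserted, which adds a lookup per incoming row.

-- hypothetical check-before-insert workaround
INSERT INTO metrics (time, sensor_id, value)
SELECT s.time, s.sensor_id, s.value
FROM staging s
WHERE NOT EXISTS (
    SELECT 1 FROM metrics m
    WHERE m.sensor_id = s.sensor_id
      AND m.time = s.time
);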

@WalkerWalker

WalkerWalker commented Aug 11, 2022

Also, as discussed here on Slack: would love to see this feature.

@sb230132 sb230132 self-assigned this Aug 29, 2022
@sb230132
Contributor

Issue:

Consider the example table below:

CREATE TABLE sample_table (
    cpu double precision null,
    time TIMESTAMP WITH TIME ZONE NOT NULL,
    sensor_id INTEGER NOT NULL,
    name TEXT default 'this is a default string value',
    UNIQUE(sensor_id, time)
);

-- create hypertable
SELECT * FROM create_hypertable('sample_table', 'time',
    chunk_time_interval => INTERVAL '10 day');

-- populate hypertable
INSERT INTO sample_table
SELECT random() AS cpu, time + (INTERVAL '1 minute' * random()) AS time,
       sensor_id, 'very long string'
FROM
    generate_series(now() - INTERVAL '1 months', now() - INTERVAL '1 week', INTERVAL '1 minute') AS g1(time),
    generate_series(1, 25, 1) AS g2(sensor_id)
ORDER BY time;

-- show_chunks
_timescaledb_internal._hyper_1_1_chunk
_timescaledb_internal._hyper_1_2_chunk
_timescaledb_internal._hyper_1_3_chunk

The chunks above are child tables of sample_table, and they inherit the UNIQUE constraint from sample_table.
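One way to confirm this (a catalog query sketch, assuming the chunk layout shown above) is to list the unique constraints on the hypertable's child tables:

-- list unique constraints defined on the chunks of sample_table
SELECT conrelid::regclass AS chunk, conname
FROM pg_constraint
WHERE contype = 'u'
  AND conrelid IN (SELECT inhrelid FROM pg_inherits
                   WHERE inhparent = 'sample_table'::regclass);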

-- enable compression
ALTER TABLE sample_table SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby = 'time'
);

Enabling compression creates a new compressed table _timescaledb_internal._compressed_hypertable_2.

\d _timescaledb_internal._compressed_hypertable_2
              Table "_timescaledb_internal._compressed_hypertable_2"
        Column         |                 Type                  | Collation | Nullable | Default
-----------------------+---------------------------------------+-----------+----------+---------
 cpu                   | _timescaledb_internal.compressed_data |           |          |
 time                  | _timescaledb_internal.compressed_data |           |          |
 sensor_id             | integer                               |           |          |
 name                  | _timescaledb_internal.compressed_data |           |          |
 _ts_meta_count        | integer                               |           |          |
 _ts_meta_sequence_num | integer                               |           |          |
 _ts_meta_min_1        | timestamp with time zone              |           |          |
 _ts_meta_max_1        | timestamp with time zone              |           |          |
Indexes:
    "_compressed_hypertable_2_sensor_id__ts_meta_sequence_num_idx" btree (sensor_id, _ts_meta_sequence_num)
Triggers:
    ts_insert_blocker BEFORE INSERT ON _timescaledb_internal._compressed_hypertable_2 FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()

Note that the UNIQUE constraint is not created for this new compressed table.
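The same catalog check against the compressed table (a sketch, same assumptions as above) should come back empty:

-- no unique constraints exist on the compressed table
SELECT conname
FROM pg_constraint
WHERE contype = 'u'
  AND conrelid = '_timescaledb_internal._compressed_hypertable_2'::regclass;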

-- compress the chunks
SELECT COMPRESS_CHUNK(i) FROM SHOW_CHUNKS('sample_table') i;

-- INSERT duplicates:
INSERT into sample_table SELECT * FROM sample_table;
ERROR: insert into a compressed chunk that has primary or unique constraint is not supported

-- INSERT duplicates WITH ON CONFLICT
INSERT into sample_table SELECT * FROM sample_table ON CONFLICT DO NOTHING;
ERROR: insert with ON CONFLICT or RETURNING clause is not supported on compressed chunks

If I comment out the above error handling code, timescaledb ends up inserting duplicate rows. This is because there is no UNIQUE constraint on the compressed table.
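For reference, the resulting duplicates can be made visible by aggregating over the unique key columns:

-- show (sensor_id, time) pairs that occur more than once
SELECT sensor_id, time, count(*)
FROM sample_table
GROUP BY sensor_id, time
HAVING count(*) > 1;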

Questions:

  1. Should we copy the UNIQUE constraint defined on the base parent table to the compressed table, and let the compressed
     chunks inherit the same constraint?
     The challenge with this approach is that the "time" column's data type is _timescaledb_internal.compressed_data. This data type does not define any operator classes that provide comparison operators.
  2. Should the UNIQUE constraint check be performed after the data is decompressed, or should we do a binary comparison of the compressed data?

I will clarify the above questions with my team and then decide how to approach this problem.

@sb230132
Contributor

sb230132 commented Mar 6, 2023

Refer to #5252.
Will close this issue once #5252 is merged to the main branch.

@WalkerWalker

wow. cool. Super happy to be able to use this feature.

@antekresic
Contributor

#5252 has been merged.
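Assuming a TimescaleDB release that includes #5252, the statement from the reproduction above should now be accepted on compressed chunks:

-- previously raised an error; conflicting rows are now skipped
INSERT INTO sample_table
SELECT * FROM sample_table
ON CONFLICT DO NOTHING;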
