
[Bug]: Corruption when inserting into compressed chunk #4655

Closed
mfundul opened this issue Aug 26, 2022 · 7 comments · Fixed by #4926

Comments

@mfundul
Contributor

mfundul commented Aug 26, 2022

What type of bug is this?

Data corruption

What subsystems and features are affected?

Compression

What happened?

When inserting via trigger with WHERE clause into an already compressed chunk there is data corruption if you read the data back:

ERROR:  invalid compression algorithm 147

The reproduction steps are below. I have also obtained an ASAN backtrace with a setup similar to the reproducer's.

TimescaleDB version affected

2.7.2

PostgreSQL version used

13.7

What operating system did you use?

Ubuntu 20.04 x64

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

==postgres==1255770==ERROR: AddressSanitizer: SEGV on unknown address 0x7f982efdd872 (pc 0x5592f7e95078 bp 0x7ffddbfaf0c0 sp 0x7ffddbfaf050 T0)
==postgres==1255770==The signal is caused by a READ memory access.
    #0 0x5592f7e95077 in heap_deform_tuple postgresql-13.7/src/backend/access/common/heaptuple.c:1304
    #1 0x7f98111863be in decompress_chunk timescaledb/tsl/src/compression/compression.c:1094
    #2 0x7f981116f54d in decompress_chunk_impl timescaledb/tsl/src/compression/api.c:394
    #3 0x7f9811171099 in tsl_decompress_chunk timescaledb/tsl/src/compression/api.c:637
    #4 0x7f98113f5e48 in ts_decompress_chunk timescaledb/src/cross_module_fn.c:76
    #5 0x5592f8566fdc in ExecMakeTableFunctionResult postgresql-13.7/src/backend/executor/execSRF.c:234
    #6 0x5592f85b414d in FunctionNext postgresql-13.7/src/backend/executor/nodeFunctionscan.c:95
    #7 0x5592f856b82e in ExecScanFetch postgresql-13.7/src/backend/executor/execScan.c:133
    #8 0x5592f856b93d in ExecScan postgresql-13.7/src/backend/executor/execScan.c:182
    #9 0x5592f85b4bcf in ExecFunctionScan postgresql-13.7/src/backend/executor/nodeFunctionscan.c:270
    #10 0x5592f85615e5 in ExecProcNodeFirst timescale/postgresql-13.7/src/backend/executor/execProcnode.c:450
    #11 0x5592f8540496 in ExecProcNode ../../../src/include/executor/executor.h:248
    #12 0x5592f85472ba in ExecutePlan postgresql-13.7/src/backend/executor/execMain.c:1632
    #13 0x5592f8541560 in standard_ExecutorRun postgresql-13.7/src/backend/executor/execMain.c:350
    #14 0x5592f85410b0 in ExecutorRun postgresql-13.7/src/backend/executor/execMain.c:294
    #15 0x5592f8b40a1e in PortalRunSelect postgresql-13.7/src/backend/tcop/pquery.c:921
    #16 0x5592f8b400ab in PortalRun postgresql-13.7/src/backend/tcop/pquery.c:765
    #17 0x5592f8b320d0 in exec_simple_query postgresql-13.7/src/backend/tcop/postgres.c:1239
    #18 0x5592f8b3b14e in PostgresMain postgresql-13.7/src/backend/tcop/postgres.c:4347
    #19 0x5592f8957167 in BackendRun postgresql-13.7/src/backend/postmaster/postmaster.c:4550
    #20 0x5592f8955dca in BackendStartup postgresql-13.7/src/backend/postmaster/postmaster.c:4234
    #21 0x5592f894e9c6 in ServerLoop postgresql-13.7/src/backend/postmaster/postmaster.c:1739
    #22 0x5592f894d5c6 in PostmasterMain postgresql-13.7/src/backend/postmaster/postmaster.c:1412
    #23 0x5592f8692921 in main postgresql-13.7/src/backend/main/main.c:210
    #24 0x7f981da42082 in __libc_start_main ../csu/libc-start.c:308
    #25 0x5592f7d383fd in _start (/usr/local/pgsql/postgresql-13.7/bin/postgres+0x3883fd)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV postgresql-13.7/src/backend/access/common/heaptuple.c:1304 in heap_deform_tuple

How can we reproduce the bug?

CREATE TABLE logs(ts TIMESTAMPTZ NOT NULL, module_id TEXT, level TEXT, module TEXT, msg TEXT);
SELECT create_hypertable('logs', 'ts');
ALTER TABLE logs SET (timescaledb.compress);

CREATE TABLE tester(inc INTEGER);

CREATE OR REPLACE FUNCTION tester_row() RETURNS trigger AS
$$
BEGIN
    INSERT INTO logs
    SELECT
        '1046870701',
        '1',
        'level',
        'module',
        'message'
    WHERE NEW.inc IS NOT NULL;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tester_insert AFTER INSERT ON tester FOR EACH ROW EXECUTE FUNCTION tester_row();

INSERT INTO tester VALUES(1);
SELECT show_chunks AS chunk_id FROM show_chunks('logs') \gset
SELECT * FROM public.compress_chunk(:'chunk_id');

DO $$
DECLARE
  current_day INTEGER;
BEGIN
    FOR current_day IN SELECT generate_series(1, 1000)
    LOOP
        INSERT INTO tester VALUES(1);
        COMMIT;
    END LOOP;
END $$;

SELECT count(module_id), ts FROM logs GROUP BY ts;
-- Crash or exception here, e.g.:
-- ERROR:  invalid compression algorithm 147
@svenklemm
Member

The buggy behaviour happens when the plan is a generic cached plan.
Slightly different reproducer without triggers:

CREATE TABLE logs(ts TIMESTAMPTZ NOT NULL, msg TEXT);
SELECT create_hypertable('logs', 'ts');
ALTER TABLE logs SET (timescaledb.compress);
INSERT INTO logs SELECT '2020-01-01', 'message';
SELECT compress_chunk(c) FROM show_chunks('logs') c;

CREATE OR REPLACE FUNCTION fn_insert(timestamptz) RETURNS VOID AS
$$
BEGIN
    SET plan_cache_mode TO force_generic_plan;
    INSERT INTO logs SELECT '2020-01-01', 'message' WHERE $1 < '3000-01-01';
END;
$$ LANGUAGE plpgsql;

SELECT fn_insert(now()) FROM generate_series(1,2);

SELECT count(*), ts FROM logs GROUP BY ts;

@konskov konskov self-assigned this Aug 29, 2022
@horzsolt horzsolt assigned mkindahl and unassigned konskov Aug 29, 2022
@svenklemm svenklemm self-assigned this Aug 30, 2022
@mkindahl
Contributor

It corrupts the data in the actual compressed chunk: even reading directly from the compressed table fails.

mats=# CREATE TABLE logs(ts TIMESTAMPTZ NOT NULL, msg TEXT);
CREATE TABLE
mats=# SELECT create_hypertable('logs', 'ts');
 create_hypertable  
--------------------
 (11,public,logs,t)
(1 row)

mats=# ALTER TABLE logs SET (timescaledb.compress);
ALTER TABLE
mats=# INSERT INTO logs SELECT '2020-01-01', 'message';
INSERT 0 1
mats=# SELECT compress_chunk(c) FROM show_chunks('logs') c;
              compress_chunk              
------------------------------------------
 _timescaledb_internal._hyper_11_11_chunk
(1 row)

mats=# SET plan_cache_mode TO force_generic_plan;
SET
mats=# EXPLAIN INSERT INTO logs SELECT '2020-01-01', 'message' WHERE now() < '3000-01-01';
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Custom Scan (HypertableModify)  (cost=0.01..0.01 rows=1 width=40)
   ->  Insert on logs  (cost=0.01..0.01 rows=1 width=40)
         ->  Result  (cost=0.01..0.01 rows=1 width=40)
               One-Time Filter: (now() < '3000-01-01 00:00:00+00'::timestamp with time zone)
               ->  Custom Scan (ChunkDispatch)  (cost=0.01..0.01 rows=1 width=40)
                     ->  Result  (cost=0.01..0.01 rows=1 width=40)
                           One-Time Filter: (now() < '3000-01-01 00:00:00+00'::timestamp with time zone)
(7 rows)

mats=# INSERT INTO logs SELECT '2020-01-01', 'message' WHERE now() < '3000-01-01';
INSERT 0 1
mats=# \dt _timescaledb_internal.*
                         List of relations
        Schema         |            Name            | Type  | Owner 
-----------------------+----------------------------+-------+-------
 _timescaledb_internal | _compressed_hypertable_12  | table | mats
 _timescaledb_internal | _hyper_11_11_chunk         | table | mats
 _timescaledb_internal | bgw_job_stat               | table | mats
 _timescaledb_internal | bgw_policy_chunk_stats     | table | mats
 _timescaledb_internal | compress_hyper_12_12_chunk | table | mats
(5 rows)

mats=# set plan_cache_mode to DEFAULT ;
SET
mats=# SELECT ts FROM _timescaledb_internal.compress_hyper_12_12_chunk ;
ERROR:  invalid compression algorithm 7
mats=# SELECT msg FROM _timescaledb_internal.compress_hyper_12_12_chunk ;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The coredump shows the same stack as before (which is probably a red herring, since the corruption occurs when the row is inserted):

(gdb) bt
#0  0x00005635ef400b9d in slot_deform_heap_tuple (natts=2, offp=0x5635f0c1f438, tuple=<optimized out>, slot=0x5635f0c1f3f0) at execTuples.c:991
#1  tts_buffer_heap_getsomeattrs (slot=0x5635f0c1f3f0, natts=<optimized out>) at execTuples.c:695
#2  0x00005635ef40346e in slot_getsomeattrs_int (slot=slot@entry=0x5635f0c1f3f0, attnum=attnum@entry=2) at execTuples.c:1912
#3  0x00005635ef3eb772 in slot_getsomeattrs (attnum=2, slot=0x5635f0c1f3f0) at ../../../src/include/executor/tuptable.h:387
#4  slot_getattr (isnull=0x5635f0c1f6c8, attnum=2, slot=0x5635f0c1f3f0) at ../../../src/include/executor/tuptable.h:387
#5  ExecJustAssignVarImpl (isnull=<optimized out>, inslot=0x5635f0c1f3f0, state=<optimized out>) at execExprInterp.c:2104
#6  ExecJustAssignScanVar (state=<optimized out>, econtext=<optimized out>, isnull=<optimized out>) at execExprInterp.c:2126
#7  0x00005635ef3ffe79 in ExecEvalExprSwitchContext (isNull=0x7ffe15947ef7, econtext=0x5635f0c1f300, state=0x5635f0c1f718) at ../../../src/include/executor/executor.h:339
#8  ExecProject (projInfo=0x5635f0c1f710) at ../../../src/include/executor/executor.h:373
#9  ExecScan (node=<optimized out>, accessMtd=accessMtd@entry=0x5635ef42e508 <SeqNext>, recheckMtd=recheckMtd@entry=0x5635ef42e4b8 <SeqRecheck>) at execScan.c:238
#10 0x00005635ef42e4e7 in ExecSeqScan (pstate=<optimized out>) at nodeSeqscan.c:112
#11 0x00005635ef3f4750 in ExecProcNode (node=0x5635f0c1f1e8) at ../../../src/include/executor/executor.h:257
#12 ExecutePlan (estate=estate@entry=0x5635f0c1efb0, planstate=0x5635f0c1f1e8, use_parallel_mode=<optimized out>, operation=operation@entry=CMD_SELECT, sendTuples=sendTuples@entry=true, numberTuples=numberTuples@entry=0, direction=ForwardScanDirection, 
    dest=0x5635f0c64b60, execute_once=true) at execMain.c:1551
#13 0x00005635ef3f5414 in standard_ExecutorRun (queryDesc=0x5635f0bf1160, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:361
#14 0x00005635ef3f54e1 in ExecutorRun (queryDesc=queryDesc@entry=0x5635f0bf1160, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>) at execMain.c:305
#15 0x00005635ef5c5435 in PortalRunSelect (portal=portal@entry=0x5635f0bc5c70, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x5635f0c64b60) at pquery.c:921
#16 0x00005635ef5c6f53 in PortalRun (portal=portal@entry=0x5635f0bc5c70, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x5635f0c64b60, altdest=altdest@entry=0x5635f0c64b60, qc=0x7ffe15948220)
    at pquery.c:765
#17 0x00005635ef5c2dbc in exec_simple_query (query_string=query_string@entry=0x5635f0b59e20 "SELECT msg FROM _timescaledb_internal.compress_hyper_45_45_chunk ;") at postgres.c:1214
#18 0x00005635ef5c4e85 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffe15948420, dbname=<optimized out>, username=<optimized out>) at postgres.c:4496
#19 0x00005635ef5187ea in BackendRun (port=port@entry=0x5635f0b86fd0) at postmaster.c:4530
#20 0x00005635ef51bba8 in BackendStartup (port=port@entry=0x5635f0b86fd0) at postmaster.c:4252
#21 0x00005635ef51bdf1 in ServerLoop () at postmaster.c:1745
#22 0x00005635ef51d34d in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1417
#23 0x00005635ef45998f in main (argc=1, argv=0x5635f0b537f0) at main.c:209

@mkindahl
Contributor

Inserting without a filter works fine, but once a filter is added, it breaks:

mats=# EXPLAIN INSERT INTO logs VALUES ('2020-01-01', 'message');
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Custom Scan (HypertableModify)  (cost=0.00..0.01 rows=1 width=40)
   ->  Insert on logs  (cost=0.00..0.01 rows=1 width=40)
         ->  Custom Scan (ChunkDispatch)  (cost=0.00..0.01 rows=1 width=40)
               ->  Result  (cost=0.00..0.01 rows=1 width=40)
(4 rows)

mats=# INSERT INTO logs VALUES ('2020-01-01', 'message');
INSERT 0 1
mats=# SELECT * FROM _timescaledb_internal.compress_hyper_78_78_chunk;
                            ts                            |                       msg                        | _ts_meta_count | _ts_meta_sequence_num |     _ts_meta_min_1     |     _ts_meta_max_1     
----------------------------------------------------------+--------------------------------------------------+----------------+-----------------------+------------------------+------------------------
 BAAAAj4HhsJgAAACPgeGwmAAAAAAAQAAAAEAAAAAAAAADgAEfA8NhMAA | AQBwZ19jYXRhbG9nAHRleHQAAAEAAAABAAAAB21lc3NhZ2U= |              1 |                    10 | 2020-01-01 00:00:00+00 | 2020-01-01 00:00:00+00
 BAAAAj4HhsJgAAACPgeGwmAAAAAAAQAAAAEAAAAAAAAADgAEfA8NhMAA | AQBwZ19jYXRhbG9nAHRleHQAAAEAAAABAAAAB21lc3NhZ2U= |              1 |                     0 | 2020-01-01 00:00:00+00 | 2020-01-01 00:00:00+00
(2 rows)

mats=# EXPLAIN INSERT INTO logs SELECT '2020-01-01', 'message' WHERE now() < '3000-01-01';
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Custom Scan (HypertableModify)  (cost=0.01..0.01 rows=1 width=40)
   ->  Insert on logs  (cost=0.01..0.01 rows=1 width=40)
         ->  Result  (cost=0.01..0.01 rows=1 width=40)
               One-Time Filter: (now() < '3000-01-01 00:00:00+00'::timestamp with time zone)
               ->  Custom Scan (ChunkDispatch)  (cost=0.01..0.01 rows=1 width=40)
                     ->  Result  (cost=0.01..0.01 rows=1 width=40)
                           One-Time Filter: (now() < '3000-01-01 00:00:00+00'::timestamp with time zone)
(7 rows)

mats=# INSERT INTO logs SELECT '2020-01-01', 'message' WHERE now() < '3000-01-01';
INSERT 0 1
mats=# SELECT * FROM _timescaledb_internal.compress_hyper_78_78_chunk;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

@mfundul
Contributor Author

mfundul commented Aug 30, 2022

Inserting without a filter works fine, but once a filter is added, it breaks:

This is the workaround I've given to support-dev collab.
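
A minimal sketch of that workaround, based on the trigger-free reproducer above (the function name fn_insert_safe is hypothetical): evaluate the condition in plpgsql control flow instead of a WHERE clause, so the INSERT's plan contains no gating Result node.

CREATE OR REPLACE FUNCTION fn_insert_safe(timestamptz) RETURNS VOID AS
$$
BEGIN
    -- The condition is checked here rather than in the INSERT's WHERE
    -- clause, so the (generic) plan has no gating Result node above the
    -- ChunkDispatch node and the tuple is compressed correctly.
    IF $1 < '3000-01-01' THEN
        INSERT INTO logs SELECT '2020-01-01', 'message';
    END IF;
END;
$$ LANGUAGE plpgsql;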

@mkindahl
Contributor

Adding the tuple to a separate table (src) and inserting from it works, but as soon as we add an expression that requires a gating Result node, it breaks:

mats=# EXPLAIN INSERT INTO logs SELECT * FROM src WHERE ts < '3000-01-01';
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Custom Scan (HypertableModify)  (cost=0.00..25.00 rows=400 width=40)
   ->  Insert on logs  (cost=0.00..25.00 rows=400 width=40)
         ->  Custom Scan (ChunkDispatch)  (cost=0.00..25.00 rows=400 width=40)
               ->  Seq Scan on src  (cost=0.00..25.00 rows=400 width=40)
                     Filter: (ts < '3000-01-01 00:00:00+00'::timestamp with time zone)
(5 rows)

mats=# INSERT INTO logs SELECT * FROM src WHERE ts < '3000-01-01';
INSERT 0 1
mats=# SELECT * FROM _timescaledb_internal.compress_hyper_113_114_chunk ;
                            ts                            |                       msg                        | _ts_meta_count | _ts_meta_sequence_num |     _ts_meta_min_1     |     _ts_meta_max_1     
----------------------------------------------------------+--------------------------------------------------+----------------+-----------------------+------------------------+------------------------
 BAAAAj4HhsJgAAACPgeGwmAAAAAAAQAAAAEAAAAAAAAADgAEfA8NhMAA | AQBwZ19jYXRhbG9nAHRleHQAAAEAAAABAAAAB21lc3NhZ2U= |              1 |                    10 | 2020-01-01 00:00:00+00 | 2020-01-01 00:00:00+00
 BAAAAj4HhsJgAAACPgeGwmAAAAAAAQAAAAEAAAAAAAAADgAEfA8NhMAA | AQBwZ19jYXRhbG9nAHRleHQAAAEAAAABAAAAB21lc3NhZ2U= |              1 |                     0 | 2020-01-01 00:00:00+00 | 2020-01-01 00:00:00+00
 BAAAAj4HhsJgAAACPgeGwmAAAAAAAQAAAAEAAAAAAAAADgAEfA8NhMAA | AQBwZ19jYXRhbG9nAHRleHQAAAEAAAABAAAAB21lc3NhZ2U= |              1 |                     0 | 2020-01-01 00:00:00+00 | 2020-01-01 00:00:00+00
(3 rows)

mats=# EXPLAIN INSERT INTO logs SELECT * FROM src WHERE now() < '3000-01-01';
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Custom Scan (HypertableModify)  (cost=0.01..22.01 rows=1200 width=40)
   ->  Insert on logs  (cost=0.01..22.01 rows=1200 width=40)
         ->  Result  (cost=0.01..22.01 rows=1200 width=40)
               One-Time Filter: (now() < '3000-01-01 00:00:00+00'::timestamp with time zone)
               ->  Custom Scan (ChunkDispatch)  (cost=0.01..22.01 rows=1200 width=40)
                     ->  Result  (cost=0.01..22.01 rows=1200 width=40)
                           One-Time Filter: (now() < '3000-01-01 00:00:00+00'::timestamp with time zone)
                           ->  Seq Scan on src  (cost=0.01..22.01 rows=1200 width=40)
(8 rows)

mats=# INSERT INTO logs SELECT * FROM src WHERE now() < '3000-01-01';
INSERT 0 1
mats=# SELECT * FROM _timescaledb_internal.compress_hyper_113_114_chunk ;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

@svenklemm svenklemm added this to the TimescaleDB 2.8.1 milestone Sep 27, 2022
@jnidzwetzki jnidzwetzki removed this from the TimescaleDB 2.8.1 milestone Sep 30, 2022
@mkindahl
Contributor

A short summary of the problem and why it occurs:

As can be seen in the plans above, a gating clause is inserted between the chunk dispatch node and the hypertable modify node. Since the target chunk is compressed, the tuple table slot is constructed as a compressed tuple table slot when the tuple is built in the chunk dispatch node. The gating clause, however, has the types of an uncompressed tuple, so when hypertable modify takes the tuple and inserts it into the compressed chunk, it is treated as an uncompressed tuple and written in the wrong format. Subsequent selects from the table then fail because the stored format is incorrect.
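
As an illustration (using the logs hypertable from the reproducers above), the gating clause is visible in EXPLAIN as a Result node with a One-Time Filter sitting between the Insert and the ChunkDispatch node:

SET plan_cache_mode TO force_generic_plan;
EXPLAIN INSERT INTO logs SELECT '2020-01-01', 'message' WHERE now() < '3000-01-01';
-- An affected plan contains:
--   ->  Result
--         One-Time Filter: (now() < '3000-01-01 00:00:00+00'::timestamp with time zone)
--         ->  Custom Scan (ChunkDispatch)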

I tried a few different approaches, but they each run into issues:

  1. Modifying the tuple in-flight does not work and causes referencing issues.
  2. Removing the gating clause does not work.
  3. Changing the tuple type of the gating clause would not work since the target chunk is dynamically looked up and will be different depending on whether it is a compressed or uncompressed chunk.

@svenklemm
Member

This will be fixed by the changes to support UPSERT/UPDATE/DELETE on compressed chunks, because we won't do on-the-fly compression anymore.
