
[Bug]: Segmentation fault in decompress chunk after appending new rows #5411

Closed
kgyrtkirk opened this issue Mar 7, 2023 · 9 comments

@kgyrtkirk (Contributor)

What type of bug is this?

Crash

What subsystems and features are affected?

Compression

What happened?

The testcase is more elaborate, but most likely the only relevant part is how compression was enabled (segmentby/orderby).
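
For illustration only, a minimal sketch of the kind of setup this refers to - the table and column names are hypothetical, and the actual testcase lives in the reproduction repo linked below:

-- Hypothetical hypertable; the real testcase is in the linked repro repository.
CREATE TABLE metrics (time timestamptz NOT NULL, device_id int, value float8);
SELECT create_hypertable('metrics', 'time');

-- Compression enabled with both segmentby and orderby, as mentioned above.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'time DESC'
);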

The segmentation fault happened beneath tsl/src/nodes/decompress_chunk/exec.c:394:

Core was generated by `postgres:'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/78984' in core file too small.
#0  toast_build_flattened_tuple (tupleDesc=0x557aaa2c5148, values=<optimized out>, isnull=0x557aaa132de8) at heaptoast.c:592
592				if (VARATT_IS_EXTERNAL(new_value))
(gdb) bt
#0  toast_build_flattened_tuple (tupleDesc=0x557aaa2c5148, values=<optimized out>, isnull=0x557aaa132de8) at heaptoast.c:592
#1  0x0000557aa8f70fe3 in ExecEvalWholeRowVar (state=state@entry=0x557aaa133778, op=op@entry=0x557aaa133fa0, econtext=econtext@entry=0x557aa9fd7458) at execExprInterp.c:4179
#2  0x0000557aa8f7150a in ExecInterpExpr (state=0x557aaa133778, econtext=0x557aa9fd7458, isnull=0x7ffde79bafdf) at execExprInterp.c:616
#3  0x0000557aa8f6e6f8 in ExecInterpExprStillValid (state=0x557aaa133778, econtext=0x557aa9fd7458, isNull=0x7ffde79bafdf) at execExprInterp.c:1826
#4  0x00007fb8c3f21bad in ExecEvalExprSwitchContext (isNull=0x7ffde79bafdf, econtext=0x557aa9fd7458, state=0x557aaa133778) at /home/dev/pg/REL_15_2/include/postgresql/server/executor/executor.h:341
#5  ExecProject (projInfo=0x557aaa133770) at /home/dev/pg/REL_15_2/include/postgresql/server/executor/executor.h:375
#6  decompress_chunk_exec (node=<optimized out>) at /home/dev/timescaledb/tsl/src/nodes/decompress_chunk/exec.c:394
#7  0x0000557aa8f938b9 in ExecCustomScan (pstate=<optimized out>) at nodeCustom.c:115
#8  0x0000557aa8f7e168 in ExecProcNodeFirst (node=0x557aaa132b10) at execProcnode.c:464
#9  0x0000557aa8fafc8f in ExecProcNode (node=node@entry=0x557aaa132b10) at ../../../src/include/executor/executor.h:259
[...]

TimescaleDB version affected

2.10.1 and current main

PostgreSQL version used

15

What operating system did you use?

timescale/timescaledb:2.10.1-pg15

What installation method did you use?

Docker

What platform did you run on?

Other

Relevant log output and stack trace

https://github.com/kgyrtkirk/reprox/actions/runs/4354697899/jobs/7610324856#step:3:1441

How can we reproduce the bug?

run the case by building the image of:
https://github.com/kgyrtkirk/reprox/tree/comp-test_segfault
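
Continuing the hypothetical sketch from above, the scenario in the issue title presumably boils down to something like the following (an assumption based on the description, not the exact test script from the repo):

-- Compress the existing chunks of the hypothetical 'metrics' table.
SELECT compress_chunk(c) FROM show_chunks('metrics') c;

-- Append new rows after compression has run.
INSERT INTO metrics VALUES (now(), 1, 0.5);

-- A query whose target list references the whole row is evaluated via
-- ExecProject/ExecEvalWholeRowVar on top of the DecompressChunk node,
-- matching the frames in the backtrace above.
SELECT m FROM metrics m LIMIT 10;
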
@mickael-choisnet commented Mar 8, 2023

Hello @kgyrtkirk,

Do you think this bug is specific to PostgreSQL 15 or can it occur on PostgreSQL 14?

I ran into a Segmentation fault error on two of my databases after upgrading timescaledb from 2.9.3 to 2.10.0 (running on PostgreSQL 14.5).

The crash occurs when querying a hypertable with a join. It only happens after the compression jobs have run for the first time since the upgrade.
I'm not sure it's the same issue. I am trying to find a way to reproduce it, but so far I haven't succeeded.
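
A hypothetical sketch of such a query shape (the table and column names here are made up):

-- Made-up hypertable 'metrics' joined to a plain 'devices' table with a LIMIT,
-- roughly matching the Limit -> NestLoop -> Append -> DecompressChunk plan
-- visible in the backtrace posted later in this thread.
SELECT d.name, m.time, m.value
FROM metrics m
JOIN devices d ON d.id = m.device_id
WHERE m.time > now() - interval '1 day'
LIMIT 100;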

@kgyrtkirk (Contributor Author) commented Mar 8, 2023

Do you happen to have a core file or a backtrace?
I'll run it with PG14 - but I've noticed that there are some differences. Some of them arise from the fact that the random() function returns different numbers across versions even when the same seed is set, which negatively impacts the cross-version reproduction rate. However, I was able to reproduce this issue by altering the start seed of the test - the backtrace is the same, so I think this could happen on PG14 as well.

Please try to get at least a backtrace so that we can validate whether it's the same issue or a different one.

https://github.com/kgyrtkirk/reprox/tree/comp-test_segfault-14

@mickael-choisnet

I don't have the necessary tools installed on the VM to create a backtrace (not yet, but I can look into installing what's needed).
Now that I have downgraded the timescaledb extension and the problem has vanished, I'm not sure I'll be able to reproduce it the same way...

I'll try to make a backup of my cluster with pgBackRest and test the update again on another VM (I cannot just crash the one I fixed; it is not production, but it is used).

@mickael-choisnet commented Mar 8, 2023

@kgyrtkirk,

I've successfully done what I said:

  • Make a backup of the cluster with pgBackRest
  • Restore it on another VM
  • Install gdb and the necessary packages to debug postgresql-14 on this VM
  • Update timescaledb from 2.9.3 to 2.10.0 for the restored database
  • Run the query that makes the database crash (and it does crash)

I've attached gdb to the backend process of my query and here is the stacktrace I get:

Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x000055e76670b52e in CheckVarSlotCompatibility (attnum=1, vartype=25, slot=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:1904
1904	./build/../src/backend/executor/execExprInterp.c: No such file or directory.
#0  0x000055e76670b52e in CheckVarSlotCompatibility (attnum=1, vartype=25, slot=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:1904
#1  0x000055e76670bbc4 in CheckVarSlotCompatibility (slot=0x0, vartype=<optimized out>, attnum=<optimized out>) at ./build/../src/backend/executor/execExprInterp.c:1902
#2  CheckExprStillValid (state=state@entry=0x55e76735d7d8, econtext=econtext@entry=0x55e76735e0a8) at ./build/../src/backend/executor/execExprInterp.c:1868
#3  0x000055e76670bbeb in ExecInterpExprStillValid (state=0x55e76735d7d8, econtext=0x55e76735e0a8, isNull=0x7ffd497afe57) at ./build/../src/backend/executor/execExprInterp.c:1818
#4  0x000055e76673686c in ExecEvalExpr (isNull=0x7ffd497afe57, econtext=0x55e76735e0a8, state=<optimized out>) at ./build/../src/include/executor/executor.h:320
#5  ExecIndexEvalRuntimeKeys (econtext=econtext@entry=0x55e76735e0a8, runtimeKeys=<optimized out>, numRuntimeKeys=<optimized out>) at ./build/../src/backend/executor/nodeIndexscan.c:634
#6  0x000055e766736986 in ExecReScanIndexScan (node=node@entry=0x55e767027388) at ./build/../src/backend/executor/nodeIndexscan.c:568
#7  0x000055e766705c38 in ExecReScan (node=0x55e767027388) at ./build/../src/backend/executor/execAmi.c:183
#8  0x000055e766736215 in ExecIndexScan (pstate=0x55e767027388) at ./build/../src/backend/executor/nodeIndexscan.c:530
#9  0x00007f36b77325e0 in ExecProcNode (node=0x55e767027388) at /usr/include/postgresql/14/server/executor/executor.h:257
#10 decompress_chunk_create_tuple (state=0x55e76701ac68) at ./tsl/src/nodes/decompress_chunk/exec.c:426
#11 decompress_chunk_exec (node=0x55e76701ac68) at ./tsl/src/nodes/decompress_chunk/exec.c:373
#12 0x000055e7667292fa in ExecProcNode (node=0x55e76701ac68) at ./build/../src/include/executor/executor.h:257
#13 ExecAppend (pstate=0x55e76701a038) at ./build/../src/backend/executor/nodeAppend.c:360
#14 0x000055e766740485 in ExecProcNode (node=0x55e76701a038) at ./build/../src/include/executor/executor.h:257
#15 ExecNestLoop (pstate=0x55e767019e98) at ./build/../src/backend/executor/nodeNestloop.c:109
#16 0x000055e766737fc9 in ExecProcNode (node=0x55e767019e98) at ./build/../src/include/executor/executor.h:257
#17 ExecLimit (pstate=0x55e767019ba8) at ./build/../src/backend/executor/nodeLimit.c:96
#18 0x000055e766713333 in ExecProcNode (node=0x55e767019ba8) at ./build/../src/include/executor/executor.h:257
#19 ExecutePlan (execute_once=<optimized out>, dest=0x55e767325628, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55e767019ba8, estate=0x55e7670198d8) at ./build/../src/backend/executor/execMain.c:1551
#20 standard_ExecutorRun (queryDesc=0x55e7670389e8, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:361
#21 0x000055e76688d38c in PortalRunSelect (portal=0x55e766f21458, forward=<optimized out>, count=0, dest=<optimized out>) at ./build/../src/backend/tcop/pquery.c:921
#22 0x000055e76688e71a in PortalRun (portal=0x55e766f21458, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x55e767325628, altdest=0x55e767325628, qc=0x7ffd497b0340) at ./build/../src/backend/tcop/pquery.c:765
#23 0x000055e76688a9a5 in exec_simple_query () at ./build/../src/backend/tcop/postgres.c:1213
#24 0x000055e76688c23b in PostgresMain () at ./build/../src/backend/tcop/postgres.c:4508
#25 0x000055e766809db6 in BackendRun (port=<optimized out>, port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4537
#26 BackendStartup (port=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:4259
#27 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1745
#28 0x000055e76680ac44 in PostmasterMain (argc=5, argv=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:1417
#29 0x000055e76657c762 in main (argc=5, argv=0x55e766e6ec00) at ./build/../src/backend/main/main.c:209
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

Is this enough?

@kgyrtkirk (Contributor Author)

Awesome! This seems like a slightly different backtrace - I think it's key to also do the upgrade to reproduce your issue.
You may want to report this separately, as I'm not convinced that fixing this issue will also fix yours!

If you have any suggestions and/or ideas about how this could be reproduced (beyond the 2.9.3->2.10.0 upgrade step), feel free to add them!

@mickael-choisnet

Thanks,

I'll do that.
I'll try to give every detail I can think of that may pertain to the state of my database when I performed the update.

@kgyrtkirk (Contributor Author)

I've tried to narrow this down - and got to: #5458

kgyrtkirk added a commit to kgyrtkirk/timescaledb that referenced this issue Mar 17, 2023
Decompression produces records which have all the decompressed data set,
but it also retains the fields which are used internally during decompression.
These didn't cause any problem - unless an operation is being done with the
whole row - in which case all the fields which have ended up being non-null
can be a potential segfault source.

Fixes timescale#5458 timescale#5411
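
Put differently, any whole-row reference over the decompressed relation can exercise this path; a hypothetical example (the table name is assumed):

-- A whole-row Var in the target list is evaluated by ExecEvalWholeRowVar,
-- which is the frame directly above the crash in the first backtrace.
SELECT m FROM metrics m WHERE m.device_id = 1;

-- Row-valued functions exercise the same whole-row path.
SELECT row_to_json(m) FROM metrics m LIMIT 1;
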
kgyrtkirk added a commit that referenced this issue Apr 6, 2023
Decompression produces records which have all the decompressed data
set, but it also retains the fields which are used internally during
decompression.
These didn't cause any problem - unless an operation is being done
with the whole row - in which case all the fields which have ended up
being non-null can be a potential segfault source.

Fixes #5458 #5411
@sb230132 (Contributor) commented Apr 7, 2023

Closing this issue as it's fixed in #5458.

@sb230132 closed this as completed Apr 7, 2023
@kgyrtkirk (Contributor Author)

I thought it would close automagically - it seems like GitHub needs the "fixes" keyword to be repeated to do that!
Thank you @sb230132 for keeping an eye on it!

akuzm pushed a commit that referenced this issue Apr 14, 2023
Decompression produces records which have all the decompressed data
set, but it also retains the fields which are used internally during
decompression.
These didn't cause any problem - unless an operation is being done
with the whole row - in which case all the fields which have ended up
being non-null can be a potential segfault source.

Fixes #5458 #5411

cherry-picked from 975e9ca