Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: decompress_chunk fails with "database system is in recovery mode" #6028

Closed
flexwende opened this issue Aug 30, 2023 · 3 comments
Closed
Assignees

Comments

@flexwende
Copy link

flexwende commented Aug 30, 2023

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Compression

What happened?

I tried compression to reduce storage.

compress_chunk works fine, but when I used decompress_chunk to revert the compression it fails with the error:

FATAL: the database system is in recovery mode

I tried the same with a minimal example and it worked. The hypertable chunks where the error occures were created with timescale 2.5.0. Could this be the problem?

TimescaleDB version affected

2.11.2

PostgreSQL version used

14.1

What operating system did you use?

Windows 10 x64

What installation method did you use?

Not applicable

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

2023-08-30 10:29:43.511 CEST [28896] LOG:  server process (PID 19872) was terminated by exception 0xC0000005
2023-08-30 10:29:43.511 CEST [28896] DETAIL:  Failed process was running: select decompress_chunk('_timescaledb_internal._hyper_16_413_chunk')
2023-08-30 10:29:43.511 CEST [28896] HINT:  See C include file "ntstatus.h" for a description of the hexadecimal value.
2023-08-30 10:29:43.511 CEST [28896] LOG:  terminating any other active server processes
2023-08-30 10:29:43.534 CEST [8544] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.535 CEST [28896] LOG:  all server processes terminated; reinitializing
2023-08-30 10:29:43.570 CEST [22404] LOG:  database system was interrupted; last known up at 2023-08-30 10:28:55 CEST
2023-08-30 10:29:43.572 CEST [28932] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.586 CEST [19236] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.653 CEST [21988] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.668 CEST [18012] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.682 CEST [16564] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.698 CEST [16060] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.713 CEST [22432] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.728 CEST [3620] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.743 CEST [10428] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.759 CEST [29576] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.885 CEST [26464] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.900 CEST [23356] FATAL:  the database system is in recovery mode
2023-08-30 10:29:43.916 CEST [4208] FATAL:  the database system is in recovery mode
2023-08-30 10:29:44.044 CEST [13108] FATAL:  the database system is in recovery mode
2023-08-30 10:29:44.059 CEST [23668] FATAL:  the database system is in recovery mode
2023-08-30 10:29:44.075 CEST [6272] FATAL:  the database system is in recovery mode
2023-08-30 10:29:44.200 CEST [1636] FATAL:  the database system is in recovery mode
2023-08-30 10:29:44.561 CEST [22404] LOG:  database system was not properly shut down; automatic recovery in progress
2023-08-30 10:29:44.565 CEST [22404] LOG:  redo starts at 26/21256530
2023-08-30 10:29:44.565 CEST [22404] LOG:  invalid record length at 26/21256568: wanted 24, got 0
2023-08-30 10:29:44.565 CEST [22404] LOG:  redo done at 26/21256530 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-08-30 10:29:44.595 CEST [28896] LOG:  database system is ready to accept connections
2023-08-30 10:29:44.612 CEST [28088] LOG:  TimescaleDB background worker launcher connected to shared catalogs

How can we reproduce the bug?

no idea
@lkshminarayanan
Copy link
Contributor

Hello @flexwende,

Thank you bringing this issue to our attention!

The error message you mentioned, database system is in recovery mode, typically occurs when the database server is in the process of starting up and is undergoing recovery. This is a normal part of the startup procedure. According to the logs, it appears that the database server successfully completed its recovery phase and was ready to accept connections.

However, the primary concern here is the unexpected termination of the server process during the execution of the query select decompress_chunk('_timescaledb_internal._hyper_16_413_chunk'). The exception 0xC0000005 present in the logs suggests a possible segmentation fault as the cause of termination, although further investigation is required for confirmation. Following the termination, the postgres server automatically got restarted and entered the recovery mode. During this recovery period, any attempts you made to decompress data would naturally result in the database system is in recovery mode error.

Regarding chunks created in version 2.5.0, they should not be a problem. In my internal tests, I was able to compress and decompress chunks that were created in 2.5.0 and upgraded to 2.11.2, without any issues. It is possible that the problem might be specific to a particular schema configuration. And for this purpose, it will be good to have more detailed information such as the table schema, tuple count, compression settings, and any relevant policies associated with the hypertable. Can you please provide us with these information?

Considering that the database server successfully completed its recovery and is operational again, could you please attempt to run decompress_chunk once more? This will help determine whether the issue persists beyond the recovery phase. If the problem persists, would it be feasible for you to provide a coredump or a stacktrace from the moment of the crash? These additional details will help us to find the root cause of the unexpected termination.

Thank you!

@flexwende
Copy link
Author

flexwende commented Aug 30, 2023

I solved the problem at the exact time of your answer :D

The table has an unique index. After droping the unique index decompress_chunk works as expected. To readd the unique index, compression had to be disabled for the hypertable.

Thank you for the quick response!

@lkshminarayanan
Copy link
Contributor

Glad you were able to solve the issue! I'll mark this is as closed for now. Please do reopen if you face any further issues with this. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants