postgres server crash on dropping large hypertable #1486
Comments
@fvannee Are there any relationships between the tables that are being dropped (like foreign keys, continuous aggregates)?
No, nothing. They're completely independent.
This patch prevents recursing into the cache invalidation code. Since this code's result is dependent on the transaction snapshot, processing it multiple times recursively won't change the result. Fixes timescale#1486
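For reference, a minimal sketch of the reentrancy-guard idea described above is shown below. This is not the actual TimescaleDB patch: the names (`cache_invalidate_callback`, `process_invalidations`) and the standalone stub are hypothetical, and the real callback works against PostgreSQL's shared-invalidation machinery rather than a printf. The point is only the guard: if handling the invalidation messages re-enters the callback, the nested call returns immediately, since reprocessing within the same transaction snapshot would not change the result.

```c
#include <stdbool.h>
#include <stdio.h>

/* Per-backend flag: are we already inside invalidation processing? */
static bool in_invalidation = false;

static void process_invalidations(void);

static void
cache_invalidate_callback(void)
{
    if (in_invalidation)
        return;                 /* nested call: don't recurse again */

    in_invalidation = true;
    process_invalidations();    /* may indirectly fire this callback again */
    in_invalidation = false;
}

static void
process_invalidations(void)
{
    /* Stand-in for the real work; catalog access can re-enter the callback. */
    printf("processing invalidation messages\n");
    cache_invalidate_callback();    /* returns immediately thanks to the guard */
}

int
main(void)
{
    cache_invalidate_callback();
    return 0;
}
```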
I'm afraid not. It seems very difficult to reproduce. I took a look at the PR and I think it will indeed fix the issue. One further question, though: do you have an idea why it keeps receiving invalidation messages? It seems to be processing different invalidation messages on every iteration, so apparently it's flooded with them. Judging by the depth of the backtrace, there must be at least 6000? That's far more than the total number of chunks that were dropped.
So on commit, it should receive a message for 2 x #chunks (one for the table drop, one for the removal of the chunks from the hypertable parent table) + the number of chunk indexes + any functions due to cascade + other stuff I am missing. With 600 chunks I can see this adding up. But the weird thing is that if the message queue gets filled up (queue size is 4096), Postgres sends a 0, meaning invalidate all caches. So the system must be clearing the cache just fast enough to avoid this, but filling up its call stack instead. That's my read of the situation anyway.
Ah indeed, makes sense. Thanks for looking into it!
Relevant system information:
PostgreSQL version (output of postgres --version): 11.2
TimescaleDB version (output of \dx in psql): 1.4.2

Describe the bug
We ran into a Postgres server crash while dropping a large (4 TB, ~600 chunks) TimescaleDB hypertable. Unfortunately it looks like a race condition, so I can't reproduce it. However, I have the core dump. I'll also explain what happened.
Connection 1 executed the following in a transaction (a, b, and d were large hypertables; c was a regular table):
At the same time, other processes were running and querying the database, but none of them were querying the tables being dropped. One of these other, unrelated processes crashed. That process was inserting data into a completely different hypertable.
Part of the backtrace can be found below. It goes into infinite recursion, eventually leading to a stack overflow, so the full backtrace is a bit too long to include. You can see where the recursion starts from the part pasted here, though.
Please let me know in case you need any additional info from the core dump.