Recompress deadlock when running with queries #3846
Comments
When executing `recompress_chunk` and a query at the same time, a deadlock can be generated because the chunk relation and the chunk index are locked in different orders. This commit adds an isolation test that triggers a deadlock between `recompress_chunk` and a query on a table with some uncompressed rows, and then fixes the deadlock risk by conservatively taking the lock on the compressed chunk when the recompress procedure starts running (since it will eventually truncate it). Fixes timescale#3846
When executing `recompress_chunk` and a query at the same time, a deadlock can be generated because the chunk relation and the chunk index are locked in different orders. In particular, when `recompress_chunk` is executing, it will first decompress the chunk and as part of that lock the uncompressed chunk index in `AccessExclusive` mode and when trying to compress the chunk again it will try to lock the uncompressed chunk in `AccessExclusive` as part of truncating it. To avoid the deadlock, this commit skips rebuilding the uncompressed chunk index when decompressing the chunk since it will not change when incorporating new rows into the compressed chunk. Fixes timescale#3846
When executing `recompress_chunk` and a query at the same time, a deadlock can be generated because the chunk relation and the chunk index are locked in different orders. In particular, when `recompress_chunk` is executing, it will first decompress the chunk and as part of that lock the uncompressed chunk index in `AccessExclusive` mode and when trying to compress the chunk again it will try to lock the uncompressed chunk in `AccessExclusive` as part of truncating it. To avoid the deadlock, this commit skips rebuilding the uncompressed chunk index when decompressing the chunk during recompression since it will not change when incorporating new rows into the compressed chunk. The index is still rebuilt when just decompressing a chunk. Fixes timescale#3846
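The lock-order problem described above can be illustrated with a small wait-for model. This is only a sketch under simplifying assumptions: every lock is treated as exclusive, locks are never released, and the query is assumed to lock the chunk before its index (the commit message only says the orders differ). The names `find_deadlock`, `opposite_order`, and `same_order` are hypothetical and not part of TimescaleDB.

```python
def find_deadlock(schedule):
    """Replay (session, resource) lock requests in order.

    Returns the list of sessions forming a wait-for cycle, or None.
    Simplification: all locks are exclusive and held until the end.
    """
    holders = {}    # resource -> session that holds it
    waits_for = {}  # blocked session -> session it is waiting on
    for session, resource in schedule:
        if session in waits_for:
            continue  # a blocked session cannot issue new requests
        owner = holders.get(resource)
        if owner is None or owner == session:
            holders[resource] = session
            continue
        # The resource is taken: record the wait and look for a cycle.
        waits_for[session] = owner
        chain = [session]
        current = owner
        while current in waits_for:
            chain.append(current)
            current = waits_for[current]
            if current == session:
                return chain
    return None


# Assumed interleaving: recompress takes index then chunk,
# the query takes chunk then index.
opposite_order = [
    ("recompress", "index"),
    ("query", "chunk"),
    ("recompress", "chunk"),
    ("query", "index"),
]

# Both sessions take chunk first, then index.
same_order = [
    ("recompress", "chunk"),
    ("query", "chunk"),
    ("recompress", "index"),
    ("query", "index"),
]

print(find_deadlock(opposite_order))
print(find_deadlock(same_order))
```

With opposite orders the two sessions end up waiting on each other; with a consistent order the query simply waits behind the recompress session and no cycle forms.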
When executing the recompress chunk policy concurrently with queries, a deadlock can be generated because the chunk relation and the chunk index, or the uncompressed chunk and the compressed chunk, are locked in different orders. In particular, when the recompress chunk policy is executing, it will first decompress the chunk and as part of that lock the compressed chunk in AccessExclusive mode when dropping it, and when trying to compress the chunk again it will try to lock the uncompressed chunk in AccessExclusive mode as part of truncating it. To avoid the deadlock, this commit updates the recompress policy to do the compression and the decompression steps in separate transactions, which will avoid the deadlock since each phase locks indexes and compressed/uncompressed chunks in the same order. Partial-Bug: timescale#3846
When executing the recompress chunk policy concurrently with queries, a deadlock can be generated because the chunk relation and the chunk index, or the uncompressed chunk and the compressed chunk, are locked in different orders. In particular, when the recompress chunk policy is executing, it will first decompress the chunk and as part of that lock the compressed chunk in `AccessExclusive` mode when dropping it, and when trying to compress the chunk again it will try to lock the uncompressed chunk in `AccessExclusive` mode as part of truncating it. To avoid the deadlock, this commit updates the recompress policy to do the compression and the decompression steps in separate transactions, which will avoid the deadlock since each phase (decompress and compress chunk) locks indexes and compressed/uncompressed chunks in the same order. Note that this fixes the policy only, and not the `recompress_chunk` function, which is still prone to deadlocks. Partial-Bug: timescale#3846
After the synchronizing lock is released and the transaction is committed, both sessions are free to execute independently. This means that the query can actually start running before the recompress step has completed, which means that the order for completion is non-deterministic. We fix this by adding a marker so that the query is not reported as completed until the recompress has finished execution. Part-Of: timescale#3846
After the synchronizing lock is released and the transaction is committed, both sessions are free to execute independently. This means that the query can actually start running before the recompress step has completed, so the order of completion is non-deterministic. We fix this by adding a marker so that the query is not reported as completed until the recompress has finished execution. Since markers in isolation tests are a recent addition, we only run the test for PostgreSQL versions with markers added. Part-Of: timescale#3846
When executing `recompress_chunk` and a query at the same time, a deadlock can be generated because the chunk relation and the chunk index and the compressed and uncompressed chunks are locked in different orders. In particular, when `recompress_chunk` is executing, it will first decompress the chunk and as part of that lock the uncompressed chunk index in AccessExclusive mode, and when trying to compress the chunk again it will try to lock the uncompressed chunk in AccessExclusive mode as part of truncating it. Note that `decompress_chunk` and `compress_chunk` lock the relations in the same order and the issue arises because the procedures are combined into a single transaction. To avoid the deadlock, this commit rewrites `recompress_chunk` to be a procedure and adds a commit between the decompression and compression. Committing the transaction after the decompress will allow reads and inserts to proceed by working on the uncompressed chunk, and the compression part of the procedure will take the necessary locks in strict order, thereby avoiding a deadlock. In addition, the isolation test is rewritten so that instead of adding a waitpoint in the PL/SQL function, we implement the isolation test by taking a lock on the compressed table after the decompression. Fixes #3846
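Why a commit between the two phases helps can be sketched with a toy scheduler in which a commit releases all locks a session holds. This is an illustration under assumptions, not the actual locking code: locks are modeled as exclusive, sessions advance one operation per round, the query is assumed to lock the chunk before its index, and each phase is reduced to a single lock. `run_sessions`, `single_transaction`, and `split_transactions` are hypothetical names.

```python
COMMIT = "COMMIT"


def run_sessions(sessions):
    """Run sessions round-robin, one operation per session per round.

    Each operation is either a resource name (take an exclusive lock)
    or COMMIT (release every lock the session holds).
    Returns "ok" if all sessions finish, "deadlock" if a full round
    passes in which no unfinished session can advance.
    """
    holders = {}  # resource -> session holding it
    position = {name: 0 for name in sessions}
    while any(position[name] < len(ops) for name, ops in sessions.items()):
        progressed = False
        for name, ops in sessions.items():
            if position[name] >= len(ops):
                continue  # this session has finished
            op = ops[position[name]]
            if op == COMMIT:
                held = [r for r, owner in holders.items() if owner == name]
                for resource in held:
                    del holders[resource]
            elif holders.setdefault(op, name) != name:
                continue  # lock held by another session: blocked this round
            position[name] += 1
            progressed = True
        if not progressed:
            return "deadlock"
    return "ok"


# Decompress and compress in one transaction (assumed lock order).
single_transaction = {
    "recompress": ["index", "chunk", COMMIT],
    "query": ["chunk", "index", COMMIT],
}

# Same work, but with a commit between decompress and compress.
split_transactions = {
    "recompress": ["index", COMMIT, "chunk", COMMIT],
    "query": ["chunk", "index", COMMIT],
}

print(run_sessions(single_transaction))
print(run_sessions(split_transactions))
```

In the split variant the index lock is released at the intermediate commit, so the query can finish and the second phase then acquires its lock without a cycle.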
I see the same behaviour using timescaledb 2.5.1 on postgres 14.1, on centos7, with postgresql from the pgdb-repo and timescaledb from timescaledb. I am upgrading our soh database with timescaledb, so at the moment I have set up a timescaledb-enabled table where I am compressing the old data at the same time as new data are coming in.
`=> select compress_chunk(i,true) from show_chunks('nagios_t',older_than=> interval '1 years') i;`
In the log, I get
`$ grep -A 10 1410875 /var/log/postgresql.log`
The query on the last line is copying data from the current working table to the new one. As far as I can see, the relations in the deadlock are two different tsdb chunks:
`nagios=> select oid,relname from pg_class where oid in (1410875,1410561);`
@sickel Regarding `select compress_chunk(i,true) from show_chunks('nagios_t',older_than=> interval '1 years') i;`: the right way to do this is to compress one chunk at a time, commit, and then compress the next chunk.
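The advice above (compress one chunk, commit, then move on) could look roughly like the following client-side loop. It is a sketch, assuming a DB-API connection with psycopg2-style `%s` placeholders and autocommit turned off; the helper name `compress_chunks_one_by_one` is hypothetical. It relies only on the `show_chunks` and `compress_chunk` calls already used in the comment above.

```python
def compress_chunks_one_by_one(conn, hypertable, older_than):
    """Compress each matching chunk in its own transaction.

    `conn` is assumed to be a DB-API connection (for example from
    psycopg2) with autocommit off, so each commit() ends a transaction
    and releases the locks taken while compressing that chunk.
    """
    cur = conn.cursor()
    try:
        # List the chunks first, then end that transaction.
        cur.execute(
            "SELECT show_chunks(%s::regclass, older_than => %s::interval)",
            (hypertable, older_than),
        )
        chunks = [row[0] for row in cur.fetchall()]
        conn.commit()

        for chunk in chunks:
            # Second argument is if_not_compressed, as in the original call.
            cur.execute("SELECT compress_chunk(%s::regclass, true)", (chunk,))
            conn.commit()  # commit before touching the next chunk
    finally:
        cur.close()
    return chunks
```

A call such as `compress_chunks_one_by_one(conn, "nagios_t", "1 year")` would then hold locks on at most one chunk at a time, instead of on every chunk for the duration of a single statement.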
Calling `recompress_chunk` can create a deadlock with a query because of different lock order.

To reproduce

- `hyper` with data and compress it. Do not add a compression policy.
- `tsl_recompress_chunk_wrapper`.
- `SELECT recompress_chunks(show_chunks('hyper'))` in S1.
- `tsl_recompress_chunk_wrapper` until you reach the line with `tsl_compress_chunk_wrapper`.
- `SELECT count(*) FROM hyper` in S1. This will block since it is waiting to get locks acquired by S1.
- `recompress_chunks` session, but not always.

Error produced
Additional information
In this case, we have:
Locks taken when stopping debugger after step 7 are: