[docdb] Race condition when applying large transactions and dropping tables #7894

Closed
bmatican opened this issue Mar 31, 2021 · 0 comments

bmatican commented Mar 31, 2021

Executing a large transaction and then immediately dropping the table seems to lead to crashes. I was not able to reproduce this when running the commands manually in the shell.

But if I dump them to a file and run the file via the shell, it does fail:

cat /tmp/fail.sql
CREATE TABLE t_2000000(id int);
INSERT INTO t_2000000 SELECT i FROM (SELECT generate_series(1, 2000000) i) t;
DROP TABLE t_2000000;

./bin/ysqlsh -f /tmp/fail.sql

Example stacks

Stack 1:

#0  load (__m=std::memory_order_acquire, this=0xffffffffb8ae5308) at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/atomic_base.h:713
#1  load (__m=std::memory_order_acquire, this=0xffffffffb8ae5308) at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/atomic:420
#2  Next (n=149566879, this=0x0) at ../../src/yb/rocksdb/db/skiplist.h:577
#3  FindGreaterOrEqual (key=0x8ebfea0 "/G\027\331S\300\264C\227\200*AB\276\345\234=\\\031\355\060", this=0x4ae16f0) at ../../src/yb/rocksdb/db/skiplist.h:361
#4  Seek (target=0x8ebfea0 "/G\027\331S\300\264C\227\200*AB\276\345\234=\\\031\355\060", this=0x2adbb648) at ../../src/yb/rocksdb/db/skiplist.h:315
#5  rocksdb::(anonymous namespace)::SkipListRep<rocksdb::SingleWriterInlineSkipList<rocksdb::MemTableRep::KeyComparator const&> >::Iterator::Seek (this=0x2adbb640, user_key=..., memtable_key=<optimized out>)
    at ../../src/yb/rocksdb/memtable/skiplistrep.cc:139
#6  0x00007fe3f444cca8 in rocksdb::MemTableIterator::Seek (this=0x2adbb5f8, k=...) at ../../src/yb/rocksdb/db/memtable.cc:284
#7  0x00007fe3f4425776 in rocksdb::DBIter::Seek (this=0x2adbb238, target=...) at ../../src/yb/rocksdb/db/db_iter.cc:758
#8  0x00007fe3f9fd919d in yb::docdb::BoundedRocksDbIterator::Seek (this=0x7fe3dd8e19c0, target=...) at ../../src/yb/docdb/bounded_rocksdb_iterator.cc:62
#9  0x00007fe3fa00107d in yb::docdb::IntentToWriteRequest (transaction_id_slice=..., commit_ht=..., commit_ht@entry=..., reverse_index_key=..., reverse_index_value=..., intent_iter=intent_iter@entry=0x7fe3dd8e19c0, regular_batch=0x7fe3dd8e1bb0,
    write_id=0x7fe3dd8e1784) at ../../src/yb/docdb/docdb.cc:990
#10 0x00007fe3fa0078de in yb::docdb::PrepareApplyIntentsBatch (transaction_id=..., commit_ht=..., key_bounds=key_bounds@entry=0x62c4388, apply_state=0x630f5d8, regular_batch=regular_batch@entry=0x7fe3dd8e1bb0, intents_db=0x6398000, intents_batch=0x0)
    at ../../src/yb/docdb/docdb.cc:1176
#11 0x00007fe3faf941fd in yb::tablet::Tablet::ApplyIntents (this=0x62c4010, data=...) at ../../src/yb/tablet/tablet.cc:1999
#12 0x00007fe3faf612b8 in yb::tablet::ApplyIntentsTask::Run (this=0x630f600) at ../../src/yb/tablet/apply_intents_task.cc:57
#13 0x00007fe3f5a16fd9 in yb::rpc::Strand::Done (this=0x49118f0, status=...) at ../../src/yb/rpc/strand.cc:77
#14 0x00007fe3f5a25153 in yb::rpc::(anonymous namespace)::Worker::Execute (this=<optimized out>) at ../../src/yb/rpc/thread_pool.cc:106
#15 0x00007fe3f19a351f in operator() (this=0x315ab98) at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#16 yb::Thread::SuperviseThread (arg=0x315ab40) at ../../src/yb/util/thread.cc:771
#17 0x00007fe3ed27c694 in start_thread (arg=0x7fe3dd8ea700) at pthread_create.c:333
#18 0x00007fe3ecfbe41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Stack 2:

#0  Unref (this=0x0) at ../../src/yb/rocksdb/db/memtable.h:139
#1  rocksdb::SuperVersion::Cleanup (this=0x3a67980) at ../../src/yb/rocksdb/db/column_family.cc:313
#2  0x00007ff20c3d7a55 in rocksdb::(anonymous namespace)::CleanupIteratorState (arg1=0x9862f80, arg2=<optimized out>) at ../../src/yb/rocksdb/db/db_impl.cc:3777
#3  0x00007ff20c4b470c in rocksdb::Cleanable::~Cleanable (this=0x8a5f600, __in_chrg=<optimized out>) at ../../src/yb/rocksdb/table/iterator.cc:38
#4  0x00007ff20c4112fd in ~DBIter (this=0x8a5f238, __in_chrg=<optimized out>) at ../../src/yb/rocksdb/db/db_iter.cc:106
#5  rocksdb::ArenaWrappedDBIter::~ArenaWrappedDBIter (this=0x8a5f200, __in_chrg=<optimized out>) at ../../src/yb/rocksdb/db/db_iter.cc:877
#6  0x00007ff20c411331 in rocksdb::ArenaWrappedDBIter::~ArenaWrappedDBIter (this=0x8a5f200, __in_chrg=<optimized out>) at ../../src/yb/rocksdb/db/db_iter.cc:877
#7  0x00007ff212fd70e9 in operator() (this=<optimized out>, __ptr=<optimized out>) at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/unique_ptr.h:76
#8  ~unique_ptr (this=0x7ff1ea6a29a8, __in_chrg=<optimized out>) at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/bits/unique_ptr.h:239
#9  yb::docdb::BoundedRocksDbIterator::~BoundedRocksDbIterator (this=0x7ff1ea6a2980, __in_chrg=<optimized out>) at ../../src/yb/docdb/bounded_rocksdb_iterator.h:31
#10 0x00007ff211fef520 in yb::docdb::PrepareApplyIntentsBatch (transaction_id=..., commit_ht=..., key_bounds=key_bounds@entry=0x105de88, apply_state=0x45ddad8, regular_batch=regular_batch@entry=0x7ff1ea6a2bb0, intents_db=0x44e4000, intents_batch=0x0)
    at ../../src/yb/docdb/docdb.cc:1111
#11 0x00007ff212f7c1fd in yb::tablet::Tablet::ApplyIntents (this=0x105db10, data=...) at ../../src/yb/tablet/tablet.cc:1999
#12 0x00007ff212f492b8 in yb::tablet::ApplyIntentsTask::Run (this=0x45ddb00) at ../../src/yb/tablet/apply_intents_task.cc:57
#13 0x00007ff20d9fefd9 in yb::rpc::Strand::Done (this=0x1521e30, status=...) at ../../src/yb/rpc/strand.cc:77
#14 0x00007ff20da0d153 in yb::rpc::(anonymous namespace)::Worker::Execute (this=<optimized out>) at ../../src/yb/rpc/thread_pool.cc:106
#15 0x00007ff20998b51f in operator() (this=0x2594058) at /opt/yb-build/brew/linuxbrew-20181203T161736v9-3ba4c2ed9b0587040949a4a9a95b576f520bae/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#16 yb::Thread::SuperviseThread (arg=0x2594000) at ../../src/yb/util/thread.cc:771
#17 0x00007ff205264694 in start_thread (arg=0x7ff1ea6ab700) at pthread_create.c:333
#18 0x00007ff204fa641d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

@tanujnay112 had reported this internally as it was failing in ./yb_build.sh --java-test org.yb.pgsql.TestPgRegressPlanner#testPgRegressDml

@bmatican bmatican added the area/docdb YugabyteDB core features label Mar 31, 2021
spolitov added a commit that referenced this issue Apr 2, 2021
…action

Summary:
TServer could crash if a table is dropped while intents for a large transaction are being applied.
This fix holds a ScopedRWOperation while applying intents for a large transaction, which prevents the corresponding RocksDB instance from being destroyed.
Its destruction during the apply was causing the crash.
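
As an illustration of the idea in the summary above, here is a minimal sketch of a scoped guard that keeps a storage instance alive while work is in flight. It is not the YugabyteDB implementation: `OperationCounter` and `ScopedOperation` are hypothetical names invented for this example, and the real ScopedRWOperation may differ in both interface and behavior. The sketch only shows the general pattern, where shutdown waits until every guard has been released and new guards fail once shutdown has started.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a counter of in-flight operations on a storage
// instance. Not the YugabyteDB class; names are assumptions for this sketch.
class OperationCounter {
 public:
  // Registers an operation. Fails once shutdown has begun.
  bool TryAcquire() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (shutting_down_) return false;
    ++pending_;
    return true;
  }

  // Unregisters an operation and wakes a waiting Shutdown() if it was the last.
  void Release() {
    std::lock_guard<std::mutex> lock(mutex_);
    assert(pending_ > 0);
    if (--pending_ == 0) cond_.notify_all();
  }

  // Rejects new operations, then blocks until in-flight ones have finished.
  // Only after this returns would it be safe to destroy the storage.
  void Shutdown() {
    std::unique_lock<std::mutex> lock(mutex_);
    shutting_down_ = true;
    cond_.wait(lock, [this] { return pending_ == 0; });
  }

  int pending() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return pending_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable cond_;
  int pending_ = 0;
  bool shutting_down_ = false;
};

// RAII guard: holds the counter for as long as the guard is in scope.
class ScopedOperation {
 public:
  explicit ScopedOperation(OperationCounter* counter)
      : counter_(counter->TryAcquire() ? counter : nullptr) {}
  ~ScopedOperation() {
    if (counter_) counter_->Release();
  }
  ScopedOperation(const ScopedOperation&) = delete;
  ScopedOperation& operator=(const ScopedOperation&) = delete;

  // False if the storage was already shutting down when the guard was created.
  bool ok() const { return counter_ != nullptr; }

 private:
  OperationCounter* counter_;
};
```

In this sketch, a long-running task would create a `ScopedOperation` before touching the storage and keep it alive for the whole task, so that a concurrent `Shutdown()` cannot complete while iterators are still in use.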

Test Plan: ybd --gtest_filter PgMiniTest.BigInsertWithDropTable

Reviewers: mbautin, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D11117
spolitov added a commit that referenced this issue Apr 20, 2021
Summary:
In debug builds we create a LongOperationTracker for each ScopedRWOperation to find long-running ScopedRWOperations.
But we should not do this for empty ScopedRWOperations; otherwise the long operation tracker could produce false reports.
For instance, there is an unused ScopedRWOperation in RunningTransaction and a ScopedRWOperation in ApplyIntentsTask, which is contained in RunningTransaction.

As a result, a transaction that runs longer than 1 second (3 seconds in TSAN) is reported twice.
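
The change described above can be sketched roughly as follows. This is only an illustration under assumed names: `LongOpTracker` and `TrackedOperation` are hypothetical and do not reflect the actual LongOperationTracker or ScopedRWOperation interfaces. The point is that an empty (default-constructed) operation never allocates a tracker, so it cannot produce a report of its own.

```cpp
#include <cassert>
#include <chrono>
#include <memory>

// Hypothetical tracker that records when an operation started.
// live_count exists only so the sketch can show how many trackers are active.
class LongOpTracker {
 public:
  using Clock = std::chrono::steady_clock;

  explicit LongOpTracker(std::chrono::milliseconds limit)
      : start_(Clock::now()), limit_(limit) {
    ++live_count;
  }
  ~LongOpTracker() {
    assert(live_count > 0);
    --live_count;
  }

  // True once the operation has been running longer than the limit.
  bool Exceeded() const { return Clock::now() - start_ > limit_; }

  static int live_count;

 private:
  Clock::time_point start_;
  std::chrono::milliseconds limit_;
};

int LongOpTracker::live_count = 0;

// Hypothetical scoped operation. An empty instance guards nothing, so it
// creates no tracker; only an instance bound to a resource is tracked.
class TrackedOperation {
 public:
  TrackedOperation() : resource_(nullptr) {}

  TrackedOperation(const void* resource, std::chrono::milliseconds limit)
      : resource_(resource) {
    if (resource_ != nullptr) {
      tracker_.reset(new LongOpTracker(limit));
    }
  }

  bool empty() const { return resource_ == nullptr; }
  bool tracked() const { return tracker_ != nullptr; }

 private:
  const void* resource_;
  std::unique_ptr<LongOpTracker> tracker_;
};
```

With this shape, an object that holds an unused operation member next to a task holding a real one ends up with a single tracker, which is the situation the summary describes for RunningTransaction and ApplyIntentsTask.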

Test Plan: ybd tsan --gtest_filter SnapshotTxnTest.DeleteOnLoad --test-timeout-sec 120

Reviewers: timur, bogdan

Reviewed By: bogdan

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D11281
YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue May 26, 2021
…ge transaction

YintongMa pushed a commit to YintongMa/yugabyte-db that referenced this issue May 26, 2021
…WOperation
