Description
when gh-ost appling binlog, pauses for a period of time, and then resumes appling binlog and row copy.
At this time, the row copy is completed and start the cutOver phase.
binlog apply delay, causes timeout waiting for events up to lock thread and then lock was not released on rollback, causing the DB to be locked and cannot be written.
gh-ost execution log :
2021-06-20 17:03:23 INFO Row copy complete
Copy: 594148/594148 100.0%; Applied: 1366; Backlog: 0/1000; Time: 11m38s(total), 11m38s(copy); streamer: mysql-bin.008173:15450343; Lag: 0.01s, State: migrating; ETA: due
2021-06-20 17:03:23 INFO Grabbing voluntary lock: gh-ost.8642188.lock
2021-06-20 17:03:23 INFO Setting LOCK timeout as 6 seconds
2021-06-20 17:03:23 INFO Looking for magic cut-over table
2021-06-20 17:03:23 INFO Creating magic cut-over tabletest
._test_1_1624179105_del
Copy: 594148/594148 100.0%; Applied: 1366; Backlog: 0/1000; Time: 11m38s(total), 11m38s(copy); streamer: mysql-bin.008173:15508966; Lag: 0.01s, State: migrating; ETA: due
2021-06-20 17:03:23 INFO Magic cut-over table created
2021-06-20 17:03:23 INFO Lockingtest
.test_1
,test
._test_1_1624179105_del
2021-06-20 17:03:23 INFO Tables locked
2021-06-20 17:03:23 INFO Session locking original & magic tables is 8642188
2021-06-20 17:03:23 INFO Writing changelog state: AllEventsUpToLockProcessed:1624179803921417462
2021-06-20 17:03:23 INFO Waiting for events up to lock
Copy: 594148/594148 100.0%; Applied: 1366; Backlog: 2/1000; Time: 11m38s(total), 11m38s(copy); streamer: mysql-bin.008173:15830960; Lag: 0.01s, State: migrating; ETA: due
Copy: 594148/594148 100.0%; Applied: 1479; Backlog: 38/1000; Time: 11m41s(total), 11m38s(copy); streamer: mysql-bin.008173:44351489; Lag: 0.01s, State: migrating;
2021-06-20 17:03:26 ERROR Timeout while waiting for events up to lock
2021-06-20 17:03:26 ERROR 2021-06-20 17:03:26 ERROR Timeout while waiting for events up to lock
2021-06-20 17:03:26 INFO Looking for magic cut-over table
2021-06-20 17:03:26 INFO Will now proceed to drop magic table and unlock tables
2021-06-20 17:03:26 INFO Dropping magic cut-over table
2021-06-20 17:03:26 INFO Dropping magic cut-over table
2021-06-20 17:03:26 INFO Dropping tabletest
._test_1_1624179105_del
the above sync.Once implementation logic, the one that is grabbed first holds the m.Lock() lock, and then another thread will wait until the previous thread After the execution is complete, change the o.done status from 0 to 1, and the subsequent thread can exit without having to execute.
In gh-ost during rollback operation,if the atomicCutOver() --> defer func() --> DropAtomicCutOverSentryTableIfExists() acquires the lock first, and the drop _del table will wait for the _del table lock, and the in the AtomicCutOverMagicLock() drop _del operation will Blocked,causes the DB to be unwritable more than 300s.
Thank you!