Skip to content

In the CutOver phase wait events up to lock timeout, the _del table lock to be not release, cause DB not writable. Introduced in #755 #992

Open
bytebase/gh-ost
#16
@brightgogo

Description

@brightgogo

when gh-ost appling binlog, pauses for a period of time, and then resumes appling binlog and row copy.
At this time, the row copy is completed and start the cutOver phase.
binlog apply delay, causes timeout waiting for events up to lock thread and then lock was not released on rollback, causing the DB to be locked and cannot be written.

gh-ost execution log :

2021-06-20 17:03:23 INFO Row copy complete
Copy: 594148/594148 100.0%; Applied: 1366; Backlog: 0/1000; Time: 11m38s(total), 11m38s(copy); streamer: mysql-bin.008173:15450343; Lag: 0.01s, State: migrating; ETA: due
2021-06-20 17:03:23 INFO Grabbing voluntary lock: gh-ost.8642188.lock
2021-06-20 17:03:23 INFO Setting LOCK timeout as 6 seconds
2021-06-20 17:03:23 INFO Looking for magic cut-over table
2021-06-20 17:03:23 INFO Creating magic cut-over table test._test_1_1624179105_del
Copy: 594148/594148 100.0%; Applied: 1366; Backlog: 0/1000; Time: 11m38s(total), 11m38s(copy); streamer: mysql-bin.008173:15508966; Lag: 0.01s, State: migrating; ETA: due
2021-06-20 17:03:23 INFO Magic cut-over table created
2021-06-20 17:03:23 INFO Locking test.test_1, test._test_1_1624179105_del
2021-06-20 17:03:23 INFO Tables locked
2021-06-20 17:03:23 INFO Session locking original & magic tables is 8642188
2021-06-20 17:03:23 INFO Writing changelog state: AllEventsUpToLockProcessed:1624179803921417462
2021-06-20 17:03:23 INFO Waiting for events up to lock
Copy: 594148/594148 100.0%; Applied: 1366; Backlog: 2/1000; Time: 11m38s(total), 11m38s(copy); streamer: mysql-bin.008173:15830960; Lag: 0.01s, State: migrating; ETA: due
Copy: 594148/594148 100.0%; Applied: 1479; Backlog: 38/1000; Time: 11m41s(total), 11m38s(copy); streamer: mysql-bin.008173:44351489; Lag: 0.01s, State: migrating;
2021-06-20 17:03:26 ERROR Timeout while waiting for events up to lock
2021-06-20 17:03:26 ERROR 2021-06-20 17:03:26 ERROR Timeout while waiting for events up to lock
2021-06-20 17:03:26 INFO Looking for magic cut-over table
2021-06-20 17:03:26 INFO Will now proceed to drop magic table and unlock tables
2021-06-20 17:03:26 INFO Dropping magic cut-over table
2021-06-20 17:03:26 INFO Dropping magic cut-over table
2021-06-20 17:03:26 INFO Dropping table test._test_1_1624179105_del

image

the above sync.Once implementation logic, the one that is grabbed first holds the m.Lock() lock, and then another thread will wait until the previous thread After the execution is complete, change the o.done status from 0 to 1, and the subsequent thread can exit without having to execute.

In gh-ost during rollback operation,if the atomicCutOver() --> defer func() --> DropAtomicCutOverSentryTableIfExists() acquires the lock first, and the drop _del table will wait for the _del table lock, and the in the AtomicCutOverMagicLock() drop _del operation will Blocked,causes the DB to be unwritable more than 300s.

Thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions