Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MariaDB deadlock handling added #414

Open
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

pboguslawski
Copy link
Contributor

Proposed change

Znuny (checked in v6) occasionally throws MariaDB deadlock errors to its
log (the more concurrency in DB, i.e. more application servers, the more
such errors in time). More deadlocks were noticed after upgrading to
Debian 11 (MariaDB 10.5).

Such deadlocks even if rare (i.e. a few per day) may introduce problems;
for example...

DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /opt/otrs/Kernel/System/DB.pm line 471.
ERROR: OTRS-otrs.Daemon.pl - Daemon Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager-4 Perl: 5.32.1 OS: linux Time: Thu Dec 1 12:05:19 2022

 Message: Deadlock found when trying to get lock; try restarting transaction, SQL: '
            UPDATE scheduler_task
            SET lock_key = 0
            WHERE lock_key = 1 AND id = ?'

 Traceback (4000847):
   Module: Kernel::System::Daemon::SchedulerDB::TaskAdd Line: 172
   Module: Kernel::System::Daemon::SchedulerDB::RecurrentTaskExecute Line: 2151
   Module: Kernel::System::Daemon::SchedulerDB::CronTaskToExecute Line: 1453
   Module: Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager::Run Line: 103
   Module: (eval) Line: 347
   Module: main::Start Line: 347
   Module: /opt/otrs/bin/otrs.Daemon.pl Line: 163

...blocks executions of MailQueueSend task (outbound e-mails sending) for
longer time, even over 30mins /until Daemon calls TaskCleanup() which removes
broken (NULL in lock_update_time) task from scheduler_task table/. Details
reported by MariaDB show deadlock on index scheduler_task_lock_key_id:

[...]
  Type: InnoDB
[...]
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-12-05 02:29:06 0x7f454c2bd700
*** (1) TRANSACTION:
TRANSACTION 458003, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MySQL thread id 1132, OS thread handle 139935620974336, query id 33477 myhost [ip] its Updating
UPDATE scheduler_task
            SET lock_key = '100100000838', lock_time = '2022-12-05 01:29:06', lock_update_time = '2022-12-05 01:29:06'
            WHERE lock_key = 0 AND id = '6657'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458003 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 458004, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MySQL thread id 684, OS thread handle 139935607412480, query id 33478 myhost [ip] its Updating
UPDATE scheduler_task
            SET lock_key = 0
            WHERE lock_key = 1 AND id = '6658'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458004 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 8; hex 8000000000000001; asc         ;;
 1: len 8; hex 8000000000001a02; asc         ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458004 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 8; hex 8000000000000001; asc         ;;
 1: len 8; hex 8000000000001a02; asc         ;;

*** WE ROLL BACK TRANSACTION (2)
[...]

Similar problem was noticed (even more often) on scheduler_recurrent_task_lock_key_id
index; in this case error sounds like

DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /opt/otrs/Kernel/System/DB.pm line 471.
ERROR: OTRS-otrs.Daemon.pl - Daemon Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager-4 Perl: 5.32.1 OS: linux Time: Mon Dec 5 11:40:02 2022

 Message: Deadlock found when trying to get lock; try restarting transaction, SQL: '
                UPDATE scheduler_recurrent_task
                SET lock_key = 0, lock_time = NULL, last_execution_time = ?, last_worker_task_id = ?,
                    change_time = '2022-12-05 11:40:02'
                WHERE lock_key = ? AND id = ?'

 Traceback (1220524):
   Module: Kernel::System::Daemon::SchedulerDB::RecurrentTaskExecute Line: 2154
   Module: Kernel::System::Daemon::SchedulerDB::CronTaskToExecute Line: 1453
   Module: Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager::Run Line: 103
   Module: (eval) Line: 347
   Module: main::Start Line: 347
   Module: /opt/otrs/bin/otrs.Daemon.pl Line: 163

Znuny should be prepared to re-issue a transaction
if it fails due to deadlock.

This mod causes Znuny to retry MariaDB queries send using Do() up to 3 times
after MariaDB/MySQL deadlock is detected.

Only 1213 (ER_LOCK_DEADLOCK, Deadlock found when trying to get lock; try restarting transaction)
MariaDB/MySQL errors are handled as deadlocks now; other
MariaDB/MySQL error codes and/or other database server deadlocks may be added
in similar way in the future if required.

Type of change

  • '1 - 🐞 bug 🐞' - Bugfix (non-breaking change which fixes an issue)

Additional information

Replaces: #322
Author-Change-Id: IB#1129362

Checklist

  • The code change is tested and works locally.(❗)
  • There is no commented out code in this PR.(❕)
  • You improved or added new unit tests.(❕)
  • Local ZnunyCodePolicy passed.(❕)
  • Local UnitTests / Selenium passed.(❕)
  • GitHub workflow CI (UnitTests / Selenium) passed.(❗)

Znuny (checked in v6) occasionally throws MariaDB deadlock errors to its
log (the more concurrency in DB, i.e. more application servers, the more
such errors in time). More deadlocks were noticed after upgrading to
Debian 11 (MariaDB 10.5).

Such deadlocks even if rare (i.e. a few per day) may introduce problems;
for example...

```
DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /opt/otrs/Kernel/System/DB.pm line 471.
ERROR: OTRS-otrs.Daemon.pl - Daemon Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager-4 Perl: 5.32.1 OS: linux Time: Thu Dec 1 12:05:19 2022

 Message: Deadlock found when trying to get lock; try restarting transaction, SQL: '
            UPDATE scheduler_task
            SET lock_key = 0
            WHERE lock_key = 1 AND id = ?'

 Traceback (4000847):
   Module: Kernel::System::Daemon::SchedulerDB::TaskAdd Line: 172
   Module: Kernel::System::Daemon::SchedulerDB::RecurrentTaskExecute Line: 2151
   Module: Kernel::System::Daemon::SchedulerDB::CronTaskToExecute Line: 1453
   Module: Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager::Run Line: 103
   Module: (eval) Line: 347
   Module: main::Start Line: 347
   Module: /opt/otrs/bin/otrs.Daemon.pl Line: 163
```
...blocks executions of `MailQueueSend` task (outbound e-mails sending) for
longer time, even over 30mins /until Daemon calls `TaskCleanup()` which removes
broken (`NULL` in `lock_update_time`) task from `scheduler_task` table/. Details
reported by MariaDB show deadlock on index `scheduler_task_lock_key_id`:

```
[...]
  Type: InnoDB
[...]
------------------------
LATEST DETECTED DEADLOCK
------------------------
2022-12-05 02:29:06 0x7f454c2bd700
*** (1) TRANSACTION:
TRANSACTION 458003, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MySQL thread id 1132, OS thread handle 139935620974336, query id 33477 myhost [ip] its Updating
UPDATE scheduler_task
            SET lock_key = '100100000838', lock_time = '2022-12-05 01:29:06', lock_update_time = '2022-12-05 01:29:06'
            WHERE lock_key = 0 AND id = '6657'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458003 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

*** (2) TRANSACTION:
TRANSACTION 458004, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1128, 4 row lock(s), undo log entries 1
MySQL thread id 684, OS thread handle 139935607412480, query id 33478 myhost [ip] its Updating
UPDATE scheduler_task
            SET lock_key = 0
            WHERE lock_key = 1 AND id = '6658'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458004 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 8; hex 8000000000000001; asc         ;;
 1: len 8; hex 8000000000001a02; asc         ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2143 page no 6 n bits 72 index scheduler_task_lock_key_id of table `its`.`scheduler_task` trx id 458004 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 8; hex 8000000000000001; asc         ;;
 1: len 8; hex 8000000000001a02; asc         ;;

*** WE ROLL BACK TRANSACTION (2)
[...]

```
Similar problem was noticed (even more often) on `scheduler_recurrent_task_lock_key_id`
index; in this case error sounds like

```
DBD::mysql::db do failed: Deadlock found when trying to get lock; try restarting transaction at /opt/otrs/Kernel/System/DB.pm line 471.
ERROR: OTRS-otrs.Daemon.pl - Daemon Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager-4 Perl: 5.32.1 OS: linux Time: Mon Dec 5 11:40:02 2022

 Message: Deadlock found when trying to get lock; try restarting transaction, SQL: '
                UPDATE scheduler_recurrent_task
                SET lock_key = 0, lock_time = NULL, last_execution_time = ?, last_worker_task_id = ?,
                    change_time = '2022-12-05 11:40:02'
                WHERE lock_key = ? AND id = ?'

 Traceback (1220524):
   Module: Kernel::System::Daemon::SchedulerDB::RecurrentTaskExecute Line: 2154
   Module: Kernel::System::Daemon::SchedulerDB::CronTaskToExecute Line: 1453
   Module: Kernel::System::Daemon::DaemonModules::SchedulerCronTaskManager::Run Line: 103
   Module: (eval) Line: 347
   Module: main::Start Line: 347
   Module: /opt/otrs/bin/otrs.Daemon.pl Line: 163
```

Znuny [should be prepared to re-issue a transaction](https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html)
if it fails due to deadlock.

This mod causes Znuny to retry MariaDB queries send using Do() up to 3 times
after MariaDB/MySQL deadlock is detected.

Only `1213` (`ER_LOCK_DEADLOCK`, `Deadlock found when trying to get lock; try restarting transaction`)
MariaDB/MySQL errors are handled as deadlocks now; [other](https://stackoverflow.com/questions/2596005/working-around-mysql-error-deadlock-found-when-trying-to-get-lock-try-restarti/2596101#comment43545045_2596101)
MariaDB/MySQL error codes and/or other database server deadlocks may be added
in similar way in the future if required.

Author-Change-Id: IB#1129362
Author-Change-Id: IB#1129362
Author-Change-Id: IB#1129362
@pboguslawski pboguslawski mentioned this pull request Apr 27, 2023
6 tasks
@dennykorsukewitz dennykorsukewitz added the 1 - 🐞 bug 🐞 An issue with the system. label Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 - 🐞 bug 🐞 An issue with the system.
Development

Successfully merging this pull request may close these issues.

2 participants