Skip to content

MySQL query optimizer doesn't necessarily use unique key in BuildUniqueKeyRangeEndPreparedQueryViaOffset's query #1538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
MatthewJin-at opened this issue Apr 24, 2025 · 0 comments

Comments

@MatthewJin-at
Copy link

We've run into an issue where the gh-ost migration starts off fine, but at some point, the MySQL query optimizer behavior changes for this query, and this query starts taking ~15m.

The explain query shows that the MySQL query optimizer is doing an index scan

mysql> explain select `applicationId`, `id` from `_workflowActionExecutions` where ((`applicationId` > _binary'appAAAA') or (((`applicationId` = _binary'appAAAA')) AND (`id` > _binary'waxAAAA'))) and ((`applicationId` < _binary'appZZZZ') or (((`applicationId` =
_binary'appZZZZ')) AND (`id` < _binary'waxZZZZ')) or ((`applicationId` = _binary'appZZZZ') and (`id` = _binary'waxZZZZ'))) order by `applicationId` asc, `id` asc limit 1 offset 999;
+----+-------------+---------------------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table                     | partitions | type  | possible_keys                                                                                                                                                                                       | key     | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------------------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | _workflowActionExecutions | NULL       | index | PRIMARY,IDX_applicationId_workflowGraphExecutionId,IDX_applicationId_workflowExecutionId_status_branchId,IDX_applicationId_status_workflowId,IDX_applicationId_workflowId_status_processTime,IDX_id | PRIMARY | 34      | NULL | 1000 |   100.00 | Using where; Using index |
+----+-------------+---------------------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.47 sec)

(ids fuzzed)

This is because MySQL thinks that if we USE INDEX (PRIMARY), it will end up doing a massive range scan

mysql> explain select `applicationId`, `id`
    -> from `_workflowActionExecutions` USE INDEX (PRIMARY)
    -> where ((`applicationId` > _binary'appAAAA') or (((`applicationId` = _binary'appAAAA')) AND (`id` > _binary'waxAAAA'))) and ((`applicationId` < _binary'appZZZZ') or (((`applicationId` = _binary'appZZZZ')) AND (`id` < _binary'waxZZZZ')) or ((`applicationId` = _binary'appZZZZ') and (`id` = _binary'waxZZZZ')))
    -> order by `applicationId` asc, `id` asc
    -> limit 1
    -> offset 999;
+----+-------------+---------------------------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
| id | select_type | table                     | partitions | type  | possible_keys | key     | key_len | ref  | rows     | filtered | Extra                    |
+----+-------------+---------------------------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
|  1 | SIMPLE      | _workflowActionExecutions | NULL       | range | PRIMARY       | PRIMARY | 34      | NULL | 95809902 |   100.00 | Using where; Using index |
+----+-------------+---------------------------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
1 row in set, 1 warning (0.46 sec)

(note the massive amount of rows)

But if I run the query with USE INDEX (PRIMARY), it's actually really fast, instead of taking ~15m

mysql> select `applicationId`, `id`
    -> from `_workflowActionExecutions` USE INDEX (PRIMARY)
    -> where ((`applicationId` > _binary'appAAAA') or (((`applicationId` = _binary'appAAAA')) AND (`id` > _binary'waxAAAA'))) and ((`applicationId` < _binary'appZZZZ') or (((`applicationId` = _binary'appZZZZ')) AND (`id` < _binary'waxZZZZ')) or ((`applicationId` = _binary'appZZZZ') and (`id` = _binary'waxZZZZ')))
    -> order by `applicationId` asc, `id` asc
    -> limit 1
    -> offset 999;
+-------------------+-------------------+
| applicationId     | id                |
+-------------------+-------------------+
...
+-------------------+-------------------+
1 row in set (0.53 sec)

Is it possible to force the use of the index when constructing the query? Is that an appropriate solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant