Skip to content

MySQL query optimizer doesn't necessarily use unique key in BuildUniqueKeyRangeEndPreparedQueryViaOffset's query #1538

Open
@MatthewJin-at

Description

@MatthewJin-at

We've run into an issue where the gh-ost migration starts off fine, but at some point, the MySQL query optimizer behavior changes for this query, and this query starts taking ~15m.

The explain query shows that the MySQL query optimizer is doing an index scan

mysql> explain select `applicationId`, `id` from `_workflowActionExecutions` where ((`applicationId` > _binary'appAAAA') or (((`applicationId` = _binary'appAAAA')) AND (`id` > _binary'waxAAAA'))) and ((`applicationId` < _binary'appZZZZ') or (((`applicationId` =
_binary'appZZZZ')) AND (`id` < _binary'waxZZZZ')) or ((`applicationId` = _binary'appZZZZ') and (`id` = _binary'waxZZZZ'))) order by `applicationId` asc, `id` asc limit 1 offset 999;
+----+-------------+---------------------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table                     | partitions | type  | possible_keys                                                                                                                                                                                       | key     | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+---------------------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | _workflowActionExecutions | NULL       | index | PRIMARY,IDX_applicationId_workflowGraphExecutionId,IDX_applicationId_workflowExecutionId_status_branchId,IDX_applicationId_status_workflowId,IDX_applicationId_workflowId_status_processTime,IDX_id | PRIMARY | 34      | NULL | 1000 |   100.00 | Using where; Using index |
+----+-------------+---------------------------+------------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.47 sec)

(ids fuzzed)

This is because MySQL thinks that if we USE INDEX (PRIMARY), it will end up doing a massive range scan

mysql> explain select `applicationId`, `id`
    -> from `_workflowActionExecutions` USE INDEX (PRIMARY)
    -> where ((`applicationId` > _binary'appAAAA') or (((`applicationId` = _binary'appAAAA')) AND (`id` > _binary'waxAAAA'))) and ((`applicationId` < _binary'appZZZZ') or (((`applicationId` = _binary'appZZZZ')) AND (`id` < _binary'waxZZZZ')) or ((`applicationId` = _binary'appZZZZ') and (`id` = _binary'waxZZZZ')))
    -> order by `applicationId` asc, `id` asc
    -> limit 1
    -> offset 999;
+----+-------------+---------------------------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
| id | select_type | table                     | partitions | type  | possible_keys | key     | key_len | ref  | rows     | filtered | Extra                    |
+----+-------------+---------------------------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
|  1 | SIMPLE      | _workflowActionExecutions | NULL       | range | PRIMARY       | PRIMARY | 34      | NULL | 95809902 |   100.00 | Using where; Using index |
+----+-------------+---------------------------+------------+-------+---------------+---------+---------+------+----------+----------+--------------------------+
1 row in set, 1 warning (0.46 sec)

(note the massive amount of rows)

But if I run the query with USE INDEX (PRIMARY), it's actually really fast, instead of taking ~15m

mysql> select `applicationId`, `id`
    -> from `_workflowActionExecutions` USE INDEX (PRIMARY)
    -> where ((`applicationId` > _binary'appAAAA') or (((`applicationId` = _binary'appAAAA')) AND (`id` > _binary'waxAAAA'))) and ((`applicationId` < _binary'appZZZZ') or (((`applicationId` = _binary'appZZZZ')) AND (`id` < _binary'waxZZZZ')) or ((`applicationId` = _binary'appZZZZ') and (`id` = _binary'waxZZZZ')))
    -> order by `applicationId` asc, `id` asc
    -> limit 1
    -> offset 999;
+-------------------+-------------------+
| applicationId     | id                |
+-------------------+-------------------+
...
+-------------------+-------------------+
1 row in set (0.53 sec)

Is it possible to force the use of the index when constructing the query? Is that an appropriate solution?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions