Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

关于mysql的loose index scan的几点疑问 #102

Open
zhangyachen opened this issue Dec 14, 2016 · 0 comments
Open

关于mysql的loose index scan的几点疑问 #102

zhangyachen opened this issue Dec 14, 2016 · 0 comments

Comments

@zhangyachen
Copy link
Owner

zhangyachen commented Dec 14, 2016

关于MySQL的loose index scan有几点疑问,欢迎看到这篇文章的人一起探讨。

测试表结构:

CREATE TABLE `test` (
  `id` int(11) NOT NULL default '0',
  `v1` int(10) unsigned NOT NULL default '0',
  `v2` int(10) unsigned NOT NULL default '0',
  `v3` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `v1_v2_v3` (`v1`,`v2`,`v3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

select * from test;
+----+----+-----+----+
| id | v1 | v2  | v3 |
+----+----+-----+----+
|  1 |  1 |   0 |  1 |
|  2 |  3 |   1 |  2 |
| 10 |  4 |  10 | 10 |
|  0 |  4 | 100 |  0 |
|  3 |  4 | 100 |  3 |
|  5 |  5 |   9 |  5 |
|  8 |  7 |   3 |  8 |
|  7 |  7 |   4 |  7 |
| 30 |  8 |  15 | 30 |
+----+----+-----+----+

 select version();
+-------------+
| version()   |
+-------------+
| 5.0.51b-log |
+-------------+

由此我们可以大致画出索引的结构:

image

下面说下我纠结的实验过程:

mysql> explain select max(v3) from test where v1>3 group by v1,v2; 
+----+-------------+-------+-------+---------------+----------+---------+------+------+---------------------------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                                 |
+----+-------------+-------+-------+---------------+----------+---------+------+------+---------------------------------------+
|  1 | SIMPLE      | test  | range | v1_v2_v3      | v1_v2_v3 | 8       | NULL |    1 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+----------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)

一般来说,MySQL的索引扫描需要定义一个起点和终点,即使需要的数据只是这段索引中的几个,MySQL仍然需要扫描这段索引中的每一个条目,将它们返回给Sever层,Server层根据where条件将存储引擎返回的数据再进行一遍过滤。

但是根据官方文档9.2.1.16 GROUP BY Optimization的描述,这段sql会用到松散扫描索引,那么我有一点疑问:MySQL的loose index scan的工作原理究竟是怎样的呢?我有个猜想:

image

如上图,MySQL根据索引的前2列(v1,v2)来分组,此时先不读取v3列的值。MySQL扫描时发现(v1,v2)的分组值发生变化时,取上一个节点的v3值(因为是B-Tree,v3的值在相同的(v1,v2)中肯定是有序的,并且是从小到大)。这样的话MySQL就少扫描了一个v3的值。当(v1,v2)重复的越多,MySQL少扫描的v3列的次数越多。
如果MySQL全部读取的话,存储引擎需要将全部的数据返回给Server层,Server层还需要自己判断max(v3),莫不如在扫描索引的时候顺便读取max(v3)了。

当我马上就要说服自己的时候,突然发现Explain结果的rows值为1。难道说MySQL估算只扫描一行就能算出结果?这时,我通过如下命令来看MySQL是如何扫描索引的:

mysql> flush status;                         
Query OK, 0 rows affected (0.00 sec)

mysql> select max(v3) from test where v1>3 group by v1,v2;            
+---------+
| max(v3) |
+---------+
|      10 |
|       3 |
|       5 |
|       8 |
|       7 |
|      30 |
+---------+
6 rows in set (0.00 sec)

mysql>  show session status like 'Handler_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 15    |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 14    |
+----------------------------+-------+
15 rows in set (0.00 sec)

结果令我很惊奇,我们重点观察这两行数据:

| Handler_read_key           | 15    |
| Handler_read_next          | 0     |

Handler_read_next为0(Handler_read_next的意思是按照索引叶子节点顺序读取下一个节点的次数。6.1.7 Server Status Variables)。说明MySQL根本没有按照我上面的意思顺序扫描v1>3的叶子节点,到此只有三种解释了:

  • 我猜想的MySQL松散扫描的方法不正确。
  • show session status like 'Handler_%'存在bug,没有计算出正确的值。
  • explain方法的rows列的估算方法存在bug,没有正确的估算出扫描的行数。

在bugs.mysql.com上我找到了有人存在跟我一样的疑惑:Manual does not provide enough details on how loose index scan really works

罢了,我换了个MySQL版本:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.14    |
+-----------+
1 row in set (0.00 sec)

执行同样的查询:

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> explain select max(v3) from test where v1>3 group by v1,v2\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test
   partitions: NULL
         type: range
possible_keys: v1_v2_v3
          key: v1_v2_v3
      key_len: 4
          ref: NULL
         rows: 7
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql>  select max(v3) from test where v1>3 group by v1,v2;
+---------+
| max(v3) |
+---------+
|      10 |
|       3 |
|       5 |
|       8 |
|       7 |
|      30 |
+---------+
6 rows in set (0.00 sec)

mysql> show session status like 'Handler_%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 7     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.00 sec)

我惊喜的发现,换了版本之后,explain输出的rows值正常了,为7。show session status like 'Handler_%';输出的值看起来也正常了:

| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 7     |

正当我以为是MySQL5.7修复了统计的bug时,我突然发现explain的Extra是这样的:

        Extra: Using where; Using index

等等,这是什么鬼,这说明MySQL在运行这条Sql时根本没有使用松散扫描索引,怪不得统计输出结果是正常的。

我在stackoverflow上关于loose index scan的提问:How MySQL implements the loose index scan

参考资料:

@zhangyachen zhangyachen changed the title 一次颠覆认知的sql 一个颠覆认知的sql Dec 14, 2016
@zhangyachen zhangyachen changed the title 一个颠覆认知的sql group by执行过程 Dec 19, 2016
@zhangyachen zhangyachen changed the title group by执行过程 关于mysql的loose index scan的几点疑问 Dec 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant