Skip to content

Commit 48996ca

Browse files
author
Davi Arnaut
committed
Bug#67718: InnoDB drastically under-fills pages in certain conditions
The problem arises from InnoDB's B+ tree page split algorithm that attempts to optimize for sequential inserts but might end up causing poor space utilization depending on the distribution pattern of index key values throughout the index. If an insert that causes a page to be split is inserting a key value that is an immediate successor or predecessor to the last inserted key value in the same page, the insertion point is used as the split point irrespective of the actual distribution of values in the page. For example, further inserts into the page with a key value that is greater than any key value already in the page, but lower than the minimum key value of its sibling page, will lead to a series of drastically under-filled pages. This page split behavior is especially a problem for workloads with a random distribution of inserts across the pages. Since any given insert into a page might be an immediate successor or predecessor to the last insert, a pattern of sequential inserts might be incorrectly assumed and lead to an unbalanced distribution of data through page splits. The solution is to use the standard B+ tree split algorithm while still preserving some form of optimization for sequential inserts. When a page needs to be split, the median key value in a page is used as the split point so that the data is distributed in a symmetric fashion. In order to improve space utilization for sequential inserts, if a pattern of sequential inserts is detected, the insert point might be used as the split point if the page is the right-most or left-most page in the tree and the key value being inserted is above the largest or below the smallest value in the index. A new variable named "innodb_index_page_split_mode" is introduced to provide a way to control the page split behavior. The variable accepts a set of permitted values: "symmetric", "lower" and "upper". Currently, "symmetric" is always required and will be added if not stated. If its value is empty, the original aforementioned behavior is used instead. Using "symmetric" alone, pages are always split roughly in the middle. When "symmetric,lower" or "symmetric,upper" are set, sequential inserts into the left-most or right-most page in the tree will cause the insertion point to be used as the split point. Using "symmetric,upper, lower" will cause sequential-insert behavior to split at the insertion point (as above) for both the left-most and right-most pages in the tree (or more precisely, at their level).
1 parent 88a6812 commit 48996ca

File tree

9 files changed

+627
-11
lines changed

9 files changed

+627
-11
lines changed
Lines changed: 134 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,134 @@
1+
#
2+
# Bug#67718: InnoDB drastically under-fills pages in certain conditions
3+
#
4+
SET @old_innodb_file_per_table = @@GLOBAL.innodb_file_per_table;
5+
SET GLOBAL innodb_file_per_table = ON;
6+
SET @old_innodb_index_page_split_mode = @@GLOBAL.innodb_index_page_split_mode;
7+
SET GLOBAL innodb_index_page_split_mode = "symmetric,lower,upper";
8+
CREATE TABLE page_split_test
9+
(
10+
id BIGINT UNSIGNED NOT NULL,
11+
payload1 CHAR(255) NOT NULL,
12+
payload2 CHAR(255) NOT NULL,
13+
payload3 CHAR(255) NOT NULL,
14+
payload4 CHAR(255) NOT NULL,
15+
PRIMARY KEY (`id`)
16+
) ENGINE=INNODB;
17+
# Fill up the root page, but don't split it.
18+
INSERT INTO page_split_test VALUES (1, REPEAT("A", 255), REPEAT("B", 255),
19+
REPEAT("C", 255), REPEAT("D", 255));
20+
INSERT INTO page_split_test VALUES (2, REPEAT("A", 255), REPEAT("B", 255),
21+
REPEAT("C", 255), REPEAT("D", 255));
22+
INSERT INTO page_split_test VALUES (3, REPEAT("A", 255), REPEAT("B", 255),
23+
REPEAT("C", 255), REPEAT("D", 255));
24+
INSERT INTO page_split_test VALUES (4, REPEAT("A", 255), REPEAT("B", 255),
25+
REPEAT("C", 255), REPEAT("D", 255));
26+
INSERT INTO page_split_test VALUES (5, REPEAT("A", 255), REPEAT("B", 255),
27+
REPEAT("C", 255), REPEAT("D", 255));
28+
INSERT INTO page_split_test VALUES (6, REPEAT("A", 255), REPEAT("B", 255),
29+
REPEAT("C", 255), REPEAT("D", 255));
30+
INSERT INTO page_split_test VALUES (7, REPEAT("A", 255), REPEAT("B", 255),
31+
REPEAT("C", 255), REPEAT("D", 255));
32+
INSERT INTO page_split_test VALUES (8, REPEAT("A", 255), REPEAT("B", 255),
33+
REPEAT("C", 255), REPEAT("D", 255));
34+
INSERT INTO page_split_test VALUES (9, REPEAT("A", 255), REPEAT("B", 255),
35+
REPEAT("C", 255), REPEAT("D", 255));
36+
INSERT INTO page_split_test VALUES (10, REPEAT("A", 255), REPEAT("B", 255),
37+
REPEAT("C", 255), REPEAT("D", 255));
38+
INSERT INTO page_split_test VALUES (11, REPEAT("A", 255), REPEAT("B", 255),
39+
REPEAT("C", 255), REPEAT("D", 255));
40+
INSERT INTO page_split_test VALUES (12, REPEAT("A", 255), REPEAT("B", 255),
41+
REPEAT("C", 255), REPEAT("D", 255));
42+
INSERT INTO page_split_test VALUES (13, REPEAT("A", 255), REPEAT("B", 255),
43+
REPEAT("C", 255), REPEAT("D", 255));
44+
INSERT INTO page_split_test VALUES (14, REPEAT("A", 255), REPEAT("B", 255),
45+
REPEAT("C", 255), REPEAT("D", 255));
46+
# The next insert will trigger the root page to be split into a root
47+
# non-leaf page and two leaf pages, both about half full.
48+
INSERT INTO page_split_test VALUES (15, REPEAT("A", 255), REPEAT("B", 255),
49+
REPEAT("C", 255), REPEAT("D", 255));
50+
# Insert enough records to fill up the right-most leaf page, but don't
51+
# split it.
52+
INSERT INTO page_split_test VALUES (16, REPEAT("A", 255), REPEAT("B", 255),
53+
REPEAT("C", 255), REPEAT("D", 255));
54+
INSERT INTO page_split_test VALUES (17, REPEAT("A", 255), REPEAT("B", 255),
55+
REPEAT("C", 255), REPEAT("D", 255));
56+
INSERT INTO page_split_test VALUES (18, REPEAT("A", 255), REPEAT("B", 255),
57+
REPEAT("C", 255), REPEAT("D", 255));
58+
INSERT INTO page_split_test VALUES (19, REPEAT("A", 255), REPEAT("B", 255),
59+
REPEAT("C", 255), REPEAT("D", 255));
60+
INSERT INTO page_split_test VALUES (20, REPEAT("A", 255), REPEAT("B", 255),
61+
REPEAT("C", 255), REPEAT("D", 255));
62+
INSERT INTO page_split_test VALUES (21, REPEAT("A", 255), REPEAT("B", 255),
63+
REPEAT("C", 255), REPEAT("D", 255));
64+
# Increase the id by some amount, and insert a few records in descending
65+
# order from that point. Each row inserted would split the "full" page
66+
# from above, and create a new page for the single record.
67+
INSERT INTO page_split_test VALUES (30, REPEAT("A", 255), REPEAT("B", 255),
68+
REPEAT("C", 255), REPEAT("D", 255));
69+
INSERT INTO page_split_test VALUES (29, REPEAT("A", 255), REPEAT("B", 255),
70+
REPEAT("C", 255), REPEAT("D", 255));
71+
INSERT INTO page_split_test VALUES (28, REPEAT("A", 255), REPEAT("B", 255),
72+
REPEAT("C", 255), REPEAT("D", 255));
73+
# Dump the page information about the test table. The right-most page
74+
# has only one record as the record (key 30) was inserted in ascending
75+
# order. The remaining nodes are split in half and filled by the later
76+
# inserts.
77+
SELECT page_number, page_type, number_records
78+
FROM information_schema.innodb_buffer_page
79+
WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
80+
ORDER BY page_number;
81+
page_number page_type number_records
82+
3 INDEX 4
83+
4 INDEX 7
84+
5 INDEX 7
85+
6 INDEX 1
86+
7 INDEX 9
87+
DROP TABLE page_split_test;
88+
#
89+
# Test page split behavior when keys are being inserted in order.
90+
#
91+
CREATE TABLE page_split_test
92+
(
93+
id BIGINT UNSIGNED NOT NULL,
94+
payload CHAR(255) NOT NULL,
95+
PRIMARY KEY (`id`)
96+
) ENGINE=INNODB;
97+
# Populate table with 300 rows in ascending key order.
98+
99+
# Space utilization is the same across leaf pages, except at the end
100+
# since it wasn't completely filled up.
101+
SELECT page_number, page_type, number_records
102+
FROM information_schema.innodb_buffer_page
103+
WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
104+
ORDER BY page_number;
105+
page_number page_type number_records
106+
3 INDEX 7
107+
4 INDEX 27
108+
5 INDEX 54
109+
6 INDEX 54
110+
7 INDEX 54
111+
8 INDEX 54
112+
9 INDEX 54
113+
10 INDEX 3
114+
TRUNCATE TABLE page_split_test;
115+
# Populate table with 300 rows in descending key order.
116+
117+
# Space utilization is the same across leaf pages, except at the end
118+
# since it wasn't completely filled up.
119+
SELECT page_number, page_type, number_records
120+
FROM information_schema.innodb_buffer_page
121+
WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
122+
ORDER BY page_number;
123+
page_number page_type number_records
124+
3 INDEX 7
125+
4 INDEX 54
126+
5 INDEX 27
127+
6 INDEX 54
128+
7 INDEX 54
129+
8 INDEX 54
130+
9 INDEX 54
131+
10 INDEX 3
132+
DROP TABLE page_split_test;
133+
SET GLOBAL innodb_file_per_table = @old_innodb_file_per_table;
134+
SET GLOBAL innodb_index_page_split_mode = @old_innodb_index_page_split_mode;
Lines changed: 130 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,130 @@
1+
--echo #
2+
--echo # Bug#67718: InnoDB drastically under-fills pages in certain conditions
3+
--echo #
4+
5+
source include/have_innodb.inc;
6+
7+
SET @old_innodb_file_per_table = @@GLOBAL.innodb_file_per_table;
8+
SET GLOBAL innodb_file_per_table = ON;
9+
10+
SET @old_innodb_index_page_split_mode = @@GLOBAL.innodb_index_page_split_mode;
11+
SET GLOBAL innodb_index_page_split_mode = "symmetric,lower,upper";
12+
13+
CREATE TABLE page_split_test
14+
(
15+
id BIGINT UNSIGNED NOT NULL,
16+
payload1 CHAR(255) NOT NULL,
17+
payload2 CHAR(255) NOT NULL,
18+
payload3 CHAR(255) NOT NULL,
19+
payload4 CHAR(255) NOT NULL,
20+
PRIMARY KEY (`id`)
21+
) ENGINE=INNODB;
22+
23+
--echo # Fill up the root page, but don't split it.
24+
let $i = 1;
25+
while ($i < 15)
26+
{
27+
eval
28+
INSERT INTO page_split_test VALUES ($i, REPEAT("A", 255), REPEAT("B", 255),
29+
REPEAT("C", 255), REPEAT("D", 255));
30+
inc $i;
31+
}
32+
33+
--echo # The next insert will trigger the root page to be split into a root
34+
--echo # non-leaf page and two leaf pages, both about half full.
35+
36+
INSERT INTO page_split_test VALUES (15, REPEAT("A", 255), REPEAT("B", 255),
37+
REPEAT("C", 255), REPEAT("D", 255));
38+
39+
--echo # Insert enough records to fill up the right-most leaf page, but don't
40+
--echo # split it.
41+
42+
INSERT INTO page_split_test VALUES (16, REPEAT("A", 255), REPEAT("B", 255),
43+
REPEAT("C", 255), REPEAT("D", 255));
44+
INSERT INTO page_split_test VALUES (17, REPEAT("A", 255), REPEAT("B", 255),
45+
REPEAT("C", 255), REPEAT("D", 255));
46+
INSERT INTO page_split_test VALUES (18, REPEAT("A", 255), REPEAT("B", 255),
47+
REPEAT("C", 255), REPEAT("D", 255));
48+
INSERT INTO page_split_test VALUES (19, REPEAT("A", 255), REPEAT("B", 255),
49+
REPEAT("C", 255), REPEAT("D", 255));
50+
INSERT INTO page_split_test VALUES (20, REPEAT("A", 255), REPEAT("B", 255),
51+
REPEAT("C", 255), REPEAT("D", 255));
52+
INSERT INTO page_split_test VALUES (21, REPEAT("A", 255), REPEAT("B", 255),
53+
REPEAT("C", 255), REPEAT("D", 255));
54+
55+
--echo # Increase the id by some amount, and insert a few records in descending
56+
--echo # order from that point. Each row inserted would split the "full" page
57+
--echo # from above, and create a new page for the single record.
58+
59+
INSERT INTO page_split_test VALUES (30, REPEAT("A", 255), REPEAT("B", 255),
60+
REPEAT("C", 255), REPEAT("D", 255));
61+
INSERT INTO page_split_test VALUES (29, REPEAT("A", 255), REPEAT("B", 255),
62+
REPEAT("C", 255), REPEAT("D", 255));
63+
INSERT INTO page_split_test VALUES (28, REPEAT("A", 255), REPEAT("B", 255),
64+
REPEAT("C", 255), REPEAT("D", 255));
65+
66+
--echo # Dump the page information about the test table. The right-most page
67+
--echo # has only one record as the record (key 30) was inserted in ascending
68+
--echo # order. The remaining nodes are split in half and filled by the later
69+
--echo # inserts.
70+
71+
SELECT page_number, page_type, number_records
72+
FROM information_schema.innodb_buffer_page
73+
WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
74+
ORDER BY page_number;
75+
76+
DROP TABLE page_split_test;
77+
78+
--echo #
79+
--echo # Test page split behavior when keys are being inserted in order.
80+
--echo #
81+
82+
CREATE TABLE page_split_test
83+
(
84+
id BIGINT UNSIGNED NOT NULL,
85+
payload CHAR(255) NOT NULL,
86+
PRIMARY KEY (`id`)
87+
) ENGINE=INNODB;
88+
89+
--echo # Populate table with 300 rows in ascending key order.
90+
disable_query_log;
91+
let $i = 0;
92+
while ($i < 300)
93+
{
94+
eval INSERT INTO page_split_test VALUES ($i, REPEAT("A", 255));
95+
inc $i;
96+
}
97+
enable_query_log;
98+
--echo
99+
100+
--echo # Space utilization is the same across leaf pages, except at the end
101+
--echo # since it wasn't completely filled up.
102+
SELECT page_number, page_type, number_records
103+
FROM information_schema.innodb_buffer_page
104+
WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
105+
ORDER BY page_number;
106+
107+
TRUNCATE TABLE page_split_test;
108+
109+
--echo # Populate table with 300 rows in descending key order.
110+
disable_query_log;
111+
let $i = 300;
112+
while ($i > 0)
113+
{
114+
eval INSERT INTO page_split_test VALUES ($i, REPEAT("A", 255));
115+
dec $i;
116+
}
117+
enable_query_log;
118+
--echo
119+
120+
--echo # Space utilization is the same across leaf pages, except at the end
121+
--echo # since it wasn't completely filled up.
122+
SELECT page_number, page_type, number_records
123+
FROM information_schema.innodb_buffer_page
124+
WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
125+
ORDER BY page_number;
126+
127+
DROP TABLE page_split_test;
128+
129+
SET GLOBAL innodb_file_per_table = @old_innodb_file_per_table;
130+
SET GLOBAL innodb_index_page_split_mode = @old_innodb_index_page_split_mode;
Lines changed: 100 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,100 @@
1+
SET @old_innodb_index_page_split_mode = @@GLOBAL.innodb_index_page_split_mode;
2+
SELECT @old_innodb_index_page_split_mode;
3+
@old_innodb_index_page_split_mode
4+
5+
#
6+
# Default value.
7+
#
8+
SELECT @@GLOBAL.innodb_index_page_split_mode;
9+
@@GLOBAL.innodb_index_page_split_mode
10+
11+
SELECT @@SESSION.innodb_index_page_split_mode;
12+
@@SESSION.innodb_index_page_split_mode
13+
14+
SET GLOBAL innodb_index_page_split_mode = 'symmetric';
15+
SELECT @@GLOBAL.innodb_index_page_split_mode;
16+
@@GLOBAL.innodb_index_page_split_mode
17+
symmetric
18+
SET GLOBAL innodb_index_page_split_mode = 'lower';
19+
SELECT @@GLOBAL.innodb_index_page_split_mode;
20+
@@GLOBAL.innodb_index_page_split_mode
21+
symmetric,lower
22+
SET GLOBAL innodb_index_page_split_mode = 'upper';
23+
SELECT @@GLOBAL.innodb_index_page_split_mode;
24+
@@GLOBAL.innodb_index_page_split_mode
25+
symmetric,upper
26+
SET GLOBAL innodb_index_page_split_mode = 'lower,symmetric';
27+
SELECT @@GLOBAL.innodb_index_page_split_mode;
28+
@@GLOBAL.innodb_index_page_split_mode
29+
symmetric,lower
30+
SET GLOBAL innodb_index_page_split_mode = 'upper,symmetric';
31+
SELECT @@GLOBAL.innodb_index_page_split_mode;
32+
@@GLOBAL.innodb_index_page_split_mode
33+
symmetric,upper
34+
SET GLOBAL innodb_index_page_split_mode = 'upper,lower,symmetric';
35+
SELECT @@GLOBAL.innodb_index_page_split_mode;
36+
@@GLOBAL.innodb_index_page_split_mode
37+
symmetric,upper,lower
38+
#
39+
# Scope.
40+
#
41+
SET GLOBAL innodb_index_page_split_mode = 'symmetric';
42+
SET SESSION innodb_index_page_split_mode = 'lower';
43+
SELECT @@GLOBAL.innodb_index_page_split_mode;
44+
@@GLOBAL.innodb_index_page_split_mode
45+
symmetric
46+
SELECT @@SESSION.innodb_index_page_split_mode;
47+
@@SESSION.innodb_index_page_split_mode
48+
symmetric,lower
49+
SHOW GLOBAL VARIABLES LIKE 'innodb_index_page_split_mode';
50+
Variable_name Value
51+
innodb_index_page_split_mode symmetric
52+
SHOW SESSION VARIABLES LIKE 'innodb_index_page_split_mode';
53+
Variable_name Value
54+
innodb_index_page_split_mode symmetric,lower
55+
SELECT * FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES WHERE
56+
VARIABLE_NAME = 'innodb_index_page_split_mode';
57+
VARIABLE_NAME VARIABLE_VALUE
58+
INNODB_INDEX_PAGE_SPLIT_MODE symmetric
59+
SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE
60+
VARIABLE_NAME = 'innodb_index_page_split_mode';
61+
VARIABLE_NAME VARIABLE_VALUE
62+
INNODB_INDEX_PAGE_SPLIT_MODE symmetric,lower
63+
SET GLOBAL innodb_index_page_split_mode = 'symmetric';
64+
SET SESSION innodb_index_page_split_mode = 'lower';
65+
SELECT @@GLOBAL.innodb_index_page_split_mode;
66+
@@GLOBAL.innodb_index_page_split_mode
67+
symmetric
68+
SELECT @@SESSION.innodb_index_page_split_mode;
69+
@@SESSION.innodb_index_page_split_mode
70+
symmetric,lower
71+
SET GLOBAL innodb_index_page_split_mode = 'symmetric';
72+
SET SESSION innodb_index_page_split_mode = '';
73+
SELECT @@GLOBAL.innodb_index_page_split_mode;
74+
@@GLOBAL.innodb_index_page_split_mode
75+
symmetric
76+
SELECT @@SESSION.innodb_index_page_split_mode;
77+
@@SESSION.innodb_index_page_split_mode
78+
79+
SET GLOBAL innodb_index_page_split_mode = '';
80+
SET SESSION innodb_index_page_split_mode = 'lower';
81+
SELECT @@GLOBAL.innodb_index_page_split_mode;
82+
@@GLOBAL.innodb_index_page_split_mode
83+
84+
SELECT @@SESSION.innodb_index_page_split_mode;
85+
@@SESSION.innodb_index_page_split_mode
86+
symmetric,lower
87+
#
88+
# Invalid values.
89+
#
90+
SET GLOBAL innodb_index_page_split_mode = 1.1;
91+
ERROR 42000: Incorrect argument type to variable 'innodb_index_page_split_mode'
92+
SET GLOBAL innodb_index_page_split_mode = 1e1;
93+
ERROR 42000: Incorrect argument type to variable 'innodb_index_page_split_mode'
94+
SET GLOBAL innodb_index_page_split_mode = 'inexistent';
95+
ERROR 42000: Variable 'innodb_index_page_split_mode' can't be set to the value of 'inexistent'
96+
# Cleanup.
97+
SET @@GLOBAL.innodb_index_page_split_mode = @old_innodb_index_page_split_mode;
98+
SELECT @@GLOBAL.innodb_index_page_split_mode;
99+
@@GLOBAL.innodb_index_page_split_mode
100+

0 commit comments

Comments
 (0)