Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Browse files

Bug#67718: InnoDB drastically under-fills pages in certain conditions

The problem arises from InnoDB's B+ tree page split algorithm that
attempts to optimize for sequential inserts but might end up causing
poor space utilization depending on the distribution pattern of index
key values throughout the index.

If an insert that causes a page to be split is inserting a key value
that is an immediate successor or predecessor to the last inserted key
value in the same page, the insertion point is used as the split point
irrespective of the actual distribution of values in the page. For
example, further inserts into the page with a key value that is greater
than any key value already in the page, but lower than the minimum key
value of its sibling page, will lead to a series of drastically
under-filled pages.

This page split behavior is especially a problem for workloads with a
random distribution of inserts across the pages. Since any given insert
into a page might be an immediate successor or predecessor to the last
insert, a pattern of sequential inserts might be incorrectly assumed
and lead to an unbalanced distribution of data through page splits.

The solution is to use the standard B+ tree split algorithm while still
preserving some form of optimization for sequential inserts. When a
page needs to be split, the median key value in a page is used as the
split point so that the data is distributed in a symmetric fashion. In
order to improve space utilization for sequential inserts, if a pattern
of sequential inserts is detected, the insert point might be used as
the split point if the page is the right-most or left-most page in the
tree and the key value being inserted is above the largest or below the
smallest value in the index.

A new variable named "innodb_index_page_split_mode" is introduced to
provide a way to control the page split behavior. The variable accepts
a set of permitted values: "symmetric", "lower" and "upper". Currently,
"symmetric" is always required and will be added if not stated. If its
value is empty, the original aforementioned behavior is used instead.

Using "symmetric" alone, pages are always split roughly in the middle.
When "symmetric,lower" or "symmetric,upper" are set, sequential inserts
into the left-most or right-most page in the tree will cause the
insertion point to be used as the split point. Using "symmetric,upper,
lower" will cause sequential-insert behavior to split at the insertion
point (as above) for both the left-most and right-most pages in the
tree (or more precisely, at their level).
  • Loading branch information...
commit 48996cad34e0f2e11068f92f86e1eb7eda1d87fe 1 parent 88a6812
Davi Arnaut authored
View
134 mysql-test/suite/innodb/r/innodb_index_page_split_mode.result
@@ -0,0 +1,134 @@
+#
+# Bug#67718: InnoDB drastically under-fills pages in certain conditions
+#
+SET @old_innodb_file_per_table = @@GLOBAL.innodb_file_per_table;
+SET GLOBAL innodb_file_per_table = ON;
+SET @old_innodb_index_page_split_mode = @@GLOBAL.innodb_index_page_split_mode;
+SET GLOBAL innodb_index_page_split_mode = "symmetric,lower,upper";
+CREATE TABLE page_split_test
+(
+id BIGINT UNSIGNED NOT NULL,
+payload1 CHAR(255) NOT NULL,
+payload2 CHAR(255) NOT NULL,
+payload3 CHAR(255) NOT NULL,
+payload4 CHAR(255) NOT NULL,
+PRIMARY KEY (`id`)
+) ENGINE=INNODB;
+# Fill up the root page, but don't split it.
+INSERT INTO page_split_test VALUES (1, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (2, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (3, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (4, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (5, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (6, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (7, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (8, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (9, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (10, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (11, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (12, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (13, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (14, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+# The next insert will trigger the root page to be split into a root
+# non-leaf page and two leaf pages, both about half full.
+INSERT INTO page_split_test VALUES (15, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+# Insert enough records to fill up the right-most leaf page, but don't
+# split it.
+INSERT INTO page_split_test VALUES (16, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (17, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (18, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (19, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (20, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (21, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+# Increase the id by some amount, and insert a few records in descending
+# order from that point. Each row inserted would split the "full" page
+# from above, and create a new page for the single record.
+INSERT INTO page_split_test VALUES (30, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (29, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (28, REPEAT("A", 255), REPEAT("B", 255),
+REPEAT("C", 255), REPEAT("D", 255));
+# Dump the page information about the test table. The right-most page
+# has only one record as the record (key 30) was inserted in ascending
+# order. The remaining nodes are split in half and filled by the later
+# inserts.
+SELECT page_number, page_type, number_records
+FROM information_schema.innodb_buffer_page
+WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
+ ORDER BY page_number;
+page_number page_type number_records
+3 INDEX 4
+4 INDEX 7
+5 INDEX 7
+6 INDEX 1
+7 INDEX 9
+DROP TABLE page_split_test;
+#
+# Test page split behavior when keys are being inserted in order.
+#
+CREATE TABLE page_split_test
+(
+id BIGINT UNSIGNED NOT NULL,
+payload CHAR(255) NOT NULL,
+PRIMARY KEY (`id`)
+) ENGINE=INNODB;
+# Populate table with 300 rows in ascending key order.
+
+# Space utilization is the same across leaf pages, except at the end
+# since it wasn't completely filled up.
+SELECT page_number, page_type, number_records
+FROM information_schema.innodb_buffer_page
+WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
+ ORDER BY page_number;
+page_number page_type number_records
+3 INDEX 7
+4 INDEX 27
+5 INDEX 54
+6 INDEX 54
+7 INDEX 54
+8 INDEX 54
+9 INDEX 54
+10 INDEX 3
+TRUNCATE TABLE page_split_test;
+# Populate table with 300 rows in descending key order.
+
+# Space utilization is the same across leaf pages, except at the end
+# since it wasn't completely filled up.
+SELECT page_number, page_type, number_records
+FROM information_schema.innodb_buffer_page
+WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
+ ORDER BY page_number;
+page_number page_type number_records
+3 INDEX 7
+4 INDEX 54
+5 INDEX 27
+6 INDEX 54
+7 INDEX 54
+8 INDEX 54
+9 INDEX 54
+10 INDEX 3
+DROP TABLE page_split_test;
+SET GLOBAL innodb_file_per_table = @old_innodb_file_per_table;
+SET GLOBAL innodb_index_page_split_mode = @old_innodb_index_page_split_mode;
View
130 mysql-test/suite/innodb/t/innodb_index_page_split_mode.test
@@ -0,0 +1,130 @@
+--echo #
+--echo # Bug#67718: InnoDB drastically under-fills pages in certain conditions
+--echo #
+
+source include/have_innodb.inc;
+
+SET @old_innodb_file_per_table = @@GLOBAL.innodb_file_per_table;
+SET GLOBAL innodb_file_per_table = ON;
+
+SET @old_innodb_index_page_split_mode = @@GLOBAL.innodb_index_page_split_mode;
+SET GLOBAL innodb_index_page_split_mode = "symmetric,lower,upper";
+
+CREATE TABLE page_split_test
+(
+ id BIGINT UNSIGNED NOT NULL,
+ payload1 CHAR(255) NOT NULL,
+ payload2 CHAR(255) NOT NULL,
+ payload3 CHAR(255) NOT NULL,
+ payload4 CHAR(255) NOT NULL,
+ PRIMARY KEY (`id`)
+) ENGINE=INNODB;
+
+--echo # Fill up the root page, but don't split it.
+let $i = 1;
+while ($i < 15)
+{
+ eval
+ INSERT INTO page_split_test VALUES ($i, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+ inc $i;
+}
+
+--echo # The next insert will trigger the root page to be split into a root
+--echo # non-leaf page and two leaf pages, both about half full.
+
+INSERT INTO page_split_test VALUES (15, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+
+--echo # Insert enough records to fill up the right-most leaf page, but don't
+--echo # split it.
+
+INSERT INTO page_split_test VALUES (16, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (17, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (18, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (19, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (20, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (21, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+
+--echo # Increase the id by some amount, and insert a few records in descending
+--echo # order from that point. Each row inserted would split the "full" page
+--echo # from above, and create a new page for the single record.
+
+INSERT INTO page_split_test VALUES (30, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (29, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+INSERT INTO page_split_test VALUES (28, REPEAT("A", 255), REPEAT("B", 255),
+ REPEAT("C", 255), REPEAT("D", 255));
+
+--echo # Dump the page information about the test table. The right-most page
+--echo # has only one record as the record (key 30) was inserted in ascending
+--echo # order. The remaining nodes are split in half and filled by the later
+--echo # inserts.
+
+SELECT page_number, page_type, number_records
+ FROM information_schema.innodb_buffer_page
+ WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
+ ORDER BY page_number;
+
+DROP TABLE page_split_test;
+
+--echo #
+--echo # Test page split behavior when keys are being inserted in order.
+--echo #
+
+CREATE TABLE page_split_test
+(
+ id BIGINT UNSIGNED NOT NULL,
+ payload CHAR(255) NOT NULL,
+ PRIMARY KEY (`id`)
+) ENGINE=INNODB;
+
+--echo # Populate table with 300 rows in ascending key order.
+disable_query_log;
+let $i = 0;
+while ($i < 300)
+{
+ eval INSERT INTO page_split_test VALUES ($i, REPEAT("A", 255));
+ inc $i;
+}
+enable_query_log;
+--echo
+
+--echo # Space utilization is the same across leaf pages, except at the end
+--echo # since it wasn't completely filled up.
+SELECT page_number, page_type, number_records
+ FROM information_schema.innodb_buffer_page
+ WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
+ ORDER BY page_number;
+
+TRUNCATE TABLE page_split_test;
+
+--echo # Populate table with 300 rows in descending key order.
+disable_query_log;
+let $i = 300;
+while ($i > 0)
+{
+ eval INSERT INTO page_split_test VALUES ($i, REPEAT("A", 255));
+ dec $i;
+}
+enable_query_log;
+--echo
+
+--echo # Space utilization is the same across leaf pages, except at the end
+--echo # since it wasn't completely filled up.
+SELECT page_number, page_type, number_records
+ FROM information_schema.innodb_buffer_page
+ WHERE table_name = "test/page_split_test" AND index_name = "PRIMARY"
+ ORDER BY page_number;
+
+DROP TABLE page_split_test;
+
+SET GLOBAL innodb_file_per_table = @old_innodb_file_per_table;
+SET GLOBAL innodb_index_page_split_mode = @old_innodb_index_page_split_mode;
View
100 mysql-test/suite/sys_vars/r/innodb_index_page_split_mode_basic.result
@@ -0,0 +1,100 @@
+SET @old_innodb_index_page_split_mode = @@GLOBAL.innodb_index_page_split_mode;
+SELECT @old_innodb_index_page_split_mode;
+@old_innodb_index_page_split_mode
+
+#
+# Default value.
+#
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+
+SELECT @@SESSION.innodb_index_page_split_mode;
+@@SESSION.innodb_index_page_split_mode
+
+SET GLOBAL innodb_index_page_split_mode = 'symmetric';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric
+SET GLOBAL innodb_index_page_split_mode = 'lower';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric,lower
+SET GLOBAL innodb_index_page_split_mode = 'upper';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric,upper
+SET GLOBAL innodb_index_page_split_mode = 'lower,symmetric';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric,lower
+SET GLOBAL innodb_index_page_split_mode = 'upper,symmetric';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric,upper
+SET GLOBAL innodb_index_page_split_mode = 'upper,lower,symmetric';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric,upper,lower
+#
+# Scope.
+#
+SET GLOBAL innodb_index_page_split_mode = 'symmetric';
+SET SESSION innodb_index_page_split_mode = 'lower';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric
+SELECT @@SESSION.innodb_index_page_split_mode;
+@@SESSION.innodb_index_page_split_mode
+symmetric,lower
+SHOW GLOBAL VARIABLES LIKE 'innodb_index_page_split_mode';
+Variable_name Value
+innodb_index_page_split_mode symmetric
+SHOW SESSION VARIABLES LIKE 'innodb_index_page_split_mode';
+Variable_name Value
+innodb_index_page_split_mode symmetric,lower
+SELECT * FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES WHERE
+VARIABLE_NAME = 'innodb_index_page_split_mode';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_INDEX_PAGE_SPLIT_MODE symmetric
+SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE
+VARIABLE_NAME = 'innodb_index_page_split_mode';
+VARIABLE_NAME VARIABLE_VALUE
+INNODB_INDEX_PAGE_SPLIT_MODE symmetric,lower
+SET GLOBAL innodb_index_page_split_mode = 'symmetric';
+SET SESSION innodb_index_page_split_mode = 'lower';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric
+SELECT @@SESSION.innodb_index_page_split_mode;
+@@SESSION.innodb_index_page_split_mode
+symmetric,lower
+SET GLOBAL innodb_index_page_split_mode = 'symmetric';
+SET SESSION innodb_index_page_split_mode = '';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+symmetric
+SELECT @@SESSION.innodb_index_page_split_mode;
+@@SESSION.innodb_index_page_split_mode
+
+SET GLOBAL innodb_index_page_split_mode = '';
+SET SESSION innodb_index_page_split_mode = 'lower';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+
+SELECT @@SESSION.innodb_index_page_split_mode;
+@@SESSION.innodb_index_page_split_mode
+symmetric,lower
+#
+# Invalid values.
+#
+SET GLOBAL innodb_index_page_split_mode = 1.1;
+ERROR 42000: Incorrect argument type to variable 'innodb_index_page_split_mode'
+SET GLOBAL innodb_index_page_split_mode = 1e1;
+ERROR 42000: Incorrect argument type to variable 'innodb_index_page_split_mode'
+SET GLOBAL innodb_index_page_split_mode = 'inexistent';
+ERROR 42000: Variable 'innodb_index_page_split_mode' can't be set to the value of 'inexistent'
+# Cleanup.
+SET @@GLOBAL.innodb_index_page_split_mode = @old_innodb_index_page_split_mode;
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+@@GLOBAL.innodb_index_page_split_mode
+
View
77 mysql-test/suite/sys_vars/t/innodb_index_page_split_mode_basic.test
@@ -0,0 +1,77 @@
+--source include/have_innodb.inc
+
+SET @old_innodb_index_page_split_mode = @@GLOBAL.innodb_index_page_split_mode;
+SELECT @old_innodb_index_page_split_mode;
+
+--echo #
+--echo # Default value.
+--echo #
+
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+SELECT @@SESSION.innodb_index_page_split_mode;
+
+SET GLOBAL innodb_index_page_split_mode = 'symmetric';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+
+SET GLOBAL innodb_index_page_split_mode = 'lower';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+
+SET GLOBAL innodb_index_page_split_mode = 'upper';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+
+SET GLOBAL innodb_index_page_split_mode = 'lower,symmetric';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+
+SET GLOBAL innodb_index_page_split_mode = 'upper,symmetric';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+
+SET GLOBAL innodb_index_page_split_mode = 'upper,lower,symmetric';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+
+--echo #
+--echo # Scope.
+--echo #
+
+SET GLOBAL innodb_index_page_split_mode = 'symmetric';
+SET SESSION innodb_index_page_split_mode = 'lower';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+SELECT @@SESSION.innodb_index_page_split_mode;
+
+SHOW GLOBAL VARIABLES LIKE 'innodb_index_page_split_mode';
+SHOW SESSION VARIABLES LIKE 'innodb_index_page_split_mode';
+
+SELECT * FROM INFORMATION_SCHEMA.GLOBAL_VARIABLES WHERE
+ VARIABLE_NAME = 'innodb_index_page_split_mode';
+
+SELECT * FROM INFORMATION_SCHEMA.SESSION_VARIABLES WHERE
+ VARIABLE_NAME = 'innodb_index_page_split_mode';
+
+SET GLOBAL innodb_index_page_split_mode = 'symmetric';
+SET SESSION innodb_index_page_split_mode = 'lower';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+SELECT @@SESSION.innodb_index_page_split_mode;
+
+SET GLOBAL innodb_index_page_split_mode = 'symmetric';
+SET SESSION innodb_index_page_split_mode = '';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+SELECT @@SESSION.innodb_index_page_split_mode;
+
+SET GLOBAL innodb_index_page_split_mode = '';
+SET SESSION innodb_index_page_split_mode = 'lower';
+SELECT @@GLOBAL.innodb_index_page_split_mode;
+SELECT @@SESSION.innodb_index_page_split_mode;
+
+--echo #
+--echo # Invalid values.
+--echo #
+
+--error ER_WRONG_TYPE_FOR_VAR
+SET GLOBAL innodb_index_page_split_mode = 1.1;
+--error ER_WRONG_TYPE_FOR_VAR
+SET GLOBAL innodb_index_page_split_mode = 1e1;
+--error ER_WRONG_VALUE_FOR_VAR
+SET GLOBAL innodb_index_page_split_mode = 'inexistent';
+
+--echo # Cleanup.
+SET @@GLOBAL.innodb_index_page_split_mode = @old_innodb_index_page_split_mode;
+SELECT @@GLOBAL.innodb_index_page_split_mode;
View
61 storage/innobase/btr/btr0btr.c
@@ -1834,6 +1834,7 @@ btr_root_raise_and_insert(
of the inserted record */
const dtuple_t* tuple, /*!< in: tuple to insert */
ulint n_ext, /*!< in: number of externally stored columns */
+ ulint flags, /*!< in: page split flags */
mtr_t* mtr) /*!< in: mtr */
{
dict_index_t* index;
@@ -1987,7 +1988,7 @@ btr_root_raise_and_insert(
PAGE_CUR_LE, page_cursor);
/* Split the child and insert tuple */
- return(btr_page_split_and_insert(cursor, tuple, n_ext, mtr));
+ return(btr_page_split_and_insert(cursor, tuple, n_ext, flags, mtr));
}
/*************************************************************//**
@@ -1999,6 +2000,8 @@ ibool
btr_page_get_split_rec_to_left(
/*===========================*/
btr_cur_t* cursor, /*!< in: cursor at which to insert */
+ mtr_t* mtr, /*!< in: mtr */
+ ulint flags, /*!< in: page split flags */
rec_t** split_rec) /*!< out: if split recommended,
the first record on upper half page,
or NULL if tuple to be inserted should
@@ -2011,6 +2014,17 @@ btr_page_get_split_rec_to_left(
page = btr_cur_get_page(cursor);
insert_point = btr_cur_get_rec(cursor);
+ /* No asymmetric page splitting is allowed if the "symmetric" flag is
+ set, except if the "lower" bound flag is also set, in which case the
+ left-most page on its level can be split at the insertion point as an
+ optimization for inserts in descending index order. */
+ if (flags & BTR_PAGE_SPLIT_SYMMETRIC_FLAG) {
+ if (!((flags & BTR_PAGE_SPLIT_LOWER_FLAG)
+ && btr_page_get_prev(page, mtr) == FIL_NULL)) {
+ return(FALSE);
+ }
+ }
+
if (page_header_get_ptr(page, PAGE_LAST_INSERT)
== page_rec_get_next(insert_point)) {
@@ -2021,8 +2035,17 @@ btr_page_get_split_rec_to_left(
page. Otherwise, we could repeatedly move from page to page
lots of records smaller than the convergence point. */
- if (infimum != insert_point
- && page_rec_get_next(infimum) != insert_point) {
+ if (flags & BTR_PAGE_SPLIT_SYMMETRIC_FLAG) {
+ /* No asymmetric split if the "lower" bound flag is set
+ and the insert point is not the smallest value in the
+ page (and index). */
+ if (insert_point == infimum) {
+ *split_rec = page_rec_get_next(insert_point);
+ } else {
+ return(FALSE);
+ }
+ } else if (infimum != insert_point
+ && page_rec_get_next(infimum) != insert_point) {
*split_rec = insert_point;
} else {
@@ -2044,6 +2067,8 @@ ibool
btr_page_get_split_rec_to_right(
/*============================*/
btr_cur_t* cursor, /*!< in: cursor at which to insert */
+ mtr_t* mtr, /*!< in: mtr */
+ ulint flags, /*!< in: page split flags */
rec_t** split_rec) /*!< out: if split recommended,
the first record on upper half page,
or NULL if tuple to be inserted should
@@ -2055,6 +2080,17 @@ btr_page_get_split_rec_to_right(
page = btr_cur_get_page(cursor);
insert_point = btr_cur_get_rec(cursor);
+ /* No asymmetric page splitting is allowed if the "symmetric" flag is
+ set, except if the "upper" bound flag is also set, in which case the
+ right-most page on its level can be split at the insertion point as an
+ optimization for inserts in ascending index order. */
+ if (flags & BTR_PAGE_SPLIT_SYMMETRIC_FLAG) {
+ if (!((flags & BTR_PAGE_SPLIT_UPPER_FLAG)
+ && btr_page_get_next(page, mtr) == FIL_NULL)) {
+ return(FALSE);
+ }
+ }
+
/* We use eager heuristics: if the new insert would be right after
the previous insert on the same page, we assume that there is a
pattern of sequential inserts here. */
@@ -2066,11 +2102,17 @@ btr_page_get_split_rec_to_right(
next_rec = page_rec_get_next(insert_point);
+ /* A pattern of sequential inserts has been detected. Split
+ at the insert point if it is above the largest value in the
+ page. */
if (page_rec_is_supremum(next_rec)) {
split_at_new:
/* Split at the new record to insert */
*split_rec = NULL;
- } else {
+ } else if (!(flags & BTR_PAGE_SPLIT_UPPER_FLAG)) {
+ /* Asymmetrical splitting is allowed. If the insert point
+ is right next to the largest value in the page, split at
+ the insert point. */
rec_t* next_next_rec = page_rec_get_next(next_rec);
if (page_rec_is_supremum(next_next_rec)) {
@@ -2085,6 +2127,10 @@ btr_page_get_split_rec_to_right(
page. */
*split_rec = next_next_rec;
+ } else {
+ /* An insert pattern has been detected but the insert
+ point is not next to the largest value in the page. */
+ return(FALSE);
}
return(TRUE);
@@ -2533,6 +2579,7 @@ btr_page_split_and_insert(
on the predecessor of the inserted record */
const dtuple_t* tuple, /*!< in: tuple to insert */
ulint n_ext, /*!< in: number of externally stored columns */
+ ulint flags, /*!< in: page split flags */
mtr_t* mtr) /*!< in: mtr */
{
buf_block_t* block;
@@ -2597,11 +2644,13 @@ btr_page_split_and_insert(
insert_left = btr_page_tuple_smaller(
cursor, tuple, offsets, n_uniq, &heap);
}
- } else if (btr_page_get_split_rec_to_right(cursor, &split_rec)) {
+ } else if (btr_page_get_split_rec_to_right(cursor, mtr, flags,
+ &split_rec)) {
direction = FSP_UP;
hint_page_no = page_no + 1;
- } else if (btr_page_get_split_rec_to_left(cursor, &split_rec)) {
+ } else if (btr_page_get_split_rec_to_left(cursor, mtr, flags,
+ &split_rec)) {
direction = FSP_DOWN;
hint_page_no = page_no - 1;
ut_ad(split_rec);
View
47 storage/innobase/btr/btr0cur.c
@@ -66,6 +66,7 @@ Created 10/16/1994 Heikki Tuuri
#include "srv0srv.h"
#include "ibuf0ibuf.h"
#include "lock0lock.h"
+#include "ha_prototypes.h"
#include "zlib.h"
/** Buffered B-tree operation types, introduced as part of delete buffering. */
@@ -1149,6 +1150,39 @@ btr_cur_trx_report(
#endif /* UNIV_DEBUG */
/*************************************************************//**
+Get the index page split flags of a query thread. If no query
+thread is given, the global index page split flags are returned.
+@return the page split flags */
+static
+ulint
+get_page_split_flags(
+/*=================*/
+ que_thr_t* thr) /*!< in: query thread or NULL */
+{
+ trx_t* trx = thr ? thr_get_trx(thr) : NULL;
+
+ return(thd_index_page_split(trx ? trx->mysql_thd : NULL));
+}
+
+/*************************************************************//**
+Decides if the page should be split.
+@return TRUE if split recommended */
+static
+ibool
+btr_page_should_split(
+/*==================*/
+ btr_cur_t* cursor, /*!< in: cursor on page after which to insert;
+ cursor stays valid */
+ ulint flags, /*!< in: page split flags */
+ mtr_t* mtr) /*!< in: mtr */
+{
+ rec_t* rec;
+
+ return(btr_page_get_split_rec_to_right(cursor, mtr, flags, &rec)
+ || btr_page_get_split_rec_to_left(cursor, mtr, flags, &rec));
+}
+
+/*************************************************************//**
Tries to perform an insert to a page in an index tree, next to cursor.
It is assumed that mtr holds an x-latch on the page. The operation does
not succeed if there is too little space on the page. If there is just
@@ -1184,7 +1218,6 @@ btr_cur_optimistic_insert(
buf_block_t* block;
page_t* page;
ulint max_size;
- rec_t* dummy_rec;
ibool leaf;
ibool reorg;
ibool inherit;
@@ -1281,8 +1314,7 @@ btr_cur_optimistic_insert(
&& (page_get_n_recs(page) >= 2)
&& UNIV_LIKELY(leaf)
&& (dict_index_get_space_reserve() + rec_size > max_size)
- && (btr_page_get_split_rec_to_right(cursor, &dummy_rec)
- || btr_page_get_split_rec_to_left(cursor, &dummy_rec))) {
+ && btr_page_should_split(cursor, get_page_split_flags(thr), mtr)) {
fail:
err = DB_FAIL;
fail_err:
@@ -1453,6 +1485,7 @@ btr_cur_pessimistic_insert(
ibool success;
ulint n_extents = 0;
ulint n_reserved;
+ ulint split_flags;
ut_ad(dtuple_check_typed(entry));
@@ -1516,13 +1549,17 @@ btr_cur_pessimistic_insert(
}
}
+ split_flags = get_page_split_flags(thr);
+
if (dict_index_get_page(index)
== buf_block_get_page_no(btr_cur_get_block(cursor))) {
/* The page is the root page */
- *rec = btr_root_raise_and_insert(cursor, entry, n_ext, mtr);
+ *rec = btr_root_raise_and_insert(cursor, entry, n_ext,
+ split_flags, mtr);
} else {
- *rec = btr_page_split_and_insert(cursor, entry, n_ext, mtr);
+ *rec = btr_page_split_and_insert(cursor, entry, n_ext,
+ split_flags, mtr);
}
if (UNIV_LIKELY_NULL(heap)) {
View
60 storage/innobase/handler/ha_innodb.cc
@@ -195,6 +195,21 @@ static TYPELIB innodb_stats_method_typelib = {
NULL
};
+/** List of values for system variable "innodb_index_page_split_mode". */
+static const char* innodb_index_page_split_mode_names[] = {
+ "symmetric",
+ "upper",
+ "lower",
+ NULL
+};
+
+static TYPELIB innodb_index_page_split_mode_typelib = {
+ array_elements(innodb_index_page_split_mode_names) - 1,
+ "innodb_index_page_split_mode_typelib",
+ innodb_index_page_split_mode_names,
+ NULL
+};
+
/* The following counter is used to convey information to InnoDB
about server activity: in selects it is not sensible to call
srv_active_wake_master_thread after each fetch or search, we only do
@@ -441,6 +456,32 @@ innobase_commit_concurrency_validate(
DBUG_RETURN(!(!commit_concurrency == !innobase_commit_concurrency));
}
+/****************************************************************//**
+Expand and update the system variable index_page_split using the
+"saved" value. */
+static
+void
+innodb_index_page_split_mode_update(
+/*================================*/
+ THD* thd, /*!< in: thread handle */
+ struct st_mysql_sys_var* var, /*!< in: pointer to
+ system variable */
+ void* var_ptr, /*!< out: where the
+ formal value goes */
+ const void* save) /*!< in: immediate result
+ from check function */
+{
+ ulonglong flags;
+
+ flags = *(ulonglong *)(save);
+
+ if (flags & (BTR_PAGE_SPLIT_UPPER_FLAG | BTR_PAGE_SPLIT_LOWER_FLAG)) {
+ flags |= BTR_PAGE_SPLIT_SYMMETRIC_FLAG;
+ }
+
+ *(ulonglong *)(var_ptr) = flags;
+}
+
static MYSQL_THDVAR_BOOL(support_xa, PLUGIN_VAR_OPCMDARG,
"Enable InnoDB support for the XA two-phase commit",
/* check_func */ NULL, /* update_func */ NULL,
@@ -459,6 +500,9 @@ static MYSQL_THDVAR_ULONG(lock_wait_timeout, PLUGIN_VAR_RQCMDARG,
"Timeout in seconds an InnoDB transaction may wait for a lock before being rolled back. Values above 100000000 disable the timeout.",
NULL, NULL, 50, 1, 1024 * 1024 * 1024, 0);
+static MYSQL_THDVAR_SET(index_page_split_mode, PLUGIN_VAR_RQCMDARG,
+ "Index page split behavior.", NULL, innodb_index_page_split_mode_update,
+ 0, &innodb_index_page_split_mode_typelib);
static handler *innobase_create_handler(handlerton *hton,
TABLE_SHARE *table,
@@ -900,6 +944,21 @@ thd_lock_wait_timeout(
}
/******************************************************************//**
+Get the set of flags specified in innodb_index_page_split_mode.
+@return set of flags that are set */
+extern "C" UNIV_INTERN
+ulonglong
+thd_index_page_split(
+/*=================*/
+ void* thd) /*!< in: thread handle (THD*), or NULL to query
+ the global innodb_index_page_split_mode */
+{
+ /* According to <mysql/plugin.h>, passing thd == NULL
+ returns the global value of the session variable. */
+ return(THDVAR((THD*) thd, index_page_split_mode));
+}
+
+/******************************************************************//**
Set the time waited for the lock for the current query. */
extern "C" UNIV_INTERN
void
@@ -12045,6 +12104,7 @@ static struct st_mysql_sys_var* innobase_system_variables[]= {
MYSQL_SYSVAR(anticipatory_flushing),
MYSQL_SYSVAR(segment_fill_factor),
MYSQL_SYSVAR(index_fill_factor),
+ MYSQL_SYSVAR(index_page_split_mode),
NULL
};
View
19 storage/innobase/include/btr0btr.h
@@ -92,6 +92,19 @@ insert/delete buffer when the record is not in the buffer pool. */
buffer when the record is not in the buffer pool. */
#define BTR_DELETE 8192
+/* Flags for page split operations. */
+
+/** When splitting a page, split the data in half (at the middle record). */
+#define BTR_PAGE_SPLIT_SYMMETRIC_FLAG 1
+
+/** When splitting the right-most page on its level, split at the insertion
+point if it's above the largest value in the page. */
+#define BTR_PAGE_SPLIT_UPPER_FLAG 2
+
+/** When splitting the left-most page on its level, split at the insertion
+point if it's below the smallest value in the page. */
+#define BTR_PAGE_SPLIT_LOWER_FLAG 4
+
#endif /* UNIV_HOTBACKUP */
/**************************************************************//**
@@ -404,6 +417,7 @@ btr_root_raise_and_insert(
of the inserted record */
const dtuple_t* tuple, /*!< in: tuple to insert */
ulint n_ext, /*!< in: number of externally stored columns */
+ ulint flags, /*!< in: page split flags */
mtr_t* mtr); /*!< in: mtr */
/*************************************************************//**
Reorganizes an index page.
@@ -428,6 +442,8 @@ ibool
btr_page_get_split_rec_to_left(
/*===========================*/
btr_cur_t* cursor, /*!< in: cursor at which to insert */
+ mtr_t* mtr, /*!< in: mtr */
+ ulint flags, /*!< in: page split flags */
rec_t** split_rec);/*!< out: if split recommended,
the first record on upper half page,
or NULL if tuple should be first */
@@ -440,6 +456,8 @@ ibool
btr_page_get_split_rec_to_right(
/*============================*/
btr_cur_t* cursor, /*!< in: cursor at which to insert */
+ mtr_t* mtr, /*!< in: mtr */
+ ulint flags, /*!< in: page split flags */
rec_t** split_rec);/*!< out: if split recommended,
the first record on upper half page,
or NULL if tuple should be first */
@@ -461,6 +479,7 @@ btr_page_split_and_insert(
on the predecessor of the inserted record */
const dtuple_t* tuple, /*!< in: tuple to insert */
ulint n_ext, /*!< in: number of externally stored columns */
+ ulint flags, /*!< in: page split flags */
mtr_t* mtr); /*!< in: mtr */
/*******************************************************//**
Inserts a data tuple to a tree on a non-leaf level. It is assumed
View
10 storage/innobase/include/ha_prototypes.h
@@ -276,6 +276,16 @@ thd_lock_wait_timeout(
/*==================*/
void* thd); /*!< in: thread handle (THD*), or NULL to query
the global innodb_lock_wait_timeout */
+
+/******************************************************************//**
+Get the set of flags specified in innodb_index_page_split_mode.
+@return set of flags that are set */
+ulonglong
+thd_index_page_split(
+/*=================*/
+ void* thd); /*!< in: thread handle (THD*), or NULL to query
+ the global innodb_index_page_split_mode */
+
/******************************************************************//**
Add up the time waited for the lock for the current query. */
UNIV_INTERN
Please sign in to comment.
Something went wrong with that request. Please try again.