
Commit 1791ece

Author: Tobias Christiani
WL#16107 Hypergraph optimizer core cost model
The hypergraph optimizer was developed without a clearly defined cost model. A mixture of the old optimizer's cost model and new, uncalibrated cost constants has been used in an ad-hoc way to assign costs to different access paths during optimization.

Some of the problems with the current cost model are:

1) There is no well-defined cost unit, i.e., it is unclear whether a cost of 1.0 is intended to represent a running time of one millisecond, one IO block transfer, or something else.

2) The cost model assigns cost to different operations in an inconsistent manner. For example, the current cost model for table scans is based entirely on the number of pages in the table, while the cost model for index range scans includes a cost for each processed row. This leads to inconsistent costs that do not reflect actual running times.

3) The cost of different operations has not been calibrated to reflect actual running times. Most cost constants, such as the cost of comparisons during sorting, the cost of hashing rows during hash joins, and the cost of filtering rows, have been assigned the same constant cost value.

This patch lays the groundwork for changing the hypergraph cost model and includes the following:

1) A well-defined cost unit that reflects running time.

2) A consistent cost model for base table access paths: TABLE_SCAN, INDEX_SCAN, INDEX_RANGE_SCAN, REF, and EQ_REF.

3) Additional calibration of the cost models for SORT, HASH_JOIN, and FILTER access paths.

Due to the scope and complexity of the cost model, this patch represents only a partial move to the new cost model, in the sense that some access paths and parts of the hypergraph optimizer still rely on the old cost model. The cost constants in the old model have been adjusted for the hypergraph optimizer in an attempt to make the two models somewhat compatible.

We currently lack benchmarks that are comprehensive (covering all types of queries and optimizations) and representative of how customers use MySQL, which we could use to evaluate changes to the cost model. Here is an overview of current benchmark results:

- TPC-H: No regressions, improvements to a few queries.

- TPC-DS: A few regressions and a few improvements. These queries are very complex, so the combination of row estimation errors, bugs in the hypergraph optimizer, and the fact that the cost model is only partially updated all factor in.

- Cost benchmark made for this patch: Mostly improvements, with a few regressions due to, e.g., bugs in how the range optimizer produces row estimates.

This cost model patch has been constructed for the InnoDB storage engine and has only been calibrated for data that resides in the buffer pool (main memory). We leave it as future work to extend the cost model to support multiple storage engines and different IO costs. Making the cost model robust to the various states of optimizer statistics (and the differences in statistics between storage engines) also remains.

Change-Id: Ic4ab2dc5b3c08c225256bc10657560f380d7ca44
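As a sketch of what points 1) and 2) above mean in practice (the symbols and constants below are illustrative, not the actual names or values from this patch), a time-based, consistent cost model prices every base-table access path with the same kind of linear formula:

$$ C_{\mathrm{path}} = n_{\mathrm{pages}} \cdot c_{\mathrm{page}} + n_{\mathrm{rows}} \cdot c_{\mathrm{row}} + n_{\mathrm{lookups}} \cdot c_{\mathrm{lookup}} $$

A table scan would have n_lookups = 0 and be charged for every page and row it touches; an index range scan would pay the same per-page and per-row constants plus c_lookup per range; REF and EQ_REF would pay c_lookup per index probe. Because every constant is expressed in the same unit of running time (calibrated here for data in the buffer pool), a cost of 1.0 means the same amount of time for every access path.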
1 parent: 224a648 · commit: 1791ece

File tree

190 files changed: +7356 -5113 lines

mysql-test/include/common-tests.inc

+4

@@ -1364,9 +1364,11 @@ select distinct fld3 from t2 limit 10;
 select distinct fld3 from t2 having fld3 like "A%" limit 10;
 --replace_result abr Abr art Art
 select distinct substring(fld3,1,3) from t2 where fld3 like "A%";
+--replace_result abr Abr art Art
 select distinct substring(fld3,1,3) as a from t2 having a like "A%" order by a limit 10;
 --replace_result abr Abr
 select distinct substring(fld3,1,3) from t2 where fld3 like "A%" limit 10;
+--replace_result abr Abr
 select distinct substring(fld3,1,3) as a from t2 having a like "A%" limit 10;
 
 # make a big table.
@@ -1718,6 +1720,7 @@ select companynr,sum(price)/count(price) as avg from t3 group by companynr havin
 
 select companynr,count(*) from t2 group by companynr order by 2 desc;
 select companynr,count(*) from t2 where companynr > 40 group by companynr order by 2 desc;
+--sorted_result
 select t2.fld4,t2.fld1,count(price),sum(price),min(price),max(price),avg(price) from t3,t2 where t3.companynr = 37 and t2.fld1 = t3.t2nr group by fld1,t2.fld4;
 
 #
@@ -1728,6 +1731,7 @@ select t2.fld4,t2.fld1,count(price),sum(price),min(price),max(price),avg(price)
 # send rows
 #
 
+--sorted_result
 select t3.companynr,fld3,sum(price) from t3,t2 where t2.fld1 = t3.t2nr and t3.companynr = 512 group by companynr,fld3;
 select t2.companynr,count(*),min(fld3),max(fld3),sum(price),avg(price) from t2,t3 where t3.companynr >= 30 and t3.companynr <= 58 and t3.t2nr = t2.fld1 and 1+1=2 group by t2.companynr;
 
mysql-test/include/elide_costs.inc

+11 -4

@@ -59,10 +59,17 @@ let $elide_trace_costs_and_rows=$elide_trace_costs_and_rows /rows_for_plan\": [0
 let $elide_trace_costs_and_rows=$elide_trace_costs_and_rows /rows_to_scan\": [0-9.]+/rows_to_scan\": "elided"/;
 let $elide_trace_costs_and_rows=$elide_trace_costs_and_rows /num_rows_estimate\": [0-9.]+/num_rows_estimate\": "elided"/;
 
-let $elide_json_costs=/cost": "[0-9.]*"/cost": "elided"/;
-
-# Usage: --replace_regex $elide_json_ms
+# Usage: --replace_regex $elide_json_time
 # Remove actual execution times from EXPLAIN ANALYZE FORMAT=JSON
 # "actual_first_row_ms": 0.328761 -> "actual_first_row_ms": "elided"
 # "actual_last_row_ms": 0.328761 -> "actual_last_row_ms": "elided"
-let $elide_json_ms=/row_ms": [0-9.]*/row_ms": "elided"/;
+let $elide_json_time=/row_ms": [0-9.]*/row_ms": "elided"/;
+
+# Usage: --replace-regex $elide_json_costs
+# This removes costs from EXPLAIN FORMAT=JSON for both optimizers.
+# Original optimizer: "read_cost": "0.25" -> read_cost: "elided"
+# Hypergraph optimizer: "estimated_total_cost": 0.25 -> estimated_total_cost: "elided"
+let $elide_json_costs=/cost": "?[0-9.]*"?/cost": "elided"/;
+
+# Usage: --replace-regex $elide_json_costs_and_time
+let $elide_json_costs_and_time=/cost": "?[0-9.]*"?/cost": "elided"/ /row_ms": [0-9.]*/row_ms": "elided"/;
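
For reference, a minimal sketch of how a test file would use these patterns (the table and query below are made up for illustration; any statement producing EXPLAIN output works the same way):

--source include/elide_costs.inc

# Hypothetical test table t1. Costs differ between the old and
# hypergraph optimizers, so elide them before recording output.
--replace_regex $elide_json_costs
EXPLAIN FORMAT=JSON SELECT * FROM t1 WHERE i1 = 1;

# EXPLAIN ANALYZE output also contains timings; elide costs and times.
--replace_regex $elide_json_costs_and_time
EXPLAIN ANALYZE FORMAT=JSON SELECT * FROM t1 WHERE i1 = 1;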

mysql-test/include/explain_into.inc

+4 -1

@@ -4,6 +4,8 @@
 --echo # WL#15588 Store EXPLAIN FORMAT=JSON SELECT output in user variable.
 --echo #
 
+--source include/elide_costs.inc
+
 SET @v1 = 'UNCHANGED';
 SET @v2 = @v1;
 
@@ -41,6 +43,7 @@ ANALYZE TABLE t1, t2;
 --echo
 --echo # EXPLAIN SELECT.
 EXPLAIN FORMAT=JSON INTO @v1 SELECT * FROM t1 JOIN t2 ON i1 = i3 WHERE i2 = 2;
+--replace_regex $elide_json_costs
 SELECT @v1, JSON_VALID(@v1);
 
 --echo
@@ -111,7 +114,7 @@ SET explain_json_format_version=2;
 --echo
 --echo # EXPLAIN ANALYZE SELECT.
 EXPLAIN ANALYZE FORMAT=JSON INTO @v1 SELECT * FROM t1 JOIN t2 ON i1 = i3 WHERE i2 = 2;
---replace_regex $elide_json_ms
+--replace_regex $elide_json_costs_and_time
 SELECT @v1, JSON_VALID(@v1);
 
 --echo

mysql-test/include/explain_json.inc

+1

@@ -3,6 +3,7 @@
 
 --source include/force_myisam_default.inc
 --source include/have_myisam.inc
+--source include/elide_costs.inc
 
 set end_markers_in_json=on;
 
mysql-test/include/group_skip_scan_test.inc

+2

@@ -1,3 +1,5 @@
+--source include/elide_costs.inc
+
 #
 # Test file for WL#1724 (Min/Max Optimization for Queries with Group By Clause).
 # The queries in this file test query execution via GroupIndexSkipScan.
