Summary:
The heuristic cost factors in the base scan cost model determine the costs of
operations such as seeks and nexts, as well as the overhead of remote filtering.
To tune these factors, we created a benchmark of queries that each isolate the
impact of a different tuning parameter. For instance, we ran multiple queries
on a table with filters of varying selectivity to measure the time taken to
transfer results of varying sizes, and tuned the costs so that they correlate
with the execution times. Previously, we tuned the costs against client-side
execution times, which include the time taken to send the result to the client.
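The tuning loop described above can be sketched as a least-squares fit. This is a minimal illustration, assuming each benchmark query is summarized by its seek and next counts and that time is roughly linear in those counts; `Measurement`, `fit_seek_next_costs`, `pearson`, and all numbers are hypothetical. The samples are synthetic, generated from made-up factors, and are not YB benchmark results or real cost constants. The timings fed in could be client-side or server-side; this change tunes against server-side times.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Measurement:
    """Hypothetical benchmark sample: operation counts plus measured time."""
    seeks: float
    nexts: float
    exec_time_ms: float


def fit_seek_next_costs(samples: list[Measurement]) -> tuple[float, float]:
    """Least-squares fit of exec_time ~= seek_cost * seeks + next_cost * nexts.

    Solves the 2x2 normal equations (X^T X) w = X^T y with Cramer's rule.
    """
    sss = sum(m.seeks * m.seeks for m in samples)
    ssn = sum(m.seeks * m.nexts for m in samples)
    snn = sum(m.nexts * m.nexts for m in samples)
    sst = sum(m.seeks * m.exec_time_ms for m in samples)
    snt = sum(m.nexts * m.exec_time_ms for m in samples)
    det = sss * snn - ssn * ssn
    if det == 0:
        # Seeks and nexts vary in lockstep, so their costs cannot be separated.
        raise ValueError("queries do not isolate seeks from nexts")
    seek_cost = (sst * snn - ssn * snt) / det
    next_cost = (sss * snt - ssn * sst) / det
    return seek_cost, next_cost


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between estimated costs and measured times."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Synthetic data: made-up "true" factors, queries of varying selectivity.
TRUE_SEEK, TRUE_NEXT = 0.5, 0.01
counts = [(1, 1_000), (1, 10_000), (10, 5_000), (100, 100_000), (50, 20_000)]
samples = [Measurement(s, n, TRUE_SEEK * s + TRUE_NEXT * n) for s, n in counts]

seek_cost, next_cost = fit_seek_next_costs(samples)
estimated = [seek_cost * m.seeks + next_cost * m.nexts for m in samples]
r = pearson(estimated, [m.exec_time_ms for m in samples])
print(f"seek_cost={seek_cost:.4f} next_cost={next_cost:.4f} r={r:.4f}")
```

Because the synthetic times are exactly linear, the fit recovers the made-up factors and the correlation is essentially 1; real measurements would be noisy, and the correlation is what indicates whether the tuned factors track execution time.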
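However, we realized that it is better to tune the costs against server-side
execution times. The time taken to send the final result to the client is the
same for every plan that produces that result, so the plan that is best under
server-side costs should also be the best plan for the client.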
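The argument can be stated as a one-line derivation. The symbols are introduced here for illustration only: for a query q, P(q) is its set of candidate plans, S(p) the server-side execution time of plan p, and T(q) the time to send the result to the client, assumed identical for every plan because all candidates return the same result set.

```latex
% C(p): client-observed time of plan p; T(q) does not depend on p.
C(p) = S(p) + T(q)
\quad\Longrightarrow\quad
\arg\min_{p \in P(q)} C(p)
  = \arg\min_{p \in P(q)} \bigl( S(p) + T(q) \bigr)
  = \arg\min_{p \in P(q)} S(p)
```

Adding the same constant to every candidate's score cannot change which candidate is smallest, so dropping T(q) from calibration loses nothing for plan selection.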
Moreover, the PG cost models also seem to be tuned to server-side execution
times. This difference in approach seems to have caused a disparity between
the costs that the YB-specific cost models assigned to base scans and the
costs that the PG cost models assigned to operations such as joins and sorts.
By tuning with server-side execution times, we were able to achieve better
plans for more complex queries that require PG-side processing.
Jira: DB-12619
Test Plan:
- TAQO runs show improved plan choices in benchmarks such as the Join Order Benchmark, TPC-H, and other internal benchmarks
- ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTAQO'
- ./yb_build.sh --java-test 'org.yb.pgsql.TestPgCostModelSeekNextEstimation'
- ./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressJoin'
- ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressPlanner'
Reviewers: mihnea, mtakahara, telgersma
Reviewed By: mtakahara
Subscribers: tnayak, yql
Differential Revision: https://phorge.dev.yugabyte.com/D41734