@alexpaniman
Contributor

@alexpaniman alexpaniman commented Oct 17, 2025

Description for reviewers

This PR adds the following:

  • Framework for generating different join topologies
    • Includes generators for many different topologies
      • Path topology
      • Clique topology
      • Star topology
      • Random trees based on random Prüfer sequences
      • Random graphs with a given lognormal degree distribution, using the Chung-Lu model (the only generator in this list that can produce disconnected graphs)
      • Random graphs with a given lognormal degree distribution, using Havel-Hakimi graph reconstruction followed by Metropolis-Hastings Markov chain Monte Carlo with annealing, which makes the graph connected with high probability
    • Tools to reconstruct queries from a table relationship graph, where possible
      • Statistics generation
      • Graph renaming and connected subgraphs-first reordering
      • Random join key assignment based on two-parameter Pitman-Yor distribution
      • Query reconstruction
  • Benchmarking tool with some statistical processing for test benches
    • Wrapper for measuring elapsed time
    • Configurable benchmarking tool with a warmup phase, configurable min/max repeat counts and timeouts for both phases, per-run timeouts, and support for a target median absolute deviation (MAD) of the measured time
    • Statistical tool that reports summary information about a distribution; it is primarily used to obtain robust time measurements
      • Efficient on-the-fly recalculation of the median and MAD, used to exit benchmarks early
      • Calculating the median, Q1, Q3, IQR, MAD, mean, standard deviation, min, and max of a distribution
      • Operations on random values, including map/filter and the +, -, *, / operators
  • Small argument parser to use for unit test configuration
    • Parsing arguments of different types: ints, doubles and strings
    • Support for range specifications, e.g. N=5,6..15 or alpha=0.1,0.3..0.7
  • Mersenne Twister wrapper that simplifies serialization/deserialization (replaces the large Mersenne Twister state with seed + counter)
  • Flexible unit test
    • CLI-based configuration of the topology
    • Dumping results to CSV; measuring CBO time with and without Shuffle Elimination
    • Calculating adjusted CBO time by subtracting the time without optimization, and the ratio of CBO time with Shuffle Elimination to CBO time without Shuffle Elimination
  • Option to change the threshold at which ShuffleElimination is disabled, so that it can be tested on topologies larger than the default cutoff
  • Fix the PRAGMA name in the note about MaxDPHypDPTableSize
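
The random-tree generator in the list above is only named. As a rough illustration of the underlying technique (not the PR's code), a uniformly random labeled tree on N vertices can be produced by drawing a Prüfer sequence of N - 2 labels and decoding it. `DecodePrufer` and `RandomTree` are hypothetical names, and vertices are assumed to be labeled 0..N-1.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Decodes a Prüfer sequence of length n - 2 into the n - 1 edges of a labeled
// tree on vertices 0..n-1 (hypothetical helper, illustrative only).
std::vector<std::pair<int, int>> DecodePrufer(const std::vector<int>& seq) {
    const int n = static_cast<int>(seq.size()) + 2;
    std::vector<int> degree(n, 1);
    for (int v : seq) {
        if (v < 0 || v >= n) throw std::out_of_range("label out of range");
        ++degree[v];
    }
    // Min-heap of current leaves (degree 1), smallest label first.
    std::priority_queue<int, std::vector<int>, std::greater<int>> leaves;
    for (int v = 0; v < n; ++v) {
        if (degree[v] == 1) leaves.push(v);
    }
    std::vector<std::pair<int, int>> edges;
    edges.reserve(static_cast<std::size_t>(n - 1));
    for (int v : seq) {
        const int leaf = leaves.top();
        leaves.pop();
        edges.emplace_back(leaf, v);
        // v becomes a leaf once its last occurrence in the sequence is used.
        if (--degree[v] == 1) leaves.push(v);
    }
    // Exactly two leaves remain; connecting them completes the tree.
    const int u = leaves.top();
    leaves.pop();
    edges.emplace_back(u, leaves.top());
    return edges;
}

// Uniform random labeled tree: every Prüfer sequence maps to a distinct tree
// (Cayley's formula), so uniform labels give a uniform tree.
std::vector<std::pair<int, int>> RandomTree(int n, std::mt19937& rng) {
    if (n < 2) throw std::invalid_argument("a tree needs at least two vertices");
    std::uniform_int_distribution<int> label(0, n - 1);
    std::vector<int> seq(static_cast<std::size_t>(n - 2));
    for (int& v : seq) v = label(rng);
    return DecodePrufer(seq);
}
```

For example, the sequence {3, 3, 3} decodes to a star centered at vertex 3, which is one way to sanity-check the decoder.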

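A minimal sketch of the Chung-Lu model mentioned in the list, assuming each table's expected join degree is drawn from `std::lognormal_distribution` with the (mu, sigma) parameters of the TOPOLOGY option. `ChungLu` and `LognormalWeights` are hypothetical names. Because each edge is sampled independently, nothing forces connectivity, which is consistent with this being the only generator that can produce disconnected graphs.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Chung-Lu random graph: vertex i has weight w[i] (its expected degree), and
// each pair (i, j), i < j, is connected independently with probability
// min(1, w[i] * w[j] / sum(w)).
std::vector<std::pair<int, int>> ChungLu(const std::vector<double>& w, std::mt19937& rng) {
    const double total = std::accumulate(w.begin(), w.end(), 0.0);
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::vector<std::pair<int, int>> edges;
    if (total <= 0.0) return edges;  // all-zero weights: empty graph
    const int n = static_cast<int>(w.size());
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            const double p = std::min(1.0, w[i] * w[j] / total);
            if (coin(rng) < p) edges.emplace_back(i, j);
        }
    }
    return edges;
}

// Draws n expected degrees from lognormal(mu, sigma), where mu and sigma are
// the mean and standard deviation of the underlying normal distribution.
std::vector<double> LognormalWeights(int n, double mu, double sigma, std::mt19937& rng) {
    std::lognormal_distribution<double> dist(mu, sigma);
    std::vector<double> w(static_cast<std::size_t>(n));
    for (double& x : w) x = dist(rng);
    return w;
}
```

The quadratic pair loop is simple and adequate for the table counts used here (N up to a few dozen).
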
Implemented for issue #25795
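
The two-parameter Pitman-Yor key assignment from the list above can be sketched with the Chinese restaurant process. This assumes alpha is the discount and theta the concentration (the usual PY(alpha, theta) convention); `PitmanYorKeys` is a hypothetical helper and not necessarily how the PR maps keys onto join edges.

```cpp
#include <random>
#include <stdexcept>
#include <vector>

// Chinese-restaurant-process sampler for Pitman-Yor(alpha, theta) with
// 0 <= alpha < 1 and theta > -alpha. Returns a key id for each of n items.
std::vector<int> PitmanYorKeys(int n, double theta, double alpha, std::mt19937& rng) {
    if (alpha < 0.0 || alpha >= 1.0 || theta <= -alpha) {
        throw std::invalid_argument("need 0 <= alpha < 1 and theta > -alpha");
    }
    std::vector<int> counts;  // counts[k] = number of items already assigned key k
    std::vector<int> keys;
    for (int i = 0; i < n; ++i) {
        int k = 0;  // the very first item always opens key 0
        if (!counts.empty()) {
            // Existing key k has weight counts[k] - alpha; a fresh key has
            // weight theta + K * alpha (K = keys so far). All weights share
            // the normalizer i + theta, so discrete_distribution can use them.
            std::vector<double> weights;
            weights.reserve(counts.size() + 1);
            for (int c : counts) weights.push_back(c - alpha);
            weights.push_back(theta + alpha * static_cast<double>(counts.size()));
            std::discrete_distribution<int> pick(weights.begin(), weights.end());
            k = pick(rng);
        }
        if (k == static_cast<int>(counts.size())) counts.push_back(0);
        ++counts[k];
        keys.push_back(k);
    }
    return keys;
}
```

Larger theta and larger alpha both tend to yield more distinct keys, which is why sweeping (theta, alpha) varies how often joins share a key.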

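The median/MAD early-exit idea from the list can be illustrated with a straightforward, non-incremental computation. The PR describes an efficient on-the-fly update; this sketch instead recomputes both values with `std::nth_element` after each run. `CanStop`, `minRepeats`, and `targetMad` are hypothetical names, and treating the target MAD as an absolute value (same units as the samples) is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Median of a copy of the data (average O(n) via nth_element).
double Median(std::vector<double> xs) {
    if (xs.empty()) throw std::invalid_argument("empty sample");
    const std::size_t mid = xs.size() / 2;
    std::nth_element(xs.begin(), xs.begin() + mid, xs.end());
    const double hi = xs[mid];
    if (xs.size() % 2 == 1) return hi;
    // Even size: everything before mid is <= hi, so its max is the lower middle.
    const double lo = *std::max_element(xs.begin(), xs.begin() + mid);
    return (lo + hi) / 2.0;
}

// Median absolute deviation: median of |x - median(x)|.
double Mad(const std::vector<double>& xs) {
    const double m = Median(xs);
    std::vector<double> dev;
    dev.reserve(xs.size());
    for (double x : xs) dev.push_back(std::fabs(x - m));
    return Median(std::move(dev));
}

// Early-exit rule: stop once enough runs were made and the spread is small.
bool CanStop(const std::vector<double>& times, std::size_t minRepeats, double targetMad) {
    return times.size() >= minRepeats && Mad(times) <= targetMad;
}
```

For the sample {1, 1, 2, 2, 4, 6, 9} the median is 2 and the MAD is 1; unlike the standard deviation, the single slow run (9) barely moves either value, which is what makes them suitable for noisy timings.
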
Example usage (see the options in kqp_join_topology_ut.cpp for details):

cd ./ydb/core/kqp/ut/join
ya make -r

# Runs the benchmark on the star topology for every N from 5 to 15. Dumps to CSV the ratio (CBO time with Shuffle Elimination - query time at optimization level 0) / (CBO time without Shuffle Elimination - query time at optimization level 0), together with all intermediate results, and saves them in the results directory.

./ydb-core-kqp-ut-join KqpJoinTopology::Benchmark --test-param TOPOLOGY='type=star; N=5..15; result=SE-0/CBO-0' --test-param SAVE_DIR=results 2>/dev/null

# Runs the benchmark on graphs with the given N, node degrees following a lognormal distribution with parameters (mu, sigma), and join keys following a Pitman-Yor distribution with parameters (theta, alpha)

./ydb-core-kqp-ut-join KqpJoinTopology::Benchmark --test-param TOPOLOGY='type=mcmc; N=5..15; mu=1; sigma=0.5; theta=5.0; alpha=0.5; result=SE-0/CBO-0' --test-param SAVE_DIR=results 2>/dev/null

# Runs benchmarks on a clique with keys following a Pitman-Yor distribution (theta, alpha). For each (theta, alpha) pair, N is increased until the test for that pair finishes or times out; then the next pair is tried, until all combinations have been checked.

./ydb-core-kqp-ut-join KqpJoinTopology::Benchmark --test-param TOPOLOGY='type=clique; N=5..15; theta=1,5.0..50.0; alpha=0.1,0.3..1.0; result=SE-0/CBO-0' --test-param SAVE_DIR=results 2>/dev/null

# The save directory is optional. If the result option is omitted, only CBO with Shuffle Elimination is run.

./ydb-core-kqp-ut-join KqpJoinTopology::Benchmark --test-param TOPOLOGY='type=path; N=5..50' 2>/dev/null

# Specific topologies can be reproduced: the exact state and command are printed to stdout before every test. Example reproduction:
./ydb-core-kqp-ut-join KqpJoinTopology::Benchmark --test-param TOPOLOGY='type=mcmc; N=15; alpha=0.5; theta=1; sigma=0.5; mu=1; state=214cd3980000000f0000000f00000970' 2>/dev/null
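
The seed + counter idea that makes such reproduction strings short can be sketched as a thin wrapper over `std::mt19937_64`. This is an assumption about the approach, not the PR's class: the encoding of the printed `state=...` string is not described here, so the sketch only shows why two integers suffice to restore the generator. `TReplayableRng` is a hypothetical name.

```cpp
#include <cstdint>
#include <random>

// Hypothetical wrapper: instead of serializing the full std::mt19937_64 state
// (312 64-bit words), remember the seed and how many values were drawn.
// Restoring re-seeds the engine and discards `counter` outputs.
class TReplayableRng {
public:
    using result_type = std::mt19937_64::result_type;

    explicit TReplayableRng(std::uint64_t seed, std::uint64_t counter = 0)
        : Seed_(seed), Counter_(counter), Engine_(seed) {
        Engine_.discard(counter);
    }

    // Satisfies UniformRandomBitGenerator, so it works with std distributions.
    result_type operator()() {
        ++Counter_;
        return Engine_();
    }
    static constexpr result_type min() { return std::mt19937_64::min(); }
    static constexpr result_type max() { return std::mt19937_64::max(); }

    std::uint64_t Seed() const { return Seed_; }
    std::uint64_t Counter() const { return Counter_; }

private:
    std::uint64_t Seed_;
    std::uint64_t Counter_;
    std::mt19937_64 Engine_;
};
```

Restoring costs time linear in the counter, which is cheap for the number of draws a single topology needs; the counter tracks raw engine outputs, so it stays correct even when a distribution consumes more than one value per sample.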

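The range syntax used in these commands is only shown by example. One plausible reading, assumed here and not confirmed by the PR, is Haskell-style enumeration: `a..c` steps by 1 and `a,b..c` steps by `b - a`. Under that reading, `alpha=0.1,0.3..1.0` expands to 0.1, 0.3, 0.5, 0.7, 0.9 and `theta=1,5.0..50.0` to 1, 5, 9, ..., 49. `ExpandRange` is a hypothetical helper; it does not handle plain comma lists such as `1,5,10`.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>
#include <vector>

// Expands one range spec under the assumed Haskell-like reading:
//   "7"            -> 7
//   "5..15"        -> 5, 6, ..., 15          (step 1)
//   "0.1,0.3..0.7" -> 0.1, 0.3, 0.5, 0.7     (step = second - first)
std::vector<double> ExpandRange(const std::string& spec) {
    const auto dots = spec.find("..");
    if (dots == std::string::npos) return {std::stod(spec)};
    const std::string head = spec.substr(0, dots);
    const double last = std::stod(spec.substr(dots + 2));
    const auto comma = head.find(',');
    const double first = std::stod(head.substr(0, comma));
    const double step = comma == std::string::npos
        ? 1.0
        : std::stod(head.substr(comma + 1)) - first;
    if (step <= 0.0) throw std::invalid_argument("step must be positive");
    std::vector<double> out;
    // Index-based stepping plus a small tolerance avoids accumulated
    // floating-point error (0.1 + 3 * 0.2 may land just above 0.7).
    const double eps = 1e-9 * std::fabs(step);
    for (long i = 0;; ++i) {
        const double v = first + static_cast<double>(i) * step;
        if (v > last + eps) break;
        out.push_back(v);
    }
    return out;
}
```
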
@alexpaniman alexpaniman requested review from a team as code owners October 17, 2025 09:01
@github-actions

github-actions bot commented Oct 17, 2025

🟢 2025-11-13 01:06:20 UTC The validation of the Pull Request description is successful.

  auto relsCount = joinTree->Labels().size();

- if (EnableShuffleElimination && relsCount <= 14) {
+ if (EnableShuffleElimination && (relsCount <= 14 || OptimizerSettings_.ForceShuffleElimination)) {
Collaborator

Let's move this constant to the dq settings, similar to maxDPhypDPTableSize.

@alexpaniman alexpaniman force-pushed the feature-random-join-topologies-testing branch from fb1df4f to 5ee5da7 October 30, 2025 05:37
@github-actions

github-actions bot commented Oct 30, 2025

2025-10-30 05:41:17 UTC Pre-commit check linux-x86_64-relwithdebinfo for 10d2edb has started.
2025-10-30 05:41:34 UTC Artifacts will be uploaded here
2025-10-30 05:42:59 UTC ya make is running...
🔴 2025-10-30 05:51:14 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 30, 2025

2025-10-30 05:41:31 UTC Pre-commit check linux-x86_64-release-asan for 10d2edb has started.
2025-10-30 05:41:48 UTC Artifacts will be uploaded here
2025-10-30 05:43:11 UTC ya make is running...
🔴 2025-10-30 05:53:55 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 30, 2025

2025-10-30 07:35:17 UTC Pre-commit check linux-x86_64-relwithdebinfo for 82dac26 has started.
2025-10-30 07:35:34 UTC Artifacts will be uploaded here
2025-10-30 07:37:02 UTC ya make is running...
🔴 2025-10-30 07:40:49 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 30, 2025

2025-10-30 07:35:54 UTC Pre-commit check linux-x86_64-release-asan for 82dac26 has started.
2025-10-30 07:36:10 UTC Artifacts will be uploaded here
2025-10-30 07:37:37 UTC ya make is running...
🔴 2025-10-30 07:43:15 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 30, 2025

2025-10-30 09:18:08 UTC Pre-commit check linux-x86_64-release-asan for 5df6154 has started.
2025-10-30 09:18:25 UTC Artifacts will be uploaded here
2025-10-30 09:19:46 UTC ya make is running...
🔴 2025-10-30 09:25:51 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 30, 2025

2025-10-30 09:18:15 UTC Pre-commit check linux-x86_64-relwithdebinfo for 5df6154 has started.
2025-10-30 09:18:33 UTC Artifacts will be uploaded here
2025-10-30 09:19:58 UTC ya make is running...
🔴 2025-10-30 09:23:53 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 30, 2025

2025-10-30 14:08:51 UTC Pre-commit check linux-x86_64-release-asan for 026bc5c has started.
2025-10-30 14:09:13 UTC Artifacts will be uploaded here
2025-10-30 14:11:20 UTC ya make is running...
🔴 2025-10-30 14:18:37 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 30, 2025

2025-10-30 14:09:54 UTC Pre-commit check linux-x86_64-relwithdebinfo for 026bc5c has started.
2025-10-30 14:10:10 UTC Artifacts will be uploaded here
2025-10-30 14:11:34 UTC ya make is running...
🔴 2025-10-30 14:14:56 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 31, 2025

2025-10-31 08:22:30 UTC Pre-commit check linux-x86_64-relwithdebinfo for 101885d has started.
2025-10-31 08:22:46 UTC Artifacts will be uploaded here
2025-10-31 08:24:10 UTC ya make is running...
🔴 2025-10-31 08:49:19 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 31, 2025

2025-10-31 08:24:47 UTC Pre-commit check linux-x86_64-release-asan for 101885d has started.
2025-10-31 08:25:05 UTC Artifacts will be uploaded here
2025-10-31 08:26:29 UTC ya make is running...
🔴 2025-10-31 08:29:53 UTC Build failed, see the logs. Also see fail summary

@github-actions

github-actions bot commented Oct 31, 2025

2025-10-31 11:48:10 UTC Pre-commit check linux-x86_64-release-asan for 4acca99 has started.
2025-10-31 11:48:15 UTC Artifacts will be uploaded here
2025-10-31 11:49:36 UTC ya make is running...
🟡 2025-10-31 13:53:29 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
17948 | 17535 | 0 | 137 | 249 | 27

🟢 2025-10-31 13:53:37 UTC Build successful.
🟢 2025-10-31 13:53:59 UTC ydbd size 3.8 GiB changed* by +2.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash | main: a04e6ba | merge: 4acca99 | diff | diff %
ydbd size | 4 063 322 088 Bytes | 4 063 324 592 Bytes | +2.4 KiB | +0.000%
ydbd stripped size | 1 508 300 152 Bytes | 1 508 300 984 Bytes | +832 Bytes | +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparison

@github-actions

github-actions bot commented Oct 31, 2025

2025-10-31 11:49:50 UTC Pre-commit check linux-x86_64-relwithdebinfo for 4acca99 has started.
2025-10-31 11:49:54 UTC Artifacts will be uploaded here
2025-10-31 11:51:19 UTC ya make is running...
🟡 2025-10-31 13:28:37 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
41576 | 38745 | 0 | 6 | 2796 | 29

2025-10-31 13:28:48 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-10-31 13:39:08 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
72 (only retried tests) | 58 | 0 | 0 | 0 | 14

🟢 2025-10-31 13:39:18 UTC Build successful.
🟢 2025-10-31 13:39:34 UTC ydbd size 2.3 GiB changed* by +1.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash | main: a04e6ba | merge: 4acca99 | diff | diff %
ydbd size | 2 426 565 360 Bytes | 2 426 566 824 Bytes | +1.4 KiB | +0.000%
ydbd stripped size | 515 718 280 Bytes | 515 718 536 Bytes | +256 Bytes | +0.000%

@github-actions

github-actions bot commented Nov 5, 2025

2025-11-05 05:32:17 UTC Pre-commit check linux-x86_64-release-asan for 87da4d0 has started.
2025-11-05 05:33:04 UTC Artifacts will be uploaded here
🔴 2025-11-05 05:34:48 UTC Graph compare failed, see the logs.

@github-actions

github-actions bot commented Nov 5, 2025

2025-11-05 05:34:58 UTC Pre-commit check linux-x86_64-relwithdebinfo for 87da4d0 has started.
2025-11-05 05:35:15 UTC Artifacts will be uploaded here
2025-11-05 05:36:39 UTC Check cancelled

@github-actions

github-actions bot commented Nov 5, 2025

2025-11-05 05:40:05 UTC Pre-commit check linux-x86_64-relwithdebinfo for 39859c6 has started.
2025-11-05 05:40:22 UTC Artifacts will be uploaded here
🔴 2025-11-05 05:41:38 UTC Graph compare failed, see the logs.

@github-actions

github-actions bot commented Nov 5, 2025

2025-11-05 05:40:37 UTC Pre-commit check linux-x86_64-release-asan for 39859c6 has started.
2025-11-05 05:40:54 UTC Artifacts will be uploaded here
🔴 2025-11-05 05:42:14 UTC Graph compare failed, see the logs.

@github-actions

github-actions bot commented Nov 5, 2025

2025-11-05 05:47:34 UTC Pre-commit check linux-x86_64-release-asan for 9de9218 has started.
2025-11-05 05:47:51 UTC Artifacts will be uploaded here
2025-11-05 05:49:15 UTC ya make is running...
🟡 2025-11-05 08:00:37 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
17969 | 17434 | 0 | 213 | 295 | 27

🟢 2025-11-05 08:00:44 UTC Build successful.
🟢 2025-11-05 08:01:07 UTC ydbd size 3.8 GiB changed* by +2.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash | main: b796797 | merge: 9de9218 | diff | diff %
ydbd size | 4 068 703 528 Bytes | 4 068 706 016 Bytes | +2.4 KiB | +0.000%
ydbd stripped size | 1 510 387 816 Bytes | 1 510 388 648 Bytes | +832 Bytes | +0.000%

@github-actions

github-actions bot commented Nov 5, 2025

2025-11-05 05:47:54 UTC Pre-commit check linux-x86_64-relwithdebinfo for 9de9218 has started.
2025-11-05 05:48:13 UTC Artifacts will be uploaded here
2025-11-05 05:49:44 UTC ya make is running...
🟡 2025-11-05 07:35:08 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
41590 | 38742 | 0 | 3 | 2809 | 36

2025-11-05 07:35:22 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-11-05 07:46:41 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
96 (only retried tests) | 82 | 0 | 0 | 0 | 14

🟢 2025-11-05 07:46:44 UTC Build successful.
🟢 2025-11-05 07:47:02 UTC ydbd size 2.3 GiB changed* by +1.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash | main: b796797 | merge: 9de9218 | diff | diff %
ydbd size | 2 429 298 464 Bytes | 2 429 299 912 Bytes | +1.4 KiB | +0.000%
ydbd stripped size | 516 138 632 Bytes | 516 138 888 Bytes | +256 Bytes | +0.000%

@pavelvelikhov pavelvelikhov self-requested a review November 5, 2025 13:57
pavelvelikhov
pavelvelikhov previously approved these changes Nov 5, 2025
struct TExprContext;

class IOptimizerFactory: private TNonCopyable {
struct TOptimizerSettings {
Member

I suggest adding Cbo to the class name so that it isn't confused with regular optimizers.

IOptimizerNew* MakeNativeOptimizerNew(
IProviderContext& ctx,
const ui32 maxDPHypDPTableSize,
const TOptimizerSettings &settings,
Member

style: attach & to the type

IOptimizerNew* MakeNativeOptimizerNew(
IProviderContext& pctx,
const ui32 maxDPhypDPTableSize,
const TOptimizerSettings &settings,
Member

style: attach & to the type

  {}, TIssuesIds::CBO_ENUM_LIMIT_REACHED,
  "Cost Based Optimizer could not be applied to this query: "
- "Enumeration is too large, use PRAGMA MaxDPHypDPTableSize='4294967295' to disable the limitation"
+ "Enumeration is too large, use PRAGMA ydb.MaxDPHypDPTableSize='4294967295' to disable the limitation"
Member

Why ydb and not dq?

Member

This is a shared library. It shouldn't contain references to specific systems.

Collaborator

We've already documented this: in our case these pragmas are set via the ydb namespace.
Let's leave it as is for now and think about how to do this properly for everyone.
For example, the provider could format the warning message itself however it likes.

TOptimizerNativeNew(
IProviderContext& ctx,
ui32 maxDPhypDPTableSize,
const TOptimizerSettings &optimizerSettings,
Member

style: attach & to the type

@github-actions

github-actions bot commented Nov 6, 2025

2025-11-06 21:30:33 UTC Pre-commit check linux-x86_64-relwithdebinfo for 69e8cb6 has started.
2025-11-06 21:30:37 UTC Artifacts will be uploaded here
2025-11-06 21:32:07 UTC ya make is running...
🟡 2025-11-06 23:08:29 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
41675 | 38851 | 0 | 3 | 2792 | 29

2025-11-06 23:08:50 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-11-06 23:18:46 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
61 (only retried tests) | 45 | 0 | 0 | 0 | 16

🟢 2025-11-06 23:18:55 UTC Build successful.
🟢 2025-11-06 23:19:16 UTC ydbd size 2.3 GiB changed* by +1.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash | main: 81d1ca3 | merge: 69e8cb6 | diff | diff %
ydbd size | 2 431 468 688 Bytes | 2 431 470 144 Bytes | +1.4 KiB | +0.000%
ydbd stripped size | 516 465 200 Bytes | 516 465 456 Bytes | +256 Bytes | +0.000%

@alexpaniman alexpaniman force-pushed the feature-random-join-topologies-testing branch from 5060e38 to 83bb78b November 13, 2025 10:03
@github-actions

github-actions bot commented Nov 13, 2025

2025-11-13 10:05:37 UTC Pre-commit check linux-x86_64-relwithdebinfo for a81ab95 has started.
2025-11-13 10:05:54 UTC Artifacts will be uploaded here
2025-11-13 10:08:11 UTC ya make is running...
🟡 2025-11-13 12:11:01 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
39504 | 36690 | 0 | 1 | 2791 | 22

2025-11-13 12:11:13 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-11-13 12:20:15 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
44 (only retried tests) | 30 | 0 | 0 | 0 | 14

🟢 2025-11-13 12:20:21 UTC Build successful.
🔴 2025-11-13 12:20:40 UTC ydbd size 2.3 GiB changed* by +35.2 MiB, which is >= 2.0 MiB vs main: Alert

ydbd size dash | main: 80a142b | merge: a81ab95 | diff | diff %
ydbd size | 2 442 222 968 Bytes | 2 479 158 304 Bytes | +35.2 MiB | +1.512%
ydbd stripped size | 519 623 568 Bytes | 520 975 280 Bytes | +1.3 MiB | +0.260%

@github-actions

github-actions bot commented Nov 13, 2025

2025-11-13 10:05:38 UTC Pre-commit check linux-x86_64-release-asan for a81ab95 has started.
2025-11-13 10:05:55 UTC Artifacts will be uploaded here
2025-11-13 10:08:10 UTC ya make is running...
🟡 2025-11-13 12:39:55 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED?
15883 | 15410 | 0 | 204 | 245 | 24

🟢 2025-11-13 12:40:05 UTC Build successful.
🔴 2025-11-13 12:40:35 UTC ydbd size 3.9 GiB changed* by +43.9 MiB, which is >= 2.0 MiB vs main: Alert

ydbd size dash | main: 80a142b | merge: a81ab95 | diff | diff %
ydbd size | 4 089 115 400 Bytes | 4 135 191 208 Bytes | +43.9 MiB | +1.127%
ydbd stripped size | 1 518 496 712 Bytes | 1 522 525 384 Bytes | +3.8 MiB | +0.265%

robot-piglet pushed a commit to ytsaurus/ytsaurus that referenced this pull request Nov 13, 2025
I merged the "ForceShuffleElimination" parameter. It was decided that a more flexible "cutoff" parameter is better: it enables the ShuffleElimination optimization based on the number of joins rather than acting as a simple on/off switch. This PR makes that change. Once it is merged, the files in yql/essential will match the corresponding files in the YDB PR on GitHub <ydb-platform/ydb#27065>.

Previously merged PR concerning the same github PR: <https://nda.ya.ru/t/UxEq690V7MqMWj>
commit_hash:26ed62335263ad4c8e536a1079088fdcdbf09676
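
The change described in this commit, replacing the hard-coded `relsCount <= 14` check with a configurable cutoff, can be sketched as follows. The struct and field names are hypothetical; the actual setting's name, where it lives (the reviewer suggested the dq settings), and whether it counts joins or joined relations are not shown here, so this sketch simply keeps the original comparison on the relation count.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical settings: the former constant 14 becomes a configurable cutoff.
struct TShuffleEliminationSettings {
    bool Enabled = true;
    std::uint32_t MaxRelations = 14;  // default mirrors the old hard-coded limit
};

// Same shape as `EnableShuffleElimination && relsCount <= 14`, with the
// constant replaced by the configurable cutoff.
bool ShouldEliminateShuffles(const TShuffleEliminationSettings& s, std::size_t relsCount) {
    return s.Enabled && relsCount <= s.MaxRelations;
}
```

Compared with an on/off force flag, a cutoff lets benchmarks raise the limit just far enough to cover the topology sizes under test.
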
@alexpaniman alexpaniman merged commit d7bbd97 into main Nov 13, 2025
12 checks passed
maybenotilya pushed a commit to maybenotilya/ydb that referenced this pull request Nov 15, 2025
maybenotilya pushed a commit to maybenotilya/ydb that referenced this pull request Nov 15, 2025
4 participants