Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: too many rows fetched by multi-shard left join #15828

Closed
GrahamCampbell opened this issue May 1, 2024 · 3 comments · Fixed by #15840
Closed

Bug Report: too many rows fetched by multi-shard left join #15828

GrahamCampbell opened this issue May 1, 2024 · 3 comments · Fixed by #15840
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: Performance

Comments

@GrahamCampbell
Copy link
Contributor

GrahamCampbell commented May 1, 2024

Overview of the Issue

Left joins with a limit that hit two shards, one for each side of the join, over-fetch rows even when a limit is present.

Reproduction Steps

Consider two tables, foo and bar, with primary vindexes hash indexes on columns a and b, respectively. The query:

SELECT `bar`.`id`
FROM `foo`
LEFT JOIN `bar`
  ON `bar`.`b` = 2
  AND `foo`.`c` = `bar`.`c`
WHERE `foo`.`a` = 1
LIMIT 1;

The planner will first select from foo from the first shard without a limit, and then select from bar, similarly without a limit. If there lots of rows in foo such that a = 1, we fetch far too many rows, resulting in bad performance and/or exceeding the max row limit. Similarly the planner then selects from bar from the second shard without a limit. Due to the presence of the left join, it is never necessary to select more than one row, and so the planner should be placing a limit 1 in the queries to each shard. This optimization extends for any limit n, not just 1, and similar queries where the right table in the left join is not constrained outside of the join clause, and actually more complex queries where everything left of the final left join can be pushed down into one shard, and then we have just this somewhat independent left join to run at the end.

Binary Version

Vitess 19.0.3

Operating System and Environment details

N/A

Log Fragments

No response

@GrahamCampbell GrahamCampbell added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels May 1, 2024
@systay systay mentioned this issue May 4, 2024
5 tasks
@systay systay added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving Type: Performance and removed Type: Bug Needs Triage This issue needs to be correctly labelled and triaged labels May 4, 2024
@systay
Copy link
Collaborator

systay commented May 4, 2024

Thanks for reporting this, @GrahamCampbell!

The planner was doing the safe thing here, we've just never invested time in making this better.

I'm changing this to a feature request - we don't want to backport these kind of planner enhancements.

@GrahamCampbell
Copy link
Contributor Author

Fair enough. Looking at the PR, I think it may have a bug - it's incorrect to push down the limit in the query for foo for the following query:

SELECT `bar`.`id`
FROM `foo`
LEFT JOIN `bar`
  ON `bar`.`b` = 2
  AND `foo`.`c` = `bar`.`c`
WHERE `foo`.`a` = 1
  AND `bar`.`id` in (1, 2, 3)
LIMIT 1;

I see there's a check against a null check condition, but I don't think that's enough. The bar.id cannot be pushed into the join condition and it does have the effect of filtering null bar.id rows.

@systay
Copy link
Collaborator

systay commented May 5, 2024

Fair enough. Looking at the PR, I think it may have a bug - it's incorrect to push down the limit in the query for foo for the following query:

I suggest we chat on the PR. You can point out the plan you think is not correct. Hard to understand without seeing the plan you are thinking of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature) Type: Performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants