Bug Report: too many rows fetched by multi-shard left join #15828
Labels
Component: Query Serving
Type: Enhancement
Logical improvement (somewhere between a bug and feature)
Type: Performance
Overview of the Issue
Left joins with a limit that hit two shards, one for each side of the join, over-fetch rows even when a limit is present.
Reproduction Steps
Consider two tables,
foo
andbar
, with primary vindexes hash indexes on columnsa
andb
, respectively. The query:The planner will first select from
foo
from the first shard without a limit, and then select frombar
, similarly without a limit. If there lots of rows infoo
such thata = 1
, we fetch far too many rows, resulting in bad performance and/or exceeding the max row limit. Similarly the planner then selects frombar
from the second shard without a limit. Due to the presence of the left join, it is never necessary to select more than one row, and so the planner should be placing a limit 1 in the queries to each shard. This optimization extends for any limitn
, not just1
, and similar queries where the right table in the left join is not constrained outside of the join clause, and actually more complex queries where everything left of the final left join can be pushed down into one shard, and then we have just this somewhat independent left join to run at the end.Binary Version
Operating System and Environment details
Log Fragments
No response
The text was updated successfully, but these errors were encountered: