Fix qualify_columns failing on correlated scalar subqueries #51
Merged
tobilg merged 3 commits into tobilg:main on Mar 3, 2026
When a query contains a correlated scalar subquery (e.g., `SELECT id, (SELECT AVG(val) FROM t2 WHERE t2.id = t1.id) FROM t1`), qualify_columns built an isolated scope for the inner SELECT that only contained the subquery's own sources (t2). References to the outer table (t1) triggered an UnknownTable error because the outer scope was not visible. The fix checks whether an unresolved table qualifier exists in the schema before erroring. If the table is known in the schema but not in the current scope, it is treated as a correlated outer reference and left as-is. Tables that exist in neither scope nor schema still produce an error.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
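A minimal sketch of the fallback this commit describes, written in Python purely as a language-neutral model. `resolve_qualifier` and `UnknownTableError` are hypothetical names, and the scope and schema are reduced to a set of table names and a dict; this is not the project's actual code.

```python
# Hypothetical, simplified model of the qualify_columns fallback for
# correlated outer references. Names are illustrative, not the real API.


class UnknownTableError(Exception):
    """Raised when a table qualifier matches neither scope nor schema."""


def resolve_qualifier(qualifier: str, scope_sources: set[str],
                      schema: dict[str, list[str]]) -> str:
    """Classify a column's table qualifier.

    Returns "local" if the table is a source of the current scope,
    "outer" if it is only known to the schema (treated as a correlated
    reference and left untouched), and raises UnknownTableError otherwise.
    """
    if qualifier in scope_sources:
        return "local"
    if qualifier in schema:
        # Known table that is not visible in this isolated subquery scope:
        # assume it refers to an enclosing query.
        return "outer"
    raise UnknownTableError(qualifier)


# Inner scope of: SELECT id, (SELECT AVG(val) FROM t2 WHERE t2.id = t1.id) FROM t1
schema = {"t1": ["id"], "t2": ["id", "val"]}
inner_sources = {"t2"}

print(resolve_qualifier("t2", inner_sources, schema))  # local
print(resolve_qualifier("t1", inner_sources, schema))  # outer
try:
    resolve_qualifier("t3", inner_sources, schema)
except UnknownTableError as err:
    print(f"error: unknown table {err}")  # error: unknown table t3
```

Because this check consults the schema rather than the enclosing scopes, a table that is registered in the schema but absent from every enclosing FROM clause would also be classified as "outer" in this model.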
When a query uses JOIN USING(col), the shared column exists in both tables, making it ambiguous for the resolver. qualify_columns had no awareness of USING columns and failed to resolve them. The fix registers USING columns with the resolver before qualifying expressions. Each USING column is mapped to the first FROM-clause source that contains it (the left side of the join). The resolver then checks this mapping when its standard unambiguous-column lookup fails.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
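The USING-column registration above can be modelled roughly as follows. This is a Python sketch under simplifying assumptions: `Resolver`, `register_using` and `resolve` are hypothetical names, sources are plain dicts of column lists, and the real resolver's interface is not shown in this PR.

```python
# Hypothetical model of registering JOIN ... USING columns with a resolver.
# Class and method names are illustrative only.


class Resolver:
    def __init__(self, sources: dict[str, list[str]], from_order: list[str]):
        self.sources = sources        # source name -> its column names
        self.from_order = from_order  # sources in FROM-clause order
        self.using_map: dict[str, str] = {}

    def register_using(self, using_columns: list[str]) -> None:
        """Map each USING column to the first FROM source that has it."""
        for col in using_columns:
            for source in self.from_order:
                if col in self.sources[source]:
                    self.using_map[col] = source
                    break

    def resolve(self, column: str) -> str:
        """Return the source that owns an unqualified column."""
        owners = [s for s in self.from_order if column in self.sources[s]]
        if len(owners) == 1:
            return owners[0]  # standard unambiguous lookup
        if column in self.using_map:
            return self.using_map[column]  # fallback for USING columns
        raise ValueError(f"ambiguous or unknown column: {column}")


# SELECT id, a, b FROM t1 JOIN t2 USING (id)
resolver = Resolver({"t1": ["id", "a"], "t2": ["id", "b"]}, ["t1", "t2"])
resolver.register_using(["id"])
print(resolver.resolve("id"))  # t1
print(resolver.resolve("b"))   # t2
```

Mapping to the left source gives the same values as the merged column for inner and left joins; for RIGHT or FULL OUTER joins the merged USING column can differ from the left table's column, which this sketch does not model.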
The resolver's extract_columns_from_source used only the table name (e.g. "t1") when looking up columns in the schema, ignoring the schema and catalog qualifiers. When a table was registered as "raw.t1", the lookup for just "t1" failed because MappingSchema stores entries hierarchically. The fix builds the fully qualified name (catalog.schema.table) from the TableRef before calling schema.column_names().
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
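To illustrate why the bare-name lookup missed, here is a small Python model of a hierarchical (catalog, schema, table) store. `NestedSchema` and this `TableRef` dataclass are hypothetical stand-ins, not the project's MappingSchema or TableRef types; the sketch only assumes that entries are nested by qualifier as the commit message describes.

```python
# Hypothetical model of a hierarchical schema store and of building a
# qualified name from a table reference before lookup.
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableRef:
    name: str
    schema: Optional[str] = None
    catalog: Optional[str] = None

    def qualified_name(self) -> str:
        """Join the non-empty parts as catalog.schema.table."""
        parts = [p for p in (self.catalog, self.schema, self.name) if p]
        return ".".join(parts)


class NestedSchema:
    def __init__(self, tables: dict[str, list[str]]):
        # Store "raw.t1" as {"raw": {"t1": [...]}}
        self.root: dict = {}
        for qualified, columns in tables.items():
            *prefix, table = qualified.split(".")
            node = self.root
            for part in prefix:
                node = node.setdefault(part, {})
            node[table] = columns

    def column_names(self, qualified: str) -> list[str]:
        """Walk the nested levels; return [] if the path does not end at a table."""
        node = self.root
        for part in qualified.split("."):
            if not isinstance(node, dict) or part not in node:
                return []
            node = node[part]
        return node if isinstance(node, list) else []


schema = NestedSchema({"raw.t1": ["id", "val"]})
ref = TableRef(name="t1", schema="raw")
print(schema.column_names(ref.name))              # [] (bare name misses)
print(schema.column_names(ref.qualified_name()))  # ['id', 'val']
```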
Owner: Merged, thanks!
Summary
- `lineage_with_schema` (and `qualify_columns`) failed when a query contained a correlated scalar subquery referencing an outer table (e.g., `SELECT id, (SELECT AVG(val) FROM t2 WHERE t2.id = t1.id) FROM t1`)
- The subquery's scope contained only its own sources (`t2`), so the reference to outer table `t1` triggered an `UnknownTable` error
- This fixes #46, #47, and #48
Test plan
- `test_qualify_columns_correlated_scalar_subquery` verifies that qualification succeeds and both inner and outer columns are resolved
- `test_qualify_columns_rejects_unknown_table` verifies that tables in neither scope nor schema still produce errors
- `test_lineage_with_schema_correlated_scalar_subquery` verifies end-to-end lineage on the exact failing query

🤖 Generated with Claude Code