Finding source table for columns #522
-
Hi, First of all thank you for the great library, it has been a pleasure using it! I have a question. I am working with a lot of convoluted queries that can have multiple derived tables and multiple nested column name changes. I need to update columns based on its source table. Is there a way to find that information for any given column? Excluding columns that come from calculated columns and given that I have the schema structure for all tables and columns. I tried to find it via going thru the scope but couldnt managed to do it. For example: SELECT ) I Is this possible? Any suggestion is welcomed Kind Regards |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 4 replies
-
Scope is indeed the way to do this: import sqlglot
from sqlglot import expressions as exp
from sqlglot.optimizer.scope import traverse_scope
ast = sqlglot.parse_one("SELECT ...")
physical_columns = [
c
for scope in traverse_scope(ast)
for c in scope.columns
if isinstance(scope.sources.get(c.table), exp.Table)
] Now, this only works if the columns are qualified. Qualified means the column is prefixed with the table name. To illustrate - sqlglot can't disambiguate columns in this query without knowing the schema: unqualified = """
SELECT
a,
b,
FROM physical_table
JOIN (
SELECT *
FROM physical_table2
) AS derived_table
""" If you're dealing with columns that might not be qualified, but you know the columns of the physical tables, sqlglot can qualify the columns for you: from sqlglot.optimizer.qualify_columns import qualify_columns
schema = {
"physical_table": {
"a": "type - doesn't actually matter for qualification...",
},
"physical_table2": {
"b": "-",
}
}
qualify_columns(unqualified, schema=schema)
# SELECT
# physical_table.a AS a,
# derived_table.b AS b
# FROM physical_table
# JOIN (
# SELECT
# physical_table2.b AS b
# FROM physical_table2
# ) AS derived_table
# Now you can get all the physical columns If you're dealing with columns that might not be qualified, and you don't know the columns of the physical tables, but you can connect to the database, you can fetch the schema on the fly. Happy to chat more about this. |
Beta Was this translation helpful? Give feedback.
Scope is indeed the way to do this:
Now, this only works if the columns are qualified. Qualified means the column is prefixed with the table name. To illustrate - sqlglot can't disambiguate columns in this query without knowing the schema:
If you're dealing with columns that might …