Fix!: revert escape sequence changes introduced in #2230 #2336
@@ -347,6 +345,7 @@ class Generator:
STRICT_STRING_CONCAT = False
NORMALIZE_FUNCTIONS: bool | str = "upper"
NULL_ORDERING = "nulls_are_small"
ESCAPE_LINE_BREAK = False
Interesting. Didn't see this before 😄
Yep - Toby added this a while ago because BQ doesn't allow actual newlines in single-quoted strings, i.e.
SELECT '
'
So if we ever encounter such a string, e.g. in Postgres, and want to transpile it to BQ we make sure to escape it:
>>> import sqlglot
>>> sqlglot.transpile("""'
... '""", "postgres", "bigquery")
["'\\n'"]
Postgres:
georgesittas=# select '
georgesittas'# ';
?column?
----------
+
(1 row)
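As a rough illustration of the idea behind a flag like ESCAPE_LINE_BREAK, here is a minimal sketch. The helper `escape_line_breaks` and its parameters are hypothetical names made up for this example, not SQLGlot's actual implementation; it only shows how a real newline inside a string literal could be rewritten as the two characters backslash and `n` for a target that rejects raw line breaks.

```python
def escape_line_breaks(literal_body: str, escape_line_break: bool) -> str:
    """Hypothetical helper: render the body of a single-quoted string literal.

    When escape_line_break is True, a real newline character (ASCII 10) is
    replaced by the two-character sequence backslash + 'n'.
    """
    if escape_line_break:
        return literal_body.replace("\n", "\\n")
    return literal_body


body = "\n"  # a single real newline, as in the Postgres example

# Target that cannot hold a raw newline in a single-quoted string
print("'" + escape_line_breaks(body, True) + "'")

# Target that accepts raw newlines: the body is left untouched
print(repr("'" + escape_line_breaks(body, False) + "'"))
```

With the flag enabled the literal body grows from one character to two, which is why the REPL output above shows a doubled backslash in the Python repr.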
This feels like it has the reproducibility problem if I then read the bigquery string in from a file and pass read="postgres", no?
if you read it from a file, a true line break is \n, and what you see in the bq ui would actually be \n
Read in a bigquery select query with bigquery newline from a file, parse it as postgres. Is that the same result as passing in an ASCII 10 character, as you specified in your example?
Read in a bigquery select query with bigquery newline from a file, parse it as postgres.
I'm not sure if that premise makes sense? Why would you parse a BigQuery SQL file using Postgres? That's like feeding possibly invalid SQL into parse_one and expecting back a correct AST, right?
Sorry, what I mean is: parse it as bigquery, transpile it to postgres. Are those equivalent when executed?
Hmm, I see what you mean. Yeah, in that case the transpilation breaks because we generate SELECT '\\n', which is not equivalent. I'll think about this one tomorrow; it might be that the transpilation towards postgres is broken because it doesn't treat the backslash as an escape.
This PR reverts 66aadfc, because my understanding of the intended "escaping" semantics at the time was incorrect.
Phillip's original BQ query in #2225 was SELECT '\a\b\f\n\r\t\v', i.e. this is what one would type verbatim in BQ's editor to run it against the engine. However, the Python string "SELECT '\a\b\f\n\r\t\v'" that was fed into transpile is not the "correct" representation of the above query, because it's not what one would get in Python if they were to read that query from a file, i.e. the source of truth for what the SQL means.
Feeding the Python string obtained by reading the file to transpile, we get back the original BQ query, so the roundtrip of 1) storing the query in a file, 2) reading that query from the file into its Python string representation and 3) generating SQL back for that same dialect yields the exact same query, which is the intended behavior of SQLGlot.
See the discussion in #2325 for more context.
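The difference between the two Python representations can be sketched with the standard library alone. This is an illustrative reconstruction, not the snippet from the original description: the variable names and the temporary file are assumptions made for the example, and no SQLGlot call is made.

```python
import tempfile
from pathlib import Path

# The query exactly as typed in the BigQuery editor. The raw string keeps
# every backslash, so this matches the bytes stored in a .sql file.
query_as_typed = r"SELECT '\a\b\f\n\r\t\v'"

# A non-raw Python literal: Python interprets each escape sequence itself,
# so the backslashes are gone before any SQL tool sees the text.
query_python_escaped = "SELECT '\a\b\f\n\r\t\v'"

# Store the query in a file and read it back (steps 1 and 2 of the roundtrip).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "query.sql"  # hypothetical file name
    path.write_text(query_as_typed, encoding="utf-8")
    query_from_file = path.read_text(encoding="utf-8")

print(repr(query_from_file))
print(query_from_file == query_as_typed)        # True
print(query_from_file == query_python_escaped)  # False
print(len(query_as_typed), len(query_python_escaped))  # 23 16
```

The string read from the file is the one that would be passed to transpile under the semantics described above; the non-raw literal is a different, shorter string containing control characters.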
cc: @cpcloud