Fix!: revert escape sequence changes introduced in #2230 #2336

georgesittas · 2023-09-27T22:43:30Z

This PR reverts 66aadfc, because my understanding of the intended "escaping" semantics at the time was incorrect.

Phillip's original BQ query in #2225 was SELECT '\a\b\f\n\r\t\v', i.e. exactly what one would type into BQ's editor to run it against the engine. However, the Python string "SELECT '\a\b\f\n\r\t\v'" that was fed into transpile is not the "correct" representation of that query: Python interprets the backslash escapes in a regular string literal, so it is not what one would get by reading the query from a file (the source of truth for what the SQL means) in Python:

➜ cat test.sql
SELECT '\a\b\f\n\r\t\v'
➜ python
Python 3.10.13 (main, Aug 24 2023, 22:36:46) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open('test.sql', 'r').read()
"SELECT '\\a\\b\\f\\n\\r\\t\\v'\n"

Feeding that last Python string to transpile, we get back the original BQ query. In other words, the roundtrip of 1) storing the query in a file, 2) reading it from the file into its Python string representation, and 3) generating SQL for the same dialect yields the exact same query, which is the intended behavior of SQLGlot:

>>> import sqlglot
>>> sqlglot.transpile("SELECT '\\a\\b\\f\\n\\r\\t\\v'\n", "bigquery")
["SELECT '\\a\\b\\f\\n\\r\\t\\v'"]
>>> print(sqlglot.transpile("SELECT '\\a\\b\\f\\n\\r\\t\\v'\n", "bigquery")[0])
SELECT '\a\b\f\n\r\t\v'
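The difference between the two Python strings can also be checked without SQLGlot. The sketch below is only an illustration (the temporary file and the variable names are made up for this example): it compares a regular literal, where Python interprets each backslash escape itself, with the text obtained by reading the query back from a file.

```python
import os
import tempfile

# The query exactly as typed in the BigQuery editor: backslashes are literal characters.
query_on_disk = r"SELECT '\a\b\f\n\r\t\v'"

# A regular (non-raw) literal: Python turns each escape into a single control character.
interpreted = "SELECT '\a\b\f\n\r\t\v'"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.sql")
    # newline="" disables newline translation so the bytes round-trip unchanged.
    with open(path, "w", newline="") as f:
        f.write(query_on_disk + "\n")
    with open(path, "r", newline="") as f:
        from_file = f.read()

print(repr(from_file))                       # same repr as in the session above
print(len(query_on_disk), len(interpreted))  # 23 16
```

Only the raw string (or the text read from the file) carries the backslashes through to the SQL; the regular literal has already lost them by the time it is passed on.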

See the discussion in #2325 for more context.

cc: @cpcloud

cpcloud · 2023-09-27T22:44:52Z

sqlglot/generator.py

@@ -347,6 +345,7 @@ class Generator:
    STRICT_STRING_CONCAT = False
    NORMALIZE_FUNCTIONS: bool | str = "upper"
    NULL_ORDERING = "nulls_are_small"
+    ESCAPE_LINE_BREAK = False


Interesting. Didn't see this before 😄

georgesittas · 2023-09-27

Yep - Toby added this a while ago because BQ doesn't allow actual newlines in single-quoted strings, i.e. the following is invalid:

SELECT '
'

So if we ever encounter such a string, e.g. in Postgres, and want to transpile it to BQ, we make sure to escape it:

>>> import sqlglot
>>> sqlglot.transpile("""'
... '""", "postgres", "bigquery")
["'\\n'"]

Postgres:

georgesittas=# select '
georgesittas'# ';
 ?column?
----------
         +
 
(1 row)

BQ:

cpcloud · 2023-09-27

This feels like it has the reproducibility problem if I then read the BigQuery string in from a file and pass read="postgres", no?

tobymao · 2023-09-27

If you read it from a file, a true line break is "\n" in Python, and what you see in the BQ UI (a backslash followed by n) would actually be "\\n".

cpcloud · 2023-09-27

Read in a bigquery select query with bigquery newline from a file, parse it as postgres. Is that the same result as passing in an ASCII 10 character, as you specified in your example?

georgesittas · 2023-09-27

> Read in a bigquery select query with bigquery newline from a file, parse it as postgres.

I'm not sure that premise makes sense. Why would you parse a BigQuery SQL file using Postgres? That's like feeding possibly invalid SQL into parse_one and expecting back a correct AST, right?

cpcloud · 2023-09-27

Sorry, what I mean is: parse it as BigQuery, transpile it to Postgres. Are those equivalent when executed?

georgesittas · 2023-09-27

Hmm, I see what you mean. Yeah, in that case the transpilation breaks because we generate SELECT '\\n', which is not equivalent. I'll think about this one tomorrow; it might be that the transpilation towards Postgres is broken because it doesn't treat the backslash as an escape.
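The non-equivalence can be made concrete with a minimal sketch of unescaping on read and re-escaping on generation. The helpers `unescape` and `generate` are hypothetical and are not SQLGlot's API. The sketch assumes the source dialect reads backslash + n as a newline, while the target treats a backslash inside a plain string literal as an ordinary character, as suggested above for Postgres.

```python
# Hypothetical helpers for illustration only; this is not SQLGlot's implementation.
ESCAPES = {"n": "\n", "\\": "\\"}  # only two escapes are handled in this sketch


def unescape(body: str) -> str:
    """Turn the body of a backslash-escaping literal into the value it denotes."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in ESCAPES:
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def generate(value: str, backslash_escapes: bool) -> str:
    """Render a value as a single-quoted literal (quote characters are not handled)."""
    if backslash_escapes:
        # Escape backslashes first so the backslash added for the newline is not doubled.
        value = value.replace("\\", "\\\\").replace("\n", "\\n")
    return f"'{value}'"


source_body = "\\n"            # backslash + n, as read from a .sql file
value = unescape(source_body)  # a single newline character

verbatim_sql = f"'{source_body}'"                        # backslash + n copied unchanged
target_sql = generate(value, backslash_escapes=False)    # embeds a real newline
roundtrip_sql = generate(value, backslash_escapes=True)  # backslash + n again

print(repr(verbatim_sql), repr(target_sql), repr(roundtrip_sql))
```

Copying the literal body verbatim keeps the two characters backslash + n, which a target without backslash escapes would read as those two characters rather than as a line break. Going through the unescaped value keeps the meaning and still round-trips for the source dialect.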

Fix: revert escape changes introduced in #2230

5713f18

georgesittas requested a review from tobymao September 27, 2023 22:43

cpcloud reviewed Sep 27, 2023

cpcloud approved these changes Sep 27, 2023

tobymao approved these changes Sep 27, 2023

georgesittas merged commit f0e5eb6 into main Sep 27, 2023
5 checks passed

georgesittas deleted the jo/revert_escape_changes branch September 27, 2023 23:01

georgesittas mentioned this pull request Oct 3, 2023

Fix: unescape escape sequences on read, re-escape them on generation #2367

Merged
