Fix!: revert escape sequence changes introduced in #2230 #2336

Merged on Sep 27, 2023 (1 commit)

georgesittas (Collaborator) commented on Sep 27, 2023

This PR reverts 66aadfc, because my understanding of the intended "escaping" semantics at the time was incorrect.

Phillip's original BQ query in #2225 was SELECT '\a\b\f\n\r\t\v', i.e. exactly what one would type in BQ's editor to run it against the engine. However, the Python string "SELECT '\a\b\f\n\r\t\v'" that was fed into transpile is not the correct representation of that query, because Python interprets each backslash sequence in a non-raw literal as a control character. It is also not what one gets in Python by reading the query from a file, which is the source of truth for what the SQL means:

➜ cat test.sql
SELECT '\a\b\f\n\r\t\v'
➜ python
Python 3.10.13 (main, Aug 24 2023, 22:36:46) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open('test.sql', 'r').read()
"SELECT '\\a\\b\\f\\n\\r\\t\\v'\n"

Feeding that last Python string to transpile gives back the original BQ query. In other words, the roundtrip of 1) storing the query in a file, 2) reading it from the file into its Python string representation, and 3) generating SQL for that same dialect yields the exact same query, which is the intended behavior of SQLGlot:

>>> import sqlglot
>>> sqlglot.transpile("SELECT '\\a\\b\\f\\n\\r\\t\\v'\n", "bigquery")
["SELECT '\\a\\b\\f\\n\\r\\t\\v'"]
>>> print(sqlglot.transpile("SELECT '\\a\\b\\f\\n\\r\\t\\v'\n", "bigquery")[0])
SELECT '\a\b\f\n\r\t\v'
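
The difference between the two Python strings discussed above can be checked with the standard library alone, without SQLGlot. This is only a minimal sketch: the helper roundtrip_through_file and the variable names are made up for illustration. It shows that a raw string matches what open().read() returns, while the non-raw literal holds control characters instead of backslashes.

```python
import os
import tempfile

# The query exactly as typed in the BigQuery editor.
# A raw string keeps every backslash as a literal character.
query_on_disk = r"SELECT '\a\b\f\n\r\t\v'"

# The non-raw literal: Python itself turns \a, \b, ... into control characters,
# so this string no longer contains any backslash.
non_raw_literal = "SELECT '\a\b\f\n\r\t\v'"


def roundtrip_through_file(text: str) -> str:
    """Hypothetical helper: write text to a temp file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".sql")
    try:
        with os.fdopen(fd, "w", newline="") as f:
            f.write(text)
        with open(path, "r", newline="") as f:
            return f.read()
    finally:
        os.remove(path)


read_back = roundtrip_through_file(query_on_disk)

print(repr(read_back))        # same text as the raw string
print(repr(non_raw_literal))  # control characters, no backslashes
```

Only the first form corresponds to the SQL text that the engine would actually receive.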

See the discussion in #2325 for more context.

cc: @cpcloud

@@ -347,6 +345,7 @@ class Generator:
STRICT_STRING_CONCAT = False
NORMALIZE_FUNCTIONS: bool | str = "upper"
NULL_ORDERING = "nulls_are_small"
ESCAPE_LINE_BREAK = False

Contributor commented:

Interesting. Didn't see this before 😄

Collaborator Author commented:

Yep - Toby added this a while ago because BQ doesn't allow actual newlines in single-quoted strings, i.e. the following is not valid in BQ:

SELECT '
'

So if we ever encounter such a string, e.g. in Postgres, and want to transpile it to BQ, we make sure to escape the line break:

>>> import sqlglot
>>> sqlglot.transpile("""'
... '""", "postgres", "bigquery")
["'\\n'"]

Postgres:

georgesittas=# select '
georgesittas'# ';
 ?column?
----------
         +

(1 row)

BQ:
(Screenshot of the BigQuery console, taken 2023-09-28; not reproduced here.)
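
As a rough illustration of what a flag like ESCAPE_LINE_BREAK controls, here is a minimal sketch. It is not SQLGlot's implementation: quote_string_literal is a hypothetical helper, and it deliberately ignores quote escaping and every other special character.

```python
def quote_string_literal(value: str, escape_line_break: bool) -> str:
    """Hypothetical helper: render a Python string as a single-quoted SQL literal.

    When escape_line_break is True, a real newline (ASCII 10) is emitted as the
    two characters backslash + n, for dialects that reject raw newlines inside
    single-quoted strings. Quote escaping is intentionally left out of this sketch.
    """
    if escape_line_break:
        value = value.replace("\n", "\\n")
    return "'" + value + "'"


# A string value that holds one real newline character.
bigquery_like = quote_string_literal("\n", escape_line_break=True)
postgres_like = quote_string_literal("\n", escape_line_break=False)

print(repr(bigquery_like))  # backslash + n between the quotes
print(repr(postgres_like))  # a real newline between the quotes
```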

cpcloud (Contributor) commented on Sep 27, 2023

This feels like it has the reproducibility problem if I then read the bigquery string in from a file and pass read="postgres", no?

Owner commented:

if you read it from a file, a true line break is \n, and what you see in the bq ui would actually be \n

Contributor commented:

Read in a bigquery select query with bigquery newline from a file, parse it as postgres. Is that the same result as passing in an ASCII 10 character, as you specified in your example?

Collaborator Author commented:

> Read in a bigquery select query with bigquery newline from a file, parse it as postgres.

I'm not sure if that premise makes sense? Why would you parse a BigQuery SQL file using Postgres? That's like feeding possibly invalid SQL into parse_one and expecting back a correct AST, right?

Contributor commented:

Sorry, what I mean is: parse it as bigquery, transpile it to postgres. Are those equivalent when executed?

georgesittas (Collaborator Author) commented on Sep 27, 2023

Hmm, I see what you mean. Yeah, in that case the transpilation breaks because we generate SELECT '\\n', which is not equivalent. I'll think about this one tomorrow; it might be that the transpilation towards Postgres is broken because it doesn't treat the backslash as an escape.
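
To see why the two dialects can disagree here, consider a toy model of how an engine turns the body of a string literal into a value. This is a sketch under stated assumptions, not engine or SQLGlot code: evaluate_literal is a hypothetical function that handles only a few escapes. It models BigQuery-style literals (backslash is an escape character) against ordinary Postgres '...' literals with standard_conforming_strings on, where a backslash is just a character.

```python
def evaluate_literal(body: str, backslash_escapes: bool) -> str:
    """Hypothetical model: map the text between the quotes to the string value.

    backslash_escapes=True  -> backslash + n becomes a newline (BigQuery-style).
    backslash_escapes=False -> every character is taken literally (ordinary
                               Postgres literals; E'...' strings are not modeled).
    """
    if not backslash_escapes:
        return body
    simple = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in simple:
            out.append(simple[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


# Literal body written as backslash + n.
bq_value = evaluate_literal(r"\n", backslash_escapes=True)
pg_value_same_text = evaluate_literal(r"\n", backslash_escapes=False)
pg_value_doubled = evaluate_literal(r"\\n", backslash_escapes=False)

print(repr(bq_value))            # a single newline character
print(repr(pg_value_same_text))  # backslash followed by n
print(repr(pg_value_doubled))    # two backslashes followed by n
```

In this model, neither text yields a newline on the Postgres side, so an equivalent literal would need a real line break (or a different literal form).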

georgesittas merged commit f0e5eb6 into main on Sep 27, 2023 (5 checks passed)
georgesittas deleted the jo/revert_escape_changes branch on September 27, 2023, 23:01