Refactor!: improve transpilation of JSON paths across dialects #2883

Merged: georgesittas merged 24 commits into main from jo/jsonpath_refactor on Jan 30, 2024

georgesittas (Collaborator, Author) commented Jan 25, 2024

A possible improvement is to consolidate multiple -> or ->> applications into a single JSONExtract node by merging the respective JSON paths, but I plan to leave that for a future PR. Right now we parse these chained operations into multiple nested JSONExtract nodes:

>>> import sqlglot
>>> sqlglot.parse_one("x -> 'a' -> 'b' -> 'z'", "postgres")
JSONExtract(
  this=JSONExtract(
    this=JSONExtract(
      this=Column(
        this=Identifier(this=x, quoted=False)),
      expression=JSONPath(
        this=[
          {'kind': 'root'},
          {'kind': 'key', 'value': 'a'}])),
    expression=JSONPath(
      this=[
        {'kind': 'root'},
        {'kind': 'key', 'value': 'b'}])),
  expression=JSONPath(
    this=[
      {'kind': 'root'},
      {'kind': 'key', 'value': 'z'}]))
>>> sqlglot.parse_one("x -> 'a' -> 'b' -> 'z'", "postgres").sql("bigquery")
"JSON_EXTRACT(JSON_EXTRACT(JSON_EXTRACT(x, '$.a'), '$.b'), '$.z')"
>>> sqlglot.parse_one("x -> 'a' -> 'b' -> 'z'", "postgres").sql("duckdb")
"x -> '$.a' -> '$.b' -> '$.z'"

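The consolidation idea described above could be sketched roughly as follows. This is not sqlglot code: it works on plain dicts shaped like the `JSONPath` segments in the repr above, and `merge_paths` and `render` are hypothetical helper names used only for illustration. It also assumes every step in the chain is a simple key lookup, and it does not address whether merging stays correct when `->>` appears in the middle of a chain.

```python
from typing import Dict, List

Segment = Dict[str, object]


def merge_paths(paths: List[List[Segment]]) -> List[Segment]:
    """Concatenate JSON path segment lists, keeping a single leading root."""
    merged: List[Segment] = [{"kind": "root"}]
    for path in paths:
        # Each path in the chain carries its own root segment; drop it.
        merged.extend(seg for seg in path if seg.get("kind") != "root")
    return merged


def render(path: List[Segment]) -> str:
    """Render a root/key-only segment list as a '$.a.b' style string."""
    out = ""
    for seg in path:
        if seg["kind"] == "root":
            out += "$"
        elif seg["kind"] == "key":
            # Assumption: keys are plain identifiers that need no quoting.
            out += "." + str(seg["value"])
        else:
            raise ValueError(f"unsupported segment kind: {seg['kind']!r}")
    return out


# The three paths from the nested JSONExtract nodes, innermost first.
chain = [
    [{"kind": "root"}, {"kind": "key", "value": "a"}],
    [{"kind": "root"}, {"kind": "key", "value": "b"}],
    [{"kind": "root"}, {"kind": "key", "value": "z"}],
]

merged = merge_paths(chain)
print(render(merged))  # $.a.b.z
```

With a merged path like this, the BigQuery output could in principle collapse to a single `JSON_EXTRACT(x, '$.a.b.z')` call instead of three nested ones, though that is the future improvement being discussed, not current behavior.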
georgesittas (Collaborator, Author) commented

FYI the only remaining dialect I need to refactor is Snowflake, but its path syntax is a bit weirder than that of the other dialects that don't support the JSON path syntax as-is. DuckDB doesn't support the full JSON path spec either, and it seems to have augmented the path syntax with its own extensions. For example, the # here is not in the official spec:

SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[#-1]');

Refer to the linked docs for more info. DuckDB paths can also use JSON Pointer syntax, which will be left as-is.
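To make the distinction between these path flavors concrete, here is a minimal heuristic sketch. The function name `classify_duckdb_path` and the labels it returns are hypothetical and not part of sqlglot or DuckDB. It assumes that JSON Pointer paths start with `/` (as non-empty pointers do per RFC 6901), that JSON path strings start with `$`, and that the DuckDB-specific extension looks like the `[#-1]` form shown above.

```python
import re

# Matches the DuckDB-style "index from the end" form seen above, e.g. [#-1].
_HASH_INDEX = re.compile(r"\[#-\d+\]")


def classify_duckdb_path(path: str) -> str:
    """Roughly classify a DuckDB JSON path string (heuristic only)."""
    if path.startswith("/"):
        # JSON Pointer syntax: would be passed through untouched.
        return "json_pointer"
    if path.startswith("$"):
        if _HASH_INDEX.search(path):
            # Uses the '#' indexing form that is not in the JSON path spec.
            return "jsonpath_duckdb_extension"
        return "jsonpath"
    return "unknown"


print(classify_duckdb_path("$.duck[#-1]"))  # jsonpath_duckdb_extension
print(classify_duckdb_path("/duck/0"))      # json_pointer
print(classify_duckdb_path("$.duck[0]"))    # jsonpath
```

Because this only inspects the raw string, a quoted key that happens to contain `[#-1]` would be misclassified. A real implementation would need to tokenize the path first.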

@georgesittas georgesittas merged commit b4e8868 into main Jan 30, 2024
5 checks passed
@georgesittas georgesittas deleted the jo/jsonpath_refactor branch January 30, 2024 03:07
Successfully merging this pull request may close these issues.

Incorrect parse between redshift JSON_EXTRACT_PATH_TEXT and databricks JSON path expression