Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect translation in table function unnest syntax #3962

Closed
hellozepp opened this issue Aug 23, 2024 · 7 comments · Fixed by #3964 or #3999
Closed

Incorrect translation in table function unnest syntax #3962

hellozepp opened this issue Aug 23, 2024 · 7 comments · Fixed by #3964 or #3999
Assignees

Comments

@hellozepp
Copy link
Contributor

Fully reproducible code snippet

The unnest syntax cannot be converted correctly, related to starrocks and bigquery.

In my case, I need to convert cross join unnest to LATERAL VIEW EXPLODE.

The following sql:

-- sql1
SELECT id, scores, unnest FROM values (1, [80,85,87]),(2, [77, null, 89]) as student_score(id, scores), unnest(scores) AS unnest;
-- sql2
SELECT id, scores, unnest FROM values (1, [80,85,87]),(2, [77, null, 89]) as student_score(id, scores), unnest(scores);
-- sql3
SELECT id, unnest.type, unnest.scores
FROM values ("typeA,typeB", [80,85,88]),("typeA,typeB,typeC", [87,90,95]) as example_table(type, scores), unnest(split(type, ","), scores) AS unnest(type,scores);

When we transpile in case sql1 and sql2, we will get a result that loses the unnest function (all dialects):

SELECT id, scores, t.unnest FROM VALUES (1, ARRAY(80, 85, 87)), (2, ARRAY(77, NULL, 89)) AS student_score(id, scores)

Expected value:

FROM values ​​(1, [80,85,87]),(2, [77, null, 89]) as student_score(id, scores)
LATERAL VIEW EXPLODE(scores) u AS unnest;

Official Documentation
Please include links to official SQL documentation related to your issue.

https://docs.starrocks.io/docs/using_starrocks/Lateral_join/
https://cloud.google.com/bigquery/docs/arrays?hl=en

@hellozepp
Copy link
Contributor Author

hellozepp commented Aug 23, 2024

I'm interested in trying to solve this issue, tks

I want to modify the sqlglot.transforms.unnest_to_explode function. After referring to the unnest syntax of starrocks and bigquery, I think:

  • If the alias of unnest is not set, the columns name and table Should be "unnest" as default
  • If an alias is set, columns name should be the alias, table should be "unnest" as default
  • If the sql3 form AS unnest(type,scores) is set, a TableAlias ​​is generated by unnest_to_explode, including the table alias and column name

Parsing SQL 1:
SELECT id, scores, unnest FROM values (1, [80,85,87]),(2, [77, null, 89]) as student_score(id, scores), unnest(scores) AS unnest

Parsed SQL 1:

Select(
  expressions=[
    Column(
      this=Identifier(this=id, quoted=False)),
    Column(
      this=Identifier(this=scores, quoted=False)),
    Column(
      this=Identifier(this=unnest, quoted=False))],
  from=From(
    this=Values(
      expressions=[
        Tuple(
          expressions=[
            Literal(this=1, is_string=False),
            Array(
              expressions=[
                Literal(this=80, is_string=False),
                Literal(this=85, is_string=False),
                Literal(this=87, is_string=False)])]),
        Tuple(
          expressions=[
            Literal(this=2, is_string=False),
            Array(
              expressions=[
                Literal(this=77, is_string=False),
                Null(),
                Literal(this=89, is_string=False)])])],
      alias=TableAlias(
        this=Identifier(this=student_score, quoted=False),
        columns=[
          Identifier(this=id, quoted=False),
          Identifier(this=scores, quoted=False)]))),
  joins=[
    Join(
      this=Unnest(
        expressions=[
          Column(
            this=Identifier(this=scores, quoted=False))],
        alias=TableAlias(
          this=Identifier(this=unnest, quoted=False))))])

The modification should include laterals, where alias is set to the default value of unnest:

  laterals=[
    Lateral(
      this=Explode(
        this=Column(
          this=Identifier(this=scores, quoted=False))),
      view=True,
      alias=TableAlias(
        this=Identifier(this=unnest, quoted=False),
        columns=[
          Identifier(this=unnest, quoted=False)]))])

@hellozepp
Copy link
Contributor Author

@georgesittas PTAL

@georgesittas
Copy link
Collaborator

@hellozepp please provide a clear example of how exactly you're using sqlglot, what's the expected output vs what you got & why the latter is incorrect. I can't understand what the issue at hand is based on the description, e.g. you mentioned BigQuery but the queries you provided are not valid in it.

@hellozepp
Copy link
Contributor Author

@georgesittas
Oh, let me clarify. The sql provided above is in starrocks syntax, and I want to convert it to spark:

sqlglot.transpile(sql, read="starrocks", write="spark")

The original SQL I transpile:

SELECT id, scores, unnest FROM values (1, [80,85,87]),(2, [77, null, 89]) as student_score(id, scores), unnest(scores) AS unnest;

The output I expected:

SELECT id,scores, unnest
FROM values (1, [80,85,87]),(2, [77, null, 89]) as student_score(id, scores)
LATERAL VIEW EXPLODE(scores) unnest AS unnest;

The actual output:

SELECT id, scores, t.unnest FROM VALUES (1, ARRAY(80, 85, 87)), (2, ARRAY(77, NULL, 89)) AS student_score(id, scores);

At the end of the sql, the simplified unnest is missing.

Sorry, I just tried bigquery's CROSS JOIN UNNEST transpile, and everything worked fine

@hellozepp
Copy link
Contributor Author

Another type of unnest problem, the following starrocks sql transpile to spark will also have problems:

SELECT student, score, t.unnest
FROM tests
CROSS JOIN LATERAL UNNEST(scores) AS t;

Expected output:

SELECT student, score, t.unnest FROM tests LATERAL VIEW EXPLODE(scores) unnest AS t

Actual:

SELECT student, score, t.unnest FROM tests CROSS JOIN LATERAL UNNEST(scores) AS t

CROSS JOIN LATERAL and UNNEST should not appear at the same time.

I tried to figure out the reason, and I found that _parse_lateral in parse.py calls _parse_unnest, while unnest_to_explode in transforms.py does not process isinstance(unnest, exp.Lateral). Can I modify the code?

@hellozepp
Copy link
Contributor Author

@georgesittas It seems that only starrocks has the problem.
As a novice, I want to modify the code. In your opinion, should I handle it specially in starrocks.py or change the public method?

@hellozepp
Copy link
Contributor Author

Thanks a lot @georgesittas @VaggelisD
This will be helpful later on. I will try this fix later.

hellozepp pushed a commit to hellozepp/sqlglot-doris that referenced this issue Aug 29, 2024
fix some comment
hellozepp pushed a commit to hellozepp/sqlglot-doris that referenced this issue Aug 29, 2024
fix some comment
hellozepp pushed a commit to hellozepp/sqlglot-doris that referenced this issue Aug 29, 2024
fix some comment
hellozepp pushed a commit to hellozepp/sqlglot-doris that referenced this issue Aug 29, 2024
fix some comment
hellozepp pushed a commit to hellozepp/sqlglot-doris that referenced this issue Aug 29, 2024
fix some comment
hellozepp pushed a commit to hellozepp/sqlglot-doris that referenced this issue Aug 29, 2024
Re-run github action
hellozepp pushed a commit to hellozepp/sqlglot-doris that referenced this issue Aug 29, 2024
fix some comment

Fixes tobymao#3962

Re-run github action
VaggelisD pushed a commit that referenced this issue Aug 29, 2024
* fix(starrocks): exp.Unnest transpilation

* fix(starrocks): exp.Unnest transpilation

use `arrays_zip` to merge multiple `Lateral view explode`

* Fixes #3962

fix some comment

* Fixes #3962

fix some comment

* Update sqlglot/transforms.py

Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>

* Fixes #3962

fix some comment

* Fixes #3962

fix some comment

Fixes #3962

Re-run github action

---------

Co-authored-by: lin.zhang <lin.zhang@clickzetta.com>
Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
hellozepp added a commit to hellozepp/sqlglot-doris that referenced this issue Sep 3, 2024
* fix(starrocks): exp.Unnest transpilation

* fix(starrocks): exp.Unnest transpilation

use `arrays_zip` to merge multiple `Lateral view explode`

* Fixes tobymao#3962

fix some comment

* Fixes tobymao#3962

fix some comment

* Update sqlglot/transforms.py

Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>

* Fixes tobymao#3962

fix some comment

* Fixes tobymao#3962

fix some comment

Fixes tobymao#3962

Re-run github action

---------

Co-authored-by: lin.zhang <lin.zhang@clickzetta.com>
Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants