Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

semantically incorrect transpilation of unnest from duckdb to bigquery #2227

Closed
cpcloud opened this issue Sep 15, 2023 · 7 comments · Fixed by #2235
Closed

semantically incorrect transpilation of unnest from duckdb to bigquery #2227

cpcloud opened this issue Sep 15, 2023 · 7 comments · Fixed by #2235
Assignees

Comments

@cpcloud
Copy link
Contributor

cpcloud commented Sep 15, 2023

Fully reproducible code snippet
Please include a fully reproducible code snippet or the input sql, dialect, and expected output.

First, the Python:

In [19]: import sqlglot as sg

In [20]: sg.parse_one("select unnest(array[1, 2, 3]), unnest(array[4, 5])", read="duckdb").sql(dialect="bigquery")
Out[20]: 'SELECT col, col_2 FROM UNNEST([1, 2, 3]) AS col CROSS JOIN UNNEST([4, 5]) AS col_2'

Running this in DuckDB:

D select unnest([1, 2, 3]), unnest([4, 5]);
┌──────────────────────────────────┬───────────────────────────────┐
│ unnest(main.list_value(1, 2, 3)) │ unnest(main.list_value(4, 5)) │
│              int32               │             int32             │
├──────────────────────────────────┼───────────────────────────────┤
│                                1 │                             4 │
│                                2 │                             5 │
│                                3 │                               │
└──────────────────────────────────┴───────────────────────────────┘

And in the BigQuery UI, because bq shell doesn't accept the syntax:

image

This should probably compile to something like this:

SELECT
  s.col[SAFE_OFFSET(i)] as col,
  s.col_2[SAFE_OFFSET(i)] as col_2
FROM (
  SELECT
    [1, 2, 3] as col,
    [4, 5] as col_2
) as s CROSS JOIN UNNEST(GENERATE_ARRAY(0, GREATEST(ARRAY_LENGTH([1, 2, 3]) - 1, ARRAY_LENGTH([4, 5]) - 1))) AS i

which gives the same output as DuckDB:

image

Official Documentation

https://duckdb.org/docs/sql/query_syntax/unnest.html#unnesting-lists

See the second example

-- unnest a scalar list, generating 3 rows ((1, 10), (2, 11), (3, NULL))
SELECT UNNEST([1, 2, 3]), UNNEST([10, 11]);
@cpcloud
Copy link
Contributor Author

cpcloud commented Sep 15, 2023

You can use SAFE_ORDINAL to save a bit of arithmetic syntax:

SELECT
  s.col[SAFE_ORDINAL(i)] as col,
  s.col_2[SAFE_ORDINAL(i)] as col_2
FROM (
  SELECT
    [1, 2, 3] as col,
    [4, 5] as col_2
) as s CROSS JOIN UNNEST(GENERATE_ARRAY(1, GREATEST(ARRAY_LENGTH([1, 2, 3]), ARRAY_LENGTH([4, 5])))) AS i

@cpcloud
Copy link
Contributor Author

cpcloud commented Sep 15, 2023

Postgres and DuckDB behave the same here:

ibis_testing> select unnest(array[1, 2, 3]), unnest(array[4, 5]);
+--------+--------+
| unnest | unnest |
|--------+--------|
| 1      | 4      |
| 2      | 5      |
| 3      | <null> |
+--------+--------+

@cpcloud
Copy link
Contributor Author

cpcloud commented Sep 15, 2023

FWIW I am happy to see sqlglot attempting to handle the unnest-column-to-cross-join problem. It's been a pain the butt to handle in ibis.

@cpcloud
Copy link
Contributor Author

cpcloud commented Sep 15, 2023

I guess there's implicit unnesting, which when combined with ,-style cross join makes the bigquery version even less verbose:

SELECT
  s.col[SAFE_ORDINAL(i)] as col,
  s.col_2[SAFE_ORDINAL(i)] as col_2
FROM (
  SELECT
    [1, 2, 3] as col,
    [4, 5] as col_2,
    GENERATE_ARRAY(1, GREATEST(ARRAY_LENGTH([1, 2, 3]), ARRAY_LENGTH([4, 5]))) as i
) as s, s.i

@cpcloud
Copy link
Contributor Author

cpcloud commented Sep 15, 2023

You can reduce it further:

SELECT
  [1, 2, 3][SAFE_ORDINAL(i)] AS col,
  [4, 5][SAFE_ORDINAL(i)] AS col_2
FROM
  UNNEST(GENERATE_ARRAY(1, GREATEST(ARRAY_LENGTH([1, 2, 3]), ARRAY_LENGTH([4, 5])))) AS i

The general pattern is:

SELECT
  array1[SAFE_ORDINAL(i)] AS col1,
  ...,
  arrayN[SAFE_ORDINAL(i)] AS colN
FROM
  source_table CROSS JOIN UNNEST(GENERATE_ARRAY(1, GREATEST(ARRAY_LENGTH(array1), ..., ARRAY_LENGTH(arrayN)))) AS i

@cpcloud
Copy link
Contributor Author

cpcloud commented Sep 15, 2023

There are a few dialects that I am aware of that need this transformation because they don't support UNNEST in SELECT position:

  • BigQuery
  • Snowflake
  • Trino

@tobymao
Copy link
Owner

tobymao commented Sep 15, 2023

this is the best way imo

select x, y, z
from 
unnest(array[1, 2, 3, 4]) x with offset o1
left join unnest(array[4, 5, 6]) y with offset o2
 on o1 = o2
left join unnest(array[7, 8]) z with offset o3
  on o1 = o3

@tobymao tobymao self-assigned this Sep 15, 2023
tobymao added a commit that referenced this issue Sep 16, 2023
also remove with ordinality with just offset for consistency
closes #2227
tobymao added a commit that referenced this issue Sep 16, 2023
also remove with ordinality with just offset for consistency
closes #2227
tobymao added a commit that referenced this issue Sep 16, 2023
also remove with ordinality with just offset for consistency
closes #2227
tobymao added a commit that referenced this issue Sep 16, 2023
also remove with ordinality with just offset for consistency
closes #2227
tobymao added a commit that referenced this issue Sep 16, 2023
also remove with ordinality with just offset for consistency
closes #2227
tobymao added a commit that referenced this issue Sep 16, 2023
also remove with ordinality with just offset for consistency
closes #2227
tobymao added a commit that referenced this issue Sep 16, 2023
also remove with ordinality with just offset for consistency
closes #2227
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants