Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default translation of unions to Hive does not work #663

Closed
alex-m-ffm opened this issue Jun 12, 2021 · 1 comment · Fixed by #812
Closed

Default translation of unions to Hive does not work #663

alex-m-ffm opened this issue Jun 12, 2021 · 1 comment · Fixed by #812

Comments

@alex-m-ffm
Copy link

Hi there,

Again, I don't know if this is an issue of the specific version of Hive (2.1.1-cdh6.2.1) or a general one.
The default translation of unions puts the two SELECT statements on both sides of UNION into parantheses.
This is no problem in Impala, but in Hive I get a syntax error about an unexpected string near UNION (.

df <- tibble(x = 1:10,
            group = c(rep("a", 3), rep("b", 4), rep("c", 3))
            )

df_a <- lazy_frame(df = df, con = simulate_hive()) %>% 
  filter(group == "a")

df_b <- lazy_frame(df = df, con = simulate_hive()) %>% 
  filter(group == "b")

union_all(df_a, df_b) %>% 
  show_query()

The query generated will be

(SELECT *
FROM `df`
WHERE (`group` = 'a'))
UNION ALL
(SELECT *
FROM `df`
WHERE (`group` = 'b')) 

I tried a similar simple query with an actual database in HUE and the parantheses seemed to have caused the issue. At least without them it worked.

The ideal translation would be thus:

SELECT *
FROM `df`
WHERE (`group` = 'a')
UNION ALL
SELECT *
FROM `df`
WHERE (`group` = 'b') 

In my real-life application the query is much more complex and dbplyr works with naming intermediate queries q01 et cetera. This is fine, but also the outer ones, i.e. the two sides of the UNION ALL are named like this via placing the SELECT queries in parentheses and this should be avoided.

Looking forward to hearing your views/experiences and possible fixes!
Best,
Alex

@wangyuchen
Copy link

I ran into the same issue and added a specific method to Hive to solve it temporarily...

sql_query_set_op.Hive <- function (con, x, y, method, ..., all = FALSE) {
    build_sql(x, "\n", sql(method), if (all) 
        sql(" ALL"), "\n",  y, con = con)
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants