Skip to content

Default translation of unions to Hive does not work #663

@alex-m-ffm

Description

@alex-m-ffm

Hi there,

Again, I don't know if this is an issue of the specific version of Hive (2.1.1-cdh6.2.1) or a general one.
The default translation of unions puts the two SELECT statements on both sides of UNION into parantheses.
This is no problem in Impala, but in Hive I get a syntax error about an unexpected string near UNION (.

df <- tibble(x = 1:10,
            group = c(rep("a", 3), rep("b", 4), rep("c", 3))
            )

df_a <- lazy_frame(df = df, con = simulate_hive()) %>% 
  filter(group == "a")

df_b <- lazy_frame(df = df, con = simulate_hive()) %>% 
  filter(group == "b")

union_all(df_a, df_b) %>% 
  show_query()

The query generated will be

(SELECT *
FROM `df`
WHERE (`group` = 'a'))
UNION ALL
(SELECT *
FROM `df`
WHERE (`group` = 'b')) 

I tried a similar simple query with an actual database in HUE and the parantheses seemed to have caused the issue. At least without them it worked.

The ideal translation would be thus:

SELECT *
FROM `df`
WHERE (`group` = 'a')
UNION ALL
SELECT *
FROM `df`
WHERE (`group` = 'b') 

In my real-life application the query is much more complex and dbplyr works with naming intermediate queries q01 et cetera. This is fine, but also the outer ones, i.e. the two sides of the UNION ALL are named like this via placing the SELECT queries in parentheses and this should be avoided.

Looking forward to hearing your views/experiences and possible fixes!
Best,
Alex

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions