Teradata ROW_NUMBER() OVER (PARTITION BY ...) issue #3347

Closed
jakefrost opened this issue Feb 8, 2018 · 2 comments

@jakefrost

Hi all, thanks for all your work on the Teradata translations for dbplyr. One issue I've come across is that ROW_NUMBER() window functions generated by dbplyr produce errors.

For example, if I run this code:

flights %>% 
  select(record_id, record_create_dt, acct_num, dep_dt, origin) %>% 
  filter(dep_dt == '2017-01-21', origin == 'DEN') %>% 
  group_by(record_id, record_create_dt, acct_num, dep_dt) %>% 
  mutate(rn = row_number()) %>% 
  show_query()

this is the SQL generated:

SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
	,row_number() OVER (PARTITION BY "record_id", "record_create_dt", "acct_num", "dep_dt") AS "rn"
FROM (SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
FROM cdw.flights) "iljxeikdep"
WHERE (("dep_dt" = '2017-01-21') AND ("origin" = 'DEN'))

But it generates the following error:

Error in new_result(connection@ptr, statement) : 
  nanodbc/nanodbc.cpp:1344: 42000: [Teradata][ODBC Teradata Driver][Teradata Database] Syntax error: expected something between the word 'dep_dt' and ')'.

However, I then tweaked the generated SQL to add an ORDER BY clause within the OVER (PARTITION BY ...) parenthetical and ran it directly in Teradata:

SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
	,row_number() OVER (PARTITION BY "record_id", "record_create_dt", "acct_num", "dep_dt" ORDER BY "record_id", "record_create_dt") AS "rn"
FROM (SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
FROM cdw.flights) "iljxeikdep"
WHERE (("dep_dt" = '2017-01-21') AND ("origin" = 'DEN'))

and it ran without error.

I'm not exactly sure why the dbplyr translated code didn't run while my altered code did, but it could be related to the fact that Teradata recognizes ROW_NUMBER() as an ordered analytical function and requires ordering criteria within the window function. Or maybe I'm missing an obvious solution. Any help would be greatly appreciated. Thanks again!

@hadley
Member

hadley commented May 20, 2018

You need to specify some ordering in your dplyr call:

library(dplyr, warn.conflicts = FALSE)

lf1 <- dbplyr::lazy_frame(x = 1:5, src = dbplyr::simulate_teradata())
lf1 %>% 
  group_by(x) %>%
  arrange(x) %>% 
  mutate(rn = row_number()) %>% 
  show_query()
#> <SQL> SELECT `x`, row_number() OVER (PARTITION BY `x` ORDER BY `x`) AS `rn`
#> FROM (SELECT *
#> FROM `df`
#> ORDER BY `x`) `osprstmdrh`

Created on 2018-05-20 by the reprex package (v0.2.0).
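
As a possible alternative to `arrange()`, the ordering can also be supplied only to the window function. The sketch below assumes a recent dbplyr release (where `lazy_frame()` takes `con =` rather than `src =`); the column `y` is made up for illustration and is not from the original report. It shows two options: `window_order()`, and passing the ordering column directly to `row_number()`.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

# Simulated Teradata table; `y` is a hypothetical ordering column.
# Assumes a recent dbplyr where lazy_frame() accepts `con =`.
lf <- lazy_frame(x = 1:5, y = 5:1, con = simulate_teradata())

# Option 1: window_order() sets the ORDER BY used inside OVER (...)
q1 <- lf %>%
  group_by(x) %>%
  window_order(y) %>%
  mutate(rn = row_number())

# Option 2: give row_number() the column to order by
q2 <- lf %>%
  group_by(x) %>%
  mutate(rn = row_number(y))

show_query(q1)
show_query(q2)
```

Both queries should carry an ORDER BY inside the OVER clause, which is what Teradata asks for here. Unlike `arrange()`, neither option is intended to sort the result rows themselves.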

@lock

lock bot commented Nov 16, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Nov 16, 2018