Teradata ROW_NUMBER() OVER (PARTITION BY ...) issue #3347

jakefrost · 2018-02-08T01:03:51Z

Hi all, thanks for all your work on the Teradata translations for dbplyr. One issue I've come across is that ROW_NUMBER() window functions generated by dbplyr produce errors.

For example, if I run this code:

flights %>% 
  select(record_id, record_create_dt, acct_num, dep_dt, origin) %>% 
  group_by(record_id, record_create_dt, acct_num) %>% 
  mutate(rn = row_number()) %>% 
  show_query()

this is the SQL that is generated:

SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
	,row_number() OVER (PARTITION BY "record_id", "record_create_dt", "acct_num", "dep_dt") AS "rn"
FROM (SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
FROM cdw.flights) "iljxeikdep"
WHERE (("dep_dt" = '2017-01-21') AND ("origin" = 'DEN'))

But it generates the following error:

Error in new_result(connection@ptr, statement) : 
  nanodbc/nanodbc.cpp:1344: 42000: [Teradata][ODBC Teradata Driver][Teradata Database] Syntax error: expected something between the word 'dep_dt' and ')'.

However, I then tweaked the generated SQL to add an ORDER BY clause within the OVER (PARTITION BY ...) parenthetical and ran it directly in Teradata:

SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
	,row_number() OVER (PARTITION BY "record_id", "record_create_dt", "acct_num", "dep_dt" ORDER BY "record_id", "record_create_dt") AS "rn"
FROM (SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
FROM cdw.flights) "iljxeikdep"
WHERE (("dep_dt" = '2017-01-21') AND ("origin" = 'DEN'))

and it ran without error.

I'm not sure why the dbplyr-translated code fails while my altered code runs, but it may be because Teradata treats ROW_NUMBER() as an ordered analytical function and requires an ORDER BY clause inside the window specification. Or maybe I'm missing an obvious solution. Any help would be greatly appreciated. Thanks again!

hadley · 2018-05-20T14:30:06Z

You need to specify some ordering in your dplyr call:

library(dplyr, warn.conflicts = FALSE)

lf1 <- dbplyr::lazy_frame(x = 1:5, src = dbplyr::simulate_teradata())
lf1 %>% 
  group_by(x) %>%
  arrange(x) %>% 
  mutate(rn = row_number()) %>% 
  show_query()
#> <SQL> SELECT `x`, row_number() OVER (PARTITION BY `x` ORDER BY `x`) AS `rn`
#> FROM (SELECT *
#> FROM `df`
#> ORDER BY `x`) `osprstmdrh`

Created on 2018-05-20 by the reprex package (v0.2.0).
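
An alternative sketch, not part of the reply above: dbplyr also provides window_order(), which sets the ordering used inside OVER (...) without adding an ORDER BY to the query itself. The example below applies it to the columns from the original report. It assumes a recent dbplyr release, where lazy_frame() takes `con =` rather than `src =`. The one-row table and the choice of dep_dt as the ordering column are illustrative assumptions, so substitute whichever ordering makes sense for your data.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr)

# Simulated Teradata table. The column names come from the original report;
# the single row of values is a made-up placeholder.
flights <- lazy_frame(
  record_id = 1L,
  record_create_dt = "2017-01-01",
  acct_num = 1L,
  dep_dt = "2017-01-21",
  origin = "DEN",
  con = simulate_teradata()
)

q <- flights %>%
  group_by(record_id, record_create_dt, acct_num) %>%
  # window_order() only affects window functions such as row_number()
  window_order(dep_dt) %>%
  mutate(rn = row_number())

# Inspect the translated SQL; the OVER clause should now carry an ORDER BY
show_query(q)
```

Because the ordering is attached to the window rather than to the rows of the result, this keeps the partitioning from group_by() and only supplies the ordering that Teradata asks for.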

lock · 2018-11-16T14:45:34Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

jakefrost mentioned this issue Feb 8, 2018

Teradata Translation - Feature Request #3040

Closed

batpigandme added the database label Feb 9, 2018

hadley closed this as completed May 20, 2018

jkylearmstrong mentioned this issue May 30, 2018

dplyr with TeraData; mutate to add row_number() to TeraData table. #3627

Closed

lock bot locked and limited conversation to collaborators Nov 16, 2018
