Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teradata distinct #685

Closed
overmar opened this issue Aug 9, 2021 · 1 comment · Fixed by #1192
Closed

Teradata distinct #685

overmar opened this issue Aug 9, 2021 · 1 comment · Fixed by #1192
Labels
backend 🕺 feature a feature request or enhancement

Comments

@overmar
Copy link
Contributor

overmar commented Aug 9, 2021

We are currently running Teradata 16.20.5329 so this issue may have been fixed for different versions, but thought I would bring it up.

x %>% distinct() works when simulating a teradata connection, sql_render, and show_query, however it tries to use TOP 11 when it is run in an actual teradata session. The tdplyr package from teradata overwrites the sql_query_select and the sql_clause but the code returns the same values in either package.

dsnName <- "tera_hc"
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql
library(tdplyr)
#> Registered S3 methods overwritten by 'tdplyr':
#>   method                     from  
#>   sql_query_select.Teradata  dbplyr
#>   sql_table_analyze.Teradata dbplyr
#>   sql_translation.Teradata   dbplyr

con <- DBI::dbConnect(odbc::odbc(), dsnName, timeout = 10)

x <- data.frame(x = 1)
x_test <- x %>% copy_to(con, ., "x_test", temporary = TRUE)
x_test %>% distinct() %>% show_query()
#> <SQL>
#> SELECT DISTINCT *
#> FROM "x_test"
x_test %>% distinct() %>% sql_render()
#> <SQL> SELECT DISTINCT *
#> FROM "x_test"
x_test %>% distinct()
#> Error: nanodbc/nanodbc.cpp:1655: HY000: [Teradata][ODBC Teradata Driver][Teradata Database](-6916)TOP N Syntax error: Top N option is not supported with DISTINCT option. 
#> <SQL> 'SELECT DISTINCT TOP 11 *
#> FROM "x_test"'

The issue seems to arise from the fact that the sql_clause_select statement always thinks that there is a value for top, so it always inserts the "TOP 11" statement, even though you cannot both have DISTINCT and TOP in the same teradata statement. The easiest fix I have found is to create a new sql_clause_select for Teradata and to add to the (!is.null(top)) a second statement of !distinct, which pulls out the TOP line.

The easiest way for this to work would be to have sql_clause_select be a generic, and have teradata have its own class, but this might be overkill. Also if no one else has had this problem, it may just be our teradata version.

@hadley
Copy link
Member

hadley commented Apr 28, 2022

Minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

db <- lazy_frame(x = 1, y = 2, con = simulate_teradata())
db %>% distinct() %>% head()
#> <SQL>
#> SELECT DISTINCT TOP 6 *
#> FROM `df`

Created on 2022-04-28 by the reprex package (v2.0.1)

The problem is that when you print a lazy frame, it automatically limits the result to try and limit the amount of computation that the database performs. Unless there's a better way to limit the result size in teradata, I'd suggest that sql_query_select.Teradata warn if distinct is TRUE and top is not NULL, and ignore the value of top.

The docs say:

You cannot specify these options in a SELECT statement that specifies the TOP n operator:

  • DISTINCT option
  • QUALIFY clause
  • SAMPLE clause
  • WITH clause

So it's not just your version.

@hadley hadley added feature a feature request or enhancement backend 🕺 labels Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend 🕺 feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants