We are currently running Teradata 16.20.5329, so this issue may already be fixed in other versions, but I thought I would bring it up.
x %>% distinct() works with a simulated Teradata connection and with sql_render() and show_query(); however, when it is run in an actual Teradata session it tries to use TOP 11. The tdplyr package from Teradata overwrites the sql_query_select and the sql_clause, but the code returns the same values in either package.
dsnName<-"tera_hc"
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#>     ident, sql
library(tdplyr)
#> Registered S3 methods overwritten by 'tdplyr':
#>   method                     from
#>   sql_query_select.Teradata  dbplyr
#>   sql_table_analyze.Teradata dbplyr
#>   sql_translation.Teradata   dbplyr
con<-DBI::dbConnect(odbc::odbc(), dsnName, timeout=10)
x<-data.frame(x=1)
x_test<-x %>% copy_to(con, ., "x_test", temporary=TRUE)
x_test %>% distinct() %>% show_query()
#> <SQL>
#> SELECT DISTINCT *
#> FROM "x_test"
x_test %>% distinct() %>% sql_render()
#> <SQL> SELECT DISTINCT *
#> FROM "x_test"
x_test %>% distinct()
#> Error: nanodbc/nanodbc.cpp:1655: HY000: [Teradata][ODBC Teradata Driver][Teradata Database](-6916)TOP N Syntax error: Top N option is not supported with DISTINCT option.
#> <SQL> 'SELECT DISTINCT TOP 11 *
#> FROM "x_test"'
The issue seems to arise because sql_clause_select always sees a value for top when the result is printed, so it always inserts "TOP 11", even though Teradata does not allow DISTINCT and TOP in the same statement. The easiest fix I have found is to create a new sql_clause_select for Teradata and add a second condition, !distinct, to the (!is.null(top)) check, which drops the TOP clause whenever DISTINCT is used.
The cleanest way to do this would be to make sql_clause_select a generic and give Teradata its own method, but this might be overkill. Also, if no one else has had this problem, it may just be our Teradata version.
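The proposal above could look roughly like the following. This is a minimal base R sketch, not dbplyr's or tdplyr's actual code: `select_clause()`, `build_select_clause()`, and the dummy connection classes are hypothetical names used only to illustrate S3 dispatch and the extra `!distinct` condition.

```r
# Hypothetical helper: assembles the SELECT clause from its parts.
build_select_clause <- function(select, distinct, top) {
  clause <- "SELECT"
  if (distinct) clause <- paste(clause, "DISTINCT")
  if (!is.null(top)) clause <- paste(clause, "TOP", as.integer(top))
  paste(clause, paste(select, collapse = ", "))
}

# Hypothetical generic standing in for a dispatchable sql_clause_select().
select_clause <- function(con, select, distinct = FALSE, top = NULL) {
  UseMethod("select_clause")
}

# Default behaviour: emit TOP whenever a value is supplied.
select_clause.default <- function(con, select, distinct = FALSE, top = NULL) {
  build_select_clause(select, distinct, top)
}

# Teradata method: DISTINCT and TOP cannot be combined, so drop TOP
# when distinct is TRUE (the extra !distinct condition from the post).
select_clause.Teradata <- function(con, select, distinct = FALSE, top = NULL) {
  if (!is.null(top) && !distinct) {
    build_select_clause(select, distinct, top)
  } else {
    build_select_clause(select, distinct, NULL)
  }
}

# Dummy connection objects; only their class matters for dispatch here.
con_td <- structure(list(), class = "Teradata")
con_other <- structure(list(), class = "OtherDB")

td_distinct <- select_clause(con_td, "*", distinct = TRUE, top = 11)
td_plain <- select_clause(con_td, "*", distinct = FALSE, top = 11)
other_distinct <- select_clause(con_other, "*", distinct = TRUE, top = 11)
```

With this sketch, only the Teradata method suppresses TOP for distinct queries; other backends keep the default behaviour.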
The problem is that when you print a lazy frame, it automatically limits the result to reduce the amount of computation that the database performs. Unless there's a better way to limit the result size in Teradata, I'd suggest that sql_query_select.Teradata warn if distinct is TRUE and top is not NULL, and ignore the value of top.
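As a rough illustration of that suggestion, the check could be isolated in a small helper. This is a sketch only: `resolve_top()` is a hypothetical name, and where such a check would sit inside sql_query_select.Teradata is an assumption.

```r
# Hypothetical helper: decide which `top` value to pass on when
# building a Teradata SELECT statement.
resolve_top <- function(distinct, top) {
  if (isTRUE(distinct) && !is.null(top)) {
    # Teradata rejects DISTINCT combined with TOP, so warn and drop the limit.
    warning(
      "Teradata does not support TOP with DISTINCT; ignoring `top`.",
      call. = FALSE
    )
    return(NULL)
  }
  top
}

# The limit is kept for non-distinct queries and dropped (with a warning)
# for distinct ones.
kept <- resolve_top(distinct = FALSE, top = 11)
dropped <- suppressWarnings(resolve_top(distinct = TRUE, top = 11))
```

The trade-off is that a printed distinct query would no longer be limited on the database side, which is why a warning rather than a silent drop is suggested.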