Teradata Translation - Feature Request #3040

happyshows · 2017-08-22T13:41:19Z

Adding Teradata SQL translation to the dbplyr package. I will start working with Edgar on this.

edgararuiz-zz · 2017-08-23T13:44:23Z

Thank you for looking into this.

The translation should be a new R script in the dbplyr repo called db-odbc-teradata.R: https://github.com/tidyverse/dbplyr. I'd suggest starting with a copy of the MSSQL translation (https://github.com/tidyverse/dbplyr/blob/master/R/db-odbc-mssql.R) and modifying it to fit Teradata's SQL syntax. I suggest MSSQL because the syntax for selecting the top rows looks similar in both variants.

For more detailed info about how translations work, please also review the SQL Translation article: http://db.rstudio.com/translation

I'm curious: at this point, when you attempt to use dplyr with a Teradata connection, where does it fail? It seems that the SQL statements are standard enough that some operations should work, is that correct?

happyshows · 2017-08-23T13:53:02Z

I don't have a way to dig deeper as the tbl command failed in the first place.

I believe S3 dispatch sent the tbl call to the default method, which uses LIMIT syntax.

> tbl(conn,'FISCAL_DAY')
Error in new_result(connection@ptr, statement) : 
  nanodbc/nanodbc.cpp:1344: 42000: [Teradata][ODBC Teradata Driver][Teradata Database] Syntax error: expected something between the word 'FISCAL_DAY' and the 'LIMIT' keyword.

happyshows · 2017-08-23T14:01:21Z

I also tried a trick to get past the first step, but no luck:

> class(conn)
[1] "Teradata"
attr(,"package")
[1] ".GlobalEnv"
> class(conn)<-'Microsoft SQL Server'
> class(conn)
[1] "Microsoft SQL Server"
> tbl(conn,'FISCAL_DAY')
Error in UseMethod("tbl") : 
  no applicable method for 'tbl' applied to an object of class "Microsoft SQL Server"

edgararuiz-zz · 2017-08-23T14:02:48Z

Ok, yeah, the resulting SQL is something like SELECT * FROM FISCAL_DAY LIMIT 6.

I think pointing MSSQL's select function to Teradata may let us run some queries

sql_select.Teradata <- `sql_select.Microsoft SQL Server`
tbl(conn,'FISCAL_DAY')
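
The aliasing idea can be illustrated without a database. The sketch below uses a toy generic, select_rows(), as a stand-in for sql_select(); the generic, its methods, and the empty conn object are all hypothetical and are not dbplyr's actual code. It only shows why a Teradata object falls through to the default (LIMIT) method, and how assigning an existing method to a new class name changes that.

```r
# Toy generic standing in for sql_select(); every name here is hypothetical.
select_rows <- function(con, table, n) UseMethod("select_rows")

# Default method: emits LIMIT syntax, which Teradata rejects.
select_rows.default <- function(con, table, n) {
  paste0("SELECT * FROM ", table, " LIMIT ", n)
}

# Method for a class whose name contains spaces, as with "Microsoft SQL Server".
`select_rows.Microsoft SQL Server` <- function(con, table, n) {
  paste0("SELECT TOP ", n, " * FROM ", table)
}

# A bare object that only carries the class used for dispatch.
conn <- structure(list(), class = "Teradata")

# No Teradata method exists yet, so dispatch falls through to the default.
before <- select_rows(conn, "FISCAL_DAY", 6)

# Alias the existing method under the Teradata class name.
select_rows.Teradata <- `select_rows.Microsoft SQL Server`
after <- select_rows(conn, "FISCAL_DAY", 6)

print(before)
print(after)
```

After the alias is in place, the same call produces TOP syntax instead of LIMIT, with no change to the connection object itself.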

happyshows · 2017-08-23T14:12:46Z

After importing many internal functions from dbplyr, I can run the tbl command now. So what's the suggested next step? Should I fork dbplyr and create the R file? Are there any test steps I need to follow?

edgararuiz-zz · 2017-08-23T14:23:20Z

Yes, the two main functions to customize are sql_translate_env and sql_select.

As far as testing goes, I usually just have an RMarkdown document with multiple dplyr code chunks to make sure it's working. For more automated testing you can use the https://github.com/rstudio/dbtest package. The package is still a work in progress, and it creates a table in the database it is testing against, so you will need write access to it.
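
For a quick check that needs no database at all, the generated SQL can be inspected directly. This is a minimal sketch that assumes a dbplyr version recent enough to ship the Teradata backend and simulate_teradata(); the table and column names are made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# lazy_frame() describes a table without a live connection;
# simulate_teradata() stands in for a real Teradata connection.
# The column names are hypothetical.
fiscal_day <- lazy_frame(day_id = 1L, day_name = "Mon", con = simulate_teradata())

# Render the SQL that head() would send to the database.
head_sql <- fiscal_day %>%
  head(6) %>%
  sql_render()

print(head_sql)
```

Reading the printed query is enough to confirm whether a LIMIT clause is still being generated for this backend.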

Thank you again for working on this!

olwagees · 2017-09-05T23:41:55Z

Hello,

@happyshows

I ran into this same error trying to work with teradata today and am wondering if you could help. I'm fairly new to this.

Thank you!

happyshows · 2017-09-06T00:44:45Z

@olwagees
I'm trying to find some time to work on this. Basically, follow Edgar's first post: download the package, create a Teradata file, and rename the functions based on the SQL Server ones. That should get you connected to Teradata fine.

edgararuiz-zz · 2017-09-06T00:51:52Z

Hi @happyshows , if your version is working fine, would you mind sending a PR our way so we can make it part of the package? I can help with the testing if that's what's holding you up.

happyshows · 2017-09-06T13:36:25Z

@edgararuiz what I did was change the function names so that dispatch finds them correctly, but I remember some basic translations having problems. I will get back to you on this next week.

edgararuiz-zz · 2017-09-06T13:40:26Z

Sounds good, thanks

dfalbel · 2017-09-26T15:37:47Z

The only problem I found was that the LIMIT keyword does not exist in Teradata, so we should use SAMPLE.
Following what was done for Oracle, I reimplemented sql_select for Teradata like this:

sql_select.Teradata <- function(con, select, from, where = NULL,
                                group_by = NULL, having = NULL,
                                order_by = NULL,
                                limit = NULL,
                                distinct = FALSE,
                                ...) {
  out <- vector("list", 7)
  names(out) <- c("select", "from", "where", "group_by", "having", "order_by",
                  "limit")

  # Build each clause with dbplyr's internal (unexported) helpers
  out$select    <- dbplyr:::sql_clause_select(select, con, distinct)
  out$from      <- dbplyr:::sql_clause_from(from, con)
  out$where     <- dbplyr:::sql_clause_where(where, con)
  out$group_by  <- dbplyr:::sql_clause_group_by(group_by, con)
  out$having    <- dbplyr:::sql_clause_having(having, con)
  out$order_by  <- dbplyr:::sql_clause_order_by(order_by, con)

  # Using SAMPLE instead of LIMIT
  if (!is.null(limit) && !identical(limit, Inf)) {
    assertthat::assert_that(is.numeric(limit), length(limit) == 1L, limit > 0)
    out$limit <- dbplyr::build_sql(
      "SAMPLE ", dbplyr::sql(format(trunc(limit), scientific = FALSE)),
      con = con
    )
  }

  # Drop empty clauses and join the rest, one clause per line
  dbplyr::escape(unname(dbplyr:::compact(out)), collapse = "\n", parens = FALSE, con = con)
}

And everything works fine... @edgararuiz any chance this could be added to dbplyr?
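
One detail in the function above is worth a short illustration: the row count is formatted with scientific = FALSE. The base R sketch below shows why, assuming R's default print options; the limit value is arbitrary.

```r
# An arbitrary row count, large enough to trigger scientific notation.
limit <- 1e6

# Default formatting may switch to scientific notation,
# which is not a valid row count in a SQL clause.
default_text <- format(trunc(limit))

# scientific = FALSE forces plain digits.
plain_text <- format(trunc(limit), scientific = FALSE)

clause <- paste("SAMPLE", plain_text)

print(default_text)
print(clause)
```

Without the argument, a large limit could end up in the query in exponent form, which the database would not parse as an integer.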

bogdanrau · 2017-10-04T19:59:08Z

Also interested in Teradata. Following.

Fixes tidyverse/dplyr#3040

weixing777 · 2017-12-22T17:49:04Z

Hi, thank you for adding the Teradata translation. I installed the development version of dbplyr (1.1.0.9000), and it fixed the head translation I had an issue with. However, I don't think "not equal to" is translated properly.

For instance, if I run the following code:

f_app_dt_flat %>%
    filter(rec_sys_id!=3, app_sbmt_dt=="2017-12-01") %>%
    select(app_num , loan_num) %>%
    show_query()

Here is the query it generated:

<SQL>
SELECT "app_num", "loan_num"
FROM "f_app_dt_flat"
WHERE (("rec_sys_id" != 3.0) AND ("app_sbmt_dt" = '2017-12-01'))

I don't think Teradata accepts "!=" as "not equal to".

edgararuiz-zz · 2017-12-22T18:22:45Z

Hi @weixing777, I have a fix for this in this branch: devtools::install_github("edgararuiz/dbplyr", ref = "fix-ter"). Can you take a look to confirm it works on your side before I send a PR over? Thanks for reporting it!

weixing777 · 2017-12-22T18:29:44Z

@edgararuiz Thank you for the quick response! It works!

edgararuiz-zz · 2017-12-22T18:33:44Z

Awesome, I just opened the PR for it: tidyverse/dbplyr#61
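
The thread does not show what the fix looks like, so the following is only a sketch of one way such an operator override can be written with dbplyr's exported translation helpers. It assumes a connection class named "Teradata" and the sql_translate_env() method naming used by dbplyr at the time; it is not necessarily what the PR does.

```r
library(dbplyr)

# Hypothetical translation environment for connections of class "Teradata".
sql_translate_env.Teradata <- function(con) {
  sql_variant(
    # Scalar functions: inherit the defaults and override `!=` only.
    sql_translator(.parent = base_scalar,
      `!=` = sql_infix("<>")
    ),
    # Aggregate and window functions: keep the defaults.
    sql_translator(.parent = base_agg),
    sql_translator(.parent = base_win)
  )
}

# The method ignores its argument, so it can be called directly for inspection.
variant <- sql_translate_env.Teradata(NULL)
```

Only the scalar translator is changed here, so every other R expression keeps its default translation.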

jakefrost · 2018-02-07T21:21:09Z

Hi Edgar, thanks for all your work on the Teradata translation. One issue I've come across is that PARTITION BY window functions generated by dbplyr don't include an ORDER BY clause, but Teradata requires one for the query to run.

For example, if I run this code:

flights %>% 
  group_by(record_id, record_create_dt, acct_num) %>% 
  mutate(rn = row_number()) %>% 
  show_query()

this is the SQL generated:

SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
	,row_number() OVER (PARTITION BY "record_id", "record_create_dt", "acct_num", "dep_dt") AS "rn"
FROM (SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
FROM cdw.flights) "iljxeikdep"
WHERE (("dep_dt" = '2017-01-21') AND ("origin" = 'DEN'))

However, it fails with the following error:

Error in new_result(connection@ptr, statement) : 
  nanodbc/nanodbc.cpp:1344: 42000: [Teradata][ODBC Teradata Driver][Teradata Database] Syntax error: expected something between the word 'dep_dt' and ')'.

I then tweaked the generated SQL to add an ORDER BY clause and ran it directly in Teradata:

SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
	,row_number() OVER (PARTITION BY "record_id", "record_create_dt", "acct_num", "dep_dt" ORDER BY "record_id", "record_create_dt") AS "rn"
FROM (SELECT "record_id", "record_create_dt", "acct_num", "dep_dt", "origin"
FROM cdw.flights) "iljxeikdep"
WHERE (("dep_dt" = '2017-01-21') AND ("origin" = 'DEN'))

and it ran without error.

Could an update be made where an ORDER BY clause is generated within PARTITION BY window functions, at least for tbl_teradata objects?

happyshows · 2018-02-07T22:22:16Z

@jakefrost can you file a separate issue for better visibility? Also, I don't believe ORDER BY is required alongside PARTITION BY...

jakefrost · 2018-02-08T01:07:38Z

@happyshows Sure thing--I opened a new issue here. I did a bit more research, and you're right about it not being a requirement of Partition by. Maybe it has to do with row_number()? I'm not sure. Thanks for the response!
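
If the missing ORDER BY is indeed the cause, one possible workaround on the R side is to state the window ordering explicitly with dbplyr's window_order(). The sketch below assumes a dbplyr version that provides simulate_teradata(), and uses a made-up table definition with a subset of the columns from the example above.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for the flights table; no live connection is needed.
flights <- lazy_frame(
  record_id = 1L, record_create_dt = "2017-01-21", acct_num = 1L,
  con = simulate_teradata()
)

# window_order() sets the ordering used inside the OVER (...) clause.
ranked_sql <- flights %>%
  group_by(record_id, acct_num) %>%
  window_order(record_create_dt) %>%
  mutate(rn = row_number()) %>%
  sql_render()

print(ranked_sql)
```

The rendered query can then be compared with the hand-edited SQL that ran successfully in Teradata.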

lock · 2018-08-07T01:16:57Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

hadley added the database and feature (a feature request or enhancement) labels Aug 23, 2017

edgararuiz-zz mentioned this issue Oct 23, 2017

Adds Teradata translation tidyverse/dbplyr#43

Merged

hadley added the wip (work in progress) label Oct 23, 2017

hadley closed this as completed in tidyverse/dbplyr#43 Oct 25, 2017

hadley pushed a commit to tidyverse/dbplyr that referenced this issue Oct 25, 2017

Adds Teradata translation (#43)

0e2075c

Fixes tidyverse/dplyr#3040

lock bot locked and limited conversation to collaborators Aug 7, 2018
