Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redshift Incorrect SQL Translation for stringr::str_sub #327

Closed
samstiyer opened this issue Jun 22, 2019 · 3 comments
Closed

Redshift Incorrect SQL Translation for stringr::str_sub #327

samstiyer opened this issue Jun 22, 2019 · 3 comments
Labels
feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL

Comments

@samstiyer
Copy link

samstiyer commented Jun 22, 2019

Hi, since this is my first time ever posting a github issue I spoke with @edgararuiz to get sign off before posting, apologies if I did not follow best practices!

A co-worker and I found a translation issue when working with Amazon Redshift for stringr::str_sub(). The SQL translation is to SUBSTR, which is an unsupported Redshift function. Instead, the translation ought to be to SUBSTRING, which is supported.

Example

Table

# Creating Example Data
dbSendStatement(conn = redshift_connection, 
                statement = 
                "
                create temporary table category_stage
                (
                  catid smallint default 0,
                  catgroup varchar(10) default 'General',
                  catname varchar(10) default 'General',
                  catdesc varchar(50) default 'General'

                );"
)
                
# Inserting Example Values
dbSendStatement(conn = redshift_connection, 
                statement = 
                  "
                insert into category_stage values
                (12, 'Concerts', 'Comedy', 'All stand-up comedy performances');
"
)

stringr::str_sub - show_query() example

category_stage_db <- tbl(redshift_connection, "category_stage")


category_stage_db %>% 
  mutate(catgroup_sub_string = str_sub(catgroup, 1, 5)) %>% 
  show_query()

Result

<SQL>
SELECT "catid", "catgroup", "catname", "catdesc", SUBSTR("catgroup", 1, 5) AS "catgroup_sub_string"
FROM "category_stage"

stringr::str_sub Executing Query

category_stage_db %>% 
  mutate(catgroup_sub_string = str_sub(catgroup, 1, 5))

Result

Error: <SQL> 'SELECT "catid", "catgroup", "catname", "catdesc", SUBSTR("catgroup", 1, 5) AS "catgroup_sub_string"
FROM "category_stage"
LIMIT 11'
  nanodbc/nanodbc.cpp:1587: XX000: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState XX000] ERROR:  SUBSTR() function is not supported (Hint: use SUBSTRING instead)
DETAIL:  
  -----------------------------------------------
  error:  SUBSTR() function is not supported (Hint: use SUBSTRING instead)
  code:      8001
  context:   
  query:     4141058
  location:  cg_expr.cpp:1970
  process:   padbmaster [pid=14742]
  -----------------------------------------------

str_sub & SUBSTRING comparison

Further, L is required when using the supported SUBSTRING in a mutate statement and the arguments are the index of the string you wish to start at and the number of characters after the start. This is different than stringr::str_sub():

stringr::str_sub(string, start, end)

SUBSTRING(string, start, numberofcharacters)

stringr::str_sub()

stringr::str_sub("Concerts", 2, 5)

Result

"once"

SUBSTRING

category_stage_db %>% 
  mutate(catgroup_sub_string = SUBSTRING(catgroup, 2L, 5L))

Result

# Source:   lazy query [?? x 2]
# Database: Redshift 8.0.2 [redshift_connection user & url]
  catgroup catgroup_sub_string
  <chr>    <chr>              
1 Concerts oncer     

Example of not using L in SUBSTRING

category_stage_db %>% 
  mutate(catgroup_sub_string = SUBSTRING(catgroup, 3, 2))
Error: <SQL> 'SELECT "catid", "catgroup", "catname", "catdesc", SUBSTRING("catgroup", 3.0, 2.0) AS "catgroup_sub_string"
FROM "category_stage"
LIMIT 11'
  nanodbc/nanodbc.cpp:1587: XX000: [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState XX000] ERROR:  Not implemented
DETAIL:  
  -----------------------------------------------
  error:  Not implemented
  code:      1001
  context:   'false' - Use of complex SQL function substring() is not supported 
  query:     4141142
  location:  cg_expr.cpp:309
  process:   padbmaster [pid=14742]
  -----------------------------------------------

Hopefully this was helpful!

@samstiyer samstiyer changed the title Redshift Incorrect SQL Translation for stringr::sub_string Redshift Incorrect SQL Translation for stringr::str_sub Jun 22, 2019
@hadley
Copy link
Member

hadley commented Dec 13, 2019

Minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

lazy_frame(x = "abc", con = simulate_postgres()) %>% 
  mutate(x = str_sub(x, 1, 5))
#> <SQL>
#> SELECT SUBSTR(`x`, 1, 5) AS `x`
#> FROM `df`

Created on 2019-12-13 by the reprex package (v0.3.0)

This appears to be valid postgres code, but isn't valid for redshift. Currently redshift uses exactly the same translations as postgres, so we'd have to add a custom redshift translations before we could fix that. To make that easier could you please give me the result of class(redshift_connection)?

@hadley hadley added feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL labels Dec 13, 2019
@kmishra9
Copy link

Here you go

> class(redshift_connection)
[1] "PqConnection"
attr(,"package")
[1] "RPostgres"

@kmishra9
Copy link

kmishra9 commented Apr 2, 2020

Another thing here -- if its possible to check which of the substr and substring functions are available and use one that's available during translation, that would be helpful for avoiding errors. For whatever reason, the instance of redshift I'm connected to, only supports one of them, which can be a gotcha.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL
Projects
None yet
Development

No branches or pull requests

3 participants