Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pmin and pmax fail in mutate connected to SQL Server #118

Closed
ghost opened this issue Jun 25, 2018 · 11 comments
Closed

pmin and pmax fail in mutate connected to SQL Server #118

ghost opened this issue Jun 25, 2018 · 11 comments
Labels
bug func trans 🌍
Milestone

Comments

@ghost
Copy link

@ghost ghost commented Jun 25, 2018

@jnolis commented on Nov 28, 2017, 6:36 PM UTC:

Occasionally, I will need to use pmin or pmax to replace values above or below a threshold, for instance:

require(dplyr)
data_frame(X = 1:10) %>%
  mutate(Y = pmax(X,5))

However, when I use dbplyr to try and do this using a SQL Server table, I get the following error:

tbl(con,"OrderInfo") %>%
  mutate(Test = pmax(Revenue,0))
Error: <SQL> 'SELECT  TOP 10 "CustomerId", "OrderId", "ClientOrderId", "Channel", "OrderDate", "Revenue", "Cost", MAX("Revenue", 0.0) AS "Test"
FROM "OrderInfo"'
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The MAX function requires 1 argument(s). 

As you can see, the SQL code generator assumes that I want to do a pairwise MAX, however SQL Server does not support that (only taking the maximum over a single column).

A cursory Stack Overflow search suggests that to do this operation a case statement could be used. I think either the SQL code generator should either be updated to do this, or an error should be generated at the time of calling the mutate.

This issue was moved by krlmlr from tidyverse/dplyr/issues/3227.

@ghost
Copy link
Author

@ghost ghost commented Jun 25, 2018

@edgararuiz commented on Dec 2, 2017, 3:31 AM UTC:

You're correct, pmax and pmin is not currently supported. So this is a feature request. Thank you for the SO link.

@ghost
Copy link
Author

@ghost ghost commented Jun 25, 2018

@DavisVaughan commented on Dec 23, 2017, 1:04 PM UTC:

@edgararuiz, You guys could take the approach I used for Access. It only works for two columns, but that is the most common case.

                   # pmin/pmax for 2 columns
                   pmin          = function(x, y) {
                     sql_expr(iif(UQ(x) <= UQ(y), !!x, !!y))
                   },

                   pmax          = function(x, y) {
                     sql_expr(iif(UQ(x) <= UQ(y), !!y, !!x))
                   },

@ghost
Copy link
Author

@ghost ghost commented Jun 25, 2018

@krlmlr commented on Jan 17, 2018, 9:14 AM UTC:

@DavisVaughan: Macro expansion is a slippery slope, what if x and y are complex expressions?

Out of curiosity: Which databases actually support pairwise MIN() and MAX()? Sqlite3 does, Postgres and MySQL don't. Maybe we should just remove the default translation of pmin and pmax and enable it only for the databases that support this?

@ghost
Copy link
Author

@ghost ghost commented Jun 25, 2018

@edgararuiz commented on Jan 17, 2018, 1:45 PM UTC:

You're right, we would enable it only for those who support it. I believe can also do pmin and pmax

@ghost
Copy link
Author

@ghost ghost commented Jun 25, 2018

@DavisVaughan commented on Jan 17, 2018, 2:46 PM UTC:

For completeness, Postgres and MySQL have GREATEST / LEAST which do what you want.
dbGetQuery(con, "SELECT GREATEST(drat, wt) FROM mtcars").
Works with any number of columns like pmax / pmin. This should probably be implemented right? (No such function for MSSQL)

Docs for Postgres at the bottom here

Docs for MySQL here

A stack overflow Q/A confirming the equivalence of Postgres GREATEST and SQLite pairwise max() here.

@ghost
Copy link
Author

@ghost ghost commented Jun 25, 2018

@hadley commented on Jun 7, 2018, 11:43 PM UTC:

Let's remove it from the default translation, and add to Postgres, MySQL, and SQLite.

Is it worth making the default translation an error?

@ghost
Copy link
Author

@ghost ghost commented Jun 25, 2018

@krlmlr commented on Jun 18, 2018, 8:46 AM UTC:

If we don't translate pmin() and pmax() (and apply dbplyr's default passthrough for unknown functions), the database gives a helpful error message. It didn't work before, so by changing the error message we don't introduce any unwanted behavior, I think.

@jyypma
Copy link

@jyypma jyypma commented Jul 30, 2018

For reference, Oracle also uses LEAST and GREATEST.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions060.htm

@jyypma
Copy link

@jyypma jyypma commented Jul 30, 2018

LEAST and GREATEST in PostgreSQL (and perhaps also MySQL and Oracle) may not provide the same implementation as pmin and pmax in R in the presence of NA values. See for instance the example below

dat <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7, NA, NA), 
                  b = c(4, 5, 6, 1, 2, 3, NA, 8, NA))
dat %>% mutate(maxval = pmax(a, b), minval = pmin(a, b))
   a  b maxval minval
1  1  4      4      1
2  2  5      5      2
3  3  6      6      3
4  4  1      4      1
5  5  2      5      2
6  6  3      6      3
7  7 NA     NA     NA
8 NA  8     NA     NA
9 NA NA     NA     NA

The same result with LEAST and GREATEST from PostgreSQL gives

tbl_ref %>% mutate(maxval = sql("GREATEST(a,b)"), minval = sql("LEAST(a,b)"))
# Source:   lazy query [?? x 4]
# Database: postgres 10.0.3
      a     b maxval minval
  <dbl> <dbl>  <dbl>  <dbl>
1     1     4      4      1
2     2     5      5      2
3     3     6      6      3
4     4     1      4      1
5     5     2      5      2
6     6     3      6      3
7     7    NA      7      7
8    NA     8      8      8
9    NA    NA     NA     NA

@hadley hadley added bug func trans 🌍 labels Jan 2, 2019
@hadley
Copy link
Member

@hadley hadley commented Jan 2, 2019

We should probably make least() and greatest() the default translation. @jyypma In general, we can't provide 100% equivalent for NA vs NULL behaviour so the minor differences are not a problem.

@hadley hadley added this to the v1.4.0 milestone Jan 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug func trans 🌍
Projects
None yet
Development

No branches or pull requests

2 participants