Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pmin and pmax fail in mutate connected to SQL Server #118

Closed
move bot opened this issue Jun 25, 2018 · 11 comments
Milestone

Comments

@move
Copy link

@move move bot commented Jun 25, 2018

@jnolis commented on Nov 28, 2017, 6:36 PM UTC:

Occasionally, I will need to use pmin or pmax to replace values above or below a threshold, for instance:

require(dplyr)
data_frame(X = 1:10) %>%
  mutate(Y = pmax(X,5))

However, when I use dbplyr to try and do this using a SQL Server table, I get the following error:

tbl(con,"OrderInfo") %>%
  mutate(Test = pmax(Revenue,0))
Error: <SQL> 'SELECT  TOP 10 "CustomerId", "OrderId", "ClientOrderId", "Channel", "OrderDate", "Revenue", "Cost", MAX("Revenue", 0.0) AS "Test"
FROM "OrderInfo"'
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The MAX function requires 1 argument(s). 

As you can see, the SQL code generator assumes that I want to do a pairwise MAX, however SQL Server does not support that (only taking the maximum over a single column).

A cursory Stack Overflow search suggests that to do this operation a case statement could be used. I think either the SQL code generator should either be updated to do this, or an error should be generated at the time of calling the mutate.

This issue was moved by krlmlr from tidyverse/dplyr/issues/3227.

@move

This comment has been minimized.

Copy link
Author

@move move bot commented Jun 25, 2018

@edgararuiz commented on Dec 2, 2017, 3:31 AM UTC:

You're correct, pmax and pmin is not currently supported. So this is a feature request. Thank you for the SO link.

@move

This comment has been minimized.

Copy link
Author

@move move bot commented Jun 25, 2018

@DavisVaughan commented on Dec 23, 2017, 1:04 PM UTC:

@edgararuiz, You guys could take the approach I used for Access. It only works for two columns, but that is the most common case.

                   # pmin/pmax for 2 columns
                   pmin          = function(x, y) {
                     sql_expr(iif(UQ(x) <= UQ(y), !!x, !!y))
                   },

                   pmax          = function(x, y) {
                     sql_expr(iif(UQ(x) <= UQ(y), !!y, !!x))
                   },
@move

This comment has been minimized.

Copy link
Author

@move move bot commented Jun 25, 2018

@krlmlr commented on Jan 17, 2018, 9:14 AM UTC:

@DavisVaughan: Macro expansion is a slippery slope, what if x and y are complex expressions?

Out of curiosity: Which databases actually support pairwise MIN() and MAX()? Sqlite3 does, Postgres and MySQL don't. Maybe we should just remove the default translation of pmin and pmax and enable it only for the databases that support this?

@move

This comment has been minimized.

Copy link
Author

@move move bot commented Jun 25, 2018

@edgararuiz commented on Jan 17, 2018, 1:45 PM UTC:

You're right, we would enable it only for those who support it. I believe can also do pmin and pmax

@move

This comment has been minimized.

Copy link
Author

@move move bot commented Jun 25, 2018

@DavisVaughan commented on Jan 17, 2018, 2:46 PM UTC:

For completeness, Postgres and MySQL have GREATEST / LEAST which do what you want.
dbGetQuery(con, "SELECT GREATEST(drat, wt) FROM mtcars").
Works with any number of columns like pmax / pmin. This should probably be implemented right? (No such function for MSSQL)

Docs for Postgres at the bottom here

Docs for MySQL here

A stack overflow Q/A confirming the equivalence of Postgres GREATEST and SQLite pairwise max() here.

@move

This comment has been minimized.

Copy link
Author

@move move bot commented Jun 25, 2018

@hadley commented on Jun 7, 2018, 11:43 PM UTC:

Let's remove it from the default translation, and add to Postgres, MySQL, and SQLite.

Is it worth making the default translation an error?

@move

This comment has been minimized.

Copy link
Author

@move move bot commented Jun 25, 2018

@krlmlr commented on Jun 18, 2018, 8:46 AM UTC:

If we don't translate pmin() and pmax() (and apply dbplyr's default passthrough for unknown functions), the database gives a helpful error message. It didn't work before, so by changing the error message we don't introduce any unwanted behavior, I think.

@jyypma

This comment has been minimized.

Copy link

@jyypma jyypma commented Jul 30, 2018

For reference, Oracle also uses LEAST and GREATEST.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions060.htm

@jyypma

This comment has been minimized.

Copy link

@jyypma jyypma commented Jul 30, 2018

LEAST and GREATEST in PostgreSQL (and perhaps also MySQL and Oracle) may not provide the same implementation as pmin and pmax in R in the presence of NA values. See for instance the example below

dat <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7, NA, NA), 
                  b = c(4, 5, 6, 1, 2, 3, NA, 8, NA))
dat %>% mutate(maxval = pmax(a, b), minval = pmin(a, b))
   a  b maxval minval
1  1  4      4      1
2  2  5      5      2
3  3  6      6      3
4  4  1      4      1
5  5  2      5      2
6  6  3      6      3
7  7 NA     NA     NA
8 NA  8     NA     NA
9 NA NA     NA     NA

The same result with LEAST and GREATEST from PostgreSQL gives

tbl_ref %>% mutate(maxval = sql("GREATEST(a,b)"), minval = sql("LEAST(a,b)"))
# Source:   lazy query [?? x 4]
# Database: postgres 10.0.3
      a     b maxval minval
  <dbl> <dbl>  <dbl>  <dbl>
1     1     4      4      1
2     2     5      5      2
3     3     6      6      3
4     4     1      4      1
5     5     2      5      2
6     6     3      6      3
7     7    NA      7      7
8    NA     8      8      8
9    NA    NA     NA     NA
@hadley

This comment has been minimized.

Copy link
Member

@hadley hadley commented Jan 2, 2019

We should probably make least() and greatest() the default translation. @jyypma In general, we can't provide 100% equivalent for NA vs NULL behaviour so the minor differences are not a problem.

@hadley hadley added this to the v1.4.0 milestone Jan 9, 2019
@hadley hadley closed this in 7667461 Jan 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.