Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Median translated incorrectly for SQLite #357

Closed
GrayAlex49 opened this issue Sep 10, 2019 · 11 comments
Closed

Median translated incorrectly for SQLite #357

GrayAlex49 opened this issue Sep 10, 2019 · 11 comments
Labels
feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL

Comments

@GrayAlex49
Copy link

Using dplyr to work with data from a database, the median function does not seem to be translating correctly. Mean works as expected but median does not. I'm using a mysql database but can recreate the problem in sqlite.

mf <- memdb_frame(group = c("a", "a", "b", "b", "c"), 
                  year = c(2010, 2010, 2012, 2013, 2010), 
                  var1 = 5:1,
                  var2 = 100:104)

mf %>% 
  group_by(group, year) %>% 
  summarise_at(vars(one_of("var1", "var2")), median)


mf %>% 
  group_by(group, year) %>% 
  summarise_at(vars(one_of("var1", "var2")), mean)

This error is rather unhelpful, on my actual data I was getting something like

'AS temp, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY var1) AS var1, at line 1 [1064]

@shntnu
Copy link

shntnu commented Oct 18, 2019

The problem is that PERCENTILE_CONT is not supported in SQLite

data <- data.frame(x = c(1, 2))

db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

data <- dplyr::copy_to(db, data)

dbplyr::sql_render(dplyr::summarize(data, x = mean(x)))
#> Warning: Missing values are always removed in SQL.
#> Use `mean(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> <SQL> SELECT AVG(`x`) AS `x`
#> FROM `data`

dplyr::summarize(data, x = mean(x))
#> # Source:   lazy query [?? x 1]
#> # Database: sqlite 3.29.0 [:memory:]
#>       x
#>   <dbl>
#> 1   1.5

dbplyr::sql_render(dplyr::summarize(data, x = median(x)))
#> <SQL> SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY `x`) AS `x`
#> FROM `data`

dplyr::summarize(data, x = median(x))
#> Error: near "(": syntax error

Created on 2019-10-18 by the reprex package (v0.3.0)

@shntnu
Copy link

shntnu commented Oct 19, 2019

This commit might have introduced the bug 19909fa.
This issue looks relevant #169

@GrayAlex49
Copy link
Author

My original code and error was with MariaDB, I just didn't know how to do a reprex for that. But it looks like you are right which is why my error messages were different.

@krlmlr

This comment has been minimized.

@shntnu

This comment has been minimized.

@krlmlr

This comment has been minimized.

@shntnu

This comment has been minimized.

@hadley
Copy link
Member

hadley commented Dec 13, 2019

@GrayAlex49 MariaDB does support PERCENTILE_CONT according to the documentation: https://mariadb.com/kb/en/library/percentile_cont/

MEDIAN is not built into SQLite, but it is included in the extension functions bundled by RSQLite: https://github.com/r-dbi/RSQLite/blob/master/src/vendor/sqlite3/extension-functions.c

@hadley hadley added feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL labels Dec 13, 2019
@shntnu

This comment has been minimized.

@hadley
Copy link
Member

hadley commented Sep 17, 2020

The original bug was about MySQL, but no reprex was provided. There seems to be a separate bug with SQLite where it's using PERCENTILE_CONT, but should instead be using MEDIAN.

@hadley hadley changed the title Median Translation Issue Median translated incorrectly for SQLite Sep 17, 2020
@hadley
Copy link
Member

hadley commented Sep 22, 2020

Fixed in a79c441

@hadley hadley closed this as completed Sep 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL
Projects
None yet
Development

No branches or pull requests

4 participants