Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Median is Translated wrongly for MSSQL in summarize #1008

Closed
thothal opened this issue Sep 15, 2022 · 3 comments · Fixed by #1069
Closed

Median is Translated wrongly for MSSQL in summarize #1008

thothal opened this issue Sep 15, 2022 · 3 comments · Fixed by #1069
Labels
bug an unexpected problem or unintended behavior func trans 🌍 Translation of individual functions to SQL
Milestone

Comments

@thothal
Copy link

thothal commented Sep 15, 2022

Calculating the overall median does not yield valid SQL

library(dbplyr)
library(dplyr)
d <- lazy_frame(x = 1:10, con = simulate_mssql())
d %>% summarize(med = median(x, na.rm = TRUE))
# <SQL>
# SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY `x`) AS `med`
# FROM `df`

The proper SQL syntax would look like:

--- Set up
DROP TABLE IF EXISTS #temp;
WITH rcte (x) AS (
  SELECT 1 AS x
   UNION ALL
  SELECT x + 1
    FROM rcte
   WHERE x <= 100
) 
SELECT x
  INTO #temp
  FROM rcte;

--- SQL should look like this
SELECT DISTINCT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY x) OVER () AS med
  FROM #temp;

Thus instead of

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY `x`) AS `med`
  FROM `df`;

we would expect

SELECT DISTINCT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY `x`) OVER() AS `med`
  FROM `df`;
@hadley
Copy link
Member

hadley commented Nov 21, 2022

Could you please link to the appropriate SQL server docs for this?

@thothal
Copy link
Author

thothal commented Nov 22, 2022

PERCENTILE_CONT

The OVER clause is not optional.

@hadley hadley added bug an unexpected problem or unintended behavior func trans 🌍 Translation of individual functions to SQL labels Dec 5, 2022
@hadley hadley added this to the 2.3.0 milestone Dec 5, 2022
@hadley
Copy link
Member

hadley commented Dec 5, 2022

Ooops, looks like we already fixed this for quantile(), but we forgot about median():

https://github.com/tidyverse/dbplyr/blob/main/R/backend-mssql.R#L361

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior func trans 🌍 Translation of individual functions to SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants