Add support for quantile() #169

Closed
halldc opened this issue Sep 26, 2018 · 15 comments
Labels
feature func trans 🌍 help wanted ❤️ tidy-dev-day 🤓

@halldc halldc commented Sep 26, 2018

Would it be possible to add support for the quantile() function?

Quite a few databases support PERCENTILE_CONT and PERCENTILE_DISC, which I think could make this possible (e.g. Google BigQuery, PostgreSQL, and Redshift).

I'd be willing to help of course, but would need some pointers on where to start.


@hadley hadley added feature help wanted ❤️ func trans 🌍 labels Jan 2, 2019

@hadley hadley added this to the v1.4.0 milestone Jan 9, 2019

@batpigandme batpigandme added the tidy-dev-day 🤓 label Jan 19, 2019
Member

@hadley hadley commented Feb 6, 2019

@edavidaja that is perfect, thank you!

Member

@hadley hadley commented Feb 6, 2019

A few more tweaks to make it a bit easier for me to parse:

Aggregation function:

Window function:

  • sql-server: PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)
  • postgres: PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)
  • redshift: PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)
  • salesforce: PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)
  • mariadb: PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)

Member

@hadley hadley commented Feb 6, 2019

For the databases that support the PERCENTILE_CONT() window function, I don't see how we can use it in an aggregation (i.e. summarise()) context.

The SQL Server docs seem to be the only place that addresses this problem; they suggest using DISTINCT:

SELECT DISTINCT DepartmentName  
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY BaseRate)  
    OVER (PARTITION BY DepartmentName) AS MedianCont  
,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY BaseRate)  
    OVER (PARTITION BY DepartmentName) AS MedianDisc  
FROM dbo.DimEmployee;  

So maybe the best we can do is supply translations for those as window functions, and then suggest that the user follow up with distinct()?
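
To see why DISTINCT is needed with the window form, here is a rough base-R analogy (not dbplyr code, and not taken from the SQL Server docs). `ave()` stands in for the `OVER (PARTITION BY ...)` window and `unique()` for `DISTINCT`. It uses the built-in `mtcars` data, with `cyl` and `mpg` playing the roles of `DepartmentName` and `BaseRate`; the names `windowed` and `collapsed` are purely illustrative.

```r
# Window-style result: the group median is repeated on every row of its group,
# analogous to PERCENTILE_CONT(0.5) ... OVER (PARTITION BY cyl).
windowed <- data.frame(
  cyl = mtcars$cyl,
  median_mpg = ave(mtcars$mpg, mtcars$cyl, FUN = median)
)

# DISTINCT-style collapse: drop the repeated rows so one row per group remains.
collapsed <- unique(windowed)
collapsed <- collapsed[order(collapsed$cyl), ]

nrow(windowed)   # one row per car
nrow(collapsed)  # one row per cyl group
```

The window step keeps the original row count, which is why an extra de-duplication step would be needed to get a summarise()-like result.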

Member

@hadley hadley commented Feb 6, 2019

Oops, those aren't window functions, but are "ordered-set aggregate" functions, and they work just fine with GROUP BY:

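library(DBI)

# Connect to PostgreSQL; assumes a table named mtcars already exists in the database
con <- dbConnect(RPostgres::Postgres())

# PERCENTILE_CONT is an ordered-set aggregate, so it combines directly with GROUP BY
dbGetQuery(con, "
  SELECT cyl, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY mpg)
  FROM mtcars
  GROUP BY cyl
")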
#>   cyl percentile_cont
#> 1   4            26.0
#> 2   6            19.7
#> 3   8            15.2

Created on 2019-02-06 by the reprex package (v0.2.1.9000)
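
As a local cross-check, here is a minimal base-R sketch. It assumes that `PERCENTILE_CONT` interpolates linearly the way `quantile()` does with its default `type = 7`, and that `PERCENTILE_DISC` behaves like `type = 1` (the inverse of the empirical CDF). It runs on the built-in `mtcars` data rather than the database table, and `cont` and `disc` are illustrative names.

```r
# Continuous quantile per group: linear interpolation between order statistics
# (type = 7), assumed here to match PERCENTILE_CONT.
cont <- tapply(mtcars$mpg, mtcars$cyl, quantile,
               probs = 0.5, type = 7, names = FALSE)

# Discrete quantile per group: always returns an observed value (type = 1),
# assumed here to match PERCENTILE_DISC.
disc <- tapply(mtcars$mpg, mtcars$cyl, quantile,
               probs = 0.5, type = 1, names = FALSE)

cont
disc
```

For the median, `cont` should agree with the `percentile_cont` column in the query output above. The two types can differ for other probabilities, because only the continuous version interpolates between observations.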

Author

@halldc halldc commented Feb 6, 2019

Thanks @hadley! 🎉

Member

@hadley hadley commented Mar 17, 2019

It looks like the Teradata link actually pointed to Teradata's distribution of Presto. Teradata itself appears to be ANSI-compliant: https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/RgAqeSpr93jpuGAvDTud3w
