Hi! 👋 I'm seeing slight differences in the way ntile() distributes "remainders" across buckets in dplyr versus in the database I'm using (Redshift).
For example, when we divide mtcars (32 rows) into 5 buckets, the two remainder rows are placed in buckets 1 and 3 (arrows mine).
library(dplyr)

mtcars_dplyr <- mtcars %>%
  mutate(bucket = ntile(mpg, 5)) %>%
  select(mpg, bucket)

mtcars_dplyr %>% count(bucket)
#> # A tibble: 5 x 2
#>   bucket     n
#>    <int> <int>
#> 1      1     7 <-----
#> 2      2     6
#> 3      3     7 <-----
#> 4      4     6
#> 5      5     6
Created on 2020-03-17 by the reprex package (v0.2.1)
When I do the same in the database, the remainders get distributed to the buckets in order. The Oracle docs describe it this way:
The remainder values (the remainder of number of rows divided by buckets) are distributed one for each bucket, starting with bucket 1.
library(dplyr)
library(dbplyr)

mtcars_db <- tbl_memdb(mtcars)

mtcars_db <- mtcars_db %>%
  mutate(bucket = ntile(mpg, 5)) %>%
  select(mpg, bucket) %>%
  collect()

mtcars_db %>% count(bucket)
#> # A tibble: 5 x 2
#>   bucket     n
#>    <int> <int>
#> 1      1     7 <-----
#> 2      2     7 <-----
#> 3      3     6
#> 4      4     6
#> 5      5     6
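To make the rule quoted from the Oracle docs concrete, here is a minimal base R sketch of database-style bucket sizing. The helper name `ntile_db_sketch` is hypothetical and this is not dplyr's implementation; it also assumes `x` has no missing values and breaks ties by position.

```r
# Hypothetical helper (not part of dplyr): assigns bucket numbers so that
# remainder rows go one each to the first buckets, as the database docs describe.
# Assumes x contains no NA values.
ntile_db_sketch <- function(x, n) {
  len <- length(x)
  base <- len %/% n    # minimum number of rows in every bucket
  extra <- len %% n    # remainder rows, given to buckets 1..extra

  # Bucket i gets one extra row when i <= extra
  sizes <- base + (seq_len(n) <= extra)
  buckets <- rep(seq_len(n), times = sizes)

  # The i-th smallest value of x receives the i-th bucket label
  out <- integer(len)
  out[order(x)] <- buckets
  out
}

# Bucket counts for mtcars (32 rows) split into 5 buckets
table(ntile_db_sketch(mtcars$mpg, 5))
```

With 32 rows and 5 buckets, each bucket gets 6 rows and the 2 remainder rows go to buckets 1 and 2, so the counts are 7, 7, 6, 6, 6, matching the database output above.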
Update: I updated RSQLite and was able to generate a reprex of the database part too.
Let me know if there's anything else I can do on my end!
The SQL Server docs say the same thing, so this looks like a problem with our implementation.
ntile() uses database logic: larger bins come first (#5054)
7ca1c04
Fixes #4995
Wow, quick turnaround. Thanks for your work!! 🥇