Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ntile() with window_order() #1235

Closed
iangow opened this issue Apr 4, 2023 · 5 comments
Closed

Add support for ntile() with window_order() #1235

iangow opened this issue Apr 4, 2023 · 5 comments

Comments

@iangow
Copy link

iangow commented Apr 4, 2023

This request is similar to #169. It seems that dbplyr will translate ntile when given a value for order_by, but not with window_order(). This seems to be a documentation issue, as many window functions are translated nicely with window_order().

Ideally one could also do this:

earthquakes %>%
  select(place, mag) %>%
  group_by(place) %>%
  window_order(mag) %>%
  mutate(ntile = ntile(4))

But only this form seems to work:

earthquakes %>%
  select(place, mag) %>%
  group_by(place) %>%
  mutate(ntile = ntile(4, order_by = "mag"))

Note that there is a slight difference between dplyr::ntile() and ntile() in dbplyr.
The former is ntile(x = row_number(), n), whereas the latter requires the order_by argument.
The dplyr version of the above is something like:

earthquakes %>%
  select(place, mag) %>%
  collect() %>%
  group_by(place) %>%
  arrange(mag) %>%              # No `window_order()` in dplyr; see #627
  mutate(ntile = ntile(n = 4))  # No order_by in dplyr version.

(Note that #627 was closed.)

This is a feature request and seems clear enough, so I have not included a reprex, but could do so if this helped.

As an aside, the "ideal" code above produces the error message below. Perhaps a clearer error message would be an alternative.

Error in `purrr::pmap()`:
ℹ In index: 3.
ℹ With name: ntile.
Caused by error in `UseMethod()`:
! no applicable method for 'as.sql' applied to an object of class "c('double', 'numeric')"
Backtrace:
  1. base (local) `<fn>`(x)
 30. dbplyr (local) ntile(4)
 31. dbplyr::win_over(...)
 32. dbplyr::as.sql(order, con = con)
@iangow
Copy link
Author

iangow commented Apr 4, 2023

As a follow-up to the above, in other places window_order() works but order_by does not.

This works fine:

earthquakes %>%
  filter(!is.na(mag), place == 'Northern California') %>%
  group_by(place) %>%
  window_order(mag) %>%
  mutate(percentile = percent_rank())

But this produces an error:

earthquakes %>%
  filter(!is.na(mag), place == 'Northern California') %>%
  group_by(place) %>%
  mutate(percentile = percent_rank(order_by = "mag"))

Error message:

Error in `purrr::pmap()`:
ℹ In index: 2.
ℹ With name: percentile.
Caused by error in `percent_rank()`:
! unused argument (order_by = "mag")

@mgirlich
Copy link
Collaborator

The error message in the follow-up makes sense. dplyr::percent_rank() doesn't have an argument order_by, so dbplyr shouldn't add one either.
The feature request from the first post makes sense. This will be done by fixing the interface of ntile() to match dplyr.

@iangow
Copy link
Author

iangow commented Apr 11, 2023

The error message in the follow-up makes sense. dplyr::percent_rank() doesn't have an argument order_by, so dbplyr shouldn't add one either.
The feature request from the first post makes sense. This will be done by fixing the interface of ntile() to match dplyr.

That makes sense (I've never used the order_by argument). While ntile() is a bit different in SQL from in dplyr, it seems more important to keep the handling of SQL window functions consistent.

BTW, I discovered this in "translating" a book on SQL to dbplyr: https://iangow.github.io/sql_book/

So far everything has worked, while dbplyr has ranged from "barely any more difficult than SQL" to "much easier than SQL". I think dbplyr is the "killer app" that makes R easier (for me) than Python.

@mgirlich
Copy link
Collaborator

I just glimpsed over your book but it already looks like some nice work! I'm happy to help with questions or bugs you encounter when working with dbplyr 😄

@iangow
Copy link
Author

iangow commented Apr 18, 2023

I just glimpsed over your book but it already looks like some nice work! I'm happy to help with questions or bugs you encounter when working with dbplyr 😄

This is definitely a weird one.

Note I am simply translating someone else's (excellent) book code from SQL to dbplyr. I did like this example, where I think dbplyr is just easier to work with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants