Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several related issues with fill() #1057

Closed
moodymudskipper opened this issue Nov 30, 2022 · 3 comments · Fixed by #1060
Closed

Several related issues with fill() #1057

moodymudskipper opened this issue Nov 30, 2022 · 3 comments · Fixed by #1060
Labels
feature a feature request or enhancement verb trans 🤖 Translation of dplyr verbs to SQL

Comments

@moodymudskipper
Copy link

fill() doesn't work as expected, see reprex.

library(tidyverse)
mydb <- DBI::dbConnect(RSQLite::SQLite(), "")
mydata <- tibble(id = 1:3, val = c(NA, "hello", NA))
DBI::dbWriteTable(mydb, "mydata", mydata)

# The message is a bit cryptic, it's not obvious that `arrange()` will do some
# magic since id is already sorted
# also, typo "determinstic"
dplyr::tbl(mydb, "mydata") %>% 
  tidyr::fill(val, .direction = "down")
#> Error in `tidyr::fill()`:
#> ✖ `.data` does not have explicit order.
#> ℹ Please use `arrange()` or `window_order()` to make determinstic.

# If I do use arrange, I get a warning that doesn't make sense for a R user
dplyr::tbl(mydb, "mydata") %>% 
  arrange(id) %>% 
  tidyr::fill(val, .direction = "down")
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#> # Source:     SQL [3 x 2]
#> # Database:   sqlite 3.39.4 []
#> # Ordered by: id
#>      id val  
#>   <int> <chr>
#> 1     1 <NA> 
#> 2     2 hello
#> 3     3 hello

# window order doesn't exist, it's a {dbplyr} function, not obvious for {dplyr} user who never use {dbplyr} explicitly
dplyr::tbl(mydb, "mydata") %>% 
  window_order(id) %>% 
  tidyr::fill(val, .direction = "up")
#> Error in window_order(., id): could not find function "window_order"

# this works perfectly
dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "down") 
#> # Source:     SQL [3 x 2]
#> # Database:   sqlite 3.39.4 []
#> # Ordered by: id
#>      id val  
#>   <int> <chr>
#> 1     1 <NA> 
#> 2     2 hello
#> 3     3 hello

# however if I fill "up" the order is reversed
dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "up") 
#> # Source:     SQL [3 x 2]
#> # Database:   sqlite 3.39.4 []
#> # Ordered by: id
#>      id val  
#>   <int> <chr>
#> 1     3 <NA> 
#> 2     2 hello
#> 3     1 hello

# and updown and downup are not supported
dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "updown") 
#> Error in `tidyr::fill()`:
#> ! `.direction` must be one of "down" or "up", not "updown".
#> ℹ Did you mean "down"?

dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "downup") 
#> Error in `tidyr::fill()`:
#> ! `.direction` must be one of "down" or "up", not "downup".
#> ℹ Did you mean "down"?

# though it is achievable
dplyr::tbl(mydb, "mydata") %>% 
  dbplyr::window_order(id) %>% 
  tidyr::fill(val, .direction = "up") %>% 
  tidyr::fill(val, .direction = "down") 
#> # Source:     SQL [3 x 2]
#> # Database:   sqlite 3.39.4 []
#> # Ordered by: id
#>      id val  
#>   <int> <chr>
#> 1     1 hello
#> 2     2 hello
#> 3     3 hello

Created on 2022-11-30 with reprex v2.0.2

I think we need :

  • A better error message : ditch the reference to arrange(), something like :

Please use call dbplyr::window_order() before fill() to set an explicit row order.

  • Fix row order after .direction = "up"
  • Support "downup" and "updown"
  • A note in the tidyr doc, or a link to a help file for the tbl_lazy method, documenting the row order issue
@mgirlich
Copy link
Collaborator

mgirlich commented Nov 30, 2022

Thanks for opening the issue. I'm not quite sure about the row ordering though.

The message is a bit cryptic, it's not obvious that arrange() will do some
magic since id is already sorted

I think in general the database can order the rows however it wants. So, it's not guaranteed to be sorted.

however if I fill "up" the order is reversed

The same as above. I don't think the order is guaranteed unless you explicitly arrange the data.

But the output should probably not keep the ordered by attribute. I'll have a look into this.

@moodymudskipper
Copy link
Author

moodymudskipper commented Nov 30, 2022

Fair point about the row ordering, the function name is a bit unfortunate though if we don't reorder, since filling "up" does fill "down" visually.

Another issue, that I can't reproduce with SQLite since it does not have date columns, but that I see with Oracle.

con %>% 
    tbl(dbplyr::in_schema("myschema", "mutable")) %>% 
    dbplyr::window_order(my_date_col) %>% 
    group_by(id) %>%
    fill( my_col, .direction = "up")

Error in new_result(connection@ptr, statement, immediate) :
nanodbc/nanodbc.cpp:1412: 00000: [RStudio][OracleOCI] (3000) Oracle Caller Interface: ORA-00932: inconsistent datatypes: expected NUMBER got DATE

Strangely this does work for .direction = "down"

@mgirlich
Copy link
Collaborator

mgirlich commented Dec 1, 2022

Fair point about the row ordering, the function name is a bit unfortunate though if we don't reorder, since filling "up" does fill "down" visually.

Unfortunately, this cannot really be avoided unless we add an extra arrange() call afterwards. As you can see in the query

SELECT `id`, MAX(`val`) OVER (PARTITION BY `..dbplyr_partion_1`) AS `val`
FROM (
  SELECT
    *,
    SUM(CASE WHEN ((`val` IS NULL)) THEN 0 ELSE 1 END) OVER (ORDER BY `id` DESC ROWS UNBOUNDED PRECEDING) AS `..dbplyr_partion_1`
  FROM `mydata`
)

the order is chosen by the database. SQLite orders it "the wrong way" (visually), other databases might order it completely random. This is just the way a database works.
So, an extra note also does not make sense to me as this could basically happen with every verb.

But in my PR I fixed:

  • the error message
  • support updown and downup
  • support non-numeric columns for ordering in the up direction.

You can install it via devtools::install_github("tidyverse/dbplyr#1060") and try it out.

@hadley hadley added feature a feature request or enhancement verb trans 🤖 Translation of dplyr verbs to SQL labels Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement verb trans 🤖 Translation of dplyr verbs to SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants