Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tally() returns incorrect result on database table with n variable #3075

Closed
JohnMount opened this issue Sep 4, 2017 · 7 comments
Closed
Labels
bug an unexpected problem or unintended behavior verbs 🏃‍♀️

Comments

@JohnMount
Copy link

dbplyr issue: dplyr::tally() results on database different than on data.frame. This is a very low priority for me as I do not use tally().

suppressPackageStartupMessages(library("dplyr"))
packageVersion("dplyr")
#> [1] '0.7.2.9000'
library("sparklyr")
packageVersion("sparklyr")
#> [1] '0.6.2'
packageVersion("dbplyr")
#> [1] '1.1.0.9000'
packageVersion("DBI")
#> [1] '0.7'

db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

dLocal <- data.frame(n = 1:3)
print(dLocal)
#>   n
#> 1 1
#> 2 2
#> 3 3

dRemote <- dplyr::copy_to(db, dLocal)
print(dRemote)
#> # Source:   table<dLocal> [?? x 1]
#> # Database: sqlite 3.19.3 [:memory:]
#>       n
#>   <int>
#> 1     1
#> 2     2
#> 3     3


tally(dLocal)
#> Using `n` as weighting variable
#>   nn
#> 1  6

tally(dRemote)
#> # Source:   lazy query [?? x 1]
#> # Database: sqlite 3.19.3 [:memory:]
#>      nn
#>   <int>
#> 1     3

DBI::dbDisconnect(db)
@hadley hadley changed the title dbplyr issue: dplyr::tally() results on database different than on data.frame tally() returns incorrect result on database table with n variable Oct 23, 2017
@hadley
Copy link
Member

hadley commented Oct 23, 2017

Minimal reprex

library(dplyr, warn.conflicts = FALSE)

db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

dLocal <- data.frame(n = 1:3)
tally(dLocal)
#> Using `n` as weighting variable
#>   nn
#> 1  6

dRemote <- dplyr::copy_to(db, dLocal)
tally(dRemote)
#> # Source: lazy query [?? x 1]
#> # Database: sqlite 3.19.3 [:memory:]
#>      nn
#>   <int>
#> 1     3

Note the absence of "Using n as weighting variable`

@hadley
Copy link
Member

hadley commented Oct 23, 2017

(probably due to use of names() instead of tbl_vars() on line 64)

@hadley hadley added bug an unexpected problem or unintended behavior verbs 🏃‍♀️ labels Oct 23, 2017
@krlmlr
Copy link
Member

krlmlr commented Mar 13, 2018

There's another inconsistency between tally() and count(), which is been present in v0.5.0 already (tested with #3412):

library(tidyverse)
tibble(a = 1:3, n = 4:6) %>% group_by(a) %>% tally()
#> Using `n` as weighting variable
#> # A tibble: 3 x 2
#>       a    nn
#>   <int> <int>
#> 1     1     4
#> 2     2     5
#> 3     3     6
tibble(a = 1:3, n = 4:6) %>% count(a)
#> # A tibble: 3 x 2
#>       a    nn
#>   <int> <int>
#> 1     1     1
#> 2     2     1
#> 3     3     1

Created on 2018-03-13 by the reprex package (v0.2.0).

I propose to stop guessing the weighting variable in tally() altogether, partly because it doesn't work as expected if there's an nn variable but not an n variable. First we change the message, then we warn if there's an n variable in the data but no wt argument is used, then we stop caring about it.

@hadley
Copy link
Member

hadley commented Mar 13, 2018

Eliminating guessing is not acceptable to me because it's important that tally(tally(x)) continue to work and return the correct answer (i.e. the total count)

@krlmlr
Copy link
Member

krlmlr commented Mar 13, 2018

We agreed to not support guessing in tally(tally(tally(x))), and tally(count(x)) currently is different from tally(tally(x)) with respect to guessing. How can we make this a bit more consistent?

@romainfrancois
Copy link
Member

Fixed in #3525. Just need to move this test to dbplyr :

test_that("tally works on tbl_sql (#3075)", {
  db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  dLocal <- data.frame(n = 1:3)
  expect_message(tally_local <- tally(dLocal), "Using `n` as weighting variable")

  dRemote <- dplyr::copy_to(db, dLocal)
  expect_message(tally_remote <- tally(dRemote), "Using `n` as weighting variable" )
  expect_equal(pull(tally_remote, nn), 6)
})

@krlmlr krlmlr closed this as completed in 12076e5 May 2, 2018
krlmlr added a commit that referenced this issue May 2, 2018
- `tally()` works correctly on non-data frame table sources such as `tbl_sql` (#3075).
@lock
Copy link

lock bot commented Oct 29, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Oct 29, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior verbs 🏃‍♀️
Projects
None yet
Development

No branches or pull requests

4 participants