add_count overrides a column named n when counting a different column #4519

Closed
gsimchoni opened this issue Aug 5, 2019 · 1 comment
@gsimchoni gsimchoni commented Aug 5, 2019

Hi,

I think this is related to #3838, #4076, and #4284, but none of them describes this unexpected change in behavior, which broke my code when I moved to version 0.8.3 (the latest on CRAN):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tibble(n = c(1,1,1,2,3)) %>% add_count(n)
#> # A tibble: 5 x 2
#>       n    nn
#>   <dbl> <int>
#> 1     1     3
#> 2     1     3
#> 3     1     3
#> 4     2     1
#> 5     3     1
tibble(a = c(1,1,1,2,3), n = c(1,2,2,3,4)) %>% add_count(a)
#> # A tibble: 5 x 2
#>       a     n
#>   <dbl> <int>
#> 1     1     3
#> 2     1     3
#> 3     1     3
#> 4     2     1
#> 5     3     1

Created on 2019-08-05 by the reprex package (v0.3.0)

I understand why in #3838 Winston suggested for count():

to name the column to nn only if there's another column n that would be present in the output

The problem is that with add_count() the user relies on a column of counts being added. The first example shows how it nicely adds an nn column when it counts an n column. The second example shows how it overrides an existing, unrelated n column, even though the user specifically asked to add a count column, not to override one. This was different in previous versions.

Obviously I can avoid naming that column n, or pass a different name through name =:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tibble(a = c(1,1,1,2,3), n = c(1,2,2,3,4)) %>% add_count(a, name = "foo")
#> # A tibble: 5 x 3
#>       a     n   foo
#>   <dbl> <dbl> <int>
#> 1     1     1     3
#> 2     1     2     3
#> 3     1     2     3
#> 4     2     3     1
#> 5     3     4     1

Created on 2019-08-05 by the reprex package (v0.3.0)

But it still breaks existing code, and it gets cumbersome when, for example, you want to chain counts, like so:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
tibble(a = c(1,1,1,2,3), b = c(1,2,2,3,4)) %>% count(a, b) %>% add_count(a)
#> # A tibble: 4 x 3
#>       a     b     n
#>   <dbl> <dbl> <int>
#> 1     1     1     2
#> 2     1     2     2
#> 3     2     3     1
#> 4     3     4     1

Created on 2019-08-05 by the reprex package (v0.3.0)

Why would it override the first n?


@romainfrancois romainfrancois commented Nov 28, 2019

Because name = "n" is the default. The old magic of appending yet another "n" without end was not really predictable.
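
For the chained example in the report, one way to keep both counts is to name the second one explicitly. This is only a minimal sketch, assuming the `name` argument of add_count() shown earlier in the thread; the column name `n_a` is a hypothetical choice for illustration, not something dplyr picks.

```r
library(dplyr)

res <- tibble(a = c(1, 1, 1, 2, 3), b = c(1, 2, 2, 3, 4)) %>%
  count(a, b) %>%              # default column n: rows per (a, b) pair
  add_count(a, name = "n_a")   # n_a (hypothetical name): rows per a; n is left intact

res
```

With an explicit name, the n column produced by count() survives and the per-a count lands in its own column, so the result does not depend on the default naming.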
