dplyr functions change variables for no reason since dplyr_0.8.0 #4221

Closed
mariodejung opened this issue Feb 25, 2019 · 4 comments

@mariodejung mariodejung commented Feb 25, 2019

I couldn't think of a better title, but I have run into problems since the new dplyr release.

I narrowed it down to a single dplyr call, but it changes other variables in my script, even though they are not "touched" by the dplyr call.

The pipeline starts with a group_by_at call because I want to group by the column "species" only if it exists.
If the column does not exist, I get a warning, which is fine for me, but I don't understand why the class of the other variables changes. This causes problems downstream in my script because older functions can't handle tibbles yet.

library(dplyr)

set.seed(1)
# df <- data.frame(species=c('a','b'),
#                        Intensity=rnorm(1000, 25, 3))
df <- data.frame(Intensity=rnorm(1000, 25, 3))
class(df)
df_backup <- df

df_test <- 
  df %>% 
  dplyr::group_by_at(vars(matches('^species$'))) %>%
  dplyr::summarise(`5%`=stats::quantile(log10(Intensity),.05),
                   `50%`=stats::quantile(log10(Intensity),.50),
                   `95%`=stats::quantile(log10(Intensity),.95)) 
class(df)
class(df_test)
class(df_backup)
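
For comparison, here is a minimal base R sketch of the same "group by species only if the column exists" idea. It is not the code from this issue and does not use dplyr; the helper name `quantiles_by_optional_group` and its `group_col` argument are made up for illustration. It only assumes a numeric `Intensity` column, as in the example above.

```r
# Hypothetical helper (not from the issue): quantiles of log10(Intensity),
# split by `group_col` only when that column is present in the data.
quantiles_by_optional_group <- function(data_set, group_col = "species") {
  probs <- c(0.05, 0.50, 0.95)
  values <- log10(data_set$Intensity)
  if (group_col %in% names(data_set)) {
    groups <- split(values, data_set[[group_col]])
  } else {
    groups <- list(all = values)  # no grouping column: one group with everything
  }
  # One row per group, one column per quantile ("5%", "50%", "95%").
  t(sapply(groups, stats::quantile, probs = probs))
}

set.seed(1)
df <- data.frame(Intensity = rnorm(1000, 25, 3))
res <- quantiles_by_optional_group(df)

df_grouped <- data.frame(species = rep(c("a", "b"), 500),
                         Intensity = df$Intensity)
res_grouped <- quantiles_by_optional_group(df_grouped)

class(df)  # the input keeps its original class
```

Because the helper only reads from `data_set` and returns a new matrix, the input data frame is left as it was.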

@batpigandme batpigandme commented Feb 25, 2019

Hmm, that is strange. For future reference, if you can run your example through reprex, it's helpful to see the input and the output right in the issue.

library(dplyr)

set.seed(1)
df <- data.frame(Intensity=rnorm(1000, 25, 3))
class(df)
#> [1] "data.frame"
df_backup <- df
class(df_backup)
#> [1] "data.frame"
df_test <- df %>% 
  dplyr::group_by_at(vars(matches('^species$'))) %>%
  dplyr::summarise(`5%`=stats::quantile(log10(Intensity),.05),
                   `50%`=stats::quantile(log10(Intensity),.50),
                   `95%`=stats::quantile(log10(Intensity),.95)) 
class(df)
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df_test)
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df_backup)
#> [1] "tbl_df"     "tbl"        "data.frame"

library(lobstr)
obj_addr(df)
#> [1] "0x7f86a3fb95d8"
obj_addr(df_backup)
#> [1] "0x7f86a3fb95d8"
obj_addr(df_test)
#> [1] "0x7f86a48584a8"

Created on 2019-02-25 by the reprex package (v0.2.1)

I added the object address code from Binding basics in Advanced R. df and df_backup are just two names bound to the same value, but it is surprising that copy-on-modify doesn't preserve df_backup as it was.
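
As a point of reference, this small base R sketch shows the copy-on-modify behaviour one would normally expect when the class is changed from R code. The class name "my_tbl" is made up for illustration, and nothing here involves dplyr.

```r
df <- data.frame(Intensity = c(1, 2, 3))
df_backup <- df                         # two names bound to the same object

# Changing the class through R's replacement function copies first,
# so only `df` is affected. "my_tbl" is a made-up class name.
class(df) <- c("my_tbl", "data.frame")

class(df)         # now carries the extra class
class(df_backup)  # still plain "data.frame"
```

Modifying an attribute in place from compiled code bypasses this mechanism, which would be consistent with both names showing the new class in the reprex above.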

@mariodejung mariodejung commented Feb 25, 2019

I even use this call within a function and it changes the objects outside the function!

library(dplyr)

set.seed(1)
df<- data.frame(Intensity=rnorm(1000, 25, 3))
class(df)
#> [1] "data.frame"
df_backup <- df
class(df_backup)
#> [1] "data.frame"
my_plotAbundanceRank <- function(data_set) {
    quantile_df <- 
        data_set %>% 
        dplyr::group_by_at(vars(matches('^species$'))) %>%
        dplyr::summarise(`5%`=stats::quantile(log10(Intensity),.05),
                         `50%`=stats::quantile(log10(Intensity),.50),
                         `95%`=stats::quantile(log10(Intensity),.95)) 
}
print(my_plotAbundanceRank(df))
#> # A tibble: 1 x 3
#>    `5%` `50%` `95%`
#>   <dbl> <dbl> <dbl>
#> 1  1.30  1.40  1.48
class(df)
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df_backup)
#> [1] "tbl_df"     "tbl"        "data.frame"
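
A small guard along these lines might help catch this kind of side effect in a test. It is only a sketch: `class_preserved` is a hypothetical helper, not part of dplyr, and it checks nothing beyond the class attribute.

```r
# Hypothetical helper: TRUE when calling f(x) leaves class(x) unchanged.
class_preserved <- function(x, f) {
  before <- class(x)
  f(x)
  identical(class(x), before)
}

df <- data.frame(Intensity = c(10, 100, 1000))

# A well-behaved function: the class change happens on a local copy.
ok <- class_preserved(df, function(d) {
  class(d) <- c("my_tbl", "data.frame")  # "my_tbl" is a made-up class name
  d
})
ok
```

Passing a function such as my_plotAbundanceRank as `f` would be one way to check whether an installed dplyr version still changes the caller's object.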

@DavisVaughan DavisVaughan commented Mar 4, 2019

I think we just need a shallow copy in the case when no groups are specified.

data.attr("class") = NaturalDataFrame::classes();

Basically it should do what ungroup_grouped_df() does:

DataFrame ungroup_grouped_df(DataFrame df) {

suppressPackageStartupMessages(library(dplyr))
#> Warning: package 'dplyr' was built under R version 3.5.2

x <- data.frame(y = 1)

x
#>   y
#> 1 1

dplyr::group_by(x)
#> # A tibble: 1 x 1
#>       y
#>   <dbl>
#> 1     1

x
#> # A tibble: 1 x 1
#>       y
#>   <dbl>
#> 1     1

suppressPackageStartupMessages(library(dplyr))
#> Warning: package 'dplyr' was built under R version 3.5.2

x <- data.frame(y = 1)

x
#>   y
#> 1 1

dplyr::group_by(x, y)
#> # A tibble: 1 x 1
#> # Groups:   y [1]
#>       y
#>   <dbl>
#> 1     1

x
#>   y
#> 1 1

@romainfrancois romainfrancois self-assigned this Mar 4, 2019
@romainfrancois romainfrancois added this to the 0.8.1 milestone Mar 4, 2019

@lock lock bot commented Aug 31, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Aug 31, 2019