New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forward all_equal() to dplyr #198

Closed
krlmlr opened this Issue Nov 27, 2016 · 11 comments

Comments

Projects
None yet
3 participants
@krlmlr
Member

krlmlr commented Nov 27, 2016

Slightly different implementation than in dplyr causes trouble (tidyverse/dplyr#2264). Throw an error if dplyr is not installed.

@krlmlr krlmlr added this to the 1.3.0 milestone Jan 4, 2017

@krlmlr

This comment has been minimized.

Member

krlmlr commented Jan 4, 2017

@hadley: Do you see a problem if tibble tries to forward to dplyr::all_equal() and fails if dplyr is not installed?

@hadley

This comment has been minimized.

Member

hadley commented Jan 4, 2017

Yeah, I think that's ok.

@hadley

This comment has been minimized.

Member

hadley commented Jan 4, 2017

Alternatively, maybe we tibble simply shouldn't define an all.equal() method? Although I guess that means you'd get different behaviour depending on whether or not dplyr was loaded, which we want to avoid.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Jan 4, 2017

Oh, tibble does have its own implementation of all_equal(), but the implementation is slightly different; behavior depends if dplyr is loaded or not. We'd have to move half of dplyr's C++ code here to use the same implementation.

@krlmlr krlmlr closed this in 83fe1a6 Jan 7, 2017

krlmlr added a commit that referenced this issue Jan 9, 2017

Merge tag 'v1.2-15'
- Test R 3.1.3 and later in AppVeyor, using `Depends: R (>= 3.1.0)` in `DESCRIPTION`. Support for R 3.0.0 requires a `lazyeval` update (#189).
- An attempt to partially update a missing column now throws a clearer warning (#199).
- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declard class (`data.frame` vs. `tbl_df`) are ignored. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).
- Backtick `NA` names in printing (#206, #207, @jennybc).

@krlmlr krlmlr referenced this issue Jan 18, 2017

Closed

Suggest dplyr #577

krlmlr added a commit that referenced this issue Apr 1, 2017

Merge tag 'v1.3.0'
Bug fixes
=========

- Time series matrices (objects of class `mts` and `ts`) are now supported in `as_tibble()` (#184).
- The `all_equal()` function (called by `all.equal.tbl_df()`) now forwards to `dplyr` and fails with a helpful message if not installed. Data frames with list columns cannot be compared anymore, and differences in the declared class (`data.frame` vs. `tbl_df`) are ignored. The `all.equal.tbl_df()` method gives a warning and forwards to `NextMethod()` if `dplyr` is not installed; call `all.equal(as.data.frame(...), ...)` to avoid the warning. This ensures consistent behavior of this function, regardless if `dplyr` is loaded or not (#198).

Interface changes
=================

- Now requiring R 3.1.0 instead of R 3.1.3 (#189).
- Add `as.tibble()` as an alias to `as_tibble()` (#160, @LaDilettante).
- New `frame_matrix()`, similar to `frame_data()` but for matrices (#140, #168, @LaDilettante).
- New `deframe()` as reverse operation to `enframe()` (#146, #214).
- Removed unused dependency on `assertthat`.

Features
========

General
-------

- Keep column classes when adding row to empty tibble (#171, #177, @LaDilettante).
- Singular and plural variants for error messages that mention a list of objects (#116, #138, @LaDilettante).
- `add_column()` can add columns of length 1 (#162, #164, @LaDilettante).

Input validation
----------------

- An attempt to read or update a missing column now throws a clearer warning (#199).
- An attempt to call `add_row()` for a grouped data frame results in a helpful error message (#179).

Printing
--------

- Render Unicode multiplication sign as `x` if it cannot be represented in the current locale (#192, @ncarchedi).
- Backtick `NA` names in printing (#206, #207, @jennybc).
- `glimpse()` now uses `type_sum()` also for S3 objects (#185, #186, @holstius).
- The `max.print` option is ignored when printing a tibble (#194, #195, @t-kalinowski).

Documentation
=============

- Fix typo in `obj_sum` documentation (#193, @etiennebr).
- Reword documentation for `tribble()` (#191, @kwstat).
- Now explicitly stating minimum Rcpp version 0.12.3.

Internal
========

- Using registration of native routines.
@jennybc

This comment has been minimized.

Member

jennybc commented Apr 9, 2017

Alternatively, maybe we tibble simply shouldn't define an all.equal() method? Although I guess that means you'd get different behaviour depending on whether or not dplyr was loaded, which we want to avoid.

I just experienced exactly this and it was mystifying.

In my case, dplyr is not explicitly loaded -- a deliberate choice, on my part -- and yet because I have assign("%>%", dplyr::%>%, envir = .__Rprofile.env__.) in my .Rprofile, dplyr is "loaded via a namespace (and not attached)", in the sessionInfo() sense. Obviously I need to rethink the wisdom of this line of .Rprofile!

But the result is that all.equal(tibble_1, tibble_2) is calling tibble::all.equal.tbl_df(), which forwards to dplyr::all_equal(), which apparently has different default behaviour around attributes. I am accustomed to a default of check.attributes = TRUE and expected my comparison to behave that way. But dplyr::all_equal() seems to act as if check.attributes = FALSE. Or maybe it's just oblivious to attributes.

Has this ever been considered? The tidyverse/dplyr style of tibble comparison only happens with all_equal() (with the underscore) and never silently with all.equal().

@jennybc

This comment has been minimized.

Member

jennybc commented Apr 9, 2017

Related: I believe this line in my .Rprofile is why it took me so long to sympathize with people reporting that tibble printing or indexing changed during the course of a script or interactive session, as more packages gradually got loaded. I always got tibble-style printing and indexing and therefore never personally experienced this behaviour.

@hadley

This comment has been minimized.

Member

hadley commented Apr 9, 2017

Probably the safest thing to do is explicitly forward to dplyr, with a clear error if dplyr isn't installed.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Apr 9, 2017

As of tibble 1.3.0, we always forward all_equal() to dplyr::all_equal() and abort if it's not installed. The implementation of all.equal.tbl_df() forwards to dplyr if it is available, otherwise it calls NextMethod() with a warning. This is to satisfy package checks for packages that suggest tibble but not dplyr.

Recent versions of import packages (readr, haven, readxl, ...) now always import tibble, so the incremental change of behavior should be rarer now. A package that returns tibbles and wants to ensure printing and [ needs to declare it in Imports: and actively import any object in its NAMESPACE.

@jennybc

This comment has been minimized.

Member

jennybc commented Apr 10, 2017

My main point is that I expected all.equal() to catch a difference in attributes, because it has always done so in the past (meaning "for most of my R life", not a comment on recent tidyverse developments). But all.equal.tbl_df() doesn't check attributes AFAICT. And so 2 tibbles with identical data but different attributes returned TRUE, which surprised me.

So all.equal() is arguably another function that should be messaged like intersect(), setdiff(), etc. Except in this case the default behaviour isn't even the same.

library(readxl)

tib1 <- read_excel(readxl_example("datasets.xlsx"), n_max = 2)
tib2 <- tib1

## new-to-me behavior re: attributes in all.equal()
attr(tib2, "foo") <- "bar"
all.equal(tib1, tib2) # huh?
#> [1] TRUE
all.equal(as.data.frame(tib1), as.data.frame(tib2))
#> [1] "Attributes: < Names: 1 string mismatch >"                            
#> [2] "Attributes: < Length mismatch: comparison on first 2 components >"   
#> [3] "Attributes: < Component 2: Modes: numeric, character >"              
#> [4] "Attributes: < Component 2: Lengths: 2, 1 >"                          
#> [5] "Attributes: < Component 2: target is numeric, current is character >"
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.3.2 (2016-10-31)
#>  system   x86_64, darwin13.4.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_CA.UTF-8                 
#>  tz       America/Vancouver           
#>  date     2017-04-09
#> Packages -----------------------------------------------------------------
#>  package    * version     date       source                            
#>  assertthat   0.1         2013-12-06 CRAN (R 3.2.0)                    
#>  backports    1.0.5       2017-01-18 cran (@1.0.5)                     
#>  cellranger   1.1.0       2016-07-27 CRAN (R 3.3.0)                    
#>  devtools     1.12.0.9000 2017-02-09 Github (hadley/devtools@d8ab190)  
#>  digest       0.6.12      2017-01-27 cran (@0.6.12)                    
#>  dplyr        0.5.0.9001  2017-04-05 Github (hadley/dplyr@cf46c46)     
#>  evaluate     0.10        2016-10-11 cran (@0.10)                      
#>  glue         0.0.0.9000  2017-04-03 Github (tidyverse/glue@6e77c3b)   
#>  htmltools    0.3.5       2016-03-21 CRAN (R 3.2.4)                    
#>  knitr        1.15.1      2016-11-22 CRAN (R 3.3.2)                    
#>  magrittr     1.5         2014-11-22 CRAN (R 3.2.0)                    
#>  memoise      1.0.0.9001  2016-12-18 Github (hadley/memoise@884d565)   
#>  pkgbuild     0.0.0.9000  2017-03-13 Github (r-pkgs/pkgbuild@5ed87aa)  
#>  pkgload      0.0.0.9000  2017-04-03 Github (r-pkgs/pkgload@3a96cf2)   
#>  R6           2.2.0       2016-10-05 CRAN (R 3.3.0)                    
#>  Rcpp         0.12.10     2017-03-31 Github (RcppCore/Rcpp@886f5df)    
#>  readxl     * 0.1.1.9000  2017-04-09 local                             
#>  rlang        0.0.0.9006  2017-04-05 Github (hadley/rlang@0756eaf)     
#>  rmarkdown    1.4.0.9000  2017-03-31 Github (rstudio/rmarkdown@5f7cd3c)
#>  rprojroot    1.2         2017-01-16 CRAN (R 3.3.2)                    
#>  stringi      1.1.3       2017-03-21 cran (@1.1.3)                     
#>  stringr      1.2.0       2017-02-18 cran (@1.2.0)                     
#>  tibble       1.3.0       2017-04-01 cran (@1.3.0)                     
#>  withr        1.0.2       2016-06-20 cran (@1.0.2)                     
#>  yaml         2.1.14      2016-11-12 cran (@2.1.14)
@krlmlr

This comment has been minimized.

Member

krlmlr commented Apr 10, 2017

Yes, all.equal.tbl_df() is different from all.equal.data.frame(), too. By default, row or column order doesn't matter either.

@hadley: Should we compare attributes in dplyr::all_equal(), and carry this over to tibble::all.equal.tbl_df()?

@hadley

This comment has been minimized.

Member

hadley commented Apr 10, 2017

Yeah, we should attributes (probably with an option to turn off)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment