Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all.equal has no tolerance #332

Closed
ChadGoymer opened this issue Mar 18, 2014 · 8 comments
Closed

all.equal has no tolerance #332

ChadGoymer opened this issue Mar 18, 2014 · 8 comments
Assignees
Labels
feature a feature request or enhancement
Milestone

Comments

@ChadGoymer
Copy link

The all.equal.data.frame function needs to have tolerance for numeric columns. For instance:

df1 <- data.frame(x = c(1/2, 1/3, 1/4, 1/5, 1/6))
df2 <- data.frame(x = c(0.5, 0.333333333, 0.25, 0.2, 0.166666666))
all.equal(df1, df2) 
# [1] TRUE

library(dplyr)
all.equal(df1, df2)
# [1] "Rows in x but not y: 5, 2Rows in y but not x: 2, 5"

Importing dplyr into a package with tests using equals() from testthat causes them to fail if tolerance is required.

@hadley hadley added this to the v0.2 milestone Mar 18, 2014
@romainfrancois
Copy link
Member

Hmm. I think this is hard in the case where we ignore row order because what we do is fill a hash map. and the hash value of two double that are close is not necessarily close:

> Rcpp::cppFunction( "size_t hash_double(double x){ return std::hash<double>()(x) ; }" )
> all.equal( 0.0, 0.00000001 )
[1] TRUE
> hash_double(0.00000001)
[1] 4487126258331716608

So tolerance + ignoring row order would be quite hard.

It would be quite easy however to implement tolerance for the case where we are not ignoring row order.

@hadley
Copy link
Member

hadley commented Mar 20, 2014

We probably shouldn't define all.equal.data.frame since we own neither the generic or the call. But it is nice for testing since then expect_equal becomes much more useful.

@romainfrancois
Copy link
Member

Could we back off and use expect_true instead of expect_equal which implicitly uses our all.equal.data.frame.

In the case of ignore_row_order==FALSE should I change the code so that it checks equality with a tolerance ?

Perhaps a workaround would be to use a std::map instead of an unordered_map. A map might be less efficient but it would only need us to define equality and comparison operators, not hashing, and I can definitely make these operators aware of tolerance. So at the expense of some performance, we could have a all.equal that is tolerant.

@ChadGoymer
Copy link
Author

I think that if a new version of a base function is implemented in a package it should preserve the default behaviour, otherwise it can cause unexpected side-effects for the user. Maybe the default values for the new arguments could be set so that if they remain unchanged the base all.equal.data.frame is invoked. Then equal_data_frame() is only used if ignore_row_order, ignore_col_order or convert is explicitly declared. Alternatively, the function could be implemented as a method for the class tbl_df, i.e. all.equal.tbl_df.

Just a few suggestions. I think dplyr is great package, by the way. Good work!

@hadley
Copy link
Member

hadley commented Apr 1, 2014

I think we should change all.equal.data.frame to all.equal.tbl_df and leave the existing C++ code as is.

@romainfrancois
Copy link
Member

easy enough, I'll look into what it would change in our testing.

@romainfrancois
Copy link
Member

What about presenting a data.frame and a tbl_df to all.equal. What should happen ? Should all.equal be made S4 generic ?

@romainfrancois
Copy link
Member

Had to make a few changes to our tests and also have the all.equal.tbl_dt method. I think this closes this issue.

krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants