Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all.equal has no tolerance #332

Closed
ChadGoymer opened this issue Mar 18, 2014 · 8 comments
Closed

all.equal has no tolerance #332

ChadGoymer opened this issue Mar 18, 2014 · 8 comments
Assignees
Labels
Milestone

Comments

@ChadGoymer
Copy link

@ChadGoymer ChadGoymer commented Mar 18, 2014

The all.equal.data.frame function needs to have tolerance for numeric columns. For instance:

df1 <- data.frame(x = c(1/2, 1/3, 1/4, 1/5, 1/6))
df2 <- data.frame(x = c(0.5, 0.333333333, 0.25, 0.2, 0.166666666))
all.equal(df1, df2) 
# [1] TRUE

library(dplyr)
all.equal(df1, df2)
# [1] "Rows in x but not y: 5, 2Rows in y but not x: 2, 5"

Importing dplyr into a package with tests using equals() from testthat causes them to fail if tolerance is required.

@hadley hadley added this to the v0.2 milestone Mar 18, 2014
@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Mar 20, 2014

Hmm. I think this is hard in the case where we ignore row order because what we do is fill a hash map. and the hash value of two double that are close is not necessarily close:

> Rcpp::cppFunction( "size_t hash_double(double x){ return std::hash<double>()(x) ; }" )
> all.equal( 0.0, 0.00000001 )
[1] TRUE
> hash_double(0.00000001)
[1] 4487126258331716608

So tolerance + ignoring row order would be quite hard.

It would be quite easy however to implement tolerance for the case where we are not ignoring row order.

@hadley
Copy link
Member

@hadley hadley commented Mar 20, 2014

We probably shouldn't define all.equal.data.frame since we own neither the generic or the call. But it is nice for testing since then expect_equal becomes much more useful.

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Mar 31, 2014

Could we back off and use expect_true instead of expect_equal which implicitly uses our all.equal.data.frame.

In the case of ignore_row_order==FALSE should I change the code so that it checks equality with a tolerance ?

Perhaps a workaround would be to use a std::map instead of an unordered_map. A map might be less efficient but it would only need us to define equality and comparison operators, not hashing, and I can definitely make these operators aware of tolerance. So at the expense of some performance, we could have a all.equal that is tolerant.

@ChadGoymer
Copy link
Author

@ChadGoymer ChadGoymer commented Mar 31, 2014

I think that if a new version of a base function is implemented in a package it should preserve the default behaviour, otherwise it can cause unexpected side-effects for the user. Maybe the default values for the new arguments could be set so that if they remain unchanged the base all.equal.data.frame is invoked. Then equal_data_frame() is only used if ignore_row_order, ignore_col_order or convert is explicitly declared. Alternatively, the function could be implemented as a method for the class tbl_df, i.e. all.equal.tbl_df.

Just a few suggestions. I think dplyr is great package, by the way. Good work!

@hadley
Copy link
Member

@hadley hadley commented Apr 1, 2014

I think we should change all.equal.data.frame to all.equal.tbl_df and leave the existing C++ code as is.

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Apr 1, 2014

easy enough, I'll look into what it would change in our testing.

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Apr 1, 2014

What about presenting a data.frame and a tbl_df to all.equal. What should happen ? Should all.equal be made S4 generic ?

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Apr 1, 2014

Had to make a few changes to our tests and also have the all.equal.tbl_dt method. I think this closes this issue.

krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants