
Tests (via testthat) #8

Closed
wviechtb opened this issue Sep 17, 2019 · 6 comments

@wviechtb
Owner

In #5, @kylehamilton suggested adding tests, so that we can do automated testing via testthat. Although I am all for testing, I am not sure we should do this in the context of this package. Instead, I think tests of other packages should be done by those packages themselves. So, I removed the tests that were originally added in #5, but I wanted to open up the discussion here in case there are different opinions.

@kylehamilton
Collaborator

The only tests I could really think of that would make sense in a data package might be just checking that the data are correct. In one of the test files, I had testthat check the dimensions of the data, the data types, and the variable names. But if this doesn't appear to be a pressing (or existent) problem, then maybe we don't need them at all. I fully admit to being ignorant in this area. Writing a lot of tests would take some time, but since I'm a bit time-limited, I could instead work on the documentation for the other datasets and catalog context, name use, etc.

@wviechtb
Owner Author

I think checking the integrity of the data is something that the package could in principle do, but if we want to do this, we should think about how. One extreme version would be to check each variable and make sure that it matches the stored values. But this becomes a bit silly -- we would then essentially have to store the data a second time in the tests. Checking the dimensions, types, and names seems more sensible. We could also check a hash of the data. This could be done via https://stat.ethz.ch/R-manual/R-devel/library/tools/html/md5sum.html or https://cran.r-project.org/web/packages/digest/index.html. I haven't thought through a proper workflow for this, but I think this would in general be a nice addition.
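To illustrate the two options mentioned above, here is a minimal sketch of computing an MD5 hash in R (the dataset name `dat.example` and the file path are hypothetical, not actual names from this package):

```r
## option 1: hash the .rda file on disk (base R, via tools)
tools::md5sum("data/dat.example.rda")

## option 2: hash the R object itself (digest package; the default
## algorithm for digest() is md5)
library(digest)
digest(dat.example)
```

Hashing the object (rather than the file) has the advantage that the check does not depend on how the file was serialized, only on the data values themselves.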

@kylehamilton
Collaborator

Maybe just checking the md5 hash would be enough for now, and if we decide to check dimensions and names later, that can be added easily. Starting with checking the hash for each dataset would also take the least amount of time to set up.

Lines 10-12 are how I checked the hash last time. https://github.com/wviechtb/metadat/blob/5993e1bc2f09ec057866f1d8c0d8171ab61b692f/tests/testthat/test_data_barone2019.R
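A test along those lines might look like the following sketch (the dataset name `dat.example` and the hash string are placeholders, not values taken from the linked file):

```r
library(testthat)
library(digest)

test_that("dat.example has not changed", {
  ## digest() defaults to md5; compare against the reference hash
  ## recorded when the dataset was first added
  expect_equal(digest::digest(dat.example),
               "0123456789abcdef0123456789abcdef")
})
```

Because the hash is computed over the whole object, any change to the dimensions, types, names, or values would cause this single check to fail.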

@wviechtb
Owner Author

Oh, I didn't see that! (sorry, I did not look through all the test files). Yes, that's it. And yes, that should automatically cover things like dimensions and names anyway, so there isn't really a need to check those separately.

@kylehamilton
Collaborator

I figured it would be easier to write too many tests for one dataset, cut out what isn't needed, and let it serve as a template. When I get some time, I'll start putting something together unless I hear otherwise.

wviechtb pushed a commit that referenced this issue Oct 19, 2019
@thomased mentioned this issue Feb 6, 2021
@wviechtb
Owner Author

I think we can close this now. We have the md5 hash checks in place.
