Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pre-processing for mixture3p and IMM model fails if the setsize variable contains character information #97

Closed
GidonFrischkorn opened this issue Feb 15, 2024 · 5 comments
Assignees
Labels
bug Something isn't working good first issue Good for newcomers
Milestone

Comments

@GidonFrischkorn
Copy link
Collaborator

When I tested the mixture3p model for the Vignette on the mixture models using the Bayes data set from the mixtur package, I noticed that the preprocessing and creating of the Index variables fails, because the variable is coded as Set Size 1, Set Size 2, Set Size 4, and Set Size 6.

We should include a notification that the setsize variable should be coded either numerically, if it is entered as a continuous predictor, or with factor labels that only use single digits. Alternatively, we could see if we can remove all alphabet letters from the set size labels to only maintain the digit information. But this would also fail, if someone uses labels such as "one", "two", "three", etc. for coding a set size factor.

@GidonFrischkorn GidonFrischkorn added the bug Something isn't working label Feb 15, 2024
@GidonFrischkorn
Copy link
Collaborator Author

if we want to remove the letters from the labels we could use: gsub('[[:alpha:] ]', '', labels)

@venpopov
Copy link
Owner

good catch. I think best to keep it simply. Requere it to be either numeric, or coercible to numeric (digits as factors). I'll add that as a stop condition and test

@venpopov venpopov self-assigned this Feb 15, 2024
@venpopov
Copy link
Owner

Wow, my bad, I recoded it that way to have nice facet labels in the plots, and since I never ran the mixture3p model, just wrote "it's the same as the 2p". That'll teach me!

Anyway, positive thing at the end, because we can catch this behavior.

@venpopov
Copy link
Owner

venpopov commented Feb 15, 2024

Now that I tested it, this is even a bigger issue - if you have a factor with labels such as "Set size 1", "...2", "...4" and "...8", the current code would translate it as set size 1,2,3, and 4 and never give an error. The model would have run, and the results would have been plausible, but still wrong!

Update: actually it wouldn't. It would just throw the same error you encountered. Not sure what I was referring to earlier, since the old code used as.numeric(as.character(...)). It would still return an NA vector causing problems down the line without identification what caused the issue, so has to be fixed anyway. But not as dramatic as I pointed it out to be. Negative values or 0 would still have been a problem though.

So we should really take this as an example of what to think about and what to test for when transofrming any data from the user.

I'll even reference this issue in the dev-notes guide, as an example of why unit testing us super important. We have a lot of tests for check_data, but I purposefully ignored checking ss_numeric as it seemed super straightforward. We should always ask "can this go wrong without producing errors", rather than "is this a complicated enough to be worth testing"

The most ironic part is that a couple of hours ago I used for the first time the test coverage tool described here (https://r-pkgs.org/testing-design.html#sec-testing-design-coverage). And I added tests for other things that were missing, but when I glanced over this line in red below, I thought "no point in testing that"! :)

image

@venpopov venpopov added this to the 1.0.0 milestone Feb 15, 2024
@venpopov venpopov pinned this issue Feb 15, 2024
@venpopov venpopov added the good first issue Good for newcomers label Feb 15, 2024
venpopov added a commit that referenced this issue Feb 15, 2024
@venpopov
Copy link
Owner

fixed by be789f6

@GidonFrischkorn GidonFrischkorn unpinned this issue Feb 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants