Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.1.0 - A major restructuring of the package #41

Merged

Conversation

venpopov
Copy link
Owner

@venpopov venpopov commented Feb 1, 2024

Summary

A major restructuring of the package to support stable and generalizable development of future models. Prior to this, models were coded as a long series of if else statements in a single fit_model() function. This was unsustainable and prone to breakage when adding new models. The current pull request resolves this and closes #20.

The fit_model() function is abstracted away from the specifics of the models. It calls a series of new generic functions for:

  • setting model fitting options: configure_options()
  • validating the requested model: check_model()
  • validating the user-supplied formula: check_formula()
  • validating the user-supplied data, and performing data-preprocessing according to the selected model requirements: check_data()
  • constructing the model family, formula and prior according to the selected model: configure_model()
  • combining the user prior with the default prior for the model: combine_prior()

To achieve this, model's are coded as S3 classes of the form .model_modelname(). Each models has several class attributes, from most generic to most specific.

  • The check_model() function retrieves the corresponding model class based on a character input (e.g. for input model='IMMfull' it retrieves the object created by .model_IMMfull().
  • when this object is passed to the check_data(), and configure_model() S3 generic methods, they call class specific methods for the data preprocessing and model configuration
  • reduced code redunancy by having models share certain classes (e.g., part of the data preprocessing and argument checking for non_targets is the same for the 3p model and the IMM family of models

Additionally, multiple new utility functions improve the code readibility and stability.

Adding new models to the package will not require only to:

  • create the model class (e.g. .model_mynewmodel())
  • create S3 methods for check_data.mynewmodel() and configure_model.mynewmodel()
  • adding unit tests for the new model

This should ensure that any new model additions will not affect the existing ones.

Other

  • deprecated model_type in facor of model argument to fit_model()
  • add supported_models() function to list currently supported models

Tests

Added extensive new unit tests for most of the functionality in the package to ensure stability.

[x] Confirm that all tests passed
[x] Confirm that devtools::check() produces no errors

Release notes

  • New framework for future model development ensures package stability
  • model_type argument to fit_model() is deprecated. Please use model instead.
  • View which models are currently supported by bmm via supported_models()

@venpopov venpopov added the PR - minor Pull-request should update minor version label Feb 1, 2024
@venpopov venpopov added this to the 1.0.0 milestone Feb 1, 2024
@venpopov venpopov linked an issue Feb 1, 2024 that may be closed by this pull request
@venpopov venpopov linked an issue Feb 1, 2024 that may be closed by this pull request
# if there is setsize 1 in the data, set constant prior over thetant for setsize1
if ((1 %in% data$ss_numeric) && !is.numeric(data[[setsize_var]])) {
prior <- prior +
brms::prior_("constant(-100)", class="b", coef = paste0(setsize_var, 1), nlpar="thetant")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no error here, and given the current specification of the models using the 0 + setsize coding this should work fine. But we will have to find a more general solution should users specify the formula including an intercept and still have set size 1 in their data.
Of the top of my head I did not come up with a simple and generalizable solution, because a lot of this depends on the contrasts that are defined on the set size variable. For now, we should probably include a check if an intercept is included when set sizes vary and throw a warning. What do you think?

# if there is setsize 1 in the data, set constant prior over thetant for setsize1
if ((1 %in% data$ss_numeric) && !is.numeric(data[[setsize_var]])) {
prior <- prior +
brms::prior_("constant(0)", class="b", coef = paste0(setsize_var, 1), nlpar="a")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the comment above.

# if there is setsize 1 in the data, set constant prior over thetant for setsize1
if ((1 %in% data$ss_numeric) && !is.numeric(data[[setsize_var]])) {
prior <- prior +
brms::prior_("constant(0)", class="b", coef = paste0(setsize_var, 1), nlpar="s")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again. Probably throw same warning regarding the formula specification.

if ((1 %in% data$ss_numeric) && !is.numeric(data[[setsize_var]])) {
prior <- prior +
brms::prior_("constant(0)", class="b", coef = paste0(setsize_var, 1), nlpar="a")+
brms::prior_("constant(0)", class="b", coef = paste0(setsize_var, 1), nlpar="s")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

#' @export
check_data.vwm <- function(model, data, formula, ...) {
resp_name <- get_response(formula$formula)
if (max(abs(data[[resp_name]]), na.rm=T) > 10) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already discussed that. And I just wanted to note it down here. That we wanted to come up with a more general way of testing the scaling of the response variable.

Copy link
Collaborator

@GidonFrischkorn GidonFrischkorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my perspective these changes all look good. I made some minor comments and created some issues for enhancements and generalizations.

Thanks for the great work!

}

non_targets <- dots$non_targets
if (max(abs(data[,non_targets]), na.rm=T) > 10) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

}

spaPos <- dots$spaPos
if (max(abs(data[,spaPos]), na.rm=T) > 10) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, generalize scaling tests.

#' @param model A string with the name of the model supplied by the user
#' @return A function of type .model_*
#' @details the returned object is a function. To get the model object, call the
#' returned function, e.g. `get_model("2p")()`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this as an additional function to make it ease to print out some basic information about the models implemented in the package.

combine_prior <- function(config_args, user_prior) {
if (!is.null(user_prior)) {
default_prior <- config_args$prior
combined_prior <- dplyr::anti_join(default_prior, user_prior, by=c('class', 'dpar','nlpar','coef','group','resp'))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should clearly specify how anti_join deals with duplicate specifications of priors. In its current implementation does it prioritize default priors of user priors or vice versa? I would probably delete default priors as soon as user priors are specified for a certain parameter.

In this case, we might want to throw a warning message about the deleted default prior and communicate to users that in some cases this might compromise model identification. Or do you think this is unnecessary?


```{r}
bmm::supported_models()
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once, we have a get_model_info function, we should add here that you can print out some basic information about the model using this to get an idea about the specification and maybe also a reference to the tutorial that describes its use.

@GidonFrischkorn GidonFrischkorn merged commit 4a90040 into develop Feb 2, 2024
@GidonFrischkorn GidonFrischkorn deleted the feature/issue-20-rework-the-fit_model-function branch February 2, 2024 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR - minor Pull-request should update minor version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write automatic tests for functions Rework the fit_model function
2 participants