tidy.anova fails when model contains many variables #1159

Closed
jwilliman opened this issue May 18, 2023 · 2 comments

Comments

@jwilliman

The problem

tidy.anova throws an error when one or more of the models being compared contains many predictors.

This is presumably due to the following lines of code:

  • line 63: mod_lines <- grep(modstr, x_attr$heading, value = TRUE)
  • line 96: mods <- sub(".*: ", "", strsplit(mod_lines, "\n")[[1]])

These lines extract the model formulas from the heading attribute of the anova object and then split the heading into one string per model at each line break (\n). Unfortunately, long formulas are wrapped in the heading with additional \n, so the heading of a large model splits into more pieces than there are models.

This could be fixed by adding mod_lines <- gsub("\n ", "", mod_lines) under line 63 to remove the extra \n.
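As a minimal sketch of the behaviour described above, the snippet below applies the line 96 logic to the heading string copied from the reprex, first as-is and then after the proposed gsub call. The variable names mods_broken and mods_fixed are illustrative only and do not appear in broom.

```r
# Second element of attributes(x)$heading, copied from the reprex output.
mod_lines <- paste0(
  "Model 1: c175empl ~ 1\n",
  "Model 2: c175empl ~ c12hour + e15relat + e16sex + e17age + e42dep + c82cop1 + \n",
  "    c83cop2 + c84cop3 + c85cop4 + c86cop5 + c87cop6 + c88cop7 + \n",
  "    c89cop8 + c90cop9 + c160age + c161sex + c172code + barthtot + \n",
  "    neg_c_7"
)

# Current logic (line 96): split on every "\n", then drop the "Model N: " prefix.
# The wrapped formula contributes extra pieces, so there are more pieces than models.
mods_broken <- sub(".*: ", "", strsplit(mod_lines, "\n")[[1]])
length(mods_broken)
#> [1] 5

# Proposed fix: remove newlines that are followed by a space (the wrapped,
# indented continuation lines) before splitting. The "\n" in front of
# "Model 2" is followed by "M", so it is kept.
mod_lines_fixed <- gsub("\n ", "", mod_lines)
mods_fixed <- sub(".*: ", "", strsplit(mod_lines_fixed, "\n")[[1]])
length(mods_fixed)
#> [1] 2
```

The counts 5 and 2 match the "differing number of rows: 5, 2" error in the reprex. The joined formula keeps some extra spaces where the indentation was, which should not matter if it is only used as a label.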

Also, the warning message about unrecognised column names (Resid..Df, Resid..Dev, Deviance) when tidying anova.glm objects could be removed by adding these terms to the renamers object at the beginning of the code.
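A rough illustration of the renaming idea, assuming renamers is a named character vector that maps ANOVA column names to tidy names. The structure of the real renamers object in broom may differ, and the target names used here (df.residual, residual.deviance, deviance) are hypothetical choices for the example.

```r
# Hypothetical lookup table: names are the columns as they appear in the
# warning; values are illustrative tidy names, not confirmed broom output.
renamers <- c(
  "Resid..Df"  = "df.residual",
  "Resid..Dev" = "residual.deviance",
  "Deviance"   = "deviance"
)

# Column names reported as unrecognised in the warning.
cols <- c("Resid..Df", "Resid..Dev", "Deviance")

# Rename every column that has an entry in the lookup table.
known <- cols %in% names(renamers)
cols[known] <- unname(renamers[cols[known]])

# Columns without an entry keep their original name and would still
# trigger the warning; here none are left.
unrecognised <- cols[!known]
```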

Reproducible example

utils::data(efc, package = "sjlabelled")

dat <- efc[complete.cases(efc),1:20]

mdl1 <- glm(c175empl ~ 1, data = dat)
mdl2 <- glm(c175empl ~ ., data = dat)

x <- stats::anova(mdl1, mdl2, test = "LRT")
attributes(x)$heading
#> [1] "Analysis of Deviance Table\n"
#> [2] "Model 1: c175empl ~ 1\nModel 2: c175empl ~ c12hour + e15relat + e16sex + e17age + e42dep + c82cop1 + \n    c83cop2 + c84cop3 + c85cop4 + c86cop5 + c87cop6 + c88cop7 + \n    c89cop8 + c90cop9 + c160age + c161sex + c172code + barthtot + \n    neg_c_7"

broom::tidy(x)
#> Warning in tidy.anova(x): The following column names in ANOVA output were not
#> recognized or transformed: Resid..Df, Resid..Dev, Deviance
#> Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 5, 2

@simonpcouch
Collaborator

Thank you for the issue! Just pushed some fixes, these will be included in the next release of broom. :)

@github-actions

github-actions bot commented Jun 3, 2023

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 3, 2023