Degrees of freedom is incorrect in the glance function #273
Comments
Depends whether
of which there are 3 in this model. Also, the omnibus F test in the summary is the equivalent of doing:
so the |
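The snippet from the comment above did not survive extraction. As a rough sketch of the general idea only (not necessarily the commenter's code), the omnibus F test can be reproduced by comparing an intercept-only model against the full model with `anova()`. The model formula is borrowed from the reprex later in this thread, and the names `null_fit` and `cmp` are illustrative.

```r
# Assumption: same model as the reprex later in the thread (mtcars ships with R).
fit      <- lm(mpg ~ disp + hp, data = mtcars)
null_fit <- lm(mpg ~ 1, data = mtcars)   # intercept-only model

# Comparing the two nested models gives the omnibus F test.
cmp <- anova(null_fit, fit)

cmp$Df[2]        # numerator df of the F test: 2, the intercept is not counted
cmp$Res.Df[2]    # denominator (residual) df: 29
cmp$F[2]         # matches the F-statistic printed by summary(fit)
```

The intercept appears in both models, so it does not contribute to the numerator df of this comparison.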
broom counts the intercept term as a degree of freedom used in the model. This matches the results from One case where this version would be moderately useful would be comparing dfs used between I agree an argument could be made for either (though as Gavin notes this version matches the current documentation), but I don't think it's worth breaking the reverse compatibility of this function to switch the meaning. |
The p.value in the glance table is a correct p.value for an F with 2, 29 df, and not for one with 3, 29 df.
From the point of view of a long-time instructor, there is no question that what should be reported is the numerator df associated with the F test that is also reported (since its denominator df and its p value are also in the table). The total "model" df is not reported when reports and publications of analyses are written. As it stands, the output will easily lead to mis-reporting of the numerator df by researchers, although statistical programmers may understand the nuance. The extra 1 df is associated with the intercept, but no test of the intercept is provided, so why include the df?
When I use glance in a markdown document, I do the following: Manually "kludging" the df this way is something I would rather not have to waste my time teaching my students.
I love the broom package for so many reasons. But right now I cannot recommend glance to my students because of this df issue. I would like to teach statistics, not R idiosyncrasies.
Thanks for considering this. Bruce Dudek |
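The workaround code in the comment above was lost in extraction, so the following is not Bruce Dudek's snippet. It is a minimal base-R sketch, under the assumption that the goal is simply to report the df that belong to the F test, and it reads them from `summary()` so that it does not depend on the installed broom version. The variable names are illustrative.

```r
# Assumption: same model as the reprex later in the thread.
fit <- lm(mpg ~ disp + hp, data = mtcars)

# summary.lm() stores the omnibus F test as a named vector: value, numdf, dendf.
fstat <- summary(fit)$fstatistic

num_df <- unname(fstat["numdf"])   # numerator df of the F test
den_df <- unname(fstat["dendf"])   # denominator (residual) df

# The p-value reported alongside the F statistic uses exactly these df.
p_value <- pf(unname(fstat["value"]), num_df, den_df, lower.tail = FALSE)

# The "model" df that counts the intercept is the number of coefficients.
n_coef <- length(coef(fit))

c(num_df = num_df, den_df = den_df, n_coef = n_coef)
```

Here `num_df` is 2 and `n_coef` is 3, which is the one-df gap discussed in this thread.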
I've heard from enough people that I've come around on this! I'd like to update it to use the df used for the F-statistic. However, I don't want to do the update until the next minor (0.6.0) broom release since there's a chance it could break backwards compatibility (I'm nervous if people use |
I agree that pedagogically the mis-match in df is a problem. Particularly because the In every text or help page I've read, I don't see Oh... just saw @dgrtwo response.... Thank you!!! |
I just ran into this problem myself while teaching an applied regression class. I think there are arguments to be made for both interpretations of How about giving |
Related to #212 / possible duplicate. |
I still see this issue in broom 0.5.1. |
Can you provide a reprex? On the dev version of
library(broom)
fit <- lm(mpg ~ disp + hp, data = mtcars)
summary(fit)
#>
#> Call:
#> lm(formula = mpg ~ disp + hp, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.7945 -2.3036 -0.8246 1.8582 6.9363
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 30.735904 1.331566 23.083 < 2e-16 ***
#> disp -0.030346 0.007405 -4.098 0.000306 ***
#> hp -0.024840 0.013385 -1.856 0.073679 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.127 on 29 degrees of freedom
#> Multiple R-squared: 0.7482, Adjusted R-squared: 0.7309
#> F-statistic: 43.09 on 2 and 29 DF, p-value: 2.062e-09
anova(fit)
#> Analysis of Variance Table
#>
#> Response: mpg
#> Df Sum Sq Mean Sq F value Pr(>F)
#> disp 1 808.89 808.89 82.7454 5.406e-10 ***
#> hp 1 33.67 33.67 3.4438 0.07368 .
#> Residuals 29 283.49 9.78
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
glance(fit)[c("df", "df.residual")]
#> # A tibble: 1 x 2
#> df df.residual
#> <dbl> <int>
#> 1 2 29
packageVersion("broom")
#> [1] '0.5.2.9001'
Created on 2019-04-08 by the reprex package (v0.2.1)
I'm not sure what example you're referring to. |
I get df = 3 on this example using the latest broom from CRAN:
> packageVersion('broom')
[1] ‘0.5.2’
> fit = lm(mpg ~ disp + hp, data=mtcars)
> broom::glance(fit)
# A tibble: 1 x 11
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
<dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
1 0.748 0.731 3.13 43.1 0.00000000206 3 -80.3 169. 174. 283. 29
Tried installing latest broom from GitHub, but got an error (probably not broom-related). Was this fixed since 0.5.2? |
The fix is in the dev version only at the moment. Will try to fix the install issue today. |
My students just pointed out the same issue:
mod <- lm(mpg ~ cyl + disp, data = mtcars)
summary(mod) # 2 df overall test
#>
#> Call:
#> lm(formula = mpg ~ cyl + disp, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.4213 -2.1722 -0.6362 1.1899 7.0516
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.66099 2.54700 13.609 4.02e-14 ***
#> cyl -1.58728 0.71184 -2.230 0.0337 *
#> disp -0.02058 0.01026 -2.007 0.0542 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.055 on 29 degrees of freedom
#> Multiple R-squared: 0.7596, Adjusted R-squared: 0.743
#> F-statistic: 45.81 on 2 and 29 DF, p-value: 1.058e-09
broom::glance(mod) # returns df = 3
#> # A tibble: 1 x 11
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.760 0.743 3.06 45.8 1.06e-9 3 -79.6 167. 173.
#> # … with 2 more variables: deviance <dbl>, df.residual <int>
Created on 2020-04-03 by the reprex package (v0.3.0) |
The |
Fixed with the release of 0.7.0 on CRAN! |
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue. |
Note that the summary function correctly shows 2 and 29 degrees of freedom for the F, but the glance function returns 3 and 29.