robust standard errors: will tidy.coeftest(., conf.int=TRUE) ever be possible? #663
This depends on what information the
Well, I am not sure about the debate. Computing the SE within the model call is definitely much simpler for the user, and allows the function writer to choose a sound default on the user's behalf (much of the current literature says you should use robust, not homoskedastic, standard errors). The SE-post-model philosophy is less intuitive, as users have to specify robust errors themselves, which may not be that easy. But I think it is much more powerful. In my opinion, a few points speak in favour of the sandwich implementation:
On the other side, the SE-within approach is pretty much re-inventing the wheel and hard-coding a few specific functionalities decided at some point in time. Generalizing this approach to the large collection of models available for tidy would require a huge amount of work, with potentially huge redundancy. That said, I am not claiming the sandwich implementation is intuitive or easy to use, and I fully agree with principle 17. But I think sandwich forces us to think in more general terms, which can only be beneficial for implementation principles! If one needs to reinvent the wheel, better to reinvent the sandwich wheel than to have a hundred new wheels, one for each estimator ;-)
Philosophy aside, would you pitch an interface? I'm not entirely sure I understand what you're looking for. Can you include some example code and what you'd hope the output would be?
I just encountered this problem in trying to use broom together with the dynlm package. I would like to add robust standard errors to the dynamic linear model with coeftest, but the code fails if I include this row.
Thanks for the discussion of the issues related to interfacing.

Confidence intervals: In addition to

Design philosophy: The whole point about sandwich covariances is that they are essentially orthogonal to parameter estimation. Thus, it makes sense to separate estimation of the regression coefficients from estimation of the corresponding covariance matrix. Most model functions hence obtain coefficient estimates along with the "usual" covariances/standard errors under the full model assumptions (often assuming a full probability distribution). Then, you can relax the assumptions without having to re-estimate the coefficients but just by adapting the covariances. Typically, you just assume that the mean function is correctly specified but the remaining probability distribution may be misspecified. Various kinds of misspecification are supported in

So while it is convenient to just add a

Therefore, I feel that separating the covariance matrix estimation from the model estimation is both useful in terms of DRY coding and for bringing out more clearly what kind of robustness is actually achieved.

Let me know if you need more details on any parts of this. Best wishes,
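A minimal sketch of this separation, with an arbitrary model and dataset chosen here for illustration (not taken from the thread):

```r
library(sandwich)
library(lmtest)

# Estimate the coefficients once, under the usual assumptions ...
fit <- lm(mpg ~ wt + hp, data = mtcars)

# ... then swap in different covariance estimates without re-fitting:
coeftest(fit)                  # classical (homoskedastic) standard errors
coeftest(fit, vcov. = vcovHC)  # heteroskedasticity-consistent (HC3 by default)
coeftest(fit, vcov. = vcovHC, type = "HC1")  # Stata-style HC1
```

The coefficient estimates are identical in all three calls; only the standard errors, test statistics, and p-values change.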
For implementation, I could think of two approaches:
Using this internally would also allow us to have

Implementation principles

Both of these approaches definitely rely on: 1) it is up to the user to specify the vcov (broom will never decide which vcov to use); 2) they use sandwich for robust estimation. Unlike the
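For concreteness, the two interfaces under discussion might look roughly like this; neither exists in broom, this is purely a sketch for discussion:

```r
# 1) Pass the covariance estimator straight to tidy() (hypothetical):
#    tidy(model, vcov. = sandwich::vcovHC, conf.int = TRUE)
#
# 2) Build a coeftest-style intermediate object that tidy() understands:
#    library(lmtest); library(sandwich)
#    model %>% coeftest(vcov. = vcovHC) %>% tidy()
```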
My main objection to the R approach to robust standard errors is that it leads to messy code. It also matters in teaching, as modern textbooks in econometrics focus almost exclusively on robust standard errors, even at the introductory level. My suggestion is to create a wrapper function that includes the results of coeftest, coefci and waldtest in the estimation result.

library(tidyverse)
library(broom)
library(sandwich)
library(lmtest)
library(huxtable)
reg1 = lm(mpg ~ wt, data = mtcars)
reg2 = robust_se(reg1, vcov. = vcovHC)
tidy(reg2)
tidy(reg2, conf.int = TRUE)
glance(reg2)
huxreg(Standard=reg1, HC=reg2, statistics = c(N = "nobs", F = "statistic", P = "p.value"))

The function adds "robust_se" to the class description to allow broom to invoke

robust_se = function(model, vcov. = NULL, ...)
{
model$coeftestresult = coeftest(model, vcov. = vcov., ...)
model$ci = coefci(model, vcov. = vcov., ...)
model$waldtest = waldtest(model, vcov = vcov.)
class(model) = c("robust_se", class(model))
model
}
tidy.robust_se <- function(x, conf.int = FALSE, conf.level = 0.95) {
ctab = x$coeftestresult
result = tibble(term = rownames(ctab), estimate = ctab[,1], std.error = ctab[,2], statistic = ctab[,3], p.value = ctab[,4])
  if (conf.int) {
result$conf.low = x$ci[,1]
result$conf.high = x$ci[,2]
}
result
}
glance.robust_se <- function(x) {
class(x) = class(x)[2:length(class(x))]
result = glance(x)
result$statistic = x$waldtest[2,3]
result$p.value = x$waldtest[2,4]
result
}
nobs.robust_se <- function(x, ...) {
class(x) = class(x)[2:length(class(x))]
nobs(x)
}
Thanks everyone for the discussion of design choices! My take on OOP for modeling is in flux and I appreciate all the kind discussion and patience here. My impression is that there are two key issues: (1) extensibility and (2) whether estimates for different estimands should reasonably live in different objects. I'm slowly writing a blog post with some takeaways from this and other discussions and would love to continue talking about this. For the sake of this issue, I'm leaning towards @MatthieuStigler's suggestion of a

For reference, here is what each of those do:

library(lmtest)
library(sandwich)
fm <- lm(length ~ age, data = Mandible, subset = (age <= 28))
ct <- coeftest(fm, df = Inf, vcov. = vcovHC, type = "HC0")
class(ct)
#> [1] "coeftest"
ct
#>
#> z test of coefficients:
#>
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -11.953366 1.009817 -11.837 < 2.2e-16 ***
#> age 1.772730 0.054343 32.621 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ci <- coefci(fm, df = Inf, vcov. = vcovHC, type = "HC0")
class(ci)
#> [1] "matrix"
ci
#> 2.5 % 97.5 %
#> (Intercept) -13.932571 -9.974161
#> age 1.666219 1.879241
library(tidyverse)
library(broom)
fm %>%
coeftest(vcov. = vcovHC, type = "HC0") %>%
tidy()
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -12.0 1.01 -11.8 1.79e-23
#> 2 age 1.77 0.0543 32.6 1.44e-71

It's probably worth adding a vignette on tidying tricks for regressions, and featuring this prominently there.

So now the question is how to provide robust confidence intervals in

I hadn't thought about Wald tests at all, and currently

library(lmtest)
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
library(sandwich)
library(magrittr)
library(broom)
fm <- lm(length ~ age, data = Mandible, subset = (age <= 28))
# note to self: waldtest has a `vcov` argument rather than
# a `vcov.` argument, and does not accept a `type` argument
wt <- waldtest(fm, vcov = vcovHC)
class(wt)
#> [1] "anova" "data.frame"
wt
#> Wald test
#>
#> Model 1: length ~ age
#> Model 2: length ~ 1
#> Res.Df Df F Pr(>F)
#> 1 156
#> 2 157 -1 1020.4 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tidy(wt)
#> # A tibble: 2 x 4
#> res.df df statistic p.value
#> <dbl> <dbl> <dbl> <dbl>
#> 1 156 NA NA NA
#> 2 157 -1 1020. 2.48e-70
glance(wt)
#> Error: There is no glance method for data frames. Did you mean `dplyr::glimpse()`? Created on 2019-04-27 by the reprex package (v0.2.1) I've opened #699 for additional discussion of |
@alexpghayes I understand that a new model class is outside the scope of broom. The proper place is in the lmtest package. But the problem remains that coeftest does not contain enough information. The problem in creating broom output from coeftest is that the object created does not have the information needed to create a tidy object with CIs or a glance object with statistic and p.value elements. CIs for tidy can be created if normality is assumed; otherwise the degrees of freedom have to be known, just as you say. In order to create F-statistics for glance.confint, the vcov has to be known, not just the CI.
Adding a "df" attribute to "coeftest" objects is a good idea; I can do that. That should also enable a confint() method for "coeftest" objects that essentially returns the same thing as coefci(). Would it be feasible for tidy.coeftest() to call this confint.coeftest()? Then you don't have to duplicate the code. As for the Wald test against the trivial model (. ~ 1): I have personally found this to be rather useless in most applications and hence always resisted computing it automatically along with the coeftest(). Also, the next question is then always how to get a "robust" R-squared, which is not straightforward (or, some would say, ill-defined). But @bjornerstedt, maybe you have a good suggestion for a suitable name/behavior of such a function (beyond its tidy() method)?
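In terms of mechanics, a confint() method built on such a "df" attribute might look roughly like this. This is a sketch for discussion, not the actual lmtest implementation; the attribute name and the Inf-means-normal-approximation convention are assumptions:

```r
# Sketch: confidence intervals from a coeftest-style coefficient matrix
# (columns: estimate, std. error, statistic, p-value) plus a "df" attribute.
# df = Inf or NULL is taken to mean a normal rather than a t approximation.
confint_sketch <- function(object, level = 0.95) {
  df <- attr(object, "df")
  a  <- (1 - level) / 2
  q  <- if (is.null(df) || !is.finite(df)) qnorm(1 - a) else qt(1 - a, df = df)
  est <- object[, 1]
  se  <- object[, 2]
  cbind(lower = est - q * se, upper = est + q * se)
}
```

With the attribute in place, tidy.coeftest() could delegate interval construction to this method instead of duplicating the quantile logic.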
I've quickly had a stab at this, you can try: lmtest_0.9-37.tar.gz

Example:
Just adding a df attribute to the coeftest object is sufficient for the specific goal of a tidy.coeftest method. For me this would help a little, as with the example I posted earlier. It does not address the basic problem that a coeftest object contains too little information. In order to get regression tables with huxtable, a glance.coeftest method has to be defined. This method needs various values from the regression that are not available in the coeftest object. I realize that this is a forum to discuss broom and that suggested modifications of lmtest should perhaps be discussed elsewhere. My wish to create glance.coeftest and augment.coeftest methods is not the subject of this thread. But for me the really important use case is being able to create regression tables with robust SEs in a tidy way.
@statibk As far as design principles go, I agree that it makes sense to separate estimation of the regression coefficients from estimation of the corresponding covariance matrix. But to me it does not make sense that the coeftest object only stores a single table. In addition to the regression itself, it should also store the vcov. Along the lines of estimation commands, it should also store the call. Why should the user be required to link the two objects, when the two could be put in the same object, as done in the robust_se method above?
@statibk Looks great! Are you planning a CRAN release anytime soon? Also, is the

@bjornerstedt Some
The issue is that the user may not want the inverse Fisher information covariance estimate, but it is quite labor intensive to create new

What

The

This is the price
Thanks again for the useful feedback!
For historical reasons (the package is more than 20 years old) it still resides in a non-public repository and there wasn't an urgent need (for me) yet to move it out. I might change that in the not-so-distant future as I have some ideas about more substantial improvements.

Then about the design principles and related considerations: This is all surprisingly difficult, mostly because usage in practice is so heterogeneous. For example, when you associate an

Hence, I'm personally not a fan of approaches like the one

Finally, for the improvement of working with

But for today I'm happy that I could extend
@alexpghayes The robust_se function is essentially a brutal hack of an elegant hack. The reason for doing this was that I want glance.coeftest and augment.coeftest methods. To do this, more information about the regression has to be included in the object. Essentially it is the vcov that is missing. If that could be added as an attribute with a clear name such as "robust.vcov", I think that would be sufficient. @statibk huxreg invokes broom methods to create regression tables. This is why it would be helpful if coeftest objects contained the vcov.
Changes in Version 0.9-37

o coeftest() gained a "df" attribute facilitating subsequent processing of its output, e.g., for computing the corresponding confidence intervals. Suggested by Alex Hayes in tidymodels/broom#663.

o Based on the new "df" attribute of "coeftest" objects, a method for confint() is added. confint(coeftest(object, ...)) should match the output of coefci(object, ...).

o Based on the new "df" attribute of "coeftest" objects, a method for df.residual() is added. df.residual(coeftest(object, ...)) returns NULL if a normal (rather than t) approximation was used in coeftest(object, ...) even if df.residual(object) returned something different.

(NEWS truncated at 15 lines)
In case someone needs it quickly, here is my code to use the new confint(). Will try to pull request it, once...
Created on 2019-07-07 by the reprex package (v0.3.0)
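The reprex code itself did not survive the export. A sketch of what a tidy.coeftest() delegating to the new confint() method might look like; this is a reconstruction under stated assumptions, not the code from the thread:

```r
library(tibble)

# Sketch: tidy() for "coeftest" objects, with intervals via confint().
# Assumes lmtest >= 0.9-37, where confint.coeftest() uses the "df" attribute.
tidy.coeftest <- function(x, conf.int = FALSE, conf.level = 0.95, ...) {
  result <- tibble(
    term      = rownames(x),
    estimate  = x[, 1],
    std.error = x[, 2],
    statistic = x[, 3],
    p.value   = x[, 4]
  )
  if (conf.int) {
    ci <- confint(x, level = conf.level)
    result$conf.low  <- ci[, 1]
    result$conf.high <- ci[, 2]
  }
  result
}
```

Because confint() handles the normal-versus-t decision internally, the tidier no longer needs access to the original model object to build the intervals.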
Are you still planning on turning this into a PR? I think it'd be a great addition! If so, let's move discussion to the PR itself!
Ok, did a pull request; let us follow up there for the details of that solution, and keep this issue open for bigger-picture discussion!?
Package sandwich provides a great and consistent approach to using heteroskedasticity-consistent standard errors in R. This is used through the coeftest() function, for which there is a tidy.coeftest() method.

Unfortunately, conf.int=TRUE won't work, as a coeftest object does not contain such information. Is there any way to bypass this? Or is the only possibility to request that the lmtest author add a conf.int argument?

Thanks!