Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

robust standard errors: will tidy.coeftest(., conf.int=TRUE) ever be possible? #663

Open
MatthieuStigler opened this issue Mar 12, 2019 · 16 comments

Comments

Projects
None yet
4 participants
@MatthieuStigler
Copy link
Contributor

commented Mar 12, 2019

Package sandwich provides a great and consistent approach to use heteroskedasticity consistent standard errors in R. This is used through the coeftest() function, for which there is a tidy::coeftest() method.

Unfortunately, the conf.int=TRUE won't work, as a coeftest does not contain such information. Is there anyway to bypass this? Or is the only possibility to request lmtest author to add a conf.int argument?

Thanks!

@alexpghayes

This comment has been minimized.

Copy link
Collaborator

commented Mar 13, 2019

This depends on what information the coeftest object stores, which I'd need to check. In general, I'd advocate for using estimatr for robust standard errors. The philosophy of sandwich is not the most intuitive in terms of user interface to me, because all of a sudden you get model estimates living in two separate objects: the model object and the sandwich object that contains the standard errors. In estimatr, this information all lives together. Also, estimatr has tidy() methods, which may solve your problem out of box!

@MatthieuStigler

This comment has been minimized.

Copy link
Contributor Author

commented Mar 13, 2019

Well I am not sure the debate SE within model call (estimatr, stata, ...) versus the SE post-model call (sandwich, R base with summary()) is really that simple.

SE in model is definitely much simpler for the user, and allows the function writer to decide for the user of a sound default (much of current literature says you should use robust, not homeskedastic).

SE post-model philosophy is less intuitive, as user have to specifiy themselves robust errors, and may be not that easy. But I think it is much more powerful.

In my opinion, a few points speak in favour of the sandwich implementation:

  • Speaking about implementation philosophy, sandwich has a strong theoretical basis in terms of sandwich formulas for M-estimators, hence covering a very large class of models.
  • The strong theory-based approach was neatly implemented using R's object-oriented capabilities (see paper), allows sandwich to work for many models in R: authors need simply to add estfun(), meat() and bread()` methods for their estimator.
  • This framework is flexible enough to accommodate for newer developments from the sandwich industry, such as the cluster one. I believe this requires only changes at the sandwich side, not changes for each estimators

On the other side, the SE-within approach is pretty much re-inventing the wheel and hard-coding a few specific functionalities decided at some point in time. Generalizing this approach to the large collection of models available for tidy would require a huge work, with a potentially huge redundancy.

But that said, I am not saying sandwich implementation is intuitive, easy to use, and I fully agree with principle 17. But I think sandwich forces to think in more general terms, which can only be beneficial for implementation principles! If one needs to reinvent the wheel, better to reinvent the sandwich wheel, than to have hundred new wheels for each estimator ;-)

@alexpghayes

This comment has been minimized.

Copy link
Collaborator

commented Apr 9, 2019

Philosophy aside, would you pitch an interface? I'm not entirely sure I understand what you're looking for. Can you include some example code and what you'd hope the output would be?

@bjornerstedt

This comment has been minimized.

Copy link

commented Apr 21, 2019

I just encountered this problem, in trying to use broom together with the dynlm package. I would like to add robust standard errors to the dynamic linear model with coeftest, but the code fails if I include this row.

library(dynlm)
library(broom)
library(sandwich)
library(tidyverse)
data("FrozenJuice", package = "AER") 
FOJCPI <- FrozenJuice[, "price"]/FrozenJuice[, "ppi"]
FOJC_pctc <- 100 * diff(log(FOJCPI))
FDD <- FrozenJuice[, "fdd"]
dynlm(FOJC_pctc ~ L(FDD, 0:18)) %>% 
  # coeftest(vcov. = vcovHAC) %>% 
  tidy( conf.int = TRUE) %>%
  mutate(Lag = row_number()-2) %>% 
  filter(Lag >= 0) %>% 
  ggplot() + 
      geom_line(aes(Lag, estimate), color = "blue") +
      geom_line(aes(Lag, conf.low), color = "grey") +
      geom_line(aes(Lag, conf.high), color = "grey") + ylab("Multiplier")

@alexpghayes alexpghayes added this to the 0.7.0 milestone Apr 21, 2019

@statibk

This comment has been minimized.

Copy link

commented Apr 26, 2019

Thanks for the discussion of the issues related to interfacing sandwich functionality via lmtest::coeftest. I just wanted to add two points:

Confidence intervals: In addition to lmtest::coeftest there is also lmtest::coefci with the same kind of interface. These could both be leveraged. The default methods for both functions are very straightforward, though, so it wouldn't be hard to re-implement something similar specifically for broom::tidy. Let me know if I can help either by providing something within lmtest or an adapted/modified version for broom.

Design philosophy: The whole point about sandwich covariances is that they are essentially orthogonal to parameter estimation. Thus, it makes sense to separate estimation of the regression coefficients from estimation of the corresponding covariance matrix. Most model functions hence obtain coefficient estimates along with the "usual" covariances/standard errors under the full model assumptions (often assuming a full probability distribution). Then, you can relax the assumptions without having to re-estimate the coefficients but just by adapting the covariances. Typically, you just assume that the mean function is correctly specified but the remaining probability distribution may be misspecified. Various kinds of misspecifications are supported in sandwich: heteroscedasticity (vcovHC), heteroscedasticity and autocorrelation in time series (vcovHAC and friends), heteroscedasticity and cluster correlation (vcovCL), panel data structure (vcovPL), and a few more flavors.

So while it is convenient to just add a robust = TRUE option to the modeling function itself it often obscures what kind of robustness is actually achieved. A prominent misconception are robust standard errors for binary cross-section data because these are not actually robust against anything. Any misspecification will be in the mean equation and the sandwich covariances do not help you for that, see: https://stackoverflow.com/questions/27367974/different-robust-standard-errors-of-logit-regression-in-stata-and-r

Therefore, I feel that separating the covariance matrix estimation from the model estimation is both useful in terms of DRY coding but also for bringing out more clearly what kind of robustness is actually achieved.

Let me know if you need more details on any parts of this.

Best wishes,
Achim

@MatthieuStigler

This comment has been minimized.

Copy link
Contributor Author

commented Apr 26, 2019

For implementation, I could think of two approaches:

  • The simple one: just provide a tidy.coeftest method. This would be made much easier with Achim adding a conf.int argument to coeftest, which would append the confidence interval to the coeftest output. This would allow close to 15 models (see methods(estfun) from sandwhich) +40 (see pks depending on sandwich, though no full overlap with broom) to use robust standard errors:

    model %>%
      coeftest(vcov. = vcovHC) %>%
      tidy()
    
  • The ambitious one: provide an optional vcov. =NULL argument in tidy(), if NULL use summary(model) for se, else coeftest(x, vcov. = vcovArg). User would do:

    model %>%
      tidy(vcov. = vcovHC)
    

Using this internally would also allow to have also glance() and augment() using the vcov argument. I might not realize though how much is involved in this latter approach.

Implementation principles

These both approaches definitely rely on 1) it is up to user to specify the vcov (broom will never decide which vcov to use) 2) it uses sandwich for robust estimation. Unlike the estimatr approach that requires rewriting every modelling function (every time a new vcovHC flavour goes out), it relies on the existing sandwich framework. For packages not yet covered by sandwich, contributors have to include statistically well-defined methods such as bread() and estfun() ([Object-Oriented Computation of Sandwich Estimators](see https://cran.r-project.org/web/packages/sandwich/vignettes/sandwich-OOP.pdf
). This could be worth discussing further, and adding as a model-implementation-principles: do not reinvent the robust wheel, just give me bread and estfun

@bjornerstedt

This comment has been minimized.

Copy link

commented Apr 27, 2019

My main objection with R approach to robust standard errors is that it leads to messy code. It is also important in teaching, as modern textbooks in econometrics focus almost exclusively on robust standard errors, even at the introductory level.

My suggestion is to create a wrapper function to include the results of coeftest, coefci and waldtest in the estimation result.

library(tidyverse)
library(broom)
library(sandwich)
library(lmtest)
library(huxtable)

reg1 = lm(mpg ~  wt, data = mtcars) 
reg2 = robust_se(reg1, vcov. = vcovHC) 

tidy(reg2)
tidy(reg2, conf.int = TRUE)
glance(reg2)
huxreg(Standard=reg1, HC=reg2, statistics = c(N = "nobs", F = "statistic", P = "p.value"))

The function adds "robust_se" to the class description to allow broom to invoke
the broom methods tidy.robust_se and glance.robust_se.

robust_se = function(model, vcov. = NULL, ...)
{
  model$coeftestresult = coeftest(model, vcov. = vcov., ...)
  model$ci = coefci(model, vcov. = vcov.,  ...)
  model$waldtest = waldtest(model, vcov = vcov.)  
  
  class(model) = c("robust_se", class(model))
  model
}

tidy.robust_se <- function(x, conf.int = FALSE, conf.level = 0.95) {
  ctab = x$coeftestresult
  result = tibble(term = rownames(ctab), estimate = ctab[,1], std.error = ctab[,2], statistic = ctab[,3], p.value = ctab[,4]) 
  if (conf.int & conf.level) {
    a = (1 - conf.level) / 2
    result$conf.low = x$ci[,1]
    result$conf.high = x$ci[,2]
  }
  result
}

glance.robust_se <- function(x) {
  class(x) = class(x)[2:length(class(x))]
  result = glance(x)
  result$statistic = x$waldtest[2,3]
  result$p.value = x$waldtest[2,4]
  result
}

nobs.robust_se <- function(x, ...) {
  class(x) = class(x)[2:length(class(x))]
  nobs(x)
}
@alexpghayes

This comment has been minimized.

Copy link
Collaborator

commented Apr 27, 2019

Thanks everyone for the discussion of design choices! My take on OOP for modeling is in flux and I appreciate all the kind discussion and patience here. My impression is that are two key issues: (1) extensibility and (2) if estimates for different estimands should reasonably live in different objects. I'm slowly writing a blog post with some takeaways from this and other discussions and would love to continue talking about this.

For the sake of this issue, I'm leaning towards @MatthieuStigler's suggestion of a tidy.coeftest() method. broom already takes this approach. I like the second approach (a vcov. argument to tidy()) but it'll likely involve more work than I can reasonably commit to at the moment. The issue then is how to combine the results of coeftest() and coefci() together.

For reference, here is what each of those do:

library(lmtest)
library(sandwich)

fm <- lm(length ~ age, data = Mandible, subset = (age <= 28))

ct <- coeftest(fm, df = Inf, vcov. = vcovHC, type = "HC0")
class(ct)
#> [1] "coeftest"
ct
#> 
#> z test of coefficients:
#> 
#>               Estimate Std. Error z value  Pr(>|z|)    
#> (Intercept) -11.953366   1.009817 -11.837 < 2.2e-16 ***
#> age           1.772730   0.054343  32.621 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ci <- coefci(fm, df = Inf, vcov. = vcovHC, type = "HC0")
class(ci)
#> [1] "matrix"
ci
#>                  2.5 %    97.5 %
#> (Intercept) -13.932571 -9.974161
#> age           1.666219  1.879241

broom already has a tidy.coeftest() method, which works like so:

library(tidyverse)
library(broom)

fm %>% 
  coeftest(vcov. = vcovHC, type = "HC0") %>% 
  tidy()
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)   -12.0     1.01       -11.8 1.79e-23
#> 2 age             1.77    0.0543      32.6 1.44e-71

It's probably worth adding a vignette on tidying tricks for regressions, and featuring this prominently there.

So now the question is how to provide robust confidence intervals in tidy.coeftest(). After some digging, I think the best option is to calculate these manually in broom. The issue is that this requires knowing the residual degrees of freedom, information not currently stored in the coeftest object. @statibk would you be willing to add a df.residual attribute to coeftest objects to enable this?

I hadn't thought about wald tests at all, and currently broom does not support waldtest output (at least, not intentionally). In practice, waldtest returns an anova object and you can do the following:

library(lmtest)
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(sandwich)
library(magrittr)
library(broom)

fm <- lm(length ~ age, data = Mandible, subset = (age <= 28))

# note to self: waldtest has a `vcov` argument rather than
# a `vcov.` argument, and does not accept a `type` argument
wt <- waldtest(fm, vcov = vcovHC)
class(wt)
#> [1] "anova"      "data.frame"
wt
#> Wald test
#> 
#> Model 1: length ~ age
#> Model 2: length ~ 1
#>   Res.Df Df      F    Pr(>F)    
#> 1    156                        
#> 2    157 -1 1020.4 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

tidy(wt)
#> # A tibble: 2 x 4
#>   res.df    df statistic   p.value
#>    <dbl> <dbl>     <dbl>     <dbl>
#> 1    156    NA       NA  NA       
#> 2    157    -1     1020.  2.48e-70
glance(wt)
#> Error: There is no glance method for data frames. Did you mean `dplyr::glimpse()`?

Created on 2019-04-27 by the reprex package (v0.2.1)

I've opened #699 for additional discussion of waldtest() output. @bjornerstedt I see where you're coming from but worry that creating entirely new model classes is out of scope for broom, and will have a high maintenance burden.

@bjornerstedt

This comment has been minimized.

Copy link

commented Apr 27, 2019

@alexpghayes I understand that a new model class is outside the scope of broom. The proper place is in the lmtest package. But the problem remains that coeftest does not contain enough information. The problem in creating broom output from coeftest is that the object created does not have the information to create a tidy object with CI or a glance object with statistic and p.value elements. CI for tidy can be created if normality is assumed, otherwise the degrees of freedom have to be known, just as you say. In order to create F-statistics for the glance.confint the vcov has to be known, not just the CI.

@statibk

This comment has been minimized.

Copy link

commented Apr 29, 2019

Adding a "df" attribute to "coeftest" objects is a good idea, I can do that. That should also enable a confint() method for "coeftest" objects that essentially returns the same thing as coefci(). Would it be feasible for tidy.coeftest() to call this confint.coeftest()? Then you don't have to duplicate the code.

As for the Wald test against the trivial model (. ~ 1): I personally have found this to be rather useless in most applications and hence always resisted from computing that automatically along with the coeftest(). Also, the next question is then always how to get a "robust" R-squared which is not straightforward (or some would say ill-defined). But @bjornerstedt maybe you have a good suggestion for a suitable name/behavior of such a function (beyond its tidy() method)?

@statibk

This comment has been minimized.

Copy link

commented Apr 29, 2019

I've quickly had a stab at this, you can try: lmtest_0.9-37.tar.gz

Example:

library("lmtest")
library("sandwich")
m <- lm(dist ~ speed, data = cars)
ct <- coeftest(m, df = Inf, vcov = sandwich)
confint(ct)
##                  2.5 %    97.5 %
## (Intercept) -28.440965 -6.717225
## speed         3.151009  4.713809
coefci(m, df = Inf, vcov = sandwich)
##                  2.5 %    97.5 %
## (Intercept) -28.440965 -6.717225
## speed         3.151009  4.713809
confint(m)
##                  2.5 %    97.5 %
## (Intercept) -31.167850 -3.990340
## speed         3.096964  4.767853
df.residual(ct)
## [1] Inf
@bjornerstedt

This comment has been minimized.

Copy link

commented Apr 29, 2019

Just adding a df attribute to the coeftest object is sufficient for the specific goal of a tidy.coeftest method. For me this would help a little, as with the example I posted earlier. It does not address the basic problem that a coeftest object contains too little information. In order to get regression tables with huxtable, a glance.coeftest method has to be defined. This method needs various values from the regression that are not available in the coeftest object.

I realize that this is a forum to discuss broom and that suggested modifications of lmtest should perhaps be discussed elsewhere. My wish to create glance.conftest and augment.conftest are not the subject of this thread. But for me the really important use case is being able to create regression tables with robust SE in a tidy way.

@bjornerstedt

This comment has been minimized.

Copy link

commented Apr 29, 2019

@statibk As far as design principles, I agree that it makes sense to separate estimation of the regression coefficients from estimation of the corresponding covariance matrix. But to me it does not make sense that the coeftest object only stores a single table. In addition to the regression itself it should also store the vcov. Along the lines of estimation commands, it should also store the call. Why should the user be required to link the two objects, when the two could be put in the same object as done in the robust_se method above?

@alexpghayes

This comment has been minimized.

Copy link
Collaborator

commented Apr 30, 2019

@statibk Looks great! Are you planning a CRAN release anytime soon? Also, is the lmtest development repo public?

@bjornerstedt Some brief comments, although I also agree we should wrap the design discussion down.

coeftest() is in one sense a very elegant hack. The user begins by fitting some model. Suppose they use stats::lm(). Implicitly they are requesting OLS coefficients, inverse Fisher information covariance estimates, and a whole load of additional linear model related quantities.

The issue is that the user may not want the inverse Fisher information covariance estimate, but it is quite labor intensive to create new lm-style classes and infrastructure for every conceivable estimand-estimator pairing.

What coeftest() does is allow the user to get regression tables, but using a new covariance estimator. This violates a key principle for model objects, "estimation-uniquess" for lack of a better term, which states that for a given estimand, you should only calculate estimates with a single estimator. But coeftest() is very explicit about the covariance estimator in use, and I believe purposefully does not intend the result to be extensible as a model object.

The robust_se class that you propose violates the estimator-uniqueness principle - it contains multiple estimates for a regression table. And the resulting object contains many other estimands, suggesting that you could write S3 methods that combine these estimands. But developing on top of a robust_se object is a bad idea -- users and developers will constantly be unsure which estimate of a regression table is used in any given calculation. For example, as written, vcov() and tidy() will produce conflicting results.

This is the price lmtest pays -- on one hand it avoids the labor of implementing a plethora of new model classes, but this design restricts the package to calculating estimates for a single estimands at a time. If users want to compute on multiple of these estimands at once, they must do it themselves. In your case, you want a tidy computation on multiple estimands. This is exactly the case when full-on model classes are a more appropriate solution. My guess is that estimatr is more likely to provide this than lmtest.

@statibk

This comment has been minimized.

Copy link

commented Apr 30, 2019

Thanks again for the useful feedback!

lmtest 0.9-37 is now out on https://CRAN.R-project.org/package=lmtest. @alexpghayes if you need anything else for the tidy() interface just let me know.

For historical reasons (the package is more than 20 years old) it still resides in a non-public repository and there wasn't an urgent need (for me) yet to move it out. I might change that in the not-so-distant future as I have some ideas about more substantial improvements.

Then about the design principles and related considerations: This is all surprisingly difficult, mostly because usage in practice is so heterogeneous. For example, when you associate an lm model with a different covariance matrix, which standard assumptions are you giving up exactly? Will you still consider R-squared? Will you still consider AIC or BIC? What about prediction intervals? Strictly speaking none of these make sense anymore, nevertheless you will see them applied in practice along with so-called robust standard errors. Even worse for logit/probit models for cross-section data where there is really not a single reason why so-called robust standard errors should be applied. (Clustered or panel data would be different.) Therefore, it is very hard - if not impossible - to implement a tool that is fully consistent while at the same time allowing all standard practices.

Hence, I'm personally not a fan of approaches like the one estimatr (or before it Zelig) has taken. However, it clearly addresses a need that many users feel they have. So I'm trying to be pragmatic about this and just try to provide building blocks in my packages that can be re-used by such packages.

Finally, for the improvement of working with huxtable and sandwich: I'm not familiar enough with the details of the design and intended workflow of broom+huxtable, hence I can't make a good suggestion how to solve this. But I'm open to suggestions and discussions. If possible and reasonable I'm also willing to adapt lmtest or other packages to facilitate this!

But for today I'm happy that I could extend lmtest to make one small detail simpler :-)

@bjornerstedt

This comment has been minimized.

Copy link

commented Apr 30, 2019

@alexpghayes The robust_se function is essentially a brutal hack of an elegant hack. The reason for doing this was that I want glance.conftest and augment.conftest methods. To to this more information about the regression has to be included in the object. Essentially it is the vcov that is missing. If that could be added as an attribute with a clear name such as "robust.vcov", I think that would be sufficient.

@statibk huxreg invokes broom methods to create regression tables. This is the reason why it would be helpful if conftest contained the vcov.

clrpackages pushed a commit to clearlinux-pkgs/R-lmtest that referenced this issue May 1, 2019

[update to 0.9.37] coeftest(object, ...) even if df.residual(object) …
…returned something

Changes in Version 0.9-37

  o coeftest() gained a "df" attribute facilitating subsequent processing
    of its output, e.g., for computing the corresponding confidence intervals.
    Suggested by Alex Hayes in tidymodels/broom#663.

  o Based on the new "df" attribute of "coeftest" objects, a method for
    confint() is added. confint(coeftest(object, ...)) should match the output
    of coefci(object, ...).

  o Based on the new "df" attribute of "coeftest" objects, a method for
    df.residual() is added. df.residual(coeftest(object, ...)) returns NULL
    if a normal (rather than t) approximation was used in
    coeftest(object, ...) even if df.residual(object) returned something
    different.

(NEWS truncated at 15 lines)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.