Support na.action argument #44

Open
tappek opened this issue Apr 15, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@tappek
Collaborator

tappek commented Apr 15, 2023

Currently, the na.action argument is silently ignored. (NA dropping itself works, of course.)

Adding support for it would have several benefits in the presence of NAs. For example, it would make it easier to add residuals back to the original data. It also seems that sandwich::vcovBS, when applied to plm objects estimated from a data frame with dropped NAs, requires it because it uses expand.model.frame.

An old experimental branch was once started on R-Forge and contains the following notes:
https://r-forge.r-project.org/scm/viewvc.php/branches/exp_na.action/?root=plm

################# First take on na.action for plm objects ####################

  • plm, residuals.plm: element "na.action" is now in plm objects. This enables extraction of residuals
    padded to match the original data in the presence of NAs when the model is estimated with
    the na.action argument set to, e.g., na.exclude. Example:
    plm_object <- plm(..., data = data, na.action = na.exclude)
    residuals(plm_object) # will have the length to match number of rows of data

  • many more files changed to accommodate the change (but likely not all)

        Caution developers:
        ==================
        This means that, when referring to residuals, care must be taken to use plm_object$residuals
        and not residuals(plm_object) (or the shortcut resid(plm_object)) to get the residual vector in the correct
        length and without NA padding. This is also how it is done in standard R's lm().
        Many functions (statistical tests, summary statistics) need the residuals (sum of squared residuals, number of residuals).
        If they do not refer to the residuals via plm_object$residuals, they might fail (think of function sum with NA values
        and without na.rm = TRUE) or, worse, not fail while relying on the wrong residual vector (especially so for length(residuals(plm_object))).
        
        In this experimental branch, the necessary changes were attempted for some functions, but likely not all.
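
The distinction the notes draw between the extractor and the raw element can be seen with base R's lm(), which the notes cite as the model to follow. The sketch below uses only stats::lm on made-up toy data; it does not use plm, and it is not taken from the experimental branch.

```r
# Toy data with one missing response value (invented for illustration).
df <- data.frame(y = c(1.0, 2.1, NA, 3.9, 5.2),
                 x = c(1, 2, 3, 4, 5))

fit <- lm(y ~ x, data = df, na.action = na.exclude)

# Extractor: padded with NA so that it lines up with the rows of df.
padded <- residuals(fit)

# Raw element: only the observations actually used in estimation.
raw <- fit$residuals

length(padded)  # 5, same as nrow(df)
length(raw)     # 4, the number of complete cases

# A sum of squared residuals silently becomes NA with the padded vector.
ssr_padded <- sum(padded^2)
ssr_raw    <- sum(raw^2)

# The padded vector can be attached to the original data directly.
df$resid <- padded
```

Code that needs the number of residuals or their sum of squares should therefore read from the `$residuals` element, while code that merges residuals back into the data wants the padded version.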
    
@tappek tappek added the enhancement New feature or request label Apr 15, 2023
@tappek
Collaborator Author

tappek commented Apr 15, 2023

Test code for the part relating to the mentioned sandwich::vcovBS issue in case of NA dropping is here: https://gist.github.com/tappek/4cb65ab25d64f019ec629df5d11bd2bc

@kmfrick

kmfrick commented May 24, 2024

One other detail to think about is how NA dropping affects differencing and lagging of variables, which can be VERY subtle and is handled inconsistently across packages. Bear with me for a second as I mention other packages before getting back to plm.

  • Stata's xtdpd allows for specifying an if qualifier, which is something like plm's subset (although subset is also currently ignored in plm and pgmm). What this does is, after differencing and lagging, drop the observations that don't match the condition. This produces different results than dropping the observations before processing, because fewer observations are lost.
  • In fixest's feols, I can imitate this by adding a variable that is 1 everywhere and NA on the observations that I want to be dropped. The function will compute differences and lags first, then drop the observations where that variable is NA, then realize that the variable is collinear with the intercept or one of the fixed effects and drop one of the two.
  • I can't do this in plm because it currently isn't able to drop NAs this way nor to drop the intercept or one of the fixed effects in case of collinearity.

Therefore, implementing handling of NAs that is similar to what fixest does would kill two birds with one stone: once it can drop NAs, implementing subset is as simple as adding a variable that is 1 everywhere and NA on the observations that don't match the subset condition and letting NA dropping work its magic.
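
To make the order-of-operations point concrete, here is a minimal base R sketch on an invented single-unit series. It does not call xtdpd, fixest, or plm and makes no claim about their internals; the names `d`, `keep`, and `w` are hypothetical. It only shows that "difference, then drop" keeps more rows than "drop, then difference".

```r
# Hypothetical single-unit panel with periods 1..6 (invented numbers).
d <- data.frame(t = 1:6, y = c(10, 12, 15, 19, 24, 30))

# Subset condition: exclude period 3.
keep <- d$t != 3

# Indicator trick from above: 1 where the row is kept, NA where it is not.
w <- ifelse(keep, 1, NA)

# (a) Difference first on the full data, then let NA dropping remove rows.
a <- na.omit(data.frame(t = d$t, dy = c(NA, diff(d$y)), w = w))

# (b) Drop rows first, then difference; a difference is only valid
#     when the two periods involved are consecutive.
s <- d[keep, ]
consecutive <- c(FALSE, diff(s$t) == 1)
b <- data.frame(t = s$t, dy = c(NA, diff(s$y)))[consecutive, ]

a$t  # periods 2, 4, 5, 6: only period 3 itself (and the first row) is lost
b$t  # periods 2, 5, 6: period 4 is lost as well, since its lag was dropped
```

In (a) the difference for period 4 is still available because it was computed from period 3 before that row was removed, which is why the two orders give different estimation samples.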
