Non-standard names in formula evaluation #37

Open
sondalex opened this issue Nov 22, 2022 · 10 comments
@sondalex

sondalex commented Nov 22, 2022

Hi,

I have noticed that backticks do not work in plm formulas.
I think supporting them would be useful for people who want
to fit two-stage models without transforming the data twice.

Example:

library(plm)
data("Grunfeld", package="plm")
model1 = plm(inv ~ value + capital + factor(year),
              data = Grunfeld, model = "pooling")
model1
# model results
# ...

plm(inv ~ value + capital + `factor(year)`, data=model.frame(model1), model="within", effect="individual")
# Error in eval(predvars, data, env) : object 'factor(year)' not found
@tappek
Collaborator

tappek commented Nov 22, 2022

I will need to look at that more closely. A quick check of ?formula gives this:
Variable names can be quoted by backticks `like this` in formulae, although there is no guarantee that all code using formulae will accept such non-syntactic names.

-> I read this as saying that backticks in formulae can already cause issues in plain base R setups.

However, in your example, why would you estimate a two-way FE model as a one-way model plus a dummy for the other dimension (rather than simply setting effect = "twoways")?
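
For reference, the two-way specification mentioned above might look like the following sketch. It assumes the plm package is installed and uses its bundled Grunfeld data; the object name fe_twoways is only illustrative.

```r
library(plm)
data("Grunfeld", package = "plm")

# Two-way fixed effects: individual (firm) and time (year) effects are both
# handled by the within transformation, so no explicit factor(year) term
# is needed in the formula.
fe_twoways <- plm(inv ~ value + capital,
                  data = Grunfeld, model = "within", effect = "twoways")
names(coef(fe_twoways))
```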

@sondalex
Author

The example is just to illustrate the idea. My use case would have been too long to explain.

@ycroissant
Owner

ycroissant commented Nov 23, 2022 via email

@tappek
Collaborator

tappek commented Nov 23, 2022

It does not run with 2.6-1 on my end, nor with 1.7-0 (i.e., from the earlier pFormula era).

By the time plm gets to access the data, the syntactically invalid name factor(year) has been converted to the syntactically valid name factor.year., so it is no longer found. For the term "syntactically valid name", see ?make.names and run make.names("factor(year)").
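
A quick base R check of the renaming described above. This is only a sketch, and the vector name nms is made up for illustration:

```r
nms <- c("inv", "value", "capital", "factor(year)")

# make.names() replaces characters that are not allowed in a syntactic
# name with "."
make.names(nms)

# Show which of the original names are changed by make.names()
nms[nms != make.names(nms)]
```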

The issue is not with backticks per se but with the parentheses in factor(year), which make it a syntactically invalid name.
This is illustrated by the following backtick example:

library(plm)
data("Grunfeld", package="plm")
model1 <- plm(inv ~ value + capital + factor(year),
             data = Grunfeld, model = "pooling")
data2 <- model.frame(model1)
data2[ , "a"] <- rnorm(200)
form <- inv ~ value + capital + `a`
plm(form, data=data2, model="within", effect="individual") # works
form2 <- inv ~ value + capital + `factor(year)`
plm(form2, data=data2, model="within", effect="individual") # errors

It works with lm, though. I am not sure whether it is worth the effort to make this work in plm, also given the general warning in ?formula.
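
A minimal base R sketch of the lm behaviour, assuming a made-up data frame df with a column literally named factor(year). The data are simulated, not taken from Grunfeld:

```r
set.seed(42)
df <- data.frame(y = rnorm(30), x = rnorm(30))

# Add a column whose name is not syntactically valid
df[["factor(year)"]] <- rnorm(30)

# The backticked name is looked up as a column of df
fit <- lm(y ~ x + `factor(year)`, data = df)
coef(fit)
```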

A workaround would be to ensure syntactically valid names and use these in the formula, so something along these lines:

colnames(data2) <- make.names(colnames(data2), unique = TRUE)
plm(inv ~ value + capital + factor.year., data=data2, model="within", effect="individual")
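
If the formula is built programmatically, the same workaround could be sketched in base R as follows. The toy data frame d and the lookup vector clean_names are hypothetical; only make.names() and reformulate() are existing base/stats functions.

```r
# Toy data frame with one non-syntactic column name (illustrative only)
d <- data.frame(inv = 1:4, value = c(2, 4, 6, 8))
d[["factor(year)"]] <- factor(c(1935, 1936, 1935, 1936))

# Keep a lookup from original to cleaned names so results can be traced back
clean_names <- setNames(make.names(colnames(d), unique = TRUE), colnames(d))
colnames(d) <- unname(clean_names)

# Build the formula from the cleaned names
form <- reformulate(setdiff(colnames(d), "inv"), response = "inv")
form
```

The resulting formula can then be passed to plm together with the renamed data.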

@sondalex
Author

Thank you for digging into this issue.

@tappek changed the title from "Backticks evaluation in formula" to "Non-standard names in formula evaluation" on Nov 29, 2022
@m0byn

m0byn commented Feb 17, 2023

Although a workaround exists, I came across this issue and have to say it is rather surprising! I am using age groups as variables, so it is rather intuitive to include numbers in column names. Since it works with the lm function, I would argue that digging deeper into this issue is worth the effort!

@santoshbs

I am having the same issue. My dependent variable starts with "z_". plm() keeps saying object not found.

@tappek
Collaborator

tappek commented Apr 25, 2023

Do you have a reproducible example for your z_ case? The following z_ case works:

library(plm)
data(Grunfeld)
Grunfeld$`z_a` <- Grunfeld$inv
plm(z_a ~ value + capital, data = Grunfeld)

Model Formula: z_a ~ value + capital

Coefficients:
  value capital 
0.11012 0.31007 

@santoshbs

Thank you, @tappek.

I am not sure how to create a reproducible example. I will try.

Just FYI - while the same dataset and variable names worked with lm() and lmer(), plm() kept showing object not found. For some reason, colnames(df_pdataframe) and head(df_pdataframe) kept showing different column names. Anyways, I had to go back to lmer().

@tappek
Collaborator

tappek commented Apr 28, 2023

Offhand, I cannot think of a reason why colnames and head would show different column names for a pdata.frame, as we do not provide specialised methods for pdata.frames in the package and there is nothing special about column names in a pdata.frame. A reproducible example would help here as well to identify a possible cause.

https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example gives some hints on how to create one.
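
As a rough starting point, a small self-contained report could look like the sketch below. The data frame df and its column names are invented for illustration; the real data (or a small subset of it) would be substituted.

```r
# Small stand-in for the real data (names are hypothetical)
df <- data.frame(id = c(1, 1, 2, 2),
                 year = c(2001, 2002, 2001, 2002),
                 z_score = c(0.3, -1.2, 0.8, 0.1))

# Do colnames() and head() agree on the column names?
identical(colnames(df), colnames(head(df)))

# dput() prints code that recreates the object, so others can paste it
dput(df)

# R and package versions are usually worth including as well
sessionInfo()
```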
