Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behaviour for avg_slopes() when variable name in the grid equal name of standard output columns #697

Closed
strengejacke opened this issue Feb 25, 2023 · 4 comments

Comments

@strengejacke
Copy link
Contributor

strengejacke commented Feb 25, 2023

See this example, where a non-focal variable is names group in one case and groups in the other case. I don't think this behaviour is intentional? Maybe you should also check for "masked" column names? Btw, which variable names are not "allowed"? group, type, term? (and se/ci/p related - but also more?)

library(marginaleffects)
set.seed(123)

dat <- data.frame(
  outcome = rbinom(n = 100, size = 1, prob = 0.35),
  var_binom = as.factor(rbinom(n = 100, size = 1, prob = 0.2)),
  var_cont = rnorm(n = 100, mean = 10, sd = 7),
  group = sample(letters[1:4], size = 100, replace = TRUE),
  groups = sample(letters[1:4], size = 100, replace = TRUE)
)

m1 <- glm(outcome ~ var_binom + var_cont + group,
  data = dat, family = binomial()
)
m2 <- glm(outcome ~ var_binom + var_cont + groups,
  data = dat, family = binomial()
)

avg_slopes(m1, variables = "var_cont")
#> 
#>  Group     Term Estimate Std. Error     z Pr(>|z|)    2.5 % 97.5 %
#>      a var_cont  0.00233    0.00647 0.360    0.719 -0.01035 0.0150
#>      b var_cont  0.00222    0.00615 0.360    0.719 -0.00984 0.0143
#>      c var_cont  0.00294    0.00815 0.361    0.718 -0.01303 0.0189
#>      d var_cont  0.00293    0.00814 0.361    0.718 -0.01301 0.0189
#> 
#> Prediction type:  response 
#> Columns: type, group, term, estimate, std.error, statistic, p.value, conf.low, conf.high
avg_slopes(m2, variables = "var_cont")
#> 
#>      Term Estimate Std. Error    z Pr(>|z|)   2.5 % 97.5 %
#>  var_cont  0.00132    0.00696 0.19     0.85 -0.0123  0.015
#> 
#> Prediction type:  response 
#> Columns: type, term, estimate, std.error, statistic, p.value, conf.low, conf.high

Created on 2023-02-25 with reprex v2.0.2

vincentarelbundock added a commit that referenced this issue Feb 25, 2023
@vincentarelbundock
Copy link
Owner

Thanks a lot for the report. I appreciate it.

Should be fixed now (stricter).

Reserved keywords are here: https://github.com/vincentarelbundock/marginaleffects/blob/main/R/sanitize_variables.R#L60

vincentarelbundock added a commit that referenced this issue Feb 25, 2023
@strengejacke
Copy link
Contributor Author

Thanks! Isn't type missing in that list?

@vincentarelbundock
Copy link
Owner

Good catch, but I think I should remove the type column entirely. It is a remnant of another era when people could estimate different types in one call.

@vincentarelbundock
Copy link
Owner

type is no longer a column, and that (very minor) breaking change is noted in NEWS.md.

You can retrieve the type argument from attr(x, "type")

Tristan-Siegfried pushed a commit to Tristan-Siegfried/marginaleffects that referenced this issue Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants