take look at hockey explainer results #78
tl;dr: The slide is correct as-is. The code needed a small change, although it doesn't matter.

Details:

The data have:

levels(nhl_train$on_goal)
#> [1] "yes" "no"
# This is greater than 1 / 2 (so a positive log-odds)
mean(nhl_train$on_goal == "yes")
#> [1] 0.5515917

What is parsnip doing?

The intercept in our model is negative:

final_glm_spline_wflow %>%
  tidy() %>%
  filter(grepl("Intercept", term))
#> # A tibble: 1 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -0.241 0.0381 -6.32 2.57e-10
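
The sign flip can be reproduced on simulated data. This is a minimal sketch in base R, assuming an illustrative 55% "yes" rate rather than the NHL data. For a factor response, `glm()` treats the first level as the "failure" and models the probability of the remaining level, so with levels "yes", "no" the intercept is the log-odds of "no".

```r
# Simulated outcome with the same level order as in the issue: "yes", then "no".
# The 55% rate is an illustrative assumption, not the NHL data.
set.seed(1)
on_goal <- factor(
  sample(c("yes", "no"), size = 5000, replace = TRUE, prob = c(0.55, 0.45)),
  levels = c("yes", "no")
)

# Intercept-only logistic regression. With a factor response, glm() treats the
# first level as "failure" and models the probability of the other level.
fit <- glm(on_goal ~ 1, family = binomial)

p_yes <- mean(on_goal == "yes")
p_no <- 1 - p_yes

intercept <- unname(coef(fit)[1])
log_odds_no <- log(p_no / p_yes) # log-odds of "no", the second level
```

Under these assumptions `intercept` matches `log_odds_no` and is negative even though "yes" is the more common outcome.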
A positive slope for defenseman indicates that being a defenseman increases the log-odds of the modeled event (the second factor level, "no"):

final_glm_spline_wflow %>%
  tidy() %>%
  filter(grepl("position", term))
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 position_defenseman 0.129 0.0333 3.88 0.000103
#> 2 position_goalie -0.104 2.06 -0.0506 0.960
#> 3 position_left_wing 0.00438 0.0266 0.164 0.870
#> 4 position_right_wing 0.0287 0.0273 1.05 0.294

Just to be sure, what do the raw data say?

nhl_train %>%
  mutate(binned_x = ntile(coord_x, 15)) %>%
  group_by(binned_x, position) %>%
  summarize(
    on_goal_rate = mean(on_goal == "yes"),
    mean_x = mean(coord_x),
    .groups = "drop"
  ) %>%
  filter(position != "goalie") %>%
  ggplot(aes(mean_x, on_goal_rate, col = position)) +
  geom_line() +
  geom_point() +
  lims(y = 0:1)

I believe that the y-axis format of
is correct.

However ... what is DALEX doing?

The slide has:

library(DALEXtra)
glm_explainer <- explain_tidymodels(
  final_glm_spline_wflow,
  data = dplyr::select(nhl_train, -on_goal),
  # DALEX required an integer for factors:
  y = as.integer(nhl_train$on_goal),
  verbose = FALSE
)
set.seed(123)
pdp_coord_x <- model_profile(
  glm_explainer,
  variables = "coord_x",
  N = 500,
  groups = "position"
)

Let's reformat the data to run glm() manually at first:

wflow_mold <- extract_mold(final_glm_spline_wflow)
train_data <-
  bind_cols(wflow_mold$predictors, wflow_mold$outcome) %>%
  mutate(
    # This re-encodes 1 = yes, 2 = no
    on_goal = as.integer(on_goal)
  )

If you were to run:

int_glm <- glm(on_goal ~ ., data = train_data, family = binomial)
#> Error in eval(family$initialize): y values must be 0 <= y <= 1

For this type of model explainer, DALEX never has to fit the model, so it never calls the fitting function.

I changed the slide to be:

library(DALEXtra)
glm_explainer <- explain_tidymodels(
  final_glm_spline_wflow,
  data = dplyr::select(nhl_train, -on_goal),
  # DALEX required an integer for factors:
  y = as.integer(nhl_train$on_goal) - 1,
  verbose = FALSE
)

This is more appropriate in case it ever does need to fit the model.
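
The effect of the `- 1` can be checked in isolation. This is a small sketch in base R on a made-up five-element factor (not the NHL data): `as.integer()` on a factor returns the level codes 1 and 2, which a binomial `glm()` rejects, while subtracting 1 gives a 0/1 outcome that fits.

```r
# Toy outcome with the same level order as nhl_train$on_goal (illustrative values)
on_goal <- factor(c("yes", "no", "yes", "yes", "no"), levels = c("yes", "no"))

# as.integer() returns the level codes: 1 = "yes", 2 = "no"
codes <- as.integer(on_goal)

# Subtracting 1 gives 0 = "yes", 1 = "no", which the binomial family accepts
zero_one <- codes - 1

# glm() rejects a numeric outcome outside [0, 1]; capture the error message
err <- tryCatch(
  glm(codes ~ 1, family = binomial),
  error = function(e) conditionMessage(e)
)

# The shifted encoding fits without error
fit <- glm(zero_one ~ 1, family = binomial)
```

Here `err` holds the same "y values must be 0 <= y <= 1" message shown above, and the 0/1 encoding keeps "no" as the value 1, in line with the level that `glm()` models.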
#62 (comment)