take look at hockey explainer results #78

Closed
topepo opened this issue Jul 22, 2022 · 1 comment
topepo commented Jul 22, 2022

#62 (comment)

@topepo topepo self-assigned this Jul 22, 2022
topepo commented Jul 22, 2022

tl;dr

The slide is correct as-is. The code needed a small change, but the change doesn't affect the results.

details:

The data have:

levels(nhl_train$on_goal)
#> [1] "yes" "no"

# This is greater than 1 / 2 (so a positive log-odds)
mean(nhl_train$on_goal == "yes")
#> [1] 0.5515917

What is parsnip doing?

The intercept in our model is negative:

final_glm_spline_wflow %>%
  tidy() %>%
  filter(grepl("Intercept", term))
#> # A tibble: 1 × 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)   -0.241    0.0381     -6.32 2.57e-10

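glm() via parsnip is modeling the probability of the second factor level ("no"), i.e., the probability of a shot not being on goal (as expected). That is consistent with the negative intercept.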
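As a quick sanity check on the sign, here is a minimal sketch in base R. It uses the observed on-goal rate from above plus a simulated stand-in for the outcome (the simulated `y` is an assumption, not the real `nhl_train` data). With a factor response, `glm()` treats the first level as failure and the others as success, so an intercept-only fit recovers the log-odds of "no". The intercept in the real model won't match this marginal value exactly since that model includes other predictors.

```r
# Observed rate from the issue: about 55% of shots are on goal.
p_yes <- 0.5515917
qlogis(p_yes)      # log-odds of "yes": positive
qlogis(1 - p_yes)  # log-odds of "no": negative

# Simulated stand-in for the outcome, with the same level order as nhl_train.
set.seed(1)
y <- factor(
  sample(c("yes", "no"), 2000, replace = TRUE, prob = c(p_yes, 1 - p_yes)),
  levels = c("yes", "no")
)

# Intercept-only logistic regression on the factor outcome.
fit <- glm(y ~ 1, family = binomial)
intercept <- unname(coef(fit)[1])

# The intercept equals the log-odds of the *second* level ("no").
logit_no <- qlogis(mean(y == "no"))
all.equal(intercept, logit_no, tolerance = 1e-6)
```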

A positive slope for defenseman indicates that being a defenseman:

  • increases the probability of a shot not being on goal, or equivalently,
  • makes a player less likely to have shots on goal.

final_glm_spline_wflow %>%
  tidy() %>%
  filter(grepl("position", term))
#> # A tibble: 4 × 5
#>   term                estimate std.error statistic  p.value
#>   <chr>                  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 position_defenseman  0.129      0.0333    3.88   0.000103
#> 2 position_goalie     -0.104      2.06     -0.0506 0.960   
#> 3 position_left_wing   0.00438    0.0266    0.164  0.870   
#> 4 position_right_wing  0.0287     0.0273    1.05   0.294
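To put the defenseman coefficient on a more readable scale, it can be exponentiated into an odds ratio. This is a small sketch that hard-codes the estimate and standard error from the table above; the 1.96 multiplier assumes an approximate 95% Wald interval, which is my choice for illustration rather than something computed in the slides.

```r
# Estimate and standard error for position_defenseman from the tidy() output.
beta_defenseman <- 0.129
se_defenseman <- 0.0333

# Odds ratio for "not on goal", relative to the reference position.
odds_ratio <- exp(beta_defenseman)
round(odds_ratio, 2)
#> [1] 1.14

# Approximate 95% Wald interval on the odds-ratio scale.
ci <- exp(beta_defenseman + c(-1, 1) * 1.96 * se_defenseman)
ci
```

So the odds of a shot not being on goal are roughly 14% higher for defensemen than for the reference position, holding the other predictors fixed.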

Just to be sure, what do the raw data say?

nhl_train %>%
  mutate(binned_x = ntile(coord_x, 15)) %>%
  group_by(binned_x, position) %>%
  summarize(
    on_goal_rate = mean(on_goal == "yes"),
    mean_x = mean(coord_x),
    .groups = "drop"
  ) %>%
  filter(position != "goalie") %>%
  ggplot(aes(mean_x, on_goal_rate, col = position)) +
  geom_line() +
  geom_point() +
  lims(y = 0:1)

I believe that the y-axis label

"Predicted probability of not being on goal"

is correct.

However... what is DALEX doing?

The slide has:

  library(DALEXtra)

  glm_explainer <- explain_tidymodels(
    final_glm_spline_wflow,
    data = dplyr::select(nhl_train, -on_goal),
    # DALEX required an integer for factors:
    y = as.integer(nhl_train$on_goal),
    verbose = FALSE
  )

  set.seed(123)
  pdp_coord_x <- model_profile(
    glm_explainer,
    variables = "coord_x",
    N = 500,
    groups = "position"
  )

First, let's reformat the data so that we can run glm() manually:

wflow_mold <- extract_mold(final_glm_spline_wflow)

train_data <-
  bind_cols(wflow_mold$predictors, wflow_mold$outcome) %>%
  mutate(
    # This re-encodes 1 = yes, 2 = no
    on_goal = as.integer(on_goal)
  )

If you were to run glm() on this integer-coded outcome, it would fail:

int_glm <- glm(on_goal ~ ., data = train_data, family = binomial)
#> Error in eval(family$initialize): y values must be 0 <= y <= 1

So, for this type of model explainer, DALEX only needs predictions from the already-fitted workflow; it never calls glm() or refits the model. The as.integer() encoding is wrong, but it doesn't produce an error since it is never needed.
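To illustrate why no refit is needed, here is a simplified partial dependence calculation in base R. This is a sketch under my own assumptions, not DALEX's actual implementation: the toy data, the `manual_pdp()` helper, and the grid are all hypothetical, and it skips the subsampling (`N`) and grouping that `model_profile()` does. The point is only that the calculation calls `predict()` on an existing fit and never touches the outcome column.

```r
# Toy data and a fitted logistic regression standing in for the workflow.
set.seed(123)
n <- 200
toy <- data.frame(x1 = runif(n, 0, 100), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-1 + 0.02 * toy$x1 + 0.5 * toy$x2))
fit <- glm(y ~ x1 + x2, data = toy, family = binomial)

# Hypothetical helper: for each grid value, overwrite `variable` in every
# row, predict with the existing model, and average the predictions.
manual_pdp <- function(model, data, variable, grid) {
  vapply(grid, function(value) {
    data[[variable]] <- value
    mean(predict(model, newdata = data, type = "response"))
  }, numeric(1))
}

# Only the predictors are passed in; the outcome is not used at all.
predictors <- toy[, c("x1", "x2")]
grid <- seq(0, 100, by = 25)
pdp <- manual_pdp(fit, predictors, "x1", grid)
pdp
```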

I changed the slide to be:

  library(DALEXtra)

  glm_explainer <- explain_tidymodels(
    final_glm_spline_wflow,
    data = dplyr::select(nhl_train, -on_goal),
    # DALEX required an integer for factors:
    y = as.integer(nhl_train$on_goal) - 1,
    verbose = FALSE
  )

This 0/1 encoding (0 = yes, 1 = no) is more appropriate in case DALEX ever does need to run glm() for some other explainer.
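A small check of the two encodings, again on simulated data (the variables `x` and `on_goal` below are made up for illustration and only mimic the level order of the real outcome). The 1/2 coding triggers the binomial family's range error, while the 0/1 coding gives the same coefficients as fitting on the factor directly.

```r
# Simulated outcome with the same level order as nhl_train$on_goal.
set.seed(42)
n <- 300
x <- rnorm(n)
on_goal <- factor(
  ifelse(runif(n) < plogis(0.3 - 0.8 * x), "yes", "no"),
  levels = c("yes", "no")
)

y_12 <- as.integer(on_goal)      # 1 = yes, 2 = no (original slide)
y_01 <- as.integer(on_goal) - 1  # 0 = yes, 1 = no (updated slide)

# The 1/2 coding is rejected by the binomial family; capture the message.
err <- tryCatch(
  {
    glm(y_12 ~ x, family = binomial)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
err

# The 0/1 coding matches a fit on the factor: both model Pr(on_goal == "no").
fit_factor <- glm(on_goal ~ x, family = binomial)
fit_01 <- glm(y_01 ~ x, family = binomial)
all.equal(coef(fit_factor), coef(fit_01))
```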

@topepo topepo closed this as completed Jul 22, 2022