create_summary_report doesn't support more than 5 populations or models to compare #70

Closed
filbert42 opened this issue Jul 21, 2022 · 3 comments


@filbert42

When I try to compare more than 5 populations or models with create_summary_report, it throws an error:

<simpleError in data.frame(value = lvls_str, label = labels_values, stringsAsFactors = FALSE): arguments imply differing number of rows: 6, 10>.

Is it a bug or is it an intended behavior? :)
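For readers unfamiliar with this message: it is the standard base R error raised when data.frame() receives columns whose lengths cannot be recycled to a common number of rows. The sketch below only reproduces that class of error in isolation. The variable names are borrowed from the error message, their contents are made-up stand-ins, and nothing here is taken from the rtichoke source.

```r
# Hypothetical stand-ins: six level strings and ten labels, mirroring the
# "6, 10" in the reported message. These are NOT the package's real values.
lvls_str      <- paste0("population_", 1:6)
labels_values <- paste0("label_", 1:10)

# data.frame() cannot recycle a length-6 column against a length-10 column,
# so it stops; tryCatch() captures the message instead of aborting.
err <- tryCatch(
  data.frame(value = lvls_str, label = labels_values, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)

print(err)
```

A mismatch of this kind would be consistent with some internal lookup being sized for a fixed number of populations, but that is an assumption, not something the error alone confirms.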

@uriahf
Owner

uriahf commented Aug 10, 2022

@filbert42 Do you know of a good, relevant reproducible example?

I'll try to use the fairness::compas dataset, but it might be more intuitive to have examples of ROC curves with more than 5 subpopulations rather than a specific metric in a table.

@uriahf
Owner

uriahf commented Aug 14, 2022

  • Support filtering for more than 5 populations when calling render_performance_table() 06b41ce

  • Support filtering for more than 5 populations when calling create_table_for_prevalence() eb0f222

  • Support filtering for more than 5 populations when calling create_table_for_auc() de596ac

  • Provide colors for more than 5 populations when calling render_performance_table(): 30417c3
    New Color Palette:
    c("#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#07004D",
      "#E6AB02", "#FE5F55", "#54494B", "#006E90", "#BC96E6",
      "#52050A", "#1F271B", "#BE7C4D", "#63768D", "#08A045",
      "#320A28", "#82FF9E", "#2176FF", "#D1603D", "#585123")

  • Change color defaults for all create_*_curve() and plot_*_curve() functions 06b41ce 0421ac9 202bfff

  • Update crosstalk checkboxes colors in the summary report template cf9e90a
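As a rough illustration of how a 20-color palette like the one listed above could be mapped onto an arbitrary number of populations, here is a minimal sketch. The helper pick_colors() is hypothetical and written only for this example; it is not rtichoke's implementation, and the population names are placeholders.

```r
# The 20 hex codes from the changelog above.
new_palette <- c(
  "#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#07004D",
  "#E6AB02", "#FE5F55", "#54494B", "#006E90", "#BC96E6",
  "#52050A", "#1F271B", "#BE7C4D", "#63768D", "#08A045",
  "#320A28", "#82FF9E", "#2176FF", "#D1603D", "#585123"
)

# Hypothetical helper (not part of rtichoke): give each population the next
# unused color and fail loudly when there are more populations than colors.
pick_colors <- function(population_names, palette) {
  if (length(population_names) > length(palette)) {
    stop("More populations than available colors")
  }
  stats::setNames(palette[seq_along(population_names)], population_names)
}

# Six placeholder populations, one more than the old limit of five.
six_colors <- pick_colors(paste0("population_", 1:6), new_palette)
print(six_colors)
```

Failing explicitly when the palette runs out is one possible design choice; recycling colors would also work but makes curves harder to tell apart.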

@uriahf
Owner

uriahf commented Aug 15, 2022

Should be fine now:

4756e3d
b3ba132

Reproducible Example:

library(purrr)
library(fairness)
library(dplyr)
library(rtichoke)

# extract data

compas <- fairness::compas
df     <- compas[, !(colnames(compas) %in% c('probability', 'predicted'))]

# partitioning params
set.seed(77)
val_percent <- 0.3
val_idx     <- sample(1:nrow(df))[1:round(nrow(df) * val_percent)]

# partition the data
df_train <- df[-val_idx, ]
df_valid <- df[ val_idx, ]

# fit logit models
model1 <- glm(Two_yr_Recidivism ~ .,            
              data   = df_train, 
              family = binomial(link = 'logit'))

df_valid$prob_1 <- predict(model1, df_valid, type = 'response')
df_valid$Two_yr_Recidivism_01 <- ifelse(df_valid$Two_yr_Recidivism == 'yes', 1, 0)

named_group_split <- function(.tbl, ...) {
  grouped <- group_by(.tbl, ...)
  names <- rlang::inject(paste(!!!group_keys(grouped), sep = " / "))
  
  grouped %>% 
    group_split(.keep = FALSE) %>% 
    rlang::set_names(names)
}

df_valid_for_rtichoke <- df_valid %>%
  select(prob_1, Two_yr_Recidivism_01, ethnicity) %>% 
  list(
    probs = select(., ethnicity, prob_1) %>% 
      named_group_split(ethnicity) %>% 
      map(~ .x %>%
            pull(prob_1)),
    reals = select(., ethnicity, Two_yr_Recidivism_01) %>% 
      named_group_split(ethnicity)  %>% 
      map(~ .x %>%
            pull(Two_yr_Recidivism_01))
  )

create_summary_report(
  probs = df_valid_for_rtichoke$probs,
  reals = df_valid_for_rtichoke$reals
)
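For anyone who wants to check the input shape without the compas data, the sketch below builds the same kind of structure the example passes to create_summary_report(): two named lists, one vector per population, with matching names and lengths. It uses base split() instead of the named_group_split() helper, and the data frame toy with its columns group, prob and real is synthetic and made up for illustration.

```r
set.seed(1)

# Synthetic data: six populations, five observations each.
toy <- data.frame(
  group = rep(paste0("group_", 1:6), each = 5),
  prob  = runif(30),           # made-up predicted probabilities
  real  = rbinom(30, 1, 0.5)   # made-up binary outcomes
)

# split() returns a named list with one element per population.
probs <- split(toy$prob, toy$group)
reals <- split(toy$real, toy$group)

# Sanity checks: same populations in the same order, same sizes.
stopifnot(identical(names(probs), names(reals)))
stopifnot(all(lengths(probs) == lengths(reals)))

str(probs)
```

With more than five elements in each list, this is the case the issue was about; the lists could then be passed as the probs and reals arguments in the same way as in the example above.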

uriahf closed this as completed Aug 15, 2022