cal_estimate_*() with factor variable passed to .by fails #127

@tonyelhabr

This may or may not be intended behavior, but I would expect a factor column passed to `.by` to work the same way as a character column.

```r
library(probably)
packageVersion("probably")
#> [1] '1.0.1.9000'
data("segment_logistic")
segment_logistic$dummy_group <- c(
  rep("A", 500),
  rep("B", 300),
  rep("C", 210)
)

## 1. works as expected for a character field
cal_estimate_beta(segment_logistic, Class, .by = dummy_group)
#> 
#> ── Probability Calibration
#> Method: Beta calibration
#> Type: Binary
#> Source class: Data Frame
#> Data points: 1,010, split in 3 groups
#> Truth variable: `Class`
#> Estimate variables:
#> `.pred_good` ==> good
#> `.pred_poor` ==> poor

## 2. doesn't work with a factor group?
segment_logistic$dummy_group <- factor(segment_logistic$dummy_group)
cal_estimate_beta(segment_logistic, Class, .by = dummy_group)
#> Error in family$linkfun(mustart): Argument mu must be a nonempty numeric vector

## 3. works for a numeric field that acts as a pseudo-category
segment_logistic$dummy_group <- as.numeric(segment_logistic$dummy_group)
cal_estimate_beta(segment_logistic, Class, .by = dummy_group)
#> 
#> ── Probability Calibration
#> Method: Beta calibration
#> Type: Binary
#> Source class: Data Frame
#> Data points: 1,010, split in 3 groups
#> Truth variable: `Class`
#> Estimate variables:
#> `.pred_good` ==> good
#> `.pred_poor` ==> poor
```
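Since case 1 works, converting a factor grouping column to character looks like a reasonable workaround until this is fixed. A minimal sketch, assuming a hypothetical helper `factors_to_character()` (it is not part of probably):

```r
# Hypothetical helper (NOT part of probably): convert the named factor
# columns to character, leaving every other column untouched.
factors_to_character <- function(data, cols) {
  for (col in cols) {
    if (is.factor(data[[col]])) {
      data[[col]] <- as.character(data[[col]])
    }
  }
  data
}

df <- data.frame(grp = factor(c("A", "B", "A")), p = c(0.2, 0.7, 0.4))
df2 <- factors_to_character(df, "grp")
class(df2$grp)

# With probably loaded, the case 2 call would then become:
# cal_estimate_beta(factors_to_character(segment_logistic, "dummy_group"),
#                   Class, .by = dummy_group)
```

This only relies on the character behavior shown in case 1; it does not address the underlying grouping code.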

I found the same issue with `cal_estimate_isotonic()`, so I think this affects all of the `cal_estimate_*()` functions.

AFAICT, the problem lies in how `split_dplyr_groups()` is used in `cal_*_impl_grp()`.
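For context, the error in case 2 comes from the binomial logit link when it receives an empty vector, which suggests (but does not prove) that one of the per-group subsets has zero rows by the time the calibration model is fit. The base-R sketch below illustrates one way that can happen. It is not probably's code: `subset_by_code()` and `split_by_group()` are hypothetical helpers. Matching a factor column against an integer group code compares level labels with numbers, so no rows match; splitting on the column itself keeps the original values for character, factor, and numeric groups alike.

```r
# Hypothetical illustration only: these helpers are NOT probably's internals.
df <- data.frame(
  y = rep(c(0, 1), 5),
  x = seq(0.05, 0.95, length.out = 10),
  grp = factor(rep(c("A", "B"), each = 5))
)

# Pitfall: if a group key is recovered as an integer code and then compared
# with a factor column, R compares the level labels ("A", "B") with "1",
# so nothing matches and the subset is empty.
subset_by_code <- function(data, col, code) {
  data[data[[col]] == code, , drop = FALSE]
}
empty <- subset_by_code(df, "grp", as.integer(df$grp[1]))
nrow(empty)

# Fitting a binomial GLM on the empty subset should fail inside the logit
# link, with the same message seen in case 2.
msg <- tryCatch(
  {
    glm(y ~ x, family = binomial, data = empty)
    "no error"
  },
  error = conditionMessage
)
msg

# Type-agnostic alternative: split() on the column itself, so each subset
# is built from the actual group values whatever their type.
split_by_group <- function(data, col) {
  split(data, data[[col]], drop = TRUE)
}
groups <- split_by_group(df, "grp")
vapply(groups, nrow, integer(1))
```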
