Exclude NAs from percentile calculation in make_strata() #236

Merged
merged 5 commits into from May 7, 2021

brian-j-smith
Contributor

This is a fix for an error returned by make_strata() when trying to create percentiles from numeric data that contain missing values. The code below will reproduce the error.

library(rsample)
make_strata(c(NA, 1:100))
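
For readers without rsample at hand, the failure mode can be sketched in base R. This assumes the percentiles are computed with `stats::quantile()`, which this thread does not show; the quartile `probs` below are likewise an illustrative choice, not taken from the package source.

```r
# Illustrative sketch only: assumes the percentile step is stats::quantile().
x <- c(NA, 1:100)
probs <- (0:4) / 4  # quartile cut points (assumed for illustration)

# With the default na.rm = FALSE, quantile() stops on NA input.
res <- tryCatch(
  quantile(x, probs = probs),
  error = function(e) conditionMessage(e)
)
print(res)

# Excluding the NAs lets the percentiles be computed.
pctls <- quantile(x, probs = probs, na.rm = TRUE)
print(pctls)
```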

@brian-j-smith
Contributor Author

P.S. There was also an unused variable in the function, which I have removed.

@juliasilge
Member

Looks like we already took care of excluding NA values for the categorical case (but missed the numeric case, as pointed out here):

library(rsample)

x4 <- rep(c(NA, LETTERS[1:7]), c(5, 37, 26, 3, 7, 11, 10, 2))
unique(x4)
#> [1] NA  "A" "B" "C" "D" "E" "F" "G"
unique(make_strata(x4, pool = 0.01))
#> [1] E A D B C F G
#> Levels: A B C D E F G

Created on 2021-05-07 by the reprex package (v2.0.0)

@brian-j-smith
Contributor Author

Hi @juliasilge. Thanks for taking a look. You might also think about excluding missing values from the nunique check, i.e. making the following change.

length(num_vals) <= nunique -> length(na.omit(num_vals)) <= nunique
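
A small base R sketch of why this matters, assuming `num_vals` holds the unique values of the input (the thread does not show how it is built) and using a hypothetical threshold of 3 for `nunique`:

```r
# Illustrative sketch: num_vals and the threshold value are assumptions.
x <- c(NA, 1, 2, 3)
nunique <- 3                 # hypothetical threshold
num_vals <- unique(x)        # unique() keeps NA as a distinct value

n_with_na <- length(num_vals)              # NA is counted
n_without_na <- length(na.omit(num_vals))  # NA is dropped

# The two versions of the check disagree for this input.
check_before <- n_with_na <= nunique
check_after <- n_without_na <= nunique
print(c(before = check_before, after = check_after))
```

With the NA counted, the vector appears to have one more distinct value than it really does, so the check can flip right at the threshold.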

@juliasilge
Member

juliasilge commented May 7, 2021

Thanks so much @brian-j-smith! 🙌

No more errors for the numeric case now:

library(rsample)
make_strata(c(NA, 1:100))
#>   [1] (25.8,50.5] [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]   
#>   [7] [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]   
#>  [13] [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]   
#>  [19] [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]    [1,25.8]   
#>  [25] [1,25.8]    [1,25.8]    (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5]
#>  [31] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5]
#>  [37] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5]
#>  [43] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5] (25.8,50.5]
#>  [49] (25.8,50.5] (25.8,50.5] (25.8,50.5] (50.5,75.2] (50.5,75.2] (50.5,75.2]
#>  [55] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2]
#>  [61] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2]
#>  [67] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2]
#>  [73] (50.5,75.2] (50.5,75.2] (50.5,75.2] (50.5,75.2] (75.2,100]  (75.2,100] 
#>  [79] (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100] 
#>  [85] (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100] 
#>  [91] (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100] 
#>  [97] (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100]  (75.2,100] 
#> Levels: [1,25.8] (25.8,50.5] (50.5,75.2] (75.2,100]

Created on 2021-05-07 by the reprex package (v2.0.0)
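
The bin labels above can be approximated in base R with `quantile()` and `cut()`. This is a rough sketch under the assumption of quartile breaks, not the rsample implementation. Note one difference: plain `cut()` leaves the NA input as NA, whereas in the output above the first element (the NA input) is assigned to a bin.

```r
# Illustrative sketch only; not the rsample implementation.
x <- c(NA, 1:100)

# Quartile breaks computed with the NA excluded.
breaks <- quantile(x, probs = (0:4) / 4, na.rm = TRUE)

# Bin the values; include.lowest = TRUE closes the first interval at 1.
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

print(levels(bins))
print(table(bins, useNA = "ifany"))
```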

@juliasilge juliasilge merged commit 80b62b1 into tidymodels:master May 7, 2021
@github-actions

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators May 22, 2021