Skip to content

Commit

Permalink
Merge pull request #104 from tidymodels/confint-api-test-number-2000
Browse files Browse the repository at this point in the history
Confint api test number 2000
  • Loading branch information
topepo committed Jul 12, 2019
2 parents 751bff0 + b8d1fe0 commit cc4ccc3
Show file tree
Hide file tree
Showing 3 changed files with 25 additions and 26 deletions.
3 changes: 2 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
# `rsample` 0.0.4.9000
# `rsample` 0.0.5

* Added three functions to compute different bootstrap confidence intervals.
* A new function (`add_resample_id`) augments a data frame with columns for the resampling identifier.
* Updated `initial_split`, `mc_cv`, `vfold_cv`, `bootstraps`, and `group_vfold_cv` to use tidyselect on the stratification variable.
* Updated `initial_split`, `mc_cv`, `vfold_cv`, `bootstraps` with new `breaks` parameter that specifies the number of bins to stratify by for a numeric stratification variable.


# `rsample` 0.0.4

Small maintenence release.
Expand Down
36 changes: 17 additions & 19 deletions tests/testthat/test_bootci.R
Original file line number Diff line number Diff line change
Expand Up @@ -93,28 +93,28 @@ test_that("Wrappers -- selection of multiple variables works", {
bt_resamples <- bootstraps(attrition, times = 1000, apparent = TRUE) %>%
mutate(res = map(splits, func))

iris_tidy <-
attrit_tidy <-
lm(Age ~ HourlyRate + DistanceFromHome, data = attrition) %>%
tidy(conf.int = TRUE) %>%
dplyr::arrange(term)

pct_res <-
int_pctl(bt_resamples, res) %>%
inner_join(iris_tidy, by = "term")
inner_join(attrit_tidy, by = "term")
expect_equal(pct_res$conf.low, pct_res$.lower, tolerance = .01)
expect_equal(pct_res$conf.high, pct_res$.upper, tolerance = .01)


t_res <-
int_t(bt_resamples, res) %>%
inner_join(iris_tidy, by = "term")
inner_join(attrit_tidy, by = "term")
expect_equal(t_res$conf.low, t_res$.lower, tolerance = .01)
expect_equal(t_res$conf.high, t_res$.upper, tolerance = .01)


bca_res <-
int_bca(bt_resamples, res, .fn = func) %>%
inner_join(iris_tidy, by = "term")
inner_join(attrit_tidy, by = "term")
expect_equal(bca_res$conf.low, bca_res$.lower, tolerance = .01)
expect_equal(bca_res$conf.high, bca_res$.upper, tolerance = .01)

Expand All @@ -135,26 +135,24 @@ test_that('Upper & lower confidence interval does not contain NA', {
}

set.seed(888)
bt_resamples <- bootstraps(data.frame(x = 1:100), times = 1000, apparent = TRUE) %>% mutate(res = map(splits, bad_stats))
bt_resamples <- bootstraps(data.frame(x = 1:100), times = 1000, apparent = TRUE) %>%
mutate(res = map(splits, bad_stats))

expect_warning(
expect_error(
int_pctl(bt_resamples, res),
"missing values"
)
expect_error(
int_pctl(bt_resamples, res),
"missing values"
)

expect_warning(
expect_error(
int_t(bt_resamples, res),
"missing values"
)
expect_error(
int_t(bt_resamples, res),
"missing values"
)

expect_error(
int_bca(bt_resamples, res, .fn = bad_stats),
"missing values"
)

# expect_error(
# int_bca(bt_resamples, res, .fn = bad_stats),
# "missing values"
# )
})

# ------------------------------------------------------------------------------
Expand Down
12 changes: 6 additions & 6 deletions vignettes/Applications/Intervals.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,9 @@ library(GGally)
theme_set(theme_bw())
```

The bootstrap was originally intended for estimating confidence intervals for complex statistics whose variance properties are difficult to analytically derive. Davison and Hinkley's [_Bootstrap Methods and Their Applications_](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) is a great resource for these methods. `rsample` contains a few function to compute the most common types of intervals.
The bootstrap was originally intended for estimating confidence intervals for complex statistics whose variance properties are difficult to analytically derive. Davison and Hinkley's [_Bootstrap Methods and Their Application_](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) is a great resource for these methods. `rsample` contains a few function to compute the most common types of intervals.

To demonstrate the computations for the different types of intervals, we'll use a nonlinear regression example from [Baty _et al_ (2015)](https://www.jstatsoft.org/article/view/v066i05). The showed data that monitored oxygen uptake in a patient with rest and exercise phases (in the data frame `O2K`).
To demonstrate the computations for the different types of intervals, we'll use a nonlinear regression example from [Baty _et al_ (2015)](https://www.jstatsoft.org/article/view/v066i05). They showed data that monitored oxygen uptake in a patient with rest and exercise phases (in the data frame `O2K`).

```{r O2K-dat}
library(tidymodels)
Expand All @@ -31,7 +31,7 @@ ggplot(O2K, aes(x = t, y = VO2)) +
geom_point()
```

The authors fit a segmented regression model where the transition point was known (this is the time when exercise commenced).Their model was:
The authors fit a segmented regression model where the transition point was known (this is the time when exercise commenced). Their model was:

```{r O2K-fit}
nonlin_form <-
Expand Down Expand Up @@ -114,7 +114,7 @@ nls_coef %>%

## Percentile intervals

The most basic type of interval uses _percentiles_ of the resampling distribution. To get the percentile intervals, the `rset` objects is passed as the first argument and the second argument is the list column of tidy results:
The most basic type of interval uses _percentiles_ of the resampling distribution. To get the percentile intervals, the `rset` object is passed as the first argument and the second argument is the list column of tidy results:

```{r pctl}
p_ints <- int_pctl(nlin_bt, models)
Expand Down Expand Up @@ -166,7 +166,7 @@ nls_coef %>%

## t-intervals

Bootstrap _t_-intervals are estimated by computing intermediate statistics that are _t_-like in structure. To use these, we require the estimated variance _for each individual resampled estimate_. In our example, this comes along with the fitted model object. We can extract the standard errors of the parameters. Luckily, most `tidy()` provide this in a column names `std.err`.
Bootstrap _t_-intervals are estimated by computing intermediate statistics that are _t_-like in structure. To use these, we require the estimated variance _for each individual resampled estimate_. In our example, this comes along with the fitted model object. We can extract the standard errors of the parameters. Luckily, most `tidy()` provide this in a column named `std.error`.

The arguments for these intervals are the same:

Expand Down Expand Up @@ -216,7 +216,7 @@ fold_incr <- function(split, ...) {
term = "fold increase",
estimate = unname(quants[2]/quants[1]),
# We don't know the analytical formula for this
std.err = NA_real_
std.error = NA_real_
)
}
```
Expand Down

0 comments on commit cc4ccc3

Please sign in to comment.