Permalink
Branch: master
Find file Copy path
08f47ee Jun 10, 2018
3 contributors

Users who have contributed to this file

@ismayc @andrewpbray @mine-cetinkaya-rundel
224 lines (176 sloc) 5.06 KB
---
title: "Examples using `mtcars` data"
author: "Chester Ismay and Andrew Bray"
date: "2018-01-05"
output:
rmarkdown::html_vignette
vignette: |
%\VignetteIndexEntry{mtcars example}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r include=FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.height = 5)
```
**Note**: The `type` argument in `generate()` is automatically filled based on the entries for `specify()` and
`hypothesize()`. It can be removed throughout the examples that follow. It is left in to reiterate the type of generation process being performed.
## Data preparation
```{r message=FALSE, warning=FALSE}
library(infer)
library(dplyr)
mtcars <- mtcars %>%
mutate(cyl = factor(cyl),
vs = factor(vs),
am = factor(am),
gear = factor(gear),
carb = factor(carb))
# For reproducibility
set.seed(2018)
```
***
One numerical variable (mean)
```{r}
mtcars %>%
specify(response = mpg) %>% # formula alt: mpg ~ NULL
hypothesize(null = "point", mu = 25) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "mean")
```
One numerical variable (median)
```{r}
mtcars %>%
specify(response = mpg) %>% # formula alt: mpg ~ NULL
hypothesize(null = "point", med = 26) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "median")
```
One categorical (2 level) variable
```{r}
mtcars %>%
specify(response = am, success = "1") %>% # formula alt: am ~ NULL
hypothesize(null = "point", p = .25) %>%
generate(reps = 100, type = "simulate") %>%
calculate(stat = "prop")
```
Two categorical (2 level) variables
```{r}
mtcars %>%
specify(am ~ vs, success = "1") %>% # alt: response = am, explanatory = vs
hypothesize(null = "independence") %>%
generate(reps = 100, type = "permute") %>%
calculate(stat = "diff in props", order = c("0", "1"))
```
One categorical (>2 level) - GoF
```{r}
mtcars %>%
specify(cyl ~ NULL) %>% # alt: response = cyl
hypothesize(null = "point", p = c("4" = .5, "6" = .25, "8" = .25)) %>%
generate(reps = 100, type = "simulate") %>%
calculate(stat = "Chisq")
```
Two categorical (>2 level) variables
```{r warning = FALSE}
mtcars %>%
specify(cyl ~ am) %>% # alt: response = cyl, explanatory = am
hypothesize(null = "independence") %>%
generate(reps = 100, type = "permute") %>%
calculate(stat = "Chisq")
```
One numerical variable one categorical (2 levels) (diff in means)
```{r}
mtcars %>%
specify(mpg ~ am) %>% # alt: response = mpg, explanatory = am
hypothesize(null = "independence") %>%
generate(reps = 100, type = "permute") %>%
calculate(stat = "diff in means", order = c("0", "1"))
```
One numerical variable one categorical (2 levels) (diff in medians)
```{r}
mtcars %>%
specify(mpg ~ am) %>% # alt: response = mpg, explanatory = am
hypothesize(null = "independence") %>%
generate(reps = 100, type = "permute") %>%
calculate(stat = "diff in medians", order = c("0", "1"))
```
One numerical one categorical (>2 levels) - ANOVA
```{r}
mtcars %>%
specify(mpg ~ cyl) %>% # alt: response = mpg, explanatory = cyl
hypothesize(null = "independence") %>%
generate(reps = 100, type = "permute") %>%
calculate(stat = "F")
```
Two numerical vars - SLR
```{r}
mtcars %>%
specify(mpg ~ hp) %>% # alt: response = mpg, explanatory = cyl
hypothesize(null = "independence") %>%
generate(reps = 100, type = "permute") %>%
calculate(stat = "slope")
```
One numerical variable (standard deviation)
**Not currently implemented**
```{r eval=FALSE}
mtcars %>%
specify(response = mpg) %>% # formula alt: mpg ~ NULL
hypothesize(null = "point", sigma = 5) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "sd")
```
### Confidence intervals
One numerical (one mean)
```{r}
mtcars %>%
specify(response = mpg) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "mean")
```
One numerical (one median)
```{r}
mtcars %>%
specify(response = mpg) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "median")
```
One numerical (standard deviation)
```{r}
mtcars %>%
specify(response = mpg) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "sd")
```
One categorical (one proportion)
```{r}
mtcars %>%
specify(response = am, success = "1") %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "prop")
```
One numerical variable one categorical (2 levels) (diff in means)
```{r}
mtcars %>%
specify(mpg ~ am) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "diff in means", order = c("0", "1"))
```
Two categorical variables (diff in proportions)
```{r}
mtcars %>%
specify(am ~ vs, success = "1") %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "diff in props", order = c("0", "1"))
```
Two numerical vars - SLR
```{r}
mtcars %>%
specify(mpg ~ hp) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "slope")
```
Two numerical vars - correlation
```{r}
mtcars %>%
specify(mpg ~ hp) %>%
generate(reps = 100, type = "bootstrap") %>%
calculate(stat = "correlation")
```