
```r
library(dplyr)
library(tidyr)
library(purrr)
library(devtools)
# Install purrrfect from GitHub; force = TRUE reinstalls it on every run
install_github('yardsale8/purrrfect', force = TRUE)
library(purrrfect)
```



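Note that loading `purrrfect` masks the base R functions `replicate` and `tabulate`. Throughout this notebook, `replicate` refers to the `purrrfect` version, which returns a data frame with a `.trial` column and an `.outcome` column.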




# Simulating the Binomial Distribution

In this notebook, we will apply what we've learned to problems related to the binomial distribution.  We will do this by

1. Simulating the raw outcomes using `sample` and converting them to the number of successes, and
2. Simulating the number of successes directly using `rbinom`.

Finally, we will illustrate how to use simulations to estimate the expected value.

### Review - Bernoulli Process

A sequence of outcomes is generated by a [Bernoulli process](https://en.wikipedia.org/wiki/Bernoulli_process) provided that
1. Outcomes are independent,
2. There are two possible outcomes (denoted success and failure), and
3. The probability of a success is constant.
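As a quick illustration of these three conditions, here is a minimal base R sketch that does not use `purrrfect`. It assumes successes are coded as 1 and failures as 0, and borrows $p = 0.735$ from the free-throw example below; the helper `bernoulli_trials` is hypothetical. Each trial compares its own independent uniform draw with the same $p$, so the outcomes are independent, binary, and share a constant success probability.

```r
# Hypothetical helper: n independent Bernoulli(p) trials coded 1 = success, 0 = failure.
# Each trial compares its own uniform draw with the same constant p.
bernoulli_trials <- function(n, p) {
  as.integer(runif(n) < p)
}

set.seed(1)            # arbitrary seed, for reproducibility
p <- 0.735             # assumed success probability (from the example below)

first.trials <- bernoulli_trials(20, p)
first.trials           # a short run of 0/1 outcomes

# Over many trials, the proportion of successes should settle near p
long.run <- bernoulli_trials(100000, p)
mean(long.run)
```

`rbinom(n, size = 1, prob = p)` produces the same kind of 0/1 draws; the uniform comparison is used here only to make the constant $p$ visible.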

### The binomial distribution

Suppose that we generate $n$ outcomes from a Bernoulli process with probability of success $p$. If $X$ is the number of successes in the $n$ trials, then $X$ has a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution) with parameters $n$ and $p$.
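For reference, the binomial probability mass function is $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$, with expected value $E[X] = np$. The sketch below uses a hypothetical helper `binom_pmf` to evaluate this formula directly for $n = 10$ and $p = 0.735$ (the values used in the LeBron example) and checks it against base R's `dbinom`.

```r
# Hypothetical helper: the binomial pmf written out from the formula
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 10
p <- 0.735
k <- 0:n
pmf <- binom_pmf(k, n, p)

all.equal(pmf, dbinom(k, n, p))  # TRUE: agrees with base R's dbinom
sum(pmf)                         # should be 1 (up to rounding)
sum(k * pmf)                     # expected value, should equal n * p
```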

## Strategies for simulating the binomial distribution

1. Simulate the raw Bernoulli trials, then transform them into the number of successes using either reshaping or `mutate` + `map`.
2. Simulate the number of successes directly using `rbinom`.

### Example - LeBron shoots free throws

LeBron James is an NBA player with a storied career. Over the course of his career, he has made 73.5% of all his free throw attempts. Suppose that we want to model the number of shots he makes in 10 attempts using the binomial distribution with $n = 10$ and $p = 0.735$.

We will illustrate this process in three ways:

1. Simulating individual shots and reshaping to compute the number of successes,
2. Simulating individual shots and using `mutate` + `map` to compute the number of successes, and
3. Simulating the number of successes directly using `rbinom`.

#### Setting up a sample space

We need to sample from a space with a 73.5% chance of a made free throw, which we will accomplish by passing a probability vector to the `prob` argument of `sample`.

```r
shot <- c('Make', 'Miss')
shot.probs <- c(0.735, 1 - 0.735)
```
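Before simulating experiments, it can help to confirm that the probability vector behaves as intended. This base R sketch draws a large number of single shots with `sample` (the sample size of 100,000 and the seed are arbitrary choices) and compares the observed make rate with 0.735.

```r
shot <- c('Make', 'Miss')
shot.probs <- c(0.735, 1 - 0.735)

set.seed(42)
many.shots <- sample(shot, 100000, replace = TRUE, prob = shot.probs)

# Relative frequencies of Make and Miss
table(many.shots) / length(many.shots)

# Proportion of makes; should be close to 0.735
make.rate <- mean(many.shots == 'Make')
make.rate
```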

```r
replicate(10, sample(shot, 10, replace = TRUE, prob = shot.probs))
```

| `.trial` `<dbl>` | `.outcome` `<list>` |
|---|---|
| 1 | Make, Make, Make, Make, Make, Make, Make, Make, Miss, Make |
| 2 | Make, Make, Make, Make, Make, Miss, Miss, Make, Make, Miss |
| 3 | Make, Make, Miss, Make, Make, Miss, Make, Miss, Miss, Make |
| 4 | Make, Make, Miss, Make, Make, Make, Make, Make, Miss, Miss |
| 5 | Make, Make, Miss, Miss, Make, Make, Make, Miss, Make, Make |
| 6 | Make, Make, Make, Make, Miss, Make, Miss, Make, Make, Make |
| 7 | Miss, Make, Make, Miss, Make, Miss, Make, Make, Make, Miss |
| 8 | Make, Miss, Make, Make, Miss, Make, Miss, Make, Make, Make |
| 9 | Make, Make, Miss, Make, Make, Make, Make, Make, Make, Make |
| 10 | Make, Make, Make, Make, Miss, Miss, Make, Make, Make, Make |
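Because `purrrfect` masks `base::replicate`, the output above is a data frame with a list column. As a cross-check outside `purrrfect`, the sketch below calls `base::replicate` explicitly (the seed is arbitrary). Base R simplifies the ten experiments into a 10 by 10 character matrix, one column per experiment and one row per shot, and `colSums` then counts the makes in each column.

```r
shot <- c('Make', 'Miss')
shot.probs <- c(0.735, 1 - 0.735)

set.seed(2024)
# Base R's replicate() simplifies the results into a matrix:
# 10 rows (shots) by 10 columns (experiments)
outcomes <- base::replicate(10, sample(shot, 10, replace = TRUE, prob = shot.probs))
dim(outcomes)

# Number of made shots in each experiment (column)
num.successes <- colSums(outcomes == 'Make')
num.successes
```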


#### Approach 1 - Simulate individual shots, reshape, and compute successes

Comment out lines to explore the output in each step.

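```r
N <- 10  # number of experiments (each experiment is 10 shots)
(replicate(N, sample(shot, 10, replace = TRUE, prob = shot.probs), .reshape = 'stack')  # one row per shot
 %>% mutate(is.success = ifelse(.outcome == 'Make', 1, 0))           # recode Make/Miss as 1/0
 %>% group_by(.trial) %>% summarise(num.successes = sum(is.success))  # count makes per trial
 )
```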

| `.trial` `<dbl>` | `num.successes` `<dbl>` |
|---|---|
| 1 | 7 |
| 2 | 5 |
| 3 | 8 |
| 4 | 8 |
| 5 | 9 |
| 6 | 8 |
| 7 | 7 |
| 8 | 9 |
| 9 | 5 |
| 10 | 8 |
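The same stack, recode, and summarise steps can be sketched in base R, which may help clarify what each `dplyr` step does. This is an illustrative reimplementation, not how `purrrfect` works internally: it builds a long data frame with one row per shot, recodes outcomes as 0/1, and uses `tapply` in place of `group_by(.trial) %>% summarise(...)`. Drawing all $N \times 10$ shots in one `sample` call is equivalent to ten separate calls because the shots are independent.

```r
shot <- c('Make', 'Miss')
shot.probs <- c(0.735, 1 - 0.735)
N <- 10        # number of experiments
size <- 10     # shots per experiment

set.seed(7)
# Long ("stacked") format: one row per shot, labelled with its trial number
long <- data.frame(
  .trial   = rep(seq_len(N), each = size),
  .outcome = sample(shot, N * size, replace = TRUE, prob = shot.probs)
)

# Recode Make/Miss as 1/0
long$is.success <- ifelse(long$.outcome == 'Make', 1, 0)

# Sum the 1s within each trial (the analogue of group_by + summarise)
num.successes <- tapply(long$is.success, long$.trial, sum)
num.successes
```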


#### Approach 2 - Simulate individual shots, recode outcomes, and compute successes using `mutate` and `map`

Comment out lines to explore the output in each step.

```r
N <- 10
(replicate(N, sample(shot, 10, replace = TRUE, prob = shot.probs))       # .outcome is a list column
 %>% mutate(is.success = map(.outcome, \(x) ifelse(x == 'Make', 1, 0)))  # recode each trial's shots as 1/0
 %>% mutate(num.successes = map_int(is.success, sum))                    # add up each trial's 1s
 )
```

| `.trial` `<dbl>` | `.outcome` `<list>` | `is.success` `<list>` | `num.successes` `<int>` |
|---|---|---|---|
| 1 | Miss, Make, Make, Make, Miss, Make, Miss, Miss, Make, Make | 0, 1, 1, 1, 0, 1, 0, 0, 1, 1 | 6 |
| 2 | Make, Make, Make, Miss, Miss, Make, Make, Make, Make, Make | 1, 1, 1, 0, 0, 1, 1, 1, 1, 1 | 8 |
| 3 | Make, Make, Miss, Make, Miss, Make, Make, Make, Make, Make | 1, 1, 0, 1, 0, 1, 1, 1, 1, 1 | 8 |
| 4 | Miss, Make, Make, Make, Make, Make, Make, Make, Miss, Make | 0, 1, 1, 1, 1, 1, 1, 1, 0, 1 | 8 |
| 5 | Make, Make, Make, Make, Make, Make, Make, Make, Make, Miss | 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 | 9 |
| 6 | Miss, Make, Miss, Make, Make, Miss, Miss, Make, Make, Make | 0, 1, 0, 1, 1, 0, 0, 1, 1, 1 | 6 |
| 7 | Miss, Make, Miss, Miss, Make, Miss, Make, Make, Make, Make | 0, 1, 0, 0, 1, 0, 1, 1, 1, 1 | 6 |
| 8 | Make, Make, Miss, Make, Miss, Miss, Make, Make, Miss, Miss | 1, 1, 0, 1, 0, 0, 1, 1, 0, 0 | 5 |
| 9 | Miss, Make, Make, Make, Make, Make, Make, Make, Make, Make | 0, 1, 1, 1, 1, 1, 1, 1, 1, 1 | 9 |
| 10 | Miss, Miss, Make, Make, Miss, Make, Make, Make, Make, Make | 0, 0, 1, 1, 0, 1, 1, 1, 1, 1 | 7 |
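For comparison, here is a base R sketch of the same idea. It assumes a plain list stands in for the `.outcome` list column; `lapply` plays the role of `purrr::map` and `vapply(..., integer(1))` the role of `map_int`. It also shows a shortcut: summing a logical vector counts its `TRUE` values, so the explicit 1/0 recoding step is optional.

```r
shot <- c('Make', 'Miss')
shot.probs <- c(0.735, 1 - 0.735)
N <- 10

set.seed(11)
# One character vector of 10 shots per experiment (stands in for .outcome)
outcome.list <- lapply(seq_len(N), function(i) sample(shot, 10, replace = TRUE, prob = shot.probs))

# Recode each experiment's shots as 1/0, then total them (like map + map_int)
is.success <- lapply(outcome.list, function(x) ifelse(x == 'Make', 1, 0))
num.successes <- vapply(is.success, function(x) as.integer(sum(x)), integer(1))

# Shortcut: sum() of a logical vector counts the TRUE values directly
num.successes.2 <- vapply(outcome.list, function(x) sum(x == 'Make'), integer(1))
identical(num.successes, num.successes.2)   # TRUE
```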


#### Approach 3 - Simulate the number of successes directly using `rbinom`

Note that the signature of `rbinom` is `rbinom(n, size, prob)` (see `?rbinom`). Here we will use `n = 1` to generate one experiment per row, `size = 10` for the number of attempts per trial/experiment, and `prob = 0.735` for the probability of a make.

**Important.** Be sure to use `replicate_int` (rather than `replicate`) so that `.outcome` is a simple integer column instead of a list column.
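To see how the `n` and `size` arguments divide the work, here is a base R sketch (the seed is arbitrary). With `n = N`, a single vectorized `rbinom` call produces one count per experiment, which is distributed the same way as replicating `rbinom(1, 10, 0.735)` $N$ times. With `size = 1`, `rbinom` instead returns individual 0/1 shots, and summing ten of them gives one binomial count built from its Bernoulli trials.

```r
set.seed(3)
p <- 0.735
size <- 10   # shots per experiment
N <- 10      # number of experiments

# n = N: N experiments in one vectorized call, one count (0..size) per experiment
counts <- rbinom(N, size = size, prob = p)
counts

# size = 1: individual Bernoulli shots coded 0/1
shots <- rbinom(size, size = 1, prob = p)
sum(shots)   # one binomial count assembled by hand
```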

```r
N <- 10
replicate_int(N, rbinom(1, 10, 0.735))
```

| `.trial` `<dbl>` | `.outcome` `<int>` |
|---|---|
| 1 | 6 |
| 2 | 7 |
| 3 | 10 |
| 4 | 7 |
| 5 | 8 |
| 6 | 5 |
| 7 | 6 |
| 8 | 8 |
| 9 | 5 |
| 10 | 4 |
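Simulated counts like these can also be used to estimate the expected value. As a sketch under assumed settings (100,000 simulated experiments and an arbitrary seed), the base R code below averages many simulated counts to estimate $E[X]$, compares the estimate with the exact value $np = 10 \times 0.735 = 7.35$, and places the simulated relative frequencies beside the exact probabilities from `dbinom`.

```r
set.seed(123)
size <- 10
p <- 0.735
reps <- 100000   # assumed number of simulated experiments

counts <- rbinom(reps, size = size, prob = p)

mean(counts)     # simulated estimate of E[X]
size * p         # exact expected value, 7.35

# Simulated vs. exact probabilities for k = 0, ..., 10
sim.probs <- as.vector(table(factor(counts, levels = 0:size))) / reps
exact.probs <- dbinom(0:size, size, p)
round(cbind(k = 0:size, simulated = sim.probs, exact = exact.probs), 4)
```

Increasing `reps` shrinks the gap between the simulated and exact columns, at the cost of more computation.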


```r
?rbinom
```