<a href="https://colab.research.google.com/github/yardsale8/probability_simulations_in_R/blob/main/1_3_understanding_levels_of_abstraction_in_R.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
library(dplyr)
library(tidyr)
library(purrr)
library(devtools)
install_github('yardsale8/purrrfect', force = TRUE)
library(purrrfect)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: usethis

Downloading GitHub repo yardsale8/purrrfect@HEAD




── R CMD build ─────────────────────────────────────────────────────────────────
* checking for file ‘/tmp/RtmpASLmoT/remotes27e39574508/yardsale8-purrrfect-d91fae7/DESCRIPTION’ ... OK
* preparing ‘purrrfect’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘purrrfect_1.0.1.tar.gz’



Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)


Attaching package: ‘purrrfect’


The following objects are masked from ‘package:base’:

    replicate, tabulate




# Creating Random Variables with `mutate` and `map`

In previous notebooks, we performed simulations that resulted in a table such that
1. There was one row per simulated trial, and
2. The outcomes of each trial were stored in a list column.

In this notebook, we will explore techniques for turning a list column of outcomes into a random variable, that is, a number for each trial.
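
As a preview of where this is headed, here is a minimal sketch of turning a list column into a number per trial. It assumes a small hand-built tibble in place of real simulation output, and the column name `total` is hypothetical. It uses `map_int` from `purrr`, a typed variant of `map` that returns an integer vector rather than a list.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hand-built stand-in for a simulation table: one row per trial,
# with the three rolls of each trial stored in a list column.
trials <- tibble(
  .trial   = 1:3,
  .outcome = list(c(17L, 3L, 8L), c(16L, 8L, 5L), c(7L, 5L, 2L))
)

# map_int applies sum to each trial's vector and returns one integer per row.
with_total <- trials %>%
  mutate(total = map_int(.outcome, sum))

with_total$total  # 28 29 14
```

The new `total` column is an ordinary integer column, so it can be summarized like any other numeric column.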

## Levels of Abstraction

Before we start working with a list column of outcomes, it is important that we understand the concepts of nested data structures and levels of abstraction.

## Understanding Levels of Abstraction

<img src="https://github.com/yardsale8/probability_simulations_in_R/blob/main/img/1_3_levels_of_abstraction.png?raw=true" width="600"/>

A list column is an example of a nested data structure, that is, a data structure that contains another data structure.  In the case shown above, we have
1. A dataframe containing
2. A list `.outcome` column containing
3. Integer vectors containing
4. Raw integers.

Each of these represents a level of abstraction.
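
One way to see the four levels is to step down through them one at a time. The sketch below assumes a small hand-built tibble shaped like the one in the figure; the variable names `level1` through `level4` are only for illustration.

```r
library(tibble)

# Hand-built table with the same shape as the figure above.
trials <- tibble(
  .trial   = 1:2,
  .outcome = list(c(17L, 3L, 8L), c(16L, 8L, 5L))
)

level1 <- class(trials)                 # the data frame: "tbl_df" "tbl" "data.frame"
level2 <- class(trials$.outcome)        # the list column: "list"
level3 <- class(trials$.outcome[[1]])   # one integer vector: "integer"
level4 <- trials$.outcome[[1]][2]       # a single raw integer: 3
```

Each step uses a different accessor (`$` for the column, `[[ ]]` for a list element, `[ ]` for a vector element), which is a sign that we are crossing from one level to the next.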

### Exploring the levels of abstraction using `str`

The `str` function
- shows us the *structure* of a table, and
- is useful for displaying the levels of abstraction.

In [2]:
die <- 1:20
(trials <- replicate(10, sample(die, 3, replace=TRUE)))

.trial,.outcome
<dbl>,<list>
1,"17, 3, 8"
2,"16, 8, 5"
3,"7, 5, 2"
4,"1, 9, 20"
5,"16, 3, 18"
6,"9, 7, 2"
7,"18, 4, 11"
8,"18, 10, 14"
9,"7, 11, 13"
10,"3, 15, 18"


In [3]:
trials %>% str

tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
 $ .trial  : num [1:10] 1 2 3 4 5 6 7 8 9 10
 $ .outcome:List of 10
  ..$ : int [1:3] 17 3 8
  ..$ : int [1:3] 16 8 5
  ..$ : int [1:3] 7 5 2
  ..$ : int [1:3] 1 9 20
  ..$ : int [1:3] 16 3 18
  ..$ : int [1:3] 9 7 2
  ..$ : int [1:3] 18 4 11
  ..$ : int [1:3] 18 10 14
  ..$ : int [1:3] 7 11 13
  ..$ : int [1:3] 3 15 18


### Reading the `str` output

<img src="https://github.com/yardsale8/probability_simulations_in_R/blob/main/img/1_3_str_shows_levels.png?raw=true" width="600">

### Piercing levels of abstraction with `mutate` and `map`
<img src="https://github.com/yardsale8/probability_simulations_in_R/blob/main/img/1_3_mutate_map_and_levels.png?raw=true" width="600">

The functions `mutate` and `map` provide the tools needed to reach into a table and transform data at various levels.
- `mutate(col1 = f(col1))` will apply the function `f` to the whole of `col1`
- `mutate(col1 = map(col1, f))` will apply the function `f` to each element of `col1`
- `mutate(col1 = map(col1, \(x) map(x, f)))` will apply the function `f` to each element of each of the lists in `col1`
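
The three patterns above can be sketched side by side. This example assumes a small hand-built tibble, and the new column names (`n_rows`, `n_rolls`, `is_even`) are hypothetical. It uses `map_int` and `map_lgl`, the typed variants of `map` from `purrr`, only to keep the results compact.

```r
library(dplyr)
library(purrr)
library(tibble)

trials <- tibble(
  .trial   = 1:2,
  .outcome = list(c(17L, 3L, 8L), c(16L, 8L, 5L))
)

result <- trials %>%
  mutate(
    # f applied to the whole column: the list column has 2 elements
    n_rows  = length(.outcome),
    # f applied to each element of the column: each vector has 3 rolls
    n_rolls = map_int(.outcome, length),
    # f applied to each element of each vector: is each roll even?
    is_even = map(.outcome, \(x) map_lgl(x, \(v) v %% 2 == 0))
  )

result %>% str
```

Note how each extra `map` reaches one level deeper: no `map` works on the column, one `map` works on the vectors, and a nested `map` works on the individual integers.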


### Saving Simple Outcomes
An experiment with only one outcome per trial is said to have simple outcomes.  In this case, we should be able to store the outcomes in a column that isn't a list column, but instead holds raw integers, doubles, characters, or Booleans.  To do this, we will need to use an alternative form of `replicate`.

#### Example 1 - Flip a fair coin once

Suppose we flip a fair coin and want to know the probability of a head.

Note that if we use `replicate` we get a list column.

In [20]:
coin <- c('H', 'T')
(trials <- replicate(10, sample(coin, 1, replace = TRUE)))

.trial,.outcome
<dbl>,<list>
1,H
2,H
3,T
4,T
5,H
6,H
7,T
8,T
9,H
10,H


In [21]:
trials %>% str

tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
 $ .trial  : num [1:10] 1 2 3 4 5 6 7 8 9 10
 $ .outcome:List of 10
  ..$ : chr "H"
  ..$ : chr "H"
  ..$ : chr "T"
  ..$ : chr "T"
  ..$ : chr "H"
  ..$ : chr "H"
  ..$ : chr "T"
  ..$ : chr "T"
  ..$ : chr "H"
  ..$ : chr "H"


Note that we have an extra, unneeded level of abstraction here: the column is a list whose elements each hold a single character.  Each outcome could simply be the character!

We can simplify the output by using `replicate_chr` to force the output column to be a column of characters.

In [24]:
(trials <- replicate_chr(10, sample(coin, 1, replace = TRUE)))

.trial,.outcome
<dbl>,<chr>
1,T
2,T
3,H
4,H
5,T
6,H
7,H
8,T
9,T
10,T
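
If the trials already exist as a list column, the extra level can also be removed after the fact. The following is a minimal sketch, assuming a hand-built tibble in place of simulated flips, that uses `map_chr` from `purrr` to pull the single character out of each list element.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hand-built stand-in for a list column of single coin flips.
trials_list <- tibble(
  .trial   = 1:4,
  .outcome = list("H", "H", "T", "T")
)

# map_chr returns a plain character vector, so the list level disappears.
trials_chr <- trials_list %>%
  mutate(.outcome = map_chr(.outcome, \(x) x))

trials_chr %>% str
```

This only works because every element holds exactly one value; `map_chr` raises an error if any element has a different length.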


### <font color='red'> Exercise 1.3.1 - Simple Dice Rolls</font>

Set up an experiment that involves rolling a fair 6-sided die once.  Be sure to
1. Make the outcome column have the integer type, and
2. Use `str` to verify the structure.

In [27]:
# Your code here

## Saving Compound Simulations

Most of the experiments in the previous lectures were compound experiments, with multiple outcomes per trial.  As you have seen, we have been using `replicate` on `sample` with the resulting outcomes stored in a `list` column.  In the next exercise, you will use `str` to explore and describe the levels of abstraction for such an experiment.

### <font color='red'> Exercise 1.3.2 - Many Dice Rolls</font>

Set up an experiment that involves rolling a fair 6-sided die four times.  Use `str` to explore the structure, then describe the levels of abstraction.

In [26]:
# Your code here



<font color="orange"> < Your description here></font>