# Chapter 11: Randomness and Sampling

Whether we are modeling a real-world process such as rolling a die or tossing a coin, or randomly selecting a subset of a population for a survey, choosing an option at random from a list is very useful. This chapter focuses on the foundations of random selection, including how to make a random choice in Python and how to extend random sampling to dataframes. Additionally, we introduce control statements, which let us run code conditionally or repeat it. We utilize these ideas to <i>simulate</i> experiments, that is, to imitate real-world examples using code.


In [1]:
import numpy as np
import pandas as pd

To start, suppose we have a list of choices that are equally likely to occur and we want to choose one. In Python, the NumPy function
```python
np.random.choice(list)
```
will output exactly one item from the input list and it will do so randomly.

To illustrate, suppose we toss a fair coin and want to know the outcome. Since we expect a random output of heads or tails, we create a list named <i>coin</i> with those options and call the np.random.choice function on <i>coin</i> to give us exactly one result from the list, as below.

In [2]:
coin = ['heads', 'tails']

flip = np.random.choice(coin)
flip

'tails'
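
The same call works for any list of equally likely options. As a minimal sketch, assuming a six-sided die represented by a list we name <i>die</i> (a hypothetical name not used elsewhere in this chapter), a single roll might look like this:

```python
import numpy as np

# Hypothetical example: the six faces of a fair die
die = [1, 2, 3, 4, 5, 6]

# Pick exactly one face; each face is equally likely
roll = np.random.choice(die)
print(roll)  # one of 1 through 6; the value varies from run to run
```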

The np.random.choice function does not have a fixed output: running it multiple times will eventually give a different result. In fact, if we want to run this experiment more than once, it is useful to keep track of the results in an array.

# Investigation with Arrays

We can store information in arrays as seen in Chapter (chapter number here). Below we create an array that contains the result of our first experiment, our first flip, and then append our next outcome.

In [4]:
first = np.array([flip])
first

array(['tails'], dtype='<U5')

Next we toss the coin again, that is, we choose randomly from our list of possible outcomes, and add this outcome to our array <i>first</i> using the np.append function. This function takes an array and the elements to be appended as input, and outputs an extended array in which the elements are attached to a copy of the input array.

In [5]:
flip_2 = np.random.choice(coin)
flip_2

'heads'

In [6]:
np.append(first, flip_2)

array(['tails', 'heads'], dtype='<U5')

Since elements are attached to a copy of the input array, np.append creates a new array and does not change the original. For example, printing out <i>first</i> gives us the original array with one coin flip result.

In [7]:
first

array(['tails'], dtype='<U5')

To remedy this, we must use assignment. We can either assign the new array back to the name <i>first</i> or give it a new name entirely.

In [8]:
first = np.append(first,'tails')
first

array(['tails', 'tails'], dtype='<U5')
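
The copy behavior described above is specific to np.append. As a rough sketch for comparison, the append method of a built-in Python list changes the list in place, so no assignment is needed there. The names <i>flips_array</i>, <i>extended</i>, and <i>flips_list</i> are made up for this illustration.

```python
import numpy as np

# np.append returns a new array and leaves its input untouched
flips_array = np.array(['tails'])
extended = np.append(flips_array, 'heads')
print(flips_array)  # ['tails']
print(extended)     # ['tails' 'heads']

# list.append modifies the list itself and returns None
flips_list = ['tails']
flips_list.append('heads')
print(flips_list)   # ['tails', 'heads']
```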

Appending items to arrays is an important tool when working with Python. However, np.random.choice accepts an additional argument, the number of times to repeat the experiment, and returns the results in an array. In fact, we can repeat the coin flip experiment as many times as we want. Here we repeat it 7 times!

In [9]:
outcomes = np.random.choice(coin, 7)
outcomes

array(['heads', 'tails', 'heads', 'heads', 'tails', 'heads', 'tails'],
      dtype='<U5')
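
The number of repetitions can also be passed by keyword, as <i>size</i>, which may read more clearly. The sketch below assumes the same <i>coin</i> list as above; <i>many_flips</i> is a name chosen only for this illustration.

```python
import numpy as np

coin = ['heads', 'tails']

# size=7 asks for 7 independent flips, returned as one array
many_flips = np.random.choice(coin, size=7)

print(len(many_flips))  # 7
print(many_flips)       # contents vary from run to run
```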

Since our experiment is small, we can easily count how many 'heads' or 'tails' we have. For a large experiment, counting by hand would be tedious and error-prone. We can instead find all instances where our <i>outcomes</i> array is equal to 'heads' and sum over these instances.

To do this, we consider the array <i>outcomes == 'heads'</i>, which compares the array <i>outcomes</i> with 'heads' elementwise and outputs a Boolean array. That is, each element in <i>outcomes</i> is compared to 'heads' and the truth value of each comparison is output in an array.

In [10]:
outcomes == 'heads'

array([ True, False,  True,  True, False,  True, False])

To count the number of 'heads' in our array <i>outcomes</i>, we can now sum over this Boolean array. In a sum, each True counts as 1 and each False counts as 0, so summing over a Boolean array counts the True instances, which is exactly the number of heads in this example.

In [11]:
sum(outcomes == 'heads')

4

We can do the same with 'tails'.

In [12]:
sum(outcomes == 'tails')

3

Since summing over a Boolean array counts all True instances, another way to count the number of 'tails' is to create the Boolean array <i>outcomes != 'heads'</i> and sum over this instead. Because the only possible outcomes are 'heads' and 'tails', this Boolean array is True where our coin flip landed on 'tails' and False where it landed on 'heads'.

In [13]:
outcomes != 'heads'

array([False,  True, False, False,  True, False,  True])

Summing over this array also counts the number of tails!

In [14]:
sum(outcomes != 'heads')

3
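
As a small check of the counting approach, the sketch below hard-codes the seven outcomes shown above so that the results are reproducible. It uses np.sum and np.count_nonzero, two NumPy functions that can count the True values in a Boolean array, and np.mean to get the proportion of heads. The names <i>fixed_outcomes</i>, <i>num_heads</i>, and <i>num_tails</i> are hypothetical.

```python
import numpy as np

# The seven flips from the run above, written out so the counts are fixed
fixed_outcomes = np.array(['heads', 'tails', 'heads', 'heads',
                           'tails', 'heads', 'tails'])

num_heads = np.sum(fixed_outcomes == 'heads')            # 4
num_tails = np.count_nonzero(fixed_outcomes != 'heads')  # 3

# Every flip is either heads or tails, so the counts add up to the total
print(num_heads + num_tails == len(fixed_outcomes))  # True

# The mean of a Boolean array is the fraction of True values
print(np.mean(fixed_outcomes == 'heads'))  # 4/7, about 0.571
```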