# Section 1: Simulating Coin Tosses

## Creating the Simulation

Let's generate some data to analyze by running random simulations. First we will define a function that returns the result of a fair coin toss. It uses the module 'random' from Python's standard library.

In [None]:
import random as r # Import the package 'random' with an abbreviation.

# The package contains a function called 'random'. Let's see what it does.
r.random()

The function seems to generate a random real number. Let's experiment to see if we can pin down exactly what it does.

In [None]:
for j in range(20): # range(20) runs over j = 0, 1, ..., 19
    print(r.random())

Apparently it generates a random number between 0 and 1. The documentation confirms this: r.random() returns a random floating-point number in the half-open interval [0.0, 1.0), so 0 can occur but 1 cannot. See https://docs.python.org/2/library/random.html#module-random
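A quick empirical check of that claim: draw many samples and confirm that every one lies in [0, 1), and that the average is close to 0.5, the midpoint of the interval. This is a sketch; the sample size of 100,000 is an arbitrary choice, and the block re-imports 'random' as r so it runs on its own.

```python
import random as r

# Draw many samples from r.random(); 100_000 is an arbitrary sample size.
samples = [r.random() for _ in range(100_000)]

# Every sample should satisfy 0 <= u < 1 (the interval is half-open).
all_in_range = all(0.0 <= u < 1.0 for u in samples)

print("all in [0, 1):", all_in_range)
print("smallest:", min(samples))
print("largest: ", max(samples))
print("average: ", sum(samples) / len(samples))  # should be near 0.5
```

The smallest and largest values should sit very close to 0 and 1, but the largest will never equal 1.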

Now let's use this function to define a new 'coin toss' function.

In [None]:
# Define our 'coin toss' function.
def coin():
    u = r.random()
    if u < 0.5:
        return "H"
    else:
        return "T"

# Toss the coin 10 times and look at the result.
for i in range(10):
    print(coin(), end=" ") # end=" " prints the results on one line, separated by spaces.
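Because the tosses are random, each run prints a different sequence. When you want a run to be repeatable (for example, to compare two versions of an analysis on the same data), you can seed the generator with r.seed. A minimal sketch: the seed value 42 is arbitrary, and coin is redefined here so the block runs on its own.

```python
import random as r

def coin():
    # Same fair coin as above: heads if a uniform draw falls below 0.5.
    return "H" if r.random() < 0.5 else "T"

r.seed(42)                      # fix the generator's starting state
first = [coin() for _ in range(10)]

r.seed(42)                      # reset to the same state
second = [coin() for _ in range(10)]

print("".join(first))
print("".join(second))
print(first == second)          # identical sequences after re-seeding
```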

Since we wish to experiment with randomness, it would be good to have a function that repeats the coin toss n times. Its input will be a positive integer n and its output will be a string of n H's and T's.

In [None]:
def run_coin(n):
    result = "" # Initialize the output of the function as an empty string
    for k in range(n):
        result = result + coin() # Loop over the range of n and add the coin toss result to the output each iteration
    return result

run_coin(20)
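Building a string with repeated + may copy the string on each step, which can become slow for large n. An equivalent and more idiomatic sketch collects the tosses with str.join; run_coin_join is a hypothetical name chosen here to avoid clashing with run_coin.

```python
import random as r

def coin():
    # Fair coin toss, as defined earlier in the notebook.
    return "H" if r.random() < 0.5 else "T"

def run_coin_join(n):
    # Generate n tosses and glue them together in a single join call.
    return "".join(coin() for _ in range(n))

result = run_coin_join(20)
print(result)
```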

### Exercise 

Write a function 'count_heads' whose input is a result of the run_coin(n) function and whose output is the number of H's in the string.
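One possible solution, offered as a sketch (try the exercise yourself first). It relies on the built-in string method str.count; an explicit loop (count_heads_loop, a hypothetical name) does the same thing step by step. The later cells in this section assume a function named count_heads exists.

```python
def count_heads(tosses):
    # Count the occurrences of "H" in a string such as "HTTHH".
    return tosses.count("H")

# An equivalent explicit loop, closer to how run_coin builds its string:
def count_heads_loop(tosses):
    total = 0
    for toss in tosses:
        if toss == "H":
            total += 1
    return total

print(count_heads("HTTHH"))       # 3
print(count_heads_loop("HTTHH"))  # 3
```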

Suppose that we would like to determine whether the coin we are tossing is fair. We could toss it, say, 20 times and then count the number of heads. Of course, we don't expect to get exactly 10 heads every time we do this, but if we repeat the experiment many times, the average number of heads across the experiments should converge to 10 as the number of experiments grows.
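Why 10? Assuming the tosses are independent and the coin is fair, the number of heads $X$ in $n = 20$ tosses follows a binomial distribution with success probability $p = 1/2$, so its expected value and variance are:

$$
X \sim \mathrm{Binomial}(n, p), \qquad
\mathbb{E}[X] = np = 20 \cdot \tfrac{1}{2} = 10, \qquad
\mathrm{Var}(X) = np(1-p) = 20 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = 5.
$$

So a single run of 20 tosses typically lands within about $\sqrt{5} \approx 2.24$ heads of 10, which is why individual results vary while their average settles near 10.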

Let's try it.

In [None]:
for j in range(10):
    print(count_heads(run_coin(20)), end=" ")

## Statistics of the Experiment

Looks reasonable, but it's hard to eyeball whether this is actually a good result. Let's write some functions to:

1) automate our experiment

2) analyze it quantitatively


In [None]:
# First we write a function to carry out the experiment. 
# We want to toss the coin 'num_tosses' times, then repeat this 'num_repeats' times.
# We should store our results in a list.

def coin_experiment(num_tosses,num_repeats):
    results = []
    for j in range(num_repeats):
        results.append(count_heads(run_coin(num_tosses)))
    return results

coin_experiment(20,10)
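For larger experiments the nested Python loops become slow. A vectorized sketch with NumPy draws all tosses at once as a num_repeats × num_tosses array of 0s and 1s (1 meaning heads) and sums each row. coin_experiment_np is a hypothetical name; it uses NumPy's Generator API (np.random.default_rng), so its random stream is independent of the 'random' module used above.

```python
import numpy as np

def coin_experiment_np(num_tosses, num_repeats, rng=None):
    # Hypothetical vectorized counterpart of coin_experiment.
    if rng is None:
        rng = np.random.default_rng()
    # integers(0, 2, ...) draws 0 or 1 uniformly (upper bound excluded); 1 = heads.
    tosses = rng.integers(0, 2, size=(num_repeats, num_tosses))
    # Summing each row counts the heads in one repeat of the experiment.
    return tosses.sum(axis=1).tolist()

heads = coin_experiment_np(20, 10, rng=np.random.default_rng(0))
print(heads)
```

Returning a plain list keeps the result compatible with the mean function defined below.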

In [None]:
# Now let's define a function to take the mean of a list.
# Such a function certainly already exists (for example, statistics.mean in the standard library), but let's create it ourselves.

def mean(L):
    return sum(L)/len(L)

# Try computing the mean of an experiment:
mean(coin_experiment(20,10))
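As a sanity check, the hand-written mean can be compared with statistics.mean from the standard library on a small fixed list (the values are arbitrary). One difference worth knowing: on an empty list the hand-written version raises ZeroDivisionError, while statistics.mean raises statistics.StatisticsError.

```python
import statistics

def mean(L):
    # Hand-written version from the cell above.
    return sum(L) / len(L)

data = [9, 11, 10, 8, 12]
print(mean(data))             # 10.0
print(statistics.mean(data))
```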

How do you expect the mean of the experiment to behave if we fix the number of tosses and increase the number of repeats? What about the reverse? We can vary the parameters to get a feel for it.

In [None]:
def mean_list(num_tosses,max_repeats):
    L = [] # Start with an empty list
    for j in range(1,max_repeats+1): # j = 1, 2, ..., max_repeats
        # Append the mean of the experiment with j repeats, normalized by num_tosses/2.
        # We suspect the mean tends toward num_tosses/2 as j increases,
        # so after this normalization the values should approach 1.
        L.append(mean(coin_experiment(num_tosses,j))/(num_tosses/2))
    return L

L = mean_list(5,25)
L

It's not so clear what's going on here. An important part of data analysis is visualization. Let's look at a plot of this data. A standard package for generating plots is called matplotlib.

In [None]:
import matplotlib.pyplot as plt 
# plt is the conventional abbreviation for matplotlib.pyplot.
# Here we import only the pyplot module from matplotlib.
import numpy as np # We'll also use a function from numpy

trend = np.ones(len(L)) 
# We suspect that the mean_list tends toward 1 as the number of repeats increases.
# Let's plot the trend line to improve the visualization.

plt.plot(L)

plt.plot(trend, color='black', linewidth=1.0)

This doesn't show a long-term trend as clearly as we might hope. Let's run the experiment again with more tosses per run and a larger maximum number of repeats.

In [None]:
L = mean_list(25,100)
trend = np.ones(len(L)) 

plt.plot(L); # A trailing semicolon suppresses output such as [<matplotlib.lines.Line2D at 0x113fa86a0>]
plt.plot(trend, color='black', linewidth=1.0);

# We also add labels to the axes
plt.xlabel('Number of Repeats');
plt.ylabel('Normalized Mean Over Repeats');

This shows a clearer trend. There were actually two parameters we wanted to vary: the number of tosses and the number of repeats. There are various ways to visualize data across both parameters. One option is to plot several curves, one for each number of tosses.

In [None]:
max_repeats = 100

trend = np.ones(max_repeats) 

for k in range(25,100,25): # range(25,100,25) yields 25, 50, 75
    L = mean_list(k,max_repeats)
    plt.plot(L, label=str(k) + ' tosses');

plt.plot(trend, color='black', linewidth=1.0);
plt.xlabel('Number of Repeats');
plt.ylabel('Normalized Mean Over Repeats');
plt.legend();
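A short calculation explains why the curves settle toward 1 and why more tosses or more repeats make them tighter, assuming independent fair tosses. With $k$ tosses per run and $j$ repeats, each run's head count is $\mathrm{Binomial}(k, 1/2)$ with variance $k/4$. The mean $M$ over $j$ runs therefore has variance $k/(4j)$, and dividing by $k/2$ multiplies the variance by $4/k^2$:

$$
\mathrm{Var}(M) = \frac{k}{4j}, \qquad
\mathrm{Var}\!\left(\frac{M}{k/2}\right) = \frac{4}{k^2} \cdot \frac{k}{4j} = \frac{1}{kj}, \qquad
\mathrm{SD}\!\left(\frac{M}{k/2}\right) = \frac{1}{\sqrt{kj}}.
$$

So the normalized mean has expected value 1, and its typical deviation shrinks like $1/\sqrt{kj}$: increasing either the number of tosses or the number of repeats tightens the curve, and quadrupling their product halves the spread.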

Another way to visualize this is to plot the normalized mean as a surface over both parameters: a function with two inputs (number of tosses and number of repeats) and one output.

In [None]:
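from mpl_toolkits.mplot3d import Axes3D # Registers the '3d' projection (needed on older Matplotlib versions)

fig = plt.figure()
ax = fig.add_subplot(projection='3d') # fig.gca(projection='3d') no longer works in recent Matplotlib

# Make data: x is the number of tosses, y is the number of repeats.
x = np.arange(5,50,5)
y = np.arange(55,100,5)
X, Y = np.meshgrid(x, y)
# Normalized mean at each grid point; rows follow y (repeats), columns follow x (tosses).
Z = np.array([[mean(coin_experiment(t,n))/(t/2) for t in x] for n in y])

ax.plot_surface(X, Y, Z, cmap='viridis');
ax.set_xlabel('Number of Tosses');
ax.set_ylabel('Number of Repeats');
ax.set_zlabel('Normalized Mean');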
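For independent fair tosses, the normalized mean (mean heads over j repeats of k tosses, divided by k/2) should have standard deviation about 1/√(kj), since each run's head count has variance k/4. The sketch below checks that prediction empirically by simulating the whole experiment many times. It swaps in NumPy's binomial sampler to draw each run's head count directly instead of tossing coin by coin, a substitution made here for speed; normalized_means is a hypothetical helper, and the seed and trial count are arbitrary.

```python
import numpy as np

def normalized_means(k, j, trials, rng):
    # Hypothetical helper: simulate `trials` independent copies of the experiment.
    # Each copy has j repeats of k fair tosses; binomial(k, 0.5) gives the head
    # count of one repeat directly, so heads has shape (trials, j).
    heads = rng.binomial(k, 0.5, size=(trials, j))
    # Average over the j repeats, then normalize by the expected count k/2.
    return heads.mean(axis=1) / (k / 2)

rng = np.random.default_rng(1)  # seeded for reproducibility; the seed is arbitrary
for k, j in [(5, 10), (25, 10), (25, 100)]:
    m = normalized_means(k, j, trials=20_000, rng=rng)
    predicted = 1 / np.sqrt(k * j)
    print(f"k={k:3d} j={j:3d}  mean={m.mean():.3f}  "
          f"empirical SD={m.std():.4f}  predicted SD={predicted:.4f}")
```

The empirical and predicted columns should agree closely, and the spread for (25, 100) should be the smallest, matching how the curves tighten in the plots above.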

## More Statistics and Lambda Functions