# Lecture 1 (pt. 2)

## Basic Python operations
## Set operations
## First Random Experiment and Simulations
## Fair experiment and relative frequency

Working with lists:

In [2]:
a=[2,3]
b=a
a is b


True

In [3]:
b.append(4)
b

[2, 3, 4]

In [4]:
a

[2, 3, 4]

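You can also append to lists with `+=` (for example, `b += [5]`). Note that `b = a` only creates a second name for the same list. To get an independent copy, use `copy()`: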

In [6]:
b = a.copy()

b is a

False

In [8]:
b.append(5)
b

[2, 3, 4, 5]

In [9]:
a

[2, 3, 4]
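
The difference between `+` and `+=` on lists is easy to miss when two names refer to the same list. Here is a minimal sketch, using the hypothetical names `x`, `y`, and `z`, which are not part of the session above:

```python
x = [2, 3, 4]
y = x            # y is a second name for the same list object
z = x + [5]      # + builds a brand-new list; x is unchanged
print(z)         # [2, 3, 4, 5]
print(x)         # [2, 3, 4]

y += [6]         # += extends the existing list in place, so x sees it too
print(x)         # [2, 3, 4, 6]
print(x is y)    # True
print(z is x)    # False
```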

Lists and tuples may contain any other objects, including other lists and tuples:

In [10]:
c =(1,2,4)

In [11]:
type(c)

tuple
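
As a small illustration of nesting, the sketch below stores tuples and lists inside one another. The names `nested_list` and `nested_tuple` are made up for this example:

```python
nested_list = [1, "two", (3, 4), [5, 6]]
nested_tuple = (1, [2, 3], ("a", "b"))

print(type(nested_list[2]))   # <class 'tuple'>
print(nested_list[3][0])      # 5

# A tuple's slots cannot be reassigned, but a list stored inside it can still change
nested_tuple[1].append(4)
print(nested_tuple)           # (1, [2, 3, 4], ('a', 'b'))
```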

Note that tuples and lists are ordered collections, and we can access their members directly:

In [12]:
a[0]

2

In [13]:
c[0]

1

* Negative indexes start from the end of the list, with -1 denoting the last member in the list:

In [14]:
c[-1]

4

## Modules and Libraries

* Many of the tools we will use in the class are not directly part of Python
* Instead, they are libraries or modules that provide particular functionality
* These include:

* **numpy** provides arrays, linear algebra, and math functions (many similar to the core MATLAB functions)
* **matplotlib** provides functions to generate plots similar to those in MATLAB
* **random** contains functions for generating random numbers and choices
* **scipy** provides many tools used in scientific computing including optimization, signal processing, and statistics
* **pandas** provides tools for working with data

To work with these libraries, import them:
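
A minimal sketch of the import pattern, shown for `random` and `numpy` only. The alias `np` is a widely used convention rather than a requirement, and the other libraries listed above are imported the same way:

```python
import random              # part of the Python standard library
import numpy as np         # "np" is a common alias, not a requirement

# Once imported, functions are reached through the module (or alias) name
print(np.sqrt(16.0))       # 4.0
print(random.random())     # a float in [0.0, 1.0)
```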


# Sets

In [15]:
Aset = {1,2,4}
print(Aset)

{1, 2, 4}


In [17]:
Aset= set([1,1,1,2,3,4])
print(Aset)

{1, 2, 3, 4}


## Union

In [18]:
Bset= {3,4,5}
AunionB = set.union(Aset,Bset)
print(AunionB)

{1, 2, 3, 4, 5}


## Set difference

In [19]:
Cset = Aset - Bset
print(Cset)

{1, 2}


## Set intersection

In [20]:
Dset = Aset.intersection(Bset)
print(Dset)

{3, 4}


See [More about Python Sets](https://realpython.com/python-sets/).

## Import libraries for the following experiments

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('bmh')

# First Random Experiment and Simulations

Consider the following questions:

1. **If you flip a coin 20 times, how many times do you think it will come up heads?** 

2. **If you flip the coin 20 times and it comes up heads 6 times, do you think it is a *fair* or *unfair* coin? How *confident* can you be in your answer?**

* Can you conduct an experiment to answer these questions? 
* What can be a potential problem?

If we take a **fair coin** and flip it 20 times and count the number of heads, and then **repeat the experiment many times**, we can **estimate** how often 6 or fewer heads occur. If this occurs very rarely (say, less than 5% of the time), then we can say that the coin is unlikely to be fair.

*Here we use 6 or fewer heads because 5 heads is an even more extreme outcome than 6 heads, so we want to count how often we see an outcome at least as extreme as 6 heads.*

* The **problem** is that we may need to repeat the experiment (of flipping the coin 20 times) many times to accurately estimate how often 6 or fewer heads come up. This may require thousands of coin flips!

We can overcome this problem by using a computer to flip the coin in a **simulation**. A **computer simulation** is a computer program that models reality and allows us to conduct experiments that:

* would require a lot of time to carry out in real life
* would require a lot of resources to carry out in real life
* would not be possible to repeat in real life (for instance, simulation of the next day's weather or stock market performance)

Let's build simulations of our coin flip experiment and learn about some Python libraries:

In [None]:
# Simple library for working with random phenomena
import numpy
import random

# We will learn about the random.choice, random.choices, and random.randint functions




In [None]:
# Generate a random integer between 1 and 10 (both inclusive)
random_number = random.randint(1, 10)
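
The three functions named above can be sketched on the coin-flip and die examples as follows. The variable names (`one_flip`, `flips`, `num_heads`, `die`) are chosen for illustration only, and the outputs differ from run to run:

```python
import random

# random.choice picks one item from a sequence
one_flip = random.choice(['H', 'T'])
print(one_flip)

# random.choices picks k items with replacement and returns a list
flips = random.choices(['H', 'T'], k=20)
print(flips)

# Count the heads in this one experiment of 20 flips
num_heads = flips.count('H')
print(num_heads)

# random.randint includes both endpoints, so this models a 6-sided die
die = random.randint(1, 6)
print(die)
```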

Suppose we want to see how often 6 or fewer heads occur. Rather than printing the result of every experiment, we can print only those extreme events:

* We really don't care about the particular experiment on which those events occur. Instead, we are really just looking at the **frequency** of those events.
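
One possible way to print only the extreme events is sketched below. It assumes 100 repetitions of the 20-flip experiment. The value of `num_sims` and the helper list `extreme_sims` are illustrative choices, not taken from the lecture:

```python
import random

num_sims = 100      # number of 20-flip experiments (illustrative value)
flips_per_sim = 20
threshold = 6       # "6 or fewer heads" counts as an extreme event

extreme_sims = []   # hypothetical helper list, kept so the result can be inspected
for sim in range(num_sims):
    flips = random.choices(['H', 'T'], k=flips_per_sim)
    num_heads = flips.count('H')
    if num_heads <= threshold:
        print(f"Simulation {sim}: {num_heads} heads")
        extreme_sims.append((sim, num_heads))
```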

<div class="alert alert-info" role="alert">
  <strong>Relative Frequency</strong>
    
The <strong>relative frequency</strong> of an event is the number of times that an event occurs divided by the number of times the experiment is conducted. 
</div>

Let's modify the experiment to calculate the relative frequency of getting 6 or fewer heads on 20 flips of a fair coin:
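
A minimal sketch of that calculation, assuming 10,000 repetitions (an arbitrary choice) and the hypothetical names `event_count` and `rel_freq`:

```python
import random

num_sims = 10_000
flips_per_sim = 20
threshold = 6

event_count = 0
for sim in range(num_sims):
    flips = random.choices(['H', 'T'], k=flips_per_sim)
    if flips.count('H') <= threshold:
        event_count += 1   # the event "6 or fewer heads" occurred

# Relative frequency = times the event occurred / times the experiment ran
rel_freq = event_count / num_sims
print("Relative frequency of 6 or fewer heads:", rel_freq)
```

The printed value changes slightly from run to run because each run uses different random flips.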

# Fair Experiment

**<font color=blue>Example 1:</font>** The probability of getting any particular number on a fair 6-sided die is 1/6. Let's compare this to the *relative frequencies*.

But first let's see how to count the number of occurrences of each outcome:

In [None]:
num_sims=100
values=[]
for sim in range(num_sims):
    die=random.choice(range(1,7))
    values+=[die]
print(values)

Let's first keep a counter for each face value and increment that counter whenever we see that face value. Start with a list of 6 zeros:
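
The counting step might look like the sketch below. The names `vals` and `counters` are chosen to match the plotting cell that follows, but how they are built here is an assumption, not the lecture's own code:

```python
import random

num_sims = 100
values = [random.choice(range(1, 7)) for _ in range(num_sims)]

vals = list(range(1, 7))   # face values 1..6
counters = [0] * 6         # counters[0] counts face 1, ..., counters[5] counts face 6

for die in values:
    counters[die - 1] += 1  # shift by one because list indexes start at 0

print(counters)
```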

We can use these counters to make our first plots. Let's start with a simple bar graph:

Add some labels to the bar plot:

In [None]:
# Assumes vals (the face values) and counters (one count per face) were built earlier
plt.bar(vals, counters)
plt.xlabel('Die Face')
plt.ylabel('Number of Occurrences')

Here is a more elegant approach (using ```numpy```) if we just want the counts of the outcomes:

In [None]:
num_sims=1000
outcomes=[]
for sim in range(num_sims):
    die=random.choice(range(1,7))
    outcomes+=[die]
    
# The magic counting code goes here...

# TO BE COMPLETED IN CLASS
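
The in-class solution is not shown here. One way to count outcomes with `numpy`, offered only as a sketch, is `np.unique` with `return_counts=True`. The names `faces` and `counts` are illustrative:

```python
import random
import numpy as np

num_sims = 1000
outcomes = [random.choice(range(1, 7)) for _ in range(num_sims)]

# np.unique returns the sorted distinct values and, on request, their counts
faces, counts = np.unique(outcomes, return_counts=True)
print(faces)
print(counts)
```

Note that `np.unique` omits any face that never appears, which can matter when `num_sims` is small.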

Then to get the **relative frequencies** is easy:

In [None]:
num_sims=100
outcomes=[]
for sim in range(num_sims):
    die=random.choice(range(1,7))
    outcomes+=[die]

# TO BE COMPLETED IN CLASS    
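
Again as a sketch rather than the in-class solution: if the counts are held in a NumPy array, dividing by `num_sims` gives all the relative frequencies at once. The names `faces`, `counts`, and `rel_freqs` are assumptions made for this example:

```python
import random
import numpy as np

num_sims = 100
outcomes = [random.choice(range(1, 7)) for _ in range(num_sims)]

faces, counts = np.unique(outcomes, return_counts=True)
rel_freqs = counts / num_sims   # element-wise division on the NumPy array

for face, freq in zip(faces, rel_freqs):
    print(face, freq)
print("Probability for a fair die:", 1 / 6)
```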


* How does the relative frequency of each outcome change as we increase/decrease the number of simulations?

* What do you conclude about the amount of data needed?

* Does the relative frequency *converge* to some value as the number of simulations increases?
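
To explore these questions, one option is to repeat the experiment for several values of `num_sims` and watch a single relative frequency, here that of rolling a 1. The list of sizes, the fixed seed, and the `results` dictionary are illustrative assumptions:

```python
import random

random.seed(0)  # optional: fixes the sequence so reruns are repeatable

results = {}    # hypothetical dict: num_sims -> relative frequency of rolling a 1
for num_sims in [10, 100, 1000, 10_000, 100_000]:
    ones = 0
    for _ in range(num_sims):
        if random.randint(1, 6) == 1:
            ones += 1
    results[num_sims] = ones / num_sims
    print(num_sims, results[num_sims])
```

Compare the printed values with 1/6 ≈ 0.1667 as `num_sims` grows.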