
[Try this handy Probability calculator](https://www.calculators.org/math/probability.php)

## What is Probability?
Definition: the likelihood of an event occurring

Notation:

- $A \rightarrow$ the event
- $P(A) \rightarrow$ the probability of event $A$

- $P(A) = \frac{\text{preferred}}{\text{all}}$
- "preferred" means the outcomes we want to happen (the favorable outcomes)
- "all" means the entire sample space (every possible outcome)

Let's take a coin flip as an example.

If you want HEADS as your outcome, there is one favorable outcome out of the two sides of the coin, so you can represent it as:

- $P(A) = \frac{1}{2} = 0.5$

Now imagine you have a six-sided die. The probability of rolling one particular face is:

- $P(A) = \frac{1}{6} \approx 0.167$
- you can set $A$ to any of the 6 faces of the die and get the same probability

What if you wanted to roll a number divisible by 3? Now there are 2 preferred outcomes (3 and 6) out of the 6 possible outcomes, so

- $P(A) = \frac{2}{6} = \frac{1}{3} \approx 0.33$
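The "preferred over all" definition can be checked by listing the sample space and counting. This is a minimal sketch that assumes every outcome is equally likely; the helper name `probability` is hypothetical, and `Fraction` is used so results stay exact instead of being rounded to 0.167 or 0.33.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Classical probability: favorable outcomes / all outcomes.

    Assumes every outcome in sample_space is equally likely.
    `event` is a predicate returning True for a favorable outcome.
    """
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_heads = probability(lambda side: side == "H", coin)   # one of two sides
p_four = probability(lambda face: face == 4, die)       # one of six faces
p_div3 = probability(lambda face: face % 3 == 0, die)   # faces 3 and 6

print(p_heads, p_four, p_div3)   # 1/2 1/6 1/3
print(round(float(p_div3), 2))   # 0.33
```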

Note that the probability that two independent events both occur is the product of their individual probabilities:

- $P(A \text{ and } B) = P(A) \cdot P(B)$

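Cards make this concrete, because a card's rank and its suit are independent in a standard 52-card deck:

- $P(\text{Ace of Spades}) = P(\text{Ace}) \cdot P(\text{Spade}) = \frac{1}{13} \cdot \frac{1}{4} = \frac{1}{52}$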
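Enumerating a deck confirms the product rule: counting aces and spades separately and multiplying gives the same answer as counting the single Ace of Spades directly. This is a sketch; representing each card as a `(rank, suit)` tuple is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def probability(event):
    # Every card is assumed equally likely to be drawn
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_ace = probability(lambda card: card[0] == "A")                     # 4/52
p_spade = probability(lambda card: card[1] == "spades")              # 13/52
p_ace_of_spades = probability(lambda card: card == ("A", "spades"))  # 1/52

# Rank and suit are independent, so P(Ace) * P(Spade) equals the direct count
print(p_ace, p_spade, p_ace * p_spade, p_ace_of_spades)  # 1/13 1/4 1/52 1/52
```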

Now take the lottery: you buy one ticket, and 100 million tickets are sold. Assuming there is exactly one winning ticket,

- $P(A) = \frac{1}{100{,}000{,}000}$

## Expected Values

Expected Values Definition: the average outcome we expect if we run an experiment many times.

Imagine we don't know the probability of getting heads. We could flip a coin several times and record the outcomes. Each flip, together with its recorded outcome, is called a trial; a collection of trials is called an experiment.

- trial = one coin flip and its recorded outcome
- experiment = several trials

So, if we toss a coin 20 times and record the 20 outcomes, that is a single experiment with 20 trials. We'll do the same in machine learning: we'll conduct experiments made up of multiple trials.

The probabilities we come up with during these experiments are called experimental probabilities.

Basically, if we don't know the theoretical (true) probabilities, we conduct experiments to estimate experimental probabilities and use those in our applications.

Take going to the grocery store, for instance. If I record my visits and find that I had to wait in line on 8 out of 10 of them, then I can say, as a reasonable approximation, that I have to wait in line 80% of the time.

## Experimental Probabilities
Experimental probabilities are easy to compute:

- $P(A) = \frac{\text{successful trials}}{\text{all trials}}$
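The formula translates directly into code. The sketch below uses Python's `random` module as a stand-in for flipping a real coin (an assumption; the seed is arbitrary) and compares a 20-trial experiment with a much larger one. The helper `experimental_probability` is a hypothetical name, not from any library.

```python
import random

def experimental_probability(trials, event):
    """Successful trials divided by all trials."""
    successes = sum(1 for outcome in trials if event(outcome))
    return successes / len(trials)

def is_heads(flip):
    return flip == "H"

rng = random.Random(42)  # fixed seed so reruns give the same flips

# One experiment made of 20 coin-flip trials
small_experiment = [rng.choice("HT") for _ in range(20)]
# A much larger experiment made of 100,000 trials
large_experiment = [rng.choice("HT") for _ in range(100_000)]

p_small = experimental_probability(small_experiment, is_heads)
p_large = experimental_probability(large_experiment, is_heads)
print(p_small)  # can sit noticeably away from 0.5
print(p_large)  # typically very close to the theoretical 0.5

# Grocery-store example: waited in line on 8 of 10 recorded visits
visits = [True] * 8 + [False] * 2
p_wait = experimental_probability(visits, lambda waited: waited)
print(p_wait)  # 0.8
```

With more trials, the experimental probability tends to settle near the theoretical one, which is why larger experiments give more trustworthy estimates.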


## Expected Values

$E(A) \rightarrow$ the outcome we expect to occur when we run an experiment

### Categorical Outcomes

- Let $A$ be drawing a SPADE (or any other specific suit), so $P(A) = \frac{1}{4}$

If we run an experiment of 20 selection trials, then

- $E(A) = P(A) \cdot n = 0.25 \cdot 20 = 5$

So we would expect to select a SPADE 5 times out of the 20 selections. There is no guarantee that we'll get exactly 5 SPADES in any single experiment.
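A quick simulation shows what "expected" means here: individual 20-draw experiments scatter around 5, while their average lands close to it. This sketch assumes each card is drawn with replacement so that $P(A) = 0.25$ on every draw; the seed and the 10,000 repetitions are arbitrary choices.

```python
import random

p_spade = 0.25   # one suit out of four
n_draws = 20     # trials per experiment

# Expected number of spades: P(A) * n
expected_spades = p_spade * n_draws
print(expected_spades)  # 5.0

# Simulate many 20-draw experiments, drawing with replacement (assumption),
# and record how many spades each experiment produced.
rng = random.Random(0)
suits = ["spades", "hearts", "diamonds", "clubs"]
counts = [
    sum(1 for _ in range(n_draws) if rng.choice(suits) == "spades")
    for _ in range(10_000)
]

average = sum(counts) / len(counts)
print(average)                   # close to 5
print(min(counts), max(counts))  # single experiments can land far from 5
```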

### Numerical Outcomes

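We use a slightly different formula: multiply each numerical outcome by its probability, then add the products to get the expected value.

Sample Space = $\{A, B, C\}$

- $E(X) = A \cdot P(A) + B \cdot P(B) + C \cdot P(C)$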

Say we're shooting arrows at a target, and after a number of trials we have these scores and their experimental probabilities:

- $A = 10, P(A) = 0.5$
- $B = 20, P(B) = 0.4$
- $C = 100, P(C) = 0.1$

So, $E(X) = (0.5)(10) + (0.4)(20) + (0.1)(100) = 5 + 8 + 10 = 23$
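The weighted sum can be written as a small function over a score-to-probability mapping. This is a sketch: the name `expected_value` and the dictionary layout are illustrative assumptions, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of value * probability over all outcomes."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in distribution.items())

# Arrow example: score -> experimental probability
arrows = {10: Fraction(1, 2), 20: Fraction(2, 5), 100: Fraction(1, 10)}
print(expected_value(arrows))  # 23
```

The check that the probabilities sum to 1 guards against an incomplete sample space, since the expected value is only meaningful when every outcome is accounted for.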

## Probability Frequency Distribution

Notice below that when we add our two dice vectors, 7 shows up 6 times in the resulting 36-cell matrix of outcomes.

So,

- $P(7) = \frac{6}{36} = \frac{1}{6}$


```python
# Load library
import numpy as np

# Create a vector as a row (one die)
vector_row = np.array([1, 2, 3, 4, 5, 6])

# Create a vector as a column (second die)
vector_column = np.array([[1],
                          [2],
                          [3],
                          [4],
                          [5],
                          [6]])
```

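```python
# Broadcasting adds every row value to every column value,
# giving one cell per pair of rolls (6 x 6 = 36 outcomes)
matrix = vector_row + vector_column
matrix
```

```
array([[ 2,  3,  4,  5,  6,  7],
       [ 3,  4,  5,  6,  7,  8],
       [ 4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10],
       [ 6,  7,  8,  9, 10, 11],
       [ 7,  8,  9, 10, 11, 12]])
```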

The probability frequency distribution is the collection of probabilities for each possible outcome. This is how we know that 7 is the most probable sum of two dice. It is usually expressed as a graph or a table, as in the example above.

We next create a sum table of all the distinct sums in our matrix:

`sum_table = (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)`

Then we create a frequency table by counting how often each sum appears in the matrix:

`freq_table = (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)`

Next we compute the probability associated with each sum from 2 to 12:

`prob_table = (1/36, 1/18, 1/12, 1/9, 5/36, 1/6, 5/36, 1/9, 1/12, 1/18, 1/36)`

Each probability is the frequency divided by the size of the sample space (36).

So the prob_table is called the __"Probability Frequency Distribution"__
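The sum, frequency, and probability tables can be derived from the matrix instead of typed by hand. This sketch uses NumPy's `np.unique` with `return_counts=True`, which returns the sorted distinct values together with how often each occurs; rebuilding the matrix with `np.arange` and `reshape` is an equivalent shortcut for the two vectors defined earlier.

```python
import numpy as np

die = np.arange(1, 7)
# Row vector + column vector broadcasts to the 6x6 table of sums
matrix = die + die.reshape(-1, 1)

sum_table, freq_table = np.unique(matrix, return_counts=True)
prob_table = freq_table / matrix.size  # divide by the 36 outcomes

for s, f, p in zip(sum_table, freq_table, prob_table):
    print(f"{s:>2}  {f}  {p:.4f}")

print(sum_table[np.argmax(freq_table)])  # 7, the most probable sum
```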


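```python
import pandas as pd

df = pd.DataFrame(matrix)
df
```

|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 2 | 3 | 4 | 5 | 6 | 7 |
| 1 | 3 | 4 | 5 | 6 | 7 | 8 |
| 2 | 4 | 5 | 6 | 7 | 8 | 9 |
| 3 | 5 | 6 | 7 | 8 | 9 | 10 |
| 4 | 6 | 7 | 8 | 9 | 10 | 11 |
| 5 | 7 | 8 | 9 | 10 | 11 | 12 |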


```python
df.columns
```

```
RangeIndex(start=0, stop=6, step=1)
```

```python
# Shift the column labels to start at 1 instead of 0
df.columns += 1
```

```python
df.columns
```

```
RangeIndex(start=1, stop=7, step=1)
```

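```python
df
```

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 0 | 2 | 3 | 4 | 5 | 6 | 7 |
| 1 | 3 | 4 | 5 | 6 | 7 | 8 |
| 2 | 4 | 5 | 6 | 7 | 8 | 9 |
| 3 | 5 | 6 | 7 | 8 | 9 | 10 |
| 4 | 6 | 7 | 8 | 9 | 10 | 11 |
| 5 | 7 | 8 | 9 | 10 | 11 | 12 |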


```python
# Shift the row labels (index) to start at 1 as well
df.index += 1
```

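```python
df
```

|   | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |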
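As an alternative to shifting the labels after the fact, the 1-based labels can be passed to `pd.DataFrame` through its `index` and `columns` parameters, and the frequency table can then be read straight off the DataFrame with `Series.value_counts`. This is one possible approach sketched for illustration, not the only idiomatic one.

```python
import numpy as np
import pandas as pd

die = np.arange(1, 7)
matrix = die + die.reshape(-1, 1)

# Label rows (first die) and columns (second die) 1-6 at construction time
df = pd.DataFrame(matrix, index=die, columns=die)

# Flatten the 36 sums and count how often each one occurs, ordered by sum
freq = pd.Series(df.to_numpy().ravel()).value_counts().sort_index()
prob = freq / freq.sum()

print(df.loc[3, 4])  # 7: first die shows 3, second die shows 4
print(pd.DataFrame({"frequency": freq, "probability": prob.round(4)}))
```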


## Complements

The complement of an event is everything that an event is not.

When we toss a coin, heads ($A$) and tails ($B$) together make up every possible outcome. Because they exhaust the sample space, we have

$P(A) + P(B) = 1$

We are 100% certain we will get either heads or tails.

Every event has a complement, which we denote with an apostrophe: $A'$.

Also, the complement of the complement is the original event: $(A')' = A$.

Rolling a die: let $A$ be rolling a 3, so $A'$ is rolling anything except a 3.

- $P(A') = P(1) + P(2) + P(4) + P(5) + P(6)$
- $= \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{5}{6}$

You can also compute "the probability of not getting a 3" with the complement rule:

- $P(A') = 1 - P(A) = 1 - \frac{1}{6}$
- $= \frac{5}{6}$
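Both routes to $P(A')$ can be checked side by side: counting the faces that are not a 3, and subtracting $P(A)$ from 1. In this sketch, representing events as predicate functions and the helper name `complement` are illustrative assumptions; the last line checks $(A')' = A$ face by face.

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]

def probability(event):
    # Every face is assumed equally likely
    return Fraction(sum(1 for face in die if event(face)), len(die))

def is_three(face):
    return face == 3

def complement(event):
    """Return the event A': every outcome for which A does not occur."""
    return lambda face: not event(face)

p_a = probability(is_three)                          # P(A) = 1/6
p_not_a_direct = probability(complement(is_three))   # count faces 1, 2, 4, 5, 6
p_not_a_rule = 1 - p_a                               # complement rule

print(p_not_a_direct, p_not_a_rule)  # 5/6 5/6

# (A')' = A: complementing twice gives back the original event
print(all(complement(complement(is_three))(f) == is_three(f) for f in die))  # True
```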
