# Introduction to Statistics
This notebook serves as an introduction to statistics and probability.

# Section 1.0 -  Probability

Probability is the likelihood of an event occurring.

$$P(X)=\frac{Preferred\;Outcomes}{Sample\;Space}$$

An **event** is a set of outcomes from a sample space that we are interested in. In other words, an event is the set of preferred outcomes.<br><br>
The Probability Formula:
* The probability of event X occurring equals the **number** of preferred outcomes over the **number** of outcomes in the
sample space.
* Preferred outcomes are the outcomes we want to occur or the outcomes we are interested in. We also refer to such
outcomes as “Favorable”.
* Sample space refers to all possible outcomes that can occur. Its “size” indicates the number of elements in it.

<br>If two events are **independent**:
The probability of them occurring simultaneously equals the product of the probabilities of each occurring on its own.<br>
For example, the probability of drawing an Ace does not depend on the probability of drawing a Spade. So the probability of drawing the Ace of Spades equals:
$$P(A\spadesuit)=P(A)*P(\spadesuit)$$
$$P(A\spadesuit)=\frac{4}{52}*\frac{13}{52} \approx 0.0192 = 1.92\%$$

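The Ace of Spades calculation can be checked by counting outcomes directly. The sketch below is only an illustration: the deck representation and the helper name `probability` are assumptions introduced here, not part of the original notebook.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))

def probability(event):
    """Number of favorable outcomes divided by the size of the sample space."""
    favorable = [card for card in deck if event(card)]
    return Fraction(len(favorable), len(deck))

p_ace = probability(lambda card: card[0] == "A")
p_spade = probability(lambda card: card[1] == "spades")
p_ace_of_spades = probability(lambda card: card == ("A", "spades"))

print(p_ace, p_spade, p_ace_of_spades)
# For independent events the product equals the joint probability.
print(p_ace * p_spade == p_ace_of_spades)
```

Using `Fraction` keeps the arithmetic exact, so the equality check does not suffer from floating-point rounding.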

## Section 1.1 - Expected Values

**Trial** - Observing an event occurrence and recording the outcome.  
**Experiment** - A collection of one or more trials.  
**Experimental Probability** - The probability we assign an event, based on an experiment we conduct.  

**Expected Value** - The specific outcome we expect to occur when we run an experiment.

1. <u>Example of a Trial</u>: Flipping a coin and recording the outcome
2. <u>Example of an Experiment</u>: Flipping a coin 20 times and recording the 20 individual outcomes
<br><br>



### Expected Value for categorical variables
A categorical variable is something like:
* Product Ratings (e.g. "Poor", "Average", "Good", "Excellent")
* Survey responses (e.g. "Yes", "No", "Maybe")

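A numerical value **n** must be assigned to each category of the variable.

The expected value for categorical variables is the sum of $n*p$ over all categories, where $p$ is the probability of the category: $E(X) = \sum n*p$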

#### Example:
Let's say we asked 100 people how satisfied they are, with the options being "Unsatisfied", "Neutral", or "Satisfied".<br>
Let's assume we assigned "Unsatisfied" to 1, "Neutral" to 2, and "Satisfied" to 3, and the results of the survey were as follows:
* "Unsatisfied" (score = 1) -> 10 responses
* "Neutral" (score = 2) -> 30 responses
* "Satisfied" (score = 3) -> 60 responses

$$E(X) = 1*\frac{10}{100} + 2*\frac{30}{100} + 3*\frac{60}{100} = 2.5$$

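The survey calculation can be reproduced in a few lines. This is a minimal sketch that assumes the response counts from the example (10, 30 and 60); the variable names are illustrative only.

```python
from fractions import Fraction

# Survey counts from the example, keyed by the numeric score of each category.
responses = {1: 10, 2: 30, 3: 60}
total = sum(responses.values())

# Each score is weighted by its experimental probability (count / total).
expected_value = sum(Fraction(score * count, total) for score, count in responses.items())

print(float(expected_value))  # 2.5
```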
### Expected Value for numerical variables
$$E(X) = \sum_{i=1}^n x_i*p_i$$

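As an illustration of the sum, consider one roll of a fair six-sided die, where every face has probability 1/6. The die is an example chosen here, not one taken from the notebook.

```python
from fractions import Fraction

faces = range(1, 7)        # possible values x_i
p = Fraction(1, 6)         # each face is equally likely

# E(X) is the sum of every value multiplied by its probability.
expected_roll = sum(x * p for x in faces)

print(expected_roll)  # 7/2
```

Note that the expected value, 3.5, is not itself a possible outcome of a single roll.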
## Section 1.2 - Combinatorics

Combinatorics is a branch of mathematics focused on counting, arranging and combining objects, often under specific rules or constraints.

### Section 1.2.1 - Permutations
Permutations represent the number of different possible ways we can arrange a number of elements.
$$P(n) = n*(n-1)*(n-2)*(n-3)*...*1$$

Characteristics of Permutations:
* Arranging all elements within the sample space.
* No repetition.
* $P(n) = n*(n-1)*(n-2)*(n-3)*...*1 = n!$ (Called "n factorial")

#### Example:
* If we need to arrange 3 people, we would have P(3)= 6 ways of doing so.
* Assume the people are "Jabari", "Ameer", "Tariq"

<br>We could arrange the following ways:
1. Jabari, Ameer, Tariq
2. Jabari, Tariq, Ameer
3. Ameer, Jabari, Tariq
4. Ameer, Tariq, Jabari
5. Tariq, Jabari, Ameer
6. Tariq, Ameer, Jabari

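The six arrangements can also be generated programmatically. The sketch below uses `itertools.permutations` from the Python standard library and compares the count against `math.factorial`.

```python
from itertools import permutations
from math import factorial

people = ["Jabari", "Ameer", "Tariq"]

# Every ordering of all three people, without repetition.
arrangements = list(permutations(people))

for arrangement in arrangements:
    print(", ".join(arrangement))

print(len(arrangements) == factorial(len(people)))
```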
### Rules for factorials:
* $0!=1$
* If $n < 0$, $n!$ does not exist
* $(n + k)! = n!*(n+1)*...*(n+k)$
* $(n-k)! = \frac{n!}{(n-k+1)*...*(n-k+k)} = \frac{n!}{(n-k+1)*...*(n)}$
* $\frac{n!}{k!}=\frac{k!*(k+1)*...*n}{k!}=(k+1)*...*n$

#### Examples:
Let n=7,k=4
* $(7 + 4)! = 11! = 7!*8*9*10*11$
* $(7-4)! = 3! = \frac{7!}{4*5*6*7}$
* $\frac{7!}{4!}=5*6*7$

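The three worked examples can be verified numerically. This sketch assumes Python 3.8 or newer for `math.prod`; the variable names are illustrative.

```python
from math import factorial, prod

n, k = 7, 4

# (n + k)! = n! * (n+1) * ... * (n+k)
rule_sum = factorial(n) * prod(range(n + 1, n + k + 1))

# (n - k)! = n! / ((n-k+1) * ... * n)
rule_diff = factorial(n) // prod(range(n - k + 1, n + 1))

# n! / k! = (k+1) * ... * n
rule_ratio = prod(range(k + 1, n + 1))

print(rule_sum == factorial(n + k))
print(rule_diff == factorial(n - k))
print(rule_ratio == factorial(n) // factorial(k))
```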
### Section 1.2.2 - Variations
Variations represent the number of different possible ways we can <u>pick</u> and <u>arrange</u> a number of elements.<br><br>
Variations **with** repetition
$$\overline{V}(n,p) = n^p$$
Intuition behind the formula (with repetition):
* We have n-many options for the first element
* We still have n-many options for the second element because repetition is allowed.
* We have n-many options for each of the p-many elements
* $n*n*n*...*n = n^p$

<br><br>Variations **without** repetition
$$V(n,p)=\frac{n!}{(n-p)!}$$
Intuition behind the formula (without repetition):
* We have n-many options for the first element
* We have only (n-1)-many options for the second element because we can't repeat the value we chose to start with.
* We have fewer options left for each additional element
* $n*(n-1)*(n-2)*...*(n-p+1) = \frac{n!}{(n-p)!}$

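Both kinds of variations can be enumerated with the standard library. In this sketch, `itertools.product` with `repeat=p` plays the role of variations with repetition and `itertools.permutations(items, p)` the role of variations without repetition; the four-letter item list is a made-up example.

```python
from itertools import permutations, product
from math import perm

items = ["a", "b", "c", "d"]  # n = 4 elements to choose from
p = 2                         # positions to fill

with_repetition = list(product(items, repeat=p))
without_repetition = list(permutations(items, p))

print(len(with_repetition), len(items) ** p)          # both equal n^p
print(len(without_repetition), perm(len(items), p))   # both equal n! / (n-p)!
```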
### Section 1.2.3 - Combinations
Combinations represent the number of different possible ways we can pick a number of elements.
$$C(n,p) = C_p^n = \frac{n!}{(n-p)!p!}$$
Characteristics of Combinations:
* Accounts for double-counting, because order does not matter. (Selecting Jabari, Ameer, Tariq and Makenna is the same as selecting Makenna, Tariq, Ameer and Jabari)
* All the different permutations of a single combination are different variations
* $$ C = \frac{V}{P} = \frac{n!/(n-p)!}{p!} = \frac{n!}{p!(n-p)!}$$
* Combinations are symmetric, so $C_p^n = C_{n-p}^n$, since selecting p elements is the same as omitting n-p elements

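The relationship between combinations, variations and permutations can be checked on a small case. The sketch assumes six placeholder items and picks four of them; `math.comb` and `math.perm` require Python 3.8 or newer.

```python
from itertools import combinations
from math import comb, factorial, perm

items = "ABCDEF"
n = len(items)
p = 4

# Each selection appears once, regardless of the order it was picked in.
selections = list(combinations(items, p))

print(len(selections), comb(n, p))          # count matches n! / (p! (n-p)!)
print(perm(n, p) // factorial(p))           # variations divided by permutations
print(comb(n, p) == comb(n, n - p))         # symmetry: picking p equals omitting n-p
```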
#### Section 1.2.3.1 - Combinations with repetition
$$\overline{C}_p^n = C_p^{n+p-1}$$
In this case, order still does not matter, but the same element may be picked more than once. For example, a selection of 4 that contains Jabari twice and Ameer twice is allowed.

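The formula $C_p^{n+p-1}$ matches the number of selections of size p in which elements may repeat and order is ignored. A small check, assuming three placeholder items and selections of size two, using `itertools.combinations_with_replacement`:

```python
from itertools import combinations_with_replacement
from math import comb

items = ["a", "b", "c"]  # n = 3
p = 2

# Unordered selections of size p where an item may appear more than once.
selections = list(combinations_with_replacement(items, p))

print(selections)
print(len(selections) == comb(len(items) + p - 1, p))
```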
#### Section 1.2.3.2 - Combinations with separate sample spaces
Here each element is picked from its own sample space, and the number of possible combinations is the product of the sizes of those sample spaces.
$$ C = n_1 * n_2 *...* n_p$$
where $n_1$ is the size of the first sample space, $n_2$ is the size of the second sample space,...,$n_p$ is the size of the p-th sample space.

Characteristics of Combinations with separate sample spaces:
* The option we choose for any element does not affect the number of options for the other elements.
* The order in which we pick the individual elements is arbitrary.
* We need to know the size of the sample space for each individual element. $(n_1,n_2...n_p)$

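A short illustration of the product rule, using a hypothetical lunch menu (the menu items are invented for this example). `itertools.product` enumerates one choice from each sample space.

```python
from itertools import product

# Hypothetical menu: one sample space per element we pick.
sandwiches = ["club", "veggie", "tuna"]   # n_1 = 3
sides = ["fries", "salad"]                # n_2 = 2
drinks = ["water", "juice"]               # n_3 = 2

meals = list(product(sandwiches, sides, drinks))

print(len(meals))
print(len(meals) == len(sandwiches) * len(sides) * len(drinks))
```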
## Section 1.3 - Bayesian Notation

### Section 1.3.1 - Sets
A **set** is a collection of elements, which hold certain values. Additionally, every event has a set of outcomes
that satisfy it.
The null set (or empty set), denoted $\emptyset$, is a set that contains no elements.

$$x \in A$$
where the Element x is lower-case and the Set A is upper-case<br>
Notation:
* $x \in A$ means "Element x is a part of set A". Example: $2 \in All\,even\,numbers$
* $A \ni x$ means "Set A contains element x". Example: $All\,even\,numbers \ni 2$
* $x \notin A$ means "Element x is NOT a part of set A". Example: $1 \notin All\,even\,numbers$
* $\forall x:$ means "For all/any x such that...". Example: $\forall x:x \in All\,even\,numbers$
* $A \subseteq B$ means "A is a subset of B". Example: $Even\,numbers \subseteq Integers$

Remember! Every non-empty set has at least 2 subsets:
* $ A \subseteq A$
* $ \emptyset \subseteq A$

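Python's built-in `set` type mirrors most of this notation. The sketch below uses a finite range of integers as a stand-in for the infinite sets in the examples, which is an assumption made purely so the code can run.

```python
# Finite stand-ins for "Integers" and "All even numbers".
integers = set(range(-10, 11))
even_numbers = {x for x in integers if x % 2 == 0}

print(2 in even_numbers)              # x ∈ A
print(1 not in even_numbers)          # x ∉ A
print(even_numbers <= integers)       # A ⊆ B
print(even_numbers <= even_numbers)   # every set is a subset of itself
print(set() <= even_numbers)          # the empty set is a subset of every set
```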
### Section 1.3.2 - Multiple Events

In [1]:
from ipycanvas import Canvas
from math import pi,sin,cos

def write_text(c,string,x,y,size):
    c.fill_style = "white"
    c.font = f"{size}px serif"
    c.fill_text(string, x, y)

def draw_circle(c,x,y,r,color,fontsize,text):
    c.fill_style = color
    c.fill_circle(x, y, r)
    write_text(c,text,x-r/3,y-r/3,fontsize)

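def draw_intersect(c,x2,y2,r2,color):
    # 'source-atop' paints only where pixels already exist, so just the overlap is colored
    c.fill_style = color
    c.global_composite_operation = 'source-atop'
    c.fill_circle(x2,y2,r2)
    # from here on, new shapes are drawn behind what is already on the canvas
    c.global_composite_operation = 'destination-over'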

canvas = Canvas(width=1200, height=300)
sectionWidth=400
sectionStart=0

write_text(canvas,"Not touching at all",50,32,18)
draw_circle(canvas,150,150,100,"red",32,"A")
draw_circle(canvas,325,150,50,"orange",32,"B")

sectionStart=400
canvas.stroke_style = "white"
canvas.stroke_rect(sectionStart, 0, sectionWidth, canvas.height)
write_text(canvas,"Intersect (Partially Overlap)",sectionStart+50,32,18)
x1=sectionStart+200;y1=150;r1=100
x2=sectionStart+300;y2=150;r2=50

intersectColor="#CC710A"
draw_circle(canvas,x1,y1,r1,"red",32,"A")
draw_intersect(canvas,x2,y2,r2,intersectColor)
draw_circle(canvas,x2,y2,r2,"orange",32,"B")
canvas.global_composite_operation = 'source-over'
write_text(canvas,"B",x2+r2/4,y2-r2/3,32)

sectionStart=800
canvas.stroke_style = "white"
canvas.stroke_rect(sectionStart, 0, sectionWidth, canvas.height)
write_text(canvas,"One completely overlaps the other",sectionStart+50,32,18)
draw_circle(canvas,sectionStart+200,150,100,"red",32,"A")
draw_circle(canvas,sectionStart+200,175,50,intersectColor,32,"B")

canvas.global_composite_operation = 'destination-over'
canvas.fill_style = "#197186"
canvas.fill_rect(0, 0, canvas.width, canvas.height)
canvas.stroke_style = "white"
canvas.stroke_rect(sectionStart, 0, sectionWidth, canvas.height)

canvas

Canvas(height=300, width=1200)

Examples:
1. Not touching at all: $A = \clubsuit$ , $B = \spadesuit$
2. Intersecting: $A = \clubsuit$ , $B = Queens$
3. Completely overlaps: $A = Black\;Cards$ , $B = \spadesuit$

The **intersection** of two or more events expresses the set of outcomes that satisfy all the events
simultaneously. Graphically, this is the area where the sets intersect.<br>
We denote the intersection of two sets as:
$$A \cap B$$

In [2]:
from ipycanvas import Canvas
from math import pi,sin,cos

def write_text(c,string,x,y,size):
    c.fill_style = "white"
    c.font = f"{size}px serif"
    c.fill_text(string, x, y)

def draw_circle(c,x,y,r,color,fontsize,text):
    c.fill_style = color
    c.fill_circle(x, y, r)
    write_text(c,text,x-r/3,y-r/3,fontsize)

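def draw_intersect(c,x2,y2,r2,color):
    # 'source-atop' paints only where pixels already exist, so just the overlap is colored
    c.fill_style = color
    c.global_composite_operation = 'source-atop'
    c.fill_circle(x2,y2,r2)
    # from here on, new shapes are drawn behind what is already on the canvas
    c.global_composite_operation = 'destination-over'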

canvas = Canvas(width=400, height=300)
sectionWidth=400
sectionStart=0

sectionStart=0
canvas.stroke_style = "white"
canvas.stroke_rect(sectionStart, 0, sectionWidth, canvas.height)
write_text(canvas,"Union",sectionStart+50,32,18)
x1=sectionStart+200;y1=150;r1=100
x2=sectionStart+300;y2=150;r2=50

intersectColor="#CC710A"
draw_circle(canvas,x1,y1,r1,"red",32,"A")

draw_intersect(canvas,x2,y2,r2,intersectColor)
draw_circle(canvas,x2,y2,r2,"orange",32,"B")
canvas.global_composite_operation = 'source-over'
write_text(canvas,"B",x2+r2/4,y2-r2/3,32)


canvas.global_composite_operation = 'destination-over'
draw_circle(canvas,x1,y1,r1*1.05,"white",32,"A")
draw_circle(canvas,x2,y2,r2*1.1,"white",32,"A")
canvas.fill_style = "#197186"
canvas.fill_rect(0, 0, canvas.width, canvas.height)
canvas.stroke_style = "white"
canvas.stroke_rect(sectionStart, 0, sectionWidth, canvas.height)

canvas

Canvas(height=300, width=400)

The **union** of two or more events expresses the set of outcomes that satisfy at least one of the events.<br>
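Graphically, this is the area that includes both sets. We denote the union of two sets as:
$$ A\cup B$$
Because outcomes in the overlap would otherwise be counted twice, the probability of the union is:
$$ P(A \cup B) = P(A) + P(B) - P(A \cap B) $$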

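Intersections and unions map directly onto Python's set operators `&` and `|`. The sketch below reuses the Clubs and Queens example and checks that the size of the union equals the two sizes added together minus the size of the overlap; the deck representation is an assumption made for illustration.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = set(product(ranks, suits))

clubs = {card for card in deck if card[1] == "clubs"}
queens = {card for card in deck if card[0] == "Q"}

intersection = clubs & queens   # outcomes satisfying both events
union = clubs | queens          # outcomes satisfying at least one event

print(intersection)
# The overlap is subtracted once so that it is not counted twice.
print(len(union), len(clubs) + len(queens) - len(intersection))
```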
In [3]:
from ipycanvas import Canvas
from math import pi,sin,cos

def write_text(c,string,x,y,size):
    c.fill_style = "white"
    c.font = f"{size}px serif"
    c.fill_text(string, x, y)

def draw_circle(c,x,y,r,color,fontsize,text):
    c.fill_style = color
    c.fill_circle(x, y, r)
    write_text(c,text,x-r/3,y-r/3,fontsize)

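def draw_intersect(c,x2,y2,r2,color):
    # 'source-atop' paints only where pixels already exist, so just the overlap is colored
    c.fill_style = color
    c.global_composite_operation = 'source-atop'
    c.fill_circle(x2,y2,r2)
    # from here on, new shapes are drawn behind what is already on the canvas
    c.global_composite_operation = 'destination-over'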

canvas = Canvas(width=800, height=300)
sectionWidth=400
sectionStart=0



write_text(canvas,"Mutually Exclusive Sets",50,32,18)
draw_circle(canvas,150,150,100,"red",32,"A")
draw_circle(canvas,325,150,50,"orange",32,"B")

sectionStart=400
write_text(canvas,"Complements",sectionStart+50,32,18)
draw_circle(canvas,sectionStart+150,150,100,"red",32,"A")
canvas.global_composite_operation = 'destination-over'
canvas.fill_style = "orange"
# fill the whole "Complements" panel behind circle A; the third argument is the panel width
canvas.fill_rect(sectionStart, 0, sectionWidth, canvas.height)
canvas.global_composite_operation = 'source-atop'
write_text(canvas,"B",sectionStart+sectionWidth-100,150,32)
canvas.stroke_style = "white"
canvas.stroke_rect(sectionStart, 0, sectionWidth, canvas.height)

canvas.global_composite_operation = 'destination-over'
canvas.fill_style = "#197186"
canvas.fill_rect(0, 0, canvas.width, canvas.height)
canvas.stroke_style = "white"
canvas.stroke_rect(sectionStart, 0, sectionWidth, canvas.height)

canvas

Canvas(height=300, width=800)

Sets with no overlapping elements are called **mutually exclusive**. Graphically, their circles never touch. <br>
If $ A \cap B = \emptyset$, then the two sets are mutually exclusive. <br>

The **complement** of an event is the set of ALL outcomes that are in the sample space but are NOT in the event.
$$ A^c = \forall x:x \notin A $$
Remember:<br>
All complements are mutually exclusive, but not all mutually exclusive sets are complements.

Example: <br>
Dogs and Cats are mutually exclusive sets, since no species is simultaneously a feline and a canine, but the two are not complements, since there exist other types of animals as well.

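The difference between mutually exclusive sets and complements can be seen with a deck of cards. In this sketch (the deck representation is an assumption), Spades and Hearts never overlap, yet Hearts is not the complement of Spades because Diamonds and Clubs also exist.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = set(product(ranks, suits))

spades = {card for card in deck if card[1] == "spades"}
hearts = {card for card in deck if card[1] == "hearts"}

# The complement holds everything in the sample space that is not a spade.
not_spades = deck - spades

print(spades.isdisjoint(hearts))       # mutually exclusive
print(hearts == not_spades)            # but not complements
print(spades.isdisjoint(not_spades))   # a set and its complement never overlap
print(spades | not_spades == deck)     # together they cover the sample space
```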
### Section 1.3.3 - Conditional Probability

For any two events A and B, such that the likelihood of B occurring is greater than 0 ($P(B) > 0$), the conditional probability
formula states the following:

$$ P(A | B) = \frac{P(A \cap B)}{P(B)} $$

This reads as "The probability of A occurring given that B has occurred equals the probability of them both happening simultaneously divided by the probability of B occurring". <br><br>

Remember: $ P(A | B)$ and $P(B | A) $ describe different situations, even in cases where $ P(A|B) = P(B|A) $ numerically.

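A small example makes the asymmetry concrete. The sketch below assumes one roll of a fair six-sided die, with A being "the roll is even" and B being "the roll is 5 or 6"; these events and the helper names are chosen here for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))                  # one roll of a fair die
A = {x for x in sample_space if x % 2 == 0}      # roll is even
B = {x for x in sample_space if x >= 5}          # roll is 5 or 6

def prob(event):
    """Favorable outcomes over the size of the sample space."""
    return Fraction(len(event), len(sample_space))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given), assuming P(given) > 0."""
    return prob(event & given) / prob(given)

p_a_given_b = conditional(A, B)
p_b_given_a = conditional(B, A)

print(p_a_given_b, p_b_given_a)  # 1/2 1/3
```

Here the two conditional probabilities differ because the conditioning event changes which outcomes are still possible.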
#### Section 1.3.3.1 - Law of Total Probability

The law of total probability dictates that for any event A contained in the union of mutually exclusive sets $B_1,B_2,...,B_n$ (for example, when these sets together cover the whole sample space), its probability equals the
following sum.

$$ P(A) = P(A | B_1)*P(B_1) + P(A | B_2)*P(B_2) + ... + P(A | B_n)*P(B_n) $$

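The sum can be evaluated directly once the individual probabilities are known. All numbers in this sketch are hypothetical: three mutually exclusive groups that together cover every outcome, each with its own conditional probability of A.

```python
from fractions import Fraction

# Hypothetical probabilities of three mutually exclusive groups B_1, B_2, B_3.
p_b = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
# Hypothetical conditional probabilities P(A | B_i) for each group.
p_a_given_b = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]

# The groups cover the whole sample space, so their probabilities sum to 1.
covers_everything = sum(p_b) == 1

# Weight each conditional probability by the probability of its group.
p_a = sum(cond * prior for cond, prior in zip(p_a_given_b, p_b))

print(covers_everything, p_a)  # True 17/100
```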
#### Section 1.3.3.2 - Multiplication Rule

The multiplication rule calculates the probability of the intersection based on the conditional probability.

$$ P(A \cap B) = P(A|B)*P(B) $$

Intuition behind the formula:
* If event B occurs 40% of the time ($P(B)=0.4$) and event A occurs 50% of the time that event B occurs ($P(A|B)=0.5$), then they would simultaneously occur 20% of the time ($P(A|B)*P(B)=0.5*0.4 = 0.2$) 

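The same numbers can be reproduced by counting. The sketch below assumes ten equally likely outcomes labeled 0 to 9, chosen so that B covers 40% of them and half of B also lies in A; the labels themselves are arbitrary.

```python
from fractions import Fraction

sample_space = set(range(10))   # ten equally likely outcomes (hypothetical)
B = {0, 1, 2, 3}                # P(B) = 4/10
A = {2, 3, 7, 8, 9}             # half of the outcomes in B are also in A

p_b = Fraction(len(B), len(sample_space))
p_a_given_b = Fraction(len(A & B), len(B))
p_a_and_b = Fraction(len(A & B), len(sample_space))

print(p_a_given_b * p_b, p_a_and_b)  # 1/5 1/5
```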
#### Section 1.3.3.3 - Bayes' Law

Bayes’ Law helps us understand the relationship between two events by computing the different conditional probabilities. <br>
We also call it Bayes’ Rule or Bayes’ Theorem.
$$ P(A|B) = \frac{P(B|A)*P(A)}{P(B)} $$

Intuition behind the formula:
* According to the multiplication rule $ P(A \cap B) = P(A|B)*P(B) $, so $P(B \cap A) = P(B|A)*P(A)$
* Since $P(A \cap B) = P(B \cap A)$, we plug in $P(B|A)*P(A)$ for $P(A \cap B)$ in the conditional probability formula $P(A|B) = \frac{P(A \cap B)}{P(B)}$
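
The derivation can be traced with numbers. This sketch reuses $P(B)=0.4$ and $P(A|B)=0.5$ from the multiplication rule example and adds an assumed value $P(A)=0.25$, which does not appear in the original text; the function name `bayes` is also an illustrative choice.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B); requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be greater than 0")
    return p_b_given_a * p_a / p_b

p_b = Fraction(2, 5)           # P(B) = 0.4
p_a_given_b = Fraction(1, 2)   # P(A|B) = 0.5
p_a = Fraction(1, 4)           # P(A) = 0.25, assumed for illustration

# Multiplication rule, then the conditional probability in the other direction.
p_a_and_b = p_a_given_b * p_b
p_b_given_a = p_a_and_b / p_a

# Bayes' Law recovers the original P(A|B) from P(B|A).
recovered = bayes(p_b_given_a, p_a, p_b)

print(p_b_given_a, recovered)  # 4/5 1/2
```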