# Introduction to Statistics

Gerard Tromp, Prof Bioinformatics, FMHS Stellenbosch

2018-02-26

# Probability

- Fundamental to statistics  
- Describes the outcomes of processes numerically  
- Means of quantifying uncertainty  
- Historically based on games of chance (gambling)  

## Probability theory

- Chevalier de Mere circa 1650  
  - game of dice  
     - at least one six in 4 throws (single die)  
     - at least one double-six in 24 throws (pair of dice)  
- Blaise Pascal and Pierre de Fermat

![DieOutcomes](static/Figures/DiceAllStates.png)

## Probability theory

### Basic definitions (based on gaming)

1. Random experiment  
2. Elementary outcome  
3. Sample space  

### Coin toss

Outcomes:  
- Elementary outcomes  
  - Heads, Tails  
- Sample space  
  - {Heads, Tails} or $\large\{H,T\}$


- Curly braces denote a set  
- Elementary outcomes are elements in the set of outcomes  
- Sample space is a set  

### Probabilities

- Sum of the probabilities of all outcomes = 1  
- For "fair" die  
  - sample space / outcome set is {1, 2, 3, 4, 5, 6}  
  - probability for each is 1/6 (0.166667)  
- For "loaded" die 
  - depends on loading (assume 1 appears 25% of the time)
  - P({1,2,3,4,5,6}) =  P({1}) + P({2,3,4,5,6}) = 1
  - $\large 1 - P(\{1\}) = P(\{2,3,4,5,6\})$
  - P(2) = P(3) = ... = P(6) = 0.75/5 = 0.15 (assuming the other five faces are equally likely)
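
The loaded-die arithmetic can be checked with a short sketch. It assumes, as the slide does, that face 1 appears 25% of the time, and additionally that the remaining 0.75 is shared equally by the other five faces. The variable names are illustrative only.

```python
from fractions import Fraction

# Assumption from the slide: face 1 of the loaded die appears 25% of the time
p_one = Fraction(1, 4)

# Assumption: the remaining probability is shared equally by faces 2..6
p_other = (1 - p_one) / 5

loaded = {face: (p_one if face == 1 else p_other) for face in range(1, 7)}
fair = {face: Fraction(1, 6) for face in range(1, 7)}

for name, die in (("fair", fair), ("loaded", loaded)):
    # Each probability is non-negative and the total is exactly 1
    assert all(p >= 0 for p in die.values())
    assert sum(die.values()) == 1
    print(name, {face: round(float(p), 6) for face, p in die.items()})
```

Exact fractions are used so that the check on the total does not suffer from floating-point rounding.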

### Characteristic properties of probability 
<p>
Probabilities are non-negative  
The probability of any outcome O<sub>i</sub> is greater than or equal to zero
$$\Large P(O_i) \geq 0$$
<p>
The sum of all probabilities in a set of outcomes is equal to 1
$$\large \sum_{i=1}^{n} P(O_i) = 1 $$



### Assigning probabilities

- Classical  
- Frequentist  
- Personal  

- Objectivist   
  - Frequentist  
  - Classical  
- Subjectivist (Bayesian)  
  - Conditions on Personal  

### Basic operations

#### *Event* is a *set* of elementary outcomes  

For a pair of dice, with outcomes written as (red, green):  
  

| Event description | Event's elementary outcomes | Probability |  
|:------------------|:---------------------------:|------------:|  
| A: Dice add to 3  | {(1,2), (2,1)}               | P(A) = 2/36 |   
| B: Dice add to 6  | {(1,5), (2,4), (3,3), (4,2), (5,1)} | P(B) = 5/36 |
| C: Red die (first) shows 1 | {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6)} | P(C) = 6/36 |
| D: Green die (second) shows 2 | {(1,2),(2,2),(3,2),(4,2),(5,2),(6,2)} | P(D) = 6/36 |
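
A minimal sketch of how these events could be enumerated, assuming two fair dice and outcomes written as ordered pairs (red, green). Each event is built from its description in the first column of the table; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (red, green), 36 equally likely outcomes
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes), assuming fair dice."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if sum(o) == 3}   # dice add to 3
B = {o for o in sample_space if sum(o) == 6}   # dice add to 6
C = {o for o in sample_space if o[0] == 1}     # red die (first) shows 1
D = {o for o in sample_space if o[1] == 2}     # green die (second) shows 2

for name, event in (("A", A), ("B", B), ("C", C), ("D", D)):
    # Report the unreduced count over 36 to match the table
    print(name, sorted(event), f"{len(event)}/{len(sample_space)}")
```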

### Basic operations

#### Event combination

- Logical operations, for any given events **M** and **N**:  
  - M **and** N  
  - M **or** N  
  - **not** M  

#### Utility of event combination

For the example with the dice:

What is the probability that either die shows a 1?
![DieOutcomes](static/Figures/DiceAllStates.png)

#### Utility of event combination

Let:  
- C = red die == 1  
- D = green die == 1  

Then:  
- C = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6)}  
- D = {(1,1),(2,1),(3,1),(4,1),(5,1),(6,1)}  

**But there is overlap in outcomes**

#### Utility of event combination

| Logical     | Set Notation |  
|-------------|------------|  
| C **or** D | C &cup; D |  
| C **and** D | C &cap; D |  

- C &cap; D  
  - {(1,1)}  
- C &cup; D  
  - {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(3,1),(4,1),(5,1),(6,1)} 
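
Python's built-in `set` type mirrors this notation directly: `&` is intersection (and) and `|` is union (or). The sketch below assumes outcomes are written as (red, green) pairs.

```python
from itertools import product

# Sample space for a pair of dice: ordered pairs (red, green)
sample_space = set(product(range(1, 7), repeat=2))

C = {o for o in sample_space if o[0] == 1}   # red die shows 1
D = {o for o in sample_space if o[1] == 1}   # green die shows 1

print("C and D:", sorted(C & D))                 # intersection
print("C or D: ", sorted(C | D))                 # union
print("outcomes in C or D:", len(C | D))         # 11, not 6 + 6 = 12
```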

### Rules for combining events

#### **Addition Rule:**

$$\Large P(C\enspace OR\enspace D) = P(C) + P(D) - P(C\enspace AND\enspace D)$$

As illustrated, the overlap {(1,1)}, represented by P(C AND D), is counted twice when adding P(C) + P(D) in this example, so it must be subtracted once.

#### **Special addition rule**

**IF** events C and D are **mutually exclusive**, then:  
- P(C AND D) = P(C &cap; D) = 0
- P(C OR D) = P(C) + P(D) \[the subtracted term P(C &cap; D) is 0\]
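
Both forms of the addition rule can be verified by enumeration. This is a sketch assuming fair dice and (red, green) ordering; `prob` is a hypothetical helper, and the sum-of-3 and sum-of-6 events are used as a mutually exclusive pair.

```python
from fractions import Fraction
from itertools import product

# Sample space for a pair of dice: ordered pairs (red, green)
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, assuming all 36 outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

C = {o for o in sample_space if o[0] == 1}   # red die shows 1
D = {o for o in sample_space if o[1] == 1}   # green die shows 1

# General addition rule: subtract the overlap so (1,1) is counted once
p_union = prob(C) + prob(D) - prob(C & D)
print(p_union, prob(C | D))

# Mutually exclusive events: a sum of 3 and a sum of 6 cannot both occur
A = {o for o in sample_space if sum(o) == 3}
B = {o for o in sample_space if sum(o) == 6}
print(prob(A & B), prob(A | B), prob(A) + prob(B))
```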

### Rules for combining events

#### **Subtraction Rule:**  

For any event E:

$$\large P(E) = 1 - P(NOT\enspace E)$$



## Conditional probability

For a pair of dice, what if the tosses were consecutive, red before green?

Let:  
- A be sum of faces = 3
- C be red = 1   
  
Then:  
- P(A) = 2/36 since A = {(1,2),(2,1)}
- What is P(A|C)?

## Conditional probability

$$\large P(E|F) = \frac{P(E\enspace and\enspace F)}{P(F)}$$
  
Lemma: 
  - P(E|E) = 1  
   
When E and F are mutually exclusive:  
  - P(E|F) = 0

For mutually exclusive events:  
- once one has occurred the other is impossible 
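
The definition can be applied to the dice question by enumeration. The sketch below assumes fair dice and (red, green) ordering; `prob` and `cond_prob` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Sample space for a pair of dice: ordered pairs (red, green)
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, assuming all 36 outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

def cond_prob(E, F):
    """P(E|F) = P(E and F) / P(F); requires P(F) > 0."""
    return prob(E & F) / prob(F)

A = {o for o in sample_space if sum(o) == 3}   # sum of faces = 3
B = {o for o in sample_space if sum(o) == 6}   # sum of faces = 6
C = {o for o in sample_space if o[0] == 1}     # red = 1

print("P(A|C) =", cond_prob(A, C))   # only (1,2) remains out of the 6 outcomes in C
print("P(A|A) =", cond_prob(A, A))   # an event given itself
print("P(A|B) =", cond_prob(A, B))   # A and B are mutually exclusive
```

Knowing that red shows 1 raises the probability of a sum of 3 from 2/36 to 1/6, because only the green die remains uncertain.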

### Rules for combining events

#### **Multiplication rule**

$$\large P(E\enspace and\enspace F) = P(E|F)P(F)$$
  
It follows that:  
- P(F)P(E|F) = P(E)P(F|E)  
  


### Rules for combining events

#### **Special multiplication rule**

For independent events:  
**by definition** P(E|F) = P(E), so the multiplication rule reduces to:
  
  
$$\large P(E\enspace and\enspace F) = P(E)P(F)$$
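
As a sketch, independence can be tested by comparing P(E and F) with P(E)P(F). The example assumes fair dice and (red, green) ordering; `prob` and `independent` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Sample space for a pair of dice: ordered pairs (red, green)
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, assuming all 36 outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

def independent(E, F):
    """True when P(E and F) equals P(E) * P(F)."""
    return prob(E & F) == prob(E) * prob(F)

C = {o for o in sample_space if o[0] == 1}     # red die shows 1
D = {o for o in sample_space if o[1] == 1}     # green die shows 1
A = {o for o in sample_space if sum(o) == 3}   # sum of faces = 3

print("C, D independent:", independent(C, D))   # the two dice do not affect each other
print("A, C independent:", independent(A, C))   # the sum depends on the red die
```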

## Summary of rules of probability

1. Addition rule
2. Subtraction rule
3. Multiplication rule

## Solving de Mere's problems

1. at least one six in 4 throws (single die)  
2. at least one double-six in 24 throws (pair of dice)  


### one six in 4 throws (single die)

- throws are independent  
- P(A) = P(not 6) = 1 - 1/6 = 5/6  
  - per throw
- P(E) = P(at least one 6 in 4 throws) = 1 - P(not E)
  
Then:  

P(not E) = P(A<sub>1</sub> and A<sub>2</sub> and A<sub>3</sub> and A<sub>4</sub>) = $(\frac{5}{6})^4$ = 0.482
  
P(E) =  1 - P(not E) = 1 - 0.482 = **0.518**
  


### one double six in 24 throws (pair of  dice)

- throws are independent  
- P(B) = P(not double 6) = 1 - 1/36 = 35/36  
  - per throw of pair of dice
- P(F) = P(at least one double 6 in 24 throws) = 1 - P(not F)
  
Then:  

P(not F) = P(B<sub>1</sub> and B<sub>2</sub> and &hellip; and B<sub>24</sub>) = $(\frac{35}{36})^{24}$ = 0.509
  
P(F) =  1 - P(not F) = 1 - 0.509 = **0.491**
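
Both of de Mere's problems follow the same pattern: the subtraction rule combined with the special multiplication rule for independent throws. A minimal sketch, where `at_least_once` is a hypothetical helper name:

```python
from fractions import Fraction

def at_least_once(p_single, n_trials):
    """P(event occurs at least once in n independent trials)."""
    # Subtraction rule: 1 - P(event never occurs)
    # Special multiplication rule: P(never) = (1 - p) ** n
    return 1 - (1 - p_single) ** n_trials

p_six = at_least_once(Fraction(1, 6), 4)            # single die, 4 throws
p_double_six = at_least_once(Fraction(1, 36), 24)   # pair of dice, 24 throws

print(f"at least one six in 4 throws:         {float(p_six):.3f}")
print(f"at least one double-six in 24 throws: {float(p_double_six):.3f}")
```

The first wager is slightly favourable (above 0.5) and the second slightly unfavourable (below 0.5).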

## Extending rules of probability

### Bayes' theorem

Assume:
1. Disease affects 1/10,000 individuals  
2. Test can detect disease  
  1. Positive 99% of the time if person has disease  
  2. Positive 2% of the time if person DOES NOT have the disease  

<span style="color:red">*You just tested positive, what is the probability you have the disease?*</span>

### Bayes example

Let the events be:  
- A: person has disease
- B: person tests positive  

P(A) = 1/10000 = **0.0001**  
P(B|A) = **0.99**  
P(B|not A) = **0.02**  

We want to know:  
**P(A|B)**

### Bayes example

Construct contingency table:  

|  | A | (not A) |  
|--|---|---------|  
| **B** | A and B | (not A) and B |  
| **(not B)** | A and (not B) | (not A) and (not B) |  

|  | A | (not A) |  sum<sub>r</sub> |  
|--|---|---------|---|  
| **B** | P(A and B) | P((not A) and B) | P(B) |  
| **(not B)** | P(A and (not B)) | P((not A) and (not B)) | P(not B) |   
| **sum<sub>r</sub>** | P(A)  | P(not A) | 1 |  


**Note**

The generalized 2x2 contingency table format is:
  
| | T | F |  
|-|---|---|  
| **T** | a | b |  
| **F** | c | d |  

So that we can easily refer to the cells

### Bayes example

We can now compute:  
- P(A and B) = P(B|A)P(A) = (0.99)(0.0001) = 0.000099  
- P((not A) and B) = P(B|(not A))P(not A) = (0.02)(0.9999) = 0.019998


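```python
# Known quantities from the problem statement
A = 0.0001      # P(A): prevalence of the disease
BnotA = 0.02    # P(B | not A): false-positive rate
BgA = 0.99      # P(B | A): true-positive rate

# Joint probabilities via the multiplication rule
AandB = BgA * A
notAandB = (1 - A) * BnotA

print("A and B     = %8.6f" % AandB)
print("not A and B = %8.6f" % notAandB)
print("B           = %8.6f" % (AandB + notAandB))
```

```
A and B     = 0.000099
not A and B = 0.019998
B           = 0.020097
```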


### Bayes example

|  | A | (not A) |  sum<sub>r</sub> |  
|--|---|---------|---|  
| **B** | 0.000099  | 0.019998 | 0.020097 |  
| **(not B)** | P(A and (not B)) | P((not A) and (not B)) | P( not B) |   
| **sum<sub>r</sub>** | 0.0001  | 0.9999 | 1 |  



```python
# The remaining cells follow from the column totals P(A) and P(not A)
print("A and not B     = %8.6f" % (A - AandB))
print("not A and not B = %8.6f" % ((1 - A) - notAandB))
```

```
A and not B     = 0.000001
not A and not B = 0.979902
```


### Bayes example

|  | A | (not A) |  sum<sub>r</sub> |  
|--|---|---------|---|  
| **B** | **0.000099**  | 0.019998 | **0.020097** |  
| **(not B)** | 0.000001 | 0.979902 | 0.979903 |   
| **sum<sub>r</sub>** | 0.0001  | 0.9999 | 1 |  
  
We want $\large\enspace P(A|B) = \frac{P(A\enspace and\enspace B)}{P(B)}$ 


```python
# Calculate P(A|B) = P(A and B) / P(B)
print("A given B = %8.6f" % (AandB / (AandB + notAandB)))
```

```
A given B = 0.004926
```


## Bayes' theorem

$$ P(A|B) = \frac{P(A)P(B|A)}{P(A)P(B|A)+P(not\enspace A)P(B|not\enspace A)}$$  
  
$$ = \frac{P(A\enspace and\enspace B)}{P(A\enspace and\enspace B)+P((not\enspace A)\enspace and\enspace B)}$$
  
$$ = \frac{P(A\enspace and\enspace B)}{P(B)}$$
  
which is the definition of the conditional probability P(A|B).
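
Bayes' theorem can be wrapped in a small function so the disease example can be recomputed for other inputs. This is a sketch; the function and parameter names are hypothetical, and the numbers are the ones assumed in the example (prevalence 1/10,000, 99% true-positive rate, 2% false-positive rate).

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """P(disease | positive test) via Bayes' theorem."""
    # Joint probabilities from the multiplication rule
    true_positive = prior * p_pos_given_disease
    false_positive = (1 - prior) * p_pos_given_healthy
    # Denominator is P(positive), the sum over both ways of testing positive
    return true_positive / (true_positive + false_positive)

p = posterior(prior=0.0001, p_pos_given_disease=0.99, p_pos_given_healthy=0.02)
print("P(A|B) = %8.6f" % p)
```

Even with a positive result the probability of disease is under 0.5%, because false positives among the many healthy people far outnumber true positives among the few who are ill.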