# Probabilities and Information Theory

## Introduction

**Probability Theory** is the study of probability and its interpretations through a rigorous mathematical formalism and a set of axioms. It defines probability in terms of a probability space and a probability measure taking values between $0$ and $1$. Together with Linear Algebra, Probability is one of the foundations of Machine Learning.

## Fundamentals of Probability

In this section, the fundamental concepts of probability theory are discussed: **Random Experiments**, **Sets**, **Independence**, **Conditional Probability**, and **Bayes' Rule**.

### Random Experiments

The **Probability** of an **Event**, defined as the chance of its realization, is encoded as a **Real Number** between $0$ and $1$. Formally, we first define a **Probability Space** composed of the following three components: a **Sample Space** $\Omega$ representing all possible outcomes of an experiment, a set of possible **Events** (subsets of $\Omega$), and a **Probability Function** $P$ measuring the chance of each event occurring.

```{note}
Note that the set of events contains $\Omega$ itself as well as the empty event $\varnothing$.
```

The probability function is defined so that it always satisfies the following requirements:
- $P(\varnothing)=0$
- $P(\Omega)=1$
- $P(A \cup B)=P(A) + P(B)$ for two disjoint events $A$ and $B$
- $P(\overline{A}) = 1 - P(A)$ with $\overline{A}$ being the complement of the event $A$.
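
The requirements above can be checked numerically on a small finite example. The following is a minimal sketch, assuming a uniform probability function on a single fair six-sided die; the helper `prob` and the events `A` and `B` are hypothetical names chosen for illustration only.

```python
from fractions import Fraction

# Assumption: a single fair six-sided die as the sample space.
omega = frozenset(range(1, 7))


def prob(event):
    """Uniform probability function on omega: P(E) = |E| / |omega|."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(omega))


A = frozenset({1, 2})      # "the die shows 1 or 2"
B = frozenset({5})         # "the die shows 5", disjoint from A
A_complement = omega - A   # "the die shows neither 1 nor 2"

assert prob(frozenset()) == 0                   # P(empty set) = 0
assert prob(omega) == 1                         # P(Omega) = 1
assert A.isdisjoint(B)
assert prob(A | B) == prob(A) + prob(B)         # additivity for disjoint events
assert prob(A_complement) == 1 - prob(A)        # complement rule
```

Using `Fraction` rather than floating-point numbers keeps the equalities exact, so the assertions test the identities themselves and not a rounding tolerance.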

Let us illustrate these concepts by comparing three random experiments, using **Monte Carlo Simulation** in conjunction with the **Theory**.

#### Dice Rolling

Consider the following experiment, in which two independent, fair six-sided dice are rolled, and we want to find the probability that the sum of the two dice is odd. The problem can be formalised as follows:
- $\Omega = \left \{ 1, \dots, 6  \right \}^2$
- $A = \left \{ (i, j) \; | \; i + j \; \text{is odd} \right \}$
- $P(B) = \frac{|B|}{|\Omega|}$ for any event $B \subseteq \Omega$, where $|\cdot|$ denotes the number of elements in the given set
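
Because $\Omega$ is small, the probability of $A$ (the sum $i + j$ is odd) can be obtained exactly by counting outcomes. The snippet below is a minimal sketch of that counting argument; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Sample space: all 36 ordered pairs (i, j) of faces.
omega = list(product(faces, repeat=2))

# Event A: the sum of the two dice is odd.
A = [(i, j) for (i, j) in omega if (i + j) % 2 == 1]

# Uniform probability function: P(A) = |A| / |Omega|.
p_A = Fraction(len(A), len(omega))
print(len(omega), len(A), p_A)  # prints: 36 18 1/2
```

The count of 18 agrees with a direct argument: the sum is odd exactly when one die is odd and the other is even, which gives $3 \times 3 + 3 \times 3 = 18$ ordered pairs.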


The event $A$ depends only on the sum $i + j$, which is symmetric in $i$ and $j$ and can be summarized in the following table:

In [8]:
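import numpy as np
import pandas as pd

# Faces of a six-sided die
omega = list(range(1, 6 + 1))

# Sum i + j for every ordered pair of faces (rows: i, columns: j)
pf = np.array([[i + j for j in omega] for i in omega])
df = pd.DataFrame(data=pf, index=omega, columns=omega)

# Display the full 6x6 table
df.head(len(df))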

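| i \ j | 1 | 2 | 3 | 4 | 5 | 6 |
|-------|---|---|---|----|----|----|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |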
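
Half of the 36 entries in this table are odd, so the theoretical probability of an odd sum is $\frac{18}{36} = \frac{1}{2}$. A Monte Carlo simulation can be compared against that value. The following is a minimal sketch, assuming NumPy's `default_rng` generator; the seed and the number of trials are arbitrary choices, not values from the text.

```python
import numpy as np

# Assumption: the seed is arbitrary and only fixes the random stream.
rng = np.random.default_rng(seed=0)
n_trials = 100_000

# Each row is one roll of two fair dice; `high` is exclusive, so faces are 1..6.
rolls = rng.integers(low=1, high=7, size=(n_trials, 2))

# Indicator of the event "the sum of the two dice is odd" for each trial.
is_odd_sum = rolls.sum(axis=1) % 2 == 1

# The relative frequency of the event estimates its probability.
estimate = is_odd_sum.mean()
print(f"Monte Carlo estimate: {estimate:.4f} (theory: 0.5)")
```

The estimate fluctuates around $0.5$ with a standard error of roughly $\sqrt{0.25 / n}$, about $0.0016$ for $100{,}000$ trials, so increasing `n_trials` tightens the agreement with the theory.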


<!-- 
Monte Carlo Simulations W/ Theory
- Rolling Dice
- Password Hacking
- Sampling w/ or wo/ Replacement
-->


### Sets
### Independence
### Conditional Probability
### Bayes' Rule

## Probability Distributions
### Random Variables
### Moment Based Descriptors
### Discrete Distributions
### Continuous Distributions
### Joint Distributions

## Statistical Inference
### Random Sample
### Normal Sampling
### Central Limit Theorem
### Point Estimation
### Confidence Intervals
### Bayesian Statistics

## Hypothesis Testing
### Single Sample Hypothesis
### Two Sample Hypothesis
### Analysis of Variance
### Goodness of Fit

## Information Theory
### Self-Information and Entropy
### Kullback-Leibler Divergence
### Jensen-Shannon Divergence
### Wasserstein Distance

## Applications
### Noise in Computer Graphics
### Probabilistic Dynamic Modeling
### Logistic Regression