# Bayesian Learning

## Probability Basics

**Definition 1**
A random experiment or a random trial is a procedure that, at least theoretically, can be repeated infinitely often. It is characterized as follows:

1. Configuration (A precisely specified system that can be reconstructed)
2. Procedure (An instruction on how to execute the experiment, based on the config)
3. Unpredictability of the outcome

A set $ \Omega = \{\omega_1, \omega_2, \dots, \omega_n\} $ is called the *sample space* of a random experiment if each experiment outcome is associated with at most one element $ \omega \in \Omega $. The elements in $ \Omega $ are called *outcomes*.

Let $ \Omega $ be a finite sample space. Each subset $ A \subseteq \Omega $ is called an event, which occurs iff the experiment outcome $ \omega $ is a member of $ A $. The set of all events, $ \mathcal{P}(\Omega) $, is called the event space or $ \sigma $-algebra.

Ex:
* Experiment: Rolling a die
* Sample Space: $ \Omega = \{1, 2, 3, 4, 5, 6\} $
* Some Event: $ A = \{2, 4, 6\} $

* Experiment: Rolling two dice at the same time
* Sample Space: $ \Omega = \{\{1, 1\}, \{1, 2\}, \dots, \{6, 6\}\} $
* Some Event: $ B = \{\{1, 2\}\} $

* Experiment: Rolling two dice in succession
* Sample Space: $ \Omega = \{(1, 1), (1, 2), \dots, (6, 6)\} $
* Some Event: $ C = \{(1, 2), (2, 1)\} $
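The three examples can be sketched in Python to make the difference between unordered and ordered outcomes concrete. This is only an illustration: representing an unordered pair as a sorted tuple and the helper name `occurs` are assumptions of this sketch, not part of the definitions above.

```python
from itertools import combinations_with_replacement, product

faces = range(1, 7)

# Rolling one die: the outcomes are the six faces.
omega_one = set(faces)
event_a = {w for w in omega_one if w % 2 == 0}  # A = {2, 4, 6}

# Rolling two dice at the same time: order is irrelevant, so each
# outcome is an unordered pair (stored here as a sorted tuple).
omega_simultaneous = set(combinations_with_replacement(faces, 2))
event_b = {(1, 2)}

# Rolling two dice in succession: order matters, so each outcome
# is an ordered pair.
omega_successive = set(product(faces, repeat=2))
event_c = {(1, 2), (2, 1)}


def occurs(event, outcome):
    """An event occurs iff the observed outcome is one of its members."""
    return outcome in event


print(len(omega_one), len(omega_simultaneous), len(omega_successive))
print(occurs(event_c, (2, 1)))
```

The unordered sample space is smaller than the ordered one because pairs such as (1, 2) and (2, 1) collapse into a single outcome.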

### How to Capture the Nature of Probability

1. Classic, symmetry-based
2. Frequentist
3. Axiomatic
4. Subjectivist, Bayesian, prognostic

**Classical/Laplace Probability**

If each elementary event $ \{\omega\}, \omega \in \Omega $ gets assigned the same probability, then the probability $ P(A) $ of an event $ A $ is defined as follows:

$ P(A) = \frac{|A|}{|\Omega|} $
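As a minimal sketch, the Laplace probability is just a ratio of set sizes. The function name `laplace_probability` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def laplace_probability(event, sample_space):
    """Return |A| / |Omega|, assuming all outcomes are equally likely."""
    event = set(event)
    sample_space = set(sample_space)
    if not sample_space:
        raise ValueError("sample space must not be empty")
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


omega = {1, 2, 3, 4, 5, 6}
p_even = laplace_probability({2, 4, 6}, omega)
print(p_even)  # 1/2
```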

**Frequentist**

Basis is the **empirical law of large numbers**:

In a random experiment, the average of the outcomes obtained from a large number of trials is close to the expected value, and it tends to come closer as more trials are performed.
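A small simulation can illustrate this. The sketch below tracks the relative frequency of the event "even number" when rolling a fair die, which is the average of an indicator variable whose expected value is 1/2. The function name, the seed, and the trial counts are arbitrary choices for illustration.

```python
import random


def relative_frequency(event, n_trials, rng):
    """Fraction of simulated fair-die rolls whose outcome lies in `event`."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return hits / n_trials


rng = random.Random(0)  # fixed seed only to make the run repeatable
even = {2, 4, 6}
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(even, n, rng))
```

With few trials the relative frequency can deviate noticeably from 1/2; with many trials it typically settles close to it.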

**Axiomatic**

(a) Postulate a function $ P $ that assigns a probability to every event in $ \mathcal{P}(\Omega) $.

(b) Specify the required properties of $ P $ in the form of axioms.

**Subjectivist, Bayesian, Prognostic**

Consider (prior) knowledge about the hypotheses:

$ p(h \mid D) = \frac{p(D \mid h) \cdot p(h)}{p(D)} $
* Likelihood: how well does $ h $ explain (entail, induce, invoke) the data $ D $?
* Prior: how probable is the hypothesis $ h $ a priori (in principle)?
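The following sketch applies the formula to a made-up example with two hypotheses about a die ("fair" or "loaded") and the observation $ D $ = "a six was rolled". All numbers and names are hypothetical. The sketch also assumes the hypotheses are mutually exclusive and exhaustive, so that $ p(D) $ can be obtained by summing $ p(D \mid h) \cdot p(h) $ over all hypotheses.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return p(h | D) for every hypothesis h.

    priors:      dict mapping h to p(h)
    likelihoods: dict mapping h to p(D | h)
    """
    # p(D), assuming the hypotheses are exclusive and exhaustive.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    if evidence == 0:
        raise ValueError("p(D) is zero; the posterior is undefined")
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


# Hypothetical numbers, chosen only for illustration.
priors = {"fair": Fraction(9, 10), "loaded": Fraction(1, 10)}
likelihoods = {"fair": Fraction(1, 6), "loaded": Fraction(1, 2)}

post = posterior(priors, likelihoods)
print(post["fair"], post["loaded"])  # 3/4 1/4
```

Observing a six raises the probability of "loaded" from 1/10 to 1/4, although "fair" remains the more probable hypothesis because of its strong prior.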

### Axiomatic Approach to Probability

**Probability Measure**
Let $ \Omega $ be a set, called sample space, and let $ \mathcal{P}(\Omega) $ be an event space. A function $ P: \mathcal{P}(\Omega) \rightarrow \mathbb{R} $, which maps each event $ A $ onto a real number $ P(A) $, is called a probability measure if it has the following properties:

1. $ P(A) \geq 0 $ (Axiom I)
2. $ P(\Omega) = 1 $ (Axiom II)
3. $ A \cap B = \emptyset \Rightarrow P(A \cup B) = P(A) + P(B) $ (Axiom III)

**Probability Space**
Let $ \Omega $ be a sample space, $ \mathcal{P}(\Omega) $ be an event space, and $ P: \mathcal{P}(\Omega) \rightarrow \mathbb{R} $ be a probability measure. Then the tuple $ (\Omega, P) $ as well as the triple $ (\Omega, \mathcal{P}(\Omega), P) $ is called a probability space.

The Kolmogorov Axioms also imply:
1. $ P(A) + P(\overline{A}) = 1 $
2. $ P(\emptyset) = 0 $
3. $ A \subseteq B \Rightarrow P(A) \leq P(B) $
4. $ P(A \cup B) = P(A) + P(B) - P(A \cap B) $
5. Let $ A_1, A_2, \dots, A_n $ be mutually exclusive (incompatible) events. Then the following holds:
   - $ P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n) $
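These implications can be checked by brute force on a small example. The sketch below assumes the Laplace measure on a single die and tests the first four properties for every event in the power set. It is a sanity check on one probability space, not a proof.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))


def P(event):
    """Laplace measure on one fair die (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))


def powerset(s):
    """All subsets of s, i.e. the event space of a finite sample space."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


events = powerset(omega)  # 2^6 = 64 events

complement_ok = all(P(a) + P(omega - a) == 1 for a in events)
empty_ok = P(frozenset()) == 0
monotone_ok = all(P(a) <= P(b) for a in events for b in events if a <= b)
union_ok = all(P(a | b) == P(a) + P(b) - P(a & b)
               for a in events for b in events)

print(complement_ok, empty_ok, monotone_ok, union_ok)
```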

### Conditional Probability

Let $ (\Omega, \mathcal{P}(\Omega), P) $ be a probability space and let $ A, B \in \mathcal{P}(\Omega) $ be two events. Then the probability of the occurrence of event $ A $ given that event $ B $ is known to have occurred is defined as follows:

$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} $ if $ P(B) > 0 $

This is called "probability of A under condition B".

![img1](img/topic4img1.png)
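A short sketch of the conditional probability, computed as $ P(A \cap B) / P(B) $ under the Laplace measure on one die. The events "even number" and "at least four" are chosen here only for illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))


def P(event):
    """Laplace measure on one fair die (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))


def conditional(a, b):
    """P(A | B) = P(A and B) / P(B), defined only for P(B) > 0."""
    if P(b) == 0:
        raise ValueError("P(B) must be positive")
    return P(a & b) / P(b)


even = {2, 4, 6}
at_least_four = {4, 5, 6}
print(conditional(even, at_least_four))  # 2/3
```

Knowing that the roll is at least four raises the probability of "even" from 1/2 to 2/3, because two of the three remaining outcomes are even.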

### Total Probability

Let $ (\Omega, \mathcal{P}(\Omega), P) $ be a probability space and let $ A_1, A_2, \dots, A_n $ be mutually exclusive events with $ \Omega = A_1 \cup \dots \cup A_n, P(A_i) > 0, i = 1, \dots, n $. Then for each $ B \in \mathcal{P}(\Omega) $ the following holds:

$ P(B) = \sum\limits_{i = 1}^{n} P(A_i) \cdot P(B \mid A_i) $

![img2](img/topic4img2.png)
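The theorem can be verified on a small example. The sketch below partitions the outcomes of one die into three mutually exclusive events and recombines $ P(B) $ from the conditional probabilities. The partition and the event $ B $ are arbitrary illustrative choices, and the Laplace measure is assumed.

```python
from fractions import Fraction

omega = set(range(1, 7))


def P(event):
    """Laplace measure on one fair die (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))


def conditional(a, b):
    """P(A | B) = P(A and B) / P(B), defined only for P(B) > 0."""
    return P(a & b) / P(b)


# Mutually exclusive events that together cover the sample space.
partition = [{1}, {2, 3}, {4, 5, 6}]
b = {2, 4, 6}

# Sum over i of P(A_i) * P(B | A_i)
total = sum(P(a_i) * conditional(b, a_i) for a_i in partition)
print(total, P(b))  # 1/2 1/2
```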

### Independence of Events

Let $ (\Omega, \mathcal{P}(\Omega), P) $ be a probability space and let $ A, B \in \mathcal{P}(\Omega) $ be two events. Then $ A $ and $ B $ are called statistically independent iff the following holds true:

$ P(A \cap B) = P(A) \cdot P(B) $ (multiplication rule)

For $ 0 < P(B) < 1 $, so that both conditional probabilities are defined:

$ \Rightarrow P(A \mid B) = P(A \mid \overline{B}) $

$ \Leftrightarrow P(A \mid B) = P(A) $

The statistical independence of $ k $ events can also be determined by checking whether the multiplication rule holds true for every subset of at least two of the $ k $ events. Pairwise independence alone is not sufficient.
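A sketch of this subset check, using two dice rolled in succession with the Laplace measure. The function name `mutually_independent` and the three events are illustrative assumptions. The example shows three events that satisfy the multiplication rule for every pair but fail it for the full triple.

```python
from fractions import Fraction
from itertools import combinations, product

omega = set(product(range(1, 7), repeat=2))  # two dice in succession


def P(event):
    """Laplace measure on the 36 ordered pairs (an assumption of this sketch)."""
    return Fraction(len(event), len(omega))


def mutually_independent(events):
    """Check the multiplication rule for every subset of size >= 2."""
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            joint = set(omega)
            expected = Fraction(1)
            for e in subset:
                joint &= e          # intersection of the chosen events
                expected *= P(e)    # product of their probabilities
            if P(joint) != expected:
                return False
    return True


first_even = {w for w in omega if w[0] % 2 == 0}
second_even = {w for w in omega if w[1] % 2 == 0}
sum_even = {w for w in omega if (w[0] + w[1]) % 2 == 0}

three = [first_even, second_even, sum_even]
pairwise = all(mutually_independent([x, y]) for x, y in combinations(three, 2))
all_three = mutually_independent(three)
print(pairwise, all_three)  # True False
```

Each pair of events is independent, but if both dice are even the sum is necessarily even, so the three events together are not.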