# Representation

This chapter formalizes how uncertainty is represented, using degrees of belief, probability distributions, and the ways those distributions interact.

## 2.1 Degrees of Belief and Probability

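We write $A \succ B$ for "A is more plausible than B" and $A \sim B$ for "A and B are equally plausible." Several basic axioms can be stated with this relation, including universal comparability (exactly one of $A \succ B$, $A \prec B$, or $A \sim B$ holds) and transitivity (if $A \succeq B$ and $B \succeq C$, then $A \succeq C$). From these axioms, degrees of belief can be encoded by a real-valued function $P$ with $P(A) > P(B)$ iff $A \succ B$.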

## 2.2 Probability Distributions

### 2.2.1 Discrete Probability Distributions

A discrete distribution is represented by a _probability mass function_ (pmf), which assigns a probability to every possible assignment of its input variable to a value. Each mass lies in $[0, 1]$, and the masses must sum to one.

Notation: $x^3$ is shorthand for the assignment $X = 3$, so $P(x^3)$ is equivalent to $P(X=3)$.

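The parameters of a distribution govern the probabilities it assigns. For example, the distribution over the outcome of a six-sided die roll has 6 parameters $\theta_1, \dots, \theta_6$, but only 5 are independent, because the masses must sum to one: $\theta_6 = 1 - \sum_{i=1}^{5}\theta_i$.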
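A minimal Julia sketch of the die example (Julia is the language used by the code cells later in these notes). The values in `θ_free` describe a hypothetical loaded die and `pmf` is an illustrative name; only the five free parameters are chosen, and the sixth is derived from the sum-to-one constraint.

```julia
# Five free parameters for a hypothetical loaded die
θ_free = [0.1, 0.1, 0.1, 0.1, 0.1]
# The sixth is fixed by the constraint that the masses sum to one
θ = vcat(θ_free, 1 - sum(θ_free))

# pmf of the roll X: P(X = k) = θ[k] for k in 1:6, zero otherwise
pmf(k::Int) = 1 <= k <= 6 ? θ[k] : 0.0

pmf(6)   # ≈ 0.5
```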

### 2.2.2 Continuous Probability Distributions

Instead of a pmf, a continuous distribution is represented by a _probability density function_ (pdf), which is nonnegative and integrates to 1. Another representation is the _cumulative distribution function_ (cdf), defined as

$$
\text{cdf}_X (x) = P(X \leq x) = \int_{-\infty}^{x} p(x')\,dx'
$$

Related is the _quantile function_, or _inverse cumulative distribution function_: $\text{quantile}_X (\alpha)$ is the value $x$ such that $P(X \leq x) = \alpha$.
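To make the cdf and quantile concrete, here is a rough numerical sketch in Julia. It assumes a density that is zero below a known lower bound `lo`, approximates the cdf with the trapezoidal rule, and inverts it by bisection (which works because a cdf is nondecreasing). The names `numeric_cdf` and `numeric_quantile` and the example density $p(x) = 2x$ on $[0, 1]$ are hypothetical choices for illustration; its exact cdf is $x^2$ and its exact quantile is $\sqrt{\alpha}$.

```julia
# cdf by the trapezoidal rule, assuming the density is zero below `lo`
function numeric_cdf(p, x; lo=0.0, n=10_000)
    x <= lo && return 0.0
    h = (x - lo) / n
    s = (p(lo) + p(x)) / 2
    for i in 1:n-1
        s += p(lo + i * h)
    end
    return s * h
end

# quantile by bisection on [lo, hi]: find x with cdf(x) ≈ α
function numeric_quantile(p, α; lo=0.0, hi=1.0, tol=1e-8)
    a, b = lo, hi
    while b - a > tol
        m = (a + b) / 2
        if numeric_cdf(p, m; lo=lo) < α
            a = m
        else
            b = m
        end
    end
    return (a + b) / 2
end

p(x) = 0 <= x <= 1 ? 2x : 0.0   # example density on [0, 1]
numeric_cdf(p, 0.5)             # ≈ 0.25
numeric_quantile(p, 0.25)       # ≈ 0.5
```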


**Some distributions:**

Uniform, $\mathcal{U}(a,b)$. Assigns probability density uniformly between two points $a < b$, giving the pdf $p(x) = 1/(b-a)$ for $x \in [a, b]$ and $p(x) = 0$ otherwise.

Gaussian or normal distribution, $\mathcal{N}(\mu, \sigma^2)$, defined by its mean $\mu$ and variance $\sigma^2$. The pdf at $x$ is $\mathcal{N}(x|\mu,\sigma^2) = \frac{1}{\sigma}\phi \big(\frac{x-\mu}{\sigma} \big)$, where $\phi(x) = \frac{1}{\sqrt{2\pi}} \exp \big(-\frac{x^2}{2} \big)$ is the standard normal density.
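Both densities can be sketched in a few lines of Julia. The function names are hypothetical, and `normal_pdf` takes the standard deviation $\sigma$ (not the variance $\sigma^2$) so that it mirrors the $\frac{1}{\sigma}\phi\big(\frac{x-\mu}{\sigma}\big)$ form above.

```julia
# pdf of U(a, b): constant 1/(b - a) on [a, b], zero elsewhere
uniform_pdf(x, a, b) = a <= x <= b ? 1 / (b - a) : 0.0

# standard normal density φ(z) = exp(-z²/2) / √(2π)
std_normal_pdf(z) = exp(-z^2 / 2) / sqrt(2π)

# N(x | μ, σ²) = φ((x - μ)/σ) / σ, with σ the standard deviation
normal_pdf(x, μ, σ) = std_normal_pdf((x - μ) / σ) / σ

uniform_pdf(0.5, 0.0, 2.0)   # 0.5
normal_pdf(0.0, 0.0, 1.0)    # ≈ 0.3989
```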

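Gaussians are convenient because they have few parameters, but they assign nonzero density to arbitrarily large negative and positive values. To avoid this, you could use a truncated Gaussian, which restricts the support to an interval $[a, b]$ and renormalizes. For $x \in [a, b]$,

$$
\mathcal{N}(x| \mu, \sigma^2, a, b) = \frac{ \frac{1}{\sigma}\phi \big(\frac{x-\mu}{\sigma} \big)}{\Phi \big(\frac{b-\mu}{\sigma}\big) - \Phi \big(\frac{a-\mu}{\sigma}\big)}
$$

and the density is zero outside $[a, b]$. Here $\Phi$ is the standard normal cdf.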
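A sketch of the truncated Gaussian density in Julia. Base Julia has no `erf`, so this assumes a simple stand-in for $\Phi$: Simpson's rule applied to $\phi$ on $[0, |z|]$ (in practice a package such as SpecialFunctions.jl or Distributions.jl would normally supply this). All function names here are hypothetical, and $\sigma$ is the standard deviation.

```julia
# standard normal density φ
std_normal_pdf(z) = exp(-z^2 / 2) / sqrt(2π)

# standard normal cdf Φ(z) = 1/2 + sign(z) ∫₀^|z| φ, via Simpson's rule
function std_normal_cdf(z; n=1_000)   # n must be even
    h = abs(z) / n
    s = std_normal_pdf(0.0) + std_normal_pdf(abs(z))
    for i in 1:n-1
        s += (isodd(i) ? 4 : 2) * std_normal_pdf(i * h)
    end
    return 0.5 + sign(z) * s * h / 3
end

# truncated Gaussian density on [a, b]; zero outside the support
function truncated_normal_pdf(x, μ, σ, a, b)
    (a <= x <= b) || return 0.0
    Z = std_normal_cdf((b - μ) / σ) - std_normal_cdf((a - μ) / σ)
    return (std_normal_pdf((x - μ) / σ) / σ) / Z
end

truncated_normal_pdf(0.0, 0.0, 1.0, -1.0, 1.0)   # φ(0) / (Φ(1) - Φ(-1))
```

Dividing by $Z$ is what makes the truncated density integrate to one over $[a, b]$.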

The _support_ of a distribution is the set of values where its density (or mass) is nonzero; for example, the support of $\mathcal{U}(a,b)$ is $[a,b]$.

A single Gaussian is unimodal. Multimodal phenomena can be represented in different ways; one is a Gaussian mixture model, a weighted average of Gaussians whose mixture weights $\rho_i$ are nonnegative and sum to one: $p(x| \mu_{1:n}, \sigma^2_{1:n}, \rho_{1:n}) = \sum_{i=1}^{n} \rho_i \mathcal{N}(x|\mu_i, \sigma^2_i)$
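The mixture density can be sketched in Julia as below. The names `normal_pdf` and `gmm_pdf` and the example parameters are hypothetical; note that this sketch takes standard deviations `σs` rather than the variances $\sigma^2_i$ used in the formula.

```julia
# univariate Gaussian density with mean μ and standard deviation σ
normal_pdf(x, μ, σ) = exp(-((x - μ) / σ)^2 / 2) / (σ * sqrt(2π))

# Gaussian mixture: weights ρs must be nonnegative and sum to one
function gmm_pdf(x, μs, σs, ρs)
    @assert all(ρs .>= 0) && isapprox(sum(ρs), 1.0)
    return sum(ρ * normal_pdf(x, μ, σ) for (μ, σ, ρ) in zip(μs, σs, ρs))
end

# hypothetical bimodal example with modes near -2 and 3
gmm_pdf(0.0, [-2.0, 3.0], [1.0, 0.5], [0.4, 0.6])
```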

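Another option is to represent the distribution over a continuous variable as a piecewise-uniform density: the domain is divided into bins, each bin is assigned a probability, and the density within a bin is that probability divided by the bin's width.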
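A minimal Julia sketch of a piecewise-uniform density. The bin edges, probabilities, and the function name are hypothetical; bins are treated as half-open intervals $[e_i, e_{i+1})$, and `searchsortedlast` locates the bin containing `x`.

```julia
# Hypothetical bin edges and per-bin probabilities (must sum to one)
edges = [0.0, 1.0, 3.0, 4.0]
probs = [0.2, 0.5, 0.3]

function piecewise_uniform_pdf(x, edges, probs)
    (edges[1] <= x < edges[end]) || return 0.0
    i = searchsortedlast(edges, x)           # index of the bin containing x
    return probs[i] / (edges[i+1] - edges[i])  # bin mass / bin width
end

piecewise_uniform_pdf(2.0, edges, probs)   # 0.5 / 2.0 = 0.25
```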

## 2.3 Joint Distributions

A _joint distribution_ is a probability distribution over multiple variables. A distribution over a single variable is called univariate; one over multiple variables is called multivariate.

For a joint distribution over two discrete variables $X$ and $Y$, $P(x,y)$ is the probability that $X=x$ and $Y=y$. The _marginal_ distribution of a variable or set of variables is found by summing out the other variables (the law of total probability):

$$
P(x) = \sum_y P(x,y)
$$
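Marginalization can be sketched in Julia with a joint table stored as a `Dict` from `(x, y)` tuples to probabilities. The table values and the name `marginal_x` are hypothetical; each entry's mass is added to the running total for its `x` value, which is exactly $\sum_y P(x, y)$.

```julia
# Hypothetical joint table P(x, y) for X ∈ {1, 2}, Y ∈ {1, 2, 3}
P = Dict((1, 1) => 0.10, (1, 2) => 0.20, (1, 3) => 0.10,
         (2, 1) => 0.25, (2, 2) => 0.05, (2, 3) => 0.30)

# P(x) = Σ_y P(x, y)
function marginal_x(P)
    Px = Dict{Int,Float64}()
    for ((x, y), p) in P
        Px[x] = get(Px, x, 0.0) + p
    end
    return Px
end

marginal_x(P)   # P(x=1) ≈ 0.4, P(x=2) ≈ 0.6
```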

### 2.3.1 Discrete Joint Distributions

The joint distribution of discrete variables can be shown as a table enumerating all possible assignments. For $n$ binary variables, specifying the joint distribution requires $2^n-1$ independent parameters. For example, 3 binary variables have 8 joint configurations; given the probabilities $\theta_{1:7}$ of the first seven, the probability of the last is $1 - \sum_{i=1}^{7}\theta_i$.

In some cases, we can assume the variables are independent ($X\perp Y$), in which case $P(x,y) = P(x)P(y)$. For $n$ binary variables, $P(x_{1:n}) = \prod_i P(x_i)$, and only $n$ independent parameters are needed (one per variable). Independence often does not hold in practice, but when it is a reasonable approximation it greatly reduces the number of parameters.
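A small Julia sketch of the parameter savings under independence: three hypothetical marginals $P(x_i = 1)$ (one free parameter each) generate all 8 entries of the joint table through the product rule. The names `q` and `joint` are illustrative.

```julia
using Base.Iterators: product

# Hypothetical marginals for three independent binary variables:
# q[i] = P(xᵢ = 1), so P(xᵢ = 2) = 1 - q[i]
q = [0.3, 0.6, 0.9]

# Joint via the product rule P(x₁, x₂, x₃) = Π P(xᵢ)
joint = Dict(
    vals => prod(v == 1 ? q[i] : 1 - q[i] for (i, v) in enumerate(vals))
    for vals in product(1:2, 1:2, 1:2)
)

length(joint)        # 8 table entries, generated from only 3 parameters
sum(values(joint))   # ≈ 1.0
```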

Joint distributions can be represented by _factors_, $\phi$. A factor over a set of variables is a function from assignments to real numbers.

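```julia
# An algorithmic implementation of discrete factors
# include("util_funcs.jl")  # original helper file (contents not shown in these notes).
# Assumed stand-in for what it provides: `product` and NamedTuple -> Assignment conversion.
using Base.Iterators: product

struct Variable
    name::Symbol
    m::Int # number of possible values
end

const Assignment = Dict{Symbol, Int}
const FactorTable = Dict{Assignment, Float64}

# Allow NamedTuples such as (x=1, y=2) to be used as Assignment keys
Base.Dict{Symbol,V}(a::NamedTuple) where V =
    Dict{Symbol,V}(n=>v for (n,v) in zip(keys(a), values(a)))
Base.convert(::Type{Dict{Symbol,V}}, a::NamedTuple) where V = Dict{Symbol,V}(a)
Base.isequal(a::Dict{Symbol,<:Any}, nt::NamedTuple) =
    length(a) == length(nt) && all(a[n] == v for (n,v) in zip(keys(nt), values(nt)))

struct Factor
    vars::Vector{Variable}
    table::FactorTable
end

# names of the variables in a factor's scope
variablenames(φ::Factor) = [var.name for var in φ.vars]

# restrict an assignment to the given variable names
select(a::Assignment, varnames::Vector{Symbol}) = Assignment(n=>a[n] for n in varnames)

# all joint assignments of `vars`, each value ranging over 1:m
function assignments(vars::AbstractVector{Variable})
    names = [var.name for var in vars]
    return vec([Assignment(n=>v for (n,v) in zip(names,values))
            for values in product((1:v.m for v in vars)...)])
end

# rescale the table entries so they sum to one
function normalize!(φ::Factor)
    z = sum(p for (a,p) in φ.table)
    for (a,p) in φ.table
        φ.table[a] = p/z
    end
    return φ
end;
```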

```julia
# Instantiating a table:
X = Variable(:x, 2)
Y = Variable(:y, 2)
Z = Variable(:z, 2)
φ = Factor([X,Y,Z], FactorTable(
        (x=1, y=1, z=1) => 0.08,
        (x=1, y=1, z=2) => 0.31,
        (x=1, y=2, z=1) => 0.09,
        (x=1, y=2, z=2) => 0.37,
        (x=2, y=1, z=1) => 0.01,
        (x=2, y=1, z=2) => 0.05,
        (x=2, y=2, z=1) => 0.02,
        (x=2, y=2, z=2) => 0.07,
    )
)
```


Another representation is a _decision tree_, which can be more compact than a table when many assignments share the same probability; it is most effective when there are many variables and many repeated values.
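A sketch of a decision-tree factor in Julia. The table above has no repeated values, so this uses a hypothetical distribution over three binary variables where they do repeat: whenever $x = 1$ every assignment has probability 0.15, and whenever $x = 2, y = 1$ it is 0.10. The tree stores 4 leaves instead of 8 table entries. The types `Node`, `Leaf`, `Split` and the function `evaluate` are illustrative names, not from the source.

```julia
abstract type Node end

struct Leaf <: Node
    p::Float64                 # probability shared by every assignment reaching this leaf
end

struct Split <: Node
    var::Symbol                # variable tested at this node
    children::Vector{Node}     # children[v] is followed when var == v
end

# walk the tree using the assignment `a` (a Dict{Symbol,Int})
evaluate(n::Leaf, a) = n.p
evaluate(n::Split, a) = evaluate(n.children[a[n.var]], a)

tree = Split(:x, [
    Leaf(0.15),
    Split(:y, [Leaf(0.10), Split(:z, [Leaf(0.05), Leaf(0.15)])]),
])

evaluate(tree, Dict(:x => 2, :y => 2, :z => 1))   # 0.05
```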

### 2.3.2 Continuous Joint Distributions

A similar approach can be taken for continuous variables.

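A simple distribution is the _multivariate uniform distribution_ $\mathcal{U}(\mathbf{a}, \mathbf{b})$, a uniform distribution over the box with lower corner $\mathbf{a}$ and upper corner $\mathbf{b}$; inside the box its density is $1/\prod_i (b_i - a_i)$.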

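A mixture model can be created from a weighted collection of multivariate uniform distributions. For $n$ variables and $k$ mixture components, this needs $k(2n+1) - 1$ independent parameters: each component has $2n$ bounds and one weight, minus one because the weights must sum to one.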
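A Julia sketch of a mixture of multivariate uniform distributions, with the parameter count as a helper. The function names and the two example boxes and weights are hypothetical.

```julia
# density of U(a, b) over the box a .<= x .<= b
function mv_uniform_pdf(x, a, b)
    all((a .<= x) .& (x .<= b)) || return 0.0
    return 1 / prod(b .- a)
end

# mixture: component i has box (lows[i], highs[i]) and weight ws[i]
mixture_pdf(x, lows, highs, ws) =
    sum(w * mv_uniform_pdf(x, a, b) for (a, b, w) in zip(lows, highs, ws))

# 2n bounds + 1 weight per component, minus 1 for the sum-to-one constraint
nparams(n, k) = k * (2n + 1) - 1

lows  = [[0.0, 0.0], [1.0, 1.0]]
highs = [[2.0, 2.0], [3.0, 2.0]]
ws    = [0.5, 0.5]
mixture_pdf([1.5, 1.5], lows, highs, ws)   # 0.5 * 1/4 + 0.5 * 1/2 = 0.375
nparams(2, 2)                              # 9
```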

It is common to represent piecewise-constant density functions by discretizing each variable independently, using a set of bin edges for each variable. With $n$ variables and $m$ bins each, the grid has $m^n$ cells with one probability per cell, so $m^n - 1$ independent parameters are needed to define the distribution.
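A two-variable Julia sketch of the gridded density: each variable has its own (hypothetical) bin edges, `Pgrid` holds one probability per cell ($m^n = 4$ cells here, so 3 free parameters), and the density in a cell is its probability divided by the cell's area. The function name `grid_pdf` is illustrative.

```julia
# Hypothetical bin edges for two variables (m = 2 bins each)
edges1 = [0.0, 1.0, 2.0]
edges2 = [0.0, 0.5, 2.0]
# One probability per grid cell; entries sum to one
Pgrid = [0.1 0.2;
         0.3 0.4]

function grid_pdf(x1, x2, edges1, edges2, P)
    inside = edges1[1] <= x1 < edges1[end] && edges2[1] <= x2 < edges2[end]
    inside || return 0.0
    i = searchsortedlast(edges1, x1)   # bin index along variable 1
    j = searchsortedlast(edges2, x2)   # bin index along variable 2
    # cell mass divided by cell area
    return P[i, j] / ((edges1[i+1] - edges1[i]) * (edges2[j+1] - edges2[j]))
end

grid_pdf(1.5, 1.0, edges1, edges2, Pgrid)   # 0.4 / (1.0 * 1.5) ≈ 0.2667
```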

Another useful distribution is the multivariate Gaussian distribution:

$$
\mathcal{N}(\mathbf{x} | \mathbf{\mu}, \mathbf{\Sigma}) = \frac{1}{(2\pi)^{n/2} |\mathbf{\Sigma}|^{1/2}} \exp \Big( -\frac{1}{2} (\mathbf{x}-\mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu}) \Big)
$$

Here $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{\mu}$ is the _mean vector_, and $\mathbf{\Sigma}$ is the _covariance matrix_. The distribution has $n + (n+1)n/2$ independent parameters: the $n$ components of $\mathbf{\mu}$ plus the components in the upper triangle (including the diagonal) of the symmetric covariance matrix. You can also define multivariate Gaussian mixture models.
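A Julia sketch of the multivariate Gaussian density and its parameter count, using the `LinearAlgebra` standard library. The function names and the example $\boldsymbol{\mu}$, $\mathbf{\Sigma}$ are hypothetical. The sketch assumes $\mathbf{\Sigma}$ is symmetric positive definite and computes $\mathbf{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$ by solving a linear system (`Σ \ d`) rather than forming the inverse explicitly.

```julia
using LinearAlgebra

# N(x | μ, Σ) for x ∈ ℝⁿ
function mvnormal_pdf(x, μ, Σ)
    n = length(μ)
    d = x - μ
    return exp(-dot(d, Σ \ d) / 2) / sqrt((2π)^n * det(Σ))
end

# n means + n(n+1)/2 entries in the upper triangle of Σ
mvnormal_nparams(n) = n + n * (n + 1) ÷ 2

μ = [0.0, 0.0]
Σ = [2.0 0.5; 0.5 1.0]
mvnormal_pdf([0.0, 0.0], μ, Σ)   # 1 / (2π * sqrt(1.75)) ≈ 0.1203
mvnormal_nparams(2)              # 5
```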

## 2.4 Conditional Distributions
