> The _support_ of a distribution is the set of values that are assigned non-zero density

> From a joint distribution, we can compute a __marginal distribution__ of a variable or a set of variables by summing out all other variables:

$$P(x) = \sum_y P(x,y)$$
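As a minimal sketch of the sum above, the snippet below marginalizes a small joint table over two binary variables. The table values and the name `marginal_x` are made up for illustration and are not part of the notebook.

```julia
# Hypothetical joint table P(x, y), keyed by the value pair (x, y).
joint = Dict((1, 1) => 0.3, (1, 2) => 0.1, (2, 1) => 0.2, (2, 2) => 0.4)

# P(x) = Σ_y P(x, y): add up the mass of every entry that shares the same x.
function marginal_x(joint)
    px = Dict{Int, Float64}()
    for ((x, y), p) in joint
        px[x] = get(px, x, 0.0) + p
    end
    px
end

px = marginal_x(joint)
```

Here `px[1]` collects `0.3 + 0.1` and `px[2]` collects `0.2 + 0.4`.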

> We can represent joint distributions in terms of factors. A factor `φ` over a set of
`n` variables is a function from assignments of those variables to the real numbers.

### Definition of variables, assignments and factors

> A variable is given a name and takes values from `1` to `m`

In [1]:
struct Variable
    name::Symbol
    m::Int # number of possible values
end

In [2]:
Variable(:x, 5)

Variable(:x, 5)

In [3]:
const Assignment = Dict{Symbol, Int}
const FactorTable = Dict{Assignment, Float64}

struct Factor
    vars::Vector{Variable}
    table::FactorTable
end

We want assignments to be easy to write. Ideally, a named tuple could be passed straight to the `Assignment` constructor, but this fails for now:

In [4]:
Assignment((x=1, y=2))

LoadError: BoundsError: attempt to access Int64 at index [2]

### Some utility functions for conversion

In [5]:
Base.Dict{Symbol, V}(a::NamedTuple) where V = 
    Dict{Symbol, V}( k=>v for (k,v) in zip(keys(a), values(a)) )

In [6]:
Assignment((x=1, y=2))

Dict{Symbol, Int64} with 2 entries:
  :y => 2
  :x => 1

In [7]:
Dict{Symbol, String}((x="hello", y="world"))

Dict{Symbol, String} with 2 entries:
  :y => "world"
  :x => "hello"

In [8]:
Base.convert(::Type{Dict{Symbol, V}}, a::NamedTuple) where V = 
    Dict{Symbol, V}(a)

In [9]:
convert(Assignment, (x=1,))

Dict{Symbol, Int64} with 1 entry:
  :x => 1

In [10]:
Base.isequal(a::Dict{Symbol, V}, b::NamedTuple) where V = 
    length(a) == length(b) &&
    all(a[k] == v for (k,v) in zip(keys(b), values(b)))

In [11]:
a = Assignment((a=1,b=2,c=3))

isequal(a, (a=1,c=3,b=2)), isequal(a, (a=1,c=2,b=2)), isequal(a, (a=1,c=3,b=2,d=4))

(true, false, false)

### Now let's define a `Factor`

In [12]:
X = Variable(:x, 2)
Y = Variable(:y, 2)
Z = Variable(:z, 2)

table = FactorTable(
    (x=1, y=1, z=1) => 0.08,
    (x=1, y=1, z=2) => 0.31,
    (x=1, y=2, z=1) => 0.09,
    (x=1, y=2, z=2) => 0.37,
    (x=2, y=1, z=1) => 0.01,
    (x=2, y=1, z=2) => 0.05,
    (x=2, y=2, z=1) => 0.02,
    (x=2, y=2, z=2) => 0.07
)

ϕ = Factor([X,Y,Z], table)

Factor(Variable[Variable(:x, 2), Variable(:y, 2), Variable(:z, 2)], Dict(Dict(:y => 1, :z => 1, :x => 1) => 0.08, Dict(:y => 1, :z => 1, :x => 2) => 0.01, Dict(:y => 1, :z => 2, :x => 1) => 0.31, Dict(:y => 2, :z => 2, :x => 1) => 0.37, Dict(:y => 2, :z => 1, :x => 2) => 0.02, Dict(:y => 1, :z => 2, :x => 2) => 0.05, Dict(:y => 2, :z => 2, :x => 2) => 0.07, Dict(:y => 2, :z => 1, :x => 1) => 0.09))

#### Get variable names from a given factor

In [13]:
variablenames(ϕ::Factor) = [var.name for var in ϕ.vars]

variablenames(ϕ)

3-element Vector{Symbol}:
 :x
 :y
 :z

#### Filter variables from an assignment

In [14]:
select(a::Assignment, varnames::Vector{Symbol}) = 
    Assignment( n => a[n] for n in varnames )

select (generic function with 1 method)
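A short usage sketch for `select`: it keeps only the requested variables of an assignment. The alias and the function are repeated so the snippet runs on its own, and the assignment `full` is a made-up example.

```julia
# Same alias and helper as above, repeated so this snippet is self-contained.
const Assignment = Dict{Symbol, Int}
select(a::Assignment, varnames::Vector{Symbol}) =
    Assignment(n => a[n] for n in varnames)

# Hypothetical full assignment over three variables.
full = Assignment(:x => 1, :y => 2, :z => 1)

# Keep only :x and :z; :y is dropped.
sub = select(full, [:x, :z])
```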

### We also import `product` from `Base.Iterators` to enumerate all possible assignments given a list of variables

In [15]:
import Base.Iterators: product

In [16]:
collect(product(1:3, 1:2))

3×2 Matrix{Tuple{Int64, Int64}}:
 (1, 1)  (1, 2)
 (2, 1)  (2, 2)
 (3, 1)  (3, 2)

In [18]:
function assignments(vars::AbstractVector{Variable})
    names = [var.name for var in vars]
    matrix_of_assignments = 
        [Assignment(name => value for (name,value) in zip(names, p)) 
            for p in product((1:var.m for var in vars)...)]
    vec(matrix_of_assignments)
end

assignments (generic function with 1 method)

In [19]:
assignments([Variable(:x,3), Variable(:y,2), Variable(:z,4)])

24-element Vector{Dict{Symbol, Int64}}:
 Dict(:y => 1, :z => 1, :x => 1)
 Dict(:y => 1, :z => 1, :x => 2)
 Dict(:y => 1, :z => 1, :x => 3)
 Dict(:y => 2, :z => 1, :x => 1)
 Dict(:y => 2, :z => 1, :x => 2)
 Dict(:y => 2, :z => 1, :x => 3)
 Dict(:y => 1, :z => 2, :x => 1)
 Dict(:y => 1, :z => 2, :x => 2)
 Dict(:y => 1, :z => 2, :x => 3)
 Dict(:y => 2, :z => 2, :x => 1)
 Dict(:y => 2, :z => 2, :x => 2)
 Dict(:y => 2, :z => 2, :x => 3)
 Dict(:y => 1, :z => 3, :x => 1)
 Dict(:y => 1, :z => 3, :x => 2)
 Dict(:y => 1, :z => 3, :x => 3)
 Dict(:y => 2, :z => 3, :x => 1)
 Dict(:y => 2, :z => 3, :x => 2)
 Dict(:y => 2, :z => 3, :x => 3)
 Dict(:y => 1, :z => 4, :x => 1)
 Dict(:y => 1, :z => 4, :x => 2)
 Dict(:y => 1, :z => 4, :x => 3)
 Dict(:y => 2, :z => 4, :x => 1)
 Dict(:y => 2, :z => 4, :x => 2)
 Dict(:y => 2, :z => 4, :x => 3)

#### Normalize 

In [20]:
function normalize!(ϕ::Factor)
    z = sum(p for (a,p) in ϕ.table)
    for (a,p) in ϕ.table
        ϕ.table[a] = p/z
    end
    ϕ
end

normalize! (generic function with 1 method)

In [21]:
normalize!(ϕ)

Factor(Variable[Variable(:x, 2), Variable(:y, 2), Variable(:z, 2)], Dict(Dict(:y => 1, :z => 1, :x => 1) => 0.07999999999999999, Dict(:y => 1, :z => 1, :x => 2) => 0.009999999999999998, Dict(:y => 1, :z => 2, :x => 1) => 0.30999999999999994, Dict(:y => 2, :z => 2, :x => 1) => 0.36999999999999994, Dict(:y => 2, :z => 1, :x => 2) => 0.019999999999999997, Dict(:y => 1, :z => 2, :x => 2) => 0.04999999999999999, Dict(:y => 2, :z => 2, :x => 2) => 0.06999999999999999, Dict(:y => 2, :z => 1, :x => 1) => 0.08999999999999998))

In [22]:
sum(p for (a,p) in ϕ.table)

0.9999999999999998
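The sum above is not exactly `1.0` because of floating-point rounding. A minimal sketch of how one might check normalization with a tolerance instead of `==` follows; the weights are the table values from the factor above, and the variable names are illustrative.

```julia
# Table values of the factor above, in arbitrary order.
weights = [0.08, 0.31, 0.09, 0.37, 0.01, 0.05, 0.02, 0.07]

# Divide every entry by the total, as normalize! does.
probs = weights ./ sum(weights)

# Compare with a tolerance: rounding can leave the sum slightly off 1.
is_normalized = isapprox(sum(probs), 1.0)   # same as sum(probs) ≈ 1.0
```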

#### _Discrete conditional models_

> A conditional probability distribution over discrete variables can be represented using a table.
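A table that represents $P(x \mid y)$ must sum to one over $x$ for every fixed value of the parent $y$. The sketch below checks this property on a made-up table; the values and the name `sums_by_parent` are assumptions for illustration.

```julia
# Hypothetical conditional table P(x | y) for binary x and y, keyed by (x, y).
cpt = Dict((1, 1) => 0.7, (2, 1) => 0.3,
           (1, 2) => 0.2, (2, 2) => 0.8)

# For every parent value y, accumulate the probabilities over x.
function sums_by_parent(cpt)
    totals = Dict{Int, Float64}()
    for ((x, y), p) in cpt
        totals[y] = get(totals, y, 0.0) + p
    end
    totals
end

totals = sums_by_parent(cpt)

# Each parent value should own a full unit of probability mass.
is_valid = all(isapprox(t, 1.0) for t in values(totals))
```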


#### _Conditional Gaussian models_

> A conditional Gaussian model can be used to represent a distribution over a continuous variable given one or more discrete variables.
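One way to picture this is a lookup table that stores a separate Gaussian for each value of the discrete parent. The sketch below assumes a single discrete parent `z`; the parameter values and the names `normal_pdf`, `gaussian_params` and `conditional_gaussian_pdf` are hypothetical.

```julia
# Univariate Gaussian density; σ is the standard deviation.
normal_pdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# Hypothetical parameters: one (μ, σ) pair per value of the discrete parent z.
gaussian_params = Dict(1 => (μ = 0.0, σ = 1.0),
                       2 => (μ = 5.0, σ = 2.0))

# p(x | z): look up the Gaussian that belongs to z and evaluate it at x.
function conditional_gaussian_pdf(x, z, params)
    p = params[z]
    normal_pdf(x, p.μ, p.σ)
end

# Density at the mean of the z = 1 Gaussian.
peak = conditional_gaussian_pdf(0.0, 1, gaussian_params)
```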

#### _Linear Gaussian models_

> $P(X|Y)$ is a distribution over $X$ whose mean is a linear function of the continuous variable $Y$. The conditional density function is:

$$p(x \mid y) = \mathcal{N}(x \mid my+b, \sigma^2)$$

with parameters $\theta = [m, b, \sigma]$, where $\sigma^2$ is the variance.

#### _Conditional Linear Gaussian models_

> A conditional linear Gaussian model combines conditional Gaussian and linear Gaussian models: $P(X|Y,Z)$ where $Y$ is continuous and $Z$ is discrete.
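A minimal sketch of such a model, assuming one discrete parent `z` and one continuous parent `y`: each value of `z` selects its own slope, intercept and standard deviation. With a single value of `z` this reduces to the linear Gaussian model. All parameter values and function names here are hypothetical.

```julia
# Univariate Gaussian density; σ is the standard deviation.
normal_pdf(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# Hypothetical parameters (m, b, σ), one triple per value of the discrete z.
clg_params = Dict(1 => (m = 2.0,  b = 0.0, σ = 1.0),
                  2 => (m = -1.0, b = 3.0, σ = 0.5))

# p(x | y, z): the mean is linear in y, with coefficients chosen by z.
function clg_pdf(x, y, z, params)
    p = params[z]
    normal_pdf(x, p.m * y + p.b, p.σ)
end

# With y = 1.0 both components have mean 2.0, but different spreads.
wide   = clg_pdf(2.0, 1.0, 1, clg_params)
narrow = clg_pdf(2.0, 1.0, 2, clg_params)
```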

#### _Sigmoid models_

> One way to represent a soft threshold is to use a logit model, which produces a sigmoid curve:

$$P(x^1 \mid y) = \frac{1}{1 + \exp\left(-2\,\frac{y-\theta_1}{\theta_2}\right)}$$
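The formula translates directly into a one-line function. In this sketch the name `sigmoid_model` is hypothetical; $\theta_1$ sets the location of the threshold and $\theta_2$ controls how soft the transition is.

```julia
# P(x¹ | y) for the logit model: equals 0.5 at y = θ1 and rises with y when θ2 > 0.
sigmoid_model(y, θ1, θ2) = 1 / (1 + exp(-2 * (y - θ1) / θ2))

# Hypothetical threshold at 1.0 with softness 0.5.
at_threshold = sigmoid_model(1.0, 1.0, 0.5)
far_above    = sigmoid_model(10.0, 1.0, 0.5)
far_below    = sigmoid_model(-10.0, 1.0, 0.5)
```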

### Bayesian networks

> The structure of a Bayesian network is defined by a directed acyclic graph consisting of nodes and directed edges.

> Associated with each node $X_i$ is a conditional distribution $P(X_i | Pa(X_i))$, where $Pa(X_i)$ represents the parents of $X_i$ in the graph.

In [23]:
using Pkg;
Pkg.add("LightGraphs")

[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General`
[32m[1m    Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.6/Project.toml`
[32m[1m  No Changes[22m[39m to `~/.julia/environments/v1.6/Manifest.toml`


In [24]:
using LightGraphs

In [25]:
?SimpleDiGraph

search: SimpleDiGraph SimpleDiGraphFromIterator



```
SimpleDiGraph{T}
```

A type representing a directed graph.

---

```
SimpleDiGraph{T}(n=0)
```

Construct a `SimpleDiGraph{T}` with `n` vertices and 0 edges. If not specified, the element type `T` is the type of `n`.

## Examples

```jldoctest
julia> SimpleDiGraph(UInt8(10))
{10, 0} directed simple UInt8 graph
```

---

```
SimpleDiGraph(::Type{T})
```

Construct an empty `SimpleDiGraph{T}` with 0 vertices and 0 edges.

## Examples

```jldoctest
julia> SimpleDiGraph(UInt8)
{0, 0} directed simple UInt8 graph
```

---

```
SimpleDiGraph{T}(adjm::AbstractMatrix)
```

Construct a `SimpleDiGraph{T}` from the adjacency matrix `adjm`. If `adjm[i][j] != 0`, an edge `(i, j)` is inserted. `adjm` must be a square matrix. The element type `T` can be omitted.

## Examples

```jldoctest
julia> A1 = [false true; false false]
julia> SimpleDiGraph(A1)
{2, 1} directed simple Int64 graph

julia> A2 = [2 7; 5 0]
julia> SimpleDiGraph{Int16}(A2)
{2, 3} directed simple Int16 graph
```

---

```
SimpleDiGraph{T}(g::SimpleDiGraph)
```

Construct a copy of g. If the element type `T` is specified, the vertices of `g` are converted to this type. Otherwise the element type is the same as for `g`.

## Examples

```jldoctest
julia> g = complete_digraph(5)
julia> SimpleDiGraph{UInt8}(g)
{5, 20} directed simple UInt8 graph
```

---

```
SimpleDiGraph(g::AbstractSimpleGraph)
```

Construct a directed `SimpleDiGraph` from a graph `g`. The element type is the same as for `g`.

## Examples

```jldoctest
julia> g = path_graph(Int8(5))
julia> SimpleDiGraph(g)
{5, 8} directed simple Int8 graph
```

---

```
SimpleDiGraph(edge_list::Vector)
```

Construct a `SimpleDiGraph` from a vector of edges. The element type is taken from the edges in `edge_list`. The number of vertices is the highest that is used in an edge in `edge_list`.

### Implementation Notes

This constructor works the fastest when `edge_list` is sorted by the lexical ordering and does not contain any duplicates.

### See also

[`SimpleDiGraphFromIterator`](@ref)

## Examples

```jldoctest

julia> el = Edge.([ (1, 3), (1, 5), (3, 1) ])
julia> SimpleDiGraph(el)
{5, 3} directed simple Int64 graph
```

---

```
SimpleDiGraph{T}(nv, ne; seed=-1)
```

Construct a random `SimpleDiGraph{T}` with `nv` vertices and `ne` edges. The graph is sampled uniformly from all such graphs. If `seed >= 0`, a random generator is seeded with this value. If not specified, the element type `T` is the type of `nv`.

### See also

[`erdos_renyi`](@ref)

## Examples

```jldoctest
julia> SimpleDiGraph(5, 7)
{5, 7} directed simple Int64 graph
```


In [26]:
struct BayesianNetwork
    vars::Vector{Variable}
    factors::Vector{Factor}
    graph::SimpleDiGraph{Int64}
end

### Application of Bayesian networks to a satellite-monitoring problem

We are given five variables:
1. `B` battery failure
2. `S` solar panel failure
3. `E` electrical system failure 
4. `D` trajectory deviation
5. `C` communication loss

> Associated with each of the five variables are five conditional probability distributions.

![](assets/ex_2.5.png)

> Because `B` and `S` do not have any parents, we only need to specify `P(B)` and `P(S)`. 

The code below creates a Bayesian network structure with example values for the elements of the associated factor tables.

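Values are 1-indexed, so value `1` corresponds to superscript $0$ and value `2` to superscript $1$. For example, `(e=2,b=1,s=1)` corresponds to $(e^1, b^0, s^0)$.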

In [27]:
B = Variable(:b, 2)
S = Variable(:s, 2)
E = Variable(:e, 2)
D = Variable(:d, 2)
C = Variable(:c, 2)

vars = [B,S,E,D,C]

factorB = Factor([B], FactorTable((b=1,)=>0.99, (b=2,)=>0.01))
factorS = Factor([S], FactorTable((s=1,)=>0.98, (s=2,)=>0.02))
factorE = Factor([B,S,E], FactorTable(
        (e=1,b=1,s=1) => 0.90,
        (e=1,b=1,s=2) => 0.04,
        (e=1,b=2,s=1) => 0.05,
        (e=1,b=2,s=2) => 0.01,
        (e=2,b=1,s=1) => 0.10,
        (e=2,b=1,s=2) => 0.96,
        (e=2,b=2,s=1) => 0.95,
        (e=2,b=2,s=2) => 0.99,
        ))
factorD = Factor([D,E], FactorTable(
        (d=1,e=1) => 0.96,
        (d=1,e=2) => 0.03,
        (d=2,e=1) => 0.04,
        (d=2,e=2) => 0.97,
        ))
factorC = Factor([C,E], FactorTable(
        (c=1,e=1) => 0.98,
        (c=1,e=2) => 0.01,
        (c=2,e=1) => 0.02,
        (c=2,e=2) => 0.99,
        ))

Factor(Variable[Variable(:c, 2), Variable(:e, 2)], Dict(Dict(:e => 2, :c => 1) => 0.01, Dict(:e => 2, :c => 2) => 0.99, Dict(:e => 1, :c => 2) => 0.02, Dict(:e => 1, :c => 1) => 0.98))

In [28]:
graph = SimpleDiGraph(5)

{5, 0} directed simple Int64 graph

In [29]:
add_edge!(graph, 1, 3)
add_edge!(graph, 2, 3)
add_edge!(graph, 3, 4)
add_edge!(graph, 3, 5)

bn = BayesianNetwork(vars, [factorB, factorS, factorE, factorD, factorC], graph)

BayesianNetwork(Variable[Variable(:b, 2), Variable(:s, 2), Variable(:e, 2), Variable(:d, 2), Variable(:c, 2)], Factor[Factor(Variable[Variable(:b, 2)], Dict(Dict(:b => 2) => 0.01, Dict(:b => 1) => 0.99)), Factor(Variable[Variable(:s, 2)], Dict(Dict(:s => 1) => 0.98, Dict(:s => 2) => 0.02)), Factor(Variable[Variable(:b, 2), Variable(:s, 2), Variable(:e, 2)], Dict(Dict(:b => 1, :s => 1, :e => 2) => 0.1, Dict(:b => 2, :s => 2, :e => 2) => 0.99, Dict(:b => 1, :s => 2, :e => 1) => 0.04, Dict(:b => 1, :s => 1, :e => 1) => 0.9, Dict(:b => 2, :s => 1, :e => 2) => 0.95, Dict(:b => 2, :s => 2, :e => 1) => 0.01, Dict(:b => 2, :s => 1, :e => 1) => 0.05, Dict(:b => 1, :s => 2, :e => 2) => 0.96)), Factor(Variable[Variable(:d, 2), Variable(:e, 2)], Dict(Dict(:d => 2, :e => 1) => 0.04, Dict(:d => 2, :e => 2) => 0.97, Dict(:d => 1, :e => 1) => 0.96, Dict(:d => 1, :e => 2) => 0.03)), Factor(Variable[Variable(:c, 2), Variable(:e, 2)], Dict(Dict(:e => 2, :c => 1) => 0.01, Dict(:e => 2, :c => 2) => 0.99, D
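A quick sanity check on the structure is to list each node's parents and confirm that the graph has no cycles. The sketch below rebuilds the same five-vertex graph so that it runs on its own, and assumes LightGraphs is installed. `node_names`, `parents` and `acyclic` are illustrative names.

```julia
using LightGraphs

# Vertex i of the graph corresponds to node_names[i].
node_names = [:b, :s, :e, :d, :c]

# Same edges as above: B → E, S → E, E → D, E → C.
g = SimpleDiGraph(5)
for (src, dst) in [(1, 3), (2, 3), (3, 4), (3, 5)]
    add_edge!(g, src, dst)
end

# The parents of a vertex are its in-neighbors in the directed graph.
parents = Dict(node_names[v] => node_names[inneighbors(g, v)] for v in vertices(g))

# A Bayesian network structure must be a directed acyclic graph.
acyclic = !is_cyclic(g)
```

With these edges, `parents[:e]` holds `:b` and `:s`, while `:b` and `:s` have no parents.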

> The chain rule for Bayesian networks specifies how to construct a joint distribution from the local conditional probability distributions.

$$P(x_{1:n}) = \prod_{i=1}^n P(x_i \mid \mathrm{pa}(x_i))$$

where $\mathrm{pa}(x_i)$ denotes the values assigned to the parents $Pa(X_i)$.

In [30]:
function probability(bn::BayesianNetwork, assignment::Assignment)
    subassignment(ϕ) = select(assignment, variablenames(ϕ))
    probability(ϕ) = get(ϕ.table, subassignment(ϕ), 0.0)
    prod(probability(ϕ) for ϕ in bn.factors)
end

probability (generic function with 1 method)

In [31]:
assignment = Assignment((b=1,s=1,e=1,d=2,c=1))
probability(bn, assignment)

0.034228655999999996
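Written out with the factor values defined above, the assignment `(b=1,s=1,e=1,d=2,c=1)` corresponds to $(b^0, s^0, e^0, d^1, c^0)$, and the chain rule gives:

$$
\begin{aligned}
P(b^0, s^0, e^0, d^1, c^0) &= P(b^0)\,P(s^0)\,P(e^0 \mid b^0, s^0)\,P(d^1 \mid e^0)\,P(c^0 \mid e^0) \\
&= 0.99 \times 0.98 \times 0.90 \times 0.04 \times 0.98 \\
&\approx 0.03423
\end{aligned}
$$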

An `assignment` must assign a value to every variable in the network, because `probability` computes the joint probability of __ALL__ the variables. A partial assignment raises a `KeyError`:

In [32]:
assignment = Assignment((b=1,s=1,e=1))
probability(bn, assignment)

LoadError: KeyError: key :d not found
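One way to handle a partial assignment is to sum the joint probability over every completion of the missing variables (inference by enumeration). The sketch below is not part of the notebook's API. To stay self-contained it uses a compact stand-in for the network, with the same example values keyed by value tuples. The names `satellite_factors`, `joint_probability` and `marginal_probability` are hypothetical.

```julia
import Base.Iterators: product

# Each factor is (names, table); table keys are value tuples in the order of names.
# The numbers are the example values from the satellite network above.
satellite_factors = [
    ([:b], Dict((1,) => 0.99, (2,) => 0.01)),
    ([:s], Dict((1,) => 0.98, (2,) => 0.02)),
    ([:e, :b, :s], Dict(
        (1, 1, 1) => 0.90, (1, 1, 2) => 0.04, (1, 2, 1) => 0.05, (1, 2, 2) => 0.01,
        (2, 1, 1) => 0.10, (2, 1, 2) => 0.96, (2, 2, 1) => 0.95, (2, 2, 2) => 0.99)),
    ([:d, :e], Dict((1, 1) => 0.96, (1, 2) => 0.03, (2, 1) => 0.04, (2, 2) => 0.97)),
    ([:c, :e], Dict((1, 1) => 0.98, (1, 2) => 0.01, (2, 1) => 0.02, (2, 2) => 0.99)),
]

# Number of values per variable.
sizes = [:b => 2, :s => 2, :e => 2, :d => 2, :c => 2]

# Chain rule: multiply the matching entry of every factor (needs a full assignment).
joint_probability(factors, a) =
    prod(table[Tuple(a[n] for n in vs)] for (vs, table) in factors)

# Sum the joint over all values of the variables missing from `partial`.
function marginal_probability(factors, partial, sizes)
    free   = [n for (n, _) in sizes if !haskey(partial, n)]
    ranges = [1:m for (n, m) in sizes if !haskey(partial, n)]
    total = 0.0
    for vals in product(ranges...)
        full = copy(partial)
        for (n, v) in zip(free, vals)
            full[n] = v
        end
        total += joint_probability(factors, full)
    end
    total
end

p_bse = marginal_probability(satellite_factors, Dict(:b => 1, :s => 1, :e => 1), sizes)
```

Because the tables for `D` and `C` each sum to one for a fixed `e`, `p_bse` comes out as approximately `0.99 * 0.98 * 0.90`. Enumeration grows exponentially with the number of unassigned variables, so this is only practical for small networks.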

### Conditional independence

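Variables $X$ and $Y$ are conditionally independent given $Z$ if and only if $P(X,Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$. This is written $(X \perp Y \mid Z)$.

> Let $C$ be a set of evidence variables. A path between $A$ and $B$ is _d-separated_ by $C$ if any of the following is true: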

1. The path contains a chain of nodes $X \rightarrow Y \rightarrow Z$ such that $Y$ is in $C$
2. The path contains a fork $X \leftarrow Y \rightarrow Z$ such that $Y$ is in $C$
3. The path contains an inverted fork $X \rightarrow Y \leftarrow Z$ such that $Y$ is __NOT__ in $C$ and __no descendant of $Y$__ is in $C$

We say that $A$ and $B$ are _d-separated_ by $C$ if all paths between $A$ and $B$ are _d-separated_ by $C$. This _d-separation_ implies that $(A \perp B \mid C)$.
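The three rules can be sketched as a small predicate on a single three-node segment of a path. This only classifies one segment. It does not enumerate the paths of a graph, and the name `segment_blocks` and its arguments are hypothetical.

```julia
# Does a three-node segment with middle node Y block the path, given evidence set C?
# kind: :chain (X → Y → Z), :fork (X ← Y → Z) or :inverted_fork (X → Y ← Z).
function segment_blocks(kind::Symbol, y_in_C::Bool; descendant_in_C::Bool=false)
    if kind in (:chain, :fork)
        # Rules 1 and 2: blocked exactly when the middle node is observed.
        return y_in_C
    elseif kind == :inverted_fork
        # Rule 3: blocked when neither Y nor any descendant of Y is observed.
        return !y_in_C && !descendant_in_C
    end
    throw(ArgumentError("unknown segment kind: $kind"))
end

# Satellite network: D ← E → C is a fork at E.
fork_observed   = segment_blocks(:fork, true)    # E observed
fork_unobserved = segment_blocks(:fork, false)   # E not observed

# B → E ← S is an inverted fork at E, whose descendants are D and C.
collider_no_evidence = segment_blocks(:inverted_fork, false)
collider_descendant  = segment_blocks(:inverted_fork, false; descendant_in_C=true)
```

In the satellite example, observing `E` blocks the only path between `D` and `C`. The path between `B` and `S` is blocked only while `E`, `D` and `C` are all unobserved.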