# CS486 - Artificial Intelligence
## Lesson 26 - Variable Elimination

Today we'll look at how to use a Bayes' Net to answer questions given some evidence. First, we'll look at **inference by enumeration**, which reconstructs the full joint distribution across all variables. Next, we'll look at the **variable elimination** algorithm, which answers the same queries more efficiently by avoiding the full join.

In [None]:
import helpers
from bayes import *

### Inference by Enumeration

A Bayes' Net encodes the full joint distribution across a set of random variables. So what does the full joint look like for our alarm network? Let's compute it:

<center><img src="images/bayes_net.jpg" width="400"></center>


In [None]:
alarm_network = (BayesNet()
    .add('Burglary', [], 0.001)
    .add('Earthquake', [], 0.002)
    .add('Alarm', 
         ['Burglary', 'Earthquake'], 
         {(T, T): 0.95, (T, F): 0.94, (F, T): 0.29, (F, F): 0.001})
    .add('JohnCalls', ['Alarm'], {T: 0.90, F: 0.05})
    .add('MaryCalls', ['Alarm'], {T: 0.70, F: 0.01}))  

# so we can access our variables directly
globals().update(alarm_network.lookup)

The easiest way to compute the full joint is to consider every possible instantiation of every variable. Let's start with `F,F,F,F,F`:

In [None]:
P(Burglary)

In [None]:
P(Burglary)[F]

In [None]:
P(Burglary)[F] * \
P(Earthquake)[F] * \
P(Alarm, {Earthquake: F, Burglary: F})[F] * \
P(JohnCalls, {Alarm: F})[F] * \
P(MaryCalls, {Alarm: F})[F]
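
Written out, the product in the cell above is the chain-rule factorization implied by the network structure, evaluated with the CPT entries from the network definition. The lowercase symbols ($b$, $e$, $a$, $j$, $m$) are shorthand introduced here for the five variables:

$$
P(\lnot b, \lnot e, \lnot a, \lnot j, \lnot m)
= P(\lnot b)\,P(\lnot e)\,P(\lnot a \mid \lnot b, \lnot e)\,P(\lnot j \mid \lnot a)\,P(\lnot m \mid \lnot a)
$$

$$
= 0.999 \times 0.998 \times 0.999 \times 0.95 \times 0.99 \approx 0.9367
$$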

Computing that for every row yields the full joint:

In [None]:
joint = joint_distribution(alarm_network)
joint
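
To make the construction concrete, here is a minimal sketch of how a full joint table could be built with plain dictionaries. It does not use the `bayes` module, and names such as `prob_of`, `full_joint_sketch`, and `sketch_joint` are hypothetical, chosen only for this illustration. The CPT numbers are copied from the alarm network above.

```python
from itertools import product

# CPT entries copied from the alarm network: P(variable = True | parent values).
p_burglary = 0.001
p_earthquake = 0.002
p_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
p_john = {True: 0.90, False: 0.05}
p_mary = {True: 0.70, False: 0.01}


def prob_of(p_true, value):
    """Probability of `value` for a boolean variable whose P(True) is p_true."""
    return p_true if value else 1.0 - p_true


def full_joint_sketch():
    """Map every (b, e, a, j, m) assignment to its joint probability."""
    table = {}
    for b, e, a, j, m in product((True, False), repeat=5):
        table[(b, e, a, j, m)] = (
            prob_of(p_burglary, b)
            * prob_of(p_earthquake, e)
            * prob_of(p_alarm[(b, e)], a)
            * prob_of(p_john[a], j)
            * prob_of(p_mary[a], m)
        )
    return table


sketch_joint = full_joint_sketch()
print(len(sketch_joint), sum(sketch_joint.values()))
```

With five boolean variables the table has 2^5 = 32 rows, and the rows sum to 1 up to floating-point error.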

We can use the full joint to answer queries. For example, what is the probability that Mary calls given that there is a burglary?

Well, we just have to select the rows where `Burglary` is `T` and **sum out** the hidden variables, keeping a separate total for each value of `MaryCalls`:

In [None]:
mary_burglary = {F: 0, T: 0}

for (b,e,a,j,m), p in joint.items():
    if b == T:
        print((b,e,a,j,m), p)
        mary_burglary[m] += p
        
mary_burglary

Any time you select evidence, you'll need to normalize:

In [None]:
ProbDist(mary_burglary)
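
Normalizing just divides each total by their sum, so the selected rows add up to 1 again. Here is a minimal sketch with a hypothetical `normalize` helper, not the `ProbDist` class itself. The two unnormalized totals are worked out by hand from the CPTs above for the rows with `Burglary = T`.

```python
def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("cannot normalize: all weights are zero")
    return {key: weight / total for key, weight in weights.items()}


# Totals for Burglary = T, split by the value of MaryCalls (hand-computed).
# They sum to P(Burglary = T) = 0.001, not to 1.
unnormalized = {True: 0.0006586138, False: 0.0003413862}

posterior = normalize(unnormalized)
print(posterior)
```

Dividing by the sum is the same as dividing by the probability of the evidence, which is what turns joint probabilities into conditional ones.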

This strategy is exponential in the number of variables. The `enumeration_ask` function improves on this strategy by selecting rows consistent with the evidence before computing the join:

In [None]:
enumeration_ask(MaryCalls, {Burglary: T}, alarm_network)

Even so, inference by enumeration is still exponential in the number of non-evidence variables `:(`
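
The idea behind enumeration can be sketched as a recursion over the variables in topological order: evidence and query variables contribute a single term, and every other variable is summed over both of its values. This is a simplified illustration under assumed data structures, not the source of the `enumeration_ask` used above. `NETWORK`, `cond_prob`, `enumerate_all`, and `enumeration_ask_sketch` are hypothetical names.

```python
# Hypothetical dict-based network: name -> (parents, P(var = True | parent values)).
# Insertion order lists parents before children.
NETWORK = {
    'Burglary':   ((), {(): 0.001}),
    'Earthquake': ((), {(): 0.002}),
    'Alarm':      (('Burglary', 'Earthquake'),
                   {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}),
    'JohnCalls':  (('Alarm',), {(True,): 0.90, (False,): 0.05}),
    'MaryCalls':  (('Alarm',), {(True,): 0.70, (False,): 0.01}),
}
ORDER = list(NETWORK)


def cond_prob(var, value, assignment):
    """P(var = value | parents), reading parent values from `assignment`."""
    parents, cpt = NETWORK[var]
    p_true = cpt[tuple(assignment[p] for p in parents)]
    return p_true if value else 1.0 - p_true


def enumerate_all(variables, assignment):
    """Sum the joint over every variable in `variables` not yet assigned."""
    if not variables:
        return 1.0
    first, rest = variables[0], variables[1:]
    if first in assignment:
        return cond_prob(first, assignment[first], assignment) * enumerate_all(rest, assignment)
    return sum(
        cond_prob(first, value, assignment)
        * enumerate_all(rest, {**assignment, first: value})
        for value in (True, False)
    )


def enumeration_ask_sketch(query, evidence):
    """Posterior over a boolean `query` variable given an evidence dict."""
    weights = {value: enumerate_all(ORDER, {**evidence, query: value})
               for value in (True, False)}
    total = sum(weights.values())
    return {value: weight / total for value, weight in weights.items()}


posterior = enumeration_ask_sketch('MaryCalls', {'Burglary': True})
print(posterior)
```

Each hidden variable doubles the number of recursive branches, which is where the exponential cost comes from.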

### Variable Elimination

**NOTE:** This section uses the older AIMA 3rd edition code, so the API is a little different. You'll need to restart the kernel to move on from here.

Instead of computing the full join for the Bayes' Net, we can compute **factors** and sum out hidden variables before computing joins. Let's compute the probability that Mary calls given a burglary using variable elimination:

In [None]:
from helpers import * 
from aima.probability import *
from aima.notebook import psource

In [None]:
alarm_network = BayesNet([
    ('Burglary', '', 0.001),
    ('Earthquake', '', 0.002),
    ('Alarm', 'Burglary Earthquake',
     {(T, T): 0.95, (T, F): 0.94, (F, T): 0.29, (F, F): 0.001}),
    ('JohnCalls', 'Alarm', {T: 0.90, F: 0.05}),
    ('MaryCalls', 'Alarm', {T: 0.70, F: 0.01})
])

First, build a factor for each variable in the network. Here's the factor for `Burglary`:

In [None]:
make_factor('Burglary', {'Burglary': T}, alarm_network)

A factor only contains rows consistent with evidence, so the `Burglary` factor only contains one row. Now let's see the factor for `MaryCalls`:

In [None]:
mary_factor = make_factor('MaryCalls', {'Burglary': T}, alarm_network)
mary_factor
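
As a rough illustration of what building a factor involves, the sketch below restricts a CPT to the rows consistent with the evidence and keeps only the unobserved variables as columns. It is a simplified stand-in under assumed data structures, not the AIMA `make_factor` implementation. `make_factor_sketch` and its `(scope, table)` return value are hypothetical.

```python
from itertools import product


def make_factor_sketch(var, parents, cpt, evidence):
    """Return (scope, table) for `var`, restricted to rows matching `evidence`.

    `cpt` maps tuples of parent values to P(var = True | parents).
    `scope` lists the variables of the factor that are not fixed by evidence.
    """
    free = [v for v in [var, *parents] if v not in evidence]
    table = {}
    for values in product((True, False), repeat=len(free)):
        row = {**evidence, **dict(zip(free, values))}
        p_true = cpt[tuple(row[p] for p in parents)]
        table[values] = p_true if row[var] else 1.0 - p_true
    return tuple(free), table


evidence = {'Burglary': True}

# Burglary is observed, so its factor has an empty scope and a single row.
burglary_factor = make_factor_sketch('Burglary', (), {(): 0.001}, evidence)

# Neither MaryCalls nor Alarm is observed, so this factor has four rows.
mary_factor_sketch = make_factor_sketch(
    'MaryCalls', ('Alarm',), {(True,): 0.70, (False,): 0.01}, evidence)

print(burglary_factor)
print(mary_factor_sketch)
```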

The factor's conditional probability table is a join on its parents. In this case, the parent variable `Alarm` is hidden, since it appears in neither the query nor the evidence. We can sum hidden variables out:

In [None]:
sum_out('Alarm', [mary_factor], alarm_network)

The **variable elimination** algorithm simply alternates between creating large conditional probability tables through joins and reducing their size by summing out hidden variables. Here's the full `elimination_ask` function:

In [None]:
psource(elimination_ask)

In [None]:
print(elimination_ask('MaryCalls', {'Burglary': T}, alarm_network))
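
To see the join and sum-out steps end to end, here is a minimal, self-contained sketch of variable elimination on the alarm network. It is not the AIMA code: factors are plain `(scope, table)` pairs, the names `multiply_factors`, `sum_out_var`, and `eliminate_var` are hypothetical, and the factors are written out by hand already restricted to the evidence `Burglary = T`.

```python
from itertools import product


def multiply_factors(f1, f2):
    """Pointwise product (join) of two factors over boolean variables."""
    vars1, table1 = f1
    vars2, table2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out = {}
    for values in product((True, False), repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        key1 = tuple(row[v] for v in vars1)
        key2 = tuple(row[v] for v in vars2)
        out[values] = table1[key1] * table2[key2]
    return out_vars, out


def sum_out_var(var, factor):
    """Remove `var` from a factor by adding up the rows that differ only in it."""
    scope, table = factor
    i = scope.index(var)
    out = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return scope[:i] + scope[i + 1:], out


def eliminate_var(var, factors):
    """Join every factor mentioning `var`, then sum `var` out of the result."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    if not touching:
        return rest
    joined = touching[0]
    for f in touching[1:]:
        joined = multiply_factors(joined, f)
    return rest + [sum_out_var(var, joined)]


# Factors restricted to Burglary = True (E, A, J, M abbreviate the other variables).
factors = [
    ((), {(): 0.001}),                                         # P(Burglary = T)
    (('E',), {(True,): 0.002, (False,): 0.998}),               # P(E)
    (('A', 'E'), {(True, True): 0.95, (False, True): 0.05,
                  (True, False): 0.94, (False, False): 0.06}),  # P(A | B = T, E)
    (('J', 'A'), {(True, True): 0.90, (False, True): 0.10,
                  (True, False): 0.05, (False, False): 0.95}),  # P(J | A)
    (('M', 'A'), {(True, True): 0.70, (False, True): 0.30,
                  (True, False): 0.01, (False, False): 0.99}),  # P(M | A)
]

for hidden in ['J', 'A', 'E']:
    factors = eliminate_var(hidden, factors)

# Join whatever is left and normalize over the query variable M.
remaining = factors[0]
for f in factors[1:]:
    remaining = multiply_factors(remaining, f)
scope, table = remaining
total = sum(table.values())
posterior_ve = {values[0]: p / total for values, p in table.items()}
print(posterior_ve)
```

In this sketch, eliminating `J` first produces a factor of all ones, because `JohnCalls` is neither queried nor observed and so contributes nothing to the answer.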

### Runtime comparison
Let's see how the runtimes of these two algorithms compare.
We expect variable elimination to outperform enumeration by a large margin, since it avoids recomputing the same subexpressions.

In [None]:
%%timeit
enumeration_ask('MaryCalls', {'Burglary': T}, alarm_network)

In [None]:
%%timeit
elimination_ask('MaryCalls', {'Burglary': T}, alarm_network)

Variable elimination is significantly faster in large networks. The performance also depends on the order in which variables are eliminated. The size of a factor grows exponentially with the number of variables it mentions that are not fixed by evidence: a factor over k unobserved boolean variables has 2^k rows. The complexity of the algorithm is dominated by the largest factor generated along the way.
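
The effect of ordering can be seen without any probabilities by tracking only which variables each factor mentions. The sketch below uses a made-up star-shaped network (a hub `H` with five children `L1` to `L5`) and a hypothetical helper `max_factor_width`. Neither comes from the AIMA code.

```python
def max_factor_width(scopes, order):
    """Largest number of variables in any factor created during elimination.

    `scopes` holds the variable sets of the initial factors, and `order` is
    the sequence in which variables are eliminated.
    """
    factors = [frozenset(s) for s in scopes]
    widest = max(len(f) for f in factors)
    for var in order:
        touching = [f for f in factors if var in f]
        if not touching:
            continue
        joined = frozenset().union(*touching)   # scope of the joined factor
        widest = max(widest, len(joined))
        factors = [f for f in factors if var not in f] + [joined - {var}]
    return widest


leaves = ['L1', 'L2', 'L3', 'L4', 'L5']
# One factor for P(H) and one for each P(Li | H).
scopes = [{'H'}] + [{'H', leaf} for leaf in leaves]

hub_first = max_factor_width(scopes, ['H'] + leaves)
leaves_first = max_factor_width(scopes, leaves + ['H'])
print(hub_first, leaves_first)
```

Eliminating the hub first joins every factor at once, giving a factor over 6 variables (2^6 = 64 rows for booleans). Eliminating the leaves first never creates a factor over more than 2 variables (4 rows).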

For some cases, like polytrees, there is always an efficient ordering of factors for variable elimination. But in general, inference in Bayes' Nets is NP-hard. We'll look at ways to scale better by using approximate inference instead of exact inference.