# Exercise 3 - Bayesian Networks and the Darwiche Compiler

In this exercise, we will learn the parameters of a Bayesian Network and perform inference. 
We will then compare our results to those of the Darwiche compiler.

If a problem persists, do not hesitate to contact the course instructors at
- paul.kahlmeyer@uni-jena.de

### Submission

- Submission deadline: 13.11.2022
- Submission via the [Moodle page](https://moodle.uni-jena.de/course/view.php?id=34630)

### Help
If you cannot solve a task, you can load the saved values from the `help` directory and continue with the other tasks:
- Load arrays with [NumPy](https://numpy.org/doc/stable/reference/generated/numpy.load.html)
```
import numpy as np
np.load('help/array_name.npy')
```
- Load functions, classes and other objects with [Dill](https://dill.readthedocs.io/en/latest/dill.html)
```
import dill
with open('help/some_func.pkl', 'rb') as f:
    func = dill.load(f)
```

# Bayesian Networks

## Dataset
In this exercise we will use a discretized version of the [Pima Indians Dataset](https://www.kaggle.com/uciml/pima-indians-diabetes-database/version/1). In its original form, nine attributes were collected:

- `Pregnancies` : Number of times pregnant 
- `Glucose` : Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- `BloodPressure` : Diastolic blood pressure (mm Hg)
- `SkinThickness` : Triceps skin fold thickness (mm)
- `Insulin` : 2-Hour serum insulin (mu U/ml)
- `BMI` : Body mass index (weight in kg/(height in m)^2)
- `DiabetesPedigreeFunction` : Value indicating the presence of diabetes in the family
- `Age` : Age
- `Outcome` : 1 = diabetes, 0 = no diabetes

For this exercise, we will use a binarized version of this dataset. Each attribute has been labeled as
- `0`: below average
- `1`: above average

The dataset is stored as `dataset.csv`. 


### Task 1
Load the dataset.

In [1]:
import pandas as pd
import numpy as np
dataset = pd.read_csv('dataset.csv')
data = dataset.to_numpy()

## Binary Bayesnet

A Bayesian network is a multivariate categorical distribution in which certain conditional independencies hold; these are specified by a directed acyclic graph (DAG) on the variables.
Namely, any variable $x_i$ is conditionally independent of its non-descendants given the values of its parents.
This means that the joint distribution factorizes into one factor per node, each conditioned on the parents of that node.

Consider the following example of three variables:

<div>
<img src="images/bayesnet_example.png" width="200"/>
</div>

Here we have that 
\begin{equation}
p(x_0, x_1, x_2) = p(x_0)p(x_2|x_0)p(x_1|x_0,x_2)
\end{equation}

Such a categorical distribution can be represented by **Conditional Probability Tables** (CPTs), that hold the distributions $p(x_i| \text{parents}(x_i))$. 

In our example we would have three CPTs:

| $x_0$ | $p(x_0)$|
| :- | -: | 
| $0$ | $p(x_0=0)$|
| $1$ | $p(x_0=1)$|

| $x_0$ | $x_2$| $p(x_2|x_0)$|
| :- | :- | -: | 
| $0$ | $0$ | $p(x_2=0|x_0=0)$ | 
| $0$ | $1$ | $p(x_2=1|x_0=0)$ | 
| $1$ | $0$ | $p(x_2=0|x_0=1)$ | 
| $1$ | $1$ | $p(x_2=1|x_0=1)$ | 

| $x_0$ | $x_2$| $x_1$| $p(x_1|x_0, x_2)$|
| :- | :- | :- | -: | 
| $0$ | $0$ | $0$ | $p(x_1=0|x_0=0, x_2=0)$ | 
| $0$ | $0$ | $1$ | $p(x_1=1|x_0=0, x_2=0)$ | 
| $0$ | $1$ | $0$ | $p(x_1=0|x_0=0, x_2=1)$ | 
| $0$ | $1$ | $1$ | $p(x_1=1|x_0=0, x_2=1)$ | 
| $1$ | $0$ | $0$ | $p(x_1=0|x_0=1, x_2=0)$ | 
| $1$ | $0$ | $1$ | $p(x_1=1|x_0=1, x_2=0)$ | 
| $1$ | $1$ | $0$ | $p(x_1=0|x_0=1, x_2=1)$ | 
| $1$ | $1$ | $1$ | $p(x_1=1|x_0=1, x_2=1)$ | 
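
As a minimal sketch of how the three CPTs define the joint distribution, the following snippet multiplies them for every state of the example network. The probability values are made up for illustration and are not estimated from the dataset; the array names are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical CPT values for the example network (illustration only)
p_x0 = np.array([0.6, 0.4])                    # p_x0[a] = p(x0=a)
p_x2_given_x0 = np.array([[0.7, 0.3],          # p_x2_given_x0[a, c] = p(x2=c | x0=a)
                          [0.2, 0.8]])
p_x1_given_x0_x2 = np.array([[[0.9, 0.1],      # p_x1_given_x0_x2[a, c, b] = p(x1=b | x0=a, x2=c)
                              [0.5, 0.5]],
                             [[0.4, 0.6],
                              [0.1, 0.9]]])

def joint(x0, x1, x2):
    # p(x0, x1, x2) = p(x0) * p(x2 | x0) * p(x1 | x0, x2)
    return p_x0[x0] * p_x2_given_x0[x0, x2] * p_x1_given_x0_x2[x0, x2, x1]

states = list(itertools.product([0, 1], repeat=3))
total = sum(joint(*s) for s in states)
print(total)  # a valid factorization sums to 1 over all states (up to rounding)
```

Because every row of every CPT sums to 1, the product automatically defines a normalized joint distribution; no global normalization constant is needed.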


### Task 2

For our dataset, we have the adjacency matrix of such a DAG stored in `adj.npy`.

An entry $A_{ij}$ is 1 if there is an edge from $x_i$ to $x_j$, and 0 otherwise.

Load and display the adjacency matrix.

In [2]:
adj = np.load('adj.npy')
adj

array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 1, 0]], dtype=int32)
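
The matrix can be read column-wise to obtain the parents of each node, and a topological sort confirms that the graph is acyclic. The following is a small sketch under that reading; the helper names `parents` and `topological_order` are chosen for illustration and are not part of the exercise material.

```python
import numpy as np

# Adjacency matrix as displayed above: A[i, j] = 1 means an edge x_i -> x_j
A = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0, 0, 1, 0]])

def parents(A, j):
    # the parents of node j are the rows with a 1 in column j
    return np.flatnonzero(A[:, j]).tolist()

def topological_order(A):
    """Kahn's algorithm; returns None if the graph contains a cycle."""
    in_degree = A.sum(axis=0).astype(int)
    ready = [i for i in range(len(A)) if in_degree[i] == 0]
    order = []
    while ready:
        i = ready.pop(0)
        order.append(i)
        for j in np.flatnonzero(A[i]):
            in_degree[j] -= 1
            if in_degree[j] == 0:
                ready.append(int(j))
    return order if len(order) == len(A) else None

print({j: parents(A, j) for j in range(len(A))})
print(topological_order(A))  # a list of all nodes, since the graph is a DAG
```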

### Task 3

Implement the following `BinaryBayesNet` class.

Then create and fit a Bayesian network on the diabetes dataset and calculate the [log-likelihood](https://en.wikipedia.org/wiki/Likelihood_function).

In [20]:
import itertools
class BinaryBayesNet():

    def __init__(self, A:np.ndarray, prob_tables:dict = {}) -> None:
        '''
        Bayesian Network of binary categorical variables.
        
        @Params:
            A...            adjacency matrix of the DAG
            prob_tables...  probability tables of the nodes, dictionary where
                            key = node index
                            value = tuple (sample space, probs, indices of sample space)
                            .fit(X) will estimate those prob_tables
        '''
        self.A = A
        self.n = self.A.shape[0]
        self.domain = [0,1]
        self.prob_tables = dict(prob_tables) # copy, so that the mutable default argument is never shared between instances
        self.sample_space = np.array(list(itertools.product(self.domain, repeat=self.n)))

    def fit(self, dataset:np.ndarray, pseudo_obs:int=0) -> None:
        '''
        Calculates the CPTs for the Bayesian Network.
        
        @Params:
            dataset... Nxd matrix, binary vectors as rows
            pseudo_obs... pseudo observations that are added for laplace regularization
        '''
        
        def parents(node_index):
            # indices of all nodes that have an edge into node_index
            return np.argwhere(self.A[:,node_index] == 1).flatten().tolist()
        
        for i in range(self.n):
            indices_of_sample_space = [i] + parents(i)
            sample_space = np.array(list(itertools.product(self.domain, repeat=len(indices_of_sample_space))))
            # count every configuration of (node, parents); configurations that never occur get count 0
            sub = dataset[:, indices_of_sample_space]
            probs = np.array([np.sum(np.all(sub == row, axis=1)) for row in sample_space])
            probs += pseudo_obs
            probs = probs.astype(np.float32)
            # the node itself varies slowest in sample_space: if probs[idx] belongs to (x=0, y=y'),
            # then probs[idx+half_range] belongs to (x=1, y=y')
            half_range = probs.shape[0] // 2
            for idx in range(half_range):
                denominator = probs[idx] + probs[idx+half_range]
                # p(x=0 | y=y') = N(x=0, y=y') / (N(x=0, y=y') + N(x=1, y=y'))
                probs[idx] /= float(denominator)
                probs[idx+half_range] /= float(denominator)
            self.prob_tables[i] = (sample_space, probs, indices_of_sample_space)

    def proba(self, X: np.ndarray) -> np.ndarray:
        '''
        Calculates the probabilities of samples X.
        
        @Params:
            X... numpy array with samples as rows
    
        @Returns:
            numpy array with p(x)
        '''
        prob_list = []
        for x in X:
            prob = 1 # stores probability of x
            for i,(sample_space, probs, indices_of_sample_space) in self.prob_tables.items():
                prob_index = np.argwhere(np.all(sample_space == x[indices_of_sample_space], axis=1)).squeeze() # get index of the line in the sample space that matches x
                prob *= probs[prob_index]
            prob_list.append(prob)
        return np.array(prob_list)

net = BinaryBayesNet(adj)
net.fit(data)
np.sum(np.log(net.proba(data)))

-4217.480217327824
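
The role of the `pseudo_obs` argument can be illustrated on a single parent/child pair. The sketch below uses made-up toy data and a hypothetical helper `smoothed_conditional`; it is not the class method itself. Without pseudo observations, a configuration that never occurs gets probability 0, and any sample containing it would contribute $\log 0 = -\infty$ to the log-likelihood.

```python
import numpy as np

def smoothed_conditional(child, parent, pseudo_obs=0):
    """Estimate p(child | parent) for two binary columns from counts plus pseudo observations."""
    counts = np.zeros((2, 2))  # counts[parent value, child value]
    for c, p in zip(child, parent):
        counts[p, c] += 1
    counts += pseudo_obs       # Laplace regularization
    return counts / counts.sum(axis=1, keepdims=True)

# Toy data (made up): the configuration parent=1, child=0 never occurs
child = np.array([0, 0, 1, 1, 1])
parent = np.array([0, 0, 0, 1, 1])
print(smoothed_conditional(child, parent))                # contains a zero probability
print(smoothed_conditional(child, parent, pseudo_obs=1))  # all entries strictly positive
```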

### Task 4

Use the Bayesian network to answer the following queries:

1. What is the posterior marginal distribution of `Outcome` (diabetes) if we observe a high `BMI` and a high `Glucose` level?
2. What is the prior marginal distribution of `Pregnancies` and `Glucose`?
3. What is the MAP hypothesis of `Age` given a high `BloodPressure`?

In [22]:
cols = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
space = net.sample_space # all 2^9 joint states
joint = net.proba(space) # p(x) for every joint state

def query(targets, evidence=None):
    '''Returns the table p(targets | evidence), obtained by summing the joint over all other variables.'''
    mask = np.ones(len(space), dtype=bool)
    for name, val in (evidence or {}).items():
        mask &= space[:, cols.index(name)] == val
    target_idx = [cols.index(name) for name in targets]
    states = np.array(list(itertools.product([0, 1], repeat=len(targets))))
    probs = np.array([joint[mask & np.all(space[:, target_idx] == s, axis=1)].sum() for s in states])
    return np.hstack((states, (probs / probs.sum()).reshape(-1, 1)))

# 1.
# Glucose is a child of Outcome in the DAG, so both observations carry information about Outcome
query(['Outcome'], {'BMI': 1, 'Glucose': 1})

In [23]:
# 2.
# Pregnancies and Glucose share the ancestor Outcome, so their joint does not factorize into the two marginals
query(['Pregnancies', 'Glucose'])

In [24]:
# 3.
posterior = query(['Age'], {'BloodPressure': 1}) # p(Age | BloodPressure=1)
posterior[np.argmax(posterior[:, 1])] # row of the MAP state

# Darwiche Compiler

The [Darwiche Compiler](http://reasoning.cs.ucla.edu/c2d/) compiles a logical CNF formula into an arithmetic circuit.

For our Bayesian network, we can build the CNF from the network polynomial and compile it with the `utils.to_circuit` function.

With the following tasks we will create this CNF step-by-step.

## State Monomials

First we need monomials of the form $\theta_{i:z_i}$ that are true if node $i$ takes the value $z_i$.


### Task 5
Create a dictionary, which holds these monomials in the format
- key = monomial index, an ID for the monomial
- value = tuple (node index, node value).

Example: 
```
6 : (2, 1)
```
stands for the monomial $\theta_{2:1}$ which we name with index 6.

*Important:*

Later we want to encode the negation of a monomial as its negative index. Therefore we need to **start counting at 1**: index 0 has no distinct negative counterpart, since $-0 = 0$.
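
The signed-index convention can be sketched as follows. The two-entry dictionary and the helper `decode_literal` are hypothetical and only serve to show how a signed index is read back.

```python
# Hypothetical excerpt of a state-monomial dictionary: index -> (node, value)
state_monoms = {1: (0, 0), 2: (0, 1)}

def decode_literal(literal):
    """Return (node, value, negated) for a signed monomial index."""
    node, value = state_monoms[abs(literal)]
    return node, value, literal < 0

print(decode_literal(2))   # (0, 1, False)
print(decode_literal(-2))  # (0, 1, True)
print(-0 == 0)             # True: an index 0 could not be told apart from its negation
```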


In [25]:
state_monoms = {}
counter = 1
for i in range(adj.shape[0]):
    state_monoms[counter] = (i, 0)
    state_monoms[counter+1] = (i, 1)
    counter += 2
state_monoms

{1: (0, 0),
 2: (0, 1),
 3: (1, 0),
 4: (1, 1),
 5: (2, 0),
 6: (2, 1),
 7: (3, 0),
 8: (3, 1),
 9: (4, 0),
 10: (4, 1),
 11: (5, 0),
 12: (5, 1),
 13: (6, 0),
 14: (6, 1),
 15: (7, 0),
 16: (7, 1),
 17: (8, 0),
 18: (8, 1)}

## Conditional Monomials
Next we need monomials of the form $\theta_{i, pa(i): z_i, z_{pa(i)}}$ that are true if node $i$ takes the value $z_i$ and the parent nodes of $i$, namely $pa(i)$, take the values $z_{pa(i)}$.

### Task 6
Create a dictionary, which holds these monomials and their probability in the format
- key = monomial index, an ID for the monomial
- value = tuple (probability, tuple of tuples (node idx, node value)).

Example:

```
24 : (0.3, ((8, 0), (1, 1)))
```

stands for the monomial $\theta_{1, 8 : 1, 0}$ with the probability $p(x_1=1|x_8=0) = 0.3$ which we name with index 24.

*Important:*

In order for the compiler to work, the tuple of tuples **first has to specify the parents and then the actual node**.

In [8]:
cond_monoms = {}
counter = len(state_monoms) + 1
for node in range(adj.shape[0]):
    sample_space, probs, indices = net.prob_tables[node]
    parents = indices[1:]
    for idx,row in enumerate(sample_space):
        tup = ()
        node_val = row[0]
        parent_vals = row[1:]
        for parent_idx, parent in enumerate(parents):
            tup = tup + ((parent, parent_vals[parent_idx]),)
        tup = tup + ((node, node_val),)
        cond_monoms[counter] = (probs[idx], tup)
        counter += 1
cond_monoms

{19: (0.47265625, ((7, 0), (0, 0))),
 20: (0.07942708333333333, ((7, 1), (0, 0))),
 21: (0.14453125, ((7, 0), (0, 1))),
 22: (0.3033854166666667, ((7, 1), (0, 1))),
 23: (0.4505208333333333, ((8, 0), (1, 0))),
 24: (0.09505208333333333, ((8, 1), (1, 0))),
 25: (0.20052083333333334, ((8, 0), (1, 1))),
 26: (0.25390625, ((8, 1), (1, 1))),
 27: (0.1875, ((1, 0), (5, 0), (2, 0))),
 28: (0.08984375, ((1, 0), (5, 1), (2, 0))),
 29: (0.0625, ((1, 1), (5, 0), (2, 0))),
 30: (0.07421875, ((1, 1), (5, 1), (2, 0))),
 31: (0.12109375, ((1, 0), (5, 0), (2, 1))),
 32: (0.14713541666666666, ((1, 0), (5, 1), (2, 1))),
 33: (0.11458333333333333, ((1, 1), (5, 0), (2, 1))),
 34: (0.203125, ((1, 1), (5, 1), (2, 1))),
 35: (0.45703125, ((3, 0),)),
 36: (0.54296875, ((3, 1),)),
 37: (0.2265625, ((1, 0), (3, 0), (4, 0))),
 38: (0.171875, ((1, 0), (3, 1), (4, 0))),
 39: (0.15494791666666666, ((1, 1), (3, 0), (4, 0))),
 40: (0.0703125, ((1, 1), (3, 1), (4, 0))),
 41: (0.037760416666666664, ((1, 0), (3, 0), (4,

## CNF of the network polynomial

Let $\theta_{I:Z}$ be the monomial that stands for the node indices $I$ being set to values $Z$.
Then the CNF of the network polynomial is made from the following disjunctions:

1. For each node $i$
    - $\theta_{i:0}\vee\theta_{i:1}$
    - $\neg\theta_{i:0}\vee\neg\theta_{i:1}$
  
2. For each combination of a node $i$ and its parents $pa(i)$
    - $\neg\theta_{i,pa(i) : z_i, z_{pa(i)}}\vee\theta_{i:z_i}$
    - $\neg\theta_{i,pa(i) : z_i, z_{pa(i)}}\vee\theta_{j:z_j}$ for each $j\in pa(i)$
    - $\theta_{i,pa(i) : z_i, z_{pa(i)}}\vee\neg\theta_{i:z_i}\vee\bigvee_{j\in pa(i)}\neg\theta_{j:z_j}$
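
To see what these clauses achieve, here is a sketch on a toy network $x_0 \rightarrow x_1$ (not the diabetes network). All indices and dictionaries are made up for illustration, and probabilities are omitted. A brute-force check then counts the satisfying assignments: there should be exactly one per joint state of the network.

```python
import itertools

# Toy network x0 -> x1 (illustration only)
state_monoms = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}
# Conditional monomials without probabilities: parents first, the node itself last
cond_monoms = {5: ((0, 0),), 6: ((0, 1),),
               7: ((0, 0), (1, 0)), 8: ((0, 1), (1, 0)),
               9: ((0, 0), (1, 1)), 10: ((0, 1), (1, 1))}
state_index = {state: key for key, state in state_monoms.items()}

cnf = []
# 1. every node takes exactly one value
for node in (0, 1):
    a, b = state_index[(node, 0)], state_index[(node, 1)]
    cnf += [[a, b], [-a, -b]]
# 2. every conditional monomial is equivalent to the conjunction of its state monomials
for key, states in cond_monoms.items():
    literals = [state_index[s] for s in states]
    cnf += [[-key, lit] for lit in literals]
    cnf += [[key] + [-lit for lit in literals]]

def satisfied(assignment, clause):
    # assignment[k] is the truth value of the monomial with index k+1
    return any(assignment[abs(l) - 1] == (l > 0) for l in clause)

n_vars = len(state_monoms) + len(cond_monoms)
models = [a for a in itertools.product([False, True], repeat=n_vars)
          if all(satisfied(a, c) for c in cnf)]
print(len(cnf), len(models))  # 20 clauses, 4 models (one per joint state of x0, x1)
```

Note that each clause of the second group uses the parent value stored in the conditional monomial itself, so two monomials that share the node value but differ in a parent value lead to different clauses.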
    

### Task 7

Create the CNF as a list of lists, where each list represents a disjunction with the monomial indices (negative indices stand for negated monomials).

Example:

```
[[1 , 2], [-5 , -6]]
```
would stand for the CNF

$\left(\theta_1\vee\theta_2\right) \wedge \left(\neg\theta_5\vee\neg\theta_6\right)$.
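
SAT tools and knowledge compilers commonly read CNFs in the DIMACS text format, where every clause is a line of signed integers terminated by `0`. The following sketch serializes the list-of-lists representation that way; whether `utils.to_circuit` writes its input file exactly like this is an assumption, and `to_dimacs` is a hypothetical helper.

```python
def to_dimacs(cnf, n_vars):
    """Serialize a list-of-lists CNF in DIMACS format."""
    lines = [f"p cnf {n_vars} {len(cnf)}"]  # header: number of variables and clauses
    for clause in cnf:
        lines.append(" ".join(str(l) for l in clause) + " 0")  # 0 terminates a clause
    return "\n".join(lines)

print(to_dimacs([[1, 2], [-5, -6]], n_vars=6))
# p cnf 6 2
# 1 2 0
# -5 -6 0
```

Since `0` is reserved as the clause terminator in this format, variable indices have to start at 1.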

In [10]:
cnf = []

# 1.
c = 1
while c <= len(state_monoms):
    cnf += [[c, c+1]]
    cnf += [[-c, -(c+1)]]
    c += 2
    
# 2.
def find_state_monom(node_, val_):
    # index of the state monomial of node_ taking the value val_
    for key, (node,val) in state_monoms.items():
        if node == node_ and val == val_:
            return key
for node in range(adj.shape[0]):
    # conditional monomials of this node (the node itself is the last tuple), ordered by index
    conds = [key for key, (prob, tuples) in cond_monoms.items() if tuples[-1][0] == node]
    # First part: the conditional monomial implies the state of the node
    for key in conds:
        cnf += [[-key, find_state_monom(*cond_monoms[key][1][-1])]]
    # Second part: the conditional monomial implies the state of each parent,
    # using the parent value stored in the monomial (not the value of the node)
    n_parents = len(cond_monoms[conds[0]][1]) - 1
    for p in range(n_parents):
        for key in conds:
            cnf += [[-key, find_state_monom(*cond_monoms[key][1][p])]]
    # Third part: the states of the node and of all its parents together imply the conditional monomial
    for key in conds:
        *parent_states, node_state = [find_state_monom(n, v) for (n, v) in cond_monoms[key][1]]
        cnf += [[key, -node_state] + [-s for s in parent_states]]
cnf

[[1, 2],
 [-1, -2],
 [3, 4],
 [-3, -4],
 [5, 6],
 [-5, -6],
 [7, 8],
 [-7, -8],
 [9, 10],
 [-9, -10],
 [11, 12],
 [-11, -12],
 [13, 14],
 [-13, -14],
 [15, 16],
 [-15, -16],
 [17, 18],
 [-17, -18],
 [-19, 1],
 [-20, 1],
 [-21, 2],
 [-22, 2],
 [-19, 15],
 [-20, 16],
 [-21, 15],
 [-22, 16],
 [19, -1, -15],
 [20, -1, -16],
 [21, -2, -15],
 [22, -2, -16],
 [-23, 3],
 [-24, 3],
 [-25, 4],
 [-26, 4],
 [-23, 17],
 [-24, 18],
 [-25, 17],
 [-26, 18],
 [23, -3, -17],
 [24, -3, -18],
 [25, -4, -17],
 [26, -4, -18],
 [-27, 5],
 [-28, 5],
 [-29, 5],
 [-30, 5],
 [-31, 6],
 [-32, 6],
 [-33, 6],
 [-34, 6],
 [-27, 3],
 [-28, 3],
 [-29, 4],
 [-30, 4],
 [-31, 3],
 [-32, 3],
 [-33, 4],
 [-34, 4],
 [-27, 11],
 [-28, 12],
 [-29, 11],
 [-30, 12],
 [-31, 11],
 [-32, 12],
 [-33, 11],
 [-34, 12],
 [27, -5, -3, -11],
 [28, -5, -3, -12],
 [29, -5, -4, -11],
 [30, -5, -4, -12],
 [31, -6, -3, -11],
 [32, -6, -3, -12],
 [33, -6, -4, -11],
 [34, -6, -4, -12],
 [-35, 7],
 [-36, 8],
 [35, -7],
 [36, -8],
 [-37, 9],
 [-

### Task 8

Use the `utils.to_circuit` function to compile the CNF into an arithmetic circuit.

As a sanity check, compare the probabilities that the arithmetic circuit returns via `.eval` to those of your Bayesian network.

*Note:* `.eval` takes $x\in\mathbb{R}^n$ and outputs $p(x)$.
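
Such a comparison could look roughly like the sketch below. It assumes only what the note states, namely that the circuit offers `.eval(x)` returning $p(x)$, and that the network offers `.proba(X)` as in Task 3. The helper `max_deviation` and the two stand-in classes are hypothetical and exist only so that the snippet runs on its own.

```python
import numpy as np

def max_deviation(circuit, net, X):
    """Largest absolute difference between circuit.eval(x) and net.proba(X) over the rows of X."""
    p_circuit = np.array([circuit.eval(x) for x in X])
    return float(np.max(np.abs(p_circuit - net.proba(X))))

# Hypothetical stand-ins: both assign the uniform probability 2^-n
# to every binary vector of length n
class UniformCircuit:
    def eval(self, x):
        return 0.5 ** len(x)

class UniformNet:
    def proba(self, X):
        return np.full(len(X), 0.5 ** X.shape[1])

X = np.array([[0, 1, 1], [1, 0, 0]])
print(max_deviation(UniformCircuit(), UniformNet(), X))  # 0.0
```

With the real objects, the deviation will rarely be exactly zero because of floating-point rounding, so a small tolerance (for example via `np.isclose`) is more appropriate than an equality test.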

In [11]:
import utils
import dill
with open('help/state_monoms.pkl', 'rb') as f:
    state_monoms_help = dill.load(f)
with open('help/cond_monoms.pkl', 'rb') as f:
    cond_monoms_help = dill.load(f)
with open('help/cnf.pkl', 'rb') as f:
    cnf_help = dill.load(f)
utils.to_circuit(adj.shape[0], state_monoms_help, cond_monoms_help, cnf_help) # Does not work for me :(

# TODO: sanity check - are probabilities correct?

sh: 1: /home/farin/Uni/Graphische Modelle Lab/exercise03/compiler/util/c2d/hgr2htree: not found


Error during compilation process: 
Constructing CNF... DONE
CNF stats: 
  Vars=64 / Clauses=178
  CNF Time	0.000s
Constructing vtree (from primal graph)...Cannot open dtree file: htree_4057.tmp



CalledProcessError: Command '['compiler/util/c2d/miniC2D', '-c', 'tmp/nnf/graphical_model.cnf']' returned non-zero exit status 1.