# Exercise 1 - Inference Queries, Independence

In this exercise, we will answer inference queries from a probability table and check for marginal independence.

In case of persistent problems, do not hesitate to contact the course instructors at
- paul.kahlmeyer@uni-jena.de

### Submission

- Submission deadline: 30.10.2022
- Submission on [moodle page](https://moodle.uni-jena.de/course/view.php?id=34630)

### Help
In case you cannot solve a task, you can load the saved values from the `help` directory and continue working on the other tasks:
- Load arrays with [NumPy](https://numpy.org/doc/stable/reference/generated/numpy.load.html)
```python
import numpy as np
np.load('help/array_name.npy')
```
- Load functions with [Dill](https://dill.readthedocs.io/en/latest/dill.html)
```python
import dill
with open('help/some_func.pkl', 'rb') as f:
    func = dill.load(f)
```

# Probability Table

We will use a probability table derived from the [migraine dataset](https://www.kaggle.com/datasets/weinoose/migraine-classification). 
Due to the lack of a description, we can only guess what some of the attributes stand for exactly.
Nevertheless, we have 12 discrete features from patients suffering from migraine.

1. `Age`: the age of the patient in intervals of 20 years
    - 0: $\leq 20$
    - 1: $\in (20, 40]$
    - 2: $\in (40, 60]$
    - 3: $>60$
2. `Duration`: how long did the migraine attack last?
    - 0: short
    - 1: normal
    - 2: long
3. `Intensity`: how intense was the migraine attack?
    - 0: very light
    - 1: light
    - 2: intense
    - 3: very intense
4. `Nausea`: did the patient feel sick?
    - 0: no
    - 1: yes
5. `Vomit`: did the migraine attack cause the patient to vomit?
    - 0: no
    - 1: yes
6. `Phonophobia`: did the patient feel particularly sensitive to sound?
    - 0: no
    - 1: yes
7. `Photophobia`: did the patient feel particularly sensitive to light?
    - 0: no
    - 1: yes
8. `Vertigo`: did the patient experience vertigo (dizziness)?
    - 0: no
    - 1: yes
9. `Tinnitus`: did the patient suffer from tinnitus?
    - 0: no
    - 1: yes
10. `Conscience`: did the patient lose consciousness?
    - 0: no
    - 1: yes
11. `Paresthesia`: did the patient feel numbness?
    - 0: no
    - 1: yes
12. `Type`: what kind of migraine did the patient have?
    - 0: Basilar-type aura
    - 1: Familial hemiplegic migraine
    - 2: Migraine without aura
    - 3: Other
    - 4: Sporadic hemiplegic migraine
    - 5: Typical aura with migraine
    - 6: Typical aura without migraine

### Task 1

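Load the probability table from `prob_table.npy`.
The first 12 columns correspond to the features, the last column holds the probability.
In the code, features are addressed by their 0-based column index (e.g. `Vomit` is column 4, `Vertigo` is column 7, `Type` is column 11).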

```python
import numpy as np
import pandas as pd
import dill

columns = ['Age',
           'Duration',
           'Intensity',
           'Nausea',
           'Vomit',
           'Phonophobia',
           'Photophobia',
           'Vertigo',  # Vertigo is a feature column too, hence 12 feature columns plus probability
           'Tinnitus',
           'Conscience',
           'Paresthesia',
           'Type',
           'Probability']

prob_table = np.load('prob_table.npy')
```
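Before answering queries, it can help to check that the loaded array really has the layout described above: one column per feature plus a final probability column whose entries are non-negative and sum to one. A minimal sketch; `check_prob_table` is a hypothetical helper and the toy table uses made-up numbers, not the migraine data:

```python
import numpy as np

def check_prob_table(table: np.ndarray, n_features: int, atol: float = 1e-8) -> bool:
    """Hypothetical helper: checks the feature-columns-plus-probability layout."""
    # Expect a 2-D array with one column per feature and a trailing probability column
    if table.ndim != 2 or table.shape[1] != n_features + 1:
        return False
    probs = table[:, -1]
    # A valid distribution has non-negative entries that sum to one
    return bool(np.all(probs >= 0) and np.isclose(probs.sum(), 1.0, atol=atol))

# Toy table with two binary features (made-up numbers)
toy = np.array([[0, 0, 0.1],
                [0, 1, 0.2],
                [1, 0, 0.3],
                [1, 1, 0.4]])
print(check_prob_table(toy, n_features=2))
```

For the exercise data the call would be `check_prob_table(prob_table, n_features=12)`.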

# Inference Queries

The probability table encodes the joint probability distribution $p(x_1, \dots, x_{12})$, where $x_i$ corresponds to the $i$-th feature. The main purpose of such a distribution is to answer queries about the features.

## Prior Marginal
For the prior marginal, a subset of indices $I\subseteq\{1,\dots, 12\}$ is given and the marginal distribution 

\begin{equation}
p(x_I)
\end{equation}

has to be computed.
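The notebook stores the joint as a list of rows, but the computation is easiest to see on a dense array with one axis per feature: the prior marginal $p(x_I)$ is obtained by summing over every axis not in $I$. A minimal sketch on a made-up joint over three binary variables; the numbers and the helper name `prior_marginal_dense` are illustrative, not taken from the migraine data:

```python
import numpy as np

# Made-up joint p(x_1, x_2, x_3) over three binary variables,
# stored with one axis per variable (axis k <-> variable x_{k+1}).
joint = np.array([0.10, 0.05, 0.15, 0.10,
                  0.20, 0.05, 0.25, 0.10]).reshape(2, 2, 2)

def prior_marginal_dense(joint: np.ndarray, I: list[int]) -> np.ndarray:
    """Sum out every axis not in I; the kept axes stay in increasing order."""
    other_axes = tuple(k for k in range(joint.ndim) if k not in I)
    return joint.sum(axis=other_axes)

p_13 = prior_marginal_dense(joint, [0, 2])  # p(x_1, x_3), shape (2, 2)
print(p_13)
```

Summing out axes here and grouping rows by the kept columns (as `prior_marginal` does with pandas) give the same numbers; the dense layout simply needs one cell for every combination of values.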

### Task 2

Calculate the marginal distribution of `Vertigo` and `Vomit`.

```python
def prior_marginal(prob_table: np.ndarray, I: np.ndarray) -> np.ndarray:
    '''
    Computes the probability table for a subset of the indices.
    
    @Params:
        prob_table... numpy array with columns holding values, last column holding the probabilities
        I... numpy array with indices
    
    @Returns:
        numpy array with columns holding values, last column holding the probabilities for indices in I
    '''
    dataframe = pd.DataFrame(prob_table, columns=columns)
    col_names = [columns[i] for i in I]
    grouped = dataframe.groupby(col_names)['Probability'].sum()  # Group by the selected columns, summing the probabilities
    vals = np.array(list(grouped.index.values))  # Turn the grouped result back into a plain numpy array
    if len(vals.shape) < 2:  # A single grouping column yields a 1-D index
        vals = np.array([vals]).T
    probs = np.array([grouped.to_numpy()]).T
    return np.hstack((vals, probs))

prior_marginal(prob_table, [4, 7])  # Vomit (column 4) and Vertigo (column 7)
```

```
array([[0.        , 0.        , 0.25076552],
       [0.        , 1.        , 0.24964624],
       [1.        , 0.        , 0.25010439],
       [1.        , 1.        , 0.24948385]])
```

## Posterior Marginal
For the posterior marginal, two subsets of indices $I, J\subseteq\{1,\dots, 12\}$ together with values $e_J\in \mathcal{X}_J$ are given and the conditional distribution 

\begin{equation}
p(x_I|x_J=e_J) 
\end{equation}

has to be computed.
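For disjoint $I$ and $J$, the posterior marginal follows from the joint by the definition of conditional probability, and the normalizer is itself obtained by summing out $x_I$:

\begin{equation}
p(x_I|x_J=e_J) = \frac{p(x_I, x_J=e_J)}{p(x_J=e_J)}\,,\qquad p(x_J=e_J) = \sum_{x_I\in\mathcal{X}_I} p(x_I, x_J=e_J)\,.
\end{equation}

In terms of the probability table: keep the rows that agree with $e_J$, sum out the features outside $I$, and divide the resulting probability column by its total.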

### Task 3
Calculate the posterior marginal distribution of `Type` given that we observe the patient feels sick (`Nausea` = 1) but has no tinnitus (`Tinnitus` = 0).

```python
def posterior_marginal(prob_table: np.ndarray, I: np.ndarray, J: np.ndarray, e_J: np.ndarray) -> np.ndarray:
    '''
    Computes the probability table for a subset of the indices given that another subset is set to fixed values.
    
    @Params:
        prob_table... numpy array with columns holding values, last column holding the probabilities
        I... numpy array with indices
        J... numpy array with indices
        e_J... numpy array with values for J
    
    @Returns:
        numpy array with columns holding values for the indices in I, last column holding
        p(x_I, x_J = e_J); the rows are filtered by the evidence but not divided by p(x_J = e_J)
    '''
    table = prob_table
    for j, e in zip(J, e_J):  # Keep only the rows that agree with the evidence
        table = table[table[:, j] == e]
    dataframe = pd.DataFrame(table, columns=columns)
    col_names = [columns[i] for i in I]
    grouped = dataframe.groupby(col_names)['Probability'].sum()  # Group by the selected columns, summing the probabilities
    vals = np.array(list(grouped.index.values))  # Turn the grouped result back into a plain numpy array
    if len(vals.shape) < 2:  # A single grouping column yields a 1-D index
        vals = np.array([vals]).T
    probs = np.array([grouped.to_numpy()]).T
    return np.hstack((vals, probs))

posterior_marginal(prob_table, [11], [3, 8], [1, 0])  # Type given Nausea = 1, Tinnitus = 0
```

```
array([[0.        , 0.03569523],
       [1.        , 0.03573003],
       [2.        , 0.0359794 ],
       [3.        , 0.03569523],
       [4.        , 0.03568943],
       [5.        , 0.03706389],
       [6.        , 0.03572423]])
```
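The filter in `posterior_marginal` keeps only the rows matching the evidence but does not divide by $p(x_J = e_J)$, so the probability column above holds $p(\text{Type}, \text{Nausea}=1, \text{Tinnitus}=0)$: the seven values sum to about 0.2516 rather than 1. A minimal sketch of the missing normalization, applied to the printed values copied from the output above (`normalize` is an illustrative name):

```python
import numpy as np

# Output of posterior_marginal(prob_table, [11], [3, 8], [1, 0]) copied from above:
# column 0 = Type, column 1 = unnormalized p(Type, Nausea=1, Tinnitus=0)
joint_rows = np.array([[0., 0.03569523],
                       [1., 0.03573003],
                       [2., 0.0359794 ],
                       [3., 0.03569523],
                       [4., 0.03568943],
                       [5., 0.03706389],
                       [6., 0.03572423]])

def normalize(table: np.ndarray) -> np.ndarray:
    """Divide the probability column by its sum, i.e. by p(x_J = e_J)."""
    out = table.copy()
    out[:, -1] = out[:, -1] / out[:, -1].sum()
    return out

evidence_prob = joint_rows[:, -1].sum()  # p(Nausea=1, Tinnitus=0), about 0.2516
posterior = normalize(joint_rows)        # p(Type | Nausea=1, Tinnitus=0), sums to 1
```

Inside the notebook, the same step amounts to dividing the last column of the array returned by `posterior_marginal` by its sum. Since every row is rescaled by the same constant, the argmax answers of Tasks 5 and 6 would not change, only the printed probabilities.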

## Probability of Evidence

For subsets of indices $I, J \subseteq \{1, \dots, 12\}$ and evidence $e_I\in \mathcal{X}_I$ and $e_J\in\mathcal{X}_J$, compute the posterior marginal probability 

\begin{equation}
p(x_I = e_I| x_J = e_J)\,.
\end{equation}

In the special case $J = \emptyset$, compute the prior marginal probability $p(x_I = e_I)$.
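A minimal sketch of this definition on the notebook's row layout (feature columns, then probability), assuming disjoint $I$ and $J$: both numerator and denominator are sums over the rows that match the respective values. `evidence_prob` and `conditional_prob` are hypothetical helper names and the toy table uses made-up numbers. The special case $J = \emptyset$ falls out automatically, because an empty filter selects every row and the denominator becomes 1:

```python
import numpy as np

def evidence_prob(table: np.ndarray, idx: list[int], vals: list[float]) -> float:
    """Hypothetical helper: p(x_idx = vals), summing the rows that match all values."""
    mask = np.ones(len(table), dtype=bool)
    for k, v in zip(idx, vals):
        mask &= table[:, k] == v
    return float(table[mask, -1].sum())

def conditional_prob(table: np.ndarray, I: list[int], e_I: list[float],
                     J: list[int], e_J: list[float]) -> float:
    """p(x_I = e_I | x_J = e_J) = p(x_I = e_I, x_J = e_J) / p(x_J = e_J)."""
    denom = evidence_prob(table, J, e_J)
    if denom == 0:
        raise ValueError("the evidence has probability zero")
    return evidence_prob(table, list(I) + list(J), list(e_I) + list(e_J)) / denom

# Toy table (made-up): columns x_0, x_1, probability
toy = np.array([[0, 0, 0.1],
                [0, 1, 0.2],
                [1, 0, 0.3],
                [1, 1, 0.4]])
print(conditional_prob(toy, [0], [1], [1], [1]))  # p(x_0 = 1 | x_1 = 1) = 0.4 / 0.6
```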

### Task 4

Calculate the probability of a short attack (`Duration` = 0) given that the patient is at most 20 years old (`Age` = 0) and experiences dizziness (`Vertigo` = 1).

```python
def prob_of_evidence(prob_table: np.ndarray, I: np.ndarray, e_I: np.ndarray, J: np.ndarray, e_J: np.ndarray) -> float:
    '''
    Computes the probability of I being e_I given J is e_J.
    
    @Params:
        prob_table... numpy array with columns holding values, last column holding the probabilities
        I... numpy array with indices
        e_I... numpy array with values for I
        J... numpy array with indices
        e_J... numpy array with values for J
    
    @Returns:
        probability of I being e_I given J is e_J.
    '''
    # posterior_marginal does not divide by p(x_J = e_J), so this yields p(x_I = e_I, x_J = e_J)
    table = posterior_marginal(prob_table, I, J, e_J)
    for i in range(len(I)):  # Keep only the rows where x_I matches e_I
        table = table[table[:, i] == e_I[i]]
    if len(table) == 0:  # e_I never occurs together with e_J
        return 0.0
    return table[0, -1]  # The last column holds the probability, also when I has several indices

prob_of_evidence(prob_table, [1], [0], [0, 7], [0, 1])  # Duration = 0 given Age = 0, Vertigo = 1
```

```
0.04158160898209149
```

## Most probable explanation (MPE)

Given evidence $e_J\in\mathcal{X}_J$ for a subset of indices $J\subseteq\{1,\dots, 12\}$, compute

\begin{equation}
\text{argmax}_{x\in\mathcal{X}} p(x|x_J = e_J)\,.
\end{equation}
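Because $p(x|x_J=e_J) = p(x, x_J=e_J)/p(x_J=e_J)$ and the denominator is the same for every candidate $x$, the MPE can be read off the unnormalized rows that match the evidence. A minimal sketch on a made-up three-feature table in the notebook's row layout; `mpe` is a hypothetical helper that returns all tied rows, which is how more than one row can come back:

```python
import numpy as np

def mpe(table: np.ndarray, J: list[int], e_J: list[float]) -> np.ndarray:
    """Hypothetical helper: full assignment(s) with the highest probability given x_J = e_J."""
    mask = np.ones(len(table), dtype=bool)
    for j, e in zip(J, e_J):
        mask &= table[:, j] == e
    rows = table[mask]
    # No division by p(x_J = e_J) needed: it scales all candidate rows equally
    return rows[rows[:, -1] == rows[:, -1].max(), :-1]

# Toy table (made-up numbers): columns x_0, x_1, x_2, probability
toy = np.array([[0, 0, 0, 0.10], [0, 0, 1, 0.05], [0, 1, 0, 0.15], [0, 1, 1, 0.10],
                [1, 0, 0, 0.20], [1, 0, 1, 0.05], [1, 1, 0, 0.25], [1, 1, 1, 0.10]])
print(mpe(toy, J=[2], e_J=[0]))  # unique maximum
print(mpe(toy, J=[2], e_J=[1]))  # two tied assignments
```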

### Task 5

What is the intensity of the most probable explanation for a long (`Duration` = 2) migraine attack of a 30-year-old (`Age` = 1) patient with tinnitus (`Tinnitus` = 1) and both phono- and photophobia (`Phonophobia` = 1, `Photophobia` = 1), where we know that it is of the type "Basilar-type aura" (`Type` = 0)?

```python
def most_prob_explanation(prob_table: np.ndarray, J: np.ndarray, e_J: np.ndarray) -> np.ndarray:
    '''
    Computes the most probable x given some evidence
    
    @Params:
        prob_table... numpy array with columns holding values, last column holding the probabilities
        J... numpy array with indices
        e_J... numpy array with values for J
    
    @Returns:
        x that maximizes probability of x given J is set to e_J
    '''
    table = posterior_marginal(prob_table, list(range(12)), J, e_J)  # All 12 features, only rows where the evidence holds
    return table[table[:, -1] == table[:, -1].max()]  # Return the row(s) with maximum probability

most_prob_explanation(prob_table, [0, 1, 5, 6, 8, 11], [1, 2, 1, 1, 1, 0])
```

```
array([[1.00000000e+00, 2.00000000e+00, 3.00000000e+00, 1.00000000e+00,
        0.00000000e+00, 1.00000000e+00, 1.00000000e+00, 0.00000000e+00,
        1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.73981628e-05],
       [1.00000000e+00, 2.00000000e+00, 3.00000000e+00, 1.00000000e+00,
        0.00000000e+00, 1.00000000e+00, 1.00000000e+00, 1.00000000e+00,
        1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        1.73981628e-05]])
```

## Maximum a posteriori hypothesis (MAP)

For subsets of indices $I, J \subseteq \{1, \dots, 12\}$ and evidence $e_J\in\mathcal{X}_J$, compute 

\begin{equation}
\text{argmax}_{x_I} p(x_I|x_J = e_J)\,.
\end{equation}
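MAP differs from MPE because the features outside $I \cup J$ are summed out before maximizing, so reading $x_I$ off the MPE assignment can give a different answer. A minimal sketch with a made-up joint over a query variable $x_0$ and one other variable $x_1$, without evidence:

```python
import numpy as np

# Made-up joint over x_0 (the query variable) and x_1; columns x_0, x_1, probability
toy = np.array([[0, 0, 0.40],
                [0, 1, 0.05],
                [1, 0, 0.30],
                [1, 1, 0.25]])

# MPE: the single most probable full assignment -> x_0 = 0, x_1 = 0
mpe_row = toy[np.argmax(toy[:, -1])]

# MAP over I = {x_0}: first sum out x_1, then maximize -> x_0 = 1
values = np.unique(toy[:, 0])
marginal = np.array([toy[toy[:, 0] == v, -1].sum() for v in values])  # p(x_0) = [0.45, 0.55]
map_value = values[np.argmax(marginal)]

print(mpe_row[0], map_value)
```

The MPE picks $x_0 = 0$ because of the single large entry 0.40, while the MAP picks $x_0 = 1$ because its probability mass is spread over two assignments that together outweigh it.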

### Task 6
What is the maximum a posteriori hypothesis for the type of migraine attack (`Type`) of a 15-year-old person (`Age` = 0) who has tinnitus (`Tinnitus` = 1)?

```python
def max_a_posteriori(prob_table: np.ndarray, I: np.ndarray, J: np.ndarray, e_J: np.ndarray) -> np.ndarray:
    '''
    Computes the most probable x_I given some evidence
    
    @Params:
        prob_table... numpy array with columns holding values, last column holding the probabilities
        I... numpy array with indices
        J... numpy array with indices
        e_J... numpy array with values for J
    
    @Returns:
        x_I that maximizes the probability of x_I given J is set to e_J
    '''
    table = posterior_marginal(prob_table, I, J, e_J)  # Posterior marginal of the variables specified by I
    return table[table[:, -1] == table[:, -1].max()]  # Return the row(s) with maximum probability

max_a_posteriori(prob_table, [11], [0, 8], [0, 1])  # Type given Age = 0, Tinnitus = 1
```

```
array([[1.        , 0.01782732]])
```

# Independence

As pointed out in the lecture, the number of parameters decreases if we know that two features are independent.
Independence of features is also valuable for interpreting data: one feature does not carry any information about the other.
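To make the parameter saving concrete, a small count for two features from the list above, `Age` (4 values) and `Type` (7 values): a full joint table over both needs 4·7 - 1 free parameters (one per cell, minus one because they sum to 1), while under independence the two marginals need only (4 - 1) + (7 - 1). The helper names are illustrative:

```python
from math import prod

def n_params_joint(cards: list[int]) -> int:
    """Free parameters of a full joint table: one per cell, minus 1 for normalization."""
    return prod(cards) - 1

def n_params_independent(cards: list[int]) -> int:
    """Free parameters if all features are independent: each marginal needs (card - 1)."""
    return sum(c - 1 for c in cards)

age_type = [4, 7]  # number of values of Age and Type
print(n_params_joint(age_type), n_params_independent(age_type))  # 27 9
```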

Here we want to look at **marginal independence**. Two features $x_i, x_j$ are marginally independent if
\begin{equation}
p(x_i, x_j) = p(x_i)p(x_j)\,.
\end{equation}
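A quick numerical check of the definition on made-up marginals: a joint built as the outer product of two marginals satisfies it exactly, while a joint with the same marginals but coupled entries does not. `is_marginally_independent` is an illustrative helper that treats rows as the values of $x_i$ and columns as the values of $x_j$:

```python
import numpy as np

p_i = np.array([0.3, 0.7])        # made-up p(x_i)
p_j = np.array([0.2, 0.5, 0.3])   # made-up p(x_j)

independent = np.outer(p_i, p_j)  # p(x_i, x_j) = p(x_i) p(x_j) by construction
dependent = np.array([[0.20, 0.05, 0.05],
                      [0.00, 0.45, 0.25]])  # same marginals, but coupled entries

def is_marginally_independent(joint: np.ndarray, atol: float = 1e-12) -> bool:
    """Compare the joint matrix with the outer product of its row and column marginals."""
    return bool(np.allclose(joint, np.outer(joint.sum(axis=1), joint.sum(axis=0)), atol=atol))

print(is_marginally_independent(independent), is_marginally_independent(dependent))
```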

In real data, however, we will practically never observe perfect marginal independence.

### Task 7
Implement the function `independence_error` that measures how similar the vectors
\begin{align}
v_1 &= \left[p(x_i = e_i, x_j = e_j)\right]_{e_i\in\mathcal{X}_i,e_j\in\mathcal{X}_j}\\
v_2 &= \left[p(x_i = e_i)p(x_j = e_j)\right]_{e_i\in\mathcal{X}_i,e_j\in\mathcal{X}_j}
\end{align}
are. If they are very similar, $x_i$ and $x_j$ are close to being marginally independent.
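One way to turn "how similar" into a single number is the Euclidean norm of $v_1 - v_2$; other distances would work as well. With the joint arranged as a matrix (rows $e_i$, columns $e_j$), flattening it row by row lines up each entry $p(e_i, e_j)$ with $p(e_i)p(e_j)$ from the flattened outer product. A minimal sketch on made-up joints; `independence_error_dense` is a hypothetical dense-array variant, not the function requested in the task:

```python
import numpy as np

def independence_error_dense(joint: np.ndarray) -> float:
    """Hypothetical dense variant: ||v_1 - v_2||_2 for a joint matrix p(x_i, x_j)."""
    v1 = joint.ravel()                                           # [p(e_i, e_j)], row-major order
    v2 = np.outer(joint.sum(axis=1), joint.sum(axis=0)).ravel()  # [p(e_i) p(e_j)], same order
    return float(np.linalg.norm(v1 - v2))

independent = np.outer([0.3, 0.7], [0.2, 0.5, 0.3])  # made-up, exactly independent
dependent = np.array([[0.20, 0.05, 0.05],
                      [0.00, 0.45, 0.25]])            # made-up, same marginals but coupled
print(independence_error_dense(independent), independence_error_dense(dependent))
```

Row-major flattening is also the order in which grouping by $(x_i, x_j)$ sorts the rows, which is what makes $v_1$ and $v_2$ comparable entry by entry in the table-based solution.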

Which features are closest to being marginally independent from `Type`?

```python
def independence_error(prob_table: np.ndarray, i: int, j: int) -> float:
    '''
    Compares the vectors p(x_i, x_j) and p(x_i)*p(x_j).
    
    @Params:
        prob_table... numpy array with columns holding values, last column holding the probabilities
        i... index of first feature
        j... index of second feature
        
    @Returns:
        Euclidean norm of the difference of the vectors p(x_i, x_j) and p(x_i)*p(x_j)
    '''
    v1 = prior_marginal(prob_table, [i, j])[:, -1]  # p(x_i, x_j), rows sorted by (x_i, x_j)
    p_i = prior_marginal(prob_table, [i])[:, -1]
    p_j = prior_marginal(prob_table, [j])[:, -1]
    if i == j:
        # p(x_i, x_i) only has the diagonal entries, so compare with p(x_i)^2
        v2 = p_i * p_j
    else:
        # Row-major flattening matches the (x_i, x_j) order of v1
        v2 = np.outer(p_i, p_j).flatten()
    return np.linalg.norm(v1 - v2)

minimum = np.inf
feature = None
for i in range(12):
    norm = independence_error(prob_table, i, 11)
    print(f'Type and {columns[i]}: {norm}')
    if norm < minimum:
        minimum = norm
        feature = columns[i]
print(f'\nClosest feature to being marginally independent from Type: {feature} with {minimum}')
```

```
Type and Age: 0.000491144450742104
Type and Duration: 0.00047312471376042946
Type and Intensity: 0.0006689385745864952
Type and Nausea: 0.0008619743780921558
Type and Vomit: 0.00046993221677922684
Type and Phonophobia: 0.0008705248413889781
Type and Photophobia: 0.0008684019145065829
Type and Vertigo: 0.0008800343231366438
Type and Tinnitus: 0.0008908473316882825
Type and Conscience: 0.0008555802891183125
Type and Paresthesia: 0.000858599505828171
Type and Type: 0.3239701450406122

Closest feature to being marginally independent from Type: Vomit with 0.00046993221677922684
```
