# WIP (most probably to be merged with 02_approximate_inference)
# Computing on Bayes Networks
To understand Pyro's behaviour, it is critical to understand the basics of how to approximate different probability queries by sampling on Bayes networks. Let's take the classic Student network from Koller's book:

![Student network](../gfx/student_network.png)

# Computing the probability of an event Y=y, P(Y=y) = ?

This is easy, because we can compute a topological ordering of the network's nodes and then perform **forward sampling** (ancestral sampling): each node is sampled in that order, so its parents already have values when its turn comes. We estimate $P(Y=y)$ by generating M samples and computing:

$$P(Y=y) \approx \frac{1}{M}\sum_{m=1}^{M}\mathbb{1}(y[m] = y)$$
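The forward-sampling estimate above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the CPD values are the ones commonly quoted for the Student network and are not read from the figure, states are encoded as integers (Grade `0` is the best grade), and the helper names `draw`, `forward_sample` and `estimate_marginal` are hypothetical.

```python
import random

# Assumed CPD values (illustrative; not taken from the figure).
P_D = [0.6, 0.4]                       # P(Difficulty)
P_I = [0.7, 0.3]                       # P(Intelligence)
P_G = {                                # P(Grade | Intelligence, Difficulty)
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}
P_S = {0: [0.95, 0.05], 1: [0.2, 0.8]}                  # P(SAT | Intelligence)
P_L = {0: [0.1, 0.9], 1: [0.4, 0.6], 2: [0.99, 0.01]}   # P(Letter | Grade)


def draw(rng, probs):
    """Sample an index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def forward_sample(rng):
    """Sample each node after its parents (topological order D, I, G, S, L)."""
    d = draw(rng, P_D)
    i = draw(rng, P_I)
    g = draw(rng, P_G[(i, d)])
    s = draw(rng, P_S[i])
    letter = draw(rng, P_L[g])
    return {"D": d, "I": i, "G": g, "S": s, "L": letter}


def estimate_marginal(var, value, num_samples, seed=0):
    """Fraction of forward samples in which `var` takes `value`."""
    rng = random.Random(seed)
    hits = sum(forward_sample(rng)[var] == value for _ in range(num_samples))
    return hits / num_samples


p_letter = estimate_marginal("L", 1, num_samples=50_000)
print(f"P(L=1) ~ {p_letter:.3f}")
```

Under the assumed CPDs the exact value of P(L=1) is about 0.502, so the estimate should land close to that and tighten as M grows.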

# Computing the conditional probability of an event, P(Y=y|E=e) = ?

This is much harder than the unconditional case. The most obvious solution is to generate M samples and keep only those that are consistent with E=e; we can then estimate P(Y=y|E=e) from the kept samples exactly as in the unconditional case. This is called **rejection sampling**. Unfortunately, this method wastes a lot of computation: only about $M \cdot P(E=e)$ samples survive, and since P(E=e) is typically very low, we need to generate a very large number of samples to keep enough of them.
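To make the waste concrete, here is a minimal sketch of rejection sampling on just the Intelligence → SAT fragment of the network, estimating P(I=1 | S=1). The CPD values are assumptions chosen for illustration, and `rejection_sample` is a hypothetical helper name.

```python
import random

# Assumed CPDs for the fragment Intelligence -> SAT (illustrative values).
P_I = [0.7, 0.3]                                  # P(I=0), P(I=1)
P_S_GIVEN_I = {0: [0.95, 0.05], 1: [0.2, 0.8]}    # P(S | I)


def rejection_sample(num_samples, evidence_s=1, seed=0):
    """Estimate P(I=1 | S=evidence_s) by discarding inconsistent samples."""
    rng = random.Random(seed)
    accepted = 0
    hits = 0
    for _ in range(num_samples):
        i = 0 if rng.random() < P_I[0] else 1
        s = 0 if rng.random() < P_S_GIVEN_I[i][0] else 1
        if s != evidence_s:
            continue                  # rejected: inconsistent with the evidence
        accepted += 1
        hits += (i == 1)
    estimate = hits / accepted if accepted else float("nan")
    return estimate, accepted / num_samples


posterior, acceptance_rate = rejection_sample(50_000)
print(f"P(I=1 | S=1) ~ {posterior:.3f}, kept {acceptance_rate:.1%} of samples")
```

With these assumed numbers P(S=1) = 0.275, so roughly three quarters of the generated samples are thrown away even in this tiny example; with more evidence variables the kept fraction shrinks quickly.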

## Likelihood-weighted particle generation
This method uses forward sampling to generate samples, with one modification. Since some variables are observed (E=e), during forward sampling each observed variable is assigned its observed value instead of being sampled; the remaining variables are sampled downstream as before.

The key step is to reweight each sample according to the probability of the observed values in E given their parents:

$$w = \prod_{i=1}^{|E|} p(e_i| pa_{e_i}),$$

where $pa_{e_i}$ is the assignment to the parent nodes of $e_i$ in that sample. Using these M weighted samples, we can estimate the conditional distribution as follows.

$$P(y|e) \approx \frac{\sum_{m=1}^{M}w[m]\mathbb{1}(y[m] = y)}{\sum_{m=1}^{M}w[m]}$$

![](../gfx/likehood_weighted_particle_generation.png)
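The weighted estimate can be sketched on the Intelligence → SAT fragment of the network, again for the query P(I=1 | S=1). The CPD values are assumptions for illustration and `likelihood_weighting` is a hypothetical helper name; the point is only that no sample is discarded.

```python
import random

# Assumed CPDs for the fragment Intelligence -> SAT (illustrative values).
P_I = [0.7, 0.3]                                  # P(I=0), P(I=1)
P_S_GIVEN_I = {0: [0.95, 0.05], 1: [0.2, 0.8]}    # P(S | I)


def likelihood_weighting(num_samples, evidence_s=1, seed=0):
    """Estimate P(I=1 | S=evidence_s) from weighted forward samples."""
    rng = random.Random(seed)
    weighted_hits = 0.0
    total_weight = 0.0
    for _ in range(num_samples):
        # Unobserved node: sample from its CPD as in forward sampling.
        i = 0 if rng.random() < P_I[0] else 1
        # Observed node: clamp S to the evidence and weight by p(e | parents).
        w = P_S_GIVEN_I[i][evidence_s]
        weighted_hits += w * (i == 1)
        total_weight += w
    return weighted_hits / total_weight


posterior_lw = likelihood_weighting(50_000)
print(f"P(I=1 | S=1) ~ {posterior_lw:.3f}")
```

Every generated sample contributes to the estimate; samples that explain the evidence poorly simply receive a small weight.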

This likelihood-weighted method is a special case of importance sampling, which we study in the next section.

## Importance sampling
Importance sampling is normally used to estimate the expectation of a function $f(X)$ with respect to some distribution $p(X)$. But we can also use it to compute the probability of some event $A=a$, because $E[\mathbb{1}(A=a)] = p(A=a)$.

If we can easily generate M samples from $p(X)$, then we can estimate this expectation as $E_{p}[f] \approx \frac{1}{M}\sum_{m=1}^{M}f(x[m])$. However, this is not always the case: $p(X)$ may be difficult to sample from, or it may be known only up to a normalization constant.

The key idea is to generate samples from another distribution $q$, which is assumed to be easy to sample from (with $q(x) > 0$ wherever $p(x) > 0$), and to reweight the samples when estimating the expectation. Depending on how we perform the reweighting, we get different importance sampling methods.

**Unnormalized importance sampling**: $$E_p[f] \approx \frac{1}{M}\sum_{m=1}^{M}f(x[m])\frac{p(x[m])}{q(x[m])},$$ where the samples $x[m]$ are drawn from $q$. It is called **unnormalized** because $p$ is already a normalized distribution, so the weights $p(x[m])/q(x[m])$ do not need to be normalized.
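As a small illustration of the unnormalized estimator, the sketch below estimates $E_p[x^2]$ for a standard normal target $p$ while sampling from a wider normal proposal $q$. The choice of target, proposal, and test function is an assumption made purely for the example, and the helper names are hypothetical.

```python
import math
import random


def normal_pdf(x, sigma):
    """Density of a zero-mean normal with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def unnormalized_is(f, num_samples, sigma_q=2.0, seed=0):
    """Estimate E_p[f] with p = N(0, 1), using samples from q = N(0, sigma_q^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(0.0, sigma_q)                        # x[m] ~ q
        w = normal_pdf(x, 1.0) / normal_pdf(x, sigma_q)    # p(x[m]) / q(x[m])
        total += f(x) * w
    return total / num_samples


second_moment = unnormalized_is(lambda x: x * x, 100_000)
print(f"E_p[x^2] ~ {second_moment:.3f}")
```

The true value here is 1 (the variance of a standard normal). The proposal is deliberately wider than the target so that the weights stay bounded.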

**Normalized importance sampling**: In this case, we know $p(X)$ only up to a normalization factor; we have access only to its unnormalized version $\widetilde{p}(X)$. Define the weight of a sample to be $$w(X) = \frac{\widetilde{p}(X)}{q(X)}.$$ Then the expectation is estimated as follows: $$E_p[f] \approx \frac{\sum_{m=1}^{M}f(x[m])w(x[m])}{\sum_{m=1}^{M}w(x[m])}$$
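A minimal sketch of the normalized estimator, assuming the unnormalized target is the joint $\widetilde{p}(i) = P(I=i, S=1)$ on the Intelligence → SAT fragment and the proposal $q$ is a fair coin over $I$. The CPD values, the proposal, and the helper names are all assumptions for illustration.

```python
import random

# Assumed CPD values (illustrative).
P_I = [0.7, 0.3]             # P(I=0), P(I=1)
P_S1_GIVEN_I = [0.05, 0.8]   # P(S=1 | I=0), P(S=1 | I=1)


def p_tilde(i):
    """Unnormalized target: the joint P(I=i, S=1); its normalizer is P(S=1)."""
    return P_I[i] * P_S1_GIVEN_I[i]


def normalized_is(num_samples, q_one=0.5, seed=0):
    """Estimate P(I=1 | S=1) with proposal q(I=1) = q_one."""
    rng = random.Random(seed)
    q = [1.0 - q_one, q_one]
    numerator = 0.0
    denominator = 0.0
    for _ in range(num_samples):
        i = 1 if rng.random() < q_one else 0    # x[m] ~ q
        w = p_tilde(i) / q[i]                   # w = p~(x[m]) / q(x[m])
        numerator += w * (i == 1)               # f is the indicator of I=1
        denominator += w
    # The mean weight is itself an estimate of the normalizer P(S=1).
    return numerator / denominator, denominator / num_samples


posterior_is, evidence_prob = normalized_is(50_000)
print(f"P(I=1 | S=1) ~ {posterior_is:.3f}, P(S=1) ~ {evidence_prob:.3f}")
```

Dividing by the sum of the weights is what cancels the unknown normalization constant; as a by-product, the average weight estimates that constant.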

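Likelihood-weighted particle generation is normalized importance sampling in which the proposal $q$ is the network with the observed nodes fixed to $e$, and the unnormalized target is $\widetilde{p}(x) = p(x, e)$; the ratio $\widetilde{p}/q$ then reduces to the product of $p(e_i | pa_{e_i})$ over the observed nodes.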