## Exercise 3.4 Beta updating from censored likelihood
 (Source: Gelman.) Suppose we toss a coin $n = 5$ times. Let $X$ be the number of heads. We observe that there are fewer than 3 heads, but we don’t know exactly how many. Let the prior probability of heads be $p(\theta) = Beta(\theta|1,1)$. Compute the posterior $p(\theta|X < 3)$ up to normalization constants, i.e., derive an expression proportional to $p(\theta, X < 3)$. Hint: the answer is a mixture distribution.

### Notes
In this case, we have to infer a posterior for the parameter $\theta$ from limited information. We cannot simply multiply the prior by a single binomial likelihood, as in the notes, to get a Beta posterior, because the only information we have is that the number of heads is less than three ($X < 3$). The observed event is therefore consistent with more than one possible outcome.

However, it is simple to decompose this particular event into individual results in the sample space:

$$
p(X < 3|\theta) = p(X = 0 |\theta) + p(X=1|\theta) + p(X=2|\theta)
$$

Therefore, our problem is nothing more than a combination of the simpler case mentioned above. As such, we should expect a similar combination to appear in the result.

### Solution
Random variables:
- X: number of heads

The posterior is given by Bayes' rule:

$$
p(\theta|X < 3) = \frac{p(X < 3|\theta)p(\theta)}{p(X < 3)}
$$

Since we want to derive an expression proportional to the posterior, we can ignore the $p(X < 3)$ in the calculation. So what is left is to calculate the likelihood, calculate the prior and combine both of them.

Likelihood:

$$
\begin{aligned}
p(X < 3|\theta) & = p(X = 0|\theta) + p(X=1|\theta) + p(X=2 |\theta) \\
& = \binom{5}{0}\theta^0(1-\theta)^5 + \binom{5}{1}\theta^1(1-\theta)^4 + \binom{5}{2}\theta^2(1-\theta)^3 \\
& = \mathrm{Bin}(0|\theta,5) + \mathrm{Bin}(1|\theta,5) + \mathrm{Bin}(2|\theta, 5)
\end{aligned}
$$
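
As a quick numerical check of this decomposition, the following minimal sketch evaluates the censored likelihood as a sum of binomial terms. The helper names `binom_pmf` and `censored_likelihood` are illustrative choices, not part of the exercise.

```python
from math import comb


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Bin(k | theta, n): probability of k heads in n tosses."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def censored_likelihood(theta: float, n: int = 5, cutoff: int = 3) -> float:
    """p(X < cutoff | theta), summed over the outcomes k = 0, ..., cutoff - 1."""
    return sum(binom_pmf(k, n, theta) for k in range(cutoff))


# Evaluate the likelihood at a few values of theta.
for theta in (0.1, 0.5, 0.9):
    print(f"theta={theta:.1f}  p(X<3|theta)={censored_likelihood(theta):.5f}")
```

For a fair coin ($\theta = 0.5$) the three terms are $(1 + 5 + 10)/32$, so the likelihood is exactly $0.5$, which is what the sketch should report.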

As shown above, the likelihood is a sum of three binomial terms, each viewed as a function of $\theta$. Now consider the prior $p(\theta)$. As the question states, the prior is given by

$$
p(\theta) = Beta(\theta|1, 1) = \frac{1}{B(1, 1)}\theta^0(1-\theta)^0 = 1 = U(\theta|0, 1), \quad \theta \in [0, 1]
$$

Thus the prior is a uniform distribution over the interval $[0, 1]$. Because the prior is constant, the posterior is proportional to the likelihood alone (up to a normalization constant):

$$
\begin{aligned}
p(\theta|X < 3) & \propto p(X < 3 | \theta)p(\theta) \\
& = \mathrm{Bin}(0|\theta,5) + \mathrm{Bin}(1|\theta,5) + \mathrm{Bin}(2|\theta, 5)
\end{aligned}
$$

Therefore, $p(\theta |X < 3) \propto\mathrm{Bin}(0|\theta,5) + \mathrm{Bin}(1|\theta,5) + \mathrm{Bin}(2|\theta, 5)$, which is the expression the exercise asks for. To obtain the fully normalized posterior, we would divide by the evidence

$$
p(X < 3) = \int_\theta p(X < 3 |\theta)p(\theta)d\theta
$$
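
The evidence integral can be evaluated exactly under the uniform prior. The sketch below is one way to do it, assuming the standard Beta integral $\int_0^1 \theta^k(1-\theta)^{n-k}d\theta = \frac{k!\,(n-k)!}{(n+1)!}$; the function name `binomial_term_integral` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def binomial_term_integral(k: int, n: int) -> Fraction:
    """Exact integral over [0, 1] of C(n, k) * theta^k * (1 - theta)^(n - k).

    Uses the Beta integral B(k + 1, n - k + 1) = k! (n - k)! / (n + 1)!.
    """
    beta_integral = Fraction(factorial(k) * factorial(n - k), factorial(n + 1))
    return comb(n, k) * beta_integral


n = 5
# One integral per outcome compatible with the observation X < 3.
terms = [binomial_term_integral(k, n) for k in range(3)]
evidence = sum(terms)

print(terms)     # each term equals 1/6
print(evidence)  # 1/2
```

Each binomial term integrates to $1/(n+1) = 1/6$, so $p(X < 3) = 3/6 = 1/2$ for this prior.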

In this question, we showed how to infer the posterior of the parameter $\theta$ from a censored likelihood. The final result is a mixture of three components with equal weights, each proportional to a binomial likelihood term viewed as a function of $\theta$. A mixture was expected because the censored likelihood decomposes into a sum of non-censored likelihoods.

Question: in the non-censored case, the posterior is a Beta distribution. So why don't we arrive at a mixture of Beta distributions in this exercise? To answer this, note that, as functions of $\theta$, the binomial and the Beta distributions have the same general form $\theta^{\gamma_1}(1-\theta)^{\gamma_2}$. To be precise, we have the following relationship between the models:

$$
Bin(k|\theta,n)\propto Beta(\theta|k+1, n-k+1)
$$

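The two sides are not equal, because the binomial coefficient $\binom{n}{k}$ differs from the Beta normalizer $1/B(k+1, n-k+1)$. Their ratio, however, is $\binom{n}{k}B(k+1, n-k+1) = \frac{1}{n+1}$, which does not depend on $k$. Every term in the sum is therefore rescaled by the same constant, and we can write the posterior as an equally weighted sum of Beta densities: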

$$
p(\theta|X < 3) \propto Beta(\theta|1, 6) + Beta(\theta|2, 5) + Beta(\theta|3, 4)
$$
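
The Beta form of the result can be checked numerically. The following sketch compares the normalized posterior computed from the binomial likelihood with an equal-weight mixture of the three Beta densities on a grid of $\theta$ values. It assumes the uniform prior of the exercise, under which each binomial term integrates to $1/(n+1)$; the function names are illustrative.

```python
from math import comb, gamma


def beta_pdf(theta: float, a: float, b: float) -> float:
    """Density of Beta(theta | a, b)."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)


def posterior_from_likelihood(theta: float, n: int = 5, cutoff: int = 3) -> float:
    """Normalized posterior: censored likelihood divided by the evidence."""
    unnormalized = sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(cutoff)
    )
    # Under a uniform prior each binomial term integrates to 1 / (n + 1).
    evidence = cutoff / (n + 1)
    return unnormalized / evidence


def posterior_as_beta_mixture(theta: float, n: int = 5, cutoff: int = 3) -> float:
    """Equal-weight mixture of Beta(k + 1, n - k + 1) for k = 0, ..., cutoff - 1."""
    return sum(beta_pdf(theta, k + 1, n - k + 1) for k in range(cutoff)) / cutoff


# Largest pointwise difference between the two expressions on a grid.
grid = [i / 200 for i in range(201)]
max_gap = max(
    abs(posterior_from_likelihood(t) - posterior_as_beta_mixture(t)) for t in grid
)
print(f"largest difference on the grid: {max_gap:.2e}")
```

If the two forms agree, the reported difference should be at the level of floating-point rounding error.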