# Maximum Likelihood Fit

We are analyzing a dataset that contains a normally distributed signal on top of a flat background, measured in 10 equidistant bins of $x$ from 0 to 10, as shown in the figure. The normal distribution has an unknown mean $\mu$ and standard deviation $\sigma$. We define the signal strength $S$ as the expected number of signal entries over the full range of $x$, i.e. not restricted to the range of the histogram. The background $B$ is described by the average number of background entries in a bin of width 1.


In [None]:
import numpy as np
import matplotlib.pyplot as plt

# histogram data, as retrieved from plt.hist()
hdata = np.array([1, 3, 6, 4, 6, 8, 1, 0, 1, 0])
hbins = np.linspace(0,10,num=11)

print(f"{ int(np.sum(hdata)) } entries in total")

plt.bar(hbins[:-1],hdata,align="edge",width=1)
plt.xlabel("x")
plt.ylabel("counts")
plt.show()

## Pure Background

Calculate the likelihood that this dataset can be described by background only. 

Assume that the bin contents are simply statistical fluctuations around the expected background $B$. Think about the distribution of entries per bin, and calculate for each bin the conditional probability $P(y_i | B)$ of finding the observed number of entries $y_i$ in bin $i$, assuming a background $B$.

Report the likelihood values for all bins for $B=1$ and $B=4$.
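
As a possible starting point, and assuming that the entries in each bin are independent Poisson counts, `scipy.stats.poisson.pmf` evaluates the probability of an observed count for a given expectation. The helper name `poisson_prob_per_bin` and the toy counts below are illustrative only and are not part of the assignment interface.

```python
import numpy as np
from scipy.stats import poisson


def poisson_prob_per_bin(counts, expected):
    """P(y_i | expected) for each bin, assuming independent Poisson counts."""
    counts = np.asarray(counts)
    return poisson.pmf(counts, expected)


# toy counts for illustration (not the histogram of this exercise)
toy_counts = np.array([0, 1, 2, 5])
probs = poisson_prob_per_bin(toy_counts, expected=2.0)

print("per-bin probabilities:", probs)
# for independent bins, the joint likelihood is the product of the per-bin values
print("joint likelihood:", np.prod(probs))
```

The product of many small probabilities quickly becomes tiny, so working with the sum of `poisson.logpmf` values is often numerically more convenient.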

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

print("Likelihoods per bin for B=1:", likelihood_bg_bin(hdata, B=1))
print("Likelihoods per bin for B=4:", likelihood_bg_bin(hdata, B=4))

In [None]:
assert np.all( likelihood_bg_bin(hdata, B=1) > 0.0 )

## Best Fit of Background

Which value of $B$ provides the best fit to the data? What is the likelihood of this most likely scenario in the absence of a signal?

Describe the shape of the likelihood function and discuss the uncertainty.
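
One way to explore this is a simple scan over $B$. The sketch below assumes independent Poisson bins with a common expectation $B$ and works with $\ln\mathcal{L}$ for numerical stability. The grid range and the variable names are arbitrary choices. The interval with $\ln\mathcal{L} \geq \ln\mathcal{L}_\mathrm{max} - 0.5$ is a common convention for an approximate 68% interval, which is only exact in the Gaussian limit.

```python
import numpy as np
from scipy.stats import poisson

counts = np.array([1, 3, 6, 4, 6, 8, 1, 0, 1, 0])  # same values as hdata

# grid of background hypotheses (range chosen by eye, step 0.01)
b_grid = np.linspace(0.5, 8.0, 751)

# ln L(B) = sum over bins of ln P(y_i | B); broadcasting gives one row per B
log_l = poisson.logpmf(counts[None, :], b_grid[:, None]).sum(axis=1)

i_best = np.argmax(log_l)
b_best = b_grid[i_best]

# grid points whose ln L lies within 0.5 of the maximum
inside = b_grid[log_l >= log_l[i_best] - 0.5]
b_lo, b_hi = inside.min(), inside.max()

print(f"best B on grid: {b_best:.2f}")
print(f"ln L at maximum: {log_l[i_best]:.3f}")
print(f"Delta ln L = 0.5 interval: [{b_lo:.2f}, {b_hi:.2f}]")
```

For a Poisson model with a single common expectation, the maximum is expected at the mean bin content, which offers a quick cross-check of the scan. An asymmetric interval around the maximum would hint at a non-Gaussian likelihood shape.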

In [None]:
max_likelihood_bg_B = -1 # set to the B value where you find the maximum likelihood
max_likelihood_bg_L = -1 # set to the maximum value of the likelihood

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert max_likelihood_bg_B > 0
assert max_likelihood_bg_L > 0

YOUR ANSWER HERE

## Likelihood Function

Implement a function to calculate the likelihood $\mathcal{L}$ for a parameter vector $(S,\mu,\sigma,B)$, given the measurements shown in the histogram. Calculate and report the likelihood for $S=20$, $\mu=5$, $\sigma=2$ and $B=1$.

*Hint: Remember the distribution of entries per bin from the previous question. Also consider how best to deal with the finite bin width.*
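
One possible way to handle the finite bin width is to integrate the normal distribution over each bin using differences of its cumulative distribution function, rather than evaluating the density at the bin center. The sketch below illustrates only this building block. The function name `expected_counts` is hypothetical, and the background term assumes that $B$ scales linearly with the bin width.

```python
import numpy as np
from scipy.stats import norm


def expected_counts(edges, S, mu, sigma, B):
    """Expected entries per bin: integrated Gaussian signal plus flat background."""
    edges = np.asarray(edges, dtype=float)
    # fraction of the signal in each bin = CDF(upper edge) - CDF(lower edge)
    signal = S * np.diff(norm.cdf(edges, loc=mu, scale=sigma))
    # B is defined per bin of width 1, so scale it with the actual bin width
    background = B * np.diff(edges)
    return signal + background


edges = np.linspace(0, 10, num=11)  # same values as hbins
lam = expected_counts(edges, S=20, mu=5, sigma=2, B=1)

print("expected entries per bin:", lam)
print("expected entries inside the histogram range:", lam.sum())
```

Because $S$ counts the signal over the full range of $x$, the signal part summed over the histogram bins is slightly smaller than $S$ whenever the Gaussian tails extend beyond the histogram.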

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# print a likelihood value for testing
print( "Likelihood:", likelihood(hdata,hbins, 20,5,2,1) )
assert likelihood(hdata,hbins, 20,5,2,1) > 0


## Optimization

For which set of fit parameters is the likelihood maximal? What is the value of the likelihood $\mathcal{L}$ at this point?
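
A common approach is to minimize the negative log-likelihood numerically. The following is a minimal sketch using `scipy.optimize.minimize` with the Nelder-Mead method, assuming Poisson-distributed bin contents and a Gaussian signal integrated over each bin. The function name `neg_log_likelihood`, the start values, and the choice of optimizer are assumptions for illustration. Other optimizers or start values may behave differently.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, poisson

counts = np.array([1, 3, 6, 4, 6, 8, 1, 0, 1, 0])  # same values as hdata
edges = np.linspace(0, 10, num=11)                  # same values as hbins


def neg_log_likelihood(params):
    """-ln L for params = (S, mu, sigma, B), assuming Poisson bin contents."""
    S, mu, sigma, B = params
    # reject unphysical parameter values
    if S < 0 or sigma <= 0 or B < 0:
        return np.inf
    lam = S * np.diff(norm.cdf(edges, mu, sigma)) + B * np.diff(edges)
    if np.any(lam <= 0):
        return np.inf
    return -poisson.logpmf(counts, lam).sum()


# start values (an assumption, here taken from the test point of the previous task)
x0 = np.array([20.0, 5.0, 2.0, 1.0])
res = minimize(neg_log_likelihood, x0, method="Nelder-Mead")

print("converged:", res.success)
print("S, mu, sigma, B =", res.x)
print("ln L at the minimum found:", -res.fun)
```

It is worth checking `res.success` and repeating the fit from different start values, since a simplex search can stop in a local minimum or hit its iteration limit.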

In [None]:
# Set the following variables to the location and value of the maximum likelihood
max_likelihood_sig_S     = -1 # signal strength
max_likelihood_sig_mean  = -1 # mean of normal distribution
max_likelihood_sig_sigma = -1 # std dev of normal distribution
max_likelihood_sig_B     = -1 # background
max_likelihood_sig_L     = -1 # likelihood

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert max_likelihood_sig_S > 1

## Uncertainty

Visualize the likelihood in the neighborhood of the best fit.

Discuss the uncertainty based on your plots of the likelihood. Are the uncertainties Gaussian? Are they correlated or statistically independent?
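
One way to look at correlations is a two-dimensional scan of $-2\,\Delta\ln\mathcal{L}$ in a pair of parameters while the others are held fixed. The sketch below scans $S$ and $B$. The fixed values of $\mu$ and $\sigma$ are illustrative placeholders and not fit results, and the function names are hypothetical. The contour levels 2.30 and 6.18 correspond to roughly 68% and 95% regions for two parameters in the Gaussian approximation.

```python
import numpy as np
from scipy.stats import norm, poisson

counts = np.array([1, 3, 6, 4, 6, 8, 1, 0, 1, 0])  # same values as hdata
edges = np.linspace(0, 10, num=11)                  # same values as hbins


def log_likelihood(S, mu, sigma, B):
    """ln L assuming Poisson bin contents and a bin-integrated Gaussian signal."""
    lam = S * np.diff(norm.cdf(edges, mu, sigma)) + B * np.diff(edges)
    return poisson.logpmf(counts, lam).sum()


# placeholder shape parameters (NOT fit results); replace with your best-fit values
mu_fix, sigma_fix = 4.0, 1.5

s_grid = np.linspace(5.0, 45.0, 81)
b_grid = np.linspace(0.1, 3.0, 59)

# rows follow B, columns follow S, as expected by plt.contour(x, y, z)
log_l = np.array([[log_likelihood(S, mu_fix, sigma_fix, B) for S in s_grid]
                  for B in b_grid])
delta_2nll = -2.0 * (log_l - log_l.max())


def plot_scan():
    """Draw contours of -2 Delta ln L in the (S, B) plane."""
    import matplotlib.pyplot as plt
    cs = plt.contour(s_grid, b_grid, delta_2nll, levels=[2.30, 6.18])
    plt.clabel(cs)
    plt.xlabel("S")
    plt.ylabel("B")
    plt.show()
```

Calling `plot_scan()` draws the contours. Contours that look like tilted ellipses would suggest correlated parameters, while clearly non-elliptical contours would indicate non-Gaussian uncertainties. Fixing the other parameters shows a slice of the likelihood. A profile scan that re-optimizes them at each grid point is an alternative.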

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# YOUR CODE HERE
raise NotImplementedError()