<a href="https://colab.research.google.com/github/wdconinc/practical-computing-for-scientists/blob/master/Projects/Project1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Project 1: Determination of neutrino oscillation parameters

### Group members: *please fill this out with your names*

In [0]:
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import math
import numpy.ma as ma

##Basic situation

You'll be doing some analysis of the behavior of neutrinos, a particularly mysterious and interesting fundamental particle. For the purpose of this exercise here's what you need to know about neutrinos:

###Weak, neutral and partnered

<img src="http://www.fnal.gov/pub/science/inquiring/matter/madeof/standardmodel.jpg" width=300>

There are three types of neutrinos $\nu_e, \nu_\mu, \nu_\tau$, and each is partnered with a charged lepton $e$ (the electron), $\mu$ (muon) and $\tau$ (tau) by the weak nuclear force (the W and Z bosons above). Neutrinos have no electric charge and interact so weakly that the earth is nearly transparent to them. They are also very, very light and for a long time we thought there was a good reason for them to have zero mass.

###A beam of neutrinos

<img src="http://inspirehep.net/record/1241411/files/Figures_NuMIBeamline.png" width=800>

We produce the neutrinos at Fermilab by smashing high energy protons into a graphite target. This makes pions (unstable combinations of u and d-quarks) which eventually decay to produce muon neutrinos $\nu_\mu$.

###We shoot the beam to our secret underground base in northern Minnesota

<img src="http://minnesota.publicradio.org/collections/special/columns/statewide/NOVA-Looking-North-large.jpg" width=300>

<img src="https://www-off-axis.fnal.gov/images/numi_lr.jpg" width=300>

But not so secret that it doesn't have a website: 
http://www.soudan.umn.edu/index.html

No, there is no tunnel. The neutrino beam is hardly attenuated at all by the earth.

###We measure the neutrinos with a massive magnetized iron detector

<img src="http://www.hep.ucl.ac.uk/minos/photos/fdminos.jpg" width=600>

#### It weighs 5400 tons and can measure the energy of muon neutrinos.  

**Q:** Wait, if the earth is transparent, how can the detector see a neutrino?!

**A:** It almost never does. Every day the beam produces roughly $8 \cdot 10^{17}$ neutrinos. The neutrino detector in Minnesota observes 1-2 $\nu_\mu$ per day. It just gets super lucky.

**Q:** How does it see neutrinos?  

**A:** Neutrinos are only visible when they interact, producing a muon and other charged particles. The detector observes the energy that those particles lose due to electromagnetic interactions.

##Reading Google Sheets data into python

The data for this project is stored in the [data spreadsheet](https://drive.google.com/open?id=1ZuhqHP6E7pt4G9ZbktJiW1GT_butvOvOo7UqcSAglpg) on the Team Drive. If you look at this link more carefully, you will see that it contains the file id `1Zuh...glpg`. This uniquely identifies the data spreadsheet among all google drive files.

We will use several external modules to authenticate this python runtime for access to Google Drive and to access the data in the spreadsheet. If you are curious where this all comes from, take a look at the example Google Colaboratory notebook on [input and output](https://colab.research.google.com/notebooks/io.ipynb).

In [0]:
# Module to authenticate: follow the instructions
from google.colab import auth
auth.authenticate_user()

# Download, install and load gspread modules
!pip install --upgrade -q gspread
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())

# Open the data spreadsheet by file ID
sheet = gc.open_by_key('1ZuhqHP6E7pt4G9ZbktJiW1GT_butvOvOo7UqcSAglpg')
# Get the two worksheets of interest
sheet_sims = sheet.worksheet('simulation')
sheet_data = sheet.worksheet('data')

# col_values gives the entries of one column as a list (we request unformatted values to get numbers, not formatted strings)
sims = sheet_sims.col_values(1, value_render_option = 'UNFORMATTED_VALUE')
data = sheet_data.col_values(1, value_render_option = 'UNFORMATTED_VALUE')

# Print size of the data sets
print(len(sims), len(data), "data points in simulation and data tables, respectively")

20000 1339 data points in simulation and data tables, respectively


In [0]:
print(sims)

[6.65, 2.17, 2.28, 3.07, 3.68, 6.82, 4.36, 1.6, 1.87, 2.46, 4.31, 2.81, 4.74, 1.77, 2.31, 5.21, 2.77, 3.42, 3.31, 5.13, 3.75, 3.71, 2.77, 4.2, 4.13, 4, 1.81, 1.61, 3.85, 1.96, 2.81, 3.56, 1.9, 4.09, 3.01, 2.04, 3.08, 1.79, 3.13, 5.35, 3.86, 4.51, 2.94, 2.46, 3.43, 2.79, 1.98, 4.04, 2.87, 4.91, 3.26, 3.82, 3.21, 3.08, 4.74, 6.17, 3.61, 5.18, 4.14, 4.15, 4.01, 4.58, 4.4, 1.98, 2.85, 6.57, 1.08, 2.64, 2.48, 3.68, 4.29, 5.64, 4.29, 2.19, 4.59, 4.21, 2.2, 2.51, 5.22, 4.71, 3.24, 2.57, 4.01, 3.37, 1.64, 3.04, 2.61, 3.71, 5.43, 2.73, 1.91, 2.29, 3.75, 3.86, 4.12, 3.99, 1.29, 3.72, 4.25, 3.04, 4.83, 2.72, 2.96, 2.57, 2.31, 4.51, 3.43, 4.01, 4.41, 1.63, 3.61, 1.26, 2.41, 5.75, 3.47, 3.53, 2.97, 3.08, 4.7, 2.27, 2.63, 3.84, 2.92, 2.06, 3.06, 1.99, 4.65, 2.67, 3.1, 4.67, 4.21, 3.53, 1.02, 4.87, 2.65, 3.53, 5.18, 3.81, 5.76, 4.63, 3.02, 3.35, 3.35, 4.68, 5.41, 2.82, 2.67, 2.22, 3.27, 2.46, 6.22, 2.86, 2.34, 2.68, 2.43, 2.57, 3.78, 2.79, 1.66, 5.73, 2.03, 2.2, 2.96, 3.68, 3.63, 4.03, 5.69, 4.56, 4.

##Read in simulated data and make a histogram

The first sheet (labeled *simulation*) of the spreadsheet is a list of the energy of each muon neutrino interaction observed (called an _event_) by the detector. This list was produced by a detailed Monte Carlo (MC) simulation of the neutrino beam, neutrino interactions, electricity and magnetism, the size, location, and geometry of the detector, and the way the detector sees charged particles. It assumes that the earth is transparent to neutrinos and that they travel on straight-line paths to Minnesota. Such simulations rely on computer-generated pseudo-random numbers; they are referred to as "Monte Carlos", after the casino in Monaco.

The energies are in units of GeV = 1 billion (Giga) electron-volts (eV). In those units the proton has a mass of $0.938~\mathrm{GeV/c}^2$ $(E=mc^2)$. In particle physics we normally choose a unit system in which $c=1$ which allows us to refer to masses (and momenta) in GeV units. Read the list of energies into an `np.ndarray` and then make a histogram (`plt.hist`) with 20 energy ranges (called _bins_), starting at 0 GeV and ending at 20 GeV.  This should make a plot. Label the x axis "neutrino energy (GeV)" and the y axis "number of entries per bin".
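A minimal sketch of this step is shown here. So that it runs on its own, it uses a made-up stand-in sample (`sims_standin`, drawn from a gamma distribution that is not the real beam spectrum); in the notebook you would pass the `sims` list from the cell above instead. The variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the `sims` list read from the spreadsheet (NOT the real spectrum).
rng = np.random.default_rng(seed=1)
sims_standin = rng.gamma(shape=4.0, scale=1.0, size=20000)

# Convert the list of energies to a float ndarray.
sim_energy = np.asarray(sims_standin, dtype=float)

# 20 bins of 1 GeV each between 0 and 20 GeV.
fig, ax = plt.subplots()
sim_counts, bin_edges, _ = ax.hist(sim_energy, bins=20, range=(0, 20))
ax.set_xlabel("neutrino energy (GeV)")
ax.set_ylabel("number of entries per bin")
```

`ax.hist` returns the bin contents and the 21 bin edges, which are needed again for the data/MC ratio.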

Now, imagine that we kept on simulating the experiment and created another neutrino interaction. We'd find that it has some energy and clearly that energy isn't fixed. In fact, we can think of the neutrino energy as a random variable which takes on a different value every time we measure it. If we divided each y value in the histogram (the _bin contents_) by the total number of events placed into the histogram (the _number of entries_), we would end up with a plot that shows, on the y-axis, the probability of the new interaction falling within each of the energy bins. A histogram of the list of event energies (a list of _random variates_ of the random variable) therefore serves as an approximation to the probability distribution function for the neutrino energy.
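The normalization described above can be sketched with `np.histogram`, which computes the same bin contents as `plt.hist` without drawing anything. The sample below is a hypothetical stand-in, not the simulated energies.

```python
import numpy as np

# Hypothetical stand-in sample of event energies in GeV.
rng = np.random.default_rng(seed=2)
energies = rng.gamma(shape=4.0, scale=1.0, size=5000)

counts, edges = np.histogram(energies, bins=20, range=(0, 20))

# Divide the bin contents by the number of entries in the histogram.
n_entries = counts.sum()
prob_per_bin = counts / n_entries

# The per-bin probabilities add up to one (within rounding).
print(prob_per_bin.sum())
```

Events outside the 0 to 20 GeV range are not placed into the histogram, so they do not count toward `n_entries`.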

##Read in actual data and histogram

Now do the same as above but using the data in the second sheet (labeled *data*) of the spreadsheet.

##They are different.

The difference is really great to see because there is some new physics to discover here!  What's going on?

First, there is a completely trivial difference. We simulated the experiment for 10 times longer than we actually took data. This is useful because it makes statistical fluctuations in the simulation small enough to be negligible. But, it also means when comparing the simulation histogram to the data histogram we need to divide the bin contents of the former by a factor of 10.

Second, you can see that there is a shape difference. It can be better characterized by looking at the ratio of data/MC as a function of the neutrino energy.

##Construct the data/MC ratio and plot with error bars

The `plt.hist` function returns the bin contents and bin edges in arrays.  You want to construct the data/MC ratio of the bin contents and make a plot vs. neutrino energy. Clearly, a bin spans a range of energies but the most representative value is the center of the bin. You will want to plot the ratio as points with error bars. Recall that in the single-photon version of the double slit experiment we were doing a counting experiment and in that case the uncertainty is the square root of the number of counts. We are in the same situation here. Each bin of each of the histograms is an independent counting experiment, so estimate the uncertainties accordingly. Make sure your axes are labeled.
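One possible way to set this up is sketched below, assuming standard error propagation for a ratio of two independent counts (relative uncertainties added in quadrature). The three bins of counts are invented for illustration and are not taken from the spreadsheet.

```python
import numpy as np

# Hypothetical bin contents for three bins (NOT the real histograms).
data_counts = np.array([40.0, 90.0, 25.0])
mc_counts = np.array([1000.0, 1200.0, 250.0])
bin_edges = np.array([0.0, 1.0, 2.0, 3.0])
mc_scale = 10.0  # the simulation ran 10 times longer than the data taking

# The most representative energy of a bin is its center.
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

# Scale the simulation down to the data exposure, then take the ratio.
mc_expected = mc_counts / mc_scale
ratio = data_counts / mc_expected

# Counting uncertainty sigma_N = sqrt(N), so (sigma_N / N)^2 = 1 / N for
# each histogram; the constant scale factor drops out of the relative error.
ratio_err = ratio * np.sqrt(1.0 / data_counts + 1.0 / mc_counts)

print(ratio)
print(ratio_err)
```

The points can then be drawn with `plt.errorbar(bin_centers, ratio, yerr=ratio_err, fmt="o")`. Bins with zero counts would cause a division by zero here and need to be masked or dropped first; `numpy.ma`, imported at the top of the notebook, is one way to do that.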

##Neutrino oscillations

Now, here's the physics, much of which can be understood with the knowledge you'll learn in QM-I and Modern. For simplicity, we'll just consider the case where we have two neutrinos $\nu_\mu$ and $\nu_\tau$. As it turns out that's a very good approximation in our situation. 

#### mass and flavor eigenstates

_If neutrinos have mass_ we can express any neutrino wave function in terms of the eigenstates of the Hamiltonian $\mathcal{H}$ (the energy operator in QM):

$$ \mathcal{H} | \nu_1 \rangle = E_1 | \nu_1 \rangle $$
$$ \mathcal{H} | \nu_2 \rangle = E_2 | \nu_2 \rangle $$

If the neutrino is at rest, then the energy is just the rest energy - the mass:

$$ \mathcal{H} | \nu_1 \rangle = m_1 | \nu_1 \rangle $$
$$ \mathcal{H} | \nu_2 \rangle = m_2 | \nu_2 \rangle $$


Those are the _mass eigenstates_. We can also express any neutrino wavefunction in terms of flavor ($\nu_\mu$,$\nu_\tau$) states, defined by which charged lepton ($\mu$,$\tau$) they are partnered with. As it turns out, there is no reason that a neutrino flavor eigenstate also corresponds to a mass eigenstate. Instead they could be related by a mixing matrix:

$$
\left[
\begin{array}{l}
\nu_\mu\\
\nu_\tau
\end{array}
\right]
=
\left[
\begin{array}{ll}
\cos\theta & \sin\theta \\
-\sin\theta & \cos\theta
\end{array}
\right]
\left[
\begin{array}{l}
\nu_1\\
\nu_2
\end{array}
\right]
$$
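The mixing matrix is a plain rotation by the angle $\theta$, so it preserves normalization. A quick numerical check, with an arbitrary illustrative value of `theta`:

```python
import numpy as np

theta = 0.6  # arbitrary illustrative mixing angle in radians

# Rows are the flavor states (nu_mu, nu_tau), columns the mass states (nu_1, nu_2).
U = np.array([[np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# A rotation matrix is orthogonal: U U^T is the identity.
print(np.allclose(U @ U.T, np.eye(2)))  # prints True

# Probabilities of finding nu_1 or nu_2 inside a nu_mu add up to one.
nu_mu_content = U[0] ** 2
print(nu_mu_content, nu_mu_content.sum())
```

For $\theta = 0$ the matrix is the identity and each flavor state coincides with a mass state; for $\theta = \pi/4$ each flavor state is an equal mixture of the two.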

#### Born as a flavor, moves as a mass

When a pion decays it produces a muon and, since they are partners, a muon neutrino.  The neutrinos in our neutrino beam therefore start ($t=0$) as pure $\nu_\mu$ states, which we can write as a combination of $\nu_1$ and $\nu_2$ states:

$$| \nu_\mu (0) \rangle = \cos\theta |\nu_1 \rangle + \sin\theta | \nu_2 \rangle $$

According to Schroedinger's equation, the wavefunction at a later time is:

$$| \nu_\mu (t) \rangle =  e^{-i E_1 t}\cos\theta |\nu_1 \rangle + e^{-i E_2 t} \sin\theta | \nu_2 \rangle $$

#### $\nu_\mu$ disappearance

If we create a $\nu_\mu$ and then observe it later, the probability that we see a $\nu_\mu$ is:

$$ P(\nu_\mu \to \nu_\mu) = |\langle \nu_\mu (0) | \nu_\mu (t) \rangle|^{2} = \left| e^{-i E_1 t}\cos^2\theta + e^{-i E_2 t} \sin^2\theta \right|^2 $$

With a little manipulation this becomes the _oscillation formula_:

$$ P(\nu_\mu \to \nu_\mu) =  1 - \sin^2 2\theta \sin^2\left(1.27 \frac{\Delta m^2 L}{E}\right)$$

where $\Delta m^2 \equiv m_2^2 - m_1^2$ in units of $\mathrm{eV^2}$, $L$ is the distance between creation and detection ($L=735~\mathrm{km}$ in our case) and $E$ is the neutrino energy in GeV. The $1.27$ takes care of the units, factors of $\pi$, etc.
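The manipulation behind the oscillation formula can be sketched as follows, under the usual simplifying assumptions (not spelled out above) of natural units $\hbar = c = 1$ and ultra-relativistic neutrinos with a common momentum $p \approx E$. Expanding the squared modulus and using $\cos^4\theta + \sin^4\theta = 1 - 2\sin^2\theta\cos^2\theta$:

$$ P(\nu_\mu \to \nu_\mu) = \cos^4\theta + \sin^4\theta + 2\sin^2\theta\cos^2\theta\cos\left[(E_2 - E_1)t\right] = 1 - \sin^2 2\theta \sin^2\left(\frac{(E_2 - E_1)t}{2}\right) $$

With $E_i = \sqrt{p^2 + m_i^2} \approx p + m_i^2/(2p)$ and $t \approx L$:

$$ \frac{(E_2 - E_1)t}{2} \approx \frac{\Delta m^2 L}{4E} $$

Restoring $\hbar c$ and converting to $\mathrm{eV}^2$, km and GeV turns the factor $1/4$ into $1.27$. The snippet below checks both steps numerically and wraps the formula in a function; the name `survival_probability` and its arguments are illustrative choices.

```python
import numpy as np

def survival_probability(energy_gev, sin2_2theta, delta_m2_ev2, baseline_km=735.0):
    """P(nu_mu -> nu_mu) from the two-flavor oscillation formula."""
    energy_gev = np.asarray(energy_gev, dtype=float)
    phase = 1.27 * delta_m2_ev2 * baseline_km / energy_gev
    return 1.0 - sin2_2theta * np.sin(phase) ** 2

# Step 1: compare |amplitude|^2 with the closed form for arbitrary test values.
theta, e1, e2, t = 0.6, 1.0, 1.3, 2.0
amplitude = (np.exp(-1j * e1 * t) * np.cos(theta) ** 2
             + np.exp(-1j * e2 * t) * np.sin(theta) ** 2)
closed_form = 1.0 - np.sin(2 * theta) ** 2 * np.sin(0.5 * (e2 - e1) * t) ** 2
print(np.isclose(abs(amplitude) ** 2, closed_form))  # prints True

# Step 2: units factor 1 / (4 hbar c) for dm2 in eV^2, L in km, E in GeV.
hbar_c_ev_m = 197.327e6 * 1e-15  # hbar * c = 197.327 MeV fm, expressed in eV m
units_factor = 1e3 / (4.0 * 1e9 * hbar_c_ev_m)  # 1 km = 1e3 m, 1 GeV = 1e9 eV
print(round(units_factor, 2))  # prints 1.27
```

With $\Delta m^2 = 0$ the function returns 1 at every energy: massless (or mass-degenerate) neutrinos do not oscillate.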


##Plot the oscillation function for a few parameter values

Plot the oscillation formula for values of $\Delta m^2$:

$$\Delta m^2 = 5\times\{10^{-4},10^{-3},10^{-2},10^{-1}\}\quad \mathrm{eV}^{2}$$

and 3 values of $\sin^2 2\theta$:

$$\sin^2 2\theta=\{0.5,0.75,1.0\}$$

with $L=735~\mathrm{km}$.

Make the plot in a 2x2 grid, with different $\Delta m^2$ in different figures, and different $\sin^2 2\theta$ as different lines in the same figure.

Make sure each figure indicates the $\Delta m^2$ value and that there is a legend in one of the 4 figures indicating which $\sin^2 2\theta$ values correspond to which curves. Make sure the curve for each value of $\sin^2 2\theta$ has the same color and line style (solid, dashed, etc.) in all 4 plots. Write $\sin^2 2\theta = 0.5$ as `\sin^2 2\\theta = 0.5` (the extra `\` is needed because Python would otherwise read the `\t` in `\theta` as a tab character; a raw string such as `r'$\sin^2 2\theta = 0.5$'` avoids the problem as well).

Label the x axis as "Neutrino Energy (GeV)" and the Y axis as "Survival Probability".
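One way to lay out such a grid is sketched below. The helper `survival_probability`, the energy range, and the particular colors and line styles are illustrative choices, not requirements.

```python
import numpy as np
import matplotlib.pyplot as plt

def survival_probability(energy_gev, sin2_2theta, delta_m2_ev2, baseline_km=735.0):
    """Two-flavor survival probability (illustrative helper)."""
    return 1.0 - sin2_2theta * np.sin(1.27 * delta_m2_ev2 * baseline_km / energy_gev) ** 2

energy = np.linspace(0.1, 20.0, 2000)  # start above 0 to avoid dividing by zero
dm2_values = [5e-4, 5e-3, 5e-2, 5e-1]  # eV^2, one per panel
sin2_values = [0.5, 0.75, 1.0]         # one curve each
line_styles = ["-", "--", ":"]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, dm2 in zip(axes.flat, dm2_values):
    for i, (s2, ls) in enumerate(zip(sin2_values, line_styles)):
        # Fixing color and style by index keeps them identical in all panels.
        ax.plot(energy, survival_probability(energy, s2, dm2),
                color=f"C{i}", linestyle=ls,
                label=rf"$\sin^2 2\theta = {s2}$")
    ax.set_title(rf"$\Delta m^2 = {dm2:g}\,\mathrm{{eV}}^2$")
    ax.set_xlabel("Neutrino Energy (GeV)")
    ax.set_ylabel("Survival Probability")
axes[0, 0].legend()
fig.tight_layout()
```

The labels use raw f-strings (`rf"..."`), so no doubled backslashes are needed; literal braces inside them are written as `{{` and `}}`.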

##Estimating the oscillation parameters: $\chi^2$-by-eye

Plot the oscillation function atop the data/MC ratio and vary the parameters to make the curve run through the datapoints as best you can. With this procedure you can estimate the true parameter values.

##Compute the $\chi^2$ for a grid in $\sin^2 2\theta$ and $\Delta m^2$

You want to make a plot showing the $\chi^2$ as a function of the parameters $\sin^2 2\theta$ and $\Delta m^2$. Suggestions:

* Define a function `chi2` which computes the $\chi^2$, taking as arguments the data/MC ratio and the uncertainties on it as vectors, and $\sin^2 2\theta$ and $\Delta m^2$ as scalars (single numbers).

* Define a 2D grid, with $\sin^2 2\theta$ on the x axis and $\Delta m^2$ on the y axis. Choose the range to show the region around your $\chi^2$-by-eye estimate.

* Using loops, compute an `ndarray` containing the $\chi^2$ at each grid point and display the result as a `contourf`, with colormap and axis labels. 

* Vary your range so it's small enough to see the approximate location of the minimum but large enough to see some of the structure around it.
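The suggestions above can be sketched as follows. The ratio points are pseudo-data generated from the oscillation formula with arbitrary parameter values plus noise; they are not the measured data/MC ratio. As an assumption of this sketch, `chi2` also takes the bin-center energies as an argument, since the model has to be evaluated somewhere.

```python
import numpy as np
import matplotlib.pyplot as plt

def survival_probability(energy, sin2_2theta, delta_m2, baseline_km=735.0):
    return 1.0 - sin2_2theta * np.sin(1.27 * delta_m2 * baseline_km / energy) ** 2

def chi2(energy, ratio, ratio_err, sin2_2theta, delta_m2):
    """Sum of squared, uncertainty-weighted residuals between ratio and model."""
    model = survival_probability(energy, sin2_2theta, delta_m2)
    return np.sum(((ratio - model) / ratio_err) ** 2)

# Made-up pseudo-data standing in for the data/MC ratio.
rng = np.random.default_rng(seed=3)
energy = np.arange(0.5, 20.0, 1.0)       # centers of 20 bins, 1 GeV wide
ratio_err = np.full(energy.shape, 0.08)  # invented uncertainties
ratio = survival_probability(energy, 0.8, 3e-3) + rng.normal(0.0, ratio_err)

# Grid: sin^2(2 theta) along x (columns), Delta m^2 along y (rows).
sin2_grid = np.linspace(0.5, 1.0, 51)
dm2_grid = np.linspace(1e-3, 5e-3, 81)
chi2_grid = np.empty((dm2_grid.size, sin2_grid.size))
for i, dm2 in enumerate(dm2_grid):
    for j, s2 in enumerate(sin2_grid):
        chi2_grid[i, j] = chi2(energy, ratio, ratio_err, s2, dm2)

fig, ax = plt.subplots()
filled = ax.contourf(sin2_grid, dm2_grid, chi2_grid, levels=30, cmap="viridis")
fig.colorbar(filled, ax=ax, label=r"$\chi^2$")
ax.set_xlabel(r"$\sin^2 2\theta$")
ax.set_ylabel(r"$\Delta m^2$ (eV$^2$)")
```

Note the index order: `contourf(x, y, z)` expects `z[row, column]` with rows following `y`, which is why the $\Delta m^2$ loop is the outer one.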

##Minimum $\chi^2$ and a fit

Modify your code above to record the smallest $\chi^2$ and its location in the plane.  

Then, conduct a $\chi^2$ fit of the oscillation formula to the data/MC ratio, using the techniques we applied in the double-slit fitting exercise. Start the fit at the minimum you found.  Report the $\chi^2_{BF}$ at the best fit point as well as the parameters and their uncertainties.
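A sketch of both steps follows. It assumes `scipy.optimize.curve_fit` as the fitting tool, which may or may not be what was used in the double-slit exercise, and it runs on made-up pseudo-data rather than the real ratio.

```python
import numpy as np
from scipy.optimize import curve_fit

BASELINE_KM = 735.0

def survival_probability(energy, sin2_2theta, delta_m2):
    return 1.0 - sin2_2theta * np.sin(1.27 * delta_m2 * BASELINE_KM / energy) ** 2

def chi2(energy, ratio, ratio_err, sin2_2theta, delta_m2):
    model = survival_probability(energy, sin2_2theta, delta_m2)
    return np.sum(((ratio - model) / ratio_err) ** 2)

# Made-up pseudo-data standing in for the data/MC ratio.
rng = np.random.default_rng(seed=4)
energy = np.arange(0.5, 20.0, 1.0)
ratio_err = np.full(energy.shape, 0.08)
ratio = survival_probability(energy, 0.8, 3e-3) + rng.normal(0.0, ratio_err)

# Coarse grid scan, recording the smallest chi2 and where it occurs.
sin2_grid = np.linspace(0.5, 1.0, 26)
dm2_grid = np.linspace(1e-3, 5e-3, 41)
chi2_grid = np.array([[chi2(energy, ratio, ratio_err, s2, dm2)
                       for s2 in sin2_grid] for dm2 in dm2_grid])
i_min, j_min = np.unravel_index(np.argmin(chi2_grid), chi2_grid.shape)
start = [sin2_grid[j_min], dm2_grid[i_min]]
print("grid minimum:", chi2_grid[i_min, j_min], "at", start)

# Weighted least-squares fit starting from the grid minimum.
best, cov = curve_fit(survival_probability, energy, ratio,
                      p0=start, sigma=ratio_err, absolute_sigma=True)
errors = np.sqrt(np.diag(cov))  # 1-sigma parameter uncertainties
chi2_bf = chi2(energy, ratio, ratio_err, *best)

print("chi2_BF =", chi2_bf)
print("sin^2 2theta =", best[0], "+/-", errors[0])
print("Delta m^2 =", best[1], "+/-", errors[1])
```

Passing `absolute_sigma=True` tells `curve_fit` to treat `ratio_err` as real uncertainties instead of relative weights, so the covariance matrix is not rescaled by the fit quality.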

##Plot the best fit

Plot the oscillation formula, with the best fit parameters, on top of the data/MC ratio.
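A possible layout for this plot is sketched here. The ratio points are pseudo-data and `best_sin2`, `best_dm2` are placeholder values, not fit results; substitute your own arrays and best-fit parameters.

```python
import numpy as np
import matplotlib.pyplot as plt

def survival_probability(energy, sin2_2theta, delta_m2, baseline_km=735.0):
    return 1.0 - sin2_2theta * np.sin(1.27 * delta_m2 * baseline_km / energy) ** 2

# Placeholders: NOT real fit results or real data.
best_sin2, best_dm2 = 0.8, 3e-3
rng = np.random.default_rng(seed=5)
bin_centers = np.arange(0.5, 20.0, 1.0)
ratio_err = np.full(bin_centers.shape, 0.08)
ratio = survival_probability(bin_centers, best_sin2, best_dm2) + rng.normal(0.0, ratio_err)

# Evaluate the curve on a fine grid so the oscillation dip is drawn smoothly.
fine_energy = np.linspace(0.25, 20.0, 500)
curve = survival_probability(fine_energy, best_sin2, best_dm2)

fig, ax = plt.subplots()
ax.errorbar(bin_centers, ratio, yerr=ratio_err, fmt="o", label="data/MC")
ax.plot(fine_energy, curve, label="best fit")
ax.set_xlabel("neutrino energy (GeV)")
ax.set_ylabel("data/MC ratio")
ax.legend()
```

Evaluating the curve only at the 20 bin centers would draw straight segments between them and hide the shape of the dip at low energy.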

##Uncertainties in 2D

As it turns out, we can estimate the uncertainties directly from the 2D $\chi^2$ plot. Here is what you do:

* Instead of plotting $\chi^2$ you want to plot $\Delta \chi^2 = \chi^2 - \chi^2_{BF}$.
* Then, draw contours for $\Delta \chi^2=[2.30, 4.61, 5.99]$. You may need to change the range and number of grid points to get smooth contours.
* The contours correspond to _confidence intervals_. If we were able to repeat our experiment many, many times, and we repeated all of our analysis, the contours we'd draw would contain the true value of $(\Delta m^2,\sin^2 2\theta)$ 68, 90, and 95% of the time, respectively.
  * With some additional ingredients, we can infer the probability that our single confidence interval contains the true value. But, this is beyond the scope of the problem.
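The three contour levels are not arbitrary. In the Gaussian approximation, $\Delta\chi^2$ for two fitted parameters follows a $\chi^2$ distribution with 2 degrees of freedom, whose cumulative distribution is $1 - e^{-x/2}$. The sketch below inverts it and draws the contours on a made-up bowl-shaped surface that stands in for your computed grid.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invert the 2-degree-of-freedom chi2 CDF, 1 - exp(-x / 2), at each coverage.
coverage = np.array([0.6827, 0.90, 0.95])
levels = -2.0 * np.log(1.0 - coverage)
print(np.round(levels, 2))  # approximately 2.30, 4.61, 5.99

# Made-up chi2 surface (NOT computed from data), minimum at (0.8, 3e-3).
sin2_grid = np.linspace(0.6, 1.0, 101)
dm2_grid = np.linspace(2e-3, 4e-3, 101)
S, D = np.meshgrid(sin2_grid, dm2_grid)
chi2_grid = 17.0 + ((S - 0.8) / 0.05) ** 2 + ((D - 3e-3) / 2e-4) ** 2

# Here the grid minimum stands in for chi2_BF from the fit.
delta_chi2 = chi2_grid - chi2_grid.min()

fig, ax = plt.subplots()
contours = ax.contour(S, D, delta_chi2, levels=levels, colors=["C0", "C1", "C2"])
ax.set_xlabel(r"$\sin^2 2\theta$")
ax.set_ylabel(r"$\Delta m^2$ (eV$^2$)")
```

If the contours look jagged, increasing the number of grid points or narrowing the range usually helps.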

##Systematic uncertainties

Close out the project by considering the effect of two sources of systematic uncertainty.

* First, what happens if we have mis-measured the neutrino energy by 5%?  To estimate the effect of this uncertainty, shift the energy of each simulated event upwards by 5%, recalculate the ratio, and redo the fit. How much do the parameters change? Do the same for a downward shift.

* Second, what happens if our real detector "misses" 3% of the neutrinos? Perhaps we didn't know it was off for a while, or some part of it wasn't working right. Scale the contents of the MC energy histogram up and down by 3%, recalculate the ratio, redo the fit, and report on how the parameters change.
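The four varied MC histograms can be built as sketched below, using a made-up stand-in for the simulated energies. Each varied histogram would then go through the same ratio and fit steps as the nominal one.

```python
import numpy as np

# Stand-in for the simulated event energies (NOT the real simulation).
rng = np.random.default_rng(seed=6)
sim_energy = rng.gamma(shape=4.0, scale=1.0, size=20000)

bins, energy_range = 20, (0.0, 20.0)
nominal, _ = np.histogram(sim_energy, bins=bins, range=energy_range)

mc_variations = {
    # Energy scale: shift every event, then re-histogram (events migrate between bins).
    "energy +5%": np.histogram(1.05 * sim_energy, bins=bins, range=energy_range)[0],
    "energy -5%": np.histogram(0.95 * sim_energy, bins=bins, range=energy_range)[0],
    # Normalization: scale the bin contents, the shape stays the same.
    "normalization +3%": 1.03 * nominal,
    "normalization -3%": 0.97 * nominal,
}

for name, counts in mc_variations.items():
    print(name, counts.sum())
```

The two kinds of variation act differently: the energy shift changes the shape of the MC spectrum, while the normalization change moves the whole data/MC ratio up or down by a constant factor.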