# Module 6 Linear Fit
(Partially adapted from Newman 2012 Chapter 3, Page 122)

It's a common situation in physics that an experiment produces data that lies roughly on a straight line, like the dots in this figure:

![image-2.png](attachment:image-2.png)

The blue solid line here represents the underlying straight-line relationship, which we
usually don't know beforehand, and the points representing the measured data lie
roughly along the line but don't fall exactly on it, typically because of
measurement error.

The straight line can be represented in the familiar form $y=mx+c$ and a
frequent question is what the appropriate values of the **slope** $m$ and
**intercept** $c$ are that correspond to the measured data.  Since the data
don't fall perfectly on a straight line, there is no perfect answer to such
a question, but we can find the straight line that gives the best
compromise fit to the data.  The standard technique for doing this is the
**method of least squares**.


In your working directory, alongside this notebook, you'll find a data file called
**millikan.txt**.  The file contains two columns of numbers, giving
the $x$ and $y$ coordinates of a set of data points.
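
As a minimal sketch of reading a two-column text file of this kind, the snippet below uses `np.loadtxt`. It assumes whitespace-separated columns with no header line, which is an assumption about the layout of **millikan.txt**. The file `demo_points.txt` and its three rows are made-up stand-ins so that the example runs on its own.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in file with three made-up (x, y) rows, NOT Millikan's data.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_points.txt")
with open(demo_path, "w") as f:
    f.write("1.0 2.0\n2.0 4.1\n3.0 5.9\n")

# loadtxt reads the file into an array of shape (N, 2); unpack=True transposes
# it so that each column comes back as its own 1-D array.
x_demo, y_demo = np.loadtxt(demo_path, unpack=True)

print(x_demo)
print(y_demo)
```

If the real file follows the same layout, the same call with `"millikan.txt"` should return its two columns, and `plt.plot(x, y, "o")` would then draw one dot per point.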
    
    


<div class="span alert alert-success">

Write a program to read the data points in  **millikan.txt** and make a graph with one dot for each point.


In [None]:
#your code here
import numpy as np
import matplotlib.pyplot as plt

plt.plot()

These values are taken from a historic experiment by Robert Millikan that measured the
  **photoelectric effect**.  When light of an appropriate wavelength is
  shone on the surface of a metal, the photons in the light can strike
  conduction electrons in the metal and, sometimes, eject them from the
  surface into the free space above.  The energy of an ejected electron is
  equal to the energy of the photon that struck it minus a small
  electric potential energy $e\phi$ called the **work function** of the surface,
  which represents the energy needed to remove an electron from the
  surface. In other words,

\begin{align}
\text{Energy of ejected electron} & = \text{Photon Energy} - \text{Work Function } \\
eV  &= h\nu - e\phi.
\end{align}

The energy of a photon is $h\nu$, where $h$ is Planck's
  constant and $\nu$ is the frequency of the light, and we can measure the
  energy of an ejected electron by measuring the voltage $V$ that is just
  sufficient to stop the electron moving.  Then the voltage, frequency, and
  work function are related by the equation

$$
V = {h\over e}\nu - \phi,
$$
where $e$ is the charge on the electron.  This equation was first given by
Albert Einstein in 1905.

The data in the file **millikan.txt** represent frequencies $\nu$ in
hertz (first column) and voltages $V$ in volts (second column) from
photoelectric measurements of this kind.  

**In this project, given the charge on the electron
$e=1.602\times10^{-19}\,$C, calculate from Millikan's experimental data a
value for Planck's constant $h$.**  We will compare our value of $h$ with the accepted value of
the constant, which you can find in books or online.  You should get a
result within a couple of percent of the accepted value.
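
Comparing $V = (h/e)\nu - \phi$ with $y = mx + c$ suggests that the slope of the fitted line is $m = h/e$ and the intercept is $c = -\phi$, so $h = me$. The sketch below shows only that conversion and the comparison with the accepted value. The function names are hypothetical, and `slope_demo` is a placeholder chosen for illustration, not a result obtained from the data.

```python
E_CHARGE = 1.602e-19          # electron charge in coulombs, as given in the text
H_ACCEPTED = 6.62607015e-34   # Planck's constant in J s (SI defining value)

def planck_from_slope(slope, e=E_CHARGE):
    """Return h = slope * e, with the slope in V/Hz and e in coulombs."""
    return slope * e

def percent_difference(value, reference):
    """Signed percentage difference of value relative to reference."""
    return 100.0 * (value - reference) / reference

# Placeholder slope, for illustration only; replace it with your fitted m.
slope_demo = 4.0e-15
h_demo = planck_from_slope(slope_demo)

print(f"h = {h_demo:.3e} J s")
print(f"difference from accepted value: {percent_difference(h_demo, H_ACCEPTED):+.2f} %")
```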

This calculation is essentially the same as the one that Millikan himself
used to determine the value of Planck's constant, although, lacking a
computer, he fitted his straight line to the data by eye.  In part for this
work, Millikan was awarded the Nobel prize in physics in 1923.


<div class="span alert alert-success">

Inspect the plot you made earlier and estimate on paper what the slope should be, neglecting the intercept for now. Using this slope, plot a linear function along with the dots and see how close a match you can get.
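
One rough way to get such an estimate, sketched here under the assumption that rise over run between the first and last points is good enough for a first guess, is shown below. The arrays are made-up points, not the Millikan data, and the names `m_guess` and `y_trial` are illustrative.

```python
import numpy as np

# Made-up points lying roughly along a line (NOT the Millikan data).
x_pts = np.array([1.0, 2.0, 3.0, 4.0])
y_pts = np.array([0.9, 2.1, 2.9, 4.2])

# Rise over run between the first and last points gives a rough slope.
m_guess = (y_pts[-1] - y_pts[0]) / (x_pts[-1] - x_pts[0])

# Trial line through the origin, i.e. y = m*x with the intercept neglected.
y_trial = m_guess * x_pts

print(m_guess)
```

Plotting `x_pts, y_pts` as dots and `x_pts, y_trial` as a line on the same axes shows how far a zero-intercept line can get.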

In [None]:
#your code here


<div class="span alert alert-success">

The fit above looks like a good start, but we can certainly do better. First let's try moving the linear model around by hand to estimate a fit. The template below allows us to control a linear plot using sliders. Try to combine it with our data and estimate the slope and intercept.

In [None]:
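from ipywidgets import interact, fixed, FloatSlider

def plot_test(A):
    # Draw two fixed test points and the line y = A*x for the current slider value.
    fig = plt.figure()
    x = np.linspace(0,1,200)
    y = A*x
    plt.plot([0.5,0.6],[0.5,0.6],'ko')  # placeholder data points
    plt.plot(x,y,'r-')                  # trial line
    plt.xlim(0,1)
    plt.ylim(0,1)
    plt.close(fig)  # close the figure; interact displays the returned object
    return fig

# Slider for A running from 0 to 2 in steps of 0.01, starting at 1
interact(plot_test, A=FloatSlider(min=0, max=2, step=0.01, value=1))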
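
One possible way to extend the template to two parameters is sketched below. The function name `plot_line_with_data` and the arrays `x_data`, `y_data` are hypothetical, and the points are made up rather than taken from **millikan.txt**.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up points standing in for the measured data (NOT the Millikan values).
x_data = np.array([0.2, 0.4, 0.6, 0.8])
y_data = np.array([0.25, 0.42, 0.61, 0.83])

def plot_line_with_data(m, c):
    """Draw the data as dots and the trial line y = m*x + c on the same axes."""
    fig, ax = plt.subplots()
    x_line = np.linspace(x_data.min(), x_data.max(), 200)
    ax.plot(x_data, y_data, "ko")           # data points
    ax.plot(x_line, m * x_line + c, "r-")   # trial line
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    plt.close(fig)  # as in the template: close, then return the figure
    return fig

fig_demo = plot_line_with_data(1.0, 0.05)
```

In a notebook this could be driven by two sliders, for example `interact(plot_line_with_data, m=FloatSlider(min=0, max=2, step=0.01, value=1), c=FloatSlider(min=-1, max=1, step=0.01, value=0))`. For real data the slider ranges would need to be rescaled to match the units of the measurements.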

In [None]:
#your code here

As you can tell by this point, fitting by hand is quite tedious, so next we'll perform the fit numerically. Suppose we make some guess about the parameters $m$ and $c$ for the
straight line.  We then calculate the vertical distances between the data
points and that line, as represented by the short vertical lines in the
figure. These are the **errors** that we wish to minimize.

![image.png](attachment:image.png)

Then we calculate the sum of the squares of those distances, which
we denote $\chi^2$.  If we have $N$ data points with
coordinates $(x_i,y_i)$, then $\chi^2$ is given by
$$
\chi^2 = \sum_{i=1}^N (mx_i+c-y_i)^2.
$$
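
As a small illustration of this definition, the sketch below evaluates $\chi^2$ for two guesses of $(m, c)$. The helper name `chi_squared` and the three points are made up for the example.

```python
import numpy as np

def chi_squared(m, c, x, y):
    """Sum of squared vertical distances between the line m*x + c and the points."""
    residuals = m * x + c - y
    return np.sum(residuals**2)

# Made-up points that lie exactly on y = 2x + 1.
x_pts = np.array([0.0, 1.0, 2.0])
y_pts = np.array([1.0, 3.0, 5.0])

print(chi_squared(2.0, 1.0, x_pts, y_pts))  # the exact line → 0.0
print(chi_squared(2.0, 0.0, x_pts, y_pts))  # intercept off by 1 at each of 3 points → 3.0
```

A guess closer to the data gives a smaller $\chi^2$, which is what makes it usable as a measure of fit quality.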

The least-squares fit of the straight line to the data is the straight line
that minimizes this total squared distance from data to line.  We find the
minimum by differentiating with respect to both $m$ and $c$ and setting the
derivatives to zero, which gives
\begin{align}
m \sum_{i=1}^N x_i^2 + c \sum_{i=1}^N x_i - \sum_{i=1}^N x_iy_i &= 0, \\
m \sum_{i=1}^N x_i + cN - \sum_{i=1}^N y_i &= 0.
\end{align}
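
Spelled out, the two derivatives of $\chi^2$ are
\begin{align}
{\partial\chi^2\over\partial m} &= 2\sum_{i=1}^N (mx_i+c-y_i)\,x_i = 0, \\
{\partial\chi^2\over\partial c} &= 2\sum_{i=1}^N (mx_i+c-y_i) = 0.
\end{align}
Dividing each by 2 and separating the sums, with $\sum_{i=1}^N c = cN$, reproduces the two equations above.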

For convenience, let us define the following quantities:
$$
E_x = {1\over N} \sum_{i=1}^N x_i,\qquad
E_y = {1\over N} \sum_{i=1}^N y_i,\qquad
E_{xx} = {1\over N} \sum_{i=1}^N x_i^2,\qquad
E_{xy} = {1\over N} \sum_{i=1}^N x_iy_i,
$$
in terms of which our equations can be written

\begin{align}
mE_{xx} + cE_x &= E_{xy}\,, \\
mE_x + c &= E_y\,.
\end{align}

Solving these equations simultaneously for $m$ and $c$ now gives
$$
m = {E_{xy}-E_x E_y\over E_{xx} - E_x^2},\qquad
c = {E_{xx}E_y-E_x E_{xy}\over E_{xx} - E_x^2}.
$$

These are the equations for the least-squares fit of a straight line to $N$
data points.  They tell you the values of $m$ and $c$ for the line that
best fits the given data.
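
The formulas above can be sketched in a few lines of NumPy. The function name `least_squares_line` and the sample points are made up for illustration. The cross-check uses `np.polyfit`, which is a separate routine and not the method derived in the text, only a convenient way to confirm the numbers.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (m, c) from the E_x, E_y, E_xx, E_xy formulas in the text."""
    Ex = np.mean(x)
    Ey = np.mean(y)
    Exx = np.mean(x * x)
    Exy = np.mean(x * y)
    denom = Exx - Ex**2
    m = (Exy - Ex * Ey) / denom
    c = (Exx * Ey - Ex * Exy) / denom
    return m, c

# Made-up noisy points scattered around y = 2x + 1 (NOT the Millikan data).
x_pts = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_pts = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

m_fit, c_fit = least_squares_line(x_pts, y_pts)

# Cross-check with a degree-1 polynomial fit; polyfit returns the
# coefficients from the highest power down, i.e. [slope, intercept].
m_ref, c_ref = np.polyfit(x_pts, y_pts, 1)

print(m_fit, c_fit)
print(m_ref, c_ref)
```

The two printed pairs should agree to within rounding error, since both minimize the same sum of squares.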


    
<div class="span alert alert-success">


Calculate the quantities $E_x$, $E_y$, $E_{xx}$, and $E_{xy}$ defined
  above, and from them calculate and print out the slope $m$ and
  intercept $c$ of the best-fit line. Plot this line along with the dots one last time, and compare with your manual estimate earlier.

In [None]:
#your code here
import numpy as np
import matplotlib.pyplot as plt


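Ex = None    # replace with (1/N) * sum of x_i
Ey = None    # replace with (1/N) * sum of y_i
Exx = None   # replace with (1/N) * sum of x_i^2
Exy = None   # replace with (1/N) * sum of x_i*y_i

m = None     # slope, from the formula above
c = None     # intercept, from the formula above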