# Checkpoint 1

**Due: Tuesday, 18 October, 2022 at 11:00am BST**

Total points: 100

### Read This First
1. Use the constants provided in the cells. Do not use your own constants.

2. Wherever you see `raise NotImplementedError()`, remove that line and put your code there.

3. Put the code that produces the output for a given task in the cell indicated. You are welcome to add as many cells as you like for imports, function definitions, variables, etc.

4. Your notebook must run correctly when executed once from start to finish. Your notebook will be graded based on how it runs, not how it looks when you submit it. To test this, go to the *Kernel* menu and select *Restart & Run All*.

5. Once you are happy with it, clear the output by selecting *Restart & Clear Output* from the *Kernel* menu.

6. Submit through Noteable.

In [None]:
from matplotlib import pyplot as plt
%matplotlib inline
import numpy as np
import time

In [None]:
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 14

# Problem 1 - 20 points

## Interpolation
You are given an array of x and y measurements that you need to interpolate at new locations.

The file *ch1_1_data.txt* is a text file that contains two arrays of Xs and Ys in two rows that need to be interpolated. The file *ch1_1_test.txt* is a text file that contains an array of X values on which you need to evaluate the interpolated function.

You will need to do the interpolation by choosing the best interpolation technique among linear interpolation, cubic splines, and smoothing splines with different values of the smoothing parameter.

You need to write the code that

* selects the best among different interpolation methods for a provided dataset.
* returns the array of the results of evaluating the best interpolation method on the test dataset. Note, the returned array of interpolated Y values should correspond directly to the X values from the test file. That is, the first returned Y value should correspond to the first X value and so on.

The resulting array will then be checked against the true values and must give a mean squared error (MSE) of **MSE < 0.1**, where

$
\large
\begin{align}
MSE = \frac{1}{N} \sum_{i=1}^{N} (y_{interp, i} - y_{true, i})^2.
\end{align}
$
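The selection step in the bullet list above can be sketched as k-fold cross-validation over a dictionary of candidate constructors. This is a minimal sketch, not the graded solution: the synthetic sine data, the helper `cv_mse` and the `candidates` dictionary are hypothetical stand-ins, while the interpolator families and smoothing values are the ones tried in `solve_task1` below.

```python
import numpy as np
import scipy.interpolate


def cv_mse(make_interp, x, y, nsplit=3):
    """Mean held-out MSE of one interpolation method over nsplit interleaved folds.

    x must be sorted in increasing order; fold i holds out every point whose
    index satisfies index % nsplit == i and fits on the remaining points.
    """
    pos = np.arange(len(x))
    fold_mse = []
    for i in range(nsplit):
        held_out = pos % nsplit == i
        f = make_interp(x[~held_out], y[~held_out])
        fold_mse.append(np.mean((f(x[held_out]) - y[held_out]) ** 2))
    return np.mean(fold_mse)


# Candidate methods: same families and smoothing values as in solve_task1.
candidates = {
    "linear": lambda x, y: scipy.interpolate.interp1d(x, y, fill_value="extrapolate"),
    "cubic spline": lambda x, y: scipy.interpolate.CubicSpline(x, y),
    "smoothing s=1": lambda x, y: scipy.interpolate.UnivariateSpline(x, y, s=1),
    "smoothing s=0.18": lambda x, y: scipy.interpolate.UnivariateSpline(x, y, s=0.18),
    "smoothing s=0.05": lambda x, y: scipy.interpolate.UnivariateSpline(x, y, s=0.05),
}

# Hypothetical noisy data standing in for ch1_1_data.txt.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + rng.normal(0.0, 0.02, x.size)

scores = {name: cv_mse(make, x, y) for name, make in candidates.items()}
best = min(scores, key=scores.get)

# Refit the winning method on all points before evaluating at new x values.
final = candidates[best](x, y)
y_new = final(np.linspace(0.5, 9.5, 5))
print(best, scores[best])
```

Keeping the candidates in one dictionary replaces five nearly identical branches; trying another smoothing value is one more entry.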

In [None]:
import scipy.interpolate

In [None]:
def solve_task1():
    """
    Select the best interpolation method for the data in ch1_1_data.txt by cross-validation
    and return a numpy array of interpolated values at the x locations in ch1_1_test.txt,
    in the same order as in that file.
    """
    
    # Load .txt files into numpy arrays
    data = np.loadtxt("ch1_1_data.txt")
    x_test = np.loadtxt("ch1_1_test.txt")
    
    # data rows from the data file
    x_data = data[0]    
    y_data = data[1]

    # indices that sort the data by increasing x
    ind = np.argsort(x_data)
    # sorted x_data and y_data
    x_sort = x_data[ind]
    y_sort = y_data[ind]
    
    # sort test data
    ind_test = np.argsort(x_test)
    x_test_sort = x_test[ind_test]
    
    
    # select the best interpolation method by calculating the MSE
    
    nsplit = 3 # split the data in 3 chunks 
    N = len(x_sort)
    pos = np.arange(len(x_sort))
    
    # accumulators for the cross-validated MSE of each interpolator
    ret_linear = 0   # linear interpolator
    ret_cubic = 0   # cubic splines
    ret_smooth_a = 0   # smoothing splines (smoothing parameter = 1.0)  
    ret_smooth_b = 0   # smoothing splines (smoothing parameter = .18)   
    ret_smooth_c = 0   # smoothing splines (smoothing parameter = .05)   

    
    # cross-validation: hold out every nsplit-th point in turn (fold i), fit on the rest, and add up the held-out MSE
    for i in range(nsplit):
        testsubset = pos%nsplit ==i 
        fitsubset = ~testsubset 
        curx = x_sort[fitsubset]
        cury = y_sort[fitsubset]
        testx = x_sort[testsubset]
        testy = y_sort[testsubset]
        
        # linear interpolator
        int_linear = scipy.interpolate.interp1d(curx, cury, fill_value="extrapolate")
        ret_linear = ret_linear + np.mean((int_linear(testx) - testy)**2)
        
        # cubic splines
        int_cubic = scipy.interpolate.CubicSpline(curx, cury)
        ret_cubic = ret_cubic + np.mean((int_cubic(testx) - testy)**2)
        
        # smoothing splines (smoothing parameter = 1.0)  
        int_smooth_a = scipy.interpolate.UnivariateSpline(curx, cury, s=1)
        ret_smooth_a = ret_smooth_a + np.mean((int_smooth_a(testx) - testy)**2)
        
        # smoothing splines (smoothing parameter = .18)  
        int_smooth_b = scipy.interpolate.UnivariateSpline(curx, cury, s=.18)
        ret_smooth_b = ret_smooth_b + np.mean((int_smooth_b(testx) - testy)**2)
        
        # smoothing splines (smoothing parameter = .05)
        int_smooth_c = scipy.interpolate.UnivariateSpline(curx, cury, s=.05)
        ret_smooth_c = ret_smooth_c + np.mean((int_smooth_c(testx) - testy)**2)        
    
    # average MSE over the folds for each interpolator
    ret_linear = ret_linear/nsplit
    ret_cubic = ret_cubic/nsplit
    ret_smooth_a = ret_smooth_a/nsplit
    ret_smooth_b = ret_smooth_b/nsplit
    ret_smooth_c = ret_smooth_c/nsplit
    
    # array with all the MSE values for different interpolator used
    ret = np.array([ret_linear, ret_cubic, ret_smooth_a, ret_smooth_b, ret_smooth_c])
    # index of the value with a smallest MSE 
    index = np.argmin(ret)
    
    # depending on which index had the smallest MSE that is going to be the interpolator method used
    
    # linear interpolation 
    if index == 0:
        inte = scipy.interpolate.interp1d(x_sort, y_sort, fill_value = "extrapolate")
        y_int_or = inte(x_test_sort)
        
    # cubic interpolation 
    if index == 1:
        inte = scipy.interpolate.CubicSpline(x_sort, y_sort)        
        y_int_or = inte(x_test_sort)        
        
    # smoothing splines, parameter s=1
    if index == 2:
        inte = scipy.interpolate.UnivariateSpline(x_sort, y_sort, s=1)
        y_int_or = inte(x_test_sort)
        
    # smoothing splines, parameter s=.18
    if index == 3:
        inte = scipy.interpolate.UnivariateSpline(x_sort, y_sort, s=.18)
        y_int_or = inte(x_test_sort)
        
    # smoothing splines, parameter s=.05
    if index == 4:
        inte = scipy.interpolate.UnivariateSpline(x_sort, y_sort, s=.05)   
        y_int_or = inte(x_test_sort)
        
    # undo the sort so the values match the original order of x in ch1_1_test.txt
    # (y_int_or[k] belongs to x_test[ind_test[k]])
    y_int = np.empty_like(y_int_or)
    y_int[ind_test] = y_int_or
    
    return y_int



In [None]:
# plotting data and results as a sanity check 

# interpolated from the test data
y_int_unsorted = solve_task1()

# Load .txt files into numpy arrays
data = np.loadtxt("ch1_1_data.txt")
x_test = np.loadtxt("ch1_1_test.txt")

# data rows from the data file
x_data = data[0]    
y_data = data[1]

# indices that sort the data by increasing x
ind = np.argsort(x_data)
# sorted x_data and y_data
x_sort = x_data[ind]
y_sort = y_data[ind]


# plot
plt.plot(x_sort, y_sort, "b-", label="Given X and Y data")
plt.plot(x_test, y_int_unsorted, "rx", label="Interpolated test data")
plt.legend();

We will add tests to the cell below when grading.

In [None]:
# This function will be tested with this 
# assert ( np.mean((solve_task1()- YTRUE )**2) < 0.1)

print ("Testing, testing...")



# Problem 2 - 80 points

This problem is divided into 5 tasks, worth the following point values:

1. 20 points
2. 15 points
3. 15 points
4. 20 points
5. 10 points

## The 1D time-independent Schrödinger equation

In one dimension, the time-independent Schrödinger equation is given by

$
\large
\begin{align}
\mathbf{H}\ \mathbf{\Psi} = E\ \mathbf{\Psi}
\end{align}
$,

where $\mathbf{H}$ is the Hamiltonian. Here, $E$ and $\mathbf{\Psi}$ are the eigenvalues and eigenvectors of $\mathbf{H}$, respectively. The Hamiltonian is expressed as

$
\Large
\begin{align}
H = -\frac{\hbar^2}{2m} \nabla^2 + V(r),
\end{align}
$

where $V(r)$ is the electric potential energy, given by

$
\Large
\begin{align}
V(r) = -\frac{e^{2}}{4 \pi \epsilon_{0} r}.
\end{align}
$

In matrix form, the Schrödinger equation is solved for N equally spaced values of r, such that r goes from ($r_{max}$/N) to $r_{max}$, where $r_{max} \sim 1.5$ nm is a sensible choice. To turn the Schrödinger equation into a matrix, $\textbf{V(r)}$ should be a diagonal matrix with the values of the potential at each r along the diagonal.
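A sketch of the matrix construction described above, assuming the 3-point finite-difference stencil for the second derivative (the same one the Task 1 solution uses). The helper name `build_hamiltonian` is hypothetical, and a dense eigensolver is used only because N is small here.

```python
import numpy as np
from scipy import sparse

c1 = 0.0380998  # hbar^2 / 2m, nm^2 eV
c2 = 1.43996    # e^2 / (4 pi eps0), nm eV


def build_hamiltonian(N, r_max=1.5):
    """Sparse N x N finite-difference Hamiltonian for the radial (l = 0) problem.

    Grid: r_i = i*h, i = 1..N, with h = r_max/N, i.e. r runs from r_max/N to r_max.
    The 3-point stencil (u[i-1] - 2 u[i] + u[i+1]) / h^2 approximates the second
    derivative, with u = 0 implied just outside both ends of the grid.
    """
    h = r_max / N
    r = h * np.arange(1, N + 1)
    kinetic = -c1 / h**2 * sparse.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N))
    potential = sparse.diags(-c2 / r)  # V(r) placed on the main diagonal
    return (kinetic + potential).tocsr(), r


# Small-N check with a dense eigensolver (fine for a few hundred points).
H, r = build_hamiltonian(500)
E = np.linalg.eigvalsh(H.toarray())[:2]
print(E)  # should approach the theoretical -13.6 eV and -3.4 eV as N grows
```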

For this problem, the constants for the above equations have been defined for you in the cell below. Please use these for your calculations.
* $\frac{\hbar^{2}}{2m} = 0.0380998\ nm^{2} eV$ (called `c1` below)
* $\frac{e^{2}}{4 \pi \epsilon_{0}} = 1.43996\ nm\ eV$ (called `c2` below)
* $r_{0} = 0.0529177\ nm$ (the Bohr radius, called `r0` below)
* Planck constant $h = 6.62606896\times10^{-34}\ J\ s$ (`h`)
* Speed of light $c = 299792458\ m/s$ (`c`)
* $hc = 1239.8419\ eV\ nm$ (`hc`)
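The `hc` constant in the cell below is $h c$ converted to eV nm. A quick check, assuming the exact SI elementary charge $e = 1.602176634\times10^{-19}$ C, which the notebook itself does not define:

```python
h = 6.62606896e-34   # Planck constant, J s (same value as the constants cell)
c = 299792458.0      # speed of light, m/s
e = 1.602176634e-19  # elementary charge, C (assumed; not defined in this notebook)

# h*c is in J m; dividing by e converts J to eV, and 1 m = 1e9 nm.
hc_eV_nm = h * c / e * 1e9
print(hc_eV_nm)  # close to the hc = 1239.8419 eV nm used in the constants cell
```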

In [None]:
# Constants (use these)
c1 = 0.0380998 # nm^2 eV
c2 = 1.43996 # nm eV
r0 = 0.0529177 # nm
h  = 6.62606896e-34 # J s
c  = 299792458. # m/s
hc = 1239.8419 # eV nm

## Task 1 - 20 points

For this task, you will create the matrix representing $\mathbf{H}$ and find the two lowest eigenvalues. These correspond to the two lowest energy levels of the hydrogen atom.

In terms of the constants defined above, the theoretical values of the two lowest energy levels ($n = 1, 2$) are given by

$
\Large
\begin{align}
E_{n} = -\frac{c_2}{2 r_0 n^2},
\end{align}
$

where $r_{0}$ is the Bohr radius, given by

$
\Large
\begin{align}
r_{0} = \frac{4 \pi \epsilon_{0} \hbar^{2}}{m e^{2}}.
\end{align}
$
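Evaluating the formula for $n = 1, 2$, with the bound-state energies taken as negative (as in the `theo_value` expressions of the solution cell below), together with the relative-error measure used in this task. The helper names `e_theory` and `rel_error` are hypothetical.

```python
c2 = 1.43996    # e^2 / (4 pi eps0), nm eV
r0 = 0.0529177  # Bohr radius, nm


def e_theory(n):
    """Theoretical hydrogen level n in eV (negative: bound state)."""
    return -c2 / (2 * r0 * n**2)


def rel_error(e_calc, e_theo):
    """Relative error |(E_calc - E_theo) / E_theo| used in the convergence test."""
    return abs((e_calc - e_theo) / e_theo)


print(e_theory(1), e_theory(2))  # roughly -13.61 eV and -3.40 eV
```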

In the cells below, write a function that creates a matrix representing the Hamiltonian and returns the two lowest eigenvalues. **This function should take a single argument, N, for the size of the matrix.**

Use your function to determine the minimum value of N (within a factor of 2) required to compute the two lowest energy levels to within **0.05\%** of the theoretical values. Print the values of the two energy levels and the relative error for each, $\left| (E_{calc} - E_{theo}) / E_{theo} \right|$, where $E_{calc}$ is the calculated value and $E_{theo}$ is the theoretical value. **Note, your code should iteratively call your function while increasing N (e.g., doubling it each time) and stop when the desired error is reached. It is not sufficient to simply run the code at a single value of N that meets the criteria.**
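One way to structure the required doubling search, as a sketch: `lowest_two` is a hypothetical helper, the starting N of 64 and the cap of $2^{16}$ are arbitrary choices, and shift-invert mode with `sigma = -20` eV assumes that value lies below the ground state.

```python
import numpy as np
from scipy import sparse
from scipy.sparse import linalg as splinalg

c1, c2, r0 = 0.0380998, 1.43996, 0.0529177  # nm^2 eV, nm eV, nm


def lowest_two(N, r_max=1.5):
    """Two lowest eigenvalues (eV) of the N x N finite-difference Hamiltonian."""
    h = r_max / N
    r = h * np.arange(1, N + 1)
    H = (-c1 / h**2 * sparse.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N))
         + sparse.diags(-c2 / r)).tocsc()
    # Shift-invert about sigma = -20 eV (assumed to lie below the ground state):
    # ARPACK then returns the eigenvalues closest to sigma, i.e. the two lowest.
    evals = splinalg.eigsh(H, k=2, sigma=-20.0, return_eigenvectors=False)
    return np.sort(evals)


theory = np.array([-c2 / (2 * r0), -c2 / (2 * r0 * 2**2)])
tol = 5e-4  # 0.05 %

N = 64
while N <= 2**16:  # hypothetical cap so the search always terminates
    errors = np.abs((lowest_two(N) - theory) / theory)
    if np.all(errors < tol):
        break
    N *= 2
else:
    raise RuntimeError("tolerance not reached before the N cap")

print(N, errors)
```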

In [None]:
from scipy import sparse
from scipy.sparse import linalg as splinalg
import math

In [None]:
def hamiltonian_solver(N):
    """
    Starting from N grid points, double N until the two lowest eigenvalues of H
    agree with the theoretical energies to within 0.05%.
    Returns (N, E0, error_E0, E1, error_E1) for the first N that meets the tolerance.
    """
    
    # tolerance 0.05%
    error_crit = 0.0005

    # theoretical values of the two lowest energy levels (n = 1 and n = 2)
    theo_value_0 = -c2/(2*r0*(1**2))
    theo_value_1 = -c2/(2*r0*(2**2))
    
    # loop that stops when BOTH energy errors are smaller than the tolerance
    while True:
        
        r_max = 1.5      # in nm (maximum radius used)
        r_min = r_max/N  # in nm (minimum radius used, also the grid spacing)

        # define the Laplacian part of the Hamiltonian (3-point finite difference)
        diagonals = [1, -2, 1]
        offsets = [-1, 0, 1]
        laplace_m = -1*c1*((1/r_min)**2)*sparse.diags(diagonals, offsets, shape=(N, N))

        # define the electric potential part of the equation
        diag_dist  = np.linspace(r_min, r_max, N)
        potential_m = sparse.diags([-1*c2/diag_dist], [0], shape=(N,N))

        # total Hamiltonian 
        hamiltonian_m = laplace_m + potential_m

        # two lowest eigenvalues (smallest algebraic), in ascending order
        evals = np.sort(splinalg.eigsh(hamiltonian_m, k=2, which="SA", return_eigenvectors=False))

        # relative error of the eigenvalues
        error_en0 = abs((evals[0] - theo_value_0)/ theo_value_0)
        error_en1 = abs((evals[1] - theo_value_1)/ theo_value_1)
        
        # stop at the first N that meets the tolerance; otherwise double N
        if error_en0 < error_crit and error_en1 < error_crit:
            break
        N *= 2
    
    return N, evals[0], error_en0, evals[1], error_en1

In [None]:
N_min, E0, err0, E1, err1 = hamiltonian_solver(4)
print("Min N: ", N_min)
print("Lowest energy value: ", E0, "eV, with relative error: ", err0)
print("Second-lowest energy value: ", E1, "eV, with relative error: ", err1)

## Task 2 - 15 points

Now, imagine that Coulomb's law has a minor modification, so that the force is given by:

$
\Large
\begin{align}
F(r) = -\frac{e^{2}}{4 \pi \epsilon_{0} r^{2}} \left( \frac{r}{r_{0}} \right)^{\alpha},
\end{align}
$

where $\alpha = 0.01$ and $r_{0}$ is the Bohr radius, given by:

$
\Large
\begin{align}
r_{0} = \frac{4 \pi \epsilon_{0} \hbar^{2}}{m e^{2}}.
\end{align}
$

The electric potential is given by:

$
\Large
\begin{align}
V(r) = \int_{r}^{\infty} F(r^{\prime}) dr^{\prime}
\end{align}
$

Using the constants defined previously, write a function to calculate V(r) using the modified Coulomb law by numerically integrating the equation above. This function need only accept a single value of radius and not an entire array. Your function must agree with the analytical value to within $10^{-5}$ eV.
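For reference, the analytical value follows from integrating the power law directly, writing $c_2 = e^2/(4\pi\epsilon_0)$; the upper limit vanishes because $\alpha < 1$. The result is the expression coded in `potential_exact` in the test cell, and it reduces to $-c_2/r$ at $\alpha = 0$:

$
\Large
\begin{align}
V(r) = -\frac{c_2}{r_0^{\alpha}} \int_{r}^{\infty} (r^{\prime})^{\alpha-2}\, dr^{\prime}
= -\frac{c_2}{r_0^{\alpha}} \left[ \frac{(r^{\prime})^{\alpha-1}}{\alpha-1} \right]_{r}^{\infty}
= \frac{c_2}{\alpha-1} \frac{r^{\alpha-1}}{r_0^{\alpha}}, \qquad \alpha < 1.
\end{align}
$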

Your function should go in the cell below using the template for `potential_numerical`.

In another cell, make a plot of V(r) over the range of r values used in Task 1. Remember to label axes and show units.

In [None]:
from scipy import integrate

In [None]:
# function to calculate V(r) by numerically integrating the modified Coulomb law
def potential_numerical(r, alpha):
   
    # modified Coulomb force F(r') = -c2 / r'^2 * (r'/r0)^alpha, with rp the integration variable r'
    force = lambda rp, alpha: -c2*np.power(rp,alpha)*np.power(r0, -alpha)*np.power(rp,-2)
    
    # V(r) is the integral of F(r') from r to infinity; quad returns (value, error estimate)
    int_val = integrate.quad(force, r, np.inf, args = (alpha,))

    return int_val[0]

The cell below will test your function for a few values of radius.

In [None]:
def potential_exact(r, alpha):
    return c2*np.power(r,alpha-1)*np.power(r0,-alpha) / (alpha-1)

for my_r in np.linspace(0.01, 1, 100):
    diff = abs(potential_numerical(my_r, 0.01) - potential_exact(my_r, 0.01))
    assert(diff <= 1e-5)

Plot V(r) in the cell below.

In [None]:
steps = 1024  # grid size N chosen in Task 1
r_max = 1.5 # in nm 
r_min = 1.5/steps

# list to store potential values to plot
V_r_val = []

# points for my_r and x grid for plot
x_grid = np.linspace(r_min, r_max, steps)

for my_r in x_grid:
    V_r_val.append(potential_numerical(my_r, 0.01))
    
# plot    
plt.plot(x_grid, V_r_val, "b-")
plt.title("Modified potential V(r) vs. radial distance (alpha = 0.01)")
plt.xscale("log")
plt.ylabel("V(r)  [eV]")
plt.xlabel("r [nm] (log scale)");

## Task 3 - 15 points

Write a function to calculate the first 2 energy levels (eigenvalues of $H$) for $\alpha = 0.01$ and print out the values in eV. The values must be accurate to 0.01 eV. Use the function template `calculate_energy_levels_modified` below for your function. It is fine to call functions you've already written. 

In the cell after, plot the difference $\Delta E$ between the two lowest energy levels as a function of $\alpha$ for $\alpha = 0$ and $0.01$. Remember axes labels and units.
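A compact sketch of both steps, assuming the same grid and stencil as Task 1: the diagonal potential comes from `quad`, one value per grid point, and the gap $\Delta E$ is computed for each $\alpha$. The function names are hypothetical, and shift-invert with `sigma = -20` eV assumes that value lies below the lowest level.

```python
import numpy as np
from scipy import integrate, sparse
from scipy.sparse import linalg as splinalg

c1, c2, r0 = 0.0380998, 1.43996, 0.0529177  # nm^2 eV, nm eV, nm


def potential_quad(r, alpha):
    """V(r) = integral of the modified Coulomb force from r to infinity (eV)."""
    force = lambda rp: -c2 * (rp / r0) ** alpha / rp**2
    return integrate.quad(force, r, np.inf)[0]


def levels_modified(N, alpha, r_max=1.5):
    """Two lowest eigenvalues (eV) of H with the numerically integrated potential."""
    h = r_max / N
    r = h * np.arange(1, N + 1)
    V = np.array([potential_quad(ri, alpha) for ri in r])  # build V as an array directly
    H = (-c1 / h**2 * sparse.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N))
         + sparse.diags(V)).tocsc()
    evals = splinalg.eigsh(H, k=2, sigma=-20.0, return_eigenvectors=False)
    return np.sort(evals)


alphas = np.array([0.0, 0.01])
gaps = np.array([np.diff(levels_modified(1024, a))[0] for a in alphas])
print(dict(zip(alphas, gaps)))  # Delta E = E2 - E1 in eV for each alpha
```

The gaps can then be plotted against `alphas` (for example with `plt.plot(alphas, gaps, "o-")`), with the y axis labelled in eV.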

In [None]:
def calculate_energy_levels_modified(N, alpha):
    
    r_max = 1.5 # in nm (maximum radius used)
    r_min = 1.5/N  # in nm (minimum radius used)


    # define the Laplacian part of the equation
    diagonals = [1, -2, 1]
    offsets = [-1, 0, 1]
    laplace_m = -1*c1*((1/r_min)**2)*sparse.diags(diagonals, offsets, shape=(N, N))

    # potential part of the equation: numerical integration at each grid point,
    # stored directly as a numpy array
    V_values = np.array([potential_numerical(my_r, alpha)
                         for my_r in np.linspace(r_min, r_max, N)])
    potential_m = sparse.diags([V_values], [0], shape=(N,N))

    # total Hamiltonian 
    hamiltonian_m = laplace_m + potential_m

    # two lowest eigenvalues, in ascending order
    evals = splinalg.eigsh(hamiltonian_m, k=2, which="SA", return_eigenvectors=False)
    
    return np.sort(evals)



The cell below will test your function against the correct values.

In [None]:
N = 1024
alpha = 0.01
E1, E2 = calculate_energy_levels_modified(N, alpha)

In [None]:
# print the two lowest energy values with the modified potential (alpha = 0.01)
print("For alpha =", alpha, ": E1 =", E1, "eV and E2 =", E2, "eV")

In the cell below, make the plot of $\Delta E$ vs. $\alpha$ as instructed above.

In [None]:
dif_energy = []   # energy difference E2 - E1 for each alpha
grid = np.array([0, 0.01])   # alpha values (x grid for the plot)

# difference between the two lowest energy values for each alpha
for alpha_new in grid:
    en_0, en_1 = calculate_energy_levels_modified(N, alpha_new)
    dif_energy.append(en_1 - en_0)
       
# plot    
plt.plot(grid, dif_energy, "bo-")
plt.xlabel(r"$\alpha$ (dimensionless)")
plt.ylabel(r"$\Delta E = E_2 - E_1$ [eV]")
plt.title("Energy difference between the two lowest levels");
    

## Task 4 - 20 points

The transition between the 1st and 2nd states is known as the Lyman-$\alpha$ transition. The photon emitted by this transition will have a wavelength, $\lambda$, given by

$
\Large
\begin{align}
\lambda = \frac{hc}{\Delta E}.
\end{align}
$

Imagine the wavelength of this transition has been measured as $\lambda = 121.5 \pm 0.1$ nm. What is the maximum value of $\alpha > 0$ consistent with this measurement (i.e., the largest $\alpha$ such that the predicted and measured wavelengths differ by less than 0.1 nm)?

Using the template `find_alpha_max`, write a function that performs the above computation and returns the value of $\alpha_{max}$. Your value for $\alpha_{max}$ should be within 1% of the correct answer.
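The condition above can be posed as a root-finding problem: find the $\alpha$ at which the predicted wavelength reaches the edge of the measured interval. This sketch assumes $\lambda(\alpha)$ is monotonic on the bracket $[0, 0.01]$, picks the edge from the direction in which $\lambda$ moves, and, only to keep it fast, uses the closed-form potential (the `potential_exact` expression) instead of `quad`. `lyman_alpha` is a hypothetical helper.

```python
import numpy as np
import scipy.optimize as opt
from scipy import sparse
from scipy.sparse import linalg as splinalg

c1, c2, r0, hc = 0.0380998, 1.43996, 0.0529177, 1239.8419


def lyman_alpha(alpha, N=1024, r_max=1.5):
    """Predicted Lyman-alpha wavelength hc / (E2 - E1) in nm for a given alpha.

    Uses the closed-form modified potential (valid for alpha < 1) instead of quad,
    purely to keep the root search fast.
    """
    h = r_max / N
    r = h * np.arange(1, N + 1)
    V = c2 * r ** (alpha - 1) * r0 ** (-alpha) / (alpha - 1)
    H = (-c1 / h**2 * sparse.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N))
         + sparse.diags(V)).tocsc()
    e1, e2 = np.sort(splinalg.eigsh(H, k=2, sigma=-20.0, return_eigenvectors=False))
    return hc / (e2 - e1)


lam_meas, lam_err = 121.5, 0.1
# Edge of the measured interval that the prediction moves toward as alpha grows.
edge = lam_meas - lam_err if lyman_alpha(0.01) < lyman_alpha(0.0) else lam_meas + lam_err
# The bracket [0, 0.01] is an assumption; brentq raises ValueError without a sign change.
alpha_max = opt.brentq(lambda a: lyman_alpha(a) - edge, 0.0, 0.01)
print(alpha_max)
```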

In [None]:
import scipy.optimize as opt

In [None]:
def find_alpha_max():

    N = 1024         # grid size used throughout
    min_wav = 121.4  # lower edge of the measured range, 121.5 - 0.1 nm

    # predicted wavelength minus the lower edge; its root is alpha_max
    def dif_e_zero(alpha):
        en_0, en_1 = calculate_energy_levels_modified(N, alpha)
        dif_e = en_1 - en_0
        return hc/dif_e - min_wav
    
    # root between alpha = 0 and alpha = 0.5 using Brent's method
    alpha_max = opt.brentq(dif_e_zero, 0, 0.5)
    
    return alpha_max

The cell below will run your function. You will not be told the correct answer.

In [None]:
amax = find_alpha_max()
print (f"alpha_max = {amax}.")

## Task 5 - 10 points

Knowing the shape of the matrix $\textbf{H}$, is it possible to greatly increase the accuracy of the energy level calculation without a significant increase in computation time? In the cell below, write a function to compute the first two energy levels using the original (unmodified) potential. Your function should run in 15 seconds or less and compute the first two energy levels each to within a relative accuracy of $5\times10^{-6}$.
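Because $\mathbf{H}$ is tridiagonal, an alternative to sparse ARPACK is SciPy's `scipy.linalg.eigh_tridiagonal`, which takes only the main and off diagonals and, with `select="i"`, computes just the requested eigenvalues. A sketch, with N = 100 000 chosen for illustration rather than tuned; `lowest_two_tridiagonal` is a hypothetical name.

```python
import time

import numpy as np
from scipy.linalg import eigh_tridiagonal

c1, c2, r0 = 0.0380998, 1.43996, 0.0529177  # nm^2 eV, nm eV, nm


def lowest_two_tridiagonal(N, r_max=1.5):
    """Two lowest eigenvalues (eV) of the tridiagonal finite-difference Hamiltonian.

    Only the main and off diagonals are stored (O(N) memory); with select="i",
    LAPACK bisection computes just the eigenvalues with indices 0 and 1.
    """
    h = r_max / N
    r = h * np.arange(1, N + 1)
    diag = 2 * c1 / h**2 - c2 / r      # kinetic + Coulomb terms on the main diagonal
    off = np.full(N - 1, -c1 / h**2)   # identical sub- and super-diagonal entries
    return eigh_tridiagonal(diag, off, eigvals_only=True,
                            select="i", select_range=(0, 1))


t0 = time.perf_counter()
e1, e2 = lowest_two_tridiagonal(100_000)  # N chosen for illustration
elapsed = time.perf_counter() - t0

e1_th = -c2 / (2 * r0)
e2_th = e1_th / 4
err1 = abs((e1 - e1_th) / e1_th)
err2 = abs((e2 - e2_th) / e2_th)
print(f"{elapsed:.3f} s, Err1 = {err1:.2e}, Err2 = {err2:.2e}")
```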

In [None]:
def calculate_energy_levels_super():

    N = 16384
    
    r_max = 1.5 # in nm 
    r_min = 1.5/N

    # define the Laplacian part of the equation
    diagonals = [1, -2, 1]
    offsets = [-1, 0, 1]
    laplace_m = -1*c1*((1/r_min)**2)*sparse.diags(diagonals, offsets, shape=(N, N))

    # define the potential part of the equation
    diag_dist  = np.linspace(r_min, r_max, N)
    potential_m = sparse.diags([-1*c2/diag_dist], [0], shape=(N,N))

    # total Hamiltonian, in CSC format for the sparse LU factorization used by shift-invert
    ham = (laplace_m + potential_m).tocsc()

    # two lowest eigenvalues: shift-invert about sigma = -50 eV (below the ground state);
    # which="LA" then selects the largest 1/(lambda - sigma), i.e. the eigenvalues just above sigma
    evals = splinalg.eigsh(ham, k=2, sigma=-50, which="LA", return_eigenvectors=False)
    
    return np.sort(evals)

In [None]:
t1 = time.time()
my_e1, my_e2 = calculate_energy_levels_super()
t2 = time.time()
print (f"Calculation took {t2-t1} seconds.")

e1_th = -c2 / (2 * r0)
e2_th = e1_th / 4

er1 = abs((my_e1 - e1_th) / e1_th)
er2 = abs((my_e2 - e2_th) / e2_th)
print (f"Err1 = {er1}, Err2 = {er2}.")