# LIBRARIES
* Execute this cell before going any further. 

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

<br/><br/>

# Warmup

## DataFrames and Tabular Data 
`10 points` 

You’ve likely worked with tabular data before, especially in Excel. In Python, we use the `pandas` library (imported as `pd`) to handle this type of data efficiently. The core component of pandas is the `pd.DataFrame` object, which functions much like a spreadsheet. Proficiency in `pandas` is an evergreen skill—it can make data processing tens or even hundreds of times faster compared to manual work in Excel.

In this warmup, we'll explore the basics of the `pd.DataFrame` using a simple dataset. This will prepare you for the extensive use we'll make of this data structure throughout the assignment.

### CODE
* Store the given lists in a dictionary, using meaningful labels as keys.
* Use the `pd.DataFrame` constructor to create a DataFrame from the dictionary.
* Display the DataFrame by making it the last line of a code cell. What does it look like?
* Multiply a column by 5, store the result in a new column on the DataFrame, and display the updated DataFrame.
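
As a minimal sketch of these steps, here is the same workflow on a small made-up dataset. The element names, column labels, and the variable `table` are placeholders chosen for illustration, not the lists from this warmup.

```python
import pandas as pd

# Placeholder data, unrelated to the lists used in this warmup
elements = ['H', 'He', 'Li']
atomic_numbers = [1, 2, 3]

# Keys become column titles; each list becomes a column
table = pd.DataFrame({
    'Element': elements,
    'Atomic Number': atomic_numbers,
})

# Arithmetic on a column applies to every row at once
table['Doubled'] = table['Atomic Number'] * 2

print(table)
```

In a notebook, writing `table` alone on the last line of a cell displays a formatted table instead of the plain-text output of `print`.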

In [None]:
students      = ['Bob','Alice','Giancarlo']
test_scores   = [95,85,75]
gpa           = [3.0,2.0,5.0]

In [None]:
my_data = {
    # Just as we define lists with square brackets [],
    # we define dictionaries with braces {}.
    # Each key is paired with its value by a colon,
    # and successive key-value pairs are separated by commas, as follows:
    # 'my_key' : some_list,
    # 'another_key' : another_list,
    # (remember these are just placeholder names!)
    # Define your own dictionary, using the three lists above.
}

In [None]:
# call pd.DataFrame() as a function.
#use the dict my_data as an argument. 
#end this block with the variable my_dataframe alone on a line.

In [None]:
# We can access columns using square brackets [],
# just as we access data in a dictionary by key.
# We can make a new column by using square brackets []
# with a column name that doesn't exist yet, as follows:
# my_dataframe['some_column'] = my_dataframe['old_column'] + 7
# (this would add 7 to every element in the column titled 'old_column')

<br/><br/>
<br/><br/>

# The Photoelectric Effect

In the previous activity, we explored how the Double Slit Experiment provides strong evidence for the wave nature of light. Would it surprise you to learn that another experiment just as convincingly demonstrates that light behaves as particles?

The phenomenon in question is the photoelectric effect. The experimental apparatus consists of two metal plates separated by a gap within a circuit. When light shines on one plate, it may eject electrons, which travel to the other plate and complete the circuit. This is measured as an electric current.

Because energy is conserved, we know that the energy from the light is split between two parts:

$$
E_{total} = E_k + E_{escape}
$$
where $E_{total}$ is the energy imparted to the electron by light, $E_k$ is the kinetic energy of the ejected electron, and $E_{escape}$ is the energy required for the electron to escape the metal's surface. 

Classical wave theory predicts that light can gradually add energy to electrons until they are able to escape. If this were true, we would expect that increasing the intensity of the light would in turn increase the kinetic energies of the electrons contributing to the current. What if the energy transfer follows a different rule?

<br/><br/>

## PART 1 - Measuring the Kinetic Energy of Electrons
`20 points`  

To examine this behavior, we can use a potential energy barrier to measure the kinetic energy of electrons. By applying a voltage between the plates which opposes the electrons' motion, we create a threshold: only electrons with at least a certain amount of kinetic energy can cross. 

For example, if we set this counter-voltage to 3 V, any electrons detected must have at least 3 eV of kinetic energy. (The __electronvolt__ [eV] is a unit of energy equal to the charge of an electron times one volt.) By gradually increasing the voltage and recording the current, we can measure the energy distribution of the ejected electrons.
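
To connect electronvolts to SI units, the short sketch below converts the 3 eV threshold from the example into joules. The function name `ev_to_joules` and the variable names are made up for illustration; the only physical input is the elementary charge.

```python
# Elementary charge in coulombs (exact SI value)
ELEMENTARY_CHARGE_C = 1.602176634e-19

def ev_to_joules(energy_ev):
    """Convert an energy from electronvolts to joules."""
    return energy_ev * ELEMENTARY_CHARGE_C

counter_voltage_V = 3.0                      # the example counter-voltage
min_kinetic_energy_eV = counter_voltage_V    # numerically equal for one electron
print(ev_to_joules(min_kinetic_energy_eV))   # minimum kinetic energy in joules
```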

Let's explore this energy distribution using our spectra. To do this, we'll process our data using `pandas`. Instead of manually constructing a `DataFrame`, we'll load our data directly from a file using: 
```
my_data = pd.read_csv("my_placeholder_filename.csv")
```

However, staring at columns of numbers doesn’t reveal much, and Jupyter even truncates long tables by default. To better understand the data, we’ll plot it with `matplotlib` (imported as `plt`), allowing us to see relationships between different columns in our `DataFrame`.

### CODE
* Use `pd.read_csv(filename)` to read one of your spectra.
* Display the contents of your `DataFrame`.
* Use `plt.scatter(x,y)`, where `x` and `y` are the appropriate columns from your data.
* Apply basic formatting to your plot using the provided functions.
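
The sketch below walks through the same sequence on made-up data. It assumes column names `Voltage (V)` and `Current (uA)`, which may not match your files, and it reads from an in-memory string rather than a real file so that it runs anywhere. `plt.scatter` draws individual points; `plt.plot` takes the same `x` and `y` arguments and joins them with a line.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for a spectrum file; your real column names may differ
fake_file = io.StringIO(
    "Voltage (V),Current (uA)\n"
    "0.0,4.0\n"
    "0.5,3.0\n"
    "1.0,2.0\n"
    "1.5,1.0\n"
    "2.0,0.0\n"
)

data = pd.read_csv(fake_file)   # a filename string works the same way
print(data.columns.tolist())    # column titles
print(len(data))                # number of rows

plt.scatter(data['Voltage (V)'], data['Current (uA)'])
plt.title('Example spectrum (made-up data)')
plt.xlabel('Voltage (V)')
plt.ylabel('Current (uA)')
```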

In [None]:
#if the file is in a different folder than this assignment,
#use "folder/filename" as the argument

In [None]:
# use data['some_key'] for the x and y arguments.
plt.scatter()
plt.title()  # choose a title!
plt.ylabel() # label your axes (with units!)
plt.xlabel()

### SHORT RESPONSE QUESTIONS
1. What are the column names in `data`? 
2. How many rows are in the dataset? What does each row correspond to in the experiment?
3. Describe the trends you observe in the spectrum. How does the current change as voltage increases?
4. What is the approximate kinetic energy of the most energetic electrons? How is the $x$ (not $y$) intercept of the spectrum related to this value?
### ANSWER

<br/><br/>
<br/><br/>

## PART 2 - Processing our Spectrum
`20 points`  

Clearly, the __cutoff voltage__ is a key value in our analysis. It gives us the kinetic energy of the fastest electrons:
$$
E_k = eV_{stop}
$$
where $V_{stop}$ is the cutoff voltage or __stopping potential__, and $e$ is the charge of an electron. If we express the energy in electronvolts, the factor of $e$ cancels out: $E_k$ in eV is numerically equal to $V_{stop}$ in volts.

It would be interesting to compare the kinetic energy of the most energetic electrons as we change the wavelength of light.  We should write some code to extract this value, as we will be able to re-use it for all of the spectra.

For our purposes, there are three primary ways to process tabular data:
1. __Filtering:__ Selecting subsets of data based on a condition (e.g., removing zero values).
2. __Column Operations:__ Applying transformations to modify or create new columns (e.g., converting wavelengths to frequencies).
3. __Reduction:__ Computing statistics or extracting key values (e.g., calculating cutoff voltage).
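
The three operations can be sketched on a small made-up table as follows. The column names and values are placeholders, and the unit conversion stands in for any column operation you might need.

```python
import pandas as pd

# Made-up readings; column names are placeholders
readings = pd.DataFrame({
    'Wavelength (nm)': [400.0, 500.0, 600.0, 700.0],
    'Signal': [0.0, 2.0, 3.0, 0.0],
})

# 1. Filtering: keep only rows where the condition is True
nonzero = readings[readings['Signal'] > 0].copy()

# 2. Column operation: convert nanometres to metres for every row at once
nonzero['Wavelength (m)'] = nonzero['Wavelength (nm)'] * 1e-9

# 3. Reduction: collapse a column to a single number
largest_signal = nonzero['Signal'].max()

print(nonzero)
print(largest_signal)
```

The `.copy()` after filtering is a defensive choice: it makes `nonzero` an independent table, so adding a column to it does not trigger a pandas warning about modifying a slice of `readings`.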

To determine the cutoff voltage, we will use __linear regression__ on the nonzero portion of our spectrum with `np.polyfit()` and find where it crosses the $x$-axis with `np.roots()`.

### CODE
* Filter your data to remove all points below a small, nonzero threshold.
* Perform a linear fit on your data using `np.polyfit(x_data, y_data, degree)`, with degree set to 1.
* Display the result. Use `plt.scatter(x,y)` for data points and `plt.plot(x,y)` for the regression line.
* Once we have our slope and intercept, calculate the x intercept using `np.roots([slope,intercept])`.
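
As a hedged illustration of the fit-then-root step, the sketch below uses made-up points that lie exactly on a known line, so the expected slope, intercept, and crossing point are easy to check by hand. Real spectra will be noisier, and the variable names here are placeholders.

```python
import numpy as np

# Made-up points lying exactly on the line y = 6 - 2x
x_data = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y_data = 6.0 - 2.0 * x_data

# Degree 1 returns the coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x_data, y_data, 1)

# np.roots takes coefficients in the same order and returns an array of roots
x_intercept = np.roots([slope, intercept])[0]

print(slope, intercept, x_intercept)
```

For a straight line the same crossing point can be written directly as `-intercept / slope`; `np.roots` is simply the general tool that also works for higher-degree fits.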

In [None]:
#an example of the syntax for applying a mask / filter to a DataFrame:
#dataframe = dataframe[ dataframe['numbers'] == 5 ]
#this would return all rows of the dataframe where the value in the 
#'numbers' column was equal to 5.

In [None]:
# example of the syntax for a linear fit using NumPy's polyfit function:
#slope, y_intercept = np.polyfit(x_array,y_array,1)

In [None]:

# copy the functions you previously used to display your plot above this line^

#This part is given; this is how we display our trendline
x_trendline = np.linspace(-1,4,100) #this gives us a list of x values
y_trendline = slope * x_trendline + y_intercept #this gives us the corresponding y values
wavelength_nm = your_data['Wavelength (nm)'][0] #change this line!
plt.plot(x_trendline,y_trendline,label=f'{wavelength_nm}nm : {slope:.2f} x + {y_intercept:.2f}')
plt.legend() #this function displays labels

#these set boundaries on our plot so it looks nice
plt.xlim(0,5)
plt.ylim(0,8)

#this renders the finished figure.
plt.show()

In [None]:
#np.roots gives the roots (where y == 0) of a polynomial.
#It will return a list of values, but for a linear equation there is only one root.
#we can access this first element using list_of_roots[0].

# roots = ??
# cutoff_voltage = roots[0]

print(f"Cutoff Voltage: {cutoff_voltage:.3f}")

### SHORT RESPONSE QUESTIONS
1. Why is it important to remove zero values before computing a linear fit?
2. What is the more precise cutoff voltage calculated using this method?
3. If we repeat this experiment with a different wavelength of light, how would you expect the cutoff voltage to change?
### ANSWER

<br/><br/>
<br/><br/>
  

## PART 3 - Automating The Routine
`20 points`  

To understand how light's energy depends on frequency, we need to analyze multiple spectra. Instead of repeating the same steps manually, we’ll organize our code into a reusable __function__. 

First, we'll create a function which accepts the filename of a spectrum and processes it according to the procedure we just developed. Then, we'll create another which finds every .csv spectrum file in a folder and applies our routine to each of them one by one, storing the results in a `DataFrame`.

This demonstrates the principle of **abstraction**: just as we don’t need to understand a computer’s inner workings to use it, we don’t need to rewrite a function’s logic every time we call it. Functions can call other functions, creating a hierarchy of logic which is easy to read and modify, as opposed to thousands of lines of unorganized code.
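
A toy sketch of this layering, using hypothetical functions `mean` and `mean_of_each` that have nothing to do with spectra: the outer function never repeats the averaging logic, it only decides what to apply it to.

```python
def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)

def mean_of_each(list_of_lists):
    """Apply mean() to every list, one by one, and collect the results."""
    return [mean(values) for values in list_of_lists]

trials = [[1.0, 2.0, 3.0], [10.0, 20.0]]
print(mean_of_each(trials))
```

If the definition of `mean` ever needs to change, it changes in one place and every caller picks up the new behavior.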

<br/><br/>

### 3a - Processing one spectrum
`10 points`  

### CODE
* Create a function called `process_spectrum()`. It should accept one argument, `filename`.
* This function should use your routine and plot the spectrum with its regression line. It must return `wavelength_nm`, `frequency_Hz`, and `cutoff_voltage`, in that order.
* Test your new function on several different spectra with different wavelengths. Use the same metal, if you have spectra for multiple metals.

In [3]:
# You can get the wavelength and frequency from the spectra themselves.
# .iloc[0] takes the first row by position, so it still works after filtering:
# wavelength = my_data['Wavelength (nm)'].iloc[0]
# and
# frequency = my_data['Frequency (Hz)'].iloc[0]

# your function goes here!!!

In [4]:
# call your function with a few different spectra.

<br/><br/>

### 3b - Scaling Up
`10 points`  

### CODE
* A function `process_all_data(folder)` is given. Leave comments explaining what each line does.
* Run the function on your spectra folder (ensure the only CSV files it contains are the spectra for this assignment).
* Store the result in a variable and display it by placing the variable on the last line of a code block.
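
When reading a folder of files, it helps to know that `os.listdir` returns bare filenames rather than full paths. The sketch below demonstrates this on a throwaway temporary folder with placeholder filenames, so it does not touch your spectra.

```python
import os
import tempfile

# Build a throwaway folder with placeholder files so the sketch is self-contained
with tempfile.TemporaryDirectory() as folder:
    for name in ['b.csv', 'a.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    # os.listdir returns bare names in arbitrary order, so sort them
    csv_names = sorted(f for f in os.listdir(folder) if f.endswith('.csv'))

    # Join each name with the folder to get a path that can actually be opened
    csv_paths = [os.path.join(folder, f) for f in csv_names]
    all_exist = all(os.path.exists(p) for p in csv_paths)

print(csv_names)
print(all_exist)
```

A bare filename only opens correctly when the folder happens to be the current working directory, which is why joining with `os.path.join` is the safer habit.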

In [6]:
import os #the python module for interacting with the operating system, file trees, etc
def process_all_data(folder):
    filenames = os.listdir(folder) 
    csv_files = [file for file in filenames if file.endswith('.csv')]
    
    wavelengths = []
    frequencies = []
    cutoff_voltages = []
    
    for filename in csv_files:
        wavelength, frequency, cutoff_voltage = process_spectrum(os.path.join(folder, filename))
        wavelengths.append(wavelength)
        frequencies.append(frequency)
        cutoff_voltages.append(cutoff_voltage)

    table = pd.DataFrame({
        'Wavelength (nm)': wavelengths,
        'Frequency (Hz)': frequencies ,
        'Stopping Potential (V)': cutoff_voltages 
    })
    table = table.sort_values(by='Frequency (Hz)') #make the table easier to read
    return table

In [9]:
#if the .csv spectra are in the same folder as this assignment,
# use './' as the folder argument.
# call the function here.

### SHORT RESPONSE QUESTIONS
1. What is the utility of collecting all of our code into a function?
2. How does a change in wavelength correspond to the change in frequency?
3. What trend do you observe as we change the wavelength of light used, and what does this imply?
### ANSWER

<br/><br/>
<br/><br/>

## PART 4 - Light Energy Versus Frequency 
`20 points`  

We’ve reduced each spectrum to two key values: stopping potential and frequency. Now, we’ll analyze their relationship to extract fundamental physical constants.

From energy conservation:
$$
E_{total} = E_k + E_{escape}
$$
At the stopping potential $V_{stop}$, we know from earlier that $E_k = eV_{stop}$. So, we can rearrange this equation into the following:
$$
eV_{stop} = E_{total} - E_{escape}
$$
By plotting $V_{stop}$ against frequency, we can determine whether a linear relationship exists. If so, the intercept is the energy required to escape the metal’s surface (the __work function__) times $-1$, and the slope reveals how light’s energy depends on frequency.
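
To make the roles of the slope and intercept explicit, suppose (as a hypothesis to test against the data, not an established result here) that the energy delivered by the light is proportional to its frequency $f$, with an unknown constant $k$. The symbols $f$ and $k$ are introduced only for this sketch.
$$
E_{total} = k f
\quad\Rightarrow\quad
eV_{stop} = k f - E_{escape}
\quad\Rightarrow\quad
V_{stop} = \frac{k}{e} f - \frac{E_{escape}}{e}
$$
With energies expressed in eV, dividing by $e$ changes nothing numerically, so a straight-line fit of $V_{stop}$ against $f$ would have slope $k$ (in $eV\cdot{}s$) and intercept $-E_{escape}$ (in eV).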

### CODE
* Perform a linear regression of stopping potential versus frequency in Hz and plot the data with the regression line.
* Find the slope. This is the proportionality constant between light frequency and energy, in $eV\cdot{}s$.
* Convert this value to Joule-seconds ($J\cdot{}s$) and print it.
* Find the __work function__ of your metal from the intercept and print it.
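
The sketch below runs these steps on made-up data generated from an arbitrary straight line, so the numbers are not physical and will not match your results. The variable names are placeholders; the conversion factor is the same `1.602e-19` used elsewhere in this notebook.

```python
import numpy as np

# Made-up data generated from an arbitrary line, NOT real measurements:
# stopping potential = 5.0e-15 * frequency - 1.5
frequencies_Hz = np.array([6.0e14, 7.0e14, 8.0e14, 9.0e14])
stopping_potentials_V = 5.0e-15 * frequencies_Hz - 1.5

# Degree-1 fit: returns [slope, intercept]
slope_eVs, intercept_eV = np.polyfit(frequencies_Hz, stopping_potentials_V, 1)

EV_TO_JOULES = 1.602e-19             # conversion factor for eV*s to J*s
slope_Js = slope_eVs * EV_TO_JOULES
work_function_eV = -intercept_eV     # the intercept is -1 times the work function

print(f"slope: {slope_eVs:.3e} eV*s = {slope_Js:.3e} J*s")
print(f"work function: {work_function_eV:.2f} eV")
```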

In [10]:
#you can copy-paste the functions you used 
#to plot data and find the slope/intercept before here.

#make sure you change all the variable names to appropriate values, however.

In [11]:
#energy_constant_Js = slope * 1.602e-19 # the unit conversion for eV*s to J*s

### SHORT RESPONSE QUESTIONS
1. Where have you seen the constant you found relating energy and frequency before?
2. What is the percent error for your calculated value compared to the accepted value?
3. What value do you obtain for the work function of your metal, and what percent error?
4. What are possible sources of error?
### ANSWER

<br/><br/>
<br/><br/>

# REFLECTION

`10 points`  

### SHORT RESPONSE QUESTIONS 
1. How did using `pandas` and creating functions allow us to automate and scale up our data processing routine? How might such an approach prove useful for your work as a chemist?
2. How does the photoelectric effect challenge the classical wave model of light? What observations from your data suggest that light behaves as a particle?
### ANSWER

<br/><br/>
<br/><br/>