# He_line_calc: a notebook for reducing He line data

## Introduction

This notebook reduces data produced by a Pfeiffer PrismaPlus220 quadrupole mass spectrometer in the Helium Analysis Laboratory (HAL) at the University of Illinois. The HAL uses h6t and Pychron software for data reporting and measures masses 1-5 on line blanks, hot blanks, line gas standards, and samples. We use an isotope dilution approach with a $^3$He spike and a $^4$He reference gas of known volume, so $^4$He/$^3$He ratios are measured with corrections for H, D, and HD. This notebook reads in the raw data files from either h6t or Pychron and reports out blank-corrected $^4$He volumes (in ncc) and amounts (in mol).
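
For reference, the per-cycle ratio and the fit built in the data-reading cell can be written as below. Here $I_{k,i}$ is the signal at mass $k$ in cycle $i$, assuming (from the column indices the code uses) that columns 2-6 of the h6t file hold masses 1-5; $t_i$ is the time in seconds since the first cycle; and $R_0$ and $m$ are the intercept and slope of the least-squares line.

$$
R_i = \frac{I_{4,i} - I_{5,i}}{I_{3,i} - I_{5,i} - 0.005\,I_{1,i}}, \qquad R_i \approx R_0 + m\,t_i
$$

The intercept $R_0$, the ratio extrapolated back to the first cycle, is the value carried through the rest of the notebook.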

Instructions for using this notebook are given before each code cell and should be followed step by step. Only one notebook is needed for a complete set of analyses, which may span multiple days (referred to throughout this notebook as a "session"). Some cells are run only once, at the very beginning of data collection, whereas others are run repeatedly as new data are collected. __Pay attention to which cells need to be run once and which will be run repeatedly throughout the data collection process!__ If you are unfamiliar with Jupyter Notebooks, the "run" button is at the top of the notebook; run a cell by selecting it with the mouse and then clicking "run".

### Step-by-step instructions

First, save a copy of this notebook to the relevant folder on the HAL desktop (`C:\Users\lab-admin\Desktop\Line_summary_sheets\20XX\XXX20XX\'lastname'`, where the Xs are specific to the date and 'lastname' is the user's last name), and make sure that all data files for a given session are saved to the same folder.
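
Because the reading cell opens each run file by its bare name, Python looks for it in the kernel's working directory, which Jupyter normally sets to the folder containing the notebook. A quick way to confirm that the notebook can see its run files is sketched below; `list_run_files` is a hypothetical helper, and the '.xlh' default simply matches the extension this notebook appends to aliquot names.

```python
from pathlib import Path


def list_run_files(folder, extension='.xlh'):
    """Return the sorted names of files in `folder` that end with `extension`.

    Hypothetical helper: '.xlh' matches the extension the reading cell
    appends to each aliquot name.
    """
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix == extension)
```

For example, `list_run_files('.')` run in a new cell should list every aliquot file saved next to the notebook so far.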

This first cell imports some useful packages, sets some constants, and initializes the running-mean lists and reference numbers used throughout the notebook. __Run this cell only once! If it is rerun by accident before you have finished a session, it empties the lists, sample times, reference numbers, and display data frame, and you will need to repopulate them with the appropriate values from the cells below.__
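
One way to make an accidental rerun harmless, sketched here as an assumption rather than as part of the HAL workflow, is to create each variable only if it does not already exist. The hypothetical helper `init_session_state` below uses `dict.setdefault` on a namespace dictionary; the variable names mirror those initialized in the next cell, and `aliquot_frame` (a pandas DataFrame) is left out to keep the sketch dependency-free.

```python
def init_session_state(namespace):
    """Create the session's running-mean variables only if they are missing.

    Hypothetical helper: a second call leaves existing values untouched, so
    data collected so far survive an accidental rerun.
    """
    defaults = {
        'hb_list': [], 'lb_list': [], 'std_list': [], 'std_num_list': [],
        'hb_ref_list': [], 'lb_ref_list': [], 'std_ref_list': [],
        'sample_dict': {},
        'last_hb_time': 0, 'last_lb_time': 0, 'last_std_time': 0,
        'std_ref_num': 0, 'lb_ref_num': 0, 'hb_ref_num': 0,
    }
    for name, value in defaults.items():
        # setdefault only assigns when the name is not already present
        namespace.setdefault(name, value)
    return namespace
```

In a notebook cell, `init_session_state(globals())` would create any missing variables and leave existing ones alone.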

In [None]:
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
%matplotlib inline

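#constants
ideal_gas_moles = 22.414 #molar volume of an ideal gas at STP, liter/mol
initial_tank_4He = 6.863044 #4He in a standard pipette shot before tank depletion (shot 0), ncc
tank_depletion = 0.999981 #fraction of tank 4He remaining after each pipette shot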

#initialize mean lists, lists are needed here to calculate standard deviations
#initialize reference list of lists and sample dictionary, ref lists tie running means to specific samples
hb_list = []
lb_list = []
std_list = []
std_num_list = []
hb_ref_list = []
lb_ref_list = []
std_ref_list = []
sample_dict = {}

#initialize sample times and reference numbers used to tie samples to relevant running means 
last_hb_time = 0
last_lb_time = 0
last_std_time = 0
std_ref_num = 0
lb_ref_num = 0
hb_ref_num = 0

#this data frame is for display purposes only
aliquot_frame = pd.DataFrame(columns = ['4He/3He intercept','4He/3He err', 'notes'], index = [])

This next cell reads each individual run file produced by h6t. __Run this cell every time you collect new data from the PrismaPlus.__ In the cell below, enter the name of the aliquot you saved to the folder in the `aliquot` variable. __Type the name between the quotation marks ''; the file extension is added to the aliquot name in the next line of code.__ This should be the name of the sample, line blank, hot blank, or line standard, following these conventions:

- Line blanks: 'lb_mmddyyyy', where mm = the month, dd = the day, and yyyy = the year.
- Hot blanks: 'hb_mmddyyyy'.
- Line standards: 'stdXXXX', where XXXX is the shot number from the $^4$He pipette as recorded in the notebook.
- Re-extracts: the sample name followed by '_reX', where X is the re-extract number (1-4).

Because the code identifies blanks and standards by these prefixes, sample names must not begin with 'lb', 'hb', or 'std'. __You must follow these naming conventions or the rest of the code will not work!__
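
Because the following cells route each aliquot by the start or end of its name, a quick check before reading a file can catch typos. The sketch below is a hypothetical validator, `classify_aliquot`, written with Python's `re` module against the conventions listed above; it mirrors, but is not part of, the notebook's own `startswith` and `'_re'` tests.

```python
import re

# Hypothetical validator for the naming conventions listed above
NAME_PATTERNS = [
    ('line blank', re.compile(r'lb_\d{8}')),     # lb_mmddyyyy
    ('hot blank', re.compile(r'hb_\d{8}')),      # hb_mmddyyyy
    ('line standard', re.compile(r'std\d+')),    # stdXXXX, XXXX = pipette shot number
    ('re-extract', re.compile(r'.+_re[1-4]')),   # sample_reX, X = 1-4
]


def classify_aliquot(name):
    """Return the aliquot type, or raise ValueError for names the notebook would misroute."""
    for kind, pattern in NAME_PATTERNS:
        if pattern.fullmatch(name):
            return kind
    if name.startswith(('lb', 'hb', 'std')):
        # the notebook would treat this as a blank or standard from its prefix alone
        raise ValueError(f'{name!r} starts like a blank or standard but does not match its convention')
    if re.search(r'_re\d$', name):
        # the notebook would treat this as a re-extract, but only 1-4 are allowed
        raise ValueError(f'{name!r} looks like a re-extract but X is not 1-4')
    return 'sample'
```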

In [None]:
aliquot = 'std456'
file_name = aliquot + '.xlh'
Prisma_data_list = []

with open(file_name, mode='r') as in_file:
    #skip the header until the line that starts the mass intensity table
    for line in in_file:
        if line.startswith('cycle'):
            break
    else:
        raise ValueError('no line starting with "cycle" found in ' + file_name)
    
    #read the data lines into a list of lists (the exact cycle at which data capture begins is not known)
    #blank lines, such as a trailing newline at the end of the file, are skipped
    for line in in_file:
        fields = line.split()
        if fields:
            Prisma_data_list.append([float(i) for i in fields])

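#convert data_list to an array for easier indexing
Prisma_data_array = np.array(Prisma_data_list)
#time stamp of the last cycle; the time column (index 1) is in days, as the *24*60*60 conversion below implies
aliquot_time = Prisma_data_array[-1,1]

#assumed column layout, from the indices used here: 0 = cycle, 1 = time, 2-6 = masses 1-5
#create time list (x-values, seconds since the first cycle)
t_list = ((Prisma_data_array[:,1] - Prisma_data_array[0,1])*24*60*60).tolist()
#create corrected 4He/3He list (y-values): column 6 is subtracted from the mass-4 and mass-3 signals,
#and 0.005 x column 2 is also subtracted from the mass-3 signal
He_ratio_list = ((Prisma_data_array[:,5] - Prisma_data_array[:,6])/
                 (Prisma_data_array[:,4] - Prisma_data_array[:,6] - 0.005*Prisma_data_array[:,2])).tolist()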

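#ordinary least-squares fit of 4He/3He against time; the intercept is the ratio extrapolated to the first cycle (t = 0)
#the mean and standard deviation of the ratios are also computed for comparison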
sum_t_y = 0
sum_t2 = 0
sum_slope_err = 0

for i in range(len(t_list)):
    sum_t_y = sum_t_y + t_list[i]*He_ratio_list[i]
    sum_t2 = sum_t2 + t_list[i]**2

slope = (len(t_list)*sum_t_y - sum(t_list)*sum(He_ratio_list))/(len(t_list)*sum_t2 - sum(t_list)**2)
intercept = (sum(He_ratio_list) - slope*sum(t_list))/len(t_list)

for i in range(len(t_list)):
    sum_slope_err = sum_slope_err + (He_ratio_list[i] - intercept - slope*t_list[i])**2

del_slope = math.sqrt(sum_slope_err/(len(t_list) - 2)) * math.sqrt(len(t_list)/(len(t_list)*sum_t2 - sum(t_list)**2))
del_intercept = math.sqrt(sum_slope_err/(len(t_list) - 2)) * math.sqrt(sum_t2/(len(t_list)*sum_t2 - sum(t_list)**2))

mean_4He_3He = np.mean(He_ratio_list)
stdev_4He_3He = np.std(He_ratio_list)

print('4He/3He intercept and 1-sigma error for this aliquot:', intercept, '+/-', del_intercept)
print('4He/3He mean and standard deviation:', mean_4He_3He, '+/-', stdev_4He_3He)
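
The intercept and its error above are the standard ordinary least-squares results, $\hat{a} = (\sum y - \hat{b}\sum t)/n$ and $s_{\hat a} = s\sqrt{\sum t^2/(n\sum t^2 - (\sum t)^2)}$ with $s^2 = \sum r^2/(n-2)$ for residuals $r$. For cross-checking, the same sums can be written with NumPy arrays; `intercept_with_error` below is a hypothetical helper, and `np.polyfit(t, y, 1)`, which returns the slope followed by the intercept, should reproduce the intercept.

```python
import numpy as np


def intercept_with_error(t, y):
    """Least-squares fit y = a + b*t; return (a, standard error of a).

    Hypothetical helper restating the sums in the cell above with arrays.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    denom = n*np.sum(t**2) - np.sum(t)**2
    slope = (n*np.sum(t*y) - np.sum(t)*np.sum(y))/denom
    intercept = (np.sum(y) - slope*np.sum(t))/n
    resid = y - intercept - slope*t
    s = np.sqrt(np.sum(resid**2)/(n - 2))          # residual standard deviation
    se_intercept = s*np.sqrt(np.sum(t**2)/denom)   # same formula as del_intercept
    return intercept, se_intercept
```

After the cell above has run, `intercept_with_error(t_list, He_ratio_list)` should match `intercept` and `del_intercept`.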

Now you're ready to add these data to the running line summary list for this session. __If you do not want the data added to the running list, do not run the next cell.__ There are various reasons why you might not want to add a particular aliquot to the running total; for example, the first line blank of the day typically comes in with a slightly high $^4$He/$^3$He ratio, and we discard it.

If you do want to add these data to the line summary list, run the next cell. __Add any important sample notes by typing them between the quotation marks of the notes variable; they appear in the display data frame.__ The code assumes that all data added to a given line summary list are in chronological order. Run the cell only once per aliquot: rerunning it adds blanks, standards, and re-extracts a second time.
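
The 8-hour reset in the next cell groups consecutive blanks (or standards) into running-mean windows, which is why the chronological-order assumption matters. A minimal sketch of that grouping follows, assuming (as the `*24*60*60` conversion in the reading cell implies) that h6t times are in days; `assign_windows` is a hypothetical name. The gap in days must be multiplied by 24 to compare it with a limit in hours.

```python
def assign_windows(times_days, max_gap_hours=8.0):
    """Assign a 0-based running-mean window to each aliquot of one type.

    Hypothetical helper: a new window starts for the first aliquot and
    whenever more than `max_gap_hours` separate an aliquot from the
    previous one of the same type. Times are in days.
    """
    windows = []
    last_time = None
    current = -1
    for t in times_days:
        if last_time is None or (t - last_time)*24 > max_gap_hours:
            current += 1
        windows.append(current)
        last_time = t
    return windows
```

For example, blanks measured at 0.0, 0.1, 0.2, and 1.0 days fall into windows [0, 0, 0, 1]: the first three are 2.4 hours apart, and the last follows a 19.2-hour gap.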

In [None]:
#add sample/run to sample data frame for display purposes at the end of cell
#enter in any notes you have in the notes variable below

notes = 'test'
row_data = pd.DataFrame({'4He/3He intercept':intercept,'4He/3He err':del_intercept,'notes':notes },columns = ['4He/3He intercept','4He/3He err', 'notes'], index = [aliquot])
aliquot_frame = pd.concat([aliquot_frame, row_data])

#selection statements to determine the type of aliquot and what to do with it
#blanks and standards start a new running mean when more than 8 hours have elapsed since the previous
#aliquot of the same type (aliquot_time is in days, so the difference is multiplied by 24 to give hours)
#re-extracts are dealt with by adding their totals to the parent sample. NOTE: THIS ASSUMES THAT RE-EXTRACTS AND
#THEIR SAMPLES ARE CO-LOCATED IN THE SAME LINE SUMMARY NOTEBOOK!

if aliquot.startswith('hb'):
    if (aliquot_time - last_hb_time)*24 > 8:
        #start a new hb_list for the first hot blank or if more than 8 hours since the last hot blank
        hb_list = []
        hb_list.append(intercept)
        hb_mean = np.mean(hb_list)
        hb_err = del_intercept
        hb_ref_list.append([hb_mean, hb_err])
        last_hb_time = aliquot_time
        hb_ref_num = hb_ref_num + 1
    else:
        hb_list.append(intercept)
        hb_mean = np.mean(hb_list)
        hb_err = np.std(hb_list)
        hb_ref_list[hb_ref_num - 1] = [hb_mean, hb_err]
        last_hb_time = aliquot_time
elif aliquot.startswith('lb'):
    if (aliquot_time - last_lb_time)*24 > 8:
        #start a new lb_list for the first line blank or if more than 8 hours since the last line blank
        lb_list = []
        lb_list.append(intercept)
        lb_mean = np.mean(lb_list)
        lb_err = del_intercept
        lb_ref_list.append([lb_mean, lb_err])
        last_lb_time = aliquot_time
        lb_ref_num = lb_ref_num + 1
    else:
        lb_list.append(intercept)
        lb_mean = np.mean(lb_list)
        lb_err = np.std(lb_list)
        lb_ref_list[lb_ref_num - 1] = [lb_mean, lb_err]
        last_lb_time = aliquot_time
elif aliquot.startswith('std'):
    if (aliquot_time - last_std_time)*24 > 8:
        #start new std_list and std_num_list for the first standard or if more than 8 hours since the last standard
        std_list = []
        std_list.append(intercept)
        std_num_list = []
        std_num_list.append(float(aliquot[3:]))
        std_mean = np.mean(std_list)
        std_err = del_intercept
        std_num = np.mean(std_num_list)
        std_ref_list.append([std_mean, std_err, std_num])
        last_std_time = aliquot_time
        std_ref_num = std_ref_num + 1
    else:
        std_list.append(intercept)
        std_num_list.append(float(aliquot[3:]))
        std_mean = np.mean(std_list)
        std_err = np.std(std_list)
        std_num = np.mean(std_num_list)
        std_ref_list[std_ref_num - 1] = [std_mean, std_err, std_num]
        last_std_time = aliquot_time
elif aliquot[-4:-1] == '_re':
    #the parent sample name is everything before the final '_reX'
    parent_sample = aliquot[:-4]
    if parent_sample not in sample_dict:
        raise KeyError('parent sample ' + parent_sample + ' has not been added to this notebook')
    update_list = sample_dict[parent_sample]
    #add the re-extract to the parent total; errors are summed linearly, which is conservative
    update_list[0] = update_list[0] + intercept
    update_list[1] = update_list[1] + del_intercept
    sample_dict[parent_sample] = update_list
else:
    sample_dict[aliquot] = [intercept, del_intercept, aliquot_time, lb_ref_num, hb_ref_num, std_ref_num]

aliquot_frame

This last step organizes and reduces your data for export to a JSON file, which we use to further reduce the data once U, Th, and Sm information has been collected. Here, the notebook reports blank-corrected $^4$He volumes (in ncc) and amounts (in mol) for each sample aliquot. __You only need to run this cell once, at the end, to export the data.__
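
The export cell applies an isotope-dilution scaling: the sample's blank-corrected ratio is divided by the standard's blank-corrected ratio, and the result scales the $^4$He in the standard shot, $V = V_{std}(R_s - R_{hb})/(R_{std} - R_{hb})$, with $V_{std} = V_0 d^{n}$, where $V_0$ is `initial_tank_4He`, $d$ is `tank_depletion`, and $n$ is the mean standard shot number. Because the cell leaves `sample_vol_err` at 0, the sketch below also adds a first-order error propagation. That propagation is a swapped-in technique, not the notebook's method: it assumes the three ratio errors are independent and treats $V_{std}$ as exact. `he4_amount` is a hypothetical helper whose defaults copy the constants from the first cell.

```python
import math


def he4_amount(r_sample, r_blank, r_std, shot_number,
               sig_sample=0.0, sig_blank=0.0, sig_std=0.0,
               initial_tank_4He=6.863044, tank_depletion=0.999981,
               molar_volume=22.414):
    """Return (vol_ncc, vol_err_ncc, amount_mol, amount_err_mol).

    Hypothetical helper. The error is a first-order propagation assuming
    independent ratio errors and an exact standard volume.
    """
    v_std = initial_tank_4He * tank_depletion**shot_number
    num = r_sample - r_blank
    den = r_std - r_blank
    vol = num/den * v_std
    # partial derivatives of vol with respect to each ratio
    d_sample = v_std/den
    d_std = -v_std*num/den**2
    d_blank = v_std*(r_sample - r_std)/den**2
    vol_err = math.sqrt((d_sample*sig_sample)**2 + (d_std*sig_std)**2
                        + (d_blank*sig_blank)**2)
    # 1 ncc = 1e-9 cm3 = 1e-12 L; dividing by the molar volume (L/mol) gives mol
    to_mol = 1e-12/molar_volume
    return vol, vol_err, vol*to_mol, vol_err*to_mol
```

For example, `he4_amount(10.0, 0.0, 20.0, 0, sig_sample=1.0)` gives a volume of 3.431522 ncc with an error of about 0.3431522 ncc.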

In [None]:
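import json

#json dictionary that gets reported out
json_out_dict = {}

#step through sample_dict, convert each sample to volume and amount, and add it to json_out_dict
#the stored reference numbers are counts, so the matching running mean is at index (count - 1)
for index in sample_dict:
    ratio_sum, ratio_err, _, _, hb_ref, std_ref = sample_dict[index]
    if hb_ref == 0 or std_ref == 0:
        raise ValueError(index + ' was added before any hot blank or standard')
    hb_mean = hb_ref_list[hb_ref - 1][0]
    std_mean, _, std_shot = std_ref_list[std_ref - 1]
    
    #4He in the standard shot, corrected for tank depletion (ncc)
    std_ncc = initial_tank_4He * tank_depletion**std_shot
    sample_vol = (ratio_sum - hb_mean)/(std_mean - hb_mean) * std_ncc
    #ncc -> liters (x 1e-12) -> mol (divided by the molar volume in liter/mol)
    sample_mol = sample_vol * 1e-12/ideal_gas_moles
    
    #calculate errors; the volume error is a placeholder and is not yet propagated
    sample_vol_err = 0
    sample_mol_err = sample_vol_err * 1e-12/ideal_gas_moles
    
    #add to the json_dict
    json_out_dict[index] = [sample_vol, sample_vol_err, sample_mol, sample_mol_err]

#send the dict to a json file
#'He_line_summary.json' is a hypothetical file name; change it to suit your session
with open('He_line_summary.json', 'w') as out_file:
    json.dump(json_out_dict, out_file, indent=2)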