# Introduction

`icpmsflow` is a collection of routines for analyzing ICP-MS data.

The main component of `icpmsflow` is the `ICPMSAnalysis` class, which handles standard manipulations of ICP-MS data. Below, we show an abbreviated use case.


## main modules

In [15]:
import icpmsflow
# for finding paths to files
from pathlib import Path

## read in data

In [16]:
# In this directory, there are files '001SMPL-*.csv'
# get list of paths
paths = list(Path('.').glob('00*.csv'))
print(paths)

[PosixPath('001SMPL-4.csv'), PosixPath('001SMPL-5.csv'), PosixPath('001SMPL-7.csv'), PosixPath('001SMPL-6.csv'), PosixPath('001SMPL-2.csv'), PosixPath('001SMPL-3.csv'), PosixPath('001SMPL-1.csv'), PosixPath('001SMPL-10.csv'), PosixPath('001SMPL-11.csv'), PosixPath('001SMPL-13.csv'), PosixPath('001SMPL-12.csv'), PosixPath('001SMPL-8.csv'), PosixPath('001SMPL-9.csv')]
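The listing above is not in numeric order, because `glob` makes no ordering guarantee. If a deterministic, numerically sorted order is wanted, one option is a small sort key like the sketch below. The helper name `sample_number` is hypothetical and not part of `icpmsflow`; it assumes file names end in `-<number>.csv`, as the ones here do.

```python
import re
from pathlib import Path


def sample_number(path: Path) -> int:
    """Return the trailing integer in a name like '001SMPL-10.csv'."""
    match = re.search(r"-(\d+)$", path.stem)
    # Names without a trailing number sort first.
    return int(match.group(1)) if match else -1


# Stand-in paths for illustration; in practice use the globbed list.
paths = [Path("001SMPL-10.csv"), Path("001SMPL-2.csv"), Path("001SMPL-1.csv")]
sorted_paths = sorted(paths, key=sample_number)
print([p.name for p in sorted_paths])
```

Whether `icpmsflow` itself depends on the order of the paths is not stated here, so treat this as optional tidying.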


## Create ICPMSAnalysis object from these paths

In [17]:
ds = icpmsflow.ICPMSAnalysis.from_paths(paths)

## working with data

There are two components of the `ICPMSAnalysis` class: the ICP-MS data and `bounds_data`, which contains the baseline and signal regions. Right now, these regions are defined per batch, though this may be adjusted down the road.
You can set `bounds_data` manually or with the `ds.add_bounds` method, shown below. This method takes the derivative of the signal and looks for its minimum and maximum peaks to find the jumps in the data. Here we use the optional parameters:

* `kernel_size`: apply a median filter with this kernel size. Useful for smoothing out noisy data.
* `z_threshold`: z-score cutoff applied to the extrema. Only element minima/maxima whose `scipy.stats.zscore` is less than this cutoff are considered.
* `shift`: offsets applied around the detected jumps, so that baseline = (tmin, max - shift[0]) and signal = (max + shift[1], min - shift[2]).
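To make the idea concrete, here is a minimal sketch of derivative-based jump detection on a synthetic step signal. It is not the `add_bounds` implementation: the function names `find_jumps` and `bounds_from_jumps` are hypothetical, the z-score filtering is left out, and it assumes "max" and "min" in the `shift` formula refer to the times of the derivative's maximum and minimum.

```python
import numpy as np
from scipy.signal import medfilt


def find_jumps(time, signal, kernel_size=21):
    """Return the times of the steepest rise and steepest fall."""
    # Median filter to suppress noise (kernel_size must be odd).
    smoothed = medfilt(signal, kernel_size=kernel_size)
    # Numerical derivative of the smoothed signal with respect to time.
    deriv = np.gradient(smoothed, time)
    return time[np.argmax(deriv)], time[np.argmin(deriv)]


def bounds_from_jumps(t_min, t_rise, t_fall, shift):
    """Apply the shift offsets as described in the parameter list."""
    baseline = (t_min, t_rise - shift[0])
    signal = (t_rise + shift[1], t_fall - shift[2])
    return baseline, signal


# Synthetic data: low baseline, then a plateau between ~25 s and ~105 s.
time = np.linspace(0.0, 120.0, 1201)
signal = np.where((time > 24.95) & (time < 104.95), 1000.0, 10.0)

t_rise, t_fall = find_jumps(time, signal)
baseline, sig = bounds_from_jumps(time[0], t_rise, t_fall, shift=(5, 10, 2))
print(baseline, sig)
```

With `shift=(5, 10, 2)`, the baseline ends 5 s before the rise and the signal region starts 10 s after it, which matches the 15 s gap between the baseline upper bound and the signal lower bound in the tables further down.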
    
    
We also apply `snap_bounds_minmax` to ensure that the bounds are within the time min/max of the data.
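Snapping of this kind can be pictured as clipping each bound to the time range of the data. The sketch below uses plain pandas on a made-up two-row table; `snap_to_range` is a hypothetical helper, not the library's `snap_bounds_minmax`, and the column names simply mirror those of `bounds_data`.

```python
import pandas as pd


def snap_to_range(bounds: pd.DataFrame, t_min: float, t_max: float) -> pd.DataFrame:
    """Clip lower/upper bounds so they fall inside [t_min, t_max]."""
    snapped = bounds.copy()
    cols = ["lower_bound", "upper_bound"]
    snapped[cols] = snapped[cols].clip(lower=t_min, upper=t_max)
    return snapped


# Illustrative values only: one bound below the range, one above it.
bounds = pd.DataFrame(
    {
        "type_bound": ["baseline", "signal"],
        "lower_bound": [-2.0, 30.0],
        "upper_bound": [20.0, 130.0],
    }
)
snapped = snap_to_range(bounds, t_min=0.0, t_max=120.0)
print(snapped)
```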
    

In [19]:
# add in bounds_data
ds = (
    ds
    .add_bounds(kernel_size=21, z_threshold=2, shift=(5, 10, 2))
    .snap_bounds_minmax()
)


## new functionality to work with indexes

In [22]:
bounds_data = ds.bounds_data.reset_index()
bounds_data.head()

,batch,type_bound,lower_bound,upper_bound
0,1FEB21610CAL.b,baseline,2.9896,2.9896
1,1FEB21610CAL.b,signal,10.0,10.0
2,1FEB21610CAL2.b,baseline,2.9896,18.8469
3,1FEB21610CAL2.b,signal,33.8469,105.2727
4,1FEB21610CALEND.b,baseline,2.9885,30.7637


In [25]:
# Note that bounds_data no longer has the required MultiIndex.
# Pass it in with the `set_bounds` method, which checks the index
# and fixes it up as needed.
ds2 = ds.set_bounds(bounds_data)
ds2.bounds_data.head()

batch,type_bound,lower_bound,upper_bound
1FEB21610CAL.b,baseline,2.9896,2.9896
1FEB21610CAL.b,signal,10.0,10.0
1FEB21610CAL2.b,baseline,2.9896,18.8469
1FEB21610CAL2.b,signal,33.8469,105.2727
1FEB21610CALEND.b,baseline,2.9885,30.7637
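For readers less familiar with pandas, the round trip between a flat table and a `(batch, type_bound)` MultiIndex can be reproduced with `set_index` and `reset_index` alone. This is only an illustration of the index handling; what `set_bounds` does internally is not shown in this document. The two rows are copied from the output above.

```python
import pandas as pd

# Flat table, as produced by reset_index().
flat = pd.DataFrame(
    {
        "batch": ["1FEB21610CAL2.b", "1FEB21610CAL2.b"],
        "type_bound": ["baseline", "signal"],
        "lower_bound": [2.9896, 33.8469],
        "upper_bound": [18.8469, 105.2727],
    }
)

# Rebuild the two-level index from the identifying columns.
indexed = flat.set_index(["batch", "type_bound"])

# With the MultiIndex in place, a single bound can be looked up by key.
lower = indexed.loc[("1FEB21610CAL2.b", "signal"), "lower_bound"]

# reset_index() moves the index levels back into ordinary columns.
round_trip = indexed.reset_index()
print(indexed)
```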


In [26]:
# to get a copy of `bounds_data` without the MultiIndex, use the get_bounds() method
ds2.get_bounds()

,batch,type_bound,lower_bound,upper_bound
0,1FEB21610CAL.b,baseline,2.9896,2.9896
1,1FEB21610CAL.b,signal,10.0,10.0
2,1FEB21610CAL2.b,baseline,2.9896,18.8469
3,1FEB21610CAL2.b,signal,33.8469,105.2727
4,1FEB21610CALEND.b,baseline,2.9885,30.7637
5,1FEB21610CALEND.b,signal,45.7637,96.3327
6,1FEB21612610CAL.b,baseline,2.9885,19.667638
7,1FEB21612610CAL.b,signal,34.667638,104.63315
8,1FEB21612CALEND.b,baseline,2.9885,19.564921
9,1FEB21612CALEND.b,signal,34.564921,86.33045


In [29]:
# to get the integral at the bounds without a MultiIndex, use the reset_index flag
(
    ds
    .interpolate_at_bounds(reset_index=True, as_delta=True)
)

,batch,type_bound,Time [Sec],Li7 -> 7,Mg24 -> 24,Mg25 -> 25,Ca43 -> 43,Ca44 -> 44,Sc45 -> 45,Mn55 -> 55,...,Sb121 -> 121,Sb123 -> 123,Ba137 -> 137,Ba138 -> 138,Tl205 -> 205,Pb206 -> 206,Pb207 -> 207,Pb208 -> 208,Th232 -> 232,U238 -> 238
0,1FEB21610CAL.b,baseline,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1FEB21610CAL.b,signal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1FEB21610CAL2.b,baseline,15.8573,0.0,669.893,119.182,24.491,8961.564,605.508,3898.912,...,173.482,84.0825,9.592,515.6125,59.592,629.996,218.1645,431.013,0.0,0.0
3,1FEB21610CAL2.b,signal,71.4258,769394.680961,14888590.0,2209025.0,10764900.0,165454900.0,71151850.0,105546500.0,...,77816120.0,60253260.0,32970870.0,221652200.0,23801210.0,62605980.0,58039940.0,149128300.0,303133600.0,296228500.0
4,1FEB21610CALEND.b,baseline,27.7752,0.0,1795.79,396.922,113.8815,16139.3,2248.044,47556.66,...,575.6945,515.6335,739.0895,2162.949,3322.425,1059742.0,940833.2,2259676.0,239443.0,448570.3
5,1FEB21610CALEND.b,signal,50.569,743723.116247,15019170.0,2211924.0,10917490.0,175219000.0,70466170.0,118549700.0,...,76691500.0,58470770.0,33434490.0,228309200.0,24878220.0,65995190.0,60804100.0,155965800.0,288029600.0,288806300.0
6,1FEB21612610CAL.b,baseline,16.679138,0.0,1283.297,131.0006,142.9827,12301.61,884.3196,4214.157,...,634.4506,414.0541,863.7524,4431.393,3496.643,1205854.0,1071326.0,2682700.0,29.804,44.6955
7,1FEB21612610CAL.b,signal,69.965512,135919.873915,4245141.0,598976.9,21209750.0,338510400.0,12479000.0,21644960.0,...,17027370.0,13019780.0,6303688.0,41043060.0,11777020.0,18997330.0,17062130.0,40403090.0,41300320.0,50205900.0
8,1FEB21612CALEND.b,baseline,16.576421,0.0,760.0012,203.9483,29.7955,11647.39,1352.137,4922.838,...,174.1538,134.0935,88.53572,786.0107,1287.925,384291.6,352882.6,837548.8,29.804,0.0
9,1FEB21612CALEND.b,signal,51.765529,96710.973717,3074232.0,414468.8,15582580.0,250541700.0,9210285.0,15531570.0,...,11284680.0,8711366.0,4274053.0,28144730.0,9898319.0,10511720.0,9573015.0,22911710.0,28487730.0,32905750.0
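In the output above, the `Time [Sec]` column equals `upper_bound - lower_bound` for each row (for example 18.8469 - 2.9896 = 15.8573), which is consistent with `as_delta=True` returning the difference between values at the upper and lower bounds. One way to picture the other columns is as a cumulative integral interpolated at the two bounds and then differenced. The sketch below shows that idea with NumPy only; it is an assumption about the behavior, not the library's code, and `integral_between` is a hypothetical name.

```python
import numpy as np


def integral_between(time, values, lower, upper):
    """Integrate values over [lower, upper] via an interpolated cumulative integral."""
    # Cumulative trapezoidal integral, starting at zero at time[0].
    steps = 0.5 * (values[1:] + values[:-1]) * np.diff(time)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    # Interpolate the cumulative integral at the two bounds.
    at_lower, at_upper = np.interp([lower, upper], time, cumulative)
    # The "delta" is the integral accumulated between the bounds.
    return at_upper - at_lower


# Constant signal of height 2 over a 5-second window.
time = np.linspace(0.0, 10.0, 101)
values = np.full_like(time, 2.0)
area = integral_between(time, values, lower=2.5, upper=7.5)
print(area)
```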
