# Introduction

`icpmsflow` is a collection of routines for analyzing ICP-MS data.

The main component of `icpmsflow` is the `ICPMSAnalysis` class, which handles standard manipulations of ICP-MS data. Below, we show an abbreviated use case.


## Import the main modules

```python
import icpmsflow

# for finding paths to files
from pathlib import Path
```

## Read in data

```python
# This directory contains files named '001SMPL-*.csv'.
# Collect their paths (glob returns them in no particular order).
paths = list(Path('.').glob('00*.csv'))
print(paths)
```

```
[PosixPath('001SMPL-4.csv'), PosixPath('001SMPL-5.csv'), PosixPath('001SMPL-7.csv'), PosixPath('001SMPL-6.csv'), PosixPath('001SMPL-2.csv'), PosixPath('001SMPL-3.csv'), PosixPath('001SMPL-1.csv'), PosixPath('001SMPL-10.csv'), PosixPath('001SMPL-11.csv'), PosixPath('001SMPL-13.csv'), PosixPath('001SMPL-12.csv'), PosixPath('001SMPL-8.csv'), PosixPath('001SMPL-9.csv')]
```
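As the output shows, `Path.glob` yields the files in arbitrary order. If you need a deterministic order, one option is to sort by the trailing sample number. This sketch assumes every file name ends in `-<integer>`, as in `001SMPL-10.csv`. `sample_number` is a hypothetical helper and is not part of `icpmsflow`.

```python
import re
from pathlib import Path


def sample_number(path: Path) -> int:
    """Hypothetical helper: trailing integer of a name like '001SMPL-10.csv'.

    Assumes the '<prefix>-<integer>.csv' naming scheme seen in this directory.
    """
    match = re.search(r"-(\d+)$", path.stem)
    if match is None:
        raise ValueError(f"no trailing sample number in {path.name!r}")
    return int(match.group(1))


paths = [Path(name) for name in ["001SMPL-4.csv", "001SMPL-10.csv", "001SMPL-1.csv", "001SMPL-2.csv"]]
# Sort numerically so that 10 comes after 2 rather than right after 1
ordered = sorted(paths, key=sample_number)
print([p.name for p in ordered])
```

With real data, the same key can be applied directly to the glob: `sorted(Path('.').glob('00*.csv'), key=sample_number)`.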


## Create an `ICPMSAnalysis` object from these paths

```python
ds = icpmsflow.ICPMSAnalysis.from_paths(paths)
```

## Working with data

The `ICPMSAnalysis` class has two components: the ICP-MS data and `bounds_data`, which holds the baseline and signal regions. Currently these regions are defined per batch, although this may be generalized later.

You can set `bounds_data` manually or with the `ds.add_bounds` method, shown below. This method takes the derivative of the signal and looks for its minimum/maximum peaks to locate the jumps in the data. Here we use the following optional parameters:

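* `kernel_size`: apply a median filter with this kernel size. This is useful for smoothing out noisy data.
* `z_threshold`: z-score cutoff applied to the extrema. Only element minima/maxima where `scipy.stats.zscore` is less than this cutoff are considered.
* `shift`: offsets applied to the detected jumps: `baseline = (tmin, max - shift[0])` and `signal = (max + shift[1], min - shift[2])`. Here `tmin` is the first time point, and `max`/`min` are the times of the derivative maximum/minimum.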
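The detection step described above can be sketched with NumPy and SciPy. This is a minimal, hypothetical version (`find_bounds` is not the `icpmsflow` implementation):

* It smooths the signal with `scipy.signal.medfilt`.
* It differentiates the result with `numpy.gradient`.
* It takes the times of the largest rise and the largest drop as `max` and `min`.
* It applies `shift` according to the formulas in the list above.

The sketch assumes a single signal region per trace and omits the `z_threshold` filtering.

```python
import numpy as np
from scipy.signal import medfilt


def find_bounds(t, y, kernel_size=21, shift=(5.0, 10.0, 2.0)):
    """Hypothetical sketch of derivative-based bound detection (not icpmsflow's code)."""
    y_smooth = medfilt(y, kernel_size=kernel_size)  # kernel_size must be odd
    dy = np.gradient(y_smooth, t)                   # derivative of the smoothed signal
    t_max = t[np.argmax(dy)]                        # signal switches on (largest rise)
    t_min = t[np.argmin(dy)]                        # signal switches off (largest drop)
    baseline = (t[0], t_max - shift[0])
    signal = (t_max + shift[1], t_min - shift[2])
    return baseline, signal


# Synthetic step-shaped trace: low baseline, a plateau from t=20 to t=79, then low again
t = np.arange(100, dtype=float)
y = np.where((t >= 20) & (t < 80), 100.0, 1.0)
baseline, signal = find_bounds(t, y)
print(baseline, signal)
```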
    
    
We also apply `snap_bounds_minmax` to ensure that the bounds lie within the minimum and maximum times of the data.
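One plausible reading of `snap_bounds_minmax` is that it clips each bound into the time range of its batch. The sketch below does this for a single batch with `DataFrame.clip`. `snap_bounds` is a hypothetical stand-in, and the real method may treat edge cases differently. The values are illustrative, with two bounds deliberately placed outside the time range.

```python
import pandas as pd


def snap_bounds(bounds: pd.DataFrame, tmin: float, tmax: float) -> pd.DataFrame:
    """Hypothetical stand-in for snap_bounds_minmax: clip every bound into [tmin, tmax]."""
    out = bounds.copy()  # leave the input untouched
    cols = ["lower_bound", "upper_bound"]
    out[cols] = out[cols].clip(lower=tmin, upper=tmax)
    return out


bounds = pd.DataFrame(
    {"lower_bound": [0.0, 33.8469], "upper_bound": [18.8469, 120.0]},
    index=pd.Index(["baseline", "signal"], name="type_bound"),
)
snapped = snap_bounds(bounds, tmin=2.9896, tmax=105.2727)
print(snapped)
```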
    

```python
# add in bounds_data
db = (
    ds
    .add_bounds(kernel_size=21, z_threshold=2, shift=(5, 10, 2))
    .snap_bounds_minmax()
)
```


# New functionality for working with indexes

```python
bounds_data = db.bounds_data.reset_index()
bounds_data.head()
```

|   | batch | type_bound | lower_bound | upper_bound |
|---|---|---|---|---|
| 0 | 1FEB21610CAL.b | baseline | 2.9896 | 2.9896 |
| 1 | 1FEB21610CAL.b | signal | 10.0 | 10.0 |
| 2 | 1FEB21610CAL2.b | baseline | 2.9896 | 18.8469 |
| 3 | 1FEB21610CAL2.b | signal | 33.8469 | 105.2727 |
| 4 | 1FEB21610CALEND.b | baseline | 2.9885 | 30.7637 |


```python
# bounds_data (produced by reset_index above) lacks the required MultiIndex.
# Pass it to the `set_bounds` method, which checks the index
# and sets it up as needed.
ds2 = ds.set_bounds(bounds_data)
ds2.bounds_data.head()
```

| batch | type_bound | lower_bound | upper_bound |
|---|---|---|---|
| 1FEB21610CAL.b | baseline | 2.9896 | 2.9896 |
| 1FEB21610CAL.b | signal | 10.0 | 10.0 |
| 1FEB21610CAL2.b | baseline | 2.9896 | 18.8469 |
| 1FEB21610CAL2.b | signal | 33.8469 | 105.2727 |
| 1FEB21610CALEND.b | baseline | 2.9885 | 30.7637 |
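The notebook does not spell out what `set_bounds` does with the index. Here is a minimal sketch, assuming it only needs to move the `batch` and `type_bound` columns into the MultiIndex shown in the table above. `ensure_bounds_index` is a hypothetical helper, and the values are copied from batch `1FEB21610CAL2.b`.

The same pattern also covers two related tasks:

* building `bounds_data` by hand;
* using `reset_index` to recover the flat layout that `get_bounds()` returns.

```python
import pandas as pd

# Index levels observed in the set_bounds output (assumed to be what ICPMSAnalysis expects)
INDEX_NAMES = ["batch", "type_bound"]


def ensure_bounds_index(bounds: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return `bounds` indexed by (batch, type_bound)."""
    if list(bounds.index.names) == INDEX_NAMES:
        return bounds  # already has the expected MultiIndex
    return bounds.set_index(INDEX_NAMES)


# Bounds built by hand; values copied from batch 1FEB21610CAL2.b
flat = pd.DataFrame(
    {
        "batch": ["1FEB21610CAL2.b", "1FEB21610CAL2.b"],
        "type_bound": ["baseline", "signal"],
        "lower_bound": [2.9896, 33.8469],
        "upper_bound": [18.8469, 105.2727],
    }
)
indexed = ensure_bounds_index(flat)
print(indexed.loc[("1FEB21610CAL2.b", "signal")])

# reset_index restores the flat, column-based layout
roundtrip = indexed.reset_index()
pd.testing.assert_frame_equal(roundtrip, flat)
```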


```python
# to get a copy of `bounds_data` without the MultiIndex, use the get_bounds() method
ds2.get_bounds()
```

|   | batch | type_bound | lower_bound | upper_bound |
|---|---|---|---|---|
| 0 | 1FEB21610CAL.b | baseline | 2.9896 | 2.9896 |
| 1 | 1FEB21610CAL.b | signal | 10.0 | 10.0 |
| 2 | 1FEB21610CAL2.b | baseline | 2.9896 | 18.8469 |
| 3 | 1FEB21610CAL2.b | signal | 33.8469 | 105.2727 |
| 4 | 1FEB21610CALEND.b | baseline | 2.9885 | 30.7637 |
| 5 | 1FEB21610CALEND.b | signal | 45.7637 | 96.3327 |
| 6 | 1FEB21612610CAL.b | baseline | 2.9885 | 19.667638 |
| 7 | 1FEB21612610CAL.b | signal | 34.667638 | 104.63315 |
| 8 | 1FEB21612CALEND.b | baseline | 2.9885 | 19.564921 |
| 9 | 1FEB21612CALEND.b | signal | 34.564921 | 86.33045 |


```python
# to get the integral at the bounds without the MultiIndex, pass reset_index=True
a = (
    ds2
    .interpolate_at_bounds(reset_index=True, as_delta=True)
)
```
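The comment above describes `interpolate_at_bounds` in terms of the integral at the bounds. One way to read `as_delta=True` is as follows: interpolate a cumulative integral at the lower and upper bounds, then take the difference, which gives the integral over the region. The sketch below illustrates that reading for a single signal, using `numpy.interp` and `scipy.integrate.cumulative_trapezoid`. `interp_at_bounds` is hypothetical, not the `icpmsflow` code.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid


def interp_at_bounds(t, y, lower, upper, as_delta=False):
    """Hypothetical sketch: linearly interpolate y at two bounds (not icpmsflow's code)."""
    y_lower, y_upper = np.interp([lower, upper], t, y)
    if as_delta:
        return y_upper - y_lower  # change in y across [lower, upper]
    return y_lower, y_upper


t = np.linspace(0.0, 10.0, 101)
counts = np.full_like(t, 3.0)                           # constant signal: 3 counts per unit time
cumulative = cumulative_trapezoid(counts, t, initial=0)  # running integral, approximately 3 * t
area = interp_at_bounds(t, cumulative, 2.0, 5.0, as_delta=True)
print(area)
```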

```python
# ds itself has no bounds (bounds_data prints None),
# so pass the bounds explicitly
print(ds.bounds_data)
b = (
    ds
    .interpolate_at_bounds(bounds_data=bounds_data, reset_index=True, as_delta=True)
)
```

```
None
```


```python
import pandas as pd

# both approaches give identical results (raises AssertionError otherwise)
pd.testing.assert_frame_equal(a, b)
```