# Quick Start

This vignette demonstrates how to use the `lamp` Python package. The
input data and reference files are located at
https://github.com/wanchanglin/lamp/tree/master/examples/data.

## Setup

To use `lamp`, first import the required Python libraries, including
`lamp` itself.

In [1]:
import pandas as pd
from lamp import anno, stats, utils

## Data loading

`lamp` supports text files separated by commas (`,`) or tabs (`\t`).
Microsoft Excel XLSX files are also supported, provided that the data are in
the first sheet.

Here we use a small example data set in TSV format. Load it into Python and
have a look at the data format:


In [2]:
# data set
d_data = "./data/df_pos_2.tsv"
data = pd.read_table(d_data, header=0, sep="\t")
data

Unnamed: 0,name,namecustom,mz,mzmin,mzmax,rt,rtmin,rtmax,npeaks,.,...,X210,X209,X208,X207,X206,X205,X204,X203,X202,X201
0,M151T34,M150.8867T34,150.886715,150.886592,150.886863,34.152700,33.637595,35.465548,97,97,...,4.224942e+06,3.946599e+06,3.668948e+06,3.754321e+06,3.853724e+06,3.787350e+06,3.584464e+06,3.499711e+06,3.623205e+06,4.145770e+06
1,M151T40,M151.0402T40,151.040235,151.040092,151.040350,39.838172,37.556072,40.532315,95,95,...,1.419062e+06,1.251606e+06,1.214826e+06,8.143028e+05,5.331963e+05,1.930928e+06,1.479001e+06,1.076354e+06,9.293218e+05,5.298062e+05
2,M152T40,M152.0436T40,152.043607,152.043451,152.043737,40.303700,38.092678,40.909428,81,81,...,1.203919e+05,9.970442e+04,9.384000e+04,4.186335e+04,,2.115447e+05,1.285713e+05,9.389346e+04,7.163655e+04,4.916483e+04
3,M153T34,M152.8838T34,152.883824,152.883678,152.883959,34.174647,33.637595,35.465548,98,98,...,5.592065e+06,5.761380e+06,5.845419e+06,5.576013e+06,5.552878e+06,6.132789e+06,5.891378e+06,5.418082e+06,5.036840e+06,5.733794e+06
4,M153T36,M153.0195T36,153.019474,153.019331,153.019633,35.785847,34.130244,36.287354,98,98,...,7.284938e+06,1.083289e+07,1.140072e+07,8.220552e+06,9.255154e+06,7.648211e+06,7.723814e+06,5.571163e+06,5.362560e+06,9.259675e+06
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
395,M283T339,M283.2646T339,283.264583,283.264341,283.264809,338.763489,338.398380,339.165948,94,94,...,3.509767e+05,4.117633e+05,3.948000e+05,4.338804e+05,5.335221e+05,6.224684e+05,7.009340e+05,3.005173e+05,3.133173e+05,8.204783e+05
396,M284T60,M284.1953T60,284.195294,284.194939,284.195536,59.593561,58.844217,60.107058,59,59,...,,,,,,2.558004e+04,4.020517e+04,,3.162670e+04,5.446684e+04
397,M284T108,M284.2235T108,284.223499,284.223156,284.223692,108.406389,107.880510,108.971046,72,72,...,7.477652e+04,7.482219e+04,3.399667e+04,7.233564e+04,1.043879e+05,2.506785e+04,2.753769e+04,,,
398,M284T339,M284.268T339,284.267962,284.267634,284.268204,338.725056,338.268300,339.370098,84,84,...,3.697604e+04,5.398264e+04,5.340109e+04,6.557698e+04,7.656575e+04,1.040606e+05,1.063727e+05,,3.059370e+04,1.358056e+05


This data set includes a peak list and an intensity data matrix. `lamp` uses
the peak name, m/z value and retention time from the peak list. Hence you
need to indicate the column positions of the peak name, m/z value and
retention time, and the column where the data matrix starts. Counting columns
from 1, they are 1, 3, 6 and 11, respectively.

In [3]:
cols = [1, 3, 6, 11]
# get the input data set for `lamp` 
df = anno.read_peak(d_data, cols, sep='\t')
df

Unnamed: 0,name,mz,rt,QC9,QC5,QC4,QC3,QC26,QC25,QC24,...,X210,X209,X208,X207,X206,X205,X204,X203,X202,X201
0,M151T34,150.886715,34.152700,3.664879e+06,3.735147e+06,5.190263e+06,2.742966e+06,3.824723e+06,3.722932e+06,3.804188e+06,...,4.224942e+06,3.946599e+06,3.668948e+06,3.754321e+06,3.853724e+06,3.787350e+06,3.584464e+06,3.499711e+06,3.623205e+06,4.145770e+06
1,M151T40,151.040235,39.838172,7.406381e+05,7.524075e+05,,6.429245e+05,1.167016e+06,1.175981e+06,1.122533e+06,...,1.419062e+06,1.251606e+06,1.214826e+06,8.143028e+05,5.331963e+05,1.930928e+06,1.479001e+06,1.076354e+06,9.293218e+05,5.298062e+05
2,M152T40,152.043607,40.303700,6.105241e+04,5.335546e+04,,,6.875157e+04,7.807399e+04,8.943068e+04,...,1.203919e+05,9.970442e+04,9.384000e+04,4.186335e+04,,2.115447e+05,1.285713e+05,9.389346e+04,7.163655e+04,4.916483e+04
3,M153T34,152.883824,34.174647,5.141479e+06,5.496344e+06,8.335846e+06,3.860588e+06,5.316874e+06,5.988232e+06,5.844917e+06,...,5.592065e+06,5.761380e+06,5.845419e+06,5.576013e+06,5.552878e+06,6.132789e+06,5.891378e+06,5.418082e+06,5.036840e+06,5.733794e+06
4,M153T36,153.019474,35.785847,5.336758e+06,5.558265e+06,1.118557e+07,6.876715e+06,9.967314e+06,9.073822e+06,9.328573e+06,...,7.284938e+06,1.083289e+07,1.140072e+07,8.220552e+06,9.255154e+06,7.648211e+06,7.723814e+06,5.571163e+06,5.362560e+06,9.259675e+06
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
395,M283T339,283.264583,338.763489,7.330602e+05,8.243956e+05,,1.159506e+06,4.294760e+05,4.641813e+05,4.570657e+05,...,3.509767e+05,4.117633e+05,3.948000e+05,4.338804e+05,5.335221e+05,6.224684e+05,7.009340e+05,3.005173e+05,3.133173e+05,8.204783e+05
396,M284T60,284.195294,59.593561,2.310932e+04,,,,1.759336e+04,2.645392e+04,2.727266e+04,...,,,,,,2.558004e+04,4.020517e+04,,3.162670e+04,5.446684e+04
397,M284T108,284.223499,108.406389,3.748444e+04,2.993283e+04,,,3.175596e+04,3.879604e+04,4.299529e+04,...,7.477652e+04,7.482219e+04,3.399667e+04,7.233564e+04,1.043879e+05,2.506785e+04,2.753769e+04,,,
398,M284T339,284.267962,338.725056,1.161886e+05,1.476514e+05,,,,6.753490e+04,5.436219e+04,...,3.697604e+04,5.398264e+04,5.340109e+04,6.557698e+04,7.656575e+04,1.040606e+05,1.063727e+05,,3.059370e+04,1.358056e+05


Data frame `df` now includes only `name`, `mz`, `rt` and the intensity data
matrix.
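
As a rough illustration of what the four positions mean, the sketch below
applies the same `cols = [1, 3, 6, 11]` to a one-row excerpt of the example
table using only the Python standard library. It assumes the positions are
counted from 1, which is consistent with the column order shown in the raw
data; it is not `lamp`'s own implementation of `read_peak`.

```python
import csv
import io

# One-row excerpt of the example peak table (tab-separated); only the first
# two intensity columns (QC9, QC5) are included to keep the example short.
raw = io.StringIO(
    "name\tnamecustom\tmz\tmzmin\tmzmax\trt\trtmin\trtmax\tnpeaks\t.\tQC9\tQC5\n"
    "M151T34\tM150.8867T34\t150.886715\t150.886592\t150.886863\t"
    "34.152700\t33.637595\t35.465548\t97\t97\t3664879\t3735147\n"
)

cols = [1, 3, 6, 11]  # name, m/z, retention time, start of data matrix
# Assumption: positions are 1-based, so convert to Python's 0-based indices.
name_i, mz_i, rt_i, start_i = (c - 1 for c in cols)

rows = list(csv.reader(raw, delimiter="\t"))
header = rows[0]
# Keep the three peak-list columns plus everything from the matrix start on.
keep = [name_i, mz_i, rt_i] + list(range(start_i, len(header)))
selected = [[row[i] for i in keep] for row in rows]

for row in selected:
    print(row)
```

The columns `namecustom`, `mzmin`, `mzmax`, `rtmin`, `rtmax`, `npeaks` and `.`
are dropped, which mirrors the difference between `data` and `df`.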

## Metabolite annotation

To perform metabolite annotation, users can provide their own reference
file. Otherwise, `lamp` uses its default reference file for annotation.

In [4]:
ref_path = ""    # if empty, use default reference file for matching

# load reference library
cal_mass = False
ref = anno.read_ref(ref_path, calc=cal_mass)
ref

Unnamed: 0,compound_id,molecular_formula,compound_name,exact_mass
0,1638,C10Cl10O,Chlordecone,485.683441
1,38485,C10H10Br2O2,Dibromothymoquinone,319.904755
2,32427,C10H10BrNO2,Brofoxine (USAN/INN),254.989491
3,39834,C10H10Cl2N2O,Fenmetozole (USAN),244.017018
4,10156,C10H10Cl2O3,"4-(2,4-Dichlorophenoxy)butyric acid",248.000700
...,...,...,...,...
31639,80256,H5O10P3,PPPi,257.909557
31640,37374,H6NO9P3,(Diphosphono)Aminophosphonic Acid,256.925542
31641,32626,H9N2O4P,Ammonium phosphate (NF),132.029994
31642,735,HNO3,Nitrate,62.995643


The reference file must have two columns: `molecular_formula` and
`compound_name` (or `name`). The `exact_mass` column is optional; if it is
absent, `lamp` calculates it based on the NIST database. If your reference
file has `exact_mass` but you want to recalculate it using the NIST database,
set `calc` to `True`. The `exact_mass` is matched against a range around each
`mz` value in data frame `df`; the width of that range is controlled by `ppm`.
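
To make the role of `ppm` concrete, here is a minimal sketch of a
parts-per-million match. The helper names `ppm_error` and `within_ppm` are
hypothetical, and the formula (difference relative to the reference mass,
scaled by 10^6) is the common definition rather than something confirmed
from `lamp`'s source. The m/z value is that of peak `M154T37` in the example
data, and the reference mass is the monoisotopic mass of C8H10O3.

```python
def ppm_error(mz, exact_mass):
    """Signed error of an observed m/z relative to a reference mass, in ppm."""
    return (mz - exact_mass) / exact_mass * 1e6


def within_ppm(mz, exact_mass, ppm):
    """True if the observed m/z lies within +/- ppm of the reference mass."""
    return abs(ppm_error(mz, exact_mass)) <= ppm


mz = 154.062402          # observed m/z of peak M154T37
exact_mass = 154.062994  # monoisotopic mass of C8H10O3

print(round(ppm_error(mz, exact_mass), 2))
print(within_ppm(mz, exact_mass, ppm=5.0))
print(within_ppm(mz, exact_mass, ppm=1.0))
```

At m/z 154, a tolerance of 5 ppm corresponds to a window of roughly
±0.0008, so the peak matches at 5 ppm but not at 1 ppm.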

Now let's have a look at another reference file:

In [5]:
ref_path = "./data/hmdb_urine_v4_0_20200910_v1.tsv"

# load reference library
cal_mass = True    # there is no exact mass in reference file, so calculate
ref = anno.read_ref(ref_path, calc=cal_mass)
ref

Unnamed: 0,id,molecular_formula,molecular_name,inchi,inchi_key,exact_mass
0,HMDB0000001,C7H11N3O2,1-Methylhistidine,InChI=1S/C7H11N3O2/c1-10-3-5(9-4-10)2-6(8)7(11...,BRMWTNUJHUMWMS-LURJTMIESA-N,169.085127
1,HMDB0000002,C3H10N2,"1,3-Diaminopropane",InChI=1S/C3H10N2/c4-2-1-3-5/h1-5H2,XFNJVJPLKCPIBV-UHFFFAOYSA-N,74.084398
2,HMDB0000005,C4H6O3,2-Ketobutyric acid,"InChI=1S/C4H6O3/c1-2-3(5)4(6)7/h2H2,1H3,(H,6,7)",TYEYBOSBBBHJIV-UHFFFAOYSA-N,102.031694
3,HMDB0000008,C4H8O3,2-Hydroxybutyric acid,"InChI=1S/C4H8O3/c1-2-3(5)4(6)7/h3,5H,2H2,1H3,(...",AFENDNXGAFYKQO-VKHMYHEASA-N,104.047344
4,HMDB0000010,C19H24O3,2-Methoxyestrone,InChI=1S/C19H24O3/c1-19-8-7-12-13(15(19)5-6-18...,WHEUWNKSCXYKBU-QPWUGHHJSA-N,300.172545
...,...,...,...,...,...,...
1606,HMDB0012308,C8H8O3,Vanillin,InChI=1S/C8H8O3/c1-11-8-4-6(5-9)2-3-7(8)10/h2-...,MWOOGOJBHIARFG-UHFFFAOYSA-N,152.047344
1607,HMDB0012322,C10H8O,2-Naphthol,InChI=1S/C10H8O/c11-10-6-5-8-3-1-2-4-9(8)7-10/...,JWAZRIHNYRIHIV-UHFFFAOYSA-N,144.057515
1608,HMDB0012325,C5H10O5,Arabinofuranose,InChI=1S/C5H10O5/c6-1-2-3(7)4(8)5(9)10-2/h2-9H...,HMFHBZSHGGEWLO-HWQSCIPKSA-N,150.052823
1609,HMDB0012451,C20H28O3,"all-trans-5,6-Epoxyretinoic acid",InChI=1S/C20H28O3/c1-15(8-6-9-16(2)14-17(21)22...,KEEHJLBAOLGBJZ-WEDZBJJJSA-N,316.203845
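
For readers curious about what calculating `exact_mass` from a formula
involves, the following sketch sums monoisotopic masses for a handful of
elements. The function name `mono_mass` and the small element table are
illustrative assumptions; `lamp` relies on NIST data as noted above, and its
parser may handle cases (isotopes, charges, brackets) that this sketch does
not.

```python
import re

# Monoisotopic masses (most abundant isotope) for a few common elements.
MONO = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
}


def mono_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C8H8O3'."""
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


print(round(mono_mass("C8H8O3"), 6))   # Vanillin
print(round(mono_mass("C3H10N2"), 6))  # 1,3-Diaminopropane
```

The two results agree with the `exact_mass` values listed for Vanillin and
1,3-Diaminopropane in the reference table (152.047344 and 74.084398).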


Next we use the HMDB reference file for compound matching. The function
argument `ppm` controls the m/z matching tolerance.

In [6]:
ppm = 5.0
match = anno.comp_match_mass(df, ppm, ref)
match

Unnamed: 0,id,mz,molecular_formula,molecular_name,inchi,inchi_key,exact_mass,ppm_error
0,M154T37,154.062402,C8H10O3,Hydroxytyrosol,InChI=1S/C8H10O3/c9-4-3-6-1-2-7(10)8(11)5-6/h1...,JUUBCHWRXWPFFH-UHFFFAOYSA-N,154.06,-3.84
1,M164T119,164.046774,C9H8O3,Phenylpyruvic acid,InChI=1S/C9H8O3/c10-8(9(11)12)6-7-4-2-1-3-5-7/...,BTNMPGBKDVTSJY-UHFFFAOYSA-N,164.05,-3.47
2,M164T119,164.046774,C9H8O3,m-Coumaric acid,InChI=1S/C9H8O3/c10-8-3-1-2-7(6-8)4-5-9(11)12/...,KKSDGJDHHZEWEP-SNAWJCMRSA-N,164.05,-3.47
3,M164T119,164.046774,C9H8O3,4-Hydroxycinnamic acid,InChI=1S/C9H8O3/c10-8-4-1-7(2-5-8)3-6-9(11)12/...,NGSWKAQJJWESNS-ZZXKWVIFSA-N,164.05,-3.47
4,M164T119,164.046774,C9H8O3,2-Hydroxycinnamic acid,InChI=1S/C9H8O3/c10-8-4-2-1-3-7(8)5-6-9(11)12/...,PMOWTIHVNWZYFI-AATRIKPKSA-N,164.05,-3.47
5,M164T233,164.046832,C9H8O3,Phenylpyruvic acid,InChI=1S/C9H8O3/c10-8(9(11)12)6-7-4-2-1-3-5-7/...,BTNMPGBKDVTSJY-UHFFFAOYSA-N,164.05,-3.12
6,M164T233,164.046832,C9H8O3,m-Coumaric acid,InChI=1S/C9H8O3/c10-8-3-1-2-7(6-8)4-5-9(11)12/...,KKSDGJDHHZEWEP-SNAWJCMRSA-N,164.05,-3.12
7,M164T233,164.046832,C9H8O3,4-Hydroxycinnamic acid,InChI=1S/C9H8O3/c10-8-4-1-7(2-5-8)3-6-9(11)12/...,NGSWKAQJJWESNS-ZZXKWVIFSA-N,164.05,-3.12
8,M164T233,164.046832,C9H8O3,2-Hydroxycinnamic acid,InChI=1S/C9H8O3/c10-8-4-2-1-3-7(8)5-6-9(11)12/...,PMOWTIHVNWZYFI-AATRIKPKSA-N,164.05,-3.12
9,M164T53,164.046825,C9H8O3,Phenylpyruvic acid,InChI=1S/C9H8O3/c10-8(9(11)12)6-7-4-2-1-3-5-7/...,BTNMPGBKDVTSJY-UHFFFAOYSA-N,164.05,-3.16


`match` gives the compound matching results. `lamp` also provides an option
to adjust masses using an adducts library. You can provide your own adducts
library; otherwise, `lamp` uses its default one.

The adducts library looks like:

In [7]:
add_path = './data/adducts_short.tsv'
lib_df = pd.read_csv(add_path, sep="\t")
lib_df

Unnamed: 0,label,exact_mass,charge,ion_mode
0,[M+H]+,1.007276,1,pos
1,[M+NH4]+,18.033826,1,pos
2,[M+Na]+,22.989221,1,pos
3,[M+Mg]+,23.984493,1,pos
4,[M+K]+,38.963158,1,pos
5,[M+Fe]+,55.934388,1,pos
6,[M+Cu]+,62.929049,1,pos
7,[M+2H]+,2.015101,1,pos
8,[M+3H]+,3.022926,1,pos
9,[M-H]-,-1.007276,1,neg


We use this adducts file to adjust the mass:

In [8]:
ion_mode = "pos"
# if empty, use default adducts library
add_path = "./data/adducts_short.tsv"

lib_add = anno.read_lib(add_path, ion_mode)
lib_add

Unnamed: 0,label,exact_mass,charge
0,[M+H]+,1.007276,1
1,[M+NH4]+,18.033826,1
2,[M+Na]+,22.989221,1
3,[M+Mg]+,23.984493,1
4,[M+K]+,38.963158,1
5,[M+Fe]+,55.934388,1
6,[M+Cu]+,62.929049,1
7,[M+2H]+,2.015101,1
8,[M+3H]+,3.022926,1
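
Comparing the two tables suggests that `read_lib` keeps only the adducts
whose `ion_mode` matches the requested mode and then drops that column. A
hedged, standard-library sketch of that filtering step, applied to a
three-row excerpt of the library (the helper `filter_adducts` is
hypothetical and is not `lamp`'s code):

```python
import csv
import io

# Three-row excerpt of the adducts library shown above (tab-separated).
adducts_tsv = io.StringIO(
    "label\texact_mass\tcharge\tion_mode\n"
    "[M+H]+\t1.007276\t1\tpos\n"
    "[M+Na]+\t22.989221\t1\tpos\n"
    "[M-H]-\t-1.007276\t1\tneg\n"
)


def filter_adducts(handle, ion_mode):
    """Keep adducts for one ion mode and drop the ion_mode column."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["ion_mode"] == ion_mode:
            kept.append({
                "label": row["label"],
                "exact_mass": float(row["exact_mass"]),
                "charge": int(row["charge"]),
            })
    return kept


lib_pos = filter_adducts(adducts_tsv, "pos")
for adduct in lib_pos:
    print(adduct)
```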


Now match compounds against the adduct-adjusted masses:

In [9]:
match_1 = anno.comp_match_mass_add(df, ppm, ref, lib_add)
match_1

Unnamed: 0,id,mz,molecular_formula,molecular_name,inchi,inchi_key,exact_mass,adduct,ppm_error
0,M152T40,152.043607,C5H8N2O2,Dihydrothymine,"InChI=1S/C5H8N2O2/c1-3-2-6-5(9)7-4(3)8/h3H,2H2...",NBAKTGXDIBVZOO-VKHMYHEASA-N,152.04,[M+Mg]+,3.52
1,M154T37,154.062402,C8H8O3,p-Hydroxyphenylacetic acid,InChI=1S/C8H8O3/c9-7-3-1-6(2-4-7)5-8(10)11/h1-...,XQXPVVBIMDBYFF-UHFFFAOYSA-N,154.06,[M+2H]+,-0.28
2,M154T37,154.062402,C8H8O3,3-Hydroxyphenylacetic acid,InChI=1S/C8H8O3/c9-7-3-1-2-6(4-7)5-8(10)11/h1-...,FVMDYYGIDFPZAX-UHFFFAOYSA-N,154.06,[M+2H]+,-0.28
3,M154T37,154.062402,C8H8O3,ortho-Hydroxyphenylacetic acid,InChI=1S/C8H8O3/c9-7-4-2-1-3-6(7)5-8(10)11/h1-...,CCVYRRGZDBSHFU-UHFFFAOYSA-N,154.06,[M+2H]+,-0.28
4,M154T37,154.062402,C8H8O3,Mandelic acid,InChI=1S/C8H8O3/c9-7(8(10)11)6-4-2-1-3-5-6/h1-...,IWYDHOAUDWTVEP-ZETCQYMHSA-N,154.06,[M+2H]+,-0.28
5,M154T37,154.062402,C8H8O3,3-Cresotinic acid,InChI=1S/C8H8O3/c1-5-3-2-4-6(7(5)9)8(10)11/h2-...,WHSXTWFYRGOBGO-UHFFFAOYSA-N,154.06,[M+2H]+,-0.28
6,M154T37,154.062402,C8H8O3,4-Hydroxy-3-methylbenzoic acid,InChI=1S/C8H8O3/c1-5-4-6(8(10)11)2-3-7(5)9/h2-...,LTFHNKUKQYVHDX-UHFFFAOYSA-N,154.06,[M+2H]+,-0.28
7,M154T37,154.062402,C8H8O3,Vanillin,InChI=1S/C8H8O3/c1-11-8-4-6(5-9)2-3-7(8)10/h2-...,MWOOGOJBHIARFG-UHFFFAOYSA-N,154.06,[M+2H]+,-0.28
8,M157T35,157.036819,C4H10N2O2,"2,4-Diaminobutyric acid","InChI=1S/C4H10N2O2/c5-2-1-3(6)4(7)8/h3H,1-2,5-...",OGNSCSPNOLGXSM-UHFFFAOYSA-N,157.04,[M+K]+,-3.61
9,M157T35,157.036819,C4H10N2O2,"L-2,4-diaminobutyric acid","InChI=1S/C4H10N2O2/c5-2-1-3(6)4(7)8/h3H,1-2,5-...",OGNSCSPNOLGXSM-VKHMYHEASA-N,157.04,[M+K]+,-3.61
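
The `exact_mass` and `ppm_error` columns above are consistent with adding the
adduct mass to the neutral monoisotopic mass before comparing with the
observed m/z. The sketch below reproduces the `[M+2H]+` row for Vanillin
under that assumption. The function name `adduct_mz` is hypothetical, and
only singly charged adducts (as in this library) are considered; how `lamp`
treats other charges is not shown here.

```python
def adduct_mz(neutral_mass, adduct_mass):
    """Expected m/z for a singly charged adduct (assumption: charge of 1)."""
    return neutral_mass + adduct_mass


vanillin = 152.047344  # exact_mass of C8H8O3 from the reference table
m_2h = 2.015101        # exact_mass of [M+2H]+ from the adducts library
observed = 154.062402  # m/z of peak M154T37

expected = adduct_mz(vanillin, m_2h)
ppm_err = (observed - expected) / expected * 1e6

print(round(expected, 2), round(ppm_err, 2))
```

The rounded values correspond to the `154.06` and `-0.28` reported for
`M154T37` with the `[M+2H]+` adduct.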


## Correlation analysis

The next step is correlation analysis, based on the intensity data matrix
across all peaks. All results are filtered by correlation coefficient,
p-value and retention time difference. That is, a pair of peaks is kept only
if their retention time difference falls within a window (such as 1 second),
their correlation coefficient is larger than a threshold (such as 0.5) and
the correlation p-value is less than a threshold (such as 0.05).

`lamp` uses one of two correlation methods, either `pearson` or `spearman`.
The parameter `positive` allows the user to keep only positive correlation
results.

Two functions, `_tic` and `_toc`, record the correlation computation time in
seconds.

In [10]:
thres_rt = 1.0
thres_corr = 0.5
thres_pval = 0.05
method = "spearman"   # "pearson"
positive = True

utils._tic()
corr = stats.comp_corr_rt(df, thres_rt, thres_corr, thres_pval, method,
                          positive)
utils._toc()
corr

Elapsed time: 4.283773899078369 seconds.


Unnamed: 0,name_a,name_b,r_value,p_value,rt_diff
0,M151T34,M153T34,0.80,1.267076e-23,0.02
1,M151T34,M155T34,0.71,1.752854e-16,0.20
2,M151T34,M161T34,0.78,1.869949e-21,0.14
3,M151T34,M163T34,0.69,3.239594e-15,0.20
4,M151T34,M167T35,0.51,5.776482e-08,0.73
...,...,...,...,...,...
1783,M283T34_1,M283T34_2,0.62,4.214876e-12,0.29
1784,M283T34_1,M285T34,0.82,5.937139e-26,0.08
1785,M283T34_2,M285T34,0.66,7.898957e-14,0.37
1786,M283T60,M284T60,0.86,1.033010e-29,0.15
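
The filtering logic can be sketched on toy data as follows. This is a
simplified illustration, not `lamp`'s implementation: it uses Pearson
correlation only, omits the p-value filter (which would need a statistics
library), and the peak names, retention times and intensities are made up.
Whether `lamp` uses strict or non-strict comparisons at the thresholds is
also an assumption here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical peaks: name -> (retention time in seconds, intensities)
peaks = {
    "P1": (34.15, [1.0, 2.0, 3.0, 4.0]),
    "P2": (34.17, [2.1, 3.9, 6.2, 8.0]),
    "P3": (35.79, [1.0, 2.0, 3.0, 4.0]),  # same profile as P1, elutes later
    "P4": (34.20, [4.0, 1.0, 3.0, 2.0]),  # co-elutes, but not correlated
}

thres_rt = 1.0
thres_corr = 0.5

kept = []
for (name_a, (rt_a, x_a)), (name_b, (rt_b, x_b)) in combinations(peaks.items(), 2):
    rt_diff = abs(rt_a - rt_b)
    r = pearson(x_a, x_b)
    # A positive threshold also discards negative correlations.
    if rt_diff <= thres_rt and r > thres_corr:
        kept.append((name_a, name_b, round(r, 2), round(rt_diff, 2)))

print(kept)
```

Only the pair `P1`/`P2` survives: `P3` is highly correlated with `P1` but
falls outside the retention time window, and `P4` co-elutes but is negatively
correlated with the other peaks.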


Based on the correlation analysis, we can extract the groups and their sizes by:

In [11]:
# get correlation group and size
corr_df = stats.corr_grp_size(corr)
corr_df

Unnamed: 0,name,cor_grp_size,cor_grp
0,M219T35,52,M221T34::M223T34::M225T35::M226T35::M229T34::M...
1,M216T35,52,M217T35::M218T35::M219T34::M219T35::M221T34::M...
2,M217T35,52,M218T35::M219T34::M219T35::M221T34::M223T34::M...
3,M215T35,52,M216T35::M217T35::M218T35::M219T34::M219T35::M...
4,M218T35,51,M219T34::M219T35::M221T34::M223T34::M225T35::M...
...,...,...,...
335,M173T119,1,M171T119
336,M277T71,1,M278T71
337,M259T233,1,M191T233
338,M284T60,1,M283T60
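
One plausible way to derive such groups from the pairwise table is to collect,
for every peak, all peaks it is paired with. The sketch below does this for a
few hypothetical pairs and joins the partners with `::` as in the output
above; the ordering inside each group and between rows is an assumption and
may differ from what `corr_grp_size` produces.

```python
from collections import defaultdict

# Hypothetical (name_a, name_b) pairs that passed the correlation filter
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]

partners = defaultdict(set)
for name_a, name_b in pairs:
    # A pair contributes to the group of both of its members.
    partners[name_a].add(name_b)
    partners[name_b].add(name_a)

# One row per peak: (name, group size, group members joined by '::'),
# largest groups first, ties broken by peak name.
rows = sorted(
    ((name, len(group), "::".join(sorted(group))) for name, group in partners.items()),
    key=lambda row: (-row[1], row[0]),
)

for row in rows:
    print(row)
```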


## Summarize results

The final step produces the summary table in different formats and saves it
for further analysis.

In [12]:
# get summary of metabolite annotation
sr, mr = anno.comp_summ(df, match)

This function combines the peak table with the compound matching results and
returns two results in different formats. `sr` is the single-row format, with
one row for each peak id in the peak table `df`:

In [13]:
sr

Unnamed: 0,name,mz,rt,exact_mass,ppm_error,molecular_formula,molecular_name,inchi,inchi_key
0,M151T34,150.886715,34.152700,,,,,,
1,M151T40,151.040235,39.838172,,,,,,
2,M152T40,152.043607,40.303700,,,,,,
3,M153T34,152.883824,34.174647,,,,,,
4,M153T36,153.019474,35.785847,,,,,,
...,...,...,...,...,...,...,...,...,...
395,M283T61,283.068474,60.739869,,,,,,
396,M284T108,284.223499,108.406389,,,,,,
397,M284T339,284.267962,338.725056,,,,,,
398,M284T60,284.195294,59.593561,,,,,,


`mr` is the multiple-row format, in which a peak takes one row per match if
it matches more than one entry in the reference file:

In [14]:
mr

Unnamed: 0,name,mz,rt,molecular_formula,molecular_name,inchi,inchi_key,exact_mass,ppm_error
0,M151T34,150.886715,34.152700,,,,,,
1,M151T40,151.040235,39.838172,,,,,,
2,M152T40,152.043607,40.303700,,,,,,
3,M153T34,152.883824,34.174647,,,,,,
4,M153T36,153.019474,35.785847,,,,,,
...,...,...,...,...,...,...,...,...,...
404,M283T61,283.068474,60.739869,,,,,,
405,M284T108,284.223499,108.406389,,,,,,
406,M284T339,284.267962,338.725056,,,,,,
407,M284T60,284.195294,59.593561,,,,,,



Now we merge the single-row results with the correlation results:

In [15]:
# merge summary table with correlation analysis
res = anno.comp_summ_corr(sr, corr_df)
res

Unnamed: 0,name,mz,rt,exact_mass,ppm_error,molecular_formula,molecular_name,inchi,inchi_key,cor_grp_size,cor_grp
0,M167T35,167.021095,34.882147,167.02,-4.57,C7H5NO4,Quinolinic acid,InChI=1S/C7H5NO4/c9-6(10)4-2-1-3-8-5(4)7(11)12...,GJAWHXHKYYXBSV-UHFFFAOYSA-N,25.0,M171T34::M197T36::M209T34::M211T34::M213T34::M...
1,M276T36,276.077397,36.385373,276.08,-2.16,C10H16N2O5S,Biotin sulfone,InChI=1S/C10H16N2O5S/c13-8(14)4-2-1-3-7-9-6(5-...,QPFQYMONYBAUCY-ZKWXMUAHSA-N,13.0,M277T36_2::M278T36::M173T36_2::M186T36::M187T3...
2,M154T37,154.062402,37.183625,154.06,-3.84,C8H10O3,Hydroxytyrosol,InChI=1S/C8H10O3/c9-4-3-6-1-2-7(10)8(11)5-6/h1...,JUUBCHWRXWPFFH-UHFFFAOYSA-N,12.0,M155T38::M158T37_2::M164T36::M171T37_2::M173T3...
3,M174T35,174.088395,35.001130,174.09,-4.67,C8H14O4,Suberic acid,InChI=1S/C8H14O4/c9-7(10)5-3-1-2-4-6-8(11)12/h...,TYFQFVWCELRYAO-UHFFFAOYSA-N,9.0,M211T34::M213T34::M219T34::M221T34::M229T35::M...
4,M181T36,181.060407,35.734801,181.06,2.39,C6H7N5O2,8-Hydroxy-7-methylguanine,InChI=1S/C6H7N5O2/c1-11-2-3(9-6(11)13)8-5(7)10...,VHPXSVXJBWZORQ-UHFFFAOYSA-N,9.0,M224T36::M225T35::M226T35::M227T36::M269T37_2:...
...,...,...,...,...,...,...,...,...,...,...,...
395,M279T50,279.159930,50.055451,,,,,,,,
396,M279T79,279.163910,78.758079,,,,,,,,
397,M282T85,282.207859,84.719202,,,,,,,,
398,M283T47,283.110871,46.822069,,,,,,,,


The result data frame `res` is arranged in four parts from top to bottom:

 - 1st part: identified metabolites that pass the correlation filter
 - 2nd part: identified metabolites that do not pass the correlation filter
 - 3rd part: unidentified peaks that pass the correlation filter
 - 4th part: unidentified peaks that do not pass the correlation filter

Users should focus on the first part for further analysis.
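
The four-part ordering can be pictured as a sort on two yes/no properties.
The sketch below uses hypothetical rows in which a missing value (`None`)
stands for "not identified" or "no correlation group"; it illustrates the
ordering described in the list and is not taken from `lamp`'s source.

```python
# Hypothetical rows of a merged summary table
rows = [
    {"name": "P1", "molecular_name": None, "cor_grp_size": None},
    {"name": "P2", "molecular_name": "Suberic acid", "cor_grp_size": 9},
    {"name": "P3", "molecular_name": None, "cor_grp_size": 3},
    {"name": "P4", "molecular_name": "Vanillin", "cor_grp_size": None},
]


def part(row):
    """Return 1 to 4 according to the four parts listed above."""
    identified = row["molecular_name"] is not None
    correlated = row["cor_grp_size"] is not None
    if identified and correlated:
        return 1
    if identified:
        return 2
    if correlated:
        return 3
    return 4


ordered = sorted(rows, key=part)
first_part = [row["name"] for row in ordered if part(row) == 1]

print([row["name"] for row in ordered])
print(first_part)
```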

You can save all results in different forms, such as TSV or CSV text files.
You can also save all results in a `sqlite3` database and use
[DB Browser for SQLite](https://sqlitebrowser.org/) to view them:

In [16]:
import sqlite3

f_save = False   # here we do NOT save results
db_out = "test.db"
sr_out = "test_s.tsv"

if f_save:
    # save all results into a sqlite3 database
    conn = sqlite3.connect(db_out)
    df[["name", "mz", "rt"]].to_sql("peaklist", conn,
                                    if_exists="replace", index=False)
    corr_df.to_sql("corr_grp", conn, if_exists="replace", index=False)
    corr.to_sql("corr_pval_rt", conn, if_exists="replace", index=False)
    match.to_sql("match", conn, if_exists="replace", index=False)
    mr.to_sql("anno_mr", conn, if_exists="replace", index=False)
    res.to_sql("anno_sr", conn, if_exists="replace", index=False)

    conn.commit()
    conn.close()

    # save final results
    res.to_csv(sr_out, sep="\t", index=False)
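
Besides a graphical browser, a saved database can be inspected from Python
with the standard `sqlite3` module. The sketch below uses an in-memory
database as a stand-in for `test.db` and fills a `peaklist` table with two
rows from the example data, so that it runs without any saved file; with a
real file you would connect to its path instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for sqlite3.connect("test.db")

# Recreate a tiny version of the "peaklist" table for illustration.
conn.execute("CREATE TABLE peaklist (name TEXT, mz REAL, rt REAL)")
conn.executemany(
    "INSERT INTO peaklist VALUES (?, ?, ?)",
    [("M151T34", 150.886715, 34.152700), ("M151T40", 151.040235, 39.838172)],
)
conn.commit()

# List the tables stored in the database.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Query peaks below a given m/z using a parameterised statement.
peaks = conn.execute(
    "SELECT name, mz, rt FROM peaklist WHERE mz < ?", (151.0,)
).fetchall()
conn.close()

print(tables)
print(peaks)
```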


## End user usages

For end users, `lamp` has two options: a command line interface (CLI) or a
graphical user interface (GUI).

To use the GUI, open a terminal and type:

```bash
$ lamp gui
```

To use the CLI, open a terminal and type something like:

```bash
$ lamp cli \
  --sep "tab" \
  --input-data "./data/df_pos_3.tsv" \
  --col-idx "1, 2, 3, 4" \
  --add-path "" \
  --ref-path "" \
  --ion-mode "pos" \
  --cal-mass \
  --thres-rt "1.0" \
  --thres-corr "0.5" \
  --thres-pval "0.05" \
  --method "pearson" \
  --positive \
  --ppm "5.0" \
  --save-db \
  --save-mr \
  --db-out "./res/test.db" \
  --sr-out "./res/test_s.tsv" \
  --mr-out "./res/test_m.tsv"
```

Alternatively, you can put these CLI arguments in a bash script `lamp_cli.sh`
(Linux and macOS) or a Windows batch script `lamp_cli.bat` and run it:

- For Linux and macOS terminal:

  ```bash
  $ chmod +x lamp_cli.sh   
  $ ./lamp_cli.sh
  ```

- For Windows terminal:

  ```bash
  $ lamp_cli.bat
  ```

