MassQL is a powerful, SQL-like query language for mass spectrometry data. For further information visit the MassQL documentation.
MS data from a MSExperiment
can be exported to MS1 and MS2 dataframes, which can
be queried directly with the massql
module.
- pyopenms.MSExperiment.get_massql_df()
Exports data from MSExperiment to pandas DataFrames to be used with MassQL.
Both dataframes contain the columns: 'i': intensity of a peak 'i_norm': intensity normalized by the maximun intensity in the spectrum 'i_tic_norm': intensity normalized by the sum of intensities (TIC) in the spectrum 'mz': mass to charge of a peak 'scan': number of the spectrum 'rt': retention time of the spectrum 'polarity': ion mode of the spectrum as integer value (positive: 1, negative: 2)
The MS2 dataframe contains additional columns: 'precmz': mass to charge of the precursor ion 'ms1scan': number of the corresponding MS1 spectrum 'charge': charge of the precursor ion
Returns:
ms1_df : pandas.DataFrame
peak data of MS1 spectra
ms2_df : pandas.DataFrame
peak data of MS2 spectra with precursor information
Example:
Load an example file into a MSExperiment
and get the MS1 and MS2 data frames for a MassQL query.
from pyopenms import *
from massql import msql_engine
from urllib.request import urlretrieve
url = 'https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/'
urlretrieve(url+'small.mzML', 'small.mzML')
# load MSExperiment
exp = MSExperiment()
MzMLFile().load('small.mzML', exp)
# get MS1 and MS2 dataframes
ms1_df, ms2_df = exp.get_massql_df()
ms1_df.head()
i | i_norm | i_tic_norm | mz | scan | rt | polarity | |
---|---|---|---|---|---|---|---|
0 | 2105.75 | 0.00455405 | 0.000325626 | 360.696 | 1 | 15.0015 | 1 |
1 | 1172.47 | 0.00253567 | 0.000181306 | 361.2 | 1 | 15.0015 | 1 |
2 | 2287.57 | 0.00494729 | 0.000353743 | 361.208 | 1 | 15.0015 | 1 |
3 | 1547.15 | 0.00334599 | 0.000239246 | 361.621 | 1 | 15.0015 | 1 |
4 | 1842.32 | 0.00398435 | 0.00028489 | 362.698 | 1 | 15.0015 | 1 |
Run a query on ms1_df
and ms2_df
. If you don't pass the data frames massql_engine.process_query
will read data from the given file name.
# Executing Query
results_df = msql_engine.process_query("QUERY scaninfo(MS1DATA) WHERE RTMIN=16", 'small.mzML', ms1_df=ms1_df, ms2_df=ms2_df)
results_df.head()
scan | rt | mslevel | i | i_norm | |
---|---|---|---|---|---|
0 | 139 | 16.001 | 1 | 6.77786e+06 | 1 |
1 | 140 | 16.0095 | 1 | 9.65984e+06 | 1 |
2 | 141 | 16.0185 | 1 | 7.0933e+06 | 1 |
3 | 143 | 16.0268 | 1 | 7.51255e+06 | 1 |
4 | 144 | 16.0354 | 1 | 1.01007e+07 | 1 |
In the resulting data frame each row represents a scan with the peak intensities summed up.