## dda-PASEF LFQ with Ion Mobility

SAGE now supports an LFQ mode that incorporates the ion mobility dimension of timsTOF data. This notebook demonstrates the full LFQ workflow, using the imspy package for raw data extraction and preprocessing. The LFQ logic is integrated with SAGE’s internal components for retention time and ion mobility modeling. The workflow consists of the following steps:

* Reading raw data from timsTOF .d folders using imspy.
* Extracting MS1 and MS2 spectra from precursor and fragment frames.
* Performing an initial database search with SAGE.
* Calculating q-values to filter confident peptide-spectrum matches (PSMs).
* Fitting and predicting retention time and ion mobility using SAGE’s built-in modeling tools.
* Aligning retention time across runs for cross-sample comparison.
* Creating feature maps for label-free quantification (LFQ) based on aligned peptide features.
* Running LFQ with SAGE internals, leveraging predicted RT and mobility values.
* Exporting results to a Pandas DataFrame for downstream statistical analysis or visualization.

## Create a SAGE database

In [1]:
import numpy as np
import pandas as pd

from sagepy.utility import create_sage_database

# create an in-memory database for scoring
indexed_db = create_sage_database(
    fasta_path='/media/hd02/data/fasta/hela/plain/hela.fasta'
)

## Create a Scorer

In [4]:
from sagepy.core import Scorer, Tolerance

# static modifications, applied to every occurrence of the given residue
static_mods = {
    "C": "[UNIMOD:4]"  # carbamidomethylation
}

# variable modifications to consider
variable_mods = {
    "M": ["[UNIMOD:1]", "[UNIMOD:35]"],  # acetylation, oxidation
    "[": ["[UNIMOD:1]"]  # acetylation
}

# create a scorer object that can be used to search a database given a collection of spectra to search
scorer = Scorer(
    precursor_tolerance=Tolerance(ppm=(-15.0, 15.0)),
    fragment_tolerance=Tolerance(ppm=(-10.0, 10.0)),
    report_psms=5,
    min_matched_peaks=5,
    annotate_matches=True,
    variable_mods=variable_mods,
    static_mods=static_mods
)
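
The tolerances passed to the scorer are relative, expressed in parts per million (ppm). As a rough illustration of what such a window means in absolute terms, the sketch below converts a ppm range into lower and upper bounds around a mass value. The function name `ppm_window` and the value 500.0 are hypothetical and chosen only for illustration; how SAGE applies the bounds internally is not shown here.

```python
def ppm_window(value: float, ppm_low: float, ppm_high: float) -> tuple[float, float]:
    """Return absolute (lower, upper) bounds for a relative ppm tolerance around value."""
    return (value * (1.0 + ppm_low * 1e-6), value * (1.0 + ppm_high * 1e-6))


# hypothetical mass value, used only to show the size of each window
precursor_lo, precursor_hi = ppm_window(500.0, -15.0, 15.0)
fragment_lo, fragment_hi = ppm_window(500.0, -10.0, 10.0)

print(f"precursor window: {precursor_lo:.4f} to {precursor_hi:.4f}")
print(f"fragment window:  {fragment_lo:.4f} to {fragment_hi:.4f}")
```

Because the window scales with the value, the same ppm setting is narrower in absolute terms for light ions than for heavy ones.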

## Extract MS1 and MS2 data from raw TDF files for PSM generation (MS2) and scoring (MS1)

In [5]:
from helpers import process_timstof_datasets, sage_quant_map_to_pandas

# helper function for easier readability
results = process_timstof_datasets(
    # number of precursor peaks to extract on data load
    max_peaks=50_000,
    # number of precursor peaks to extract on creation of sage IMSpectra
    ms1_take_top_n=50_000,
    dataset_dirs=[
        '/media/hd01/CCSPred/HELA-GRAD20/M210115_007_Slot1-1_1_856.d/',
        '/media/hd01/CCSPred/HELA-GRAD20/M210115_008_Slot1-1_1_857.d/',
        '/media/hd01/CCSPred/HELA-GRAD20/M210115_009_Slot1-1_1_858.d/',
    ],
)

# extract precursor and fragment data
fragments, ms1_spectra = [], []

for k, v in results.items():
    fragments.append(v['fragments'])
    ms1_spectra.extend(v['ms1_spectra'])

# create combined table for scoring
fragments = pd.concat(fragments)


## Score MS2 spectra to get PSMs

In [6]:
# score all MS2 spectra against the indexed database
psm_collection = scorer.score_collection_psm(
    db=indexed_db,
    spectrum_collection=fragments['processed_spec'].values,
    num_threads=16,
)

## Calculate q-values to identify candidate peptides for LFQ

In [7]:
from sagepy.core.fdr import sage_fdr_psm
sage_fdr_psm(indexed_db=indexed_db, psm_collection=psm_collection)
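
For readers unfamiliar with q-values, the following is a minimal sketch of the generic target-decoy idea, not SAGE's implementation: PSMs are ranked by score, the false discovery rate at each rank is estimated as decoys divided by targets, and the q-value is the smallest FDR at which a PSM would still be accepted. The function name, scores, and decoy labels are hypothetical.

```python
def target_decoy_q_values(scores, is_decoy):
    """Estimate q-values from scores (higher is better) and decoy labels."""
    # rank PSMs from best to worst score
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # running FDR estimate at each rank: decoys / targets
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: minimum FDR at this rank or any lower-ranked threshold
    for j in range(len(fdrs) - 2, -1, -1):
        fdrs[j] = min(fdrs[j], fdrs[j + 1])

    # map the q-values back to the original PSM order
    q_values = [0.0] * len(scores)
    for rank, i in enumerate(order):
        q_values[i] = fdrs[rank]
    return q_values


# hypothetical scores and decoy labels, for illustration only
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60]
is_decoy = [False, False, True, False, False, True]
q_values = target_decoy_q_values(scores, is_decoy)
print(q_values)
```

Filtering at a chosen q-value threshold then keeps only the PSMs whose estimated q-value falls at or below it.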

## Perform retention time alignment 

In [8]:
from sagepy.core.ml.retention_alignment import global_alignment_psm

# align retention times across runs; returns the alignments used in later steps
# and adds aligned RT values to the PSMs
alignments = global_alignment_psm(psm_collection)
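
To give an intuition for what a retention time alignment does, the sketch below fits a straight line that maps the retention times of one run onto a reference run, using peptides observed in both. This is an assumption-laden illustration of the general idea, not the algorithm behind `global_alignment_psm`; the peptide labels and retention times are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# hypothetical retention times (minutes) of peptides seen in both runs
reference_rt = {"pep_a": 5.0, "pep_b": 10.0, "pep_c": 15.0, "pep_d": 20.0}
run_rt = {"pep_a": 5.5, "pep_b": 10.5, "pep_c": 15.5, "pep_d": 20.5}

# use only peptides identified in both runs as anchors
shared = sorted(set(reference_rt) & set(run_rt))
slope, intercept = fit_line(
    [run_rt[p] for p in shared],
    [reference_rt[p] for p in shared],
)

# map the run's retention times onto the reference time scale
aligned_rt = {p: slope * run_rt[p] + intercept for p in shared}
print(slope, intercept)
```

After such a mapping, the same peptide is expected at a comparable retention time in every run, which is what makes cross-run feature matching possible.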

## Predict Ion Mobility and Retention Time with SAGE functions

In [9]:
from sagepy.core.ml.mobility_model import predict_sage_im
from sagepy.core.ml.retention_model import predict_sage_rt

# predict RT
predict_sage_rt(
    psm_collection,
    indexed_db
)

# predict IM
predict_sage_im(
    psm_collection,
    indexed_db
)

## Build a FeatureMap for LFQ

In [18]:
from sagepy.core.lfq import build_feature_map_psm, LfqSettings

# build a feature map for LFQ
feature_map = build_feature_map_psm(
    psm_collection,
    lfq_settings=LfqSettings(
        spectral_angle=0.7,
        ppm_tolerance=10.0,
        combine_charge_states=False,
        mobility_pct_tolerance=5.0,
    )
)
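
Two of the settings above are thresholds on similarity and distance. The sketch below shows one common definition of a normalized spectral angle between two intensity vectors and a percentage-based tolerance check. Both are assumptions for illustration: the exact formula SAGE uses, and whether `mobility_pct_tolerance` is relative to the predicted or the measured value, are not confirmed here. The function names and example values are hypothetical.

```python
import math


def spectral_angle(a, b):
    """Normalized spectral angle: 1.0 for identical shapes, 0.0 for orthogonal vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    # clamp to guard against floating-point values slightly outside [-1, 1]
    cosine = max(-1.0, min(1.0, cosine))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def within_pct(observed, expected, pct):
    """True if observed lies within pct percent of expected."""
    return abs(observed - expected) <= abs(expected) * pct / 100.0


# hypothetical intensity patterns and mobility values
print(spectral_angle([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(spectral_angle([1.0, 0.0], [0.0, 1.0]))
print(within_pct(1.04, 1.0, 5.0), within_pct(1.06, 1.0, 5.0))
```

Under this definition, a threshold of 0.7 would accept only patterns that agree closely in shape, independent of their absolute intensity.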

## Use the FeatureMap to perform LFQ with ion mobility

In [19]:
# perform quant, return result as pandas table with UNIMOD sequences
quant_result = feature_map.quantify_with_mobiliy_pandas(
    indexed_db=indexed_db,
    ms1=ms1_spectra,
    alignments=alignments,
    variable_mods=variable_mods,
    static_mods=static_mods
)

In [21]:
quant_result.head(5)

| | peptide | proteins | charge | decoy | rt_bin | spectral_angle | score | q_value | intensity_file_0 | intensity_file_1 | intensity_file_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5564 | DAEAWFTSR | [sp\|P08727\|K1C19_HUMAN] | 2 | False | 43 | 0.963205 | 0.850238 | 0.000921 | 32112.736079 | 33704.756558 | 35689.702364 |
| 15105 | TPAQYDASELK | [sp\|A6NMY6\|AXA2L_HUMAN, sp\|P07355\|ANXA2_HUMAN] | 2 | False | 53 | 0.969954 | 0.8941 | 0.000921 | 241574.471596 | 211796.792515 | 244538.940875 |
| 21749 | EQEVAEER | [sp\|Q8IVT2\|MISP_HUMAN] | 2 | False | 45 | 0.955002 | 0.841226 | 0.000921 | 73841.480374 | 57009.646197 | 63134.516658 |
| 15106 | LALQALTEK | [sp\|Q07065\|CKAP4_HUMAN] | 2 | False | 51 | 0.968392 | 0.902106 | 0.000921 | 33555.40421 | 38059.882247 | 35882.497778 |
| 15107 | SPILVATAVAAR | [sp\|O00571\|DDX3X_HUMAN, sp\|O15523\|DDX3Y_HUMAN] | 2 | False | 57 | 0.964565 | 0.853845 | 0.000921 | 57401.723576 | 53743.135956 | 68007.080916 |
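
As one possible downstream check, the sketch below computes the coefficient of variation (CV) of a peptide's intensity across the three runs, using the intensities of the first two rows of the table above. The choice of CV as a reproducibility measure and the helper name `coefficient_of_variation` are assumptions made for illustration, not part of the SAGE workflow.

```python
import statistics

# intensities copied from the first two rows of the table above
# (intensity_file_0, intensity_file_1, intensity_file_2)
intensities = {
    "DAEAWFTSR": [32112.736079, 33704.756558, 35689.702364],
    "TPAQYDASELK": [241574.471596, 211796.792515, 244538.940875],
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


cv = {peptide: coefficient_of_variation(v) for peptide, v in intensities.items()}
for peptide, value in cv.items():
    print(f"{peptide}: CV = {value:.1%}")
```

Since the three files are replicate injections of the same sample type, a low CV across `intensity_file_0` to `intensity_file_2` is what one would hope to see for well-quantified peptides.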
