## dda-PASEF LFQ

SAGE now supports a new label-free quantification (LFQ) mode that incorporates the ion mobility dimension of timsTOF data. This notebook is a starting point for exposing that LFQ logic to Python, using the imspy package for raw data extraction and preprocessing. The LFQ algorithm does not yet perform as expected, most likely because of a configuration issue, incorrect passing of precursor spectra, a misunderstanding of certain fields, or another cause not yet identified. It is included here anyway as a starting point for anyone interested in using it and willing to do some debugging.

## Create a SAGE database

In [2]:
import numpy as np
import pandas as pd

from sagepy.utility import create_sage_database

indexed_db = create_sage_database(
    fasta_path='/media/hd02/data/fasta/hela/plain/hela.fasta'
)

## Create a Scorer

In [3]:
from sagepy.core import Scorer, Tolerance

static_mods = {
    "C": "[UNIMOD:4]"
}

variable_mods = {
    "M": ["[UNIMOD:1]", "[UNIMOD:35]"], 
    "[": ["[UNIMOD:1]"]
}

# create a scorer object that searches a collection of spectra against a database
scorer = Scorer(
    precursor_tolerance=Tolerance(ppm=(-15.0, 15.0)),
    fragment_tolerance=Tolerance(ppm=(-10.0, 10.0)),
    report_psms=5,
    min_matched_peaks=5,
    annotate_matches=True,
    variable_mods=variable_mods,
    static_mods=static_mods
)
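
The two `Tolerance(ppm=...)` settings above define relative windows rather than absolute ones. As a minimal sketch of the arithmetic only (the helper `ppm_window` is hypothetical and not part of sagepy, and whether SAGE applies the precursor window to m/z or to the neutral mass is not shown in this notebook), the bounds for a given value can be computed like this:

```python
from typing import Tuple


def ppm_window(value: float, lower_ppm: float, upper_ppm: float) -> Tuple[float, float]:
    """Return (low, high) bounds around `value` for a relative ppm tolerance.

    Hypothetical helper for illustration; not a sagepy function.
    """
    low = value * (1.0 + lower_ppm / 1e6)
    high = value * (1.0 + upper_ppm / 1e6)
    return low, high


# Example value taken from the monoisotopic_mz column shown later in this notebook
low, high = ppm_window(574.79563, -15.0, 15.0)
print(f"accepted range: {low:.5f} to {high:.5f}")
```

Because the window scales with the value, the same ppm setting is wider in absolute terms for heavier precursors than for lighter ones.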

## Extract MS1 and MS2 data from raw TDF files for PSM generation (MS2) and quantification (MS1)

In [4]:
from helpers import process_timstof_datasets, sage_quant_map_to_pandas

# helper function for easier readability
results = process_timstof_datasets([
    '/media/hd01/CCSPred/HELA-GRAD20/M210115_007_Slot1-1_1_856.d/',
    '/media/hd01/CCSPred/HELA-GRAD20/M210115_008_Slot1-1_1_857.d/',
    '/media/hd01/CCSPred/HELA-GRAD20/M210115_009_Slot1-1_1_858.d/'
])

# extract precursor and fragment data
fragments, ms1_spectra = [], []

for k, v in results.items():
    fragments.append(v['fragments'])
    ms1_spectra.extend(v['ms1_spectra'])

fragments = pd.concat(fragments)



In [16]:
# display results
fragments.head(3)

precursor_id (index),frame_id,time,precursor_id,raw_data,scan_begin,scan_end,isolation_mz,isolation_width,collision_energy,largest_peak_mz,average_mz,monoisotopic_mz,charge,average_scan,intensity,parent_id,mobility,spec_id,sage_precursor,processed_spec
1,2,0.015169,1,"TimsFrame(frame_id=2, ms_type=FragmentDda, num...",380,405,922.488951,3.0,43.179661,922.008681,922.488951,922.008681,1.0,392.013754,7083.0,1,1.178431,2-1-M210115_007_Slot1-1_1_856.d,"Precursor(mz: 922.01, intensity: 7083.0, charg...","ProcessedSpectrum(level: 2, id: 2-1-M210115_00..."
2,2,0.015169,2,"TimsFrame(frame_id=2, ms_type=FragmentDda, num...",675,700,575.083637,2.0,30.179661,574.79563,575.083637,574.79563,2.0,687.640148,4056.0,1,0.850545,2-2-M210115_007_Slot1-1_1_856.d,"Precursor(mz: 574.8, intensity: 4056.0, charge...","ProcessedSpectrum(level: 2, id: 2-2-M210115_00..."
3,2,0.015169,3,"TimsFrame(frame_id=2, ms_type=FragmentDda, num...",750,775,394.33425,2.0,26.874576,394.191085,394.33425,394.191085,2.0,762.585696,7117.0,1,0.774292,2-3-M210115_007_Slot1-1_1_856.d,"Precursor(mz: 394.19, intensity: 7117.0, charg...","ProcessedSpectrum(level: 2, id: 2-3-M210115_00..."


## Score MS2 spectra to get PSMs

In [6]:
# score all processed MS2 spectra against the indexed database to obtain PSMs
psm_collection = scorer.score_collection_psm(
    db=indexed_db, 
    spectrum_collection=fragments['processed_spec'].values,  
    num_threads=16,
)

## Calculate q-values to identify candidate peptides for LFQ

In [7]:
from sagepy.core.fdr import sage_fdr_psm
sage_fdr_psm(indexed_db=indexed_db, psm_collection=psm_collection)
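
For readers unfamiliar with q-values, the following is a minimal sketch of the generic target-decoy estimate. It is not the code behind `sage_fdr_psm`, whose internals are not shown in this notebook; the function name `target_decoy_q_values` and the toy scores are assumptions made for illustration.

```python
from typing import List, Tuple


def target_decoy_q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """Generic target-decoy q-values for (score, is_decoy) pairs.

    Hypothetical helper for illustration; returns q-values in input order.
    """
    # Rank PSMs from best to worst score
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any worse-scoring rank
    for k in range(len(fdrs) - 2, -1, -1):
        fdrs[k] = min(fdrs[k], fdrs[k + 1])

    # Map back to the original input order
    q_values = [0.0] * len(psms)
    for rank, i in enumerate(order):
        q_values[i] = fdrs[rank]
    return q_values


# Toy data (made up): (score, is_decoy)
toy = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
q_values = target_decoy_q_values(toy)
print(q_values)
```

Peptides whose q-value falls below a chosen threshold are the usual candidates for quantification.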

## Perform retention time alignment 

In [8]:
from sagepy.core.ml.retention_alignment import global_alignment_psm
alignments = global_alignment_psm(psm_collection)
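
To give an idea of what a retention time alignment does, here is a minimal sketch that fits a straight line mapping one run's retention times onto a reference run, using peptides seen in both. This is an assumption about the general technique, not a description of how `global_alignment_psm` works internally; the function name `fit_linear_alignment`, the peptide labels, and the retention times are made up.

```python
from typing import Dict, Tuple


def fit_linear_alignment(
    reference_rt: Dict[str, float], run_rt: Dict[str, float]
) -> Tuple[float, float]:
    """Least-squares (slope, intercept) mapping run RTs onto reference RTs.

    Hypothetical helper for illustration; uses only peptides present in both runs.
    """
    shared = sorted(set(reference_rt) & set(run_rt))
    if len(shared) < 2:
        raise ValueError("need at least two shared peptides to fit a line")

    xs = [run_rt[p] for p in shared]
    ys = [reference_rt[p] for p in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("shared peptides all have the same retention time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (made up): the second run elutes everything one minute later
reference = {"PEPTIDEA": 10.0, "PEPTIDEB": 20.0, "PEPTIDEC": 30.0}
shifted = {"PEPTIDEA": 11.0, "PEPTIDEB": 21.0, "PEPTIDEC": 31.0, "PEPTIDED": 40.0}

slope, intercept = fit_linear_alignment(reference, shifted)
aligned = {peptide: slope * rt + intercept for peptide, rt in shifted.items()}
print(slope, intercept, aligned["PEPTIDED"])
```

Once each run has such a mapping, a peptide identified in only one run can be looked for at the corresponding aligned time in the others.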

## Build a FeatureMap for LFQ

In [9]:
from sagepy.core.lfq import build_feature_map_psm, LfqSettings

feature_map = build_feature_map_psm(
    psm_collection,
    lfq_settings=LfqSettings(
        spectral_angle=0.7,
        ppm_tolerance=5.0,
        combine_charge_states=True,
        mobility_pct_tolerance=1.0
    )
)
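
Since a misunderstanding of certain fields is one suspected cause of the unexpected LFQ behaviour, it may help to spell out how two of these settings could be read. The sketch below is based on assumptions: that `spectral_angle` is a threshold on a normalized spectral angle (1 for identical shapes, 0 for orthogonal ones) and that `mobility_pct_tolerance` is a percentage of the expected ion mobility. Neither reading is confirmed by this notebook, and both helper functions are hypothetical.

```python
import math
from typing import Sequence


def normalized_spectral_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized spectral angle between two intensity vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def within_mobility_tolerance(observed: float, expected: float, pct_tolerance: float) -> bool:
    """True if `observed` lies within +/- pct_tolerance percent of `expected`.

    Assumes the tolerance is a percentage of the expected value (an assumption).
    """
    return abs(observed - expected) <= abs(expected) * pct_tolerance / 100.0


# Identical shapes at different scales give an angle of (about) 1
same_shape = normalized_spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Vectors with no overlap give an angle of 0
no_overlap = normalized_spectral_angle([1.0, 0.0], [0.0, 1.0])

# Expected mobility taken from the fragments table above; observed values are made up
close_enough = within_mobility_tolerance(0.855, 0.850545, 1.0)
too_far = within_mobility_tolerance(0.870, 0.850545, 1.0)
print(same_shape, no_overlap, close_enough, too_far)
```

If the real field is a fraction rather than a percentage, a value of `1.0` would mean a very different window, which is one of the things worth checking when debugging.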

## Use the FeatureMap to perform LFQ with ion mobility

In [17]:
quant_result = feature_map.quantify_with_mobility(
    indexed_db=indexed_db,
    ms1=ms1_spectra,
    alignments=alignments
)

## Create pandas table and inspect results of LFQ

In [19]:
quant_table = sage_quant_map_to_pandas(quant_result)
quant_table.head(3)

index,peptide_id,charge,decoy,rt_bin,spectral_angle,score,q_value,intensity_file_0,intensity_file_1,intensity_file_2
0,391099,,False,27,0.764648,0.364817,0.285714,3292.932904,1891.167035,2739.750089
1,59715,,False,51,0.745282,0.411213,0.285714,133.659514,7.070516,111.486366
2,988843,,False,81,0.855199,0.454496,0.285714,3073.365193,3324.818342,1973.800182
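
One simple sanity check on these results is the coefficient of variation (CV) of each peptide's intensity across the three files. The sketch below uses only the three rows shown above, hard-coded so it runs on its own. It assumes the three files are replicate injections, which the file names suggest but the notebook does not state, and the helper name `coefficient_of_variation` is hypothetical.

```python
import statistics
from typing import Dict, List

# Intensities copied from the three rows of quant_table shown above,
# keyed by peptide_id: [intensity_file_0, intensity_file_1, intensity_file_2]
intensities: Dict[int, List[float]] = {
    391099: [3292.932904, 1891.167035, 2739.750089],
    59715: [133.659514, 7.070516, 111.486366],
    988843: [3073.365193, 3324.818342, 1973.800182],
}


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the mean (hypothetical helper)."""
    mean = statistics.fmean(values)
    if mean <= 0.0:
        return float("nan")
    return statistics.stdev(values) / mean


cvs = {peptide_id: coefficient_of_variation(v) for peptide_id, v in intensities.items()}
for peptide_id, cv in cvs.items():
    print(f"peptide {peptide_id}: CV = {cv:.2f}")
```

Large CVs between replicate runs, together with the identical q-value in all three rows, are the kind of signal that can help narrow down where the pipeline goes wrong.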
