<a href="https://colab.research.google.com/github/timosachsenberg/EuBIC2025/blob/main/EUBIC_Task3_Quant.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notebook 3 – Quantification

In this tutorial, we demonstrate a complete feature detection and annotation workflow using Biosaur2, an isotope-aware feature detection algorithm, as implemented in pyOpenMS.

In this notebook we will:

1. **Apply the Biosaur2 algorithm to detect isotope-resolved features from mzML data.**

2. **Annotate the feature map with peptide identifications.**

3. **Visually inspect detected features in retention time–m/z–intensity space.**



In [None]:
# install pyopenms, pyopenms-viz
!pip install pyopenms
!pip install pyopenms-viz

Collecting pyopenms
  Downloading pyopenms-3.5.0-cp312-cp312-manylinux_2_34_x86_64.whl.metadata (2.0 kB)
Downloading pyopenms-3.5.0-cp312-cp312-manylinux_2_34_x86_64.whl (65.9 MB)
Installing collected packages: pyopenms
Successfully installed pyopenms-3.5.0
Collecting pyopenms-viz
  Downloading pyopenms_viz-1.0.0-py3-none-any.whl.metadata (5.7 kB)
Downloading pyopenms_viz-1.0.0-py3-none-any.whl (153 kB)
Installing collected packages: pyopenms-viz
Successfully installed pyopenms-viz-1.0.0


In [None]:

%matplotlib inline
import os
import pyopenms as oms
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
print("pyOpenMS version:", oms.__version__)


pyOpenMS version: 3.5.0


In [None]:
# Download the example mzML and idXML files (skipped if they already exist)

if not os.path.exists("BSA1.mzML"):
    !wget -O "BSA1.mzML" https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/BSA1.mzML

if not os.path.exists("BSA1_F1.idXML"):
    !wget -O "BSA1_F1.idXML" https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/BSA1_F1.idXML


--2025-12-29 09:23:51--  https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/BSA1.mzML
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13607928 (13M) [text/plain]
Saving to: ‘BSA1.mzML’


2025-12-29 09:23:52 (182 MB/s) - ‘BSA1.mzML’ saved [13607928/13607928]

--2025-12-29 09:23:52--  https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master/src/data/BSA1_F1.idXML
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13781 (13K) [text/plain]
Saving to: ‘BSA1_F1.idXML’


2025-12-29 09:23:52 (21.3 MB/s) - ‘BSA1_F1.id

In [None]:
# Load mzML file to MSExperiment
exp = oms.MSExperiment()
oms.MzMLFile().load("BSA1.mzML", exp)


# 1. Apply the Biosaur2 algorithm.

**Aims of this task**

- Biosaur2 is an isotope-aware feature detection method that identifies peptide features by clustering peaks across retention time and evaluating their isotopic patterns and charge state consistency.
- This strategy enables reliable discrimination between true peptide signals and background noise, particularly in complex LC–MS datasets.
- Each detected feature represents an aggregated MS1 signal characterized by a centroid mass-to-charge ratio, a retention time apex, an integrated intensity, and an inferred charge state.
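
To make the link between isotopic patterns and charge state concrete, here is a minimal sketch in plain Python. It is not Biosaur2's implementation: the helper names, the tolerance, and the maximum charge are assumptions chosen for illustration. It relies only on the fact that adjacent isotope peaks of a peptide with charge z are spaced roughly 1.00335/z apart in m/z.

```python
# Mass difference between 13C and 12C in daltons (approximate isotope spacing).
C13_C12_DELTA = 1.0033548


def expected_isotope_mz(mono_mz, charge, n_peaks=4):
    """Return the expected m/z values of the first n_peaks isotope peaks."""
    return [mono_mz + k * C13_C12_DELTA / charge for k in range(n_peaks)]


def infer_charge(mz_a, mz_b, max_charge=6, tol=0.01):
    """Guess the charge from the spacing of two adjacent isotope peaks.

    Hypothetical helper: returns None when no charge in 1..max_charge
    explains the spacing within tol (in m/z units).
    """
    spacing = abs(mz_b - mz_a)
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_C12_DELTA / z) <= tol:
            return z
    return None


# Build the isotope ladder for a triply charged ion and recover its charge.
peaks = expected_isotope_mz(655.3017, charge=3)
print([round(p, 4) for p in peaks])
print("inferred charge:", infer_charge(peaks[0], peaks[1]))
```

A real detector additionally checks that the peaks co-elute and that their relative intensities resemble a plausible isotope distribution; the spacing test above covers only the charge-consistency part.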

**Implementation**
- A Biosaur2 algorithm instance (`Biosaur2Algorithm()`) was initialized and provided with the experimental data as an `MSExperiment` loaded from the mzML file.
- Feature detection was executed using the default algorithm parameters, yielding a `FeatureMap` object containing all detected features.



In [None]:
# Initialize the Biosaur2 feature detection algorithm
biosaur = oms.Biosaur2Algorithm()

# Provide the MS data to the algorithm (detection itself runs in the next cell)
biosaur.setMSData(exp)


In [None]:
# Run Biosaur2; the detected features are written into the FeatureMap
features = oms.FeatureMap()
biosaur.run(features)

# Optionally save the detected features to disk:
#oms.FeatureXMLFile().store("BSA1.featureXML", features)

# 2. Annotate the feature map with peptide identifications.

**Aims of this task**
- To associate MS1-level quantitative features with sequence-level peptide identifications.
- To integrate peptide and protein identification information into the detected feature map.
- To enable biologically interpretable, peptide-resolved quantitative analysis.

**Implementation**
- Peptide and protein identifications were loaded from an idXML file generated by MS/MS database searching.
- An `IDMapper` instance was initialized to perform feature–identification mapping. See: [https://pyopenms.readthedocs.io/en/latest/user_guide/PSM_to_features.html](https://pyopenms.readthedocs.io/en/latest/user_guide/PSM_to_features.html)
- Peptide identifications were annotated onto detected features based on proximity in RT and m/z space (here with tolerances of 5 s in RT and 10 ppm in m/z).
- The annotated features were stored within the existing `FeatureMap` structure for downstream analysis.
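
The proximity criterion can be pictured with a small plain-Python sketch. This is a simplified illustration, not the `IDMapper` implementation: the function names are hypothetical, and it compares a single feature position with a single identification position using an absolute RT tolerance and a relative (ppm) m/z tolerance.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed m/z deviation of observed_mz from reference_mz in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def is_match(feature_rt, feature_mz, id_rt, id_mz, rt_tol=5.0, mz_tol_ppm=10.0):
    """Hypothetical check: is an identification close enough to a feature?

    rt_tol is in seconds, mz_tol_ppm in ppm relative to the feature m/z.
    """
    rt_ok = abs(feature_rt - id_rt) <= rt_tol
    mz_ok = abs(ppm_error(id_mz, feature_mz)) <= mz_tol_ppm
    return rt_ok and mz_ok


# Toy feature position (values taken from the style of the feature table).
feature_rt, feature_mz = 2208.77, 655.301653

print(is_match(feature_rt, feature_mz, id_rt=2210.0, id_mz=655.3050))  # close in RT and m/z
print(is_match(feature_rt, feature_mz, id_rt=2210.0, id_mz=655.3100))  # m/z too far
print(is_match(feature_rt, feature_mz, id_rt=2220.0, id_mz=655.3050))  # RT too far
```

A ppm tolerance scales with m/z: 10 ppm at m/z 655.3 corresponds to an absolute window of about 0.0066, which is why the second identification is rejected.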

In [None]:
# Load the identification (.idXML) file and extract the peptide and protein identifications
peptide_ids = oms.PeptideIdentificationList()
protein_ids = []
oms.IdXMLFile().load("BSA1_F1.idXML", protein_ids, peptide_ids)


In [None]:
# Configure IDMapper
id_mapper = oms.IDMapper()
params = id_mapper.getParameters()
params.setValue("rt_tolerance", 5.0)  # RT tolerance in seconds
params.setValue("mz_tolerance", 10.0)  # m/z tolerance in ppm
id_mapper.setParameters(params)


In [None]:
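# Annotate the features with the identifications.
# The two boolean arguments are use_centroid_rt and use_centroid_mz.
id_mapper.annotate(features, peptide_ids, protein_ids, True, True, exp)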


# 3. Visually inspect detected features in retention time–m/z–intensity space.

**Aims of this task**
- To visually evaluate the detected MS1 features in retention time–mass-to-charge–intensity space, enabling qualitative assessment of feature detection performance.

**Implementation**
- The detected feature map was converted into a tabular pandas DataFrame for exploratory analysis. See: [https://pyopenms.readthedocs.io/en/latest/user_guide/export_pandas_dataframe.html](https://pyopenms.readthedocs.io/en/latest/user_guide/export_pandas_dataframe.html)
- The plotting backend was configured to enable mass spectrometry–specific visualizations. See: [https://pyopenms-viz.readthedocs.io/en/latest/](https://pyopenms-viz.readthedocs.io/en/latest/)
- A peak map visualization was generated, projecting features in retention time, m/z, and intensity space.
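
The feature bounds exported to the DataFrame (`rt_start`, `rt_end`, `mz_start`, `mz_end`) can be turned into rectangle overlays for the peak map. The helper below is a hypothetical convenience function, sketched in plain Python on toy rows, that builds plotly-style shape dictionaries once so the same logic can be reused for the full and the filtered feature table.

```python
def feature_rectangles(rows, color="blue", width=1):
    """Build plotly-style rectangle shape dicts from feature bounds.

    rows: iterable of mappings with rt_start, rt_end, mz_start, mz_end keys.
    """
    return [
        dict(
            type="rect",
            x0=row["rt_start"],
            x1=row["rt_end"],
            y0=row["mz_start"],
            y1=row["mz_end"],
            line=dict(color=color, width=width),
        )
        for row in rows
    ]


# Toy rows mirroring the first two features of the exported table.
toy_rows = [
    {"rt_start": 2252.12915, "rt_end": 2299.366455,
     "mz_start": 674.532496, "mz_end": 676.139162},
    {"rt_start": 2195.811768, "rt_end": 2217.431396,
     "mz_start": 655.297018, "mz_end": 657.308736},
]

shapes = feature_rectangles(toy_rows)
print(len(shapes), "rectangles; first spans RT", shapes[0]["x0"], "to", shapes[0]["x1"])
```

With a pandas DataFrame, the rows could be supplied as `df.to_dict("records")`, and the resulting list passed to the figure's `update_layout(shapes=...)`.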


In [None]:
# Export features into dataframe
df = features.get_df()
df.head(2)

| feature_id | peptide_sequence | peptide_score | ID_filename | ID_native_id | charge | rt | mz | rt_start | rt_end | mz_start | mz_end | quality | intensity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12592275275208522725 | | | unknown | | 5 | 2272.696289 | 674.534031 | 2252.12915 | 2299.366455 | 674.532496 | 676.139162 | 9.0 | 83422.359375 |
| 17439098422051461769 | | | | | 3 | 2208.773438 | 655.301653 | 2195.811768 | 2217.431396 | 655.297018 | 657.308736 | 7.0 | 230824.5 |


In [None]:
# interactive PeakMap plot with plotly
from pyopenms_viz._plotly import PLOTLYPeakMapPlot

plot = PLOTLYPeakMapPlot(
    data=df,
    x="rt",
    y="mz",
    z="intensity",
    width=800,
    height=800,
    grid=False,
    add_marginals=True, # showing RT and intensities
)

plot.show()


In [None]:
# Plot the peak map with bounding boxes at the feature positions
plot = PLOTLYPeakMapPlot(
    data=df,
    x="rt",
    y="mz",
    z="intensity",
    width=1000,
    height=1000,
    grid=False,
)

# Create rectangles for all features
shapes = []
for _, row in df.iterrows():
    shapes.append(
        dict(
            type="rect",
            x0=row["rt_start"],
            x1=row["rt_end"],
            y0=row["mz_start"],
            y1=row["mz_end"],
            line=dict(color="blue", width=1)
        )
    )

# Add all rectangles to the plot
plot.fig.update_layout(shapes=shapes)

# Show the interactive plot
plot.show()


In [None]:
# Keep only features whose RT range lies entirely within the 1600–1650 s window
df_cut = df[(df["rt_start"] >= 1600) & (df["rt_end"] <= 1650)]

# Plot peakmap
plot = PLOTLYPeakMapPlot(
    data=df_cut,
    x="rt",
    y="mz",
    z="intensity",
    width=1000,
    height=1000,
    grid=False,
)

# Create rectangles for filtered features
shapes = []
for _, row in df_cut.iterrows():
    shapes.append(
        dict(
            type="rect",
            x0=row["rt_start"],
            x1=row["rt_end"],
            y0=row["mz_start"],
            y1=row["mz_end"],
            line=dict(color="blue", width=1)
        )
    )

# Add rectangles to the plot
plot.fig.update_layout(shapes=shapes)

# Show the interactive plot
plot.show()
