# QC from mzML runs 

In this notebook, we explore quality control (QC) of mass spectrometry data stored in mzML format. The goal is to confirm that the data are suitable for downstream analysis and to identify issues that could affect the results. The code is written specifically for mzML files from the SCAPIS DIA experiments with Khue. There are around 16 plates of plasma samples, each containing 96 samples. The mzML files are stored in a specific directory structure, and we process them to extract the information needed for QC.


In [None]:
# Import packages
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pyteomics import mzml
from pyteomics import mass  



In [None]:
# Path to mzML 
mzml_path = '/home/thanadol/Documents/GitHub/deepmrm_input/mzML'

In [None]:
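# Read mzML files
def read_mzml_files(directory):
    """Collect the m/z and intensity arrays of every spectrum in a directory of mzML files."""
    mzml_files = sorted(f for f in os.listdir(directory) if f.endswith('.mzML'))
    data = []
    for file in mzml_files:
        file_path = os.path.join(directory, file)
        with mzml.MzML(file_path) as reader:
            for spectrum in reader:
                if 'm/z array' in spectrum and 'intensity array' in spectrum:
                    mz = spectrum['m/z array']
                    intensity = spectrum['intensity array']
                    # pyteomics nests the scan start time under scanList -> scan,
                    # so a top-level lookup would always return None
                    scans = spectrum.get('scanList', {}).get('scan', [{}])
                    data.append({
                        'file': file,
                        'mz': mz,
                        'intensity': intensity,
                        'scan_time': scans[0].get('scan start time') if scans else None
                    })
    return pd.DataFrame(data)

# One row per spectrum, across all files in mzml_path
spectra_df = read_mzml_files(mzml_path)
spectra_df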
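
A per-spectrum table like the one collected above (columns `file`, `mz`, `intensity`, `scan_time`) can be condensed into one row of QC metrics per run. The cell below is a minimal sketch of that step, not the established QC procedure for this project: the helper name `summarize_runs`, the choice of metrics (spectrum count, median total ion current, median number of peaks, scan time range), and the synthetic `demo` table are all assumptions made for illustration. It also assumes that `scan_time` is numeric for every spectrum.

```python
import numpy as np
import pandas as pd


def summarize_runs(spectra_df):
    """Return one row of basic QC metrics per mzML file (hypothetical helper)."""
    df = spectra_df.copy()
    # Total ion current (TIC): summed intensity of each spectrum
    df['tic'] = df['intensity'].apply(lambda a: float(np.sum(a)))
    # Number of data points recorded in each spectrum
    df['n_peaks'] = df['mz'].apply(len)
    summary = df.groupby('file').agg(
        n_spectra=('tic', 'size'),
        median_tic=('tic', 'median'),
        median_n_peaks=('n_peaks', 'median'),
        first_scan_time=('scan_time', 'min'),
        last_scan_time=('scan_time', 'max'),
    )
    return summary.reset_index()


# Synthetic stand-in for the real spectra table (values are made up)
demo = pd.DataFrame([
    {'file': 'a.mzML', 'mz': np.array([100.0, 200.0]),
     'intensity': np.array([10.0, 30.0]), 'scan_time': 0.1},
    {'file': 'a.mzML', 'mz': np.array([150.0]),
     'intensity': np.array([5.0]), 'scan_time': 0.2},
    {'file': 'b.mzML', 'mz': np.array([100.0, 300.0, 500.0]),
     'intensity': np.array([1.0, 2.0, 3.0]), 'scan_time': 0.1},
])
qc_summary = summarize_runs(demo)
qc_summary
```

Runs whose spectrum count or median TIC falls far from the rest of their plate are natural candidates for closer inspection; any threshold for "far" would have to be chosen from the real data.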
