[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/timosachsenberg/pyopenms-idfreeqc/hackathon_ID_free_metrics.ipynb)


# PyOpenMS ID-Free QC Demo

This notebook demonstrates how to use the `pyopenms-idfreeqc` library to calculate quality control metrics from mzML files.

## Setup

First, let's import the necessary libraries. The demo data is downloaded in the next section.

In [None]:
import os
import json
from urllib.request import urlretrieve
from pyopenms_idfreeqc.calculate_metrics import calculate_metrics, print_metrics_tables
from IPython.display import Image, display, JSON

## Download Demo Data

We'll download three example mzML files (BSA runs from the OpenMS repository) for demonstration.

In [None]:
# Demo mzML files: three BSA example runs from the OpenMS repository
DEMO_FILES = {
    "https://raw.githubusercontent.com/OpenMS/OpenMS/refs/heads/develop/share/OpenMS/examples/BSA/BSA1.mzML": "BSA1.mzML",
    "https://raw.githubusercontent.com/OpenMS/OpenMS/refs/heads/develop/share/OpenMS/examples/BSA/BSA2.mzML": "BSA2.mzML",
    "https://raw.githubusercontent.com/OpenMS/OpenMS/refs/heads/develop/share/OpenMS/examples/BSA/BSA3.mzML": "BSA3.mzML",
}

# Download files if they don't exist
demo_dir = "demo_data"
os.makedirs(demo_dir, exist_ok=True)

demo_file_paths = []
for url, filename in DEMO_FILES.items():
    filepath = os.path.join(demo_dir, filename)
    demo_file_paths.append(filepath)
    
    if os.path.exists(filepath):
        print(f"✓ {filename} already exists")
    else:
        print(f"Downloading {filename}...")
        urlretrieve(url, filepath)
        print(f"✓ Downloaded {filename}")

print(f"\nReady to process {len(demo_file_paths)} files")
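A download can succeed and still leave an unusable file on disk, for example an empty file or an HTML error page. The sketch below is one possible sanity check to run before computing metrics. The helper name `looks_like_mzml` and the 4096-byte probe size are assumptions made for illustration and are not part of `pyopenms-idfreeqc`. The check is a heuristic and does not validate the file against the mzML schema.

```python
from pathlib import Path


def looks_like_mzml(path, probe_bytes=4096):
    """Heuristic check: the file exists, is non-empty, and its first
    bytes contain an mzML or indexedmzML opening tag.

    Hypothetical helper for illustration; not a schema validation.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as handle:
        head = handle.read(probe_bytes)
    # Plain mzML starts with <mzML ...>, indexed mzML with <indexedmzML ...>
    return b"<mzML" in head or b"<indexedmzML" in head
```

In this notebook it could be applied as `[p for p in demo_file_paths if not looks_like_mzml(p)]` to list files worth re-downloading.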

## Calculate QC Metrics

Now let's calculate the metrics using the library's main function.

In [None]:
# Define output paths
output_json = "demo_qc_metrics.mzQC.json"
output_plot = "demo_qc_heatmap.png"

# Calculate metrics
# Note: show_tables=False to suppress table output here (we'll format it nicely later)
# show_json=False to avoid printing raw JSON in notebook output
json_output = calculate_metrics(
    mzml_files=demo_file_paths,
    output_file=output_json,
    generate_plot=True,
    plot_output=output_plot,
    show_tables=False,  # We'll display tables separately
    show_json=False     # We'll display JSON nicely using IPython
)

print("\n✅ Metrics calculation complete!")

## Display Metrics Table

Let's display the metrics in a formatted table.

In [None]:
# Print formatted metrics tables
print("="*120)
print("QC METRICS TABLES")
print("="*120)
print_metrics_tables(json_output)

## View mzQC JSON Output

The cell below shows the complete mzQC output, including all metrics and metadata, as a collapsible JSON tree.

In [None]:
# Parse and display JSON in a nicely formatted way
mzqc_data = json.loads(json_output)
display(JSON(mzqc_data, expanded=False))
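Before browsing the full tree, a quick structural summary can help. The sketch below counts runs and metrics per run. It assumes the parsed data is either a dict that holds `runQualities` directly (as the cells in this notebook index it) or a dict that wraps the same content in a top-level `mzQC` key, as mzQC files on disk commonly do. The function names and the sample dict are hypothetical and not part of the library.

```python
def mzqc_root(data):
    """Return the dict that holds 'runQualities', whether or not it is
    wrapped in a top-level 'mzQC' key (assumption: one of the two)."""
    return data.get("mzQC", data)


def summarize_mzqc(data):
    """Summarize a parsed mzQC document: version, run count, metrics per run."""
    root = mzqc_root(data)
    runs = root.get("runQualities", [])
    return {
        "version": root.get("version"),
        "n_runs": len(runs),
        "metrics_per_run": [len(r.get("qualityMetrics", [])) for r in runs],
    }


# Made-up miniature document, for illustration only
sample = {
    "mzQC": {
        "version": "1.0.0",
        "runQualities": [
            {"metadata": {"label": "run_a"}, "qualityMetrics": [{"name": "m1"}, {"name": "m2"}]},
            {"metadata": {"label": "run_b"}, "qualityMetrics": [{"name": "m1"}]},
        ],
    }
}
print(summarize_mzqc(sample))
```

In the notebook, `summarize_mzqc(mzqc_data)` would give the same kind of overview for the demo runs.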

## Display Heatmap Visualization

The heatmap gives a visual comparison of QC metrics across all runs.

In [None]:
# Display the generated heatmap
if os.path.exists(output_plot):
    display(Image(filename=output_plot))
else:
    print(f"Heatmap not found: {output_plot}")

## Accessing Specific Metrics

You can also parse the mzQC JSON to access specific metrics programmatically.

In [None]:
# Example: Extract specific metrics from the mzQC data
print("Run-level metrics extracted from mzQC:\n")

for run_quality in mzqc_data['runQualities']:
    metadata = run_quality['metadata']
    run_name = metadata['label']
    
    print(f"\n{run_name}:")
    print(f"  Input file: {metadata['inputFiles'][0]['name']}")
    
    # Find specific metrics
    for metric in run_quality['qualityMetrics']:
        name = metric.get('name', 'Unknown')
        
        # Show a few key metrics
        if name in ['NumberOfSpectra_MS1', 'NumberOfSpectra_MS2', 'ChromatographyDuration']:
            value = metric.get('value', 'N/A')
            unit = metric.get('unit', {}).get('name', '')
            print(f"  {name}: {value} {unit}")
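Printing is fine for inspection, but downstream code usually wants the values in a data structure. The sketch below collects the same fields into a nested dict keyed by run label and metric name. The function name `metrics_by_run` and the sample values are made up for illustration. It assumes the same layout the loop above relies on (`runQualities`, `metadata.label`, `qualityMetrics` entries with `name` and `value`).

```python
def metrics_by_run(mzqc, wanted=None):
    """Map run label -> {metric name: value}.

    If `wanted` is given, keep only metrics whose name is in it.
    Hypothetical helper; assumes 'runQualities' sits at the top level.
    """
    table = {}
    for run in mzqc.get("runQualities", []):
        label = run.get("metadata", {}).get("label", "unlabeled")
        row = {}
        for metric in run.get("qualityMetrics", []):
            name = metric.get("name")
            if name is None:
                continue  # skip entries without a name
            if wanted is None or name in wanted:
                row[name] = metric.get("value")
        table[label] = row
    return table


# Made-up values, for illustration only
sample = {
    "runQualities": [
        {
            "metadata": {"label": "run_a"},
            "qualityMetrics": [
                {"name": "NumberOfSpectra_MS1", "value": 10},
                {"name": "NumberOfSpectra_MS2", "value": 25},
            ],
        }
    ]
}
print(metrics_by_run(sample, wanted={"NumberOfSpectra_MS1"}))
```

A dict of this shape converts directly to a table, for example with `pandas.DataFrame.from_dict(table, orient="index")` if pandas is available.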

## Summary

This notebook demonstrated:

1. **Library usage**: Calling `calculate_metrics()` programmatically
2. **Data processing**: Computing 100+ QC metrics from mzML files
3. **Output formats**: 
   - mzQC JSON (standard format)
   - Formatted tables
   - Heatmap visualization
4. **Data access**: Parsing and extracting specific metrics from mzQC output

### Next Steps

- Use your own mzML files by changing the `demo_file_paths` list
- Integrate into your QC pipeline
- Compare metrics across different experiments
- Set quality thresholds based on the metrics
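As a starting point for the threshold idea in the list above, here is a minimal sketch that flags metrics falling below a minimum. The function name, the threshold values, and the sample numbers are all hypothetical. Sensible limits depend on the instrument, method, and sample type and would need to be chosen from your own data.

```python
def find_threshold_failures(run_metrics, minimums):
    """Return (metric, value, minimum) tuples for every metric that is
    missing, non-numeric, or below its minimum.

    run_metrics: {metric name: value} for one run
    minimums:    {metric name: minimum acceptable value} (hypothetical limits)
    """
    failures = []
    for name, minimum in minimums.items():
        value = run_metrics.get(name)
        if not isinstance(value, (int, float)) or value < minimum:
            failures.append((name, value, minimum))
    return failures


# Made-up numbers, for illustration only
example_run = {"NumberOfSpectra_MS1": 50, "NumberOfSpectra_MS2": 500}
example_minimums = {"NumberOfSpectra_MS1": 100, "NumberOfSpectra_MS2": 100}
for name, value, minimum in find_threshold_failures(example_run, example_minimums):
    print(f"{name}: {value} is below the minimum of {minimum}")
```

Treating a missing metric as a failure is a deliberate choice here, so that a run with incomplete output is not silently accepted.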

## Clean Up (Optional)

Remove demo files and outputs if desired.

In [None]:
# Uncomment to clean up demo files
# import shutil
# shutil.rmtree(demo_dir)
# os.remove(output_json)
# os.remove(output_plot)
# print("Cleanup complete!")