# Rescoring of cleavable XL-MS data using Oktoberfest

This notebook provides an overview of rescoring cleavable XL-MS data in Oktoberfest. 

## 1. Installation:

### Percolator:
- To install Percolator on Windows, download the installer: https://github.com/percolator/percolator/releases/download/rel-3-06-01/percolator-v3-06.exe
- Run the downloaded file. During setup, select "add percolator to the system PATH for all users" when asked.

### ThermoRawFileParser:
- You need this only if you want to read Thermo RAW files.
- First, download this zip archive locally: https://github.com/compomics/ThermoRawFileParser/releases/download/v1.4.3/ThermoRawFileParser1.4.3.zip
- Extract the contents of the archive and note where they are saved; Oktoberfest needs this location later.

### Oktoberfest:
- Oktoberfest currently supports Python versions 3.10 and 3.11. Support for 3.12 will be added in the near future.
- Install Oktoberfest using `pip install oktoberfest`.
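
Before continuing, it can help to confirm that the interpreter version and the Percolator installation match the notes above. The following is a minimal sketch, not part of Oktoberfest: `check_environment` is a hypothetical helper, and it assumes the Percolator executable is discoverable on the PATH under the name `percolator`.

```python
import shutil
import sys


def check_environment(supported=((3, 10), (3, 11))):
    """Summarize whether this interpreter and Percolator look usable.

    `supported` mirrors the Python versions listed in the installation notes.
    """
    version = sys.version_info[:2]
    return {
        "python_version": ".".join(str(part) for part in version),
        "python_supported": version in supported,
        # shutil.which returns None when the executable is not on the PATH
        "percolator_path": shutil.which("percolator"),
    }


print(check_environment())
```

If `percolator_path` is `None`, the installer option that adds Percolator to the system PATH was probably not selected.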

## 2. Import necessary Python packages

In [1]:
import os
from oktoberfest.runner import run_job
import json
import urllib.request
import shutil
from tqdm import tqdm



## 3. Download the files required for the rescoring task from Zenodo

The data used in this tutorial is available through a public Zenodo record.
The dataset is approximately 639 MB in size and includes:

- `msms.csv`: the output from **XiSearch (version 1.8.7)**
- one RAW file used for rescoring crosslinked peptides

### Get the current directory and set the file name

In [2]:
download_dir = os.getcwd()
download_file = os.path.join(download_dir, "Oktoberfest_XL_input.zip")
url = "https://zenodo.org/records/15639875/files/Oktoberfest_XL_input.zip?download=1"

# set download to False if you already have the file and don't want to download it again in the next step
download = True
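
Instead of setting the `download` flag by hand, the decision could be derived from whether a valid archive is already present. This is a minimal sketch using only the standard library; `archive_is_usable` is a hypothetical helper name and not part of Oktoberfest.

```python
import os
import zipfile


def archive_is_usable(path):
    """Return True if `path` is an existing zip archive without corrupt members."""
    if not os.path.isfile(path) or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None
        return archive.testzip() is None
```

With this helper, the flag could be set as `download = not archive_is_usable(download_file)`, so that an interrupted or missing download is fetched again while a complete archive is reused.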

### Download and extract files from Zenodo to the same directory

In [3]:
if download:
    with tqdm(unit="B", total=639600000, unit_scale=True, unit_divisor=1000, miniters=1, desc=url.split("/")[-1]) as t:
        urllib.request.urlretrieve(
            url=url,
            filename=download_file,
            reporthook=lambda blocks, block_size, _: t.update(blocks * block_size - t.n)
        )
    shutil.unpack_archive(download_file, download_dir)


Oktoberfest_XL_input.zip?download=1: 640MB [00:50, 12.7MB/s]                                                 


### Check downloaded files

In [4]:
input_dir = os.path.splitext(download_file)[0]  # strip the ".zip" extension
print(f"Downloaded data is stored in {input_dir}\nContents:")
os.listdir(input_dir)

Downloaded data is stored in /home/mkalhor/oktoberfest/tutorials/Oktoberfest_XL_input
Contents:


['msms.csv',
 'XLpeplib_Beveridge_QEx-HFX_DSSO_stHCD.raw',
 'XLpeplib_Beveridge_QEx-HFX_DSSO_stHCD.raw:Zone.Identifier']
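
The `:Zone.Identifier` entry in the listing looks like download metadata that Windows attaches to files rather than a spectrum file. How Oktoberfest treats such entries is not shown here, so the sketch below only illustrates how the actual RAW files could be listed on their own; `list_raw_files` is a hypothetical helper.

```python
from pathlib import Path


def list_raw_files(directory):
    """Return the sorted .raw files in `directory`, ignoring every other entry."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        # the suffix check excludes names such as "x.raw:Zone.Identifier"
        if path.is_file() and path.suffix.lower() == ".raw"
    )
```

Calling `list_raw_files(input_dir)` on the directory above would be expected to return only the single `.raw` file.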

## 4. Rescoring
**Important**: The intensity model specified in the config file is
**Prosit_2023_intensity_XL_CMS2**, because the crosslinker in this dataset is DSSO, which is cleavable.

If you are using a non-cleavable crosslinker (e.g., DSS or BS3), make sure to change the model name in the config file to:
**Prosit_2024_intensity_XL_NMS2**
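
The rule above can be written as a small lookup. This is only a sketch: `intensity_model_for` is a hypothetical helper, and the crosslinker sets contain just the three examples named in this notebook, so other crosslinkers would have to be added by hand.

```python
# Crosslinkers named in this tutorial; extend these sets for other reagents.
CLEAVABLE = {"DSSO"}
NON_CLEAVABLE = {"DSS", "BS3"}


def intensity_model_for(crosslinker):
    """Map a crosslinker name to the intensity model named in this tutorial."""
    name = crosslinker.strip().upper()
    if name in CLEAVABLE:
        return "Prosit_2023_intensity_XL_CMS2"
    if name in NON_CLEAVABLE:
        return "Prosit_2024_intensity_XL_NMS2"
    raise ValueError(f"Unknown crosslinker: {crosslinker!r}")
```

The returned string could then be used as the value of `"intensity"` under `"models"` in the config.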

### Generate config file

In [5]:
task_config_rescoring = {
    "type": "Rescoring",
    "tag": "",
    "inputs":{
        "search_results": os.path.join(input_dir, "msms.csv"),
        "search_results_type": "Xisearch",
        "spectra": input_dir,
        "spectra_type": "raw",
    },
    "output": "./XL_out",
    "models": {
        "intensity": "Prosit_2023_intensity_XL_CMS2",
        "irt": ""
    },
    "prediction_server": "koina.wilhelmlab.org:443",
    "ssl": True,
    # set this to the full path of the ThermoRawFileParser.exe extracted in step 1
    "thermoExe": "ThermoRawFileParser.exe",
    "numThreads": 1,
    "fdr_estimation_method": "percolator",
    "allFeatures": False,
    "massTolerance": 40,
    "unitMassTolerance": "ppm",
    "ce_alignment_options": {
        "ce_range": [
            5,
            45
        ],
        "use_ransac_model": True
    }
}

### Save the config file

In [6]:
with open("./rescoring_config.json", "w") as fp:
    json.dump(task_config_rescoring, fp)

### Start rescoring

In [7]:
run_job("./rescoring_config.json")

2025-06-14 15:44:07,359 - INFO - oktoberfest.utils.config::read Reading configuration from ./rescoring_config.json
2025-06-14 15:44:07,383 - INFO - oktoberfest.runner::run_job Oktoberfest version 0.9.1
Copyright 2025, Wilhelmlab at Technical University of Munich
2025-06-14 15:44:07,433 - INFO - oktoberfest.runner::run_job Job executed with the following config:
2025-06-14 15:44:07,436 - INFO - oktoberfest.runner::run_job {
    "type": "Rescoring",
    "tag": "",
    "inputs": {
        "search_results": "/home/mkalhor/oktoberfest/tutorials/Oktoberfest_XL_input/msms.csv",
        "search_results_type": "Xisearch",
        "spectra": "/home/mkalhor/oktoberfest/tutorials/Oktoberfest_XL_input"
    },
    "output": "./XL_out",
    "models": {
        "intensity": "Prosit_2023_intensity_XL_CMS2",
        "irt": ""
    },
    "prediction_server": "koina.wilhelmlab.org:443",
    "ssl": true,
    "thermoExe": "ThermoRawFileParser.exe",
    "numThreads": 1,
    "fdr_estimation_method": "percolat

Cannot open assembly 'ThermoRawFileParser.exe': No such file or directory.


CalledProcessError: Command '['mono', PosixPath('ThermoRawFileParser.exe'), '--msLevel=2', '-i', PosixPath('/home/mkalhor/oktoberfest/tutorials/Oktoberfest_XL_input/XLpeplib_Beveridge_QEx-HFX_DSSO_stHCD.raw'), '-b', PosixPath('XL_out/spectra/XLpeplib_Beveridge_QEx-HFX_DSSO_stHCD.mzML')]' returned non-zero exit status 2.
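
The error above means that the `thermoExe` value did not resolve to an existing file, so the RAW file could not be converted to mzML. A small pre-flight check before calling `run_job` can surface this earlier. The following is a minimal sketch; `find_config_problems` is a hypothetical helper, not an Oktoberfest function, and the `mono` check is based only on the command shown in the traceback above.

```python
import os
import shutil


def find_config_problems(config):
    """Return a list of human-readable problems found in a rescoring config dict."""
    problems = []
    inputs = config.get("inputs", {})
    if not os.path.isfile(inputs.get("search_results", "")):
        problems.append("search_results: file not found")
    if not os.path.exists(inputs.get("spectra", "")):
        problems.append("spectra: path not found")
    if str(inputs.get("spectra_type", "")).lower() == "raw":
        # RAW input has to be converted, which requires ThermoRawFileParser
        if not os.path.isfile(config.get("thermoExe", "")):
            problems.append("thermoExe: file not found, use the full path to ThermoRawFileParser.exe")
        # assumption: outside Windows the parser is launched through mono
        if os.name != "nt" and shutil.which("mono") is None:
            problems.append("mono: not found on PATH")
    return problems


example = {
    "inputs": {"search_results": "missing/msms.csv", "spectra": "missing", "spectra_type": "raw"},
    "thermoExe": "ThermoRawFileParser.exe",
}
for problem in find_config_problems(example):
    print(problem)
```

Running `find_config_problems(task_config_rescoring)` before saving the config would report a relative `thermoExe` that does not exist in the working directory.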

### Check the results

The results are written to the output folder specified in your config file.

You should find the following key output files:

1. **percolator_xifdr_input.csv**

   **Location:** `.../XL_out/results/percolator/percolator_xifdr_input.csv`

   This file contains the Percolator score for each CSM (cross-linked spectrum match).

   It is intended as input for the **xiFDR** tool, which estimates the FDR.

   👉 Note: Oktoberfest **does not perform FDR estimation** itself for XL-MS data; it only generates Percolator-based scores.

   You can load this file directly into **xiFDR** and apply FDR estimation as needed.

   🔗 More info: [xiFDR GitHub Repository](https://github.com/Rappsilber-Laboratory/xiFDR)

2. **xisearch_xifdr_input.csv**

   **Location:** `.../XL_out/results/percolator/xisearch_xifdr_input.csv`

   This file contains the XiSearch score for each CSM (cross-linked spectrum match).

   It can also be used as input for **xiFDR**, just like the Percolator version.

   This allows you to **compare the performance of rescoring** (Percolator) against the original XiSearch scores.

   📊 Useful for benchmarking rescoring effectiveness.
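
To confirm that a run produced both files, their presence can be checked from Python. This is a minimal sketch based only on the two locations listed above; `find_xifdr_inputs` is a hypothetical helper and is not part of Oktoberfest.

```python
from pathlib import Path

XIFDR_INPUT_NAMES = ("percolator_xifdr_input.csv", "xisearch_xifdr_input.csv")


def find_xifdr_inputs(output_dir):
    """Map each expected xiFDR input file name to its path, or None if missing."""
    base = Path(output_dir) / "results" / "percolator"
    return {
        name: (base / name) if (base / name).is_file() else None
        for name in XIFDR_INPUT_NAMES
    }
```

For the config used in this notebook, `find_xifdr_inputs("./XL_out")` should return two paths after a successful run; a `None` value points to a run that did not finish.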

