uppersaranac/masskit

Masskit is a unified toolkit for mass spectrometry analytics. The project pairs high-level Python workflows with high-performance Rust extensions to deliver scalable spectrum processing, library search, and deep-learning models for peptides and small molecules.

Overview

Masskit provides a complete workflow for spectrum-centric research. Datasets are stored in columnar form using Apache Arrow and Polars, while compute-intensive routines are accelerated through a Rust extension module. The bundled machine-learning models cover training, evaluation, and prediction end to end.

Highlights

  • Spectrum utilities for filtering, normalization, annotation, library search, and visualization.
  • Unified ML stack for peptide and small-molecule spectra built on Hugging Face packages with ready-to-use training & inference apps.
  • Broad file-format support (MGF, MSP, SDF, and more) with Arrow-backed data pipelines for scalable processing.
  • Hybrid Python/Rust architecture that balances developer productivity with optimized performance.
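To make the spectrum utilities listed above more concrete, here is a minimal pure-Python sketch of base-peak normalization, relative-intensity filtering, and a binned cosine score for library search. It is not Masskit's implementation: the function names (`normalize`, `filter_peaks`, `cosine`, `search`), the `(mz, intensity)` tuple representation, and the 0.5 m/z bin width are all assumptions chosen for illustration.

```python
import math


def normalize(peaks, top=1.0):
    """Scale intensities so the base (most intense) peak equals `top`."""
    base = max((intensity for _, intensity in peaks), default=0.0)
    if base <= 0:
        return []
    return [(mz, top * intensity / base) for mz, intensity in peaks]


def filter_peaks(peaks, min_rel=0.01):
    """Normalize, then drop peaks below `min_rel` of the base peak."""
    return [(mz, i) for mz, i in normalize(peaks) if i >= min_rel]


def _binned(peaks, bin_width):
    """Sum intensities into fixed-width m/z bins (hypothetical helper)."""
    vec = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec


def cosine(a, b, bin_width=0.5):
    """Cosine similarity between two peak lists after m/z binning."""
    va, vb = _binned(a, bin_width), _binned(b, bin_width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, library, bin_width=0.5):
    """Rank library entries (name -> peak list) by cosine score, best first."""
    scored = ((name, cosine(query, peaks, bin_width)) for name, peaks in library.items())
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A real library search would typically also filter candidates by precursor m/z and use a tolerance-based peak match rather than fixed bins; the sketch only shows the scoring idea.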

Getting Started

Requirements

  • Python 3.11+
  • uv for environment and dependency management
  • Rust toolchain (via rustup) for building the extension module
  • Git for cloning the repository
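The requirements above can be checked before installing with a short preflight script. This is a sketch using only the Python standard library; the function name `check_requirements` and the exact list of executables looked up on `PATH` (`uv`, `rustup`, `cargo`, `git`) are assumptions, not part of Masskit.

```python
import shutil
import sys

# Executables assumed to correspond to the requirements listed above.
REQUIRED_TOOLS = ("uv", "rustup", "cargo", "git")


def check_requirements(min_python=(3, 11), tools=REQUIRED_TOOLS):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print(f"missing: {issue}")
    if not issues:
        print("All requirements found.")
```

The script only confirms that the tools are discoverable; it does not verify their versions.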

Install the environment

# Clone the repository
git clone https://github.com/reductionnist/masskit.git
cd masskit

# Create or update the virtual environment
uv sync

# Option A: run commands through uv without activating the venv
uv run maturin develop

# Option B: activate the environment manually, then build
# (run only the activation line for your platform)
source .venv/bin/activate           # macOS/Linux
.\.venv\Scripts\Activate.ps1       # Windows PowerShell
maturin develop

Verify the installation

uv run python -m pytest -k smoke
uv run predict --help

Using Masskit

Python API

from masskit.polars import read_mgf

spectra = read_mgf("example.mgf", num=5)
print(spectra.select(["id", "precursor_mz", "scan"]))

Explore the masskit.ai, masskit.spectra, masskit.peptide, and masskit.small_molecule packages for neural networks, spectrum manipulations, and chemistry-specific tooling.
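For readers unfamiliar with the MGF format that read_mgf consumes, the following pure-Python sketch parses the basic structure: BEGIN IONS / END IONS blocks containing KEY=value headers followed by "m/z intensity" peak lines. It is an illustration only and not how Masskit reads files. The names `parse_mgf` and `precursor_mz` are hypothetical, and the `num` argument assumes that `num` in the example above limits the number of spectra read.

```python
def parse_mgf(text, num=None):
    """Parse MGF text into a list of dicts; stop after `num` spectra if given."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        if line == "BEGIN IONS":
            current = {"params": {}, "mz": [], "intensity": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
                current = None
                if num is not None and len(spectra) >= num:
                    break
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z, then an optional intensity
                fields = line.split()
                current["mz"].append(float(fields[0]))
                current["intensity"].append(float(fields[1]) if len(fields) > 1 else 0.0)
    return spectra


def precursor_mz(spectrum):
    """PEPMASS holds the precursor m/z, optionally followed by an intensity."""
    return float(spectrum["params"]["PEPMASS"].split()[0])


SAMPLE = """\
BEGIN IONS
TITLE=scan 1
PEPMASS=445.12 1000.0
CHARGE=2+
100.1 10
200.2 20.5
END IONS
BEGIN IONS
TITLE=scan 2
PEPMASS=300.5
150.0 5
END IONS
"""

first = parse_mgf(SAMPLE, num=1)[0]
print(first["params"]["TITLE"], precursor_mz(first), first["mz"])
```

Row-oriented dicts like these are convenient for inspection but scale poorly, which is one reason to prefer the Arrow-backed readers for large files.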

Command-line tools

Common entry points are installed automatically. Prefix each call with uv run if the environment is not activated.

  • predict – run spectrum predictions using trained models.
  • train_accel – launch supervised training jobs.
  • build_dataset – assemble Arrow datasets for model training and evaluation.
  • rewrite_sdf, batch_converter, transform_table – convert and transform spectral libraries.
  • fasta2peptides, create_peptides, import_fasta – generate peptide libraries from sequence data.

To see the options for any tool, run uv run <command> --help.
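When scripting these tools from Python, the "prefix with uv run unless the environment is activated" rule can be expressed as a small helper. This is a sketch: `in_virtualenv` and `tool_command` are hypothetical names, and the check only detects whether the current interpreter runs inside some virtual environment, not specifically the one created by uv sync.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def tool_command(tool, *args, activated=None):
    """Build an argv list, adding the `uv run` prefix when not activated."""
    if activated is None:
        activated = in_virtualenv()
    prefix = [] if activated else ["uv", "run"]
    return [*prefix, tool, *args]


# Example: the argv for showing help for the predict tool.
print(tool_command("predict", "--help", activated=False))
```

The resulting list can be passed to `subprocess.run` once the tools are installed.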

Development Guide

Testing

uv run python -m pytest                 # Run the full suite
uv run python -m pytest tests/unit/     # Focused unit tests
uv run python -m pytest -k "peptide"    # Pattern-based selection

Linting & formatting

uv run ruff check .
uv run ruff format .

Building the Rust extension

uv run maturin develop          # Debug build for iterative development
uv run maturin develop -r       # Optimized build
uv run maturin build -r         # Produce a wheel in rust/target/wheels/

Use cargo build or cargo build --release inside rust/ when working directly on the standalone binaries.
