Masskit is a unified toolkit for mass spectrometry analytics. The project pairs high-level Python workflows with high-performance Rust extensions to deliver scalable spectrum processing, library search, and deep-learning models for peptides and small molecules.
Masskit supports spectrum-centric research from data ingestion to model evaluation. Datasets are stored in columnar form with Apache Arrow and Polars, and compute-intensive routines are accelerated by a Rust extension module. The machine-learning stack covers prediction, training, and evaluation end to end.

Key features:
- Spectrum utilities for filtering, normalization, annotation, library search, and visualization.
- Unified ML stack for peptide and small-molecule spectra built on Hugging Face packages with ready-to-use training & inference apps.
- Broad file-format support (MGF, MSP, SDF, and more) with Arrow-backed data pipelines for scalable processing.
- Hybrid Python/Rust architecture that balances developer productivity with optimized performance.
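The spectrum utilities listed above cover filtering, normalization, and library search. The pure-Python sketch below illustrates one common way these fit together: base-peak normalization, a relative-intensity threshold, m/z binning, and cosine similarity scoring against a small library, all on toy data. It is not Masskit's API. The names `normalize`, `filter_peaks`, `bin_peaks`, `cosine_score`, and `search` are hypothetical, as are the (m/z, intensity) tuple representation, the 1% threshold, and the 0.5 m/z bin width.

```python
import math

# Hypothetical representation: a spectrum is a list of (m/z, intensity) peaks.
Spectrum = list[tuple[float, float]]


def normalize(peaks: Spectrum) -> Spectrum:
    """Scale intensities so the base (most intense) peak equals 1.0."""
    base = max((intensity for _, intensity in peaks), default=0.0)
    if base <= 0:
        return []
    return [(mz, intensity / base) for mz, intensity in peaks]


def filter_peaks(peaks: Spectrum, min_rel: float = 0.01) -> Spectrum:
    """Normalize, then drop peaks below min_rel of the base peak (1% by default)."""
    return [(mz, i) for mz, i in normalize(peaks) if i >= min_rel]


def bin_peaks(peaks: Spectrum, width: float = 0.5) -> dict[int, float]:
    """Sum intensities into fixed-width m/z bins keyed by bin index."""
    bins: dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine_score(a: Spectrum, b: Spectrum, width: float = 0.5) -> float:
    """Cosine similarity of two filtered, binned spectra (1.0 means identical shape)."""
    va = bin_peaks(filter_peaks(a), width)
    vb = bin_peaks(filter_peaks(b), width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: Spectrum, library: dict[str, Spectrum]) -> list[tuple[str, float]]:
    """Score the query against every library entry and return hits, best first."""
    hits = [(name, cosine_score(query, ref)) for name, ref in library.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data made up for illustration; these are not real spectra.
library = {
    "ref_a": [(100.0, 50.0), (150.0, 100.0), (200.0, 25.0)],
    "ref_b": [(120.0, 80.0), (180.0, 100.0)],
}
query = [(100.1, 40.0), (150.2, 90.0), (200.0, 20.0), (300.0, 0.5)]
print(search(query, library))
```

With this toy data, `ref_a` ranks first with a score close to 1.0 because its peaks fall into the same bins as the query's, while `ref_b` shares no bins and scores 0.0. The weak query peak at m/z 300 is removed by the 1% threshold before scoring. Binning keeps the sketch short; production search code usually applies explicit m/z tolerances and runs in optimized kernels.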
Prerequisites:

- Python 3.11+
- uv for environment and dependency management
- Rust toolchain (via rustup) for building the extension module
- Git for cloning the repository
To install from source:

```shell
# Clone the repository
git clone https://github.com/reductionnist/masskit.git
cd masskit

# Create or update the virtual environment
uv sync

# Option A: run commands through uv without activating the venv
uv run maturin develop

# Option B: activate the environment manually if you prefer
source .venv/bin/activate       # macOS/Linux
.\.venv\Scripts\Activate.ps1    # Windows PowerShell
maturin develop
```

To check the installation, run the smoke tests and confirm that the command-line tools are available:

```shell
uv run python -m pytest -k smoke
uv run predict --help
```

A minimal example that loads spectra from an MGF file:

```python
from masskit.polars import read_mgf

# Read a small sample of spectra (num=5); replace example.mgf with your own file
spectra = read_mgf("example.mgf", num=5)

# Display a few metadata columns
print(spectra.select(["id", "precursor_mz", "scan"]))
```

Explore the `masskit.ai` package for neural networks, `masskit.spectra` for spectrum manipulation, and the `masskit.peptide` and `masskit.small_molecule` packages for chemistry-specific tooling.
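MGF files, one of the formats Masskit reads, store each spectrum as a block between `BEGIN IONS` and `END IONS` lines, with `KEY=value` header lines (such as `TITLE` or `PEPMASS`) followed by peak lines listing m/z and intensity. For orientation, the sketch below parses that structure using only the standard library. It is a hypothetical illustration, not how `read_mgf` is implemented: `parse_mgf` is an assumed name, it returns plain dicts rather than a Polars table, and it skips details a production reader must handle, such as global parameters and malformed peak lines.

```python
from io import StringIO
from typing import TextIO


def parse_mgf(handle: TextIO) -> list[dict]:
    """Parse MGF text into dicts holding header fields plus a 'peaks' list.

    Hypothetical minimal reader: handles BEGIN IONS/END IONS blocks,
    KEY=value headers, and whitespace-separated "mz intensity" peak lines only.
    """
    spectra: list[dict] = []
    current: dict | None = None
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        if line == "BEGIN IONS":
            current = {"peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # global parameters outside a block are ignored in this sketch
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key.lower()] = value  # header values are kept as strings
        else:
            mz, intensity = line.split()[:2]
            current["peaks"].append((float(mz), float(intensity)))
    return spectra


example = StringIO("""BEGIN IONS
TITLE=spectrum_1
PEPMASS=445.12
SCANS=17
100.05 200.0
150.10 1000.0
END IONS
""")
records = parse_mgf(example)
print(records[0]["pepmass"], len(records[0]["peaks"]))  # prints: 445.12 2
```

Header keys are lower-cased and their values kept as strings, so numeric fields such as `pepmass` still need explicit conversion.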
Common entry points are installed automatically with the package. Prefix each call with `uv run` if the environment is not activated.

- `predict` – run spectrum predictions using trained models.
- `train_accel` – launch supervised training jobs.
- `build_dataset` – assemble Arrow datasets for model training and evaluation.
- `rewrite_sdf`, `batch_converter`, `transform_table` – convert and transform spectral libraries.
- `fasta2peptides`, `create_peptides`, `import_fasta` – generate peptide libraries from sequence data.

Run `uv run <command> --help` to see the options for each tool.
Run the tests with pytest:

```shell
uv run python -m pytest                 # Run the full suite
uv run python -m pytest tests/unit/     # Focused unit tests
uv run python -m pytest -k "peptide"    # Pattern-based selection
```

Lint and format the code with Ruff:

```shell
uv run ruff check .
uv run ruff format .
```

Build the Rust extension with maturin:

```shell
uv run maturin develop       # Debug build for iterative development
uv run maturin develop -r    # Optimized build
uv run maturin build -r      # Produce a wheel in rust/target/wheels/
```

When working directly on the standalone binaries, run `cargo build` or `cargo build --release` inside `rust/`.
