
scDLKit


Train, evaluate, compare, and visualize baseline deep-learning models for single-cell data without writing PyTorch from scratch.

Start Here

  • Documentation site: https://uddamvathanak.github.io/scDLKit/
  • Primary notebook tutorial: examples/train_vae_pbmc.ipynb
  • Install path for tutorials: python -m pip install "scdlkit[tutorials]"
  • CPU and GPU use the same notebook path through device="auto"
  • Secondary notebooks: examples/compare_models_pbmc.ipynb, examples/classification_demo.ipynb
  • Synthetic smoke examples: examples/first_run_synthetic.ipynb, examples/first_run_synthetic.py

Why scDLKit

  • AnnData-native workflow for single-cell users.
  • Baseline-first model zoo: AE, VAE, DAE, Transformer AE, and MLP classification.
  • Built-in training, evaluation, comparison, and plotting.
  • Reproducible reports and notebooks for portfolio-ready demonstrations.
  • Gene-expression-focused scope while the core toolkit stabilizes.

Supported platforms

  • Linux: supported
  • macOS: supported
  • Windows: supported

Installation

Primary tutorial install path:

python -m pip install "scdlkit[tutorials]"

Windows note: if you install into a deeply nested virtual environment path, Jupyter dependencies can hit Windows path-length limits. Use a short environment path such as C:\venvs\scdlkit, or enable Windows Long Paths if needed.
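As a rough self-check, you can estimate how close the active environment's prefix sits to the classic 260-character limit. This helper is illustrative and not part of scDLKit:

```python
import sys

MAX_PATH = 260  # classic Windows limit when Long Paths are disabled

def env_path_headroom(limit: int = MAX_PATH) -> int:
    # Rough heuristic: characters left before deeply nested package paths
    # inside this environment risk exceeding the Windows path limit.
    # A small or negative value suggests moving the environment to a
    # shorter location such as C:\venvs\scdlkit.
    return limit - len(sys.prefix)
```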

Optional extras:

python -m pip install "scdlkit[scanpy]"
python -m pip install "scdlkit[notebook]"
python -m pip install scdlkit
python -m pip install "scdlkit[dev,docs]"

For GPU users, first install the matching PyTorch build using the official selector at https://pytorch.org/get-started/locally/.

Then install scdlkit[tutorials]. The same notebook examples run on CPU or GPU with device="auto".
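For reference, device="auto" selection typically follows a simple pattern: prefer CUDA when a GPU-enabled PyTorch build is present, otherwise fall back to CPU. This is an illustrative sketch, not scDLKit's actual implementation:

```python
import importlib.util

def resolve_device(device: str = "auto") -> str:
    # Honor an explicit device string as-is.
    if device != "auto":
        return device
    # Only import torch if it is installed, then check for a usable GPU.
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"
```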

Scanpy Quickstart

Primary tutorial example. The notebook now uses a quickstart profile by default and exposes a full profile in its first config cell:

  • quickstart: CPU-friendly, docs-friendly, reproducible
  • full: longer run for stronger qualitative separation

import scanpy as sc
from scdlkit import TaskRunner

adata = sc.datasets.pbmc3k_processed()

runner = TaskRunner(
    model="vae",
    task="representation",
    label_key="louvain",
    device="auto",
    epochs=20,
    batch_size=128,
    model_kwargs={"kl_weight": 1e-3},
)

runner.fit(adata)
adata.obsm["X_scdlkit_vae"] = runner.encode(adata)

For the PBMC quickstart, keep the VAE KL term light (kl_weight=1e-3 above) so the latent UMAP preserves cell-type structure instead of collapsing into a uniform blob: a healthy result shows broad cell-type groups as visibly separated regions rather than a single mixed cloud.
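For intuition, kl_weight scales the KL term in the usual beta-weighted VAE objective. A minimal sketch (not scDLKit's internal loss code):

```python
def vae_objective(recon_loss: float, kl_div: float, kl_weight: float = 1e-3) -> float:
    # Beta-weighted VAE objective: reconstruction error plus a down-weighted
    # KL divergence. A small kl_weight (e.g. 1e-3) keeps the latent code
    # informative; a large one pushes the posterior toward the prior and
    # can collapse the embedding into a single cloud.
    return recon_loss + kl_weight * kl_div
```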

Then continue with Scanpy:

import scanpy as sc

sc.pp.neighbors(adata, use_rep="X_scdlkit_vae")
sc.tl.umap(adata)
sc.pl.umap(adata, color="louvain")

Notebook-First Examples

Most researchers should start with the Scanpy PBMC quickstart:

python -m pip install "scdlkit[tutorials]"
jupyter notebook examples/train_vae_pbmc.ipynb

This notebook:

  • loads PBMC data through Scanpy
  • trains a VAE baseline with scDLKit
  • writes the latent representation into adata.obsm
  • continues with Scanpy neighbors and UMAP
  • explains the quickstart versus full tutorial profiles
  • works on CPU or GPU through device="auto"

Additional Scanpy-first notebooks:

  • examples/compare_models_pbmc.ipynb: compare PCA, autoencoder, vae, and transformer_ae
  • examples/classification_demo.ipynb: run the mlp_classifier baseline and inspect a confusion matrix
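As background for the classification demo, a confusion matrix is just a tally of (true label, predicted label) pairs; each off-diagonal entry counts one kind of misclassification. A minimal, library-free sketch of the idea:

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    # Map each (true_label, predicted_label) pair to its occurrence count.
    return Counter(zip(y_true, y_pred))

# Toy cell-type labels for illustration only.
cm = confusion_counts(["B", "T", "T", "NK"], ["B", "T", "B", "NK"])
```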

The synthetic notebook and script are still available, but they are now the smoke-test path rather than the primary researcher onboarding flow:

python -m pip install "scdlkit[notebook]"
jupyter notebook examples/first_run_synthetic.ipynb

python examples/first_run_synthetic.py

These write small reproducible artifacts to artifacts/first_run_notebook/ and artifacts/first_run/.

Optional contributor Conda environment

Conda is kept for contributors and demos. It is not the primary public install path.

Install Conda first from an official installer such as Miniconda. Then, from the repo root:

conda env create -f environment.yml
conda activate scdlkit

Core APIs

High-level:

from scdlkit import TaskRunner

Lower-level:

from scdlkit import Trainer, create_model, prepare_data

Comparison:

from scdlkit import compare_models

benchmark = compare_models(
    adata,
    models=["autoencoder", "vae", "transformer_ae"],
    task="representation",
    shared_kwargs={"epochs": 10, "label_key": "cell_type"},
    output_dir="artifacts/compare",
)
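A common follow-up is ranking models by an embedding-quality metric. The exact shape of the returned benchmark object is not documented here, so this sketch assumes a plain dict of hypothetical per-model metrics:

```python
# Hypothetical per-model metrics; scDLKit's actual benchmark object may
# expose results differently (e.g. as a DataFrame or a written report).
results = {
    "autoencoder":    {"silhouette": 0.41},
    "vae":            {"silhouette": 0.47},
    "transformer_ae": {"silhouette": 0.44},
}

# Pick the model with the best silhouette score on the latent space.
best_model = max(results, key=lambda name: results[name]["silhouette"])
```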

Supported models

  • autoencoder
  • vae
  • denoising_autoencoder
  • transformer_ae
  • mlp_classifier

Supported tasks

  • representation
  • reconstruction
  • classification

Current scope

  • Gene-expression baselines for AnnData workflows
  • Scanpy-first tutorial and downstream embedding usage
  • Built-in deep-learning baselines plus classical comparison context in notebooks

Spatial omics, multimodal workflows, and custom PyTorch model adapters are planned once the gene-expression toolkit's quality gates are consistently stable.

Documentation

Project documentation is published as a Sphinx-based scientific docs site at https://uddamvathanak.github.io/scDLKit/.

GitHub Pages setup

The docs workflow expects GitHub Pages to be enabled once at the repository level.

  1. Open Settings -> Pages for this repo: https://github.com/uddamvathanak/scDLKit/settings/pages
  2. Under Build and deployment, set Source to GitHub Actions.
  3. Save the setting.
  4. Re-run the docs workflow.

Without that one-time setting, GitHub returns a 404 when actions/configure-pages or actions/deploy-pages tries to access the Pages site.

Optional automatic Pages enablement

If you want the workflow to bootstrap Pages automatically instead of doing the one-time manual setup:

  1. Create a repository secret named PAGES_ENABLEMENT_TOKEN.
  2. Use a Personal Access Token with repo scope or Pages write permission.
  3. Re-run the docs workflow.

Release flow

  • Stage to TestPyPI first with release-testpypi.yml.
  • Publish the final release from a v* tag with release.yml.
  • Use trusted publishing instead of long-lived PyPI API tokens.
  • See RELEASING.md for the full checklist.

Examples

  • examples/train_vae_pbmc.ipynb is the primary Scanpy-first notebook tutorial.
  • examples/compare_models_pbmc.ipynb compares autoencoder, vae, and transformer_ae on PBMC data.
  • examples/classification_demo.ipynb covers the mlp_classifier workflow and confusion-matrix reporting.
  • examples/first_run_synthetic.ipynb is the secondary smoke-test notebook with minimal setup.
  • examples/first_run_synthetic.py is the secondary smoke-test script.

Roadmap

Immediate roadmap targets:

  • quality-only hardening toward the next patch release
  • longer notebook tutorials with quickstart and full profiles
  • explicit toolkit-quality benchmarking on small Scanpy built-ins
  • internal release gates for latent quality, classification quality, and seed stability

Released so far:

v0.1

  • Expanded core workflow with training, evaluation, reporting, and plotting.
  • Staged TestPyPI and PyPI publishing.
  • Cross-platform smoke validation and reproducible notebooks.

Later:

  • adapter-based custom PyTorch model support
  • deeper downstream tutorials
  • spatial baselines only after the gene-expression toolkit is stable

Citation

If you use scDLKit, cite the software entry in CITATION.cff.
