Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires (ICLR 2025)


This repository contains the PyTorch code to replicate the experiments in our paper, Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires, accepted at the International Conference on Learning Representations (ICLR 2025):

@inproceedings{
    chapfuwa2025scalable,
    title={Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires},
    author={Paidamoyo Chapfuwa and Ilker Demirel and Lorenzo Pisani and Javier Zazo and Elon Portugaly and H. Jabran Zahid and Julia Greissl},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=wyF5vNIsO7}
}

Model type: Unsupervised representation learning
License: MIT

Model

Prerequisites

The code is implemented with the following dependencies:

Python 3.10.16
Additional Python packages can be installed by running:

poetry install

Data

We consider the following public datasets:

Synthetic for validating the proposed JL-GloVe algorithm
ImmuneCODE for training the publicly available JL-GloVe TCR embeddings
Emerson for evaluating the trained public TCR embeddings

Model Training

To train JL-GloVe embeddings using synthetic data, run:

jl-glove generate && jl-glove train
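
For orientation only, the sketch below shows the per-pair weighted least-squares loss from the original GloVe formulation (Pennington et al., 2014). It is an assumption that JL-GloVe training resembles this objective; the sketch does not cover the JL part of the method or the repository's actual implementation, and the function names are hypothetical.

```python
import math


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting function: down-weights rare pairs, caps at 1.0."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_pair_loss(w_i, w_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    """Weighted squared error between the model score and log co-occurrence.

    w_i, w_j: embedding vectors (lists of floats) for the two items.
    b_i, b_j: scalar bias terms.
    x_ij: positive co-occurrence count for the pair.
    """
    dot = sum(a * b for a, b in zip(w_i, w_j))
    residual = dot + b_i + b_j - math.log(x_ij)
    return glove_weight(x_ij, x_max, alpha) * residual ** 2
```

The full GloVe objective sums this quantity over all pairs with a non-zero co-occurrence count.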

Metrics and Visualizations

We provide the 535,186 JL-GloVe TCR embeddings derived from the 3,991 ImmuneCODE repertoires here:

Direct intended uses

JL-GloVe is shared for research purposes only, namely benchmarking and inference on downstream tasks. It is not meant to be used in clinical practice. JL-GloVe has not been extensively tested for its capabilities and properties, including its accuracy and reliability in application settings, fairness across different demographics and uses, and security and privacy.
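
As one illustration of a research-only downstream use, the sketch below ranks TCRs by cosine similarity to a query TCR. It assumes the embeddings have already been loaded into a dictionary mapping a TCR identifier to a vector; the loading step, the identifiers, and the toy vectors are hypothetical and do not reflect the format of the released embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, embeddings, k=2):
    """Return the k entries most similar to `query`, best first."""
    scores = [
        (other, cosine_similarity(embeddings[query], vector))
        for other, vector in embeddings.items()
        if other != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy 2-dimensional vectors with made-up labels (not real embeddings).
toy_embeddings = {
    "tcr_A": [1.0, 0.0],
    "tcr_B": [0.9, 0.1],
    "tcr_C": [0.0, 1.0],
}
neighbors = nearest_neighbors("tcr_A", toy_embeddings, k=1)
```

With these toy vectors, `tcr_B` is the closest neighbor of `tcr_A` because it points in nearly the same direction.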

Out-of-scope uses

This is a research model which should not be used in any real clinical or production scenario.

Risks and limitations

JL-GloVe TCR embeddings reflect the co-occurrence statistics of the data used for training.
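
To make "co-occurrence statistics" concrete, here is a minimal sketch that counts how many repertoires contain both members of each TCR pair. It assumes that co-occurrence means joint presence in the same repertoire; the function name and the toy labels are hypothetical, and the repository may define and compute these counts differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(repertoires):
    """Count, for each unordered TCR pair, the repertoires containing both."""
    counts = Counter()
    for repertoire in repertoires:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        unique_tcrs = sorted(set(repertoire))
        for pair in combinations(unique_tcrs, 2):
            counts[pair] += 1
    return counts


# Toy repertoires with made-up TCR labels (not real sequences).
toy_repertoires = [
    ["tcr_A", "tcr_B", "tcr_C"],
    ["tcr_A", "tcr_B"],
    ["tcr_B", "tcr_C", "tcr_C"],
]
counts = cooccurrence_counts(toy_repertoires)
```

Because the counts depend entirely on which repertoires are included, embeddings trained on them inherit any sampling bias in that cohort.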

License and Usage Notices

The data, code, and model checkpoints described in this repository are provided for research use only. They are not intended for use in clinical decision-making or for any other clinical use, and the performance of the model for clinical use has not been established. You bear sole responsibility for any use of these data, code, and model checkpoints, including incorporation into any product intended for clinical use.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
