This repository contains the PyTorch code to replicate the experiments in our paper, Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires, accepted at the International Conference on Learning Representations (ICLR 2025):
@inproceedings{
chapfuwa2025scalable,
title={Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires},
author={Paidamoyo Chapfuwa and Ilker Demirel and Lorenzo Pisani and Javier Zazo and Elon Portugaly and H. Jabran Zahid and Julia Greissl},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=wyF5vNIsO7}
}
- Model type: Unsupervised representation learning
- License: MIT
The code is implemented with the following dependencies:
- Python 3.10.16
- Additional Python packages can be installed by running:
poetry install
We consider the following public datasets:
- Synthetic data for validating the proposed JL-GloVe algorithm
- ImmuneCODE for training the publicly available JL-GloVe TCR embeddings
- Emerson for evaluating the trained public TCR embeddings
- To train JL-GloVe embeddings using synthetic data, run:
jl-glove generate && jl-glove train
- We provide the 535,186 JL-GloVe TCR embeddings derived from the 3,991 ImmuneCODE repertoires here:
JL-GloVe is shared for research purposes only, namely, benchmarking and inference on downstream tasks. It is not meant to be used for clinical practice. JL-GloVe was not extensively tested for its capabilities and properties, including its accuracy and reliability in application settings, fairness across different demographics and uses, and security and privacy.
This is a research model which should not be used in any real clinical or production scenario.
JL-GloVe TCR embeddings reflect the co-occurrence statistics of the data used for training.
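To make "co-occurrence statistics" concrete, the following is a minimal sketch that counts how often pairs of TCR sequences appear together in the same repertoire. It is an illustration only, not the JL-GloVe training code from this repository: the function name `cooccurrence_counts` and the toy sequences are hypothetical, and the actual pipeline may define and weight co-occurrence differently.

```python
from collections import Counter
from itertools import combinations

# Toy repertoires: each set holds the TCR sequences observed in one sample.
# The sequences below are made up for illustration (assumption, not real data).
repertoires = [
    {"CASSLGQETQYF", "CASSPGTEAFF", "CASSIRSSYEQYF"},
    {"CASSLGQETQYF", "CASSPGTEAFF"},
    {"CASSLGQETQYF", "CASSIRSSYEQYF"},
]


def cooccurrence_counts(repertoires):
    """Count, for each unordered TCR pair, how many repertoires contain both."""
    counts = Counter()
    for repertoire in repertoires:
        # Sorting gives every pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(repertoire), 2):
            counts[(a, b)] += 1
    return counts


counts = cooccurrence_counts(repertoires)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Because the counts depend entirely on which repertoires are included, embeddings fitted to such statistics inherit whatever sampling biases the training cohort has.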
The data, code, and model checkpoints described in this repository are provided for research use only. The data, code, and model checkpoints are not intended for use in clinical decision-making or for any other clinical use, and the performance of the model for clinical use has not been established. You bear sole responsibility for any use of these data, code, and model checkpoints, including incorporation into any product intended for clinical use.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.