ICLR 2025: We employ GloVe and random projection theory to infer immunologically meaningful T-cell receptor embeddings from adaptive immune repertoires

microsoft/jl-glove

Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires (ICLR 2025)

This repository contains the PyTorch code to replicate the experiments in our paper, Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires, accepted at the International Conference on Learning Representations (ICLR 2025):

@inproceedings{
    chapfuwa2025scalable,
    title={Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires},
    author={Paidamoyo Chapfuwa and Ilker Demirel and Lorenzo Pisani and Javier Zazo and Elon Portugaly and H. Jabran Zahid and Julia Greissl},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=wyF5vNIsO7}
}
  • Model type: Unsupervised representation learning
  • License: MIT

Model

Prerequisites

The code is implemented with the following dependencies:

  • Python 3.10.16
  • Additional Python packages can be installed by running:
poetry install

Data

We consider the following public datasets:

  • Synthetic, for validating the proposed JL-GloVe algorithm
  • ImmuneCODE for training the publicly available JL-GloVe TCR embeddings
  • Emerson for evaluating the trained public TCR embeddings

Model Training

  • To train JL-GloVe embeddings using synthetic data, run:
jl-glove generate && jl-glove train

Metrics and Visualizations

  • We provide the 535,186 JL-GloVe TCR embeddings derived from the 3,991 ImmuneCODE repertoires here:

Direct intended uses

JL-GloVe is shared for research purposes only, namely benchmarking and inference on downstream tasks. It is not meant to be used in clinical practice. JL-GloVe has not been extensively tested for its capabilities and properties, including its accuracy and reliability in application settings, fairness across different demographics and uses, and security and privacy.

Out-of-scope uses

This is a research model which should not be used in any real clinical or production scenario.

Risks and limitations

JL-GloVe TCR embeddings reflect the co-occurrence statistics of the data used for training.
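
To make the notion of co-occurrence statistics concrete, the following is a minimal, illustrative sketch and not the JL-GloVe implementation. It assumes a toy setting in which each repertoire is a set of TCR labels, counts how many repertoires contain each pair of TCRs, and then applies a Gaussian random projection to the rows of the count matrix. The function names, the toy labels (`tcr_a`, `tcr_b`, ...), and the projection size `k` are all hypothetical and chosen only for illustration; the GloVe training step is omitted entirely.

```python
import math
import random
from itertools import combinations


def cooccurrence_counts(repertoires):
    """Count, for each pair of TCRs, how many repertoires contain both.

    Diagonal entries hold the number of repertoires containing that TCR.
    """
    vocab = sorted({tcr for rep in repertoires for tcr in rep})
    index = {tcr: i for i, tcr in enumerate(vocab)}
    n = len(vocab)
    counts = [[0] * n for _ in range(n)]
    for rep in repertoires:
        present = sorted({index[tcr] for tcr in rep})
        for i in present:
            counts[i][i] += 1
        for i, j in combinations(present, 2):
            counts[i][j] += 1
            counts[j][i] += 1
    return vocab, counts


def random_projection(rows, k, seed=0):
    """Project each row to k dimensions with a Gaussian matrix scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    d = len(rows[0])
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [
        [sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)]
        for row in rows
    ]


# Toy repertoires with hypothetical TCR labels (not real sequences).
repertoires = [
    {"tcr_a", "tcr_b", "tcr_c"},
    {"tcr_a", "tcr_b"},
    {"tcr_c", "tcr_d"},
]
vocab, counts = cooccurrence_counts(repertoires)
embeddings = random_projection(counts, k=2)
```

In this toy example, `tcr_a` and `tcr_b` appear together in two repertoires, so their rows in the count matrix are similar. TCRs that are rare or absent in the training repertoires contribute little or nothing to such counts, which is one way to read the limitation stated above.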

License and Usage Notices

The data, code, and model checkpoints described in this repository are provided for research use only. They are not intended for use in clinical decision-making or for any other clinical use, and the performance of the model for clinical use has not been established. You bear sole responsibility for any use of these data, code, and model checkpoints, including incorporation into any product intended for clinical use.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
