
Slow encoding #1

Closed
Akazhiel opened this issue Oct 13, 2021 · 6 comments

Comments

@Akazhiel
Contributor

Greetings!

Great tool to help predict TCR-pMHC bindings! However, is there any way to speed up the encoding step? Since the aim of this tool is to predict how well your TCR repertoire binds to the predicted pMHCs, the encoding is far slower than I'd expect. Given that you'd pair each TCR with the whole list of pMHCs to test for binding, this generates files of millions of lines. I'm currently running it on a file with 2M lines; it's been almost 3 days of running time and the encoding is not even close to done. Maybe it's not expected to use all the possible combinations as input, but just some of them? In that case, how would you select them?

Best regards,

Jonatan

@tianshilu
Owner

Hi @Akazhiel ,

Thanks for your interest!

One selection step you can do before encoding is to run all pMHCs through netMHCpan and keep only the pMHCs with a satisfactory rank (e.g. ≤ 2%). Then feed those pMHCs, paired with the TCRs, into pMTnet. We are also working on computationally speeding up the encoding process.
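For illustration, the pre-filtering step could look like this with pandas. The column names (`peptide`, `hla`, `rank`) and the tiny inline table are assumptions for the sketch, not netMHCpan's actual output format; adapt them to whatever your netMHCpan version produces.

```python
import pandas as pd

# Hypothetical netMHCpan-style output: one row per peptide-MHC pair.
# Column names are assumptions; adjust to your netMHCpan output format.
pmhc = pd.DataFrame({
    "peptide": ["SIINFEKL", "GILGFVFTL", "AAAWYLWEV", "KLGGALQAK"],
    "hla":     ["HLA-A*02:01"] * 4,
    "rank":    [0.5, 0.1, 3.2, 1.8],  # %Rank: lower means stronger predicted binding
})

# Keep only pMHCs with a satisfactory rank (e.g. <= 2%) before pairing with TCRs.
binders = pmhc[pmhc["rank"] <= 2.0].reset_index(drop=True)
print(len(binders))  # 3 of the 4 pairs survive the filter
```

Pairing only these filtered pMHCs with the TCR repertoire shrinks the input file before the expensive encoding step runs.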

Best,
Tianshi

@Akazhiel
Contributor Author

Hello @tianshilu ,

Yes, we do run the pMHCs through an algorithm (different from netMHCpan) and filter them by the affinity percentile. My question was more about how (if possible) to reduce the number of candidate TCRs, since you'd want to screen each TCR against all the pMHCs.

Cheers,

Jonatan

@tianshilu
Owner

Hi @Akazhiel ,

Sorry, we don't have a pre-selection step for TCRs. We are working on speeding up the encoding and prediction. Thanks very much for your feedback!

Tianshi

@Akazhiel
Contributor Author

Hi @tianshilu

That's totally understandable; indeed, subsetting the TCRs might be a really hard feat to achieve. I've been tinkering with the code and sped up the encoding steps that take place before the autoencoder, since my knowledge of machine learning is pretty limited and I wouldn't know how to speed up the autoencoder itself or the predictions.

If it's okay with you, I'll open a pull request so that you can review the code. In my testing, TCRmap, antigenMap and HLAMap together take less than one minute for a dataset of 2M rows; for large datasets the bottleneck is now, as mentioned, the prediction step, since it needs to loop through each value.
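The kind of speedup described above can be sketched as follows. This is illustrative only, not pMTnet's actual code: the idea is to replace a per-row Python loop with a one-time lookup table plus NumPy fancy indexing, so the whole batch is encoded in a few array operations.

```python
import numpy as np

# Illustrative sketch (not pMTnet's actual implementation): encode a batch
# of padded peptide sequences via a precomputed one-hot lookup table.
AA = "ACDEFGHIKLMNPQRSTVWY"
aa_index = {a: i for i, a in enumerate(AA)}
onehot = np.eye(len(AA), dtype=np.float32)  # 20x20 identity used as a lookup table

def encode_batch(seqs, max_len=10):
    """Encode sequences as an (n, max_len, 20) one-hot array in one shot."""
    idx = np.zeros((len(seqs), max_len), dtype=np.int64)
    mask = np.zeros((len(seqs), max_len), dtype=bool)
    for r, s in enumerate(seqs):
        codes = [aa_index[a] for a in s[:max_len]]
        idx[r, :len(codes)] = codes
        mask[r, :len(codes)] = True
    out = onehot[idx]   # fancy indexing encodes every position of every row at once
    out[~mask] = 0.0    # zero out the padding positions
    return out

enc = encode_batch(["SIINFEKL", "GILGFVFTL"])
print(enc.shape)  # (2, 10, 20)
```

The only remaining Python loop builds the integer index matrix; the encoding itself is a single vectorized lookup, which is what makes millions of rows tractable.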

Cheers,

Jonatan

@tianshilu
Owner

Hi @Akazhiel ,

Thanks for your effort on this. Please feel free to open a pull request!

Thanks!

Tianshi

@tianshilu
Owner

The encoding part has been updated for faster encoding speed.
