
Slow encoding #1

Closed
Akazhiel opened this issue Oct 13, 2021 · 6 comments

Comments

@Akazhiel
Contributor

Greetings!

Great tool to help predict TCR-pMHC bindings! However, is there any way to speed up the encoding step? Since the aim of this tool is to predict how well your TCR repertoire binds to the predicted pMHCs, the encoding is far slower than I'd expect. Given that you'd pair each TCR with the whole list of pMHCs to test for binding, this generates files of millions of lines. I'm currently running it on a file with 2M lines; it's been almost 3 days of running time and the encoding is not even close to done. Maybe it's not expected to use all the possible combinations as input, but just some of them? In that case, how would you select them?

Best regards,

Jonatan

@tianshilu
Owner

Hi @Akazhiel ,

Thanks for your interest!

One selection step you can do before encoding is to run all pMHCs through netMHCpan and keep only the pMHCs with a satisfactory rank (e.g. ≤ 2%). Then feed those pMHCs, paired with the TCRs, into pMTnet. We are also working on computationally speeding up the encoding process.
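For illustration, the pre-filtering step could look like this with pandas. The column names (`peptide`, `hla`, `rank`) and the tiny inline table are assumptions for the sketch, not netMHCpan's actual output format; adapt them to whatever your netMHCpan version produces.

```python
import pandas as pd

# Hypothetical netMHCpan-style output: one row per peptide-MHC pair.
# Column names are assumptions; adjust to your netMHCpan output format.
pmhc = pd.DataFrame({
    "peptide": ["SIINFEKL", "GILGFVFTL", "AAAWYLWEV", "KLGGALQAK"],
    "hla":     ["HLA-A*02:01"] * 4,
    "rank":    [0.5, 0.1, 3.2, 1.8],  # %Rank: lower means stronger predicted binding
})

# Keep only pMHCs with a satisfactory rank (e.g. <= 2%) before pairing with TCRs.
binders = pmhc[pmhc["rank"] <= 2.0].reset_index(drop=True)
print(len(binders))  # 3 of the 4 pairs survive the filter
```

Pairing only these filtered pMHCs with the TCR repertoire shrinks the input file before the expensive encoding step runs.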

Best,
Tianshi

@Akazhiel
Contributor Author

Hello @tianshilu ,

Yes, we do run the pMHCs through an algorithm (different from netMHCpan) and filter them by the affinity percentile. My question was more about how (if possible) to reduce the number of candidate TCRs, since you'd want to screen each TCR against all the pMHCs.

Cheers,

Jonatan

@tianshilu
Owner

Hi @Akazhiel ,

Sorry, we don't have a pre-selection step for TCRs. We are working on speeding up the encoding and prediction. Thanks very much for your feedback!

Tianshi

@Akazhiel
Contributor Author

Hi @tianshilu

That's totally understandable; indeed, subsetting the TCRs might be a really hard feat to achieve. I've been tinkering with the code and sped up the encoding steps that take place before the autoencoder, since my knowledge of machine learning is pretty limited and I wouldn't know how to speed up the autoencoder itself or the predictions.

If it's okay with you, I'll open a pull request so that you can review the code. In my testing, TCRmap, antigenMap and HLAMap together take less than one minute for a dataset of 2M rows; for large datasets the bottleneck is now, as mentioned, the prediction step, since it needs to loop through each value.
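The kind of speedup described above can be sketched as follows. This is illustrative only, not pMTnet's actual code: the idea is to replace a per-row Python loop with a one-time lookup table plus NumPy fancy indexing, so the whole batch is encoded in a few array operations.

```python
import numpy as np

# Illustrative sketch (not pMTnet's actual implementation): encode a batch
# of padded peptide sequences via a precomputed one-hot lookup table.
AA = "ACDEFGHIKLMNPQRSTVWY"
aa_index = {a: i for i, a in enumerate(AA)}
onehot = np.eye(len(AA), dtype=np.float32)  # 20x20 identity used as a lookup table

def encode_batch(seqs, max_len=10):
    """Encode sequences as an (n, max_len, 20) one-hot array in one shot."""
    idx = np.zeros((len(seqs), max_len), dtype=np.int64)
    mask = np.zeros((len(seqs), max_len), dtype=bool)
    for r, s in enumerate(seqs):
        codes = [aa_index[a] for a in s[:max_len]]
        idx[r, :len(codes)] = codes
        mask[r, :len(codes)] = True
    out = onehot[idx]   # fancy indexing encodes every position of every row at once
    out[~mask] = 0.0    # zero out the padding positions
    return out

enc = encode_batch(["SIINFEKL", "GILGFVFTL"])
print(enc.shape)  # (2, 10, 20)
```

The only remaining Python loop builds the integer index matrix; the encoding itself is a single vectorized lookup, which is what makes millions of rows tractable.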

Cheers,

Jonatan

@tianshilu
Owner

Hi @Akazhiel ,

Thanks for your effort on this. Please feel free to open a pull request!

Thanks!

Tianshi

@tianshilu
Owner

The encoding part has been updated for faster encoding speed.
