HLA Pseudosequence generation #8

Closed
Hugh-OBrien opened this issue Mar 23, 2022 · 3 comments

Comments

@Hugh-OBrien

Thanks for making the test code for the tool available.

I have a query regarding how the HLA pseudosequences are generated.

Here there are hard-coded indexes for generating the pseudosequences; however, my understanding was that an alignment is needed before using these indexes, since the HLAs in the fastas you've used are of varying length. After alignment, the indexes from the original netMHCpan paper describing the method wouldn't necessarily be correct for your HLA sequences.

If you look at the HLA analysis in netMHCpan (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000796) the pseudosequences have an expected pattern which I don't think holds using your method, indicating you're not using the same pseudosequences they are (at least at test time).

MHCflurry used what looks like a similar set of HLA fastas to yours, and after their alignment they start at index 31, not 7.
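To illustrate the concern, here is a minimal sketch of alignment-aware indexing. It is not the pMTnet, netMHCpan, or MHCflurry code. The function name `pseudosequence`, the toy sequences, and the positions are all hypothetical, and it assumes a pairwise alignment against a reference has already been computed by some other tool.

```python
def pseudosequence(aligned_ref, aligned_query, ref_positions, gap="-"):
    """Pick the query residues at the alignment columns of given reference positions.

    aligned_ref / aligned_query: two rows of the same alignment (equal length).
    ref_positions: 0-based positions in the UNGAPPED reference sequence.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    # Map each ungapped reference position to its alignment column.
    col_of_ref_pos = {}
    ref_pos = 0
    for col, residue in enumerate(aligned_ref):
        if residue != gap:
            col_of_ref_pos[ref_pos] = col
            ref_pos += 1
    return "".join(aligned_query[col_of_ref_pos[p]] for p in ref_positions)


# Toy data (NOT real HLA sequences): the query carries a 3-residue leader
# that the reference lacks, so raw indexes into the query are shifted by 3.
aligned_ref = "---ACDEFGHIK"
aligned_query = "MPQACDEYGHLK"
positions = [0, 4, 7]  # hypothetical contact positions in the reference

# Alignment-aware selection versus indexing the raw query directly.
aligned_pick = pseudosequence(aligned_ref, aligned_query, positions)
naive_pick = "".join(aligned_query.replace("-", "")[p] for p in positions)
print(aligned_pick)  # AYL
print(naive_pick)    # MCY
```

In this toy case the two selections differ in two of three residues, which is the kind of mismatch I suspect happens when fixed indexes are applied to unaligned sequences of varying length.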

Did you use a different method for training? If not it could be possible that the network is mainly performing an accurate match between peptide-TCR. The HLAs are still being encoded, but not in a way which preserves the likely contact points.

Apologies if I've missed part of the implementation which addresses this!

@tianshilu
Owner

tianshilu commented Mar 25, 2022

Hi @Hugh-OBrien

Thank you for carefully looking into the code and pointing out the problem! After reviewing the pseudo-sequence method in both netMHCpan and MHCflurry, I think you are right that we need to align the HLA sequences before taking pseudo-sequences. Even though our pseudo-sequences are not the same as netMHCpan's, the performance is still good compared to netMHCpan. I will add a comment about this caveat in the code. We are also working on pMTnet version 2.0 and will take this caveat into consideration. Thank you again for looking into this and letting us know!

Tianshi

@Hugh-OBrien
Author

Thank you for the reply. Yes, in my quick retraining test I found there was a modest performance hit on the MHC-peptide binding affinity; however, as this is only the encoder branch, it may not end up affecting the performance of the overall model in a significant way. It would be good to add a caveat in a comment, as it may increase the performance of future work.

@tianshilu
Owner

I totally agree with you. Thanks again for your input!
