HLA Pseudosequence generation #8
Hi @Hugh-OBrien, thank you for carefully looking into the code and pointing out the problem! After reviewing the pseudo-sequence method in both netMHCpan and MHCflurry, I think you are right that we need to align the HLA sequences before extracting pseudo-sequences. Even though our pseudo-sequences are not the same as netMHCpan's, the performance is still good compared to netMHCpan. I will add a comment about this caveat in the code. We are also working on pMTnet version 2.0 and will take this caveat into consideration. Thank you again for looking into this and letting us know! Tianshi
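A minimal sketch of the "align first, then extract" step follows. It assumes a single reference sequence in whose numbering the pseudo-sequence positions are defined, and it uses a plain Needleman-Wunsch global alignment with toy match/mismatch/gap scores. These scores are hypothetical choices; a substitution matrix such as BLOSUM62 or a multiple sequence alignment would be more usual. The function names and toy sequences are illustrative only and are not taken from pMTnet, netMHCpan or MHCflurry.

```python
def global_align(ref: str, query: str, match: int = 2, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(ref), len(query)

    def pair(a: str, b: str) -> int:
        return match if a == b else mismatch

    # score[i][j] = best score for aligning ref[:i] with query[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(ref[i - 1], query[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_ref, out_query = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(ref[i - 1], query[j - 1]):
            out_ref.append(ref[i - 1])
            out_query.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_query.append("-")
            i -= 1
        else:
            out_ref.append("-")
            out_query.append(query[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_query))


def pseudo_sequence(ref: str, query: str, positions: list[int]) -> str:
    """Residues of `query` aligned to 1-based `positions` of `ref` ('X' where query has a gap)."""
    aligned_ref, aligned_query = global_align(ref, query)
    residue_at = {}
    ref_number = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r != "-":
            ref_number += 1
            residue_at[ref_number] = q if q != "-" else "X"
    return "".join(residue_at[p] for p in positions)


# Toy sequences (hypothetical, for illustration only).
ref = "GSHSMRYFYT"
query = "MAVGSHSMRYFYT"
print(pseudo_sequence(ref, query, [7, 9]))  # YY
```

In this toy case the query carries three extra leading residues, so indexing it directly at positions 7 and 9 would return `SR` rather than `YY`. Because plain global alignment also penalizes end gaps, a semi-global (free end-gap) variant may suit sequences whose leaders or tails differ in length.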
Thank you for the reply. Yes, in my quick retraining test I found a modest drop in performance on MHC-peptide binding affinity; however, as this affects only the encoder branch, it may not significantly affect the performance of the overall model. It would be good to add a caveat in a comment, as addressing it may improve the performance of future work.
I totally agree with you. Thanks again for your input!
Thanks for making the test code for the tool available.
I have a question regarding how the HLA pseudosequences are generated.
Here, there are hard-coded indexes for generating the pseudosequences; however, my understanding is that an alignment is needed before applying these indexes, since the HLA sequences in the FASTA files you've used vary in length. Without an alignment, the indexes from the original netMHCpan paper describing the method wouldn't necessarily point to the same residues in your HLA sequences.
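To illustrate the point, here is a sketch that assumes the sequences have already been aligned (for example, read from a multiple sequence alignment with `-` as the gap character) and that the pseudosequence positions are 1-based residue numbers in one reference sequence. The positions are first converted to alignment columns through the gapped reference and then read from each aligned sequence. The helper names and toy sequences are hypothetical.

```python
def ref_positions_to_columns(aligned_ref: str, positions: list[int]) -> list[int]:
    """Map 1-based residue numbers of the reference to 0-based alignment columns."""
    column_of = {}
    residue_number = 0
    for column, residue in enumerate(aligned_ref):
        if residue != "-":
            residue_number += 1
            column_of[residue_number] = column
    return [column_of[p] for p in positions]


def extract(aligned_seq: str, columns: list[int]) -> str:
    """Read the residues at the given columns; a gap becomes 'X'."""
    return "".join("X" if aligned_seq[c] == "-" else aligned_seq[c] for c in columns)


# Toy alignment (hypothetical): allele B has two extra leading residues and a deletion.
aligned_ref = "--GSHSMRYFYT"
aligned_b = "MAGSH-MRYFYT"
positions = [7, 9]

columns = ref_positions_to_columns(aligned_ref, positions)
print(columns)                      # [8, 10]
print(extract(aligned_b, columns))  # YY

# Applying the same indexes directly to the unaligned sequence picks other residues.
unaligned_b = aligned_b.replace("-", "")
print("".join(unaligned_b[p - 1] for p in positions))  # RF
```

The design choice here is to define positions once in reference numbering and let the alignment absorb length differences, instead of hard-coding indexes per sequence.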
If you look at the HLA analysis in netMHCpan (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000796), the pseudosequences follow an expected pattern, which I don't think holds for your method. This suggests you're not using the same pseudosequences they are (at least at test time).
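The thread does not spell out which pattern is expected, so the following is only a generic sanity check, not the analysis from the netMHCpan paper: for each pseudosequence position, compute the fraction of alleles sharing the most common residue. If the extracted positions are shifted away from the intended contact residues, positions expected to be conserved across alleles may look variable, and vice versa. The function name and toy pseudosequences are hypothetical.

```python
from collections import Counter


def column_conservation(pseudoseqs: list[str]) -> list[float]:
    """Fraction of sequences carrying the most common residue at each position."""
    length = len(pseudoseqs[0])
    if any(len(s) != length for s in pseudoseqs):
        raise ValueError("all pseudosequences must have the same length")
    result = []
    for i in range(length):
        counts = Counter(s[i] for s in pseudoseqs)
        result.append(counts.most_common(1)[0][1] / len(pseudoseqs))
    return result


# Toy pseudosequences (hypothetical), one per allele.
print(column_conservation(["YFA", "YFS", "YYA", "YFT"]))  # [1.0, 0.75, 0.5]
```

Comparing such a profile for your pseudosequences against one computed from published pseudosequences would show whether the two sets line up position by position.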
MHCflurry used what looks like a similar set of HLA FASTA files to yours, and after their alignment the indexes start at 31, not 7.
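If two sets of sequences differ only by a fixed-length N-terminal segment, the same positions are shifted by a constant. The difference between 7 and 31 is 24, which would be consistent with, for example, sequences that still carry an extra 24-residue N-terminal segment such as a leader peptide; that reading is an assumption, not something established in this thread. The sketch below estimates such an offset by locating a short prefix of the mature reference inside the longer sequence. It is only valid when there are no insertions or deletions before the positions of interest (otherwise a proper alignment is needed), and `str.find` returns the first match only. Names and sequences are hypothetical.

```python
def find_offset(full_seq: str, mature_prefix: str) -> int:
    """0-based index where `mature_prefix` first starts in `full_seq`."""
    offset = full_seq.find(mature_prefix)
    if offset < 0:
        raise ValueError("prefix not found; use a proper alignment instead")
    return offset


def mature_to_full_indexes(positions: list[int], offset: int) -> list[int]:
    """Convert 1-based mature-protein numbers to 0-based indexes into the full sequence."""
    return [offset + p - 1 for p in positions]


# Toy full-length sequence: 24 placeholder residues ('X') before a hypothetical mature start.
full_seq = "X" * 24 + "GSHSMRYFYT"
offset = find_offset(full_seq, "GSHSMR")
indexes = mature_to_full_indexes([7, 9], offset)
print(offset, indexes)                        # 24 [30, 32]
print("".join(full_seq[i] for i in indexes))  # YY
```

In this toy case, mature position 7 becomes 0-based index 30, that is, the 31st residue of the full sequence.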
Did you use a different method for training? If not, it is possible that the network is mainly learning an accurate peptide-TCR match. The HLAs are still being encoded, but not in a way that preserves the likely contact points.
Apologies if I've missed part of the implementation which addresses this!