Labelled training data used in pMTnet #6

Open
madnessfish opened this issue Mar 15, 2022 · 5 comments
Comments

@madnessfish

Thank you for a great tool! I am still pretty new to this field.

I would like to learn more about how pMTnet was trained, but I could not find the training data in the repository and am not sure whether I missed it. Could you please provide the training data used in pMTnet with positive and negative labels (e.g. positive/TCR_output.csv, negative/TCR_output.csv, training_positive.csv)? Thank you so much for all your efforts!

@tianshilu
Owner

Hi,

Thanks for your interest. You can find the positive and negative training data through the links below:
https://drive.google.com/file/d/1_pf6xIK2dRql_zZ5A1BzoGvWIYVvEaBp/view?usp=sharing
https://drive.google.com/file/d/1KLlH-CBS4ep6UAEeh4Zv9Eghk1hZUWj7/view?usp=sharing

Thanks!

@ddd9898

ddd9898 commented Mar 30, 2022

Hi @tianshilu
I have a similar request. I'd like to run some comparisons against the method you proposed. Could you provide the testing data used in pMTnet with both positive and negative labels? Thank you!

@tianshilu
Owner

Hi @Miles-DDD,

Please find the testing data with labels through the links below:
https://drive.google.com/file/d/1iddT16YEbEh5LYULokEMoey53RPiVsXt/view?usp=sharing
https://github.com/tianshilu/pMTnet/blob/master/test/input/test_input.csv

Thanks!

@madnessfish
Author

Hi @tianshilu
Thank you for providing all the information!

I am curious how the negative sets were generated (is there a script?), because the following command shows 1912 entries that appear in both the positive and negative training sets. I am not sure whether I made a mistake here.
comm -12 <(sort -u neg_training.csv ) <(sort -u pos_training.csv ) | wc -l
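
For readers who prefer Python, the same kind of overlap count can be sketched as below. This is only an illustration, not a script from the pMTnet repository: the column names (CDR3, Antigen, HLA), the toy rows, and the presence of a header line are assumptions. One difference from the `comm` command is worth noting: `comm` compares whole lines, so if both files start with an identical header line, that line is counted as an overlapping entry as well. The sketch skips the header by default.

```python
import csv
import io


def read_rows(handle, skip_header=True):
    """Return the set of distinct non-empty rows as tuples of stripped fields."""
    reader = csv.reader(handle)
    if skip_header:
        next(reader, None)  # drop the first line; assumes the file has a header
    return {tuple(field.strip() for field in row) for row in reader if row}


def count_overlap(pos_handle, neg_handle, skip_header=True):
    """Count distinct rows present in both the positive and the negative file."""
    pos_rows = read_rows(pos_handle, skip_header)
    neg_rows = read_rows(neg_handle, skip_header)
    return len(pos_rows & neg_rows)


# Made-up data for illustration only (hypothetical columns and sequences).
POS_CSV = "CDR3,Antigen,HLA\nCASSA,PEP1,A*02:01\nCASSB,PEP2,A*02:01\n"
NEG_CSV = "CDR3,Antigen,HLA\nCASSB,PEP2,A*02:01\nCASSC,PEP1,A*02:01\n"

overlap = count_overlap(io.StringIO(POS_CSV), io.StringIO(NEG_CSV))
print(overlap)
```

To run it on the real files, replace the `io.StringIO` objects with handles from `open("pos_training.csv", newline="")` and `open("neg_training.csv", newline="")`.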

Also, I would like to know how these labeled training/test data relate to the training_data.csv and testing_data.csv under the pMTNet/data repository.

@tianshilu
Owner

Hi @madnessfish,

Thanks for your interest in our study! For each positive TCR-pMHC pair, 10 negative pairs are generated by randomly sampling 10 TCRs from the other TCRs. As a result, a very small proportion of the negative pairs overlap with the positive pairs by chance. We didn't remove the overlapping pairs from the negative dataset because they help reduce overfitting.

The negative datasets are generated from training_data.csv and testing_data.csv as described above. Hope this helps!
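
The sampling scheme described above can be sketched roughly as follows. This is not the authors' script (none is linked in this thread); it is a minimal illustration under several assumptions: each positive pair is a hypothetical `(tcr, antigen, hla)` tuple, the sampled TCR is paired with the pMHC of the positive pair, TCRs are drawn without replacement, and the function name `sample_negatives` and the toy sequences are made up. It also shows how a negative pair can coincide with a positive one when the same pMHC has more than one known binding TCR.

```python
import random


def sample_negatives(positive_pairs, n_neg=10, seed=0):
    """For each positive (tcr, antigen, hla) pair, pair its pMHC with up to
    n_neg TCRs drawn at random from the other TCRs in the dataset."""
    rng = random.Random(seed)
    all_tcrs = sorted({tcr for tcr, _, _ in positive_pairs})
    negatives = []
    for tcr, antigen, hla in positive_pairs:
        # "Other" TCRs: every distinct TCR except the one in this positive pair.
        others = [t for t in all_tcrs if t != tcr]
        k = min(n_neg, len(others))  # cannot draw more TCRs than exist
        for neg_tcr in rng.sample(others, k):  # assumption: no replacement
            negatives.append((neg_tcr, antigen, hla))
    return negatives


# Toy data: PEP1 has two known binders, so mismatching can recreate a positive.
positives = [
    ("CASSA", "PEP1", "A*02:01"),
    ("CASSB", "PEP1", "A*02:01"),
    ("CASSC", "PEP2", "A*02:01"),
]
negatives = sample_negatives(positives, n_neg=2)

# Pairs that appear in both sets, analogous to the overlap counted with comm.
overlapping = set(negatives) & set(positives)
print(len(negatives), sorted(overlapping))
```

In this toy case every other TCR is drawn, so the overlap is guaranteed; in a large dataset with 10 draws per pair it would occur only occasionally, which is consistent with the small proportion mentioned above.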

Tianshi


3 participants