AbLSTM.py returns scores in different order #4

prihoda · 2020-12-17T09:38:22Z

Hi all, I noticed a very important issue - the ablstm.py script returns scores in a different order than the order of the input sequences.

I tried processing a diverse set of sequences in one file (human, humanized and murine therapeutic sequences) and got scores that were not consistent with your published distributions:

At first I thought it was an overfitting issue, but then I found that I am getting a different result when processing just the first few sequences. When I processed the sequences one by one, the scores now fall into the expected ranges:

xf3227 · 2020-12-17T16:58:16Z

Hi prihoda. Thank you for the comment! I have encountered similar issues caused by inconsistent random number generation across different environments. Since we were also processing the sequences one by one during testing, we failed to notice this bug. I will try to fix it and get back to you soon.

xf3227 · 2020-12-17T19:37:42Z

In the eval() function, I accidentally made the dataloader shuffle the sequences. Thank you for pointing this out. We would also greatly appreciate it if you could test the code again to see whether the issue has been resolved.
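
For anyone hitting the same symptom, here is a minimal sketch of the failure mode, not the repository's actual code. It uses only the standard library; `iter_batches`, `toy_score`, and both scoring functions are hypothetical stand-ins for the real dataloader and model. It shows that appending scores in iteration order only matches the input order when shuffling is off, while carrying each sequence's original index makes the output order independent of batching. In PyTorch, the corresponding switch is `shuffle=False` on `torch.utils.data.DataLoader`.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def iter_batches(items: Sequence[str], batch_size: int, shuffle: bool,
                 seed: int = 0) -> Iterator[List[Tuple[int, str]]]:
    """Yield batches of (original_index, item), optionally in shuffled order."""
    order = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [(i, items[i]) for i in order[start:start + batch_size]]


def toy_score(seq: str) -> float:
    # Hypothetical stand-in for the model: the score is just the length.
    return float(len(seq))


def score_in_iteration_order(seqs: Sequence[str], batch_size: int,
                             shuffle: bool) -> List[float]:
    """Fragile: output order follows the loader, not the input file."""
    scores = []
    for batch in iter_batches(seqs, batch_size, shuffle):
        scores.extend(toy_score(seq) for _, seq in batch)
    return scores


def score_by_index(seqs: Sequence[str], batch_size: int,
                   shuffle: bool) -> List[float]:
    """Robust: each score is written back to its original position."""
    scores = [0.0] * len(seqs)
    for batch in iter_batches(seqs, batch_size, shuffle):
        for idx, seq in batch:
            scores[idx] = toy_score(seq)
    return scores
```

With shuffling disabled, both functions return scores in input order; with shuffling enabled, only `score_by_index` is guaranteed to.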

prihoda · 2020-12-17T21:06:50Z

Hi @xf3227, thanks for the quick fix. I am now getting the same result when running one by one as when running the whole file 👍 You can close this issue.

Btw, a side note on usability: I think users might find it useful to have some instructions on producing the AHo-aligned input files. You could even include a script, since it takes a few steps (running ANARCI to produce an aligned CSV, then converting that CSV to txt while making sure that the same positions as in your input files are present).

ANARCI will only include positions that exist within your processed set of sequences, so here's what I got from the ANARCI CSV on my set of sequences:

QVQLKES-GPGLVAPSQSLSITCTVSG-FSVTN-----YGVHWVRQPPGKGLEWLGVIWA----GGITNYNSAFMSRLSISKDNSKSQVFLKMNSLQIDDTAMYYCASRGGHY-------------------GYALDYWGQGTSVTVSS

I then needed to insert the gaps at the correct positions:

-QVQLKES-GPGLVAPSQSLSITCTVSG-FSVTN-----YGVHWVRQPPGKGLEWLGVIWA----GGITNYNSAFMSRLSISKDNSKSQVFLKMNSLQIDDTAMYYCASRGGHY-------------------GYALDYWGQGTSVTVSS
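
The gap insertion can be sketched roughly as follows. This is a minimal illustration under assumptions, not a tested converter: it assumes each CSV row has already been reduced to a mapping from position label to residue, and that you have the full ordered list of position labels expected by the model's input files. The names `pad_to_reference`, `residues_by_position`, and `reference_positions` are hypothetical, and the labels in the usage example are toy values.

```python
from typing import Dict, List

GAP = "-"


def pad_to_reference(residues_by_position: Dict[str, str],
                     reference_positions: List[str]) -> str:
    """Build an aligned string with one character per reference position.

    Positions missing from the row (or present but empty) become gaps.
    """
    unknown = set(residues_by_position) - set(reference_positions)
    if unknown:
        # A residue at a position the reference lacks would be silently lost.
        raise ValueError(f"positions not in reference: {sorted(unknown)}")
    return "".join(
        residues_by_position.get(pos) or GAP for pos in reference_positions
    )
```

For example, with `reference_positions = ["1", "2", "3", "4", "5"]` and a row that lacks position `"1"`, `pad_to_reference({"2": "Q", "3": "V", "4": "-", "5": "L"}, reference_positions)` returns `"-QV-L"`.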

xf3227 · 2020-12-18T22:13:00Z

Hi @prihoda, thank you for locating this bug. I just closed this thread.

As for sequence alignment, sorry, I was not the one handling this part, and I am not experienced with sequence alignment tools either. Two possible solutions:

Simply remove the gaps from all sequences. The model can run in two modes, one of which handles unaligned sequences, although performance may be somewhat poorer.
Create your own training dataset, aligned in whatever format you need.
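
For the first option, removing gaps is a one-liner per sequence. A minimal sketch, assuming gaps are written as `-` (or `.`, which some alignment tools use); `strip_gaps` is a hypothetical helper name, and how the unaligned mode is selected in the script is not covered here:

```python
def strip_gaps(seq: str, gap_chars: str = "-.") -> str:
    """Return the sequence with all gap characters removed."""
    return "".join(c for c in seq if c not in gap_chars)


def strip_gaps_from_lines(lines):
    """Strip gaps from an iterable of sequence lines, skipping blank lines."""
    return [strip_gaps(line.strip()) for line in lines if line.strip()]
```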

In any case, thank you for bringing this up! I hope this repo helps with your research and projects!

xf3227 closed this as completed Dec 18, 2020

tanggis mentioned this issue Feb 16, 2021

Classification model in figure 2b #7

Open
