Error in DIAMOND_analysis_counter.py #57

Open
mweberr opened this issue Oct 16, 2020 · 3 comments
Labels
Python: Bug or fix related to the Python scripts.

Comments

@mweberr

mweberr commented Oct 16, 2020

Hi,
I have compiled a DIAMOND database from the current RefSeq database, but apparently the script DIAMOND_analysis_counter.py gets stuck at one line.

Do you have any idea whether I need to preprocess the database before starting DIAMOND_analysis_counter.py?

Traceback (most recent call last):
  File "samsa2/python_scripts/DIAMOND_analysis_counter.py", line 151, in <module>
    if split_db_org[1] == "sp.":
IndexError: list index out of range

line 162, in <module>
    db_org = split_db_org[1] + " " + split_db_org[2]
IndexError: list index out of range

Best, Michael

@transcript
Owner

Hey Michael,

Could you share the command you're running to call DIAMOND_analysis_counter.py? What are you specifying as inputs?

My guess is that something's funky with the database file you're supplying, and seeing the command may help a bit.

Best,
Sam

transcript added the "Python" label (Bug or fix related to the Python scripts.) on Oct 17, 2020
@mweberr
Author

mweberr commented Oct 20, 2020

Hi Sam,
I started debugging the run of DIAMOND_analysis_counter.py, and apparently it exits with an error on the following line:

>ADN03191.1 VP4, partial [Rotavirus pig/2B/IRL/2005/P[13]/[22]]

The split that extracts the db_org variable probably needs to be extended. I will first check whether other lines cause similar problems.
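
One way to check for other affected entries is to scan the FASTA headers for nested square brackets, as in the header above. This is only a rough sketch: the function names `has_nested_brackets` and `find_problem_headers` are made up for illustration, and it assumes that nested brackets are what trips up the split, which may not cover every failing line.

```python
def has_nested_brackets(header):
    """Return True if any '[' opens while another '[' is still open."""
    depth = 0
    for ch in header:
        if ch == "[":
            depth += 1
            if depth > 1:
                return True
        elif ch == "]":
            # Never go below zero on a stray closing bracket.
            depth = max(depth - 1, 0)
    return False


def find_problem_headers(lines):
    """Yield FASTA header lines (starting with '>') that contain nested brackets."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">") and has_nested_brackets(line):
            yield line


example = [
    ">ADN03191.1 VP4, partial [Rotavirus pig/2B/IRL/2005/P[13]/[22]]\n",
    "MASLIYRQLL\n",
    ">WP_000001.1 hypothetical protein [Escherichia coli]\n",
]
for header in find_problem_headers(example):
    print(header)
```

On a real database, the `example` list could be replaced by an open file handle, since the function only iterates over lines.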

@transcript
Owner

Ah, yes, the parsing script doesn't do well when there are multiple sets of square brackets in the line. I've noticed that most extra brackets show up in the function description rather than the organism name, so this section (lines 146-162) parses out the organism name by assuming that it is whatever sits in the last set of brackets.

The issue is actually with line 147, where it's selecting 22] as the organism name, as this is what's inside the last set of brackets.

If this is the only database entry where you hit this error, you could try running a command on your database to replace this line with one that uses parentheses instead of brackets. Otherwise, this may take some regex work that will be a bit tougher for me to work out. Did you find other lines causing issues?
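
As an alternative to regex, a bracket-depth scan from the right end of the header might handle organism names that themselves contain brackets. The sketch below is not the code from DIAMOND_analysis_counter.py; `extract_organism` is a hypothetical helper, and it assumes the organism is the last top-level bracket group in the header and that brackets are balanced.

```python
def extract_organism(header):
    """Return the text inside the last top-level [...] group, or None.

    Scans backwards from the final ']' and tracks nesting depth, so inner
    brackets such as P[13]/[22] stay part of the organism name.
    """
    end = header.rfind("]")
    if end == -1:
        return None
    depth = 0
    for i in range(end, -1, -1):
        ch = header[i]
        if ch == "]":
            depth += 1
        elif ch == "[":
            depth -= 1
            if depth == 0:
                # Found the '[' that matches the final ']'.
                return header[i + 1:end]
    # Unbalanced brackets: no matching '[' was found.
    return None


print(extract_organism(">ADN03191.1 VP4, partial [Rotavirus pig/2B/IRL/2005/P[13]/[22]]"))
# prints: Rotavirus pig/2B/IRL/2005/P[13]/[22]
print(extract_organism(">WP_000001.1 protein [bla] family [Escherichia coli]"))
# prints: Escherichia coli
```

Returning None for headers without a usable bracket group lets the caller skip or log such entries instead of hitting an IndexError.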
