Error in DIAMOND_analysis_counter.py #74

Open
McKSal opened this issue Aug 4, 2022 · 1 comment

McKSal commented Aug 4, 2022

Hello, I am having issues with the DIAMOND_analysis_counter.py script.
I am getting an error similar to the one in this previous post: #57

command:
python Diamond_analysis_counter2.py -I BMRNA2_other_nr.daa_viewable -D /media/scratch/2022_diamond_nr_db/nr -O BMRNA2_other_nr_organism

error:
Now reading through the m8 results infile.

Analysis of BMRNA2_other_nr.daa_viewable complete.
Number of total lines: 426637
Number of unique sequences: 422738
Time elapsed: 0.5995767116546631 seconds.

Starting database analysis now.
Traceback (most recent call last):
File "Diamond_analysis_counter2.py", line 151, in
if split_db_org[1] == "sp.":
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "Diamond_analysis_counter2.py", line 157, in
db_org = split_db_org[1]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "Diamond_analysis_counter2.py", line 162, in
db_org = split_db_org[1] + " " + split_db_org[2]
IndexError: list index out of range

From the post linked above:
"the parsing script doesn't do well when there are multiple instances of square brackets in the line."

When I go in and look at line 151, all I see is a string of amino acids:
TREFEAFEAGRRYANTAYLVDLQEMQGDNLLRELVRITAQMNWQLNDLKEQIRQGNVISGQQLALTARQYYEKQLGSLEK

@transcript
Owner

Hi McKSal,

Sure, let's see if I can help. I might need to ask a couple of questions and have you try a couple of things.

First, don't worry about line 151: that's the line in the Python script that's throwing the error, not a line in the input file.
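The three stacked tracebacks (script lines 151, 157, and 162) come from Python's implicit exception chaining: as the messages say, each IndexError was raised while the previous one was still being handled. A minimal sketch of that pattern follows; `first_two_words` is a hypothetical helper, not the script's actual code.

```python
def first_two_words(words):
    """Hypothetical fallback chain, not the actual script's code."""
    try:
        return words[0] + " " + words[1]        # first attempt
    except IndexError:
        try:
            return words[1]                      # fallback 1 indexes past the end too
        except IndexError:
            return words[1] + " " + words[2]     # fallback 2 fails as well


try:
    first_two_words(["Bacterium"])  # a one-word organism string
except IndexError as err:
    # Each IndexError raised inside an except block keeps the one being handled
    # as __context__; that link is what prints "During handling of the above
    # exception, another exception occurred".
    print(type(err.__context__).__name__)               # IndexError
    print(type(err.__context__.__context__).__name__)   # IndexError
```

Checking `len(words)` before indexing, rather than retrying inside `except`, avoids this kind of cascade.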

The error occurs while the script is reading in the nr database. Some line in there is giving it trouble because the script can't split that line into the ID, organism, and functional names.
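The note quoted from #57 (multiple square brackets confuse the parser) suggests one way to make that split more tolerant: take the organism from the bracket group that closes at the very end of the line, and leave any earlier brackets in the functional name. This is a minimal sketch. `parse_nr_header` is a hypothetical helper, not code from DIAMOND_analysis_counter.py, and it assumes headers shaped like `>ID function text [Organism]`. The sample ID below is made up.

```python
def parse_nr_header(header):
    """Split an nr-style FASTA header into (seq_id, function, organism).

    Hypothetical helper, not taken from DIAMOND_analysis_counter.py.
    Assumes the layout ">ID function text [Organism]" and takes the organism
    from the bracket group that closes at the end of the line, so brackets
    earlier in the function text are ignored. Returns None when the line
    does not fit that layout.
    """
    text = header.strip().lstrip(">")
    parts = text.split(None, 1)        # the ID ends at the first whitespace
    if len(parts) < 2 or not parts[1].endswith("]"):
        return None
    seq_id, rest = parts
    depth = 0
    # Walk backwards from the final "]" until its matching "[" is found,
    # so an organism name that itself contains brackets stays intact.
    for i in range(len(rest) - 1, -1, -1):
        if rest[i] == "]":
            depth += 1
        elif rest[i] == "[":
            depth -= 1
            if depth == 0:
                return seq_id, rest[:i].strip(), rest[i + 1:-1].strip()
    return None                        # brackets never balanced


# Made-up ID; the function name contains its own bracket pair.
print(parse_nr_header(">WP_000000001.1 3-oxoacyl-[acyl-carrier-protein] reductase [Escherichia coli]"))
# ('WP_000000001.1', '3-oxoacyl-[acyl-carrier-protein] reductase', 'Escherichia coli')
```

Lines that don't end in a bracket group come back as `None`, so a caller can count or log them instead of crashing. Real nr files may contain header layouts this sketch doesn't cover.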

By default, the script doesn't print which line of the database it was on when it errors out. That would be a good item for me to add, since it would provide more debugging information. Do you feel comfortable making a couple of small edits to the Python script and then running it again?

If so, you could replace lines 161 and 162 in the DIAMOND_analysis_counter.py script with the following:

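if db_org[0].isdigit():
	split_db_org = db_org.split()
	try:
		db_org = split_db_org[1] + " " + split_db_org[2]
	except IndexError:
		# Show the database line that failed to parse and its position
		print(line)
		print(str(db_line_counter))
		raise  # re-raise so the run still stops at this point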

When you rerun the script, it will still fail in the same place, but this time it will print out the offending line from the database as well as its line number in the database.
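If editing the script is inconvenient, a standalone scan can list several offending lines instead of stopping at the first one. This is a minimal sketch. `find_bad_lines` and `organism_two_words` are hypothetical names, and `organism_two_words` only approximates the script's parsing by taking the text inside the last `[...]` and joining its first two words.

```python
def organism_two_words(line):
    # Hypothetical stand-in for the script's parsing: take the text inside the
    # last [...] and join its first two words. Raises IndexError if there is
    # no "[" or fewer than two words.
    organism = line.rsplit("[", 1)[1].split("]", 1)[0]
    words = organism.split()
    return words[0] + " " + words[1]


def find_bad_lines(lines, parse, limit=10):
    """Return up to `limit` (line_number, line) pairs where parse(line) raises IndexError."""
    bad = []
    for line_number, line in enumerate(lines, start=1):  # 1-based, like a text editor
        try:
            parse(line)
        except IndexError:
            bad.append((line_number, line.rstrip("\n")))
            if len(bad) >= limit:
                break
    return bad


sample = [
    ">A1 protein one [Escherichia coli]\n",
    ">A2 protein two [Bacterium]\n",
    ">A3 protein three\n",
]
print(find_bad_lines(sample, organism_two_words))
# [(2, '>A2 protein two [Bacterium]'), (3, '>A3 protein three')]
```

Passing an open file handle (for example, for `/media/scratch/2022_diamond_nr_db/nr`, assuming it is plain text as the script reads it) works the same way, since `enumerate` consumes it line by line without loading the whole file.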

That should give me more information so I can recommend a solution.

-Sam
