usearch_global match potential match bug #298

Felipealbornoz · 2018-03-22T18:37:37Z

Hi, when I use -usearch_global to "blast" my OTUs against a custom database, it does not report the best hit. With --maxaccepts 1 and --id 0.97, a particular OTU matches SPECIES1 with 98% similarity. With --maxaccepts 3, however, the third hit is SPECIES2 with 99.5% similarity, yet SPECIES2 never gets selected as the match. I am using the following command:

vsearch -usearch_global OTUS.fasta --db db.fasta --id 0.97 --maxaccepts 1 --dbmatched dbmatched.fasta --notmatched notmatched.fasta --output_no_hits --blast6out otu.tax.csv

torognes · 2018-03-23T11:00:30Z

When you run vsearch with usearch_global, it performs a search using a heuristic algorithm. That means that it is not guaranteed to find the best match, but it usually finds a very good match.

The heuristic looks at the number of shared k-mers (8-mers) between the query and each database sequence, and starts with the database sequences that have the highest number of k-mers in common with the query. When you specify --maxaccepts 1, it stops at the first sequence found that satisfies the similarity threshold set with the --id option (e.g. 97%). If you set a higher --maxaccepts value (e.g. 3), it keeps going until it has found that many (i.e. 3) sequences satisfying the threshold, and reports them in order of decreasing similarity.

If the sequence with the highest number of shared k-mers is not the one with the highest alignment similarity, you will get a suboptimal result when using --maxaccepts 1. This is probably what happened in the example you provided.
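To make the k-mer ranking step concrete, here is a minimal Python sketch. It is only an illustration, not vsearch's actual code: the function names are hypothetical, and it assumes that "shared k-mers" means the number of distinct 8-mers present in both sequences, which may differ from how vsearch counts them internally.

```python
def kmers(seq, k=8):
    """Return the set of distinct k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(query, target, k=8):
    """Count distinct k-mers that occur in both query and target."""
    return len(kmers(query, k) & kmers(target, k))


def rank_by_shared_kmers(query, db, k=8):
    """Order database entries by shared k-mer count, highest first.

    db is a dict mapping a sequence name to its sequence string.
    Returns a list of (name, shared_count) tuples.
    """
    counts = {name: shared_kmers(query, seq, k) for name, seq in db.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

The order returned by `rank_by_shared_kmers` is only a proxy for alignment similarity. A database sequence can share slightly fewer 8-mers with the query and still align with higher identity, which is the situation described above.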

The option --maxrejects is also important as it indicates how many database sequences below the similarity threshold will be considered before the search is stopped. By default it is 32.

To get more accurate results you could use --maxaccepts 1000 --maxrejects 1000, but it will take more time.

You could also use --maxaccepts 0 --maxrejects 0, which will cause vsearch to consider all database sequences. It will take much longer as all the heuristics are bypassed.
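The effect of --maxaccepts and --maxrejects described above can be sketched with a toy model in Python. This is a simplification and not vsearch's implementation: `toy_search` is a hypothetical function, identities are given as precomputed fractions instead of being obtained by alignment, and the shared k-mer counts and the SPECIES3 entry are made up. Only the 98% and 99.5% similarities come from the report above.

```python
# Each candidate is (name, shared_kmers, identity).
# The k-mer counts and SPECIES3 are invented for illustration.
candidates = [
    ("SPECIES1", 120, 0.98),
    ("SPECIES3", 118, 0.972),
    ("SPECIES2", 115, 0.995),
]


def toy_search(candidates, min_id, maxaccepts=1, maxrejects=32):
    """Examine candidates in order of decreasing shared k-mer count.

    Stops after maxaccepts hits at or above min_id, or after maxrejects
    candidates below min_id. A value of 0 disables that limit.
    Returns accepted (name, identity) hits, best identity first.
    """
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    accepted = []
    rejects = 0
    for name, _shared, identity in ordered:
        if identity >= min_id:
            accepted.append((name, identity))
            if maxaccepts and len(accepted) >= maxaccepts:
                break
        else:
            rejects += 1
            if maxrejects and rejects >= maxrejects:
                break
    return sorted(accepted, key=lambda hit: hit[1], reverse=True)


print(toy_search(candidates, 0.97, maxaccepts=1))
# [('SPECIES1', 0.98)]
print(toy_search(candidates, 0.97, maxaccepts=3))
# [('SPECIES2', 0.995), ('SPECIES1', 0.98), ('SPECIES3', 0.972)]
print(toy_search(candidates, 0.97, maxaccepts=0, maxrejects=0))
# [('SPECIES2', 0.995), ('SPECIES1', 0.98), ('SPECIES3', 0.972)]
```

With a limit of one accept, the search stops at SPECIES1 because it comes first in the k-mer ordering and already passes the threshold. SPECIES2 is only examined, and then ranked first, once more accepts are allowed.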

I hope this clarifies how vsearch and these options work.

torognes added the question label Mar 23, 2018

torognes closed this as completed Apr 20, 2018
