vsearch --usearch_global not showing "full alignment" instead only the segment pair #545

gbbio · 2023-11-23T13:21:27Z

Hello,

If I use usearch_global to search a short primer sequence against a reference (or preferably search the reference against the primer) I don't get the "full alignment" back. For example:

primer query:   ACAGTGACATGGGGACGTAT
reference:       CAGTGACATGGGGACGTAT...

Qry    2 + CAGTGACATGGGGACGTAT 20
           |||||||||||||||||||
Tgt    1 + CAGTGACATGGGGACGTAT 19


19 cols, 19 ids (100.0%), 0 gaps (0.0%)

The first base of the primer is not there and the identity is 100%. With a global alignment I was expecting something like:

Qry    1 + ACAGTGACATGGGGACGTAT 20
            |||||||||||||||||||
Tgt    1 + -CAGTGACATGGGGACGTAT 19


20 cols, 19 ids (95.0%), 0 gaps (0.0%)

Is this intended?

How can I achieve the above result like the second example? I want to align 1 primer sequence against many references and also detect multiple matches per reference. So preferably using --usearch_global where my file with references is the query. And get the alignment as qrow and trow columns.

Many thanks in advance.

The text was updated successfully, but these errors were encountered:

torognes · 2023-11-23T15:23:24Z

Hi!

Yes, this is intended. VSEARCH uses global alignments, but the terminal gap penalties are very low by default, and terminal gaps are not shown in the alignments. It may therefore look like local alignments. This is similar to USEARCH.

By default, the identity between the query and target is calculated as the percentage of matching nucleotides in the aligned region, excluding gaps at the ends (in the terminal regions). If you want to, you can choose to use a different definition of identity, for instance the CD-HIT definition, which is the number of matching columns divided by the shortest sequence length. In your case, when aligning primers, the primer length will almost always be the shortest sequence. I think that will give you the result you want. Use the option --iddef 0 to use this identity definition. In the example above, the result will be 95%.

You can use the userout and userfields options to get a tab-separated text file with the information you may need. For instance, using --userfields query+target+id0+alnlen+qilo+qihi+tilo+tihi+qrow+trow could be useful in this case. Like this:

vsearch --usearch_global primer.fasta --db ref.fasta --alnout alnout.txt --id 0.8 --iddef 0 --userfields query+target+id0+alnlen+qilo+qihi+tilo+tihi+qrow+trow --userout userout.tsv

Please see the manual for more info about these options.

Some remarks: You cannot get multiple matches within the same database sequence for one query, so you may have to reverse the search by having the primers in the database (as you indicate). If you are using short sequences (less than 32 nucleotides) in the database you need to use the --minseqlength 1 option. You will also need to use the --maxaccepts 0 option to find all matches to each query, not just the best match.

I hope this works for you.

gbbio · 2023-11-23T15:55:58Z

Thanks for the reply.

The command I am using now is:

vsearch --usearch_global genomes.fa --db primer --id 0.5 --qmask none --strand both --minwordmatches 0 --wordlength 3 --maxaccepts 0 --maxrejects 0 -maxgaps 15 -mincols 5 -userout {output.userout} -qmask none -userfields query+target+id+alnlen+mism+gaps+qilo+qihi+tilo+tihi+qstrand+qrow+trow+ql+tl

I will check the CD-HIT definition and maybe I can do some parsing myself with the help of the aln or caln column. A tseq or qseq column would maybe also give some more options but I see they are not available.

Thanks again for the reply and thanks for this tool.

frederic-mahe · 2023-12-06T10:48:47Z

Thanks @gbbio I've created a new issue regarding the missing --userfields options (see issue #548).

Also, regression tests added to our test-suite (see frederic-mahe/vsearch-tests@bdb1a50)

Please close the issue if you consider it solved.

torognes self-assigned this Nov 23, 2023

torognes added the question label Nov 23, 2023

frederic-mahe mentioned this issue Dec 6, 2023

missing userfields options #548

Open

gbbio closed this as completed Dec 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vsearch --usearch_global not showing "full alignment" instead only the segment pair #545

vsearch --usearch_global not showing "full alignment" instead only the segment pair #545

gbbio commented Nov 23, 2023

torognes commented Nov 23, 2023

gbbio commented Nov 23, 2023

frederic-mahe commented Dec 6, 2023

vsearch --usearch_global not showing "full alignment" instead only the segment pair #545

vsearch --usearch_global not showing "full alignment" instead only the segment pair #545

Comments

gbbio commented Nov 23, 2023

torognes commented Nov 23, 2023

gbbio commented Nov 23, 2023

frederic-mahe commented Dec 6, 2023