losing sequences with vsearch --derep_prefix
#270
Comments
I either found a bug in this functionality in vsearch, or I don't understand what it's supposed to be doing: torognes/vsearch#270
@gregcaporaso I think this is the same bug as #201, which is fixed in newer versions. When using the newest version on bioconda, here is my output:
Not sure why
Thanks for reporting this bug, @gregcaporaso. @colinbrislawn is right, this was an earlier bug that was fixed in version 2.1.1. It failed to output the H-lines to the UC file when clustering.
Ok, thank you both for the input!
Identical sequences are sorted by decreasing abundance, then by label in increasing alphanumerical order. Here there are no abundance values, and the label
Ah, that's how it works! I love how vsearch uses rounds of sorting to produce consistent, stable results.
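To make the tie-breaking order described above concrete, here is a minimal Python sketch. It is not vsearch's code: the helper names `abundance` and `sort_identical` are hypothetical, abundance is assumed to come from a `;size=N` annotation in the label (defaulting to 1 when absent), and "alphanumerical order" is approximated with plain string comparison.

```python
import re


def abundance(label):
    """Return the abundance from a ';size=N' annotation, or 1 if absent (assumption)."""
    match = re.search(r"(?:^|;)size=(\d+)", label)
    return int(match.group(1)) if match else 1


def sort_identical(labels):
    """Order labels of identical sequences: abundance descending, then label ascending."""
    # Negating the abundance sorts it in decreasing order, while the label
    # itself breaks ties in increasing (plain string) order.
    return sorted(labels, key=lambda label: (-abundance(label), label))


# Labels taken from the issue; none carries an abundance annotation,
# so only the label order decides.
print(sort_identical(["sample1_2", "s2_1", "sample1_1"]))
```

Because both sort keys are deterministic, the result does not depend on the order of the input, which is what makes the output stable from run to run.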
I may be misunderstanding what `vsearch --derep_prefix` does exactly, but it seems to me that I'm losing some sequences when trying to dereplicate with this command.

Here is my input file (`seqs.fna`):

Here is the command and stdout:

And here is my output:

In this case, `s2_2` is a prefix of sequences `sample1_1`, `sample1_2`, and `s2_1`, which are all identical. Shouldn't `s2_2`, `sample1_1` and `sample1_2` be in `out.uc`?

Thanks for the help!
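The expectation in the question can be illustrated with a naive Python sketch of prefix dereplication. This is an assumption-laden illustration, not vsearch's implementation: the function name `derep_prefix`, the tie-breaking choices, and the example sequences are all hypothetical, since the original input file is not reproduced here.

```python
def derep_prefix(records):
    """Naive prefix dereplication (illustrative only).

    records: list of (label, sequence) tuples.
    Returns a dict mapping each centroid label to the labels merged into it.
    """
    # Visit longer sequences first so that any sequence a shorter one could
    # be a prefix of is already a centroid; labels break ties.
    ordered = sorted(records, key=lambda rec: (-len(rec[1]), rec[0]))
    centroids = []  # (label, sequence) pairs, in the order they were created
    clusters = {}   # centroid label -> labels of merged sequences
    for label, seq in ordered:
        # A centroid matches when the current sequence is a prefix of it
        # (an identical sequence is a prefix of itself).
        hits = [c for c in centroids if c[1].startswith(seq)]
        if hits:
            # Assumed tie-break: merge into the shortest matching centroid.
            target_label = min(hits, key=lambda hit: len(hit[1]))[0]
            clusters[target_label].append(label)
        else:
            centroids.append((label, seq))
            clusters[label] = []
    return clusters


# Hypothetical sequences: three identical reads plus one shorter prefix.
records = [
    ("sample1_1", "ACGTACGT"),
    ("sample1_2", "ACGTACGT"),
    ("s2_1", "ACGTACGT"),
    ("s2_2", "ACGT"),
]
print(derep_prefix(records))
```

Under these assumptions all four reads end up in a single cluster, so each of the three non-centroid reads would be expected to appear as a hit in the output.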