way too many clusters? #166

Gian77 · 2021-10-29T15:18:37Z

Hello,

I am trying to figure out what the best --differences value would be. I have 155 samples in my library, an ITS fragment averaging 700 bp, and medium diversity since it is a mix of soil, roots and leaves.

I tested --differences 1 with --fastidious and I got ~65 thousand clusters. Then I tried --differences 3 and I got ~31 thousand. I think that is still a little too many, what do you think? As a note, using UPARSE I got ~6000 97% OTUs.

Here's my code

    swarm \
    --differences 3 \
    --usearch-abundance \
    --threads $SLURM_CPUS_PER_TASK \
    --statistics-file clustered_SWARM/clusters_stats_R1.txt \
    --seeds clustered_SWARM/clusters_R1.fasta \
    clustered_SWARM/linear_derep_R1.fasta > /dev/null

Thanks a lot,

G.

torognes · 2021-11-04T15:25:20Z

I am not sure, but I think UPARSE is quite strict and eliminates clusters that have a low abundance or low quality sequence. It also removes chimeras as far as I know. That may be a reason why it ends up with fewer clusters.

torognes · 2021-11-04T15:34:05Z

If I am not mistaken, the ITS sequences of fungi (if that's what you are studying) often have highly variable length gaps when aligned. That may cause Swarm to split groups more than other algorithms.

frederic-mahe · 2021-11-04T16:20:07Z

@torognes is right. Swarm does only one thing: it makes clusters of sequences; whereas UPARSE also applies aggressive filters to remove rare sequences, low quality sequences, and chimeras.

In my own analyses I use swarm --differences 1 --fastidious, and I apply all these somewhat arbitrary filters after clustering, rather than before. The idea is that applying filters is less harmful once the clusters are defined.

Other filters that can efficiently reduce the number of clusters:

- eliminating clusters present in only one technical replicate,
- eliminating low-abundance clusters present in only one biological replicate,
- eliminating clusters dissimilar to any known reference sequence.
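
A minimal sketch of the first filter, assuming a hypothetical tab-separated cluster table (one header row, then one cluster per row, with the cluster id in column 1 and read counts per replicate in the remaining columns). Swarm does not produce that table itself, and the function name and layout are illustrative only:

```shell
# Hypothetical layout: header row, then one cluster per row;
# column 1 = cluster id, columns 2..N = read counts per replicate.
filter_by_occurrence() {
    # $1 = cluster table (TSV)
    # $2 = minimum number of replicates with reads (default 2)
    awk -F'\t' -v min="${2:-2}" '
        NR == 1 { print; next }                 # keep the header row
        {
            n = 0
            for (i = 2; i <= NF; i++)
                if ($i + 0 > 0) n++             # count replicates with reads
            if (n >= min) print                 # keep clusters seen often enough
        }' "$1"
}
```

It could be used as `filter_by_occurrence cluster_table.tsv 2 > cluster_table.filtered.tsv`, where both file names are placeholders.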

Gian77 · 2021-11-04T16:24:49Z

Thank you @torognes and @frederic-mahe

I think that is possible. I do eliminate singletons in UPARSE, and it is true that it removes chimeras automatically.

I can try to remove chimeras before using SWARM, and remove singletons after generating the clusters.
I am not sure about the reference-based approach, but it may be another, more conservative option.

What if I increase --differences to e.g. 3 or more?

Gian

frederic-mahe · 2021-11-05T09:38:00Z

I suggest eliminating chimeras after clustering and after removing singletons.
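
As a rough sketch of the singleton-removal step, assuming the seeds file was written with --usearch-abundance so that each header carries a `;size=N;` annotation holding the total abundance of the cluster. The function name is made up for illustration, and headers without a size annotation are dropped:

```shell
# Assumes fasta headers carry a ";size=N" annotation (usearch style).
drop_small_clusters() {
    # $1 = seeds fasta
    # $2 = minimum total abundance to keep (default 2, i.e. drop singletons)
    awk -v min="${2:-2}" '
        /^>/ {
            keep = 0                            # headers without size= are dropped
            if (match($0, /size=[0-9]+/)) {
                size = substr($0, RSTART + 5, RLENGTH - 5) + 0
                keep = (size >= min)
            }
        }
        keep                                    # print header and its sequence lines
    ' "$1"
}
```

For example, `drop_small_clusters clustered_SWARM/clusters_R1.fasta 2 > clusters_R1.filtered.fasta` (the output name is a placeholder); chimera detection would then run on the filtered file.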

Increasing the --differences value will indeed reduce the number of clusters, but it will also reduce the resolution (you will not be able to distinguish taxa that differ by only a few positions in your molecular marker).

In my own projects, the high resolution of --differences 1 has always been the greater advantage. The final number of clusters is an issue only if you are trying to get absolute alpha diversity values (and in that case having a lot of replicates is the preferred solution). For normal comparative diversity studies, the total number of clusters doesn't really matter.

Gian77 · 2021-11-05T18:29:58Z

@frederic-mahe,

thanks a lot for the explanation. I will follow your advice and let you know what I get.

Gian

frederic-mahe · 2021-11-08T12:02:28Z

I am going to close that issue. Please feel free to re-open if need be.

torognes added the question label Nov 4, 2021

frederic-mahe closed this as completed Nov 8, 2021
