phyloseq_mult_raref_avg: Perform rarefaction and average relative OTU abundance issue #13

Salineraptor · 2020-03-10T16:12:23Z

Hi all

I have an issue with the phyloseq_mult_raref_avg function. It works on this phyloseq object:
phyloseq_summary(ps, cols = NULL, more_stats = FALSE,
    long = FALSE)

                           Parameter        Phys1
1                  Number of samples     108.0000
2                     Number of OTUs    7981.0000
3              Total number of reads 4781965.0000
4    Average number of reads per OTU     599.1687
5 Average number of reads per sample   44277.4537

works<-phyloseq_mult_raref_avg(ps,replace = T, SampSize = 10000, iter = 3)
..Multiple rarefaction
|=====================================================================================| 100%
..Sample renaming
..Rarefied data merging
..Splitting by sample
..OTU abundance averaging within rarefaction iterations
|=====================================================================================| 100%
..Re-create phyloseq object

But it fails on this phyloseq object:

phyloseq_summary(p.b.a.m.s.lab, cols = NULL, more_stats = FALSE,
    long = FALSE)

                           Parameter        Phys1
1                  Number of samples      36.0000
2                     Number of OTUs    7981.0000
3              Total number of reads 4781965.0000
4    Average number of reads per OTU     599.1687
5 Average number of reads per sample  132832.3611

fails<-phyloseq_mult_raref_avg(p.b.a.m.s.lab, replace = T, SampSize = 10000, iter = 3)
..Multiple rarefaction
|=====================================================================================| 100%
..Sample renaming
..Rarefied data merging
..Splitting by sample
Error in validObject(.Object) : invalid class “otu_table” object:
OTU abundance data must have non-zero dimensions.

validotu_table(otu_table(p.b.a.m.s.lab))
[1] TRUE
sum(is.na(otu_table(p.b.a.m.s.lab)))
[1] 0

I've run psmelt on it and everything looks fine, with no irregularities. It makes zero sense: phyloseq_mult_raref works on both objects.

Regards,
Cameron

vmikk · 2020-03-11T09:46:45Z

Hello Cameron!

It is difficult to say why it's not working without seeing the data.
I think that it could be because of the small number of reads in some samples (probably after rarefaction). Could you post the results of:

sample_sums(p.b.a.m.s.lab)
taxa_are_rows(p.b.a.m.s.lab)

With best regards,
Vladimir

Salineraptor · 2020-03-12T03:56:47Z

Hi Vladimir,

Thank you so much for getting back to me. Yes, I do lose two samples (B30 and B31) after rarefaction functions like https://rdrr.io/github/vmikk/metagMisc/man/phyloseq_mult_raref.html are applied. I lose the same samples if I rarefy the unmerged replicates, so in theory it shouldn't make a difference.

# > phyloseq.runs
# $`1`
# phyloseq-class experiment-level object
# otu_table()   OTU Table:         [ 5460 taxa and 34 samples ]
# sample_data() Sample Data:       [ 34 samples by 20 sample variables ]
# tax_table()   Taxonomy Table:    [ 5460 taxa by 7 taxonomic ranks ]
# phy_tree()    Phylogenetic Tree: [ 5460 tips and 5459 internal nodes ]
#
# $`2`
# phyloseq-class experiment-level object
# otu_table()   OTU Table:         [ 5348 taxa and 34 samples ]
# sample_data() Sample Data:       [ 34 samples by 20 sample variables ]
# tax_table()   Taxonomy Table:    [ 5348 taxa by 7 taxonomic ranks ]
# phy_tree()    Phylogenetic Tree: [ 5348 tips and 5347 internal nodes ]
# ...

However, I still want an average.

sample_sums(p.b.a.m.s.lab)
    B1     B2     B3
101185  97908  37062
    B4     B5     B6
 54592  52062 131014
    B7     B8     B9
172053 171042 277328
   B10    B11    B12
478035 253444 257516
   B13    B14    B15
141324 115771  22225
   B16    B17    B18
277429 161581  97891
   B19    B20    B21
 62709 220285  48663
   B22    B23    B24
 48705  37352 144526
   B25    B26    B27
201021 174101  38829
   B28    B29    B30
245844  16787   3236
   B31    B32    B33
  1802  20673 229191
   B34    B35    B36
205454  66717 116608

taxa_are_rows(p.b.a.m.s.lab)

So yes, this was the problem: the taxa weren't rows, so I transposed the OTU table and now it works.

However, I now have a new discrepancy. What's the difference between this:

p.rf<-phyloseq_mult_raref_avg(p.T.O, SampSize = 10000,iter = 100, replace = T)
..Multiple rarefaction
|=================================================================| 100%
..Sample renaming
..Rarefied data merging
..Splitting by sample
..OTU abundance averaging within rarefaction iterations
|=================================================================| 100%
..Re-create phyloseq object

p.rf

phyloseq-class experiment-level object
otu_table()   OTU Table:      [ 7943 taxa and 34 samples ]
sample_data() Sample Data:    [ 34 samples by 20 sample variables ]
tax_table()   Taxonomy Table: [ 7943 taxa by 7 taxonomic ranks ]

and this:

p.rf.1<-rarefy_even_depth(p.T.O, sample.size = 10000,
    rngseed = T, replace = TRUE, trimOTUs = TRUE, verbose = TRUE)
`set.seed(TRUE)` was used to initialize repeatable random subsampling.
Please record this for your records so others can reproduce.
Try `set.seed(TRUE); .Random.seed` for the full vector
...
2 samples removedbecause they contained fewer reads than `sample.size`.
Up to first five removed samples are:
B30 B31
...
2521OTUs were removed because they are no longer present in any sample after random subsampling
...

p.rf.1

phyloseq-class experiment-level object
otu_table()   OTU Table:      [ 5460 taxa and 34 samples ]
sample_data() Sample Data:    [ 34 samples by 20 sample variables ]
tax_table()   Taxonomy Table: [ 5460 taxa by 7 taxonomic ranks ]

Why are the outputs so different? Aren't they doing the same thing?

Regards,
Cameron
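
The orientation fix reported in this comment (transposing the OTU table so that taxa are rows) might look roughly like the sketch below. The toy object, its sample names, and the `Group` column are made up for illustration; only `taxa_are_rows()`, `sample_sums()`, and `t()` from phyloseq are used, and nothing here is taken from the metagMisc source.

```r
library(phyloseq)

# Hypothetical toy data: 3 samples (rows) x 4 OTUs (columns), i.e. taxa are NOT rows
counts <- matrix(
  c(10, 0, 5, 1,
     3, 7, 0, 0,
     8, 2, 2, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("S", 1:3), paste0("OTU", 1:4))
)
meta <- sample_data(data.frame(Group = c("a", "b", "a"),
                               row.names = paste0("S", 1:3)))
ps <- phyloseq(otu_table(counts, taxa_are_rows = FALSE), meta)

taxa_are_rows(ps)   # orientation of the OTU table
sample_sums(ps)     # reads per sample

# Transpose only when needed, so that OTUs end up as rows
ps_fixed <- if (taxa_are_rows(ps)) ps else t(ps)
taxa_are_rows(ps_fixed)
```

Transposing changes only the layout of the OTU table; per-sample read totals stay the same, which is an easy sanity check after the fix.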


vmikk · 2020-03-12T04:29:12Z

Hello Cameron,

By default, phyloseq_mult_raref does not remove OTUs with zero abundance (trimOTUs = FALSE).
So you may remove these OTUs after the averaging:

prune_taxa(taxa_sums(p.rf) > 0, p.rf)

Please let me know if it works for you.
With best regards,
Vladimir
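
As a small illustration of the pruning step suggested above, the sketch below builds a toy phyloseq object (hypothetical names and counts, not the data from this thread) in which one OTU is absent from every sample, counts the all-zero OTUs, and removes them with `prune_taxa()`.

```r
library(phyloseq)

# Hypothetical toy OTU table (taxa as rows); OTU3 is absent from every sample
counts <- matrix(
  c(5, 2,
    1, 0,
    0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), c("S1", "S2"))
)
meta <- sample_data(data.frame(Group = c("a", "b"),
                               row.names = c("S1", "S2")))
toy <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), meta)

# How many OTUs have a total abundance of exactly zero?
n_zero <- sum(taxa_sums(toy) == 0)

# Keep only OTUs observed in at least one sample
toy_pruned <- prune_taxa(taxa_sums(toy) > 0, toy)

n_zero
ntaxa(toy)
ntaxa(toy_pruned)
```

Checking the count of all-zero OTUs before pruning shows in advance whether `prune_taxa()` will change anything: if it is zero, the OTU table will keep the same number of taxa.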

Salineraptor · 2020-03-12T05:13:25Z

Hi Vladimir,

That was my theory. Thanks for the quick response. But I still get this:

p.rf<-phyloseq_mult_raref_avg(p.T.O, SampSize = 10000, MinSizeTreshold = 10000, iter = 100, replace = T)
# phyloseq-class experiment-level object
# otu_table()   OTU Table:      [ 7943 taxa and 34 samples ]
# sample_data() Sample Data:    [ 34 samples by 20 sample variables ]
# tax_table()   Taxonomy Table: [ 7943 taxa by 7 taxonomic ranks ]

p.rf.correct<-prune_taxa(taxa_sums(p.rf) > 0, p.rf)
# phyloseq-class experiment-level object
# otu_table()   OTU Table:      [ 7943 taxa and 34 samples ]
# sample_data() Sample Data:    [ 34 samples by 20 sample variables ]
# tax_table()   Taxonomy Table: [ 7943 taxa by 7 taxonomic ranks ]

p.rf.1<-rarefy_even_depth(p.T.O, sample.size = 10000, rngseed = T, replace = TRUE, trimOTUs = T, verbose = TRUE)
# phyloseq-class experiment-level object
# otu_table()   OTU Table:      [ 5460 taxa and 34 samples ]
# sample_data() Sample Data:    [ 34 samples by 20 sample variables ]
# tax_table()   Taxonomy Table: [ 5460 taxa by 7 taxonomic ranks ]

p.rf.1.correct<-prune_taxa(taxa_sums(p.rf.1) > 0, p.rf.1)
# phyloseq-class experiment-level object
# otu_table()   OTU Table:      [ 5460 taxa and 34 samples ]
# sample_data() Sample Data:    [ 34 samples by 20 sample variables ]
# tax_table()   Taxonomy Table: [ 5460 taxa by 7 taxonomic ranks ]

NB:

> p.T.O
phyloseq-class experiment-level object
otu_table()   OTU Table:      [ 7981 taxa and 36 samples ]
sample_data() Sample Data:    [ 36 samples by 20 sample variables ]
tax_table()   Taxonomy Table: [ 7981 taxa by 7 taxonomic ranks ]

Regards,
Cameron


vmikk · 2020-03-12T05:24:20Z

Maybe you can send me your phyloseq object, and I'll take a look at why it doesn't work as expected?
Just remove the metadata and anonymize or shuffle the labels.

Salineraptor · 2020-03-12T06:00:07Z

Should be there.
Help.zip

vmikk · 2020-03-12T07:22:06Z

This discrepancy in the number of observed OTUs is due to the large number of OTUs with very small relative abundance (<= 0.054%).
So when you rarefy data multiple times, there is a small probability that rare OTUs will be present in some iterations, but not in the others. After the averaging, abundance of these OTUs will be very small (not zero!), so they will remain in the OTU table.
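
The effect described above can be reproduced with a small base-R simulation. This is only a sketch under made-up assumptions (a single hypothetical sample with 10 dominant OTUs and a long tail of rare ones, depth 1000, 100 iterations); it does not use or mirror the metagMisc implementation.

```r
set.seed(1)

n_otus <- 200    # hypothetical number of OTUs in one sample
depth  <- 1000   # hypothetical rarefaction depth
n_iter <- 100    # number of rarefaction iterations to average

# Skewed community: a few dominant OTUs and a long tail of rare ones
counts <- c(rep(5000, 10), rep(5, n_otus - 10))
pool   <- rep(seq_len(n_otus), times = counts)   # one entry per read

# One rarefaction: draw `depth` reads with replacement,
# then return relative abundances for all OTUs
rarefy_once <- function() {
  tabulate(sample(pool, size = depth, replace = TRUE), nbins = n_otus) / depth
}

single   <- rarefy_once()
averaged <- rowMeans(replicate(n_iter, rarefy_once()))

sum(single > 0)     # OTUs detected in one rarefaction
sum(averaged > 0)   # OTUs with non-zero abundance after averaging

# Averaged abundances of OTUs that the single rarefaction missed
summary(averaged[single == 0 & averaged > 0])
```

Each rare OTU is unlikely to appear in any one draw, but very likely to appear in at least one of many draws, so its averaged abundance is small yet strictly positive and a `> 0` filter keeps it.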

We can find these OTUs:

# Packages used below
library(phyloseq)
library(ggplot2)

# Remove taxonomy table to speed up psmelt
p.rf@tax_table <- NULL
p.rf.1@tax_table <- NULL

# Convert p.rf.1 to relative abundances
p.rf.1 <- transform_sample_counts(p.rf.1, function(x) x / sum(x) )

multr <- psmelt(p.rf)
singr <- psmelt(p.rf.1)

# Compare OTU abundances in p.rf & p.rf.1
compare <- multr
compare$Samp_OTU <- with(compare, interaction(Sample, OTU))
singr$Samp_OTU <- with(singr, interaction(Sample, OTU))
compare$Abundance_R1 <- singr[match(x = compare$Samp_OTU, table = singr$Samp_OTU), "Abundance"]
compare$Abundance_R1[ is.na(compare$Abundance_R1) ] <- 0
# Drop sample-OTU pairs that are zero in both data sets
# (logical indexing stays correct even when no rows match, unlike -which())
compare <- compare[ !(compare$Abundance == 0 & compare$Abundance_R1 == 0), ]


ggplot(data = compare, aes(x = Abundance_R1, y = Abundance)) + geom_point() + 
  labs(x = "Single rarefaction", y = "Averaged across multiple rarefactions")

# Extract OTUs that are missing in single-rarefied data, but present in multiple rarefactions
diffs <- compare[ compare$Abundance_R1 == 0, ]
length(unique(diffs$OTU))

summary(diffs$Abundance)
#      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
# 0.00000100 0.00001800 0.00003400 0.00005179 0.00006525 0.00054000

# Here is a long tail of rare OTUs which were absent in single-rarefied data
ggplot(data = compare, aes(x = Abundance_R1, y = Abundance)) + geom_point() + 
  labs(x = "Single rarefaction", y = "Averaged across multiple rarefactions") +
  ylim(c(0, max(diffs$Abundance)))

[Figure: relative abundances from a single rarefaction iteration vs. averaged across multiple rarefaction iterations]

[Figure: tail of rare OTUs that were absent in the single-rarefied data]

vmikk closed this as completed Mar 26, 2020
