Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phyloseq_mult_raref_avg: Perform rarefaction and average relative OTU abundance issue #13

Closed
Salineraptor opened this issue Mar 10, 2020 · 7 comments

Comments

@Salineraptor
Copy link

Hi all

I've an issue with the phyloseq_mult_raref_avg function; it works on this phyloseq object.
phyloseq_summary(ps, cols = NULL, more_stats = FALSE,
+ long = FALSE)

Parameter Phys1

1 Number of samples 108.0000

2 Number of OTUs 7981.0000

3 Total number of reads 4781965.0000

4 Average number of reads per OTU 599.1687

5 Average number of reads per sample 44277.4537

works<-phyloseq_mult_raref_avg(ps,replace = T, SampSize = 10000, iter = 3)
..Multiple rarefaction
|=====================================================================================| 100%
..Sample renaming
..Rarefied data merging
..Splitting by sample
..OTU abundance averaging within rarefaction iterations
|=====================================================================================| 100%
..Re-create phyloseq object

But not this phyloseq object;

phyloseq_summary(p.b.a.m.s.lab, cols = NULL, more_stats = FALSE,
long = FALSE)

Parameter Phys1

1 Number of samples 36.0000

2 Number of OTUs 7981.0000

3 Total number of reads 4781965.0000

4 Average number of reads per OTU 599.1687

5 Average number of reads per sample 132832.3611

fails<-phyloseq_mult_raref_avg(p.b.a.m.s.lab,,replace = T, SampSize = 10000, iter = 3)
..Multiple rarefaction
|=====================================================================================| 100%
..Sample renaming
..Rarefied data merging
..Splitting by sample
Error in validObject(.Object) : invalid class “otu_table” object:
OTU abundance data must have non-zero dimensions.

validotu_table(otu_table(p.b.a.m.s.lab))
[1] TRUE
sum(is.na(otu_table(p.b.a.m.s.lab)))
[1] 0

I've psmelted it etc and all looks good no irregularities. Makes zero sense. phyloseq_mult_raref works on both...

Regards Cameron

@vmikk
Copy link
Owner

vmikk commented Mar 11, 2020

Hello Cameron!

It is difficult to say why it's not working without seeing the data.
I think that it could be because of the small number of reads in some samples (probably after rarefaction). Could you post the results of:

sample_sums(p.b.a.m.s.lab)
taxa_are_rows(p.b.a.m.s.lab)

With best regards,
Vladimir

@Salineraptor
Copy link
Author

Salineraptor commented Mar 12, 2020 via email

@vmikk
Copy link
Owner

vmikk commented Mar 12, 2020

Hello Cameron,

By default, phyloseq_mult_raref does not remove OTUs with zero abundance (trimOTUs = FALSE).
So you may remove these OTUs after the averaging:

prune_taxa(taxa_sums(p.rf) > 0, p.rf)

Please let me know if it works for you.
With best regards,
Vladimir

@Salineraptor
Copy link
Author

Salineraptor commented Mar 12, 2020 via email

@vmikk
Copy link
Owner

vmikk commented Mar 12, 2020

Maybe you can me send me your phyloseq object and I'll take a look why it doesn't work as expected?
Just remove the metadata and anonymize or shuffle the labels.

@Salineraptor
Copy link
Author

Should be there.
Help.zip

@vmikk
Copy link
Owner

vmikk commented Mar 12, 2020

This discrepancy in the number of observed OTUs is due to the large number of OTUs with very small relative abundance (<= 0.054%).
So when you rarefy data multiple times, there is a small probability that rare OTUs will be present in some iterations, but not in the others. After the averaging, abundance of these OTUs will be very small (not zero!), so they will remain in the OTU table.

We can find these OTUs:

# Remove taxonomy table to speed up psmelt
p.rf@tax_table <- NULL
p.rf.1@tax_table <- NULL

# Convert p.rf.1 to relative abundances
p.rf.1 <- transform_sample_counts(p.rf.1, function(x) x / sum(x) )

multr <- psmelt(p.rf)
singr <- psmelt(p.rf.1)

# Compare OTU abundances in p.rf & p.rf.1
compare <- multr
compare$Samp_OTU <- with(compare, interaction(Sample, OTU))
singr$Samp_OTU <- with(singr, interaction(Sample, OTU))
compare$Abundance_R1 <- singr[match(x = compare$Samp_OTU, table = singr$Samp_OTU), "Abundance"]
compare$Abundance_R1[ is.na(compare$Abundance_R1) ] <- 0
compare <- compare[-which(compare$Abundance == 0 & compare$Abundance_R1 == 0), ]


ggplot(data = compare, aes(x = Abundance_R1, y = Abundance)) + geom_point() + 
  labs(x = "Single rarefaction", y = "Averaged across multiple rarefactions")

# Extract OTUs that are missing in single-rarefied data, but present in multiple rarefactions
diffs <- compare[ compare$Abundance_R1 == 0, ]
length(unique(diffs$OTU))

summary(diffs$Abundance)
#      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
# 0.00000100 0.00001800 0.00003400 0.00005179 0.00006525 0.00054000

# Here is a long tail of rare OTUs which were absent in single-rarefied data
ggplot(data = compare, aes(x = Abundance_R1, y = Abundance)) + geom_point() + 
  labs(x = "Single rarefaction", y = "Averaged across multiple rarefactions") +
  ylim(c(0, max(diffs$Abundance)))

Relative abundances of single rarefaction iteration vs averaged across multiple rarefaction iterations:
Raref_compare

Tail with rare OTUs which were absent in single-rarefied data:
Raref_tail

@vmikk vmikk closed this as completed Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants