Cannot allocate vector of size ... #3
Hi,
Thanks for your interest! It looks like a memory issue judging from the error message, but a dataset with 721 features and 172 samples is far from too big. The largest intermediate object the procedure produces is roughly a 5000 by 5000 covariance matrix, whose memory footprint is less than 0.5 Gb.
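For a rough sense of scale (a back-of-the-envelope sketch, not code from the package), a dense n by n matrix of doubles in R takes 8n² bytes:
```r
5000^2 * 8 / 1024^3     # ~0.19 Gb: the 5000 x 5000 covariance matrix
sqrt(9.6 * 1024^3 / 8)  # ~35,900: the side of the square matrix a 9.6 Gb allocation implies
```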
Could you please check that "abundances(phyloseq)" is a 721 by 172 data frame, that "meta(phyloseq)" is a data frame with appropriate dimensions (the number of rows should be 172), and that the "age" variable is not stored as a factor? If "age" is a factor, each distinct age becomes its own dummy variable in the design matrix, so the covariance matrix could actually become that large. I will think about what we can do next if none of these explains the error. The quick checks sketched below may help.
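A minimal sketch of those checks, assuming abundances() and meta() come from the microbiome package, as in your call:
```r
library(microbiome)

dim(abundances(phyloseq))  # expect 721 x 172 (features x samples)
dim(meta(phyloseq))        # expect 172 rows, one per sample
class(meta(phyloseq)$age)  # should be "integer" or "numeric", not "factor"

# If age was accidentally read in as a factor, convert it back to numeric
# and pass the corrected data frame to linda():
meta_df <- meta(phyloseq)
meta_df$age <- as.numeric(as.character(meta_df$age))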
Thanks,
Huijuan
On Tue, May 17, 2022 at 11:56 PM ubiminor wrote:
Dear Huijuan,
I'm happy about this tool for many reasons, first of all because it is fast compared to other tools (DESeq2, corncob). However, I am having an issue with a differential abundance run on 172 samples and 721 SGBs from MetaPhlAn (each sample sums to 100):
```r
linda(otu.tab = abundances(phyloseq), meta = meta(phyloseq),
      formula = "~ age + sex + using_drugs + trait_of_interest",
      type = "proportion", adaptive = TRUE)
```
This outputs: Error: cannot allocate vector of size 9.6 Gb.
Is this really that memory-intensive? Or am I doing something wrong?
The variable values are:
* age: integer
* sex: binary
* using_drugs: binary
* trait_of_interest: integer taking values 1, 2, or 3
Hi Giacomo,
I'm glad that it worked, and thanks for your insightful feedback.
1. Yes. For LinDA, proportion data and count data are treated pretty much the same, as the data will be CLR (centered log-ratio) transformed anyway. The only difference is that if the data are counts, the sequencing depth information is available, which LinDA uses to impute zeros. If the data are proportions, we use the "half minimum approach" (half of the smallest nonzero proportion value among samples for a feature) to replace zeros; see the sketch below.
2. In the model that motivates LinDA, the absolute abundance in the ecosystem is the dependent variable, and the sequencing depth is unrelated to it. So the total number of reads is not included as a regressor. But you've made a good point: in some methods (such as those employing negative binomial models), the sequencing depth is a model component.
Although the real proportion (the proportion in the ecosystem) is not related to the sequencing depth, a low sequencing depth causes under-sampling, especially of rare features, which means the sequencing depth does influence the observed proportions. I believe a thoughtful procedure is required to address this issue.
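A minimal sketch of the half-minimum rule from point 1 above (an illustration of the idea, not the package's exact internal code), for a features-by-samples matrix of proportions:
```r
half_min_impute <- function(props) {
  t(apply(props, 1, function(x) {        # one feature (row) at a time
    if (any(x == 0) && any(x > 0))
      x[x == 0] <- 0.5 * min(x[x > 0])   # half the smallest nonzero proportion
    x
  }))
}
```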
Best,
Huijuan
On Thu, May 19, 2022 at 3:30 PM ubiminor wrote:
Dear Huijuan,
Thank you for your detailed reply. I guess it was a variable-encoding issue; I don't know, because when I started from scratch with cleaner code it worked. Thank you for this!
Another point: since MetaPhlAn returns counts summing to 100, these are proportions. I accounted for that by making each sample's proportions sum to 1 and then using type = "proportion" in linda.
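For instance, the rescaling could look like this (a sketch where rel_ab is a hypothetical features-by-samples matrix of MetaPhlAn percentages):
```r
props <- sweep(rel_ab, 2, colSums(rel_ab), "/")  # divide each sample (column) by its total
colSums(props)                                   # every column should now sum to 1
```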
1. Is LinDA's approach still valid with this setup, even though it can't leverage the sequencing depth?
2. Do you think it is a valid approach to include the number of reads mapped per sample as a covariate, to mimic what LinDA would do internally with counts?
Thank you for the insight! Is the model's performance (accuracy/power/...) still good without imputing zeros with the default method for count data?
Giacomo
Zero treatment is necessary in LinDA because it involves logarithms. If the data are counts, we provide two zero-handling strategies: add a pseudo-count (e.g., 0.5) to all counts, or impute zeros using ratios between the sequencing depths. The choice between these two strategies does indeed affect the performance (accuracy/power/...), so we have supplied an adaptive approach to choose between them.
If the data are proportions, we use the "half minimum approach" to replace zeros.
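For illustration, the choice might be exercised like this, assuming the linda() arguments adaptive, imputation, and pseudo.cnt as described in the package documentation (please check ?linda in your installed version; the formula "~ group" is a placeholder):
```r
res_pseudo <- linda(otu.tab, meta, formula = "~ group",
                    type = "count", adaptive = FALSE, pseudo.cnt = 0.5)   # pseudo-count
res_impute <- linda(otu.tab, meta, formula = "~ group",
                    type = "count", adaptive = FALSE, imputation = TRUE)  # depth-based imputation
res_adapt  <- linda(otu.tab, meta, formula = "~ group",
                    type = "count", adaptive = TRUE)                      # let LinDA choose
```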
Huijuan