Fix bug on estimateDispersionsGeneEst when niter is larger than 1 #64
Hi Alejandro,

Thanks for filing the bug report. I guess I never tested the DESeq2 / glmGamPoi integration with […]

Do I understand correctly that the problem […]

Best,
Constantin
Hi @const-ae,

Yes, that's also fixed in my PR by the following line: […]

No rush, and best of luck with your deadline!

Best regards,
Thanks Alejandro and Constantin. I will pull this and push it to the devel branch for now; from scanning the code, it looks like you are correct. I can add a test case at some point for glmGamPoi and niter > 1.
Hi all,

I posted a question relating to this on the Bioconductor support site. I think this fix is providing the […]

# fitMu
[1] "matrix" "array"
 num [1:11774, 1:49] 1.834 0.999 3.028 1 0.52 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:11774] "Mrpl15" "Lypla1" "Tcea1" "Atp6v1h" ...
  ..$ : chr [1:49] "AGGCCGTTCACCTCGT-1" "AGGGATGAGTGCGATG-1" "ATGCGATTCGTGTAGT-1" "CAACCAAAGAGTCGGT-1" ...

The time […]

Below is my original post on the support site:

Hi,

I recently upgraded to the latest R 4.3 from R 4.1. While running the DEA workflow on scRNA-seq data, I noticed it was dramatically slower with my latest setup. I wonder if anyone is aware of changes that may have contributed to this, and how I can make the dispersion estimation step run as fast as before?

Using a small dataset with 49 cells as a demo (22 vs. 27 in two tissues), here's how I run […]

dds <- DESeqDataSetFromMatrix(countData = counts(sce),
colData = droplevels(colData(sce)[,c("Barcode","tissue")]),
rowData = rowData(sce), design = ~ tissue)
sizeFactors(dds) <- sizeFactors(sce)
dds
system.time({
dds <- DESeq(dds, test = "LRT", useT = TRUE, minmu = 1e-6, minReplicatesForReplace = Inf,
fitType = "glmGamPoi", sfType = "poscounts", reduced = ~1, quiet = FALSE)
})

# dds
class: DESeqDataSet
dim: 17881 49
metadata(1): version
assays(1): counts
rownames(17881): Xkr4 Gm37381 ... CAAA01147332.1 AC149090.1
rowData names(5): ID Symbol Type SEQNAME is_mito
colnames(49): LR03_AGGCCGTTCACCTCGT-1 LR03_AGGGATGAGTGCGATG-1 ... LR04_TTGACTTAGGTGTTAA-1 LR04_TTGACTTGTCTGCGGT-1
colData names(3): Barcode tissue sizeFactor

When using […]

When using […]

Edit: After more digging into the code itself, there seems to be a change in how […]

In the current version, the whole […]

initial_lp <- vapply(which(fitidx), function(idx){
sum(dnbinom(Counts[idx, ], mu = fitMu, size = 1 / alpha_hat[idx], log = TRUE))
}, FUN.VALUE = 0.0)
last_lp <- vapply(which(fitidx), function(idx){
sum(dnbinom(Counts[idx, ], mu = fitMu, size = 1 / alpha_hat_new[idx], log = TRUE))
}, FUN.VALUE = 0.0)

Whereas in the previous version, the per-gene vector […]

initial_lp <- vapply(which(fitidx), function(idx){
sum(dnbinom(Counts[idx, ], mu = fitMu[idx, ], size = 1 / alpha_hat[idx], log = TRUE))
}, FUN.VALUE = 0.0)
last_lp <- vapply(which(fitidx), function(idx){
sum(dnbinom(Counts[idx, ], mu = fitMu[idx, ], size = 1 / alpha_hat_new[idx], log = TRUE))
}, FUN.VALUE = 0.0)

Also, why is […]

# New version
dispersion_fits <- glmGamPoi::overdispersion_mle(Counts[fitidx, ], mean = fitMu,
model_matrix = modelMatrix, verbose = ! quiet)
# Old version
dispersion_fits <- glmGamPoi::overdispersion_mle(Counts[fitidx, ], mean = fitMu[fitidx, ],
model_matrix = modelMatrix, verbose = ! quiet)
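
A minimal base-R sketch of the indexing question raised in this comment, on toy data. The object names mirror the snippets above, but the data, the dimensions, and the position-based indexing are assumptions made for illustration. This is not the fix adopted in DESeq2. It assumes fitMu has one row per fitted gene (sum(fitidx) rows), while Counts and alpha_hat keep one entry per gene.

```r
# Toy data: 4 genes x 3 samples (illustrative only, not DESeq2 internals).
set.seed(1)
Counts    <- matrix(rpois(12, lambda = 5), nrow = 4)
fitidx    <- c(TRUE, FALSE, TRUE, TRUE)   # genes still being fitted
alpha_hat <- c(0.1, 0.2, 0.3, 0.4)        # one dispersion per gene

# Assumption: fitMu has one row per *fitted* gene, not one row per gene.
fitMu <- matrix(5, nrow = sum(fitidx), ncol = ncol(Counts))

rows <- which(fitidx)                     # row numbers in Counts: 1, 3, 4

# Index Counts and alpha_hat by the original row number,
# and fitMu by the position of that gene within the fitted subset.
lp <- vapply(seq_along(rows), function(k) {
  idx <- rows[k]
  sum(dnbinom(Counts[idx, ], mu = fitMu[k, ], size = 1 / alpha_hat[idx], log = TRUE))
}, FUN.VALUE = 0.0)

# Passing the whole matrix instead recycles Counts[idx, ] against every
# element of fitMu, so each gene's sum runs over 9 terms rather than 3.
whole <- vapply(rows, function(idx) {
  sum(dnbinom(Counts[idx, ], mu = fitMu, size = 1 / alpha_hat[idx], log = TRUE))
}, FUN.VALUE = 0.0)
```

Because the toy fitMu is constant, the whole-matrix version here returns exactly three times the per-gene value. With real, gene-specific means the two results would differ in a less obvious way, and the whole-matrix call also evaluates far more terms per gene, which may relate to the slowdown described above.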
From looking this over, I agree with @ycl6 that there is a mismatch between the argument provided to […]

@areyesq89 and @const-ae, do you agree?

https://github.com/thelovelab/DESeq2/blob/devel/R/core.R#L796-L805

I can work on this, but probably not until next week.
I haven't tested this (sorry, just overloaded right now), but I think the fix would look something like this:
Hi @mikelove, I tested it and it works fine, albeit with a missing comma ;)
I think I found a minor bug in estimateDispersionsGeneEst when the parameter niter is set to > 1 and fitType="glmGamPoi".

- […] fitMu[idx, ] inside the else if (type == "glmGamPoi") chunk was not necessary, as fitMu already has the correct dimensions specified in the line that generates it (i.e. fit <- fitNbinomGLMs(objectNZ[fitidx,,drop=FALSE], ...).
- […] niter, when alpha_hat and alpha_hat_new were both zero, it returned NA values. This seemed to happen when one of the groups has all zero counts and glmGamPoi is used.

Hope I'm not chasing my own tail here.
Here is the code that generated the error message: