glmGamPoi fit fails if normMatrix is provided #29

mschubert · 2020-11-19T22:44:04Z

Looking at the glmGamPoi test case:

dds <- makeExampleDESeqDataSet(n=100, m=8)
dds$group <- factor(rep(1:2,times=4))
design(dds) <- ~group + condition
dds <- DESeq(dds, test="LRT", reduced=~1, fitType="glmGamPoi")

This also works when I call estimateSizeFactors and estimateDispersions manually:

dds <- makeExampleDESeqDataSet(n=100, m=8)
dds$group <- factor(rep(1:2,times=4))
design(dds) <- ~group + condition
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds, fitType="glmGamPoi")
dds <- DESeq(dds, test="LRT", reduced=~1, fitType="glmGamPoi")

However, when I supply the normMatrix argument the subsequent glmGamPoi fails:

dds <- makeExampleDESeqDataSet(n=100, m=8)
nm = matrix(2, nrow=nrow(dds), ncol=ncol(dds))
dds$group <- factor(rep(1:2,times=4))
design(dds) <- ~group + condition
dds <- estimateSizeFactors(dds, normMatrix=nm)
dds <- estimateDispersions(dds, fitType="glmGamPoi")
dds <- nbinomLRT(dds, reduced=~1, type="glmGamPoi")
# Error in combine_size_factors_and_offset(offset, size_factors, Y, verbose = verbose) :
#   is.numeric(size_factors) && (length(size_factors) == 1 || length(size_factors) ==  .... is not TRUE
#dds <- DESeq(dds, test="LRT", reduced=~1, fitType="glmGamPoi") # same if used instead of nbinomLRT

Is it possible to use normMatrix with glmGamPoi?

The reason why I want to do that is that I'm looking at single-cell gene expression changes with chromosome aneuploidies.

(Filing this here instead of Bioc support because I think it's a bug.)

The text was updated successfully, but these errors were encountered:

const-ae · 2020-11-20T10:09:22Z

Hi Michael,

thanks for filing a bug report. I wasn't aware of the normMatrix argument and that's why I missed it, when I integrated glmGamPoi into DESeq2.

Is it possible to use normMatrix with glmGamPoi?

In theory yes. glmGamPoi::glm_gp() has an offset parameter which works as the log(normalizationFactors(dds)). So if you are willing to use glmGamPoi directly, this would be an option.

@mikelove, for fixing this bug. I think the solution would be to check here if is.null(normalizationFactors(objectNZ)) and only if true get sizeFactors(objectNZ), right? I could draft a PR to fix the problem, if you want.

Best,
Constantin

mschubert · 2020-11-20T11:11:27Z

Hi Constantin,

Thanks a lot for your answer!

It would be great if this worked from within DESeq2, then I can use an API I'm already familiar with

mikelove · 2020-11-20T11:42:11Z

Thanks Michael and Constantin, I took a shot at addressing this here: 40d3cba

Let me know if that makes sense on your end(s).

mikelove · 2020-11-20T11:55:02Z

You'll actually need to use 1.31.3 : bc03212

@const-ae pointed out that I needed to add size_factors = FALSE to make it work correctly.

mschubert · 2020-11-20T15:39:41Z

Wow, that was quick! Thank you so much.

I'm running a large test case now and will let you know if there are any issues. So far it's looking good

mschubert · 2020-11-23T14:38:20Z

I have now run this for 45 treatment vs. 2947 control cells, and the normMatrix parameter behaves es expected: for instance, I got a log2FoldChange of -0.289 with and 0.267 without normMatrix, which is consistent with a normalization value of 3 vs. 2 for that gene 👍

However, I'm also confused that in contrast to the test case above, I get NA for all stat fields (maybe because many p-values are 0?) And it would be great for me if I had an estimate for the lfcSE, which is always NA including in the test case above.

const-ae · 2020-11-23T14:57:47Z

I have now run this for 45 treatment vs. 2947 control cells, and the normMatrix parameter behaves es expected: for instance, I got a log2FoldChange of -0.289 with and 0.267 without normMatrix, which is consistent with a normalization value of 3 vs. 2 for that gene 👍

That is great to hear :)

It is common that you get very small p-values for a single cell analysis, because you have so many cells. However, keep in mind that cells from the same sample are not independent replicates, so you should be careful how you interpret the results.

I get NA for all stat fields

That is weird. Can you say a bit more how you exactly do the test? Or provide some example data or subset of the data, so that I can reproduce the problem?

And it would be great for me if I had an estimate for the lfcSE, which is always NA including in the test case above.

Indeed, glmGamPoi doesn't store the standard error of the coefficients, that's why I fill the lfcSE column with NA's in DESeq2 (https://github.com/mikelove/DESeq2/blob/master/R/core.R#L1909). The reason that I don't store the standard error of the coefficients in glmGamPoi is that I don't need for the likelihood-ratio test. Could you say a bit more for what you use the lfcSE?

mschubert · 2020-11-23T16:25:50Z

It is common that you get very small p-values

In this case, I was surprised at the difference between the "normal" DESeq2 LRT (smallest p-values around 1e-30) vs. the glmGamPoi (300 genes < 1e-300)

I get NA for all stat fields

That was my mistake. I tried to calculate the Wald statistic from the log2FoldChange / lfcSE, which of course didn't work without lfcSE (where I use the LRT results for e.g. MA/volcano plots, but the Wald stat to compare 2 samples on x/y axis)

Could you say a bit more for what you use the lfcSE?

One of the analyses I do downstream of the DE is to model whether fold changes have a discontinuity along the genome (e.g. due to an amplification). The "raw" log2FCs are too noisy, in order to get anything out I need to weigh them by their SE when using them e.g. in a change point analysis.

const-ae · 2020-11-23T16:40:54Z

In this case, I was surprised at the difference between the "normal" DESeq2 LRT (smallest p-values around 1e-30) vs. the glmGamPoi (300 genes < 1e-300)

There can be several reasons for the discrepancy:

the minmu argument can influence how low the estimate for a particular group gets
I previous observed that the kind of test (glmGamPoi=quasi-likelihood ratio vs DESeq2=likelihood ratio) influences the result (see attached picture which is Suppl. Fig 7 from our preprint).
It's not that glmGamPoi will always calculate smaller p-values, but glmGamPoi and DESeq2 use different tests and this can have a large effect on the log-scale (e.g. p-value = 1e-30 vs 1e-300), because with so many replicates the exact behavior of the tail of the distribution matters greatly.
there was a bug in a recent version of glmGamPoi which meant that it sometimes converged to bad coefficient estimates. This in turn caused an unreasonable large deviance and thus very small p-values. I did fix it in version 1.3.1, which you can install from GitHub.

const-ae · 2020-11-23T16:50:23Z

I get NA for all stat fields

That was my mistake. I tried to calculate the Wald statistic from the log2FoldChange / lfcSE, which of course didn't work without lfcSE (where I use the LRT results for e.g. MA/volcano plots, but the Wald stat to compare 2 samples on x/y axis)

Okay that is reassuring to hear :)

One of the analyses I do downstream of the DE is to model whether fold changes have a discontinuity along the genome (e.g. due to an amplification). The "raw" log2FCs are too noisy, in order to get anything out I need to weigh them by their SE when using them e.g. in a change point analysis.

Oh, that is very interesting and a cool use-case I haven't considered so far. When you say, that you weight the log-fold changes by the standard error, do you do something like lfc / se, i.e. use the test-statistic? In that case, could you try to plug-in the test-statistic that glmGamPoi calculates? (You would still need to incorporate the sign of the LFC. It's just an idea, not sure it would work.)

mschubert · 2020-11-23T20:00:47Z

There can be several reasons for the discrepancy

Thanks for the linked figure! It seems to be a DESeq/glmGamPoi difference, no change with setting minmu or or updating glmGamPoi.

When you say, that you weight the log-fold changes by the standard error, do you do something like lfc / se, i.e. use the test-statistic?

No, I want to keep them separate to be able to call break points for different ploidies. In the case of the ecp package, their change point detection takes a weights parameter

In that case, could you try to plug-in the test-statistic that glmGamPoi calculates?

I don't think so. If the FCs are not changing, I still want to separate strong evidence for no change (lfc=0, low se, low stat) vs. weak evidence for no change (lfc=0, high se, low stat) (unless I misunderstood and the LRT stat is not testing H0)

const-ae · 2020-11-24T11:05:15Z

I opened an issue at glmGamPoi about the standard error estimates.

I am open to add a way to calculate the standard error and provide those estimates, but I would need to spend more time on this, than I currently can spare. I would, however, be open for a PR or any other input, but let's continue the discussion at const-ae/glmGamPoi#12

mschubert · 2020-11-25T11:49:13Z

Coming back on-topic: It seems I was too quick to judge the normMatrix argument as working. The log2FoldChange seems correct, as I noted before, e.g. if you look at the correlation between a DESeq run where I do not set it vs. one where I set it to 1 everywhere (maybe with a small deviation on very large negative lfcs):

However, when I look at the correlation of the pvalue, I get the following:

This doesn't look right.

The plots are from my own data, but you also see the discrepancy when plotting it for the normMatrix test case:

library(DESeq2)
makeExampleDESeqDataSet = DESeq2:::makeExampleDESeqDataSet

dds <- makeExampleDESeqDataSet(n=100, m=8)
dds$group <- factor(rep(1:2,times=4))
design(dds) <- ~group + condition

res1 <- DESeq(dds, test="LRT", reduced=~1, fitType="glmGamPoi")

nm = matrix(2, nrow=nrow(dds), ncol=ncol(dds))
dds <- estimateSizeFactors(dds, normMatrix=nm)
dds <- estimateDispersions(dds, fitType="glmGamPoi")
res2 <- nbinomLRT(dds, reduced=~1, type="glmGamPoi")

plot(log10(results(res1)$pvalue), log10(results(res2)$pvalue))

const-ae · 2020-11-25T12:54:40Z

Okay, I can reproduce the problem. Here is a bit shorter code that shows that the problem

library(glmGamPoi)
Y <- matrix(rnbinom(n = 30 * 10, mu = 4, size = 0.3), nrow = 30, ncol  =10)
annot <- data.frame(group = sample(c("A", "B"), size = 10, replace = TRUE),
                    cont1 = rnorm(10), cont2 = rnorm(10, mean = 30))
design <- model.matrix(~ group + cont1 + cont2, data = annot)
reduced_des <- model.matrix(~ group + cont1, data = annot)

fit1 <- glm_gp(Y, design = design)
offset <- matrix(log(fit1$size_factors), byrow = TRUE, nrow = nrow(Y), ncol = ncol(Y))
fit2 <- glm_gp(Y, design = design, offset = offset, size_factors = FALSE)

# The size factor is at the 6th position
testthat::expect_equal(fit1[-6], fit2[-6])

res1 <- test_de(fit1, reduced_design = reduced_des)
res2 <- test_de(fit2, reduced_design = reduced_des)

testthat::expect_equal(res1, res2)
#> Error: `res1` not equal to `res2`.
#> Component "pval": Mean relative difference: 0.5928957
#> Component "adj_pval": Mean relative difference: 0.4650072
#> Component "f_statistic": Mean relative difference: 0.9238129

^{Created on 2020-11-25 by the reprex package (v0.3.0.9001)}

The problem is that I currently, ignore the offset when I do the fit with the reduced design

const-ae · 2020-11-25T14:00:19Z

I fixed the issue in const-ae/glmGamPoi@0deacb3#diff-e923364f1bda63722e222d7080a9a0dd30a5cd77712563c71b0d18eea3ffa239R230

To test the new version, you will need to install glmGamPoi version 1.3.4

library(DESeq2)
dds <- DESeq2:::makeExampleDESeqDataSet(n=100, m=8)
dds$group <- factor(rep(1:2,times=4))
design(dds) <- ~group + condition

res1 <- DESeq(dds, test="LRT", reduced=~1, fitType="glmGamPoi")

nm = matrix(2, nrow=nrow(dds), ncol=ncol(dds))
dds <- estimateSizeFactors(dds, normMatrix=nm)
dds <- estimateDispersions(dds, fitType="glmGamPoi")


plot(log10(results(res1)$pvalue), log10(results(res2)$pvalue)); abline(0,1)

^{Created on 2020-11-25 by the reprex package (v0.3.0.9001)}

Thank you again @mschubert for your effort to track down the bug :)

mikelove · 2020-11-25T14:09:06Z

Thanks Michael and Constantin!

mikelove · 2020-12-02T00:09:01Z

I'm going to close this for now, but please feel free to re-open as needed.

const-ae mentioned this issue Nov 24, 2020

Return standard error of coefficient estimates const-ae/glmGamPoi#12

Closed

mschubert mentioned this issue Nov 25, 2020

P-values returned for low read counts sometimes seem too significant const-ae/glmGamPoi#13

Closed

mikelove closed this as completed Dec 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

glmGamPoi fit fails if normMatrix is provided #29

glmGamPoi fit fails if normMatrix is provided #29

mschubert commented Nov 19, 2020

const-ae commented Nov 20, 2020 •

edited

Loading

mschubert commented Nov 20, 2020

mikelove commented Nov 20, 2020

mikelove commented Nov 20, 2020

mschubert commented Nov 20, 2020

mschubert commented Nov 23, 2020

const-ae commented Nov 23, 2020

mschubert commented Nov 23, 2020

const-ae commented Nov 23, 2020

const-ae commented Nov 23, 2020

mschubert commented Nov 23, 2020

const-ae commented Nov 24, 2020

mschubert commented Nov 25, 2020

const-ae commented Nov 25, 2020

const-ae commented Nov 25, 2020

mikelove commented Nov 25, 2020

mikelove commented Dec 2, 2020

glmGamPoi fit fails if normMatrix is provided #29

glmGamPoi fit fails if normMatrix is provided #29

Comments

mschubert commented Nov 19, 2020

const-ae commented Nov 20, 2020 • edited Loading

mschubert commented Nov 20, 2020

mikelove commented Nov 20, 2020

mikelove commented Nov 20, 2020

mschubert commented Nov 20, 2020

mschubert commented Nov 23, 2020

const-ae commented Nov 23, 2020

mschubert commented Nov 23, 2020

const-ae commented Nov 23, 2020

const-ae commented Nov 23, 2020

mschubert commented Nov 23, 2020

const-ae commented Nov 24, 2020

mschubert commented Nov 25, 2020

const-ae commented Nov 25, 2020

const-ae commented Nov 25, 2020

mikelove commented Nov 25, 2020

mikelove commented Dec 2, 2020

const-ae commented Nov 20, 2020 •

edited

Loading