Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colesky factorization failed #11

Closed
eugenegardner opened this issue May 23, 2022 · 9 comments
Closed

Colesky factorization failed #11

eugenegardner opened this issue May 23, 2022 · 9 comments

Comments

@eugenegardner
Copy link

Hello,

I am having some issues with running STAAR using a pre-computed GRM. When running fit_null_glmmkin using the following:

obj_nullmodel <- fit_null_glmmkin(bmi ~ age + age_squared + sex + wes_batch + PC1 + PC2 + PC3 + 
    PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10, data=data_for_STAAR, id="FID", family=gaussian(link="identity"), kins = sparse_kinship, verbose = T)

Where data_for_STAAR is a simple data.table of covariates listed in the formula for ~410k European-ancestry individuals, I get the following error:

[1] "kins is a sparse matrix."
Fixed-effect coefficients:
  (Intercept)           age   age_squared           sex wes_batch450k  wes_batch50k           PC1           PC2           PC3           PC4           PC5           PC6           PC7           PC8 
-1.7322581420  0.0544036344 -0.0004376397  0.1763296286  0.0160212893  0.0141815840  0.0010930668  0.0001362612  0.0009042656  0.0046527898  0.0026141738  0.0001132027  0.0010270634  0.0006944696 
          PC9          PC10 
-0.0022512038  0.0015808292 

Iteration  1 :
Variance component estimates:
[1] 0.4312874 0.5479367
Fixed-effect coefficients:
 [1] -1.756333e+00  5.511107e-02 -4.426256e-04  1.757527e-01  1.550372e-02  1.367916e-02  1.083162e-03  1.036081e-04  7.883396e-04  4.539559e-03  2.682355e-03  5.708448e-05  1.023603e-03  6.912150e-04
[15] -2.213463e-03  1.590685e-03

Iteration  2 :
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'chol2inv': internal_chm_factor: Cholesky factorization failed
In addition: Warning message:
In .local(x, ...) :
  Cholmod warning 'not positive definite' at file ../Cholesky/t_cholmod_rowfac.c, line 430

This GRM being used is identical to the one provided in Bycroft et al. for the main UK Biobank genetic publication from 2018 (https://doi.org/10.1038/s41586-018-0579-z) that ranges from 0.0442 - 0.5. I have also tried simply multiplying the kinship factor by 2 to bring it in-line with that generated by the kinship2 package, but that failed as well.

Do you have any suggestions/advice to get this working? Oddly I was able to run STAAR perfectly fine with a previous GRM that was calculated using a (poorly optimised) GRM from SAIGE, but I wanted to use a more accurate representation.

Thanks!

@xihaoli
Copy link
Owner

xihaoli commented May 23, 2022

Hi Eugene,
Thanks for your message. This error seems to be caused by some numerical issues when fitting the null model. One quick consideration is to use the fitNullModel() function from the GENESIS package to fit the null model (with the current GRM), and then use the genesis2staar_nullmodel() function from the STAARpipeline package to convert the GENESIS null model object to a STAAR null model object. Both STAAR and GENESIS share the same statistical null model when performing association tests.
Please feel free to let me know if this works.
Best,
Xihao

@eugenegardner
Copy link
Author

Hello Xihao,

Thank you for the quick reply. Unfortunately, when I implemented the null model via GENESIS as suggested:

obj_nullmodel <- fitNullModel(x = data_for_STAAR, outcome = pheno_name, covars = covariates, cov.mat = sparse_kinship, family = "gaussian", rescale = "none")

I get a very similar error:

Computing Variance Component Estimates...
Sigma^2_A     log-lik     RSS
[1]  4.958985e-01  4.958985e-01 -5.903154e+05  1.313380e+00
[1]  9.029733e-01  4.702472e-01 -5.901148e+05  1.061299e+00
Error in .local(x, ...) : 
  internal_chm_factor: Cholesky factorization failed
In addition: Warning message:
In .local(x, ...) :
  Cholmod warning 'not positive definite' at file ../Cholesky/t_cholmod_rowfac.c, line 430

Any suggestions?

I would hope this is very easy to reproduce. I simply acquired the sparse matrix (.rel) from The UK Biobank website, and converted it into a symmetric dsTMatrix from the Matrix package. I then limited it to individuals included in the covariate table (given to 'x' above).

@xihaoli
Copy link
Owner

xihaoli commented May 24, 2022

Hi Eugene,

Thank you very much for letting me know.

In this case, I think you could try some different thresholds of the sparse GRM by gradually increasing the threshold from 0.0442 to something slightly higher and see if the null model would fit. For this route, it would be ideal to keep all pairwise (estimated) relatedness within a cluster, even if they are below the threshold, rather than setting all entries below the threshold to 0 in the sparse GRM. To do this, the fit_null_glmmkin() function from the STAAR package has a parameter thresh to convert an input dense GRM to a sparse GRM as defined above and then fit the null model. Equivalently, you could use the makeSparseMatrix() function from the GENESIS package to directly operate on the GRM before fitting the null model.

Hope this helps!

Best,
Xihao

@eugenegardner
Copy link
Author

Hi Xihao,

I don't think there is a 'thresh' parameter? Did you mean kins_cutoff?

I will try the other method as well.

@eugenegardner
Copy link
Author

eugenegardner commented May 24, 2022

Hi Xihao,

I tried both approaches and still got the same error. Is markSparseMatrix() really the solution given that the matrix is already sparse?

I used a slightly different (external) approach to threshold and it seemed to work:

sparse_kinship@x <- sapply(sparse_kinship@x, function(x) if_else(x < 0.05, 0.05, x))

Where 0.05 is the threshold. This seems a bit like a suboptimal solution since it misrepresents the relatedness between individuals. Did I misunderstand your comment of:

For this route, it would be ideal to keep all pairwise (estimated) relatedness within a cluster

?

@xihaoli
Copy link
Owner

xihaoli commented May 24, 2022

Hi Eugene,

  1. My apologies, the thresh parameter refers to kins_cutoff in the fit_null_glmmkin() function;
  2. Your understanding is exactly correct. The approach that "keeps all pairwise relatedness within a cluster, even if they are below the threshold" is considered to be more optimal than the "direct thresholding" approach, since the latter misrepresents the relatedness between individuals;
  3. The makeSparseMatrix() function implements the "optimal" approach as described above, however, it assumes a dense GRM as input. To do this, you could first convert the sparse GRM back to a dense GRM and use that as the input of either the fit_null_glmmkin() function with different values of kins_cutoff (or the makeSparseMatrix() function, with different values of thresh).

Hope this is more clear.

Best,
Xihao

@eugenegardner
Copy link
Author

Hello Xihao,

I've recently noticed via testing additional phenotypes that this seems very dependent on the case/control imbalance of the phenotype being studied. In a set of ~200,000 individuals, phenotypes with a prevalence of < ~0.2% seem to always run into the Cholesky issue I mentioned in my original post. Just wondering if you have any advice on how to handle this as the thresholding issue now seems suboptimal when it is not possible to know a priori what the threshold should be in the first place.

@xihaoli
Copy link
Owner

xihaoli commented Jun 14, 2022

Hi Eugene,
Thanks for letting me know. We will take a look at this and get back to you.
Best,
Xihao

@hanchenphd
Copy link

Hello Xihao,

I've recently noticed via testing additional phenotypes that this seems very dependent on the case/control imbalance of the phenotype being studied. In a set of ~200,000 individuals, phenotypes with a prevalence of < ~0.2% seem to always run into the Cholesky issue I mentioned in my original post. Just wondering if you have any advice on how to handle this as the thresholding issue now seems suboptimal when it is not possible to know a priori what the threshold should be in the first place.

Did you always see the Cholesky issue in this subset (regardless of which phenotypes), or just sometimes? The kinship coefficients provided by the UK Biobank do not give a positive definite kinship matrix (due to the cut-off at 0.0442), but this may or may not be the issue for you, depending on which subset you were using. If the kinship matrix for your subset is not positive definite, I think there was really just one block of a few hundred individuals that caused the problem. You could drop these individuals to avoid the Cholesky issue. A better solution would be the UK Biobank fixes this block of individuals by filling in values that have been zeroed out (e.g., 0.0441) to make it positive definite, and release a new kinship matrix.

If you only saw this issue sometimes, and got a counter-example that successfully converged in the same subset (with a different or a simulated phenotype), then it was likely a convergence problem related to your case/control phenotypes.

Best,
Han

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants