add fixed (definition variable) covariates to umxACE #21

Closed
tbates opened this issue Aug 22, 2017 · 11 comments
Labels
enhancement top5 marked as an active goal: close before working on other issues

Owner

tbates commented Aug 22, 2017

Currently, users wanting to use covariates are encouraged to use umx_residualize on their data. This doesn't work for ordinal variables (it's not good to turn sex from a binary into a continuous-bimodal distribution...), and it's also nice to have the means in the model and to retain the raw data.

v2.0 of umx should support covariates, report how many rows were lost, and print the means and covariate betas in the summary.
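
As a rough illustration of the residualization approach mentioned above, here is a base-R sketch. It is not umx_residualize itself: the data are simulated and all variable names are made up for the example. It regresses a continuous trait on covariates and keeps the residuals, then does the same to a binary trait to show that the result is no longer binary.

```r
set.seed(1)
n   <- 200
age <- rnorm(n, mean = 40, sd = 10)
sex <- rbinom(n, size = 1, prob = 0.5)
ht  <- 160 + 0.1 * age + 12 * sex + rnorm(n, sd = 5)  # simulated continuous trait
dep <- rbinom(n, size = 1, prob = 0.3)                # simulated binary trait
df  <- data.frame(age, sex, ht, dep)

# Residualize the continuous trait: fine, the residuals are still continuous
ht_resid <- residuals(lm(ht ~ age + sex, data = df))

# Residualize the binary trait: the 0/1 coding is lost, because every row
# now carries its own real-valued residual
dep_resid <- residuals(lm(dep ~ age + sex, data = df))
n_unique_dep_resid <- length(unique(round(dep_resid, 10)))
```

After this, `ht_resid` is uncorrelated with `age`, while `dep_resid` takes many distinct values instead of two, which is why residualizing is a poor fit for ordinal data.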

tbates self-assigned this Aug 22, 2017
tbates added this to the version 2.0 milestone Aug 22, 2017
tbates modified the milestones: version 2.0, Version 2.5 Mar 20, 2018
tbates added this to In progress in new models Nov 19, 2018
tbates added the top5 (marked as an active goal: close before working on other issues) label Mar 11, 2019
tbates added this to TODO in incremental features May 11, 2019
Owner Author

tbates commented Mar 24, 2020

So to do this across all twin models, we need a method that can handle ordinal and continuous data, still handle models with no covariates, and work at the level of xmu_assemble_twin_supermodel and xmu_make_top_twin, so that models using these inherit the new capability.

Definition variables (data.*) can't be added to model$top, because they refer to data that is present only in the data groups (e.g., model$MZ).

So... need to add average effects matrices and beta matrices to top.

  • Add betas matrix to top
  • Add data.def matrix to each data group
  • Delete the (currently shared) expMean matrix in top
  • Add an expMean algebra to each data group.
  • Figure out bounds for betas...
  • Implement without mucking binary data (fixed mean and variance with one movable threshold)
  • Wrap as much as possible into handlers
  • ... more side effects to be handled.
  • Figure out a nice reporting method to not fuck up all summary and plots etc, but still make the betas accessible.
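
The split described in the list above (shared betas in top, row-specific definition variables in each data group, and an expected-means expression evaluated per group) can be sketched in base R. This is only an illustration of the arithmetic, not the umx or OpenMx implementation: the parameter values, the data, and the helper name `exp_mean` are all hypothetical.

```r
# Shared parameters (conceptually what would live in "top"); hypothetical values
intercept <- 16.5
betas     <- c(age = -0.005, cohort = -0.046)

# Row-specific definition variables (these only exist alongside the data,
# i.e. in each data group); two made-up MZ pairs
mz <- data.frame(age1 = c(20, 35), age2 = c(20, 35),
                 cohort1 = c(1, 2), cohort2 = c(1, 2))

# Expected means for one group: intercept + betas * definition variables,
# evaluated for every row and for each twin
exp_mean <- function(d, intercept, betas) {
  cbind(
    ht1 = intercept + betas[["age"]] * d$age1 + betas[["cohort"]] * d$cohort1,
    ht2 = intercept + betas[["age"]] * d$age2 + betas[["cohort"]] * d$cohort2
  )
}

mz_means <- exp_mean(mz, intercept, betas)
```

Because the expected means now depend on each row's covariate values, they cannot be a single shared matrix in top, which is why the list above moves the means expression into each data group.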

Collaborator

mcneale commented Mar 24, 2020

Binary variables can be included. The trick is that the means formula contains only the regressions on the covariates, with no grand mean/intercept parameter. In other cases, the mean can be a free parameter (assuming that one uses the Mehta et al. trick of fixing two adjacent thresholds).

Yes this would be very nice to have!
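
To make the binary-variable trick above concrete, here is a small base-R sketch under a standard liability-threshold assumption (latent liability with unit variance, one free threshold). The threshold, beta, and covariate values are hypothetical. It shows the means formula containing only the covariate regression, and why adding an intercept would be redundant with the threshold.

```r
threshold <- 0.8            # hypothetical free threshold
beta_age  <- 0.02           # hypothetical regression of liability on age
age_c     <- c(-10, 0, 10)  # centred ages for three made-up rows

# Means formula with regressions only: no intercept term
liability_mean <- beta_age * age_c

# Probability of being above the threshold for each row
p_affected <- pnorm(threshold, mean = liability_mean, sd = 1, lower.tail = FALSE)

# Adding an intercept k and shifting the threshold by the same k gives
# identical probabilities, so the two are not separately identified
k <- 1
p_shifted <- pnorm(threshold + k, mean = liability_mean + k, sd = 1,
                   lower.tail = FALSE)
```

Here `p_affected` and `p_shifted` are equal, which is the reason the intercept is left out when the threshold is free.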

Owner Author

tbates commented Mar 28, 2020

  • create new xmu_make_TwinSuperModel function
  • factor out all the combinations of data (cont, all cont WLS, mix inc. bin, mix inc. ord, etc.) into separate helpers for making top, MZ, DZ vary in the ways they each require
  • merge xmu_assemble_twin_supermodel into new xmu_make_TwinSuperModel

Owner Author

tbates commented Apr 19, 2020

so...

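library(umx)
data(twinData)
# Make twin-specific copies of a "cohort" covariate (here taken from the part column)
twinData$cohort1 = twinData$cohort2 = twinData$part
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

# ACE model of height with age and cohort as covariates in the means model
m2 = umxACE(selDVs = "ht", selCovs = c("age", "cohort"), sep = "", dzData = dzData, mzData = mzData)

umxSummaryACE(m2, digits = 3)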

ACE -2 × log(Likelihood) = 5944.831
Standardized solution

      a1    c1    e1
ht 0.929 0.083  0.36

Means: Intercept and (raw) betas from model$top$intercept and model$top$meansBetas

             ht1    ht2
intercept 16.534 16.534
age       -0.005 -0.005
cohort    -0.046 -0.046

tbates closed this as completed Apr 19, 2020
incremental features automation moved this from TODO to Done Apr 19, 2020
new models automation moved this from In progress to Done Apr 19, 2020
Owner Author

tbates commented Apr 19, 2020

Interesting downside: having def vars in a model increases model run time about 20-fold: a 4 s ACE model takes 90 s with definition-variable covariates in the means model. But it's all working, and selCovs is now also supported in:

  • umxCP
  • umxIP
  • umxACEv

Collaborator

mcneale commented Apr 20, 2020

Great that selCovs is working more broadly! It is unsurprising that using definition variables slows things down. Remember that with FIML, each row of the data has its own set of path coefficients (some may be the same across different rows, others may differ on an individual basis). So computationally, the expected covariance matrix has to be rebuilt and inverted for each data row. OpenMx has some economies in doing this: it checks whether the definition variables or the pattern of observed variables differ from the previous row, and does not bother to reconstruct or invert if the result is already known. So the slowdown largely depends on the number of unique covariance matrices the algorithm has to invert.

So it seems that covariates with ordinal variables analyzed by FIML is good to go. Of course, there's a limit to the number of variables that can reasonably be jointly analyzed as ordinal, due to the curse of dimensionality. I'd probably not go further than about a dozen total.
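
The point above about run time depending on how many distinct cases have to be handled can be illustrated with a simple count. This base-R sketch is not how OpenMx works internally; it only counts distinct combinations of definition-variable values and missingness patterns in simulated data. The helper name `count_unique_patterns` and all variable names are hypothetical.

```r
# Count distinct (definition-variable values, missingness pattern) combinations
count_unique_patterns <- function(data, def_vars, obs_vars) {
  defs    <- data[, def_vars, drop = FALSE]
  missing <- is.na(data[, obs_vars, drop = FALSE])
  colnames(missing) <- paste0("miss_", obs_vars)
  nrow(unique(cbind(defs, missing)))
}

set.seed(2)
d <- data.frame(
  ht1 = rnorm(100),
  ht2 = rnorm(100),
  sex = rbinom(100, 1, 0.5),          # covariate with very few distinct values
  age = round(runif(100, 18, 80), 2)  # covariate that is nearly unique per row
)
d$ht2[1:10] <- NA                     # introduce a second missingness pattern

n_sex <- count_unique_patterns(d, "sex", c("ht1", "ht2"))
n_age <- count_unique_patterns(d, "age", c("ht1", "ht2"))
```

A coarse covariate such as `sex` yields only a handful of distinct cases, whereas a near-continuous one such as `age` yields close to one per row, so fewer results can be reused.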

Owner Author

tbates commented Apr 20, 2020

Yes, multiple covariates are working for most models, for binary, ordinal, and continuous variables, and for mixtures of these.

Owner Author

tbates commented Apr 20, 2020

The speed comment was more a prompt to consider implementing a regression-based method under the hood for the all-continuous case, or at least to note to the user that umx_residualize will be many times faster.

Collaborator

mcneale commented Apr 20, 2020

Yep. I note that it would be possible to residualize the continuous variables and only apply the definition approach to the ordinal ones. residualizeContinuousVars=TRUE or some such argument. In practice this would make the modeling steps faster because there would be fewer parameters to optimize. It would not permit testing of whether different variables' regressions on covariates are equal, although I don't think I've seen such usage. In factor analysis a Rasch model essentially equates factor loadings, but it's not the situation here.
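
A minimal sketch of the mixed strategy suggested above: residualize only the continuous variables and leave the ordinal ones untouched for the definition-variable approach. This is base R, not umx code. The function name `residualize_continuous` is hypothetical, the data are simulated, and the sketch assumes ordinal variables are stored as ordered factors.

```r
# Hypothetical helper: regress each numeric variable in `vars` on `covs`
# and replace it with its residuals; non-numeric (e.g. ordered factor)
# variables are returned unchanged
residualize_continuous <- function(data, vars, covs) {
  for (v in vars) {
    if (is.numeric(data[[v]])) {
      fit <- lm(reformulate(covs, response = v), data = data,
                na.action = na.exclude)  # keeps NA rows aligned with the data
      data[[v]] <- unname(residuals(fit))
    }
  }
  data
}

set.seed(3)
n <- 100
d <- data.frame(age = rnorm(n, mean = 40, sd = 10))
d$vol1 <- 5 + 0.05 * d$age + rnorm(n)                              # continuous
d$dx1  <- factor(rbinom(n, 1, 0.3), levels = 0:1, ordered = TRUE)  # ordinal

out <- residualize_continuous(d, vars = c("vol1", "dx1"), covs = "age")
```

In `out`, `vol1` no longer correlates with `age`, while `dx1` is still the original ordered factor and would keep its covariates in the means model.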

Owner Author

tbates commented Apr 20, 2020

Yeah: will do that. Not always a win, but for the “lots of ord and lots of cont” case it would be a dealmaker. Good suggestion!

Collaborator

mcneale commented Apr 21, 2020

Great. Situations with many continuous and only a few ordinal variables would see the greatest performance improvements. Neuroimaging & diagnostic outcome analyses are good examples of the need.
