add fixed (definition variable) covariates to umxACE #21

tbates · 2017-08-22T01:30:22Z

Currently, users wanting to use covariates are encouraged to use umx_residualize on their data. This doesn't work for ordinal variables (it's not good to turn sex from a binary into a continuous bimodal distribution...). It's also nice to have the means in the model, and to retain the raw data.
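To make the ordinal problem concrete, here is a minimal sketch in base R with simulated data. It uses a plain lm() as a stand-in for the residualizing step rather than calling umx_residualize itself, and the variable names are illustrative only.

```r
# Simulated data; names are illustrative and not taken from umx.
set.seed(1)
n   = 200
age = rnorm(n, mean = 40, sd = 10)
sex = rbinom(n, size = 1, prob = 0.5)  # binary variable coded 0/1

# Residualize sex on age as one would for a continuous outcome
# (lm() is a stand-in for the residualizing step).
sex_resid = residuals(lm(sex ~ age))

length(unique(sex))        # still two distinct values
length(unique(sex_resid))  # many distinct values: no longer a binary variable
hist(sex_resid)            # two clumps, i.e. a continuous bimodal distribution
```

The residualized column can no longer be treated as a thresholded variable, which is why the covariate has to enter the model itself.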

v 2.0 of umx should support covariates, report how many rows were lost, and print the means and covariate betas in the summary.

tbates · 2020-03-24T00:08:15Z

So to do this across all twin models, we need a method that can handle ordinal and continuous data, still handle models with no covariates, and work at the level of xmu_assemble_twin_supermodel and xmu_make_top_twin so that models using these inherit the new capability.

data.definition variables can't be added to model$top, because they involve data that is present only in the data models (e.g., model$MZ).

So... need to add average effects matrices and beta matrices to top.

Add betas matrix to top
Add data.def matrix to each data group
Delete the (currently shared) expMean matrix in top
Add an expMean algebra to each data group.
Figure out bounds for betas...
Implement without mucking up binary data (fixed mean and variance with one movable threshold)
Wrap as much as possible into handlers
... more side effects to be handled.
Figure out a nice reporting method that doesn't break summaries, plots, etc., but still makes the betas accessible.

mcneale · 2020-03-24T11:36:20Z

Binary variables can be included. The trick is that the means formula contains only the regressions on the covariates, with no grand mean/intercept parameter. In other cases, the mean can be a free parameter (assuming that one uses the Mehta et al. trick of fixing two adjacent thresholds).
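As a rough numeric illustration of that means formula, the sketch below computes row-specific expected means in plain R. The matrix names loosely follow the thread (intercept, meansBetas), but this is not OpenMx code, and the variable "dep" and all values are made up for the example.

```r
# Illustrative values only; plain R, not OpenMx.
# "ht" is continuous with a free intercept; "dep" is a hypothetical binary
# variable whose intercept is fixed at 0 so only the covariate regressions remain.
intercept  = c(ht = 1.6, dep = 0)
meansBetas = rbind(age    = c(ht = -0.005, dep = 0.02),
                   cohort = c(ht = -0.046, dep = 0.10))

# Definition variables: one row of covariate values per data row
defVars = rbind(c(age = 30, cohort = 1),
                c(age = 45, cohort = 2))

# Row-specific expected means: intercept + defVars %*% meansBetas
expMean = sweep(defVars %*% meansBetas, 2, intercept, "+")
expMean
```

Because the covariate values differ by row, the expected means differ by row too, which is why the expMean algebra has to live in each data group rather than in top.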

Yes this would be very nice to have!

tbates · 2020-03-28T23:33:48Z

create new xmu_make_TwinSuperModel function
factor out all the combinations of data (cont, all cont WLS, mix inc. bin, mix inc. ord, etc.) into separate helpers for making top, MZ, DZ vary in the ways they each require
merge xmu_assemble_twin_supermodel into new xmu_make_TwinSuperModel

tbates · 2020-04-19T02:20:12Z

so...

library(umx)
data(twinData)
# Create per-twin covariate columns (cohort1, cohort2) from part
twinData$cohort1 = twinData$cohort2 = twinData$part
mzData = twinData[twinData$zygosity %in% "MZFF", ]
dzData = twinData[twinData$zygosity %in% "DZFF", ]

m2 = umxACE(selDVs = "ht", selCovs = c("age", "cohort"), sep = "", dzData = dzData, mzData = mzData)

umxSummaryACE(m2, digits = 3)

ACE -2 × log(Likelihood) = 5944.831
Standardized solution

	a1	c1	e1
ht	0.929	0.083	0.36

Means: Intercept and (raw) betas from model$top$intercept and model$top$meansBetas

	ht1	ht2
intercept	16.534	16.534
age	-0.005	-0.005
cohort	-0.046	-0.046

tbates · 2020-04-19T20:38:14Z

Interesting downside: having definition variables in a model increases run time about 20-fold: a 4 s ACE model takes 90 s with definition-variable covariates in the means model. But it is all working, and selCovs is now supported in:

umxCP
umxIP
umxACEv

mcneale · 2020-04-20T13:28:32Z

Great that selCovs is working more broadly! It is unsurprising that using definition variables slows things down. Remember that with FIML, each row of the data has its own set of path coefficients (some may be the same across different rows, others may differ on an individual basis). So computationally, the expected covariance matrix has to be rebuilt and inverted for each data row. OpenMx has some economies in doing this, looking at whether the definition variables or the pattern of observed variables differs from the previous row, and not bothering to reconstruct or invert if the result is already known. So the slowdown largely depends on the number of unique covariance matrices the algorithm has to invert.

So it seems that covariates with ordinal variables analyzed by FIML are good to go. Of course, there's a limit to the number of variables that can reasonably be jointly analyzed as ordinal, due to the curse of dimensionality. I'd probably not go further than about a dozen total.
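The counting idea can be sketched in base R. This is only an illustration of "rebuild when the definition variables or missingness pattern change from the previous row", not OpenMx's actual algorithm; count_rebuilds is a hypothetical helper and the data are made up.

```r
# Hypothetical helper: count how often a rebuild would be triggered when rows
# are scanned in order, assuming a rebuild whenever the definition-variable
# values or the missingness pattern differ from the previous row.
count_rebuilds = function(df, defCols, obsCols) {
  if (nrow(df) == 0) return(0L)
  defKey  = do.call(paste, c(df[defCols], sep = "|"))
  missKey = apply(is.na(df[obsCols]), 1, paste, collapse = "")
  key = paste(defKey, missKey, sep = "#")
  1L + sum(key[-1] != key[-length(key)])
}

df = data.frame(
  age = c(30, 45, 30, 45, 30, 45),
  ht1 = c(1.6, 1.7, NA, 1.8, 1.5, 1.6),
  ht2 = c(1.6, 1.7, 1.6, 1.8, 1.5, 1.6)
)

unsorted_count = count_rebuilds(df, "age", c("ht1", "ht2"))
sorted_df      = df[order(df$age, is.na(df$ht1)), ]
sorted_count   = count_rebuilds(sorted_df, "age", c("ht1", "ht2"))
c(unsorted = unsorted_count, sorted = sorted_count)
```

In this toy data, grouping rows with identical covariate values and missingness halves the count, and a continuous covariate with a distinct value on every row would give no such savings.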

tbates · 2020-04-20T14:01:10Z

Yes, multiple covariates are working for most models, for binary, ordinal, and continuous variables, and for mixtures of these.

tbates · 2020-04-20T15:25:05Z

The speed comment was more a prompt to consider implementing a regression-based method under the hood for the all-continuous case, or at least to note to the user that umx_residualize will be many times faster.

mcneale · 2020-04-20T15:30:26Z

Yep. I note that it would be possible to residualize the continuous variables and only apply the definition approach to the ordinal ones, controlled by residualizeContinuousVars=TRUE or some such argument. In practice this would make the modeling steps faster because there would be fewer parameters to optimize. It would not permit testing of whether different variables' regressions on covariates are equal, although I don't think I've seen such usage. In factor analysis a Rasch model essentially equates factor loadings, but it's not the situation here.
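A minimal base R sketch of that split, assuming a hypothetical helper named residualize_continuous (not part of umx) and simulated data: numeric columns are replaced by their residuals on the covariates, while ordered factors are left untouched for the definition-variable approach.

```r
# Hypothetical helper (not part of umx): regress each numeric column in vars
# on the covariates and replace it with its residuals; other columns
# (e.g. ordered factors) are left as they are.
residualize_continuous = function(df, vars, covs) {
  for (v in vars) {
    if (is.numeric(df[[v]])) {
      fit = lm(reformulate(covs, response = v), data = df, na.action = na.exclude)
      df[[v]] = residuals(fit)  # na.exclude pads missing rows with NA
    }
  }
  df
}

# Simulated data: one continuous and one ordinal outcome, both related to age
set.seed(2)
df = data.frame(age = rnorm(100, mean = 40, sd = 10))
df$bmi = 22 + 0.05 * df$age + rnorm(100)
df$dep = cut(0.03 * df$age + rnorm(100), breaks = c(-Inf, 1, 2, Inf),
             ordered_result = TRUE)

out = residualize_continuous(df, vars = c("bmi", "dep"), covs = "age")
str(out)
```

After this step only the ordinal columns would still need covariates in the means model, so fewer betas are left to estimate.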

tbates · 2020-04-20T15:33:05Z

Yeah: will do that. Not always a win, but for the “lots of ordinal and lots of continuous” case it would be a dealmaker. Good suggestion!

mcneale · 2020-04-21T14:33:54Z

Great. Situations with many continuous and only a few ordinal variables would see the greatest performance improvements. Neuroimaging & diagnostic outcome analyses are good examples of the need.

tbates self-assigned this Aug 22, 2017

tbates added this to the version 2.0 milestone Aug 22, 2017

tbates added the enhancement label Aug 22, 2017

tbates modified the milestones: version 2.0, Version 2.5 Mar 20, 2018

tbates added this to In progress in new models Nov 19, 2018

tbates added the top5 marked as an active goal: close before working on other issues label Mar 11, 2019

tbates added this to TODO in incremental features May 11, 2019

tbates closed this as completed Apr 19, 2020

incremental features automation moved this from TODO to Done Apr 19, 2020

new models automation moved this from In progress to Done Apr 19, 2020

tbates mentioned this issue Apr 20, 2020

means: Allow residualizeContinuousVars=TRUE #116

Open
