Question about state_correlation_error variance estimation. #10

Closed
tcbegley opened this issue Jul 23, 2020 · 4 comments

tcbegley commented Jul 23, 2020

Hello again,

I'm a little confused about what's going on in the find_sigma2_value function, specifically this line:

y <- MASS::mvrnorm(100000, rep(0.5,10), Sigma = cov_matrix(10, par^2, 1) )

why is the correlation set to 1 in cov_matrix? This results in the 10 columns of y being identical, and hence aggregations like apply(y, MARGIN = 2, mean) result in a constant vector, which seems to make the subsequent call to mean redundant. (EDIT - actually while we're on the subject, what's the significance of the choice of 10 here?)
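To make the point about identical columns concrete, here is a minimal sketch. It assumes cov_matrix(n, sigma2, rho) builds a covariance matrix with constant variance sigma2 and constant pairwise correlation rho; the helper cov_matrix_sketch below is a hypothetical stand-in, not the repository's function, and the value of par is arbitrary.

```r
# Hypothetical stand-in for cov_matrix(): n x n covariance with constant
# variance sigma2 and constant pairwise correlation rho (an assumption).
cov_matrix_sketch <- function(n, sigma2, rho) {
  m <- matrix(rho, nrow = n, ncol = n)
  diag(m) <- 1
  sigma2 * m
}

set.seed(1)
par <- 0.3  # arbitrary illustrative value, not taken from the model
y <- MASS::mvrnorm(1000, rep(0.5, 10), Sigma = cov_matrix_sketch(10, par^2, 1))

# With rho = 1 the covariance has rank 1, so every column matches the
# first one up to numerical error.
max_col_diff <- max(abs(y - y[, 1]))

# The per-column standard deviations therefore form a (near) constant vector.
col_sd_spread <- diff(range(apply(y, MARGIN = 2, sd)))
```

Under that assumption both max_col_diff and col_sd_spread come out negligibly small, so averaging over the 10 columns adds nothing.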

I notice a few lines down when state_correlation_error is created a correlation of 0.9 is used, and similarly for state_correlation_mu_b_walk and state_correlation_mu_b_T. Is there a reason the variance is estimated using a different correlation than is used to create the covariance matrices themselves?

A related question, why is the mean of y set to 0.5? Applying inv.logit to standard normal samples would result in transformed samples with mean 0.5, but on the logit scale does mean 0 make more sense? (I don't think I understand what's going on here properly yet, so I might be wrong, the question is basically just motivated by symmetry 🙂)
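The symmetry argument can be checked with a quick simulation. This is only a sketch: it uses base R's plogis as the inverse logit (assumed here to be equivalent to the inv.logit used in the function), and a logit-scale standard deviation of 1 chosen purely for illustration.

```r
set.seed(1)
z <- rnorm(100000, mean = 0, sd = 1)  # logit-scale draws centred at 0

# Centred at 0 on the logit scale: the transformed mean sits near 0.5.
mean_centered <- mean(plogis(z))

# Centred at 0.5 on the logit scale: the transformed mean moves above 0.5.
mean_shifted <- mean(plogis(z + 0.5))
```

So a logit-scale mean of 0.5 corresponds to a prior centred above 50% on the probability scale, which is what prompts the question.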

@elliottmorris
Copy link
Contributor

If you have some faster code we would love to use it! Paste it here or fork, whatever works.

On the specifics of the function, we just used those defaults to get the function working. The inverse logit transformation shouldn't matter. But if we're introducing bugs do let us know.

@tcbegley
Copy link
Author

Hey @elliottmorris

Thanks for responding. My question wasn't so much about the speed of the code, rather the logic.

My current understanding is that the prior for state-by-state Democratic support is specified as the inverse logit of a multivariate normal. The covariance matrix of the normal distribution is assumed to have a correlation of 0.9 (this line), and we want to choose the variance on the logit scale so that the prior has the desired variance on the probability scale, which is what find_sigma2_value is for. (If that's wrong then skip the rest of the question 😅)

You seem to be estimating the right variance with a Monte Carlo estimate: drawing normal samples on the logit scale for some fixed variance, computing the resulting variance / standard deviation on the probability scale, then optimising the observed error with respect to the variance. The things I didn't understand were:

  • When drawing the normal samples, shouldn't the correlation match the correlation we will ultimately use? I.e. 0.9 rather than 1.
  • Shouldn't the normal samples be centered at the origin? The transformed samples (after inverse logit) will have a mean of 0.5 in that case, but I can't otherwise see why you would draw samples with mean 0.5.
  • What's the significance of 10? I think you have 9 state variables, and maybe originally it was 10 in an earlier model? Is that all it is?
  • Why not calculate standard deviation from the transformed samples, rather than calculating the standard deviation on the samples then transforming it?
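To illustrate the last point and the calibration described above, here is a rough sketch of what I mean. It is not the repository's code: the function names prob_sd and interval_sd, the target of 0.05, and the use of uniroot are all my own assumptions, and plogis stands in for inv.logit. Since only the marginal standard deviation is being matched, univariate draws are used.

```r
set.seed(1)
z <- rnorm(100000)  # fixed standard normal draws, reused for every sigma

# Standard deviation on the probability scale, computed from the
# transformed samples themselves.
prob_sd <- function(sigma, mu = 0) sd(plogis(mu + sigma * z))

# Alternative: transform the ends of a +/- 1.96 sd interval on the logit
# scale, then rescale the half-width back to one standard deviation.
interval_sd <- function(sigma, mu = 0) {
  (plogis(mu + 1.96 * sigma) - plogis(mu - 1.96 * sigma)) / (2 * 1.96)
}

target_sd <- 0.05  # hypothetical target, not a value from the model

# Solve for the logit-scale sigma whose probability-scale sd hits the target.
fit <- uniroot(function(s) prob_sd(s) - target_sd, interval = c(1e-4, 5))
sigma_hat <- fit$root

# Compare the two summaries at the fitted sigma.
sd_from_samples <- prob_sd(sigma_hat)
sd_from_interval <- interval_sd(sigma_hat)
```

Reusing the same draws z for every candidate sigma keeps the objective deterministic, which makes the root search better behaved than redrawing samples on each call.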

I guess some of these could be considered "bugs" if my understanding is correct, but I am reluctant to submit any patches / changes when I don't actually know for sure that I've understood what's going on. Oh and let me know if there's a more appropriate venue for these sorts of questions.

@elliottmorris
Copy link
Contributor

I think some of this could have been bad practice but ultimately the program was providing stable estimates. Anyway it doesn't matter as the function has been removed from the latest code. Thanks for flagging.

@tcbegley
Copy link
Author

tcbegley commented Aug 3, 2020

Ok, thanks for responding, will take a look at the new version.
