Correlation matrix [cholesky] bijector #400
I have some ideas on how to achieve this using a UnitNorm and TransformAxis bijector, if this is still up for grabs? |
@bloops I'm happy if you do it, so I can write about it earlier :-) |
Hi @bloops can you please let me know as soon as there's anything I can test (even if it's still a bit of a work in progress)? I'm very curious to test with my dataset :-) |
Sure, I will keep you updated. Planning to take a stab at this this weekend. Thanks!
|
great, thanks :-) |
Hi @skeydan, I have a prototype for this now. This might have bugs and the performance has a lot of room for improvement. So far, some initial testing looks promising. :) The LKJ bijector is a combination of two bijectors. The first one maps …

New bijectors:
- HalfSphere bijector
- TransformRowTriL

Putting it all together:
- Generate fake data with a known correlation matrix. Centered at …
- Define our simple model's log posterior using …
|
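The core idea behind the prototype, mapping an unconstrained vector onto the Cholesky factor of a correlation matrix by normalizing each row onto the unit sphere, can be sketched in plain numpy. This is an illustrative reconstruction under stated assumptions, not the actual HalfSphere/TransformRowTriL code; the function name `corr_cholesky_forward` is hypothetical:

```python
import numpy as np

def corr_cholesky_forward(x, n):
    """Hypothetical sketch: map an unconstrained vector x (length n*(n-1)/2)
    to the Cholesky factor L of an n x n correlation matrix. Fill the
    strictly lower triangle with x, put 1s on the diagonal, then rescale
    every row to unit Euclidean norm so L @ L.T has ones on its diagonal."""
    L = np.eye(n)
    L[np.tril_indices(n, k=-1)] = x
    L /= np.linalg.norm(L, axis=1, keepdims=True)  # project rows onto the unit sphere
    return L

x = np.array([-0.9, 0.3, 1.7])    # any real-valued input is valid
L = corr_cholesky_forward(x, n=3)
R = L @ L.T                       # a valid correlation matrix
```

Because every row of `L` has unit norm, `R` automatically has a unit diagonal and is positive semidefinite, which is exactly the constraint the sampler needs the unconstrained space to respect.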
Hi Anudhyan, thanks so much for the quick implementation! Unfortunately I have problems testing this locally...
I assume I need to integrate the new files into an actual build instead of sourcing them locally, could that be? (Unfortunately I also need to figure out how to make my local TF build work again, as it seems like current TF does not work with …) Anyway, thanks again! Here is the code that throws that error (on actual sampling):
|
Ah, I think I forgot to import numpy in the original code. Did you already try adding this import: |
Yeah, I did... |
Hi @skeydan I have modified your code and added on top of your earlier colab to include the new bijector. (Also, I've fixed a dtype bug in the new bijector.) https://colab.sandbox.google.com/drive/1BTnfyl_XtDHW-GpFawWOsQw4O3Gukukx It runs the chain now, although I haven't checked whether it mixes well. There were (probably unrelated) problems with getting trace_fn to work correctly, so I removed that part in the colab for now. |
Great, thanks!! Can’t test now but I’ll check tomorrow! |
Hi @bloops thanks again. I've also inserted the actual data into the colab if you'd like to test for yourself? (As an aside, the …) As to the sampling behavior, I see the following now. The upper row, columns 1 to 4, has the components of … Comparing this to how it looked without the bijectors, it's visible that the … Would you have an idea why this could be? |
Oops, I may be comparing apples with oranges here. The non-bijector version used 1000 sampling and burnin steps each (as chains were only starting to converge after ~ 400 steps overall), while the bijector one uses 500. I'll re-test using 1000 now. |
Me again. I see that Stan too arrives at a reconstructed correlation of -0.5, so this should be okay :-) Now comparing 500 steps in Stan against 500 steps with TFP, I see the following pooling behavior: First, these are the empirical estimates and shrunk parameters for intercept and slope, which look a bit different between TFP and Stan: But on the final outcome scale the results look very similar: Overall, I'd say this runs well, although I think I'd like to experiment more with the number of steps, and with initial step sizes. Do you think the bijectors could be added to …? |
Hi Sigrid, that's awesome progress and I'm glad you got it working locally! It's exciting to see this come together :)
Regarding why the reconstructed correlation is -0.5 while the real correlation is -0.7: could this be because your LKJ concentration prior is too strong? If you set the concentration to 1.0 (or say 1.1) then it should be less strong of a regularizer and the reconstruction should be closer to the real one. I didn't understand what you mean by empirical estimates and shrunk parameters of the slope and intercept. Did you mean the posterior estimates of the …? One reason for the discrepancy between Stan and TFP could be that, for 500 steps, the TFP chains don't look like they're mixing well, especially for the correlation matrix (…). Although, it is also possible that the new LKJ bijector is not as good as Stan's and that's causing the TFP version to perform worse. Yes, I think I can get this checked into master, pending code reviews etc. It should take a week or so. |
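The effect of the concentration parameter mentioned above follows directly from the LKJ density, which is proportional to det(R)^(concentration - 1). A small numpy sketch for the 2x2 case (the helper name is hypothetical, and only the unnormalized density is shown):

```python
import numpy as np

def lkj_log_unnorm_density(rho, concentration):
    """Unnormalized LKJ log-density for the 2x2 correlation matrix
    [[1, rho], [rho, 1]]. Since det(R) = 1 - rho**2, the log-density
    is (concentration - 1) * log(1 - rho**2)."""
    return (concentration - 1.0) * np.log(1.0 - rho**2)

# concentration = 1.0: flat prior, every correlation equally likely
flat = lkj_log_unnorm_density(np.array([0.0, -0.7]), 1.0)

# concentration = 4.0: correlations near 0 are favored, so the
# posterior gets pulled away from a true -0.7 toward 0
strong = lkj_log_unnorm_density(np.array([0.0, -0.7]), 4.0)
```

With concentration 1.0 the prior contributes nothing to the posterior shape, which is why lowering it toward 1.0 should move the reconstructed correlation closer to the data's true -0.7.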
Hi Anudhyan, thanks, that sounds great! I can perform further tests on Wednesday.
I took that from the Stan code this is based on, and Stan too arrives at ~ -0.5... So in that regard the behavior is similar, although I need to do some more testing (for example, on one run, the posterior correlation ended up at ~ -0.35, which again is different...)
Yes, I have that impression too (somehow it's a bit ambiguous now: partly the results look similar to those in Stan and partly they don't). I need to do more systematic tests and will try your suggestions...
Yes, I inserted the data in the colab you modified last (originally created by Chris). |
I don't think I can see your edits in the colab. Here is the link to the colab I was using. Perhaps you can copy it and share the link to your edited version? |
Oh, sorry, got it... I saved it to some sandbox repo and added the simulated data there: https://github.com/skeydan/sb/blob/master/jds/cafe_wait_times_with_LKJ_bijector.ipynb (everything else is unchanged there right now; I made the edits in a text editor). I also wanted to add the Stan model I'm comparing to here:
|
For a more systematic comparison to the results I'm trying to reproduce, here first are the most relevant characteristics of the above Stan model on the data. Posterior summary statistics (excerpt) The rhos here are reported in terms of correlation (although internally Stan works with the cholesky).
Trace plots (excerpt) I think the chains mix very well already during the burnin phase (grey). Shrinkage on parameter and outcome scales |
Here are the corresponding (in spirit) plots from TFP. Run 1 Posterior summary statistics (excerpt) Rhos here are in cholesky terms. Mean rho as correlation is: -0.2463341.
Traceplots (in a different order, starting with the rhos, then sigma, sigma_cafe[1:2], b, a). These are from the "real phase" only (not showing the burnin phase). Shrinkage plots: the second looks fine I think, and shrinkage is visible in both, but the negative correlation between intercept and slope (a and b) could perhaps be a bit stronger in plot 2. |
Hey, wow!!! I think I can skip the planned experiments!! :-) Look: Mean rho is -0.4283794.
Bottom line, @bloops you're a genius, I'll start writing my post, and I'll publish it as soon as your bijectors are on master :-) |
That's awesome! :) Great detective work with identifying the problem and fixing the chain! Yes, I will send out for review today and I think we can get it checked in soon.
Posterior summary from the run described above:

```
   key          mean    sd      lower   upper   ess   rhat
 1 rho_1         1      0        1       1      NaN   NaN
 2 rho_2         0      0        0       0      NaN   NaN
 3 rho_3        -0.551  0.172   -0.818  -0.144  37.9  1.04
 4 rho_4         0.809  0.109    0.628   1.000  31.3  1.04
 5 sigma         0.474  0.0268   0.426   0.529  440.  1.01
 6 sigma_cafe_1  0.970  0.170    0.669   1.32   53.6  1.02
 7 sigma_cafe_2  0.592  0.123    0.364   0.813  60.1  1.05
 8 b            -1.14   0.138   -1.39   -0.856  99.6  1.00
 9 a             3.66   0.212    3.27    4.07   62.5  1.02
10 a_cafe_1      4.21   0.206    3.78    4.59   47.0  1.01
```
|
wonderful :-) |
Although, mean |
Hm, wait... perhaps there is still room for improvement? The mean correlation matrices (the notebook is at https://github.com/skeydan/sb/blob/master/jds/cafe_wait_times_with_LKJ_bijector.ipynb , if you'd like to test) look like this: the diagonal elements are != 1; instead, they add to 1:
|
Ah, that's strange. Just to double-check: are the 500 samples of the cholesky-space correlations first matmul-ed (with their transposes), and then you take the mean of the resulting correlation matrices? That would mean 500 matmuls for each chain; these can be combined into one batched matmul to make the total time a bit faster. If it's done in the other order (first take the mean of the cholesky-space matrices, then form the correlation matrix), then such artifacts could show up. |
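The difference between the two orders is easy to demonstrate with synthetic draws (fake samples standing in for the actual chain output, not the real posterior):

```python
import numpy as np

rng = np.random.default_rng(0)

# 500 fake "posterior samples" of a 2x2 correlation in Cholesky space:
# each factor is [[1, 0], [r, sqrt(1 - r^2)]] with r drawn near -0.5.
r = np.clip(rng.normal(-0.5, 0.4, size=500), -0.99, 0.99)
chols = np.stack([np.array([[1.0, 0.0], [ri, np.sqrt(1.0 - ri**2)]]) for ri in r])

# Correct order: one batched matmul over all 500 samples, then the mean.
corr_mats = chols @ np.transpose(chols, (0, 2, 1))
mean_correct = corr_mats.mean(axis=0)   # diagonal is exactly 1

# Wrong order: averaging in Cholesky space first destroys the unit-norm
# rows, so the resulting "correlation matrix" has off-unit diagonal entries.
mean_chol = chols.mean(axis=0)
mean_wrong = mean_chol @ mean_chol.T
```

Averaging commutes with taking the diagonal, so the matmul-then-mean order preserves the unit diagonal of every individual sample; the mean-then-matmul order does not, which is consistent with the artifact described above.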
It's not that (my way is the first one, or should be), but there must be something wrong in my calculation: the notebook has the correct correlation!! So my bad, sorry! (Just need to find where my stuff is wrong; I'm doing all the post-processing on the R side, and one has to be super careful with shape manipulations...) back soon... |
Oh, no!!! Found the mistake (won't tell though ;-)), so, all my bad, your bijector is perfect |
Awesome! Thanks and great work! Looking forward to the blog post 🙂 |
Thanks to you!! |
Hi @bloops might I ask when the bijectors will arrive on master? Blog post's ready :-) |
It's currently in review... I am expecting it to land by end of this week! |
great, thanks!! |
This is now checked in. But it is a bit different from the code I had shared earlier. Can you add the wrappers and make this work with your new blog post? |
Thanks @bloops, I already saw earlier today, by chance :-) For reasons I don't quite understand yet, I can't seem to make a manual build of TFP work together with the most recent TF nightly (when calling from R, specifically), so I may have to test tomorrow when I can get the nightly builds for TF and TFP both together (also, I'm starting a build of TF now, but that takes a while on my laptop). But yeah, very much looking forward to testing; as long as the functionality is as it was, it should work :-) |
Ah, I see. Good luck with the import issues. I don't have much experience with calling the code from R, but perhaps others would be able to help -- let us know! |
Hi @bloops thank you again, but unfortunately now this throws an error for me (which was not there with the 2 original bijectors):
Here is the Python version of the code that throws the error:
In case you would want to reproduce this with the actual data, I hardcoded them here https://github.com/skeydan/sb/blob/master/jds/cafe_wait_times_with_LKJ_bijector.ipynb Here's a more detailed stack trace
|
@bloops just wanted to let you know, I'll be offline for > 20 hours soon so I won't be able to respond immediately if you answer this in the meantime :-) |
I think I found the underlying issue -- I'm working on a fix. Meanwhile, the problem appears to be only in graph mode, so if you remove the @tf.function annotation for now, it should work. |
great, thanks!! I'm leaving in 20 mins so can't test & publish now but then I can perhaps still publish Friday late afternoon this week :-) |
Hi, I've submitted a proposed fix: |
Thank you! I tested and I get the same results / performance as with the 2 bijectors before, so I'm happy :-) ... and I just published: https://blogs.rstudio.com/tensorflow/posts/2019-05-24-varying-slopes/ :-) |
Blog post looks great! Awesome :) |
Thank you!! And thanks for coding the bijector so quickly! It was great working with you :-) |
As discussed on https://groups.google.com/a/tensorflow.org/d/msg/tfprobability/JYNa3_g33qo/asqjrRs0BAAJ it would be nice to have a bijector to go from unconstrained vectors to LKJ-distributed correlation matrices. There is some guessing that our existing LKJ code might already have the right forward transformation implemented, but we would need to add the inverse and the log-determinant of the Jacobian.
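A bijector in this sense needs three ingredients: the forward map, its inverse, and the log-determinant of the Jacobian. A toy scalar example shows the contract (a plain-Python sketch of the general idea, not TFP's actual Bijector API):

```python
import numpy as np

class Exp:
    """Toy bijector sketch mapping unconstrained reals to positives.
    The three methods below are what a transformed-space sampler needs:
    forward, inverse, and the log|det Jacobian| of each direction."""

    def forward(self, x):
        return np.exp(x)             # unconstrained -> positive

    def inverse(self, y):
        return np.log(y)             # positive -> unconstrained

    def forward_log_det_jacobian(self, x):
        # d/dx exp(x) = exp(x), so log|det J| = x
        return x

    def inverse_log_det_jacobian(self, y):
        # d/dy log(y) = 1/y, so log|det J| = -log(y)
        return -np.log(y)

b = Exp()
roundtrip = b.inverse(b.forward(1.3))   # recovers 1.3
```

The two log-det-Jacobians are negatives of each other when evaluated at corresponding points, which is the consistency property an LKJ bijector implementation would also have to satisfy, just on the space of Cholesky factors instead of scalars.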