Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numerical issue with random walk for Dirichlet #12

Closed
xukai92 opened this issue Jan 3, 2019 · 11 comments
Closed

Numerical issue with random walk for Dirichlet #12

xukai92 opened this issue Jan 3, 2019 · 11 comments

Comments

@xukai92
Copy link
Member

xukai92 commented Jan 3, 2019

I've noticed a numeric issue when using this package with random walk algorithm. The code below is the minimal example to produce the bug:

using Distributions, Bijectors
dim = 10
dist = Dirichlet(dim, 2.0)
stepsize = 1e10
# Below gives DomainError with high probability
[logpdf_with_trans(dist, invlink(dist, link(dist, rand(dist)) .+ randn(dim) .* stepsize), true) for _ in 1:1_000]

I don't know if a stepsize of 10 is too high or not but it seems to be possible to have this kind of values during adaptation.

@yebai
Copy link
Member

yebai commented Jan 3, 2019

stepsize of 10 is quite big actually, did you look at adapted epsilon in HMC?

@xukai92
Copy link
Member Author

xukai92 commented Jan 3, 2019

Just checked - this happens when the step size is tuned to something like 5.75.

@yebai
Copy link
Member

yebai commented Jan 3, 2019

I think NUTS adaption needs some further improvement. Before that happens, perhaps we can heuristically upper bound step size, e.g. < 2?

Dirichlet distribution is a very nasty one for transforms, I suspect it'll be hard to make it stable in all cases. If it occasionally throws off the edges, we can try to reject a Hamiltonian simulation.

@xukai92
Copy link
Member Author

xukai92 commented Jan 3, 2019

I think NUTS adaption needs some further improvement. Before that happens, perhaps we can heuristically upper bound step size, e.g. < 2?

2 still seems to be too high. 1 works.

Dirichlet distribution is a very nasty one for transforms, I suspect it'll be hard to make it stable in all cases. If it occasionally throws off the edges, we can try to reject a Hamiltonian simulation.

Then we need to do a catch-exception which is slow. Alternatively we can make logpdf_with_trans return -Inf by checking terms inside log before evaluation?

@yebai
Copy link
Member

yebai commented Jan 3, 2019

Then we need to do a catch-exception which is slow. Alternatively, we can make logpdf_with_trans return -Inf by checking terms inside log before evaluation?

Sounds good to me!

@mohamed82008
Copy link
Member

We can just do log(max(0, x)) everywhere there is log(x). The output may then be -Inf or NaN if we divide 2 Infs by each other for some reason. So outside we can check the output with isnan and isinf.

@yebai
Copy link
Member

yebai commented Jan 9, 2019

This sounds good to me. Perhaps give it a try and see whether it can fix TuringLang/Turing.jl#621?

@xukai92
Copy link
Member Author

xukai92 commented Jan 10, 2019

@yebai I will take a try today

@xukai92
Copy link
Member Author

xukai92 commented Jan 11, 2019

@mohamed82008 @yebai great it works

@mohamed82008
Copy link
Member

mohamed82008 commented Jan 12, 2019

Ok the root cause of this error AFAICT is https://github.com/TuringLang/Bijectors.jl/blob/master/src/Bijectors.jl#L221. The StatsFuns.logistic(x) function is way too small even for x = -1e2, and is pretty much 0 for -1e4 and higher. The easiest fix is to offset it by eps. I can try that and see if it replaces the need for the workaround in #13.

@mohamed82008
Copy link
Member

mohamed82008 commented Jan 12, 2019

Ok, so after trying so many things, I came to a realization that we cannot support all of the Eucledian space with the invlink transformation while keeping the inverse relationship between link and invlink. As of #9 , we have been using ϵ to help stay away from the ugly -Inf and Inf of logit. This inevitably means that we are mapping all valid points in the support of the Dirichlet distribution to a finite subspace of the Eucledian space. So then what happens when you try to walk back from a point very far in the Eucledian space to the support of the distribution? The unthinkable. If that point is far beyond the friendly part of the Eucledian space, the logistic function starts yelling at us by returning 0 and 1. This is alarming, because inverting that with logit will give us the ugly -Inf and Inf back. In order words, one fix to the error in this issue is to truncate the output of logistic to be between ϵ and 1 - ϵ. And we have to truncate, not recenter, if we want to keep the inverse relationship between link and invlink valid for the friendly part of the Eucledian space. If we are willing to make the inverse property of link and invlink approximate, then we can recenter the output of logistic using z = z * (1 - 2ϵ) + ϵ. More directly, and I found that it works better in practice, is to truncate all x[k]s to be between 0 and 1. This condition is a direct consequence of sticking to the friendly part of the Eucledian space, but can be violated when using values outside the friendly region. Excuse my layman terms.

I will make a PR to get your feedback.

@yebai yebai closed this as completed in ecb7e94 Jan 15, 2019
yebai added a commit that referenced this issue Jan 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants