Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numeric fixes for extremely large shape parameters #305

Merged
merged 10 commits into from
Aug 1, 2023

Conversation

nspope
Copy link
Contributor

@nspope nspope commented Jul 25, 2023

  • A bit better handling of cancellation error in custom 2F1
  • Use high-precision mpmath.hyp2f1 to match mean and variance as a failsafe
  • Project approximations with large shape parameters (threshold set by max_shape) to a predetermined value of the shape parameter.

The cap on the shape parameter basically says, "this approximation is extremely precise, but we'll only let it get this precise so that the distribution doesn't degenerate". This is akin to setting a minimum variance, except that the shape parameterization is scale invariant. By default, max_shape should be None, because if the shape parameters do get extremely large (e.g. the approximation degenerates) then that tells us that there's some inconsistency in the topology that makes dating difficult. For example, this can occur when there are unary nodes from tsinfer that span nearly the entire tree sequence (e.g. all root nodes would have to be older than these "global" unary nodes) which introduce very unnatural constraints. However, for debugging purposes it's nice to be able to cap the gamma shape so the program doesn't error out.

    * Refactor DLMF15.8.3 to avoid cancellation

    * Match mean and variance as backup
@hyanwong
Copy link
Member

I agree with the max_shape approach, and with the default of None. It should be easy to document that if you get a failure to do with high shape parameters, you should (a) check the "sensibleness" of your tree sequence, esp with regards to lengths of unary regions of nodes and (b) re-run with a max_shape set to something other than None.

If an overly large shape parameter causes the routines to fail in a consistent way, perhaps we can catch that in an except block, and add a message to the effect above?

@nspope nspope marked this pull request as ready for review July 31, 2023 20:07
@nspope
Copy link
Contributor Author

nspope commented Jul 31, 2023

This is good to go @hyanwong

@hyanwong
Copy link
Member

This is good to go @hyanwong

Thanks. LGTM, from what I understand of how it all works. Happy to merge.

@hyanwong
Copy link
Member

P.s. do you think that it is worth mentioning in the error message that you can re-try with max_shape set, but that it is likely that the error is telling you that the tree sequence topology is likely to be faulty?

@nspope
Copy link
Contributor Author

nspope commented Aug 1, 2023

Hmm, with inferred tree sequences there will generally be some nodes that have funky updates and need the failsafe routine. But, there's no need to set max_shape in these cases -- really only when mpmath is invoked frequently and it requires lots of terms to converge. Two options are:

  • Set max_shape to None by default, and set the mpmath precision/maxterms to be on the fast side (e.g. more likely to terminate on problematic updates rather than hang); then catch the error, terminate, and say something about max_shape.
  • Set max_shape to something reasonable (1000), and if it's exceeded by a particular update, print a warning. This shouldn't interfere with simplified tree sequences (max_shape will rarely be hit, and when it is it won't change the answers) or other well-behaved cases. When there's problems, it'll manifest as a wall of warnings rather than an error.

At some point (when we have a control dict implemented) there'll be a lot of flexibility wrt mpmath settings, as well as the homegrown 2F1.

@nspope
Copy link
Contributor Author

nspope commented Aug 1, 2023

I went with the first option-- does the error blurb seem reasonable @hyanwong?

@hyanwong
Copy link
Member

hyanwong commented Aug 1, 2023

SGTM. Merging.

@hyanwong hyanwong merged commit 41c2689 into tskit-dev:main Aug 1, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants