Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BENCH: stats.unuran: Write some benchmarks for all UNU.RAN methods #10

Closed
wants to merge 41 commits into from

Conversation

tirthasheshpatel
Copy link
Owner

@tirthasheshpatel tirthasheshpatel commented Jul 15, 2021

What does this implement/fix?

@chrisb83 Let's track the progress and results of benchmarks here. I have merged the gsoc-unuran branch and wrote benchmarks for some of the slowest continuous distributions in SciPy. Also I have changed both the x and y-axis to use the log scale.

cc @mckib2

Here are the results:

Beta Distribution:

beta(2, 3)

Gamma Distribution:

gamma(2)

Gauss Hypergeometric Distribution:

gausshyper

Generalized Exponential Distribution:

genexpon

Generalized Normal Distribution:

gennorm(3)

Inverse Gamma Distribution:

invgamma(4 066899613699307)

KS-One Distribution:

ksone(1000)

KS-Two Distribution:

kstwo(10)

Nakagami Distribution:

nakagami(4 967379486666624)

Normal Distribution:

norm(0, 1)

Studentized Range Distribution:

studentized_range(3, 10)

This commit refactors the entire UNU.RAN wrapper to use `MessageStream`
to report errors instead of non-local returns. One of the downside of this
approach is that UNU.RAN uses a global thread-unsafe file streams which
makes error reporting thread unsafe. Hence, a module-level lock is used
to hold the file stream to report the errors to and achieve thread-safety.
This commit adds UNU.RAN source tree to scipy.stats with Makefiles and configuration
files removed. The "uniform" directory has also been removed as it contains third-party
uniform random number generator which we aren't allowed to use. Other files have been
kept intact.
This commit removes all the deprecated files from UNU.RAN source tree.
Declarations of the deprecated functions have been removed from `unuran.h`.
This commit replaces all the `.ch` files with `.h` files as SciPy's build system
doesn't understand `.ch` files. The C files including these `.ch` have been edited
to now include the coressponding `.h` file.
…as default

This commit refactors functions `unur_get_default_urng` and `unur_get_default_urng_aux` to
allow NumPy BitGenerator as the default URNG. Previously, UNU.RAN defined a macro to get its
own URNG from the `uniform` directory but as it is now removed, that macro will fail with an
error. Hence, the macro (present in `unuran_config.h`) has been changed to successfully obtain
a default URNG (which is set during `scipy.stats` import).
This commit adds a Cython wrapper wrapping methods Transformed Density Rejection
and Discrete Alias-Urn from UNU.RAN. A few tests and docs have been added. I have
written a small C API for aquiring and releasing Python callbacks during calls to
UNU.RAN C functions.
…thod parameters.

This refactors the public API of the methods TransformedDensityRejection and DiscreteAliasUrn
method to accept a single `dist` object that contains all the required methods. This closely
follows the design of scipygh-13319. So, for example, previously one had to write required functions
seperately and pass them to the method for setup:

    >>> from scipy.stats import TransformedDensityRejection
    >>>
    >>> pdf = lambda x: 1-x*x
    >>> dpdf = lambda x: -2*x
    >>>
    >>> rng = TransformedDensityRejection(pdf, dpdf, domain=(-1,1), seed=123)

But, with the new API, one can just pass a `dist` object with all those methods bundled in it:

    >>> class MyDist:
    ...     def pdf(self, x):
    ...         return 1-x*x
    ...     def dpdf(self, x):
    ...         return -2*x
    ...
    >>> dist = MyDist()
    >>> rng = TransformedDensityRejection(dist, domain=(-1,1), seed=123)

Tests have been alterned to use the new API. New tests for the API itself need to be added.
The tests suite has been refactored to be more readable. Some common tests and data
has been centralized. Justifications for a few tests in the form of comments have
been added. Some of the validations in the API have also been centralized.
This implements some suggestions from the code review:
  * use UNU.RAN manual references in our doc
  * update documentation for RVS method
  * document default value of center as 0 and c as -0.5
  * remove underscores from rvs numbers
  * mention that the variants have been explained in notes
  * "w.r.t the variate" --> "w.r.t x (i.e. the variate)"
* split arguments in separate lines for readability
* remove the file * lines from ukraine.pxd
* remove unused/unvalidated arguments from _validate_args for readability
Relax the chi-squared tests to pass if p-value is > 0.1 instead of trying
to assert if the p-value is 0.999 as the latter case occurs only 0.1% of the
times which is very rare.
UNU.RAN was unable to recognize inf, nan and all zeros in the PV and threw unhelpful
"unknown error". Hence, the DAU method has been refactored to evaluate the PMF during
argument validation and check the values in the PV to report helpful errors. This is
possible to do as DAU only works for distribution with finite domain.

The tests have also been changed to incorporate this refactor.

Docs of the DAU method have also been updated.
Two tests failed:
  1. TestDiscreteAliasUrn::test_basic
  2. Both `test_seed` tests

The former was because a non-string distribution name was present in `distdiscrete` which led
to problems with getattr. The straight-forward fix for this was to skip for non-string `distname`s

The latter was due to a more subtle reason. I used global variable to store the NumPy RNG for sampling
when `np.__version__ < 1.19`. Because of this, when a new NumPy RNG was set, all the methods would start
using that RNG and seeding would immediately break. This has been fixed by using a seperate function in the
base-class `Method` and a global function for default RNG.
UNU.RAN seg faults when nans are present in the data. The behaviour for quantiles
less than 0 and greater than 1 is also different than that of SciPy's. So, code
and tests have been added to match the SciPy's behaviour and handle nans properly.
…g.py

Stubs were not getting exported as no __init__.pyi file was found by mypy.
This file has been added and other errors (false positives) have been ignored.
This commit adds a tutorial on Universal Non-Uniform Random Number Generators
in SciPy.
'randint' was failing with an "unknown error" on 32-bit Ubuntu with NumPy 1.16.5.
The error is occurring on line 723 in file src/methods/dau.c. Looks like it might
be due to floating-point errors. As this is not inside SciPy (and also only exists
on a very specific platform and numpy version), I have skipped this test case.

Also, previously, a test was skipped if the distribution name wasn't string. This
has been corrected to run the test on that test case.
UNU.RAN tests TDR with some special custom distributions. Those tests have
been added to SciPy. UNU.RAN's tests DAU with a few random and geometric
PVs. Our test suite tests for all the discrete distributions and is stronger
than UNU.RAN's. So, no tests for DAU have been added
* Seperate different methods in different files
* Add citation for Jon Von Neumann's work on the rejection method in 1951
* "sampler" -> "generator"
* Emphasize that the methods are black-box and universal
* Add that computing PPF in closed form is difficult
* As TDR methods takes a lot of parameters, benchmark it with the most important ones.
* Use a Beta(2, 3) distribution to benchmark TDR method.
* Use only a subset of discrete distributions with finite domain to save computation time.
* Add references to TDR and DAU tutorial
* Add an explaination of other attributes of the TDR method
* Add relation between rv_continuous/rv_discrete ``rvs`` method
  and the ``rvs`` method of the UNU.RAN generators.
* Add the fact that the RVS of rv_continuous distributions and
  UNU.RAN generators might differ even if the same URNG is used.
* Fix a double `the the` mistake in tutorial.
* Use the `norm` distribution in TDR for which PPF is available
  easily. Compare `rng.ppf_hat` and `norm.ppf`.
* Add the number of expected PDF evaluations in tutorial of the TDR method.
* Correct the relation between rvs method of the generators and distributions
  in scipy.
The previous wrapper had memory leaks due to non-local returns.
This PR eliminates the problem by using the `MessageStream` API
(originally used in Qhull to handle errors occurring in `qhull`
C API).

UNU.RAN uses a global `FILE *` stream which makes it thread-unsafe
to use those streams without first acquiring a lock. To expound on
the problem, we have to call `unur_set_stream` under a lock otherwise
some other thread could change the global `FILE *` variable and all
the errors will be redirected to that wrong file.

Moreover, as non-local jumps are not allowed, `PyErr_Occurred` is
used to catch errors inside callbacks once the UNU.RAN function has
been executed.


def get_rng(methodname, dist):
# parse the method string
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a toy parser that I wrote initially. Need to change this to something that allows us to change the default parameters of the methods more flexibly. Maybe, we can use functools.partial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants