BENCH: stats.unuran: Write some benchmarks for all UNU.RAN methods #10

tirthasheshpatel · 2021-07-15T10:02:31Z

What does this implement/fix?

@chrisb83 Let's track the progress and results of the benchmarks here. I have merged the gsoc-unuran branch and written benchmarks for some of the slowest continuous distributions in SciPy. I have also changed both the x- and y-axes to use a log scale.

cc @mckib2

Here are the results, one plot per distribution (plot images not preserved):

* Beta Distribution
* Gamma Distribution
* Gauss Hypergeometric Distribution
* Generalized Exponential Distribution
* Generalized Normal Distribution
* Inverse Gamma Distribution
* KS-One Distribution
* KS-Two Distribution
* Nakagami Distribution
* Normal Distribution
* Studentized Range Distribution

This commit refactors the entire UNU.RAN wrapper to use `MessageStream` to report errors instead of non-local returns. One downside of this approach is that UNU.RAN uses a global, thread-unsafe file stream, which makes error reporting thread-unsafe. Hence, a module-level lock is held while the file stream that receives the errors is set, which achieves thread safety.

This commit adds the UNU.RAN source tree to scipy.stats with Makefiles and configuration files removed. The "uniform" directory has also been removed as it contains third-party uniform random number generators that we aren't allowed to use. Other files have been kept intact.

This commit removes all the deprecated files from UNU.RAN source tree. Declarations of the deprecated functions have been removed from `unuran.h`.

This commit replaces all the `.ch` files with `.h` files as SciPy's build system doesn't understand `.ch` files. The C files including these `.ch` files have been edited to include the corresponding `.h` file instead.

…as default

This commit refactors the functions `unur_get_default_urng` and `unur_get_default_urng_aux` to allow a NumPy BitGenerator as the default URNG. Previously, UNU.RAN defined a macro to get its own URNG from the `uniform` directory, but as that directory is now removed, the macro would fail with an error. Hence, the macro (present in `unuran_config.h`) has been changed to successfully obtain a default URNG (which is set during `scipy.stats` import).

This commit adds a Cython wrapper for the Transformed Density Rejection and Discrete Alias-Urn methods from UNU.RAN. A few tests and docs have been added. I have written a small C API for acquiring and releasing Python callbacks during calls to UNU.RAN C functions.

…thod parameters.

This refactors the public API of the TransformedDensityRejection and DiscreteAliasUrn methods to accept a single `dist` object that contains all the required methods. This closely follows the design of scipygh-13319. For example, previously one had to write the required functions separately and pass them to the method for setup:

    >>> from scipy.stats import TransformedDensityRejection
    >>>
    >>> pdf = lambda x: 1-x*x
    >>> dpdf = lambda x: -2*x
    >>>
    >>> rng = TransformedDensityRejection(pdf, dpdf, domain=(-1,1), seed=123)

But, with the new API, one can just pass a `dist` object with all those methods bundled in it:

    >>> class MyDist:
    ...     def pdf(self, x):
    ...         return 1-x*x
    ...     def dpdf(self, x):
    ...         return -2*x
    ...
    >>> dist = MyDist()
    >>> rng = TransformedDensityRejection(dist, domain=(-1,1), seed=123)

Tests have been altered to use the new API. New tests for the API itself need to be added.

The test suite has been refactored to be more readable. Some common tests and data have been centralized. Justifications for a few tests have been added as comments. Some of the validations in the API have also been centralized.

This implements some suggestions from the code review:

* use UNU.RAN manual references in our doc
* update documentation for RVS method
* document default value of center as 0 and c as -0.5
* remove underscores from rvs numbers
* mention that the variants have been explained in notes
* "w.r.t the variate" --> "w.r.t x (i.e. the variate)"

* split arguments in separate lines for readability
* remove the `FILE *` lines from unuran.pxd
* remove unused/unvalidated arguments from _validate_args for readability

Relax the chi-squared tests to pass if the p-value is > 0.1 instead of using 0.999 as the threshold, as the latter case occurs only 0.1% of the time, which is very rare.
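The 0.1% figure can be checked with a short simulation. This is a sketch under the assumption that, when the sampler is correct, the test's p-value is approximately uniform on (0, 1); it uses only the standard library and does not run an actual chi-squared test.

```python
import random

rng = random.Random(12345)
n = 100_000

# Assumption: under the null hypothesis the p-value is ~ Uniform(0, 1).
pvalues = [rng.random() for _ in range(n)]

# Fraction of runs that would clear each threshold.
frac_extreme = sum(p >= 0.999 for p in pvalues) / n  # expected near 0.001
frac_pass = sum(p > 0.1 for p in pvalues) / n        # expected near 0.9

print(f"P(p >= 0.999) is roughly {frac_extreme:.4f}")
print(f"P(p > 0.1)    is roughly {frac_pass:.4f}")
```

With a fixed seed in the real test suite the outcome is deterministic, so the looser threshold mainly makes it much easier to find seeds for which a correct sampler passes.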

UNU.RAN was unable to recognize inf, nan, and all zeros in the PV and threw an unhelpful "unknown error". Hence, the DAU method has been refactored to evaluate the PMF during argument validation and check the values in the PV to report helpful errors. This is possible because DAU only works for distributions with a finite domain. The tests have also been changed to incorporate this refactor. Docs of the DAU method have also been updated.
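The kind of up-front check described above might look like the following sketch. The function name `validate_pv` and the error messages are hypothetical and are not taken from the SciPy wrapper; the point is only that a finite probability vector can be inspected before it is handed to the C library.

```python
import math


def validate_pv(pv):
    """Return `pv` as a list of floats or raise a descriptive ValueError."""
    pv = [float(p) for p in pv]
    if not pv:
        raise ValueError("probability vector must contain at least one element")
    # math.isfinite is False for inf, -inf and nan alike.
    if not all(math.isfinite(p) for p in pv):
        raise ValueError("probability vector must contain only finite values")
    if any(p < 0 for p in pv):
        raise ValueError("probability vector must not contain negative values")
    if all(p == 0 for p in pv):
        raise ValueError("probability vector must contain a non-zero value")
    return pv
```

For example, `validate_pv([0.2, 0.3, 0.5])` returns the vector unchanged, while `validate_pv([0.0, 0.0])` raises a `ValueError` that names the problem instead of an opaque "unknown error".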

Two tests failed:

1. TestDiscreteAliasUrn::test_basic
2. Both `test_seed` tests

The former was because a non-string distribution name was present in `distdiscrete`, which led to problems with getattr. The straightforward fix for this was to skip non-string `distname`s.

The latter was due to a more subtle reason. I used a global variable to store the NumPy RNG for sampling when `np.__version__ < 1.19`. Because of this, when a new NumPy RNG was set, all the methods would start using that RNG and seeding would immediately break. This has been fixed by using a separate function in the base class `Method` and a global function for the default RNG.
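The seeding bug can be reproduced in miniature. The sketch below uses `random.Random` from the standard library in place of a NumPy RNG, and the class names `SharedSampler` and `OwnedSampler` are hypothetical; it only illustrates why a single global generator breaks per-object seeding.

```python
import random

# Problematic pattern: one module-level generator shared by every sampler.
_global_rng = None


class SharedSampler:
    def __init__(self, seed):
        global _global_rng
        _global_rng = random.Random(seed)  # clobbers every other sampler's RNG

    def rvs(self):
        return _global_rng.random()


class OwnedSampler:
    def __init__(self, seed):
        self._rng = random.Random(seed)    # each sampler keeps its own RNG

    def rvs(self):
        return self._rng.random()


# First draw that a generator seeded with 1 should produce.
expected = random.Random(1).random()

a = SharedSampler(seed=1)
b = SharedSampler(seed=2)      # silently reseeds `a` as well
shared_ok = a.rvs() == expected

c = OwnedSampler(seed=1)
d = OwnedSampler(seed=2)       # leaves `c` untouched
owned_ok = c.rvs() == expected
```

Here `shared_ok` ends up false because `a` draws from the generator that `b` installed, while `owned_ok` is true.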

UNU.RAN segfaults when NaNs are present in the data. The behaviour for quantiles less than 0 and greater than 1 also differs from SciPy's. So, code and tests have been added to match SciPy's behaviour and handle NaNs properly.
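A sketch of the guarding logic, assuming the target behaviour is the one SciPy distributions follow for `ppf`: NaN is returned for NaN input and for quantiles outside [0, 1]. The helper `guarded_ppf` and the example `expon_ppf` are hypothetical names used for illustration only.

```python
import math


def guarded_ppf(ppf, q):
    """Evaluate `ppf` at `q`, returning NaN for NaN or out-of-range input."""
    if math.isnan(q) or q < 0.0 or q > 1.0:
        return math.nan  # never forward invalid input to the wrapped routine
    return ppf(q)


def expon_ppf(q):
    # Exact inverse CDF of the standard exponential distribution.
    return math.inf if q == 1.0 else -math.log1p(-q)
```

Filtering the input before the call means the wrapped routine (here a pure-Python function, in the PR a C function) only ever sees valid quantiles.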

…parameter

…g.py

Stubs were not getting exported as no __init__.pyi file was found by mypy. This file has been added and other errors (false positives) have been ignored.

This commit adds a tutorial on Universal Non-Uniform Random Number Generators in SciPy.

'randint' was failing with an "unknown error" on 32-bit Ubuntu with NumPy 1.16.5. The error occurs on line 723 in file src/methods/dau.c. It looks like it might be due to floating-point errors. As this is not inside SciPy (and also only exists on a very specific platform and NumPy version), I have skipped this test case. Also, previously, a test was skipped if the distribution name wasn't a string. This has been corrected so that the test runs on that test case.

UNU.RAN tests TDR with some special custom distributions. Those tests have been added to SciPy. UNU.RAN tests DAU with a few random and geometric PVs. Our test suite covers all the discrete distributions and is stronger than UNU.RAN's, so no tests for DAU have been added.

* Separate different methods in different files
* Add citation for John von Neumann's work on the rejection method in 1951
* "sampler" -> "generator"
* Emphasize that the methods are black-box and universal
* Add that computing PPF in closed form is difficult
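For readers unfamiliar with the rejection method cited above, here is a minimal standard-library sketch. It assumes the unnormalized density 1 - x*x on (-1, 1) with a constant hat of 1 and a uniform proposal; this is the textbook form of the method, not the TDR algorithm that UNU.RAN implements.

```python
import random


def pdf(x):
    # Unnormalized density on (-1, 1); its maximum is 1 at x = 0.
    return 1.0 - x * x


def rejection_sample(n, rng):
    """Draw `n` variates by plain rejection from a uniform proposal."""
    samples = []
    while len(samples) < n:
        x = rng.uniform(-1.0, 1.0)  # proposal: uniform on the domain
        u = rng.random()            # uniform height under the constant hat 1
        if u <= pdf(x):             # accept points that fall under the density
            samples.append(x)
    return samples


rng = random.Random(123)
samples = rejection_sample(10_000, rng)
```

The method is black-box in the sense that it only needs pointwise evaluations of the density, never its CDF or PPF.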

* As the TDR method takes a lot of parameters, benchmark it with the most important ones.
* Use a Beta(2, 3) distribution to benchmark the TDR method.
* Use only a subset of discrete distributions with finite domain to save computation time.

* Add references to the TDR and DAU tutorials
* Add an explanation of other attributes of the TDR method
* Add the relation between the rv_continuous/rv_discrete ``rvs`` method and the ``rvs`` method of the UNU.RAN generators.
* Add the fact that the RVS of rv_continuous distributions and UNU.RAN generators might differ even if the same URNG is used.
* Fix a double `the the` mistake in the tutorial.
* Use the `norm` distribution in TDR, for which the PPF is easily available. Compare `rng.ppf_hat` and `norm.ppf`.

* Add the number of expected PDF evaluations in the tutorial of the TDR method.
* Correct the relation between the rvs method of the generators and distributions in SciPy.

The previous wrapper had memory leaks due to non-local returns. This PR eliminates the problem by using the `MessageStream` API (originally used in Qhull to handle errors occurring in the `qhull` C API). UNU.RAN uses a global `FILE *` stream, which makes it thread-unsafe to use those streams without first acquiring a lock. To expound on the problem, we have to call `unur_set_stream` under a lock; otherwise, some other thread could change the global `FILE *` variable and all the errors would be redirected to the wrong file. Moreover, as non-local jumps are not allowed, `PyErr_Occurred` is used to catch errors inside callbacks once the UNU.RAN function has been executed.
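The "check for errors after the call returns" idea can be sketched in pure Python. Everything here is a hypothetical illustration: `library_sum` stands in for a C routine that cannot unwind through a Python exception, and the recorded exception plays the role that the Python error indicator (queried with `PyErr_Occurred`) plays in the real wrapper.

```python
import math


class CallbackState:
    """Holds the first exception raised inside a callback."""

    def __init__(self):
        self.exc = None


def make_safe_callback(func, state):
    # Wrap `func` so that it never raises into the (pretend) C caller.
    def safe(x):
        if state.exc is not None:
            return math.inf       # an error is already pending; do no more work
        try:
            return func(x)
        except Exception as exc:
            state.exc = exc       # remember the error for later
            return math.inf       # sentinel value the library can digest
    return safe


def library_sum(callback, points):
    # Stand-in for a C routine: it just calls the callback repeatedly.
    return sum(callback(x) for x in points)


def call_library(func, points):
    state = CallbackState()
    result = library_sum(make_safe_callback(func, state), points)
    # Analogous to checking for a pending error after the C call returns.
    if state.exc is not None:
        raise state.exc
    return result
```

Since the library function always runs to completion, its own cleanup code executes and nothing is leaked; the Python error surfaces only once control is back in the wrapper.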

tirthasheshpatel · 2021-07-15T10:13:52Z

unuran_perf.py

+
+
+def get_rng(methodname, dist):
+    # parse the method string


This was just a toy parser that I wrote initially. We need to change this to something that lets us change the default parameters of the methods more flexibly. Maybe we can use `functools.partial`.
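One way the `functools.partial` idea could look is sketched below. The classes `FakeTDR` and `FakeDAU`, their parameters, and the `METHODS` table are hypothetical placeholders for the real generators and the benchmark's configuration; only the lookup-instead-of-parsing pattern is the point.

```python
from functools import partial


# Hypothetical stand-ins for the generator classes; only the call shape matters.
class FakeTDR:
    def __init__(self, dist, c=-0.5):
        self.dist = dist
        self.c = c


class FakeDAU:
    def __init__(self, dist, urn_factor=1.0):
        self.dist = dist
        self.urn_factor = urn_factor


# Each benchmark label maps to a constructor with its parameters pre-bound.
METHODS = {
    "TDR": FakeTDR,
    "TDR(c=0)": partial(FakeTDR, c=0.0),
    "DAU": FakeDAU,
    "DAU(urn_factor=2)": partial(FakeDAU, urn_factor=2.0),
}


def get_rng(methodname, dist):
    # No string parsing: look the method up and build it.
    return METHODS[methodname](dist)
```

Adding a new parameter combination to the benchmark then means adding one dictionary entry rather than extending a parser.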

tirthasheshpatel added 30 commits July 3, 2021 20:18

BENCH: stats: add script to generate UNU.RAN benchmarks

03cd2b4

BENCH: stats: add more distributions and multiprocessing

caa7c79

DEP: stats: remove deprecated functionality from UNU.RAN source tree

cfcf099

This commit removes all the deprecated files from UNU.RAN source tree. Declarations of the deprecated functions have been removed from `unuran.h`.

MAINT: stats: replace all the .ch files with .h files

5444c63

This commit replaces all the `.ch` files with `.h` files as SciPy's build system doesn't understand `.ch` files. The C files including these `.ch` files have been edited to include the corresponding `.h` file instead.

MAINT: stats: split arguments in separate lines for readability

ac0ed97

* split arguments in separate lines for readability
* remove the `FILE *` lines from unuran.pxd
* remove unused/unvalidated arguments from _validate_args for readability

TST: stats: use threshold p-value 0.1 instead of 0.999

08d7d9e

Relax the chi-squared tests to pass if the p-value is > 0.1 instead of using 0.999 as the threshold, as the latter case occurs only 0.1% of the time, which is very rare.

TYP: stats: update type hints and resolve some mypy errors

0883a5a

DOC: stats: fix refguide failures

63d2d70

TYP: stats: use overload and property decorators

be49caf

TST: stats: add tests for rvs size parameter

39b33e9

MAINT: stats: use np.isfinite to detect both infinite and nan values

d6dc54f

DOC: stats: remove normalizing constant

94bf36c

TST: stats: ignore RuntimeWarnings in tests

b0728eb

MAINT: stats: simplify _unpack_dist and write tests for the dist …

87684ab

…parameter

MAINT, TYP: stats: add __init__.pyi and ignore errors in test_samplin…

3b1dec1

…g.py

Stubs were not getting exported as no __init__.pyi file was found by mypy. This file has been added and other errors (false positives) have been ignored.

TYP: stats: remove __init__.pyi

3f3a6e7

DOC: stats: add tutorial

bea7534

This commit adds a tutorial on Universal Non-Uniform Random Number Generators in SciPy.

BENCH: stats: add a benchmark suite for TDR and DAU

69c216a

tirthasheshpatel added 11 commits July 15, 2021 09:08

MAINT: stats: add some comments to explain changes

7e48c7d

ENH: stats: separate UNU.RAN in its own submodule

e137495

BENCH: stats: improve benchmarks

16f9513

* As the TDR method takes a lot of parameters, benchmark it with the most important ones.
* Use a Beta(2, 3) distribution to benchmark the TDR method.
* Use only a subset of discrete distributions with finite domain to save computation time.

DOC: stats: add suggestions from the code review

f4facd7

* Add the number of expected PDF evaluations in the tutorial of the TDR method.
* Correct the relation between the rvs method of the generators and distributions in SciPy.

DOC: stats: make scalar PDF requirement a note

8ce9e75

MAINT: stats: fix lint and mypy checks

89c9a98

MAINT: stats: use log scale for benchmarking

dabdfee

Merge branch 'gsoc-unuran' into gsoc-unuran-bench

54f7121

tirthasheshpatel commented Jul 15, 2021

tirthasheshpatel force-pushed the gsoc-unuran branch from a46e162 to e437733 Compare August 3, 2021 15:19

tirthasheshpatel force-pushed the gsoc-unuran branch from 1e60ee0 to 2bdc299 Compare August 16, 2021 17:47

tirthasheshpatel closed this Feb 21, 2022
