forked from scipy/scipy
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BENCH: stats.unuran: Write some benchmarks for all UNU.RAN methods #10
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit refactors the entire UNU.RAN wrapper to use `MessageStream` to report errors instead of non-local returns. One of the downside of this approach is that UNU.RAN uses a global thread-unsafe file streams which makes error reporting thread unsafe. Hence, a module-level lock is used to hold the file stream to report the errors to and achieve thread-safety.
This commit adds UNU.RAN source tree to scipy.stats with Makefiles and configuration files removed. The "uniform" directory has also been removed as it contains third-party uniform random number generator which we aren't allowed to use. Other files have been kept intact.
This commit removes all the deprecated files from UNU.RAN source tree. Declarations of the deprecated functions have been removed from `unuran.h`.
This commit replaces all the `.ch` files with `.h` files as SciPy's build system doesn't understand `.ch` files. The C files including these `.ch` have been edited to now include the coressponding `.h` file.
…as default This commit refactors functions `unur_get_default_urng` and `unur_get_default_urng_aux` to allow NumPy BitGenerator as the default URNG. Previously, UNU.RAN defined a macro to get its own URNG from the `uniform` directory but as it is now removed, that macro will fail with an error. Hence, the macro (present in `unuran_config.h`) has been changed to successfully obtain a default URNG (which is set during `scipy.stats` import).
This commit adds a Cython wrapper wrapping methods Transformed Density Rejection and Discrete Alias-Urn from UNU.RAN. A few tests and docs have been added. I have written a small C API for aquiring and releasing Python callbacks during calls to UNU.RAN C functions.
…thod parameters. This refactors the public API of the methods TransformedDensityRejection and DiscreteAliasUrn method to accept a single `dist` object that contains all the required methods. This closely follows the design of scipygh-13319. So, for example, previously one had to write required functions seperately and pass them to the method for setup: >>> from scipy.stats import TransformedDensityRejection >>> >>> pdf = lambda x: 1-x*x >>> dpdf = lambda x: -2*x >>> >>> rng = TransformedDensityRejection(pdf, dpdf, domain=(-1,1), seed=123) But, with the new API, one can just pass a `dist` object with all those methods bundled in it: >>> class MyDist: ... def pdf(self, x): ... return 1-x*x ... def dpdf(self, x): ... return -2*x ... >>> dist = MyDist() >>> rng = TransformedDensityRejection(dist, domain=(-1,1), seed=123) Tests have been alterned to use the new API. New tests for the API itself need to be added.
The tests suite has been refactored to be more readable. Some common tests and data has been centralized. Justifications for a few tests in the form of comments have been added. Some of the validations in the API have also been centralized.
This implements some suggestions from the code review: * use UNU.RAN manual references in our doc * update documentation for RVS method * document default value of center as 0 and c as -0.5 * remove underscores from rvs numbers * mention that the variants have been explained in notes * "w.r.t the variate" --> "w.r.t x (i.e. the variate)"
* split arguments in separate lines for readability * remove the file * lines from ukraine.pxd * remove unused/unvalidated arguments from _validate_args for readability
Relax the chi-squared tests to pass if p-value is > 0.1 instead of trying to assert if the p-value is 0.999 as the latter case occurs only 0.1% of the times which is very rare.
UNU.RAN was unable to recognize inf, nan and all zeros in the PV and threw unhelpful "unknown error". Hence, the DAU method has been refactored to evaluate the PMF during argument validation and check the values in the PV to report helpful errors. This is possible to do as DAU only works for distribution with finite domain. The tests have also been changed to incorporate this refactor. Docs of the DAU method have also been updated.
Two tests failed: 1. TestDiscreteAliasUrn::test_basic 2. Both `test_seed` tests The former was because a non-string distribution name was present in `distdiscrete` which led to problems with getattr. The straight-forward fix for this was to skip for non-string `distname`s The latter was due to a more subtle reason. I used global variable to store the NumPy RNG for sampling when `np.__version__ < 1.19`. Because of this, when a new NumPy RNG was set, all the methods would start using that RNG and seeding would immediately break. This has been fixed by using a seperate function in the base-class `Method` and a global function for default RNG.
UNU.RAN seg faults when nans are present in the data. The behaviour for quantiles less than 0 and greater than 1 is also different than that of SciPy's. So, code and tests have been added to match the SciPy's behaviour and handle nans properly.
…g.py Stubs were not getting exported as no __init__.pyi file was found by mypy. This file has been added and other errors (false positives) have been ignored.
This commit adds a tutorial on Universal Non-Uniform Random Number Generators in SciPy.
'randint' was failing with an "unknown error" on 32-bit Ubuntu with NumPy 1.16.5. The error is occurring on line 723 in file src/methods/dau.c. Looks like it might be due to floating-point errors. As this is not inside SciPy (and also only exists on a very specific platform and numpy version), I have skipped this test case. Also, previously, a test was skipped if the distribution name wasn't string. This has been corrected to run the test on that test case.
UNU.RAN tests TDR with some special custom distributions. Those tests have been added to SciPy. UNU.RAN's tests DAU with a few random and geometric PVs. Our test suite tests for all the discrete distributions and is stronger than UNU.RAN's. So, no tests for DAU have been added
* Seperate different methods in different files * Add citation for Jon Von Neumann's work on the rejection method in 1951 * "sampler" -> "generator" * Emphasize that the methods are black-box and universal * Add that computing PPF in closed form is difficult
* As TDR methods takes a lot of parameters, benchmark it with the most important ones. * Use a Beta(2, 3) distribution to benchmark TDR method. * Use only a subset of discrete distributions with finite domain to save computation time.
* Add references to TDR and DAU tutorial * Add an explaination of other attributes of the TDR method * Add relation between rv_continuous/rv_discrete ``rvs`` method and the ``rvs`` method of the UNU.RAN generators. * Add the fact that the RVS of rv_continuous distributions and UNU.RAN generators might differ even if the same URNG is used. * Fix a double `the the` mistake in tutorial. * Use the `norm` distribution in TDR for which PPF is available easily. Compare `rng.ppf_hat` and `norm.ppf`.
* Add the number of expected PDF evaluations in tutorial of the TDR method. * Correct the relation between rvs method of the generators and distributions in scipy.
The previous wrapper had memory leaks due to non-local returns. This PR eliminates the problem by using the `MessageStream` API (originally used in Qhull to handle errors occurring in `qhull` C API). UNU.RAN uses a global `FILE *` stream which makes it thread-unsafe to use those streams without first acquiring a lock. To expound on the problem, we have to call `unur_set_stream` under a lock otherwise some other thread could change the global `FILE *` variable and all the errors will be redirected to that wrong file. Moreover, as non-local jumps are not allowed, `PyErr_Occurred` is used to catch errors inside callbacks once the UNU.RAN function has been executed.
|
||
|
||
def get_rng(methodname, dist): | ||
# parse the method string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just a toy parser that I wrote initially. Need to change this to something that allows us to change the default parameters of the methods more flexibly. Maybe, we can use functools.partial
.
tirthasheshpatel
force-pushed
the
gsoc-unuran
branch
from
August 3, 2021 15:19
a46e162
to
e437733
Compare
tirthasheshpatel
force-pushed
the
gsoc-unuran
branch
from
August 16, 2021 17:47
1e60ee0
to
2bdc299
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this implement/fix?
@chrisb83 Let's track the progress and results of benchmarks here. I have merged the
gsoc-unuran
branch and wrote benchmarks for some of the slowest continuous distributions in SciPy. Also I have changed both the x and y-axis to use the log scale.cc @mckib2
Here are the results:
Beta Distribution:
Gamma Distribution:
Gauss Hypergeometric Distribution:
Generalized Exponential Distribution:
Generalized Normal Distribution:
Inverse Gamma Distribution:
KS-One Distribution:
KS-Two Distribution:
Nakagami Distribution:
Normal Distribution:
Studentized Range Distribution: