GitHub - erdogant/distfit: distfit is a python library for probability density fitting.

distfit is a Python package for probability density fitting of univariate distributions for random variables. The distfit library can determine the best fit for over 90 theoretical distributions. The goodness-of-fit test is used to score for the best fit and after finding the best-fitted theoretical distribution, the loc, scale, and arg parameters are returned. It can be used for parametric, non-parametric, and discrete distributions. ⭐️Star it if you like it⭐️

Key Features

Feature	Description
Parametric Fitting	Fit distributions on empirical data X.
Non-Parametric Fitting	Fit distributions on empirical data X using non-parametric approaches (quantile, percentiles).
Discrete Fitting	Fit distributions on empirical data X using binomial distribution.
Predict	Compute probabilities for response variables y.
Synthetic Data	Generate synthetic data.
Plots	Varoius plotting functionalities.

Resources and Links

Example Notebooks: Examples
Blog Posts: Medium
Documentation: Website
Bug Reports and Feature Requests: GitHub Issues

Background

For the parametric approach, The distfit library can determine the best fit across 89 theoretical distributions. To score the fit, one of the scoring statistics for the good-of-fitness test can be used used, such as RSS/SSE, Wasserstein, Kolmogorov-Smirnov (KS), or Energy. After finding the best-fitted theoretical distribution, the loc, scale, and arg parameters are returned, such as mean and standard deviation for normal distribution.
For the non-parametric approach, the distfit library contains two methods, the quantile and percentile method. Both methods assume that the data does not follow a specific probability distribution. In the case of the quantile method, the quantiles of the data are modeled whereas for the percentile method, the percentiles are modeled.
In case the dataset contains discrete values, the distift library contains the option for discrete fitting. The best fit is then derived using the binomial distribution.

Installation

Install distfit from PyPI

pip install distfit

Install from Github source

pip install git+https://github.com/erdogant/distfit

Imort Library

import distfit
print(distfit.__version__)

# Import library
from distfit import distfit

Examples

Example: Quick start to find best fit for your input data

# [distfit] >INFO> fit
# [distfit] >INFO> transform
# [distfit] >INFO> [norm      ] [0.00 sec] [RSS: 0.00108326] [loc=-0.048 scale=1.997]
# [distfit] >INFO> [expon     ] [0.00 sec] [RSS: 0.404237] [loc=-6.897 scale=6.849]
# [distfit] >INFO> [pareto    ] [0.00 sec] [RSS: 0.404237] [loc=-536870918.897 scale=536870912.000]
# [distfit] >INFO> [dweibull  ] [0.06 sec] [RSS: 0.0115552] [loc=-0.031 scale=1.722]
# [distfit] >INFO> [t         ] [0.59 sec] [RSS: 0.00108349] [loc=-0.048 scale=1.997]
# [distfit] >INFO> [genextreme] [0.17 sec] [RSS: 0.00300806] [loc=-0.806 scale=1.979]
# [distfit] >INFO> [gamma     ] [0.05 sec] [RSS: 0.00108459] [loc=-1862.903 scale=0.002]
# [distfit] >INFO> [lognorm   ] [0.32 sec] [RSS: 0.00121597] [loc=-110.597 scale=110.530]
# [distfit] >INFO> [beta      ] [0.10 sec] [RSS: 0.00105629] [loc=-16.364 scale=32.869]
# [distfit] >INFO> [uniform   ] [0.00 sec] [RSS: 0.287339] [loc=-6.897 scale=14.437]
# [distfit] >INFO> [loggamma  ] [0.12 sec] [RSS: 0.00109042] [loc=-370.746 scale=55.722]
# [distfit] >INFO> Compute confidence intervals [parametric]
# [distfit] >INFO> Compute significance for 9 samples.
# [distfit] >INFO> Multiple test correction method applied: [fdr_bh].
# [distfit] >INFO> Create PDF plot for the parametric method.
# [distfit] >INFO> Mark 5 significant regions
# [distfit] >INFO> Estimated distribution: beta [loc:-16.364265, scale:32.868811]

Example: Plot summary of the tested distributions

After we have a fitted model, we can make some predictions using the theoretical distributions. After making some predictions, we can plot again but now the predictions are automatically included.

Example: Make predictions using the fitted distribution

Example: Test for one specific distributions

The full list of distributions is listed here: https://erdogant.github.io/distfit/pages/html/Parametric.html

Example: Test for multiple distributions

The full list of distributions is listed here: https://erdogant.github.io/distfit/pages/html/Parametric.html

Example: Fit discrete distribution

from scipy.stats import binom
# Generate random numbers

# Set parameters for the test-case
n = 8
p = 0.5

# Generate 10000 samples of the distribution of (n, p)
X = binom(n, p).rvs(10000)
print(X)

# [5 1 4 5 5 6 2 4 6 5 4 4 4 7 3 4 4 2 3 3 4 4 5 1 3 2 7 4 5 2 3 4 3 3 2 3 5
#  4 6 7 6 2 4 3 3 5 3 5 3 4 4 4 7 5 4 5 3 4 3 3 4 3 3 6 3 3 5 4 4 2 3 2 5 7
#  5 4 8 3 4 3 5 4 3 5 5 2 5 6 7 4 5 5 5 4 4 3 4 5 6 2...]

# Import distfit
from distfit import distfit

# Initialize for discrete distribution fitting
dfit = distfit(method='discrete')

# Run distfit to and determine whether we can find the parameters from the data.
dfit.fit_transform(X)

# [distfit] >fit..
# [distfit] >transform..
# [distfit] >Fit using binomial distribution..
# [distfit] >[binomial] [SSE: 7.79] [n: 8] [p: 0.499959] [chi^2: 1.11]
# [distfit] >Compute confidence interval [discrete]

Example: Make predictions on unseen data for discrete distribution

Example: Generate samples based on the fitted distribution

Contributors

Setting up and maintaining bnlearn has been possible thanks to users and contributors. Thanks to:

Maintainer

Erdogan Taskesen, github: erdogant
Contributions are welcome.
Yes! This library is entirely free but it runs on coffee! :) Feel free to support with a Coffee.

Name		Name	Last commit message	Last commit date
Latest commit History 579 Commits
.github		.github
distfit		distfit
docs		docs
experiments		experiments
notebooks		notebooks
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
make_clean.sh		make_clean.sh
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Repository files navigation

Key Features

Resources and Links

Background

Installation

Install distfit from PyPI

Install from Github source

Imort Library

Examples

Example: Quick start to find best fit for your input data

Example: Plot summary of the tested distributions

Example: Make predictions using the fitted distribution

Example: Test for one specific distributions

Example: Test for multiple distributions

Example: Fit discrete distribution

Example: Make predictions on unseen data for discrete distribution

Example: Generate samples based on the fitted distribution

Contributors

Maintainer

About

Uh oh!

Releases 55

Sponsor this project

Uh oh!

Packages

Uh oh!

Contributors 7

Uh oh!

Languages

Uh oh!

License

erdogant/distfit

Folders and files

Latest commit

History

Repository files navigation

Key Features

Resources and Links

Background

Installation

Install distfit from PyPI

Install from Github source

Imort Library

Examples

Contributors

Maintainer

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Sponsor this project

Uh oh!

Uh oh!

Uh oh!

Languages