
`util.stats`

All sorts of useful statistics, mostly nonparametric. Compute effect, fit CDFs, compute metric principal components, rank probability, or quantify sample statistics.

`util.stats.difference`

`def effect`

Compute the effect between two sequences. The measure used depends on the types of the two sequences:

- number vs. number: the correlation coefficient between the two sequences.
- category vs. number: the method (mean) of the L1-norm differences between the full distribution and the conditional distribution for each possible category.
- category vs. category: the method (mean) of the total differences between the full distribution of one sequence and its conditional distributions given each value of the other sequence.
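The number vs. number and category vs. category cases can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `util.stats.difference`: the helper names are hypothetical, and "total difference" is read here as the total variation distance between two categorical distributions.

```python
from collections import Counter
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def total_difference(p, q):
    """Total variation distance between two categorical distributions (dicts).
    Assumption: this is what "total difference" means in the text above."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def category_effect(cats_a, cats_b):
    """Mean total difference between the full distribution of cats_b and
    its distribution conditioned on each value that appears in cats_a."""
    def dist(values):
        n = len(values)
        return {k: c / n for k, c in Counter(values).items()}
    full = dist(cats_b)
    diffs = []
    for a in set(cats_a):
        conditional = [b for x, b in zip(cats_a, cats_b) if x == a]
        diffs.append(total_difference(full, dist(conditional)))
    return mean(diffs)
```

Under this reading, two independent categorical sequences give an effect of 0, and the effect grows as knowing one sequence changes the distribution of the other.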

`util.stats.distributions`

`class Distribution`

A class for handling standard operators over distributions. It supports addition with another distribution, subtraction from another distribution (KS statistic), and multiplication by a constant.

`def gauss_smooth`

Produce a new function that uses a normal distribution to weight nearby function values. Only works for 1-dimensional distributions.
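One way to picture this kind of smoothing is a normal-weighted average of the function over nearby points. The sketch below is an assumption about the general technique, not the library's implementation; the name `gauss_smooth_sketch` and the parameters `stdev`, `n`, and `width` are hypothetical.

```python
import math

def gauss_smooth_sketch(func, stdev=1.0, n=101, width=3.0):
    """Return a function whose value at x is a normal-weighted average
    of func over n points spanning [x - width*stdev, x + width*stdev]."""
    # Offsets are spaced evenly and symmetrically around zero.
    offsets = [stdev * width * (2 * i / (n - 1) - 1) for i in range(n)]
    # Unnormalized normal density at each offset; normalized by the total below.
    weights = [math.exp(-0.5 * (o / stdev) ** 2) for o in offsets]
    total = sum(weights)

    def smoothed(x):
        return sum(w * func(x + o) for w, o in zip(weights, offsets)) / total

    return smoothed
```

Because the weights are symmetric, constant and linear functions pass through unchanged, while a sharp step is turned into a gradual ramp.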

`def cdf_fit`

Fit a CDF (cumulative distribution function) to given data. Offers three fit options: None (piecewise constant), linear (interpolation), and cubic (interpolation).
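The piecewise constant and linear options correspond to standard constructions of an empirical CDF. The following is a minimal sketch of those two constructions, with hypothetical function names; it is not the signature or behavior of `cdf_fit` itself, and the cubic option is omitted.

```python
from bisect import bisect_right

def empirical_cdf(data):
    """Piecewise-constant CDF: the fraction of samples <= x."""
    values = sorted(data)
    n = len(values)

    def cdf(x):
        return bisect_right(values, x) / n

    return cdf

def linear_cdf(data):
    """CDF that linearly interpolates the empirical CDF between the
    distinct sample values (needs at least two distinct values)."""
    values = sorted(data)
    n = len(values)
    xs = sorted(set(values))
    ys = [bisect_right(values, x) / n for x in xs]

    def cdf(x):
        if x < xs[0]:
            return 0.0
        if x >= xs[-1]:
            return 1.0
        i = bisect_right(xs, x) - 1          # xs[i] <= x < xs[i + 1]
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return cdf
```

For example, with the data `[1, 2, 3, 4]` the piecewise-constant fit stays at 0.5 between 2 and 3, while the linear fit rises steadily from 0.5 to 0.75 over that interval.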

`def pdf_fit`

Fit a PDF (probability density function) to given data. Offers three fit options: None (piecewise constant), linear (interpolation), and cubic (interpolation). By default, it uses Gaussian smoothing over the CDF.

`util.stats.ks`

`def ks_diff`

Compute the Kolmogorov-Smirnov statistic between two provided functions, known mathematically as the max-norm difference. Offers options for how the statistic is computed; the usual approach relies on optimization.
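For two empirical (step-function) CDFs, the max-norm difference can be found exactly by checking only the sample points, since that is where the step functions change. The sketch below uses that simpler approach in place of optimization and takes raw samples rather than functions; `ks_statistic` is a hypothetical name, not the `ks_diff` interface.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)

    def cdf(values, x):
        # Fraction of the sorted sample that is <= x.
        return bisect_right(values, x) / len(values)

    # The gap between two step functions can only change at a sample point.
    return max(abs(cdf(a, x) - cdf(b, x)) for x in a + b)
```

Identical samples give a statistic of 0 and samples with no overlap give 1. For smooth fitted CDFs the maximum need not fall on a sample point, which is presumably why an optimization-based method is offered.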

`def ks_p_value`

Given a KS statistic and the sample sizes of the two distributions, return the largest confidence with which the two distributions can be said to be the same. More technically, this is the largest confidence for which the null hypothesis (the two distributions are the same) cannot be refuted.
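A common large-sample approximation for the two-sample test rejects the null hypothesis at level $\alpha$ when $D > \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}\,\sqrt{(n_1+n_2)/(n_1 n_2)}$. Solving for $\alpha$ gives the sketch below. Whether `ks_p_value` uses this exact formula is an assumption, and the function name here is hypothetical.

```python
import math

def ks_p_value_sketch(ks_stat, n1, n2):
    """Approximate significance level at which a two-sample KS statistic
    sits exactly on the rejection threshold (one-term asymptotic formula)."""
    effective_n = n1 * n2 / (n1 + n2)
    # Invert D = sqrt(-ln(alpha / 2) / 2) / sqrt(effective_n) for alpha.
    alpha = 2.0 * math.exp(-2.0 * effective_n * ks_stat ** 2)
    return min(1.0, alpha)
```

A statistic of 0 returns 1.0 (no evidence against sameness), and the value shrinks rapidly as either the statistic or the sample sizes grow.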

`def normal_confidence`

Return the "confidence" that the provided distribution is normal. Uses the KS statistic and the `ks_p_value` function to compute the value.
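The idea can be sketched for raw samples by comparing their empirical CDF with a normal CDF that has the same mean and standard deviation. This is an assumed reading of the approach with a hypothetical function name; it uses the one-sample asymptotic approximation $2e^{-2nD^2}$, which tends to be optimistic when the normal parameters are estimated from the same data.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_confidence_sketch(data):
    """Approximate KS-based confidence that data came from a normal
    distribution with the sample's own mean and standard deviation."""
    values = sorted(data)
    n = len(values)
    normal = NormalDist(mean(values), stdev(values))
    ks_stat = 0.0
    for i, x in enumerate(values, start=1):
        f = normal.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        ks_stat = max(ks_stat, i / n - f, f - (i - 1) / n)
    return min(1.0, 2.0 * math.exp(-2.0 * n * ks_stat ** 2))
```

Evenly spaced normal quantiles score close to 1, while a strongly bimodal sample scores close to 0.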

`util.stats.metric_pca`

`def pca`

Use sklearn to compute the principal components of a matrix of row-vectors. Return the components and their magnitudes.
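For readers without sklearn at hand, the same quantities can be obtained from a singular value decomposition of the centered data. The sketch below swaps in NumPy's SVD for sklearn; the function name is hypothetical, and reporting the standard deviation along each component as its "magnitude" is an assumption about what the library returns.

```python
import numpy as np

def pca_sketch(points):
    """Principal components (as rows) of a matrix of row-vectors, together
    with the standard deviation of the data along each component."""
    x = np.asarray(points, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are unit-length directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    magnitudes = singular_values / np.sqrt(max(len(x) - 1, 1))
    return vt, magnitudes
```

For points lying exactly on a line, the first component is parallel to that line and the second magnitude is zero up to rounding.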

`def mpca`

Compute the metric principal components of a set of points with associated values and a given metric. Scale each vector between a pair of points by the magnitude of the difference between their associated values, as measured by the metric. Use sklearn to compute the principal components of these scaled between-point vectors. Return the component vectors and their magnitudes.
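A literal reading of these steps can be sketched with NumPy. Several things here are assumptions rather than the library's behavior: the function name, the default absolute-difference metric, using all pairs of points, using SVD instead of sklearn, and skipping the centering step (the set of between-point vectors is treated as symmetric about the origin).

```python
from itertools import combinations

import numpy as np

def mpca_sketch(points, values, metric=lambda a, b: abs(a - b)):
    """Principal directions of between-point vectors, each scaled by the
    metric difference between the values attached to its two endpoints."""
    x = np.asarray(points, dtype=float)
    between = [
        (x[i] - x[j]) * metric(values[i], values[j])
        for i, j in combinations(range(len(x)), 2)
    ]
    # No centering: each vector and its negation are considered equally present.
    _, singular_values, vt = np.linalg.svd(np.array(between), full_matrices=False)
    magnitudes = singular_values / np.sqrt(len(between))
    return vt, magnitudes
```

When the values depend on only one coordinate of the points, the first component returned by this sketch lines up with that coordinate, which is the kind of "direction of greatest change in value" the description suggests.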

`util.stats.plotting`

`def plot_percentiles`

Given a Plot object (from util.plot), add in functions for a "percentile cloud" (shaded regions darker in middle) for stacked y values associated with each x value.

`util.stats.rank`

`def insert_sorted`

Insert a value into a sorted list of tuples (defaults to looking at the last element of the tuple). Maintains sorted order after insert.
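A minimal sketch of this behavior using the standard `bisect` module follows. The name `insert_sorted_sketch` and the `key` parameter are hypothetical; the actual signature of `insert_sorted` may differ.

```python
from bisect import bisect_right

def insert_sorted_sketch(sorted_list, item, key=lambda t: t[-1]):
    """Insert item into sorted_list in place, keeping it ordered by key
    (the last tuple element by default). Returns the insertion index."""
    keys = [key(t) for t in sorted_list]
    # bisect_right places the new item after any existing equal keys.
    index = bisect_right(keys, key(item))
    sorted_list.insert(index, item)
    return index
```

For example, inserting `("c", 2)` into `[("a", 1), ("b", 3)]` places it in the middle, because the list is ordered by the last element of each tuple.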

`def product`

Compute the product of all values in an iterable.
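An equivalent helper can be written in one line with `functools.reduce`; this is a sketch with a hypothetical name, and Python 3.8+ also ships `math.prod` for the same purpose.

```python
from functools import reduce
from operator import mul

def product_sketch(values):
    """Multiply together every value in an iterable (1 if it is empty)."""
    return reduce(mul, values, 1)
```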

`def rank_probability`

Given a target list of numbers (empirical distribution A) and a list of lists of numbers (empirical distributions B1, B2, ...), determine the probability that an item from the target list will have rank k when one item is selected at random from each list (assuming order doesn't matter). In other words, return the probability that a random draw from (A, B1, B2, ...) leaves the value from A in position k.
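For small inputs, this probability can be checked by brute force: enumerate every combination of one draw per list and count how often the draw from A lands in position k. The sketch below does exactly that. It is not the library's algorithm, the function name is hypothetical, and it assumes rank 0 means smallest with ties counted in A's favor.

```python
from itertools import product

def rank_probability_sketch(target, others, rank=0):
    """Probability that a uniform draw from target has the given rank
    (0 = smallest) among one uniform draw from each list in others."""
    hits = 0
    total = 0
    for draw in product(target, *others):
        a, rest = draw[0], draw[1:]
        # Position of a = number of competing draws strictly below it.
        position = sum(1 for b in rest if b < a)
        hits += position == rank
        total += 1
    return hits / total
```

The enumeration grows as the product of the list lengths, so it is only practical as a reference for verifying a faster method on small cases.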
