util.stats

All sorts of useful statistics, mostly nonparametric: compute effects, fit CDFs, compute metric principal components, compute rank probabilities, or quantify sample statistics.

Compute the effect between two sequences. The measure used depends on the types of the two sequences:

  • number vs. number - Correlation coefficient between the two sequences.
  • category vs. number - L1-norm difference between the full distribution of the numbers and their conditional distribution for each possible category, aggregated with the chosen method (mean by default).
  • category vs. category - Total difference between the full distribution of the other sequence and its conditional distributions given each value of one sequence, aggregated with the chosen method (mean by default).
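The three cases above can be sketched as follows. This is a minimal illustration, not the module's code: the function names are hypothetical, the aggregation method is fixed to the mean, the "L1-norm difference" is assumed to be taken between empirical CDFs, and the "total difference" is assumed to be the total variation distance.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from statistics import mean

def correlation(xs, ys):
    # number vs. number: Pearson correlation coefficient.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def ecdf_l1(a, b):
    # Exact L1 distance between the empirical CDFs of samples a and b;
    # both CDFs are constant between consecutive pooled data points.
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        fa = bisect_right(a, lo) / len(a)
        fb = bisect_right(b, lo) / len(b)
        total += abs(fa - fb) * (hi - lo)
    return total

def category_vs_number(cats, nums):
    # Mean L1 difference between the full distribution of `nums`
    # and its conditional distribution within each category.
    groups = defaultdict(list)
    for c, x in zip(cats, nums):
        groups[c].append(x)
    return mean(ecdf_l1(nums, g) for g in groups.values())

def category_vs_category(cats, others):
    # Mean total variation distance between P(other) and P(other | cat).
    n = len(others)
    full = Counter(others)
    groups = defaultdict(list)
    for c, o in zip(cats, others):
        groups[c].append(o)
    diffs = []
    for g in groups.values():
        cond = Counter(g)
        diffs.append(0.5 * sum(abs(full[k] / n - cond[k] / len(g)) for k in full))
    return mean(diffs)
```

For example, categories that perfectly separate the numbers give a large effect, while categories independent of the other sequence give an effect of zero.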

A class for handling standard operators over distributions. It supports addition of another distribution, subtraction of another distribution (which yields the KS statistic), and multiplication by a constant.
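The operator overloading described above might look like the sketch below. The class is hypothetical and only illustrates one plausible design: a distribution is stored as a weighted sum of empirical CDFs, so `+` and `*` combine weights (a mixture is `0.5 * a + 0.5 * b`), and `-` returns the KS statistic.

```python
from bisect import bisect_right

class Distribution:
    """Hypothetical sketch: a weighted sum of empirical CDFs."""

    def __init__(self, samples=None, parts=None):
        # Each part is (weight, sorted samples); a plain sample list has weight 1.
        self.parts = parts if parts is not None else [(1.0, sorted(samples))]

    def __call__(self, x):
        # CDF value at x: weighted sum of each part's empirical CDF.
        return sum(w * bisect_right(s, x) / len(s) for w, s in self.parts)

    def __add__(self, other):
        # Pointwise sum of CDFs (use weights summing to 1 for a mixture).
        return Distribution(parts=self.parts + other.parts)

    def __mul__(self, c):
        # Scale every part's weight by the constant c.
        return Distribution(parts=[(c * w, s) for w, s in self.parts])

    __rmul__ = __mul__

    def __sub__(self, other):
        # KS statistic: the largest CDF gap. Both CDFs are step functions
        # that only jump at sample points, so checking those points suffices.
        points = {x for _, s in self.parts + other.parts for x in s}
        return max(abs(self(x) - other(x)) for x in points)
```

Usage: with `a = Distribution([1, 2, 3, 4])` and `b = Distribution([3, 4, 5, 6])`, `a - b` is the KS statistic between the two samples.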

Produce a new function that smooths a given function by using a normal distribution to weight nearby function values. Only works for 1-dimensional distributions.
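A minimal sketch of this kind of Gaussian smoothing, assuming the smoothed value is the expectation of the function under normally distributed offsets, approximated here with a fixed grid of offsets. The name `gauss_smooth` and its parameters are hypothetical.

```python
import math

def gauss_smooth(func, stdev=1.0, n=201, width=4.0):
    """Hypothetical sketch: return f_s(x) ~ E[func(x + Z)], Z ~ N(0, stdev^2).

    The expectation is approximated with `n` evenly spaced offsets covering
    +/- `width` standard deviations; the Gaussian weights are normalized.
    """
    offsets = [(-width + 2 * width * i / (n - 1)) * stdev for i in range(n)]
    weights = [math.exp(-0.5 * (d / stdev) ** 2) for d in offsets]
    total = sum(weights)
    weights = [w / total for w in weights]

    def smoothed(x):
        # Weighted average of nearby function values.
        return sum(w * func(x + d) for w, d in zip(weights, offsets))

    return smoothed
```

Because the offsets and weights are symmetric, linear functions pass through unchanged, while sharp steps are blurred over a few standard deviations.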

Fit a CDF (cumulative distribution function) to given data. Available fits are None (piecewise constant), linear (linear interpolation), and cubic (cubic interpolation).
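The piecewise-constant and linear fits can be sketched as below. The function name is hypothetical, and mapping the i-th sorted point to i/(n-1) for the linear fit is one convention among several. The cubic option is not covered; a monotone cubic interpolant such as SciPy's `PchipInterpolator` would be one way to add it.

```python
from bisect import bisect_right

def cdf_fit(data, fit=None):
    """Hypothetical sketch of an empirical CDF fit.

    fit=None     -> piecewise-constant (step) empirical CDF
    fit="linear" -> linear interpolation between the points (x_i, i/(n-1))
    """
    xs = sorted(data)
    n = len(xs)
    if fit is None:
        # Fraction of data points <= x.
        return lambda x: bisect_right(xs, x) / n
    if fit == "linear":
        def linear(x):
            if x <= xs[0]:
                return 0.0
            if x >= xs[-1]:
                return 1.0
            i = bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
            x0, x1 = xs[i - 1], xs[i]
            y0, y1 = (i - 1) / (n - 1), i / (n - 1)
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return linear
    raise ValueError("this sketch only covers fit=None and fit='linear'")
```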

Fit a PDF (probability density function) to given data. Available fits are None (piecewise constant), linear (linear interpolation), and cubic (cubic interpolation). By default, it applies Gaussian smoothing to the CDF.
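Smoothing the step-shaped empirical CDF with a Gaussian of width h and then differentiating gives exactly a Gaussian kernel density estimate. The sketch below uses that identity; the function name and the bandwidth rule (the normal-reference rule of thumb, 1.06 * std * n^(-1/5)) are assumptions, not the module's defaults.

```python
import math

def pdf_fit(data, bandwidth=None):
    """Hypothetical sketch: derivative of a Gaussian-smoothed empirical CDF.

    Convolving the step ECDF with N(0, h^2) gives (1/n) * sum Phi((x - x_i)/h);
    its derivative is the Gaussian kernel density estimate returned here.
    """
    n = len(data)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption for this sketch).
        mu = sum(data) / n
        std = (sum((x - mu) ** 2 for x in data) / (n - 1)) ** 0.5
        bandwidth = 1.06 * std * n ** (-1 / 5)
    h = bandwidth
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def pdf(x):
        # Sum of Gaussian bumps centered at each data point.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return pdf
```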

Compute the Kolmogorov-Smirnov (KS) statistic between two provided functions, mathematically known as the max-norm of their difference. Options select the method used to compute the statistic; the computation usually relies on optimization.
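As a simple stand-in for an optimization-based search, the max-norm difference can be approximated by evaluating both functions on a dense grid. This sketch (hypothetical name and parameters) trades accuracy for simplicity: a narrow peak between grid points can be missed, which is why an optimizer is usually preferable.

```python
def ks_diff(f, g, lower, upper, samples=10_001):
    """Hypothetical sketch: approximate max |f(x) - g(x)| on [lower, upper].

    A dense uniform grid stands in for the optimization the module may use.
    """
    step = (upper - lower) / (samples - 1)
    return max(abs(f(lower + i * step) - g(lower + i * step))
               for i in range(samples))
```

For example, the CDFs x and x^2 on [0, 1] differ most at x = 0.5, where the gap is 0.25.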

Given a KS statistic and the sample sizes of the two distributions, return the largest confidence with which the two distributions can be said to be the same. More technically, this is the largest confidence at which the null hypothesis (the two distributions are the same) cannot be rejected.
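One standard way to compute this value is the asymptotic two-sample formula: with effective sample size n_e = n1 * n2 / (n1 + n2), the p-value is Q(sqrt(n_e) * D), where Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2) is the Kolmogorov survival function. The sketch below assumes this large-sample approximation; the module may compute it differently.

```python
import math

def kolmogorov_q(lam, terms=100):
    # Survival function of the Kolmogorov distribution:
    # Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    if lam < 0.2:
        return 1.0  # the series converges slowly here and Q is ~1 anyway
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2 * total))

def ks_p_value(ks_stat, n1, n2):
    """Asymptotic two-sample KS p-value (largest confidence the null survives)."""
    effective_n = n1 * n2 / (n1 + n2)
    return kolmogorov_q(math.sqrt(effective_n) * ks_stat)
```

As a sanity check, Q(1.358) is about 0.05, the familiar 95% critical value of the KS test.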

Return the "confidence" that the provided distribution is normal, computed from the KS statistic with the ks_p_value function.
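A self-contained sketch of the idea, assuming a normal is fitted to the data's mean and standard deviation and compared against the empirical CDF with a one-sample KS test. The function names are hypothetical, and the asymptotic series is inlined rather than calling ks_p_value. Fitting the parameters from the same data makes this p-value conservative (the issue the Lilliefors test corrects).

```python
import math
from statistics import mean, stdev

def normal_cdf(x, mu, sigma):
    # CDF of N(mu, sigma^2) via the error function.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def normal_confidence(data):
    """Hypothetical sketch: KS test of `data` against a fitted normal."""
    xs = sorted(data)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    # One-sample KS statistic: compare the ECDF just before and at each point.
    d = max(max(abs((i + 1) / n - normal_cdf(x, mu, sigma)),
                abs(i / n - normal_cdf(x, mu, sigma)))
            for i, x in enumerate(xs))
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0
    # Asymptotic Kolmogorov survival function Q(lam).
    q = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return max(0.0, min(1.0, q))
```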

Use sklearn to compute the principal components of a matrix of row vectors. Return the components and their magnitudes.
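The same computation can be sketched with a singular value decomposition of the centered data in NumPy. This is an equivalent formulation rather than the module's sklearn call; reporting singular values as the "magnitudes" is an assumption.

```python
import numpy as np

def pca(points):
    """Sketch: principal components of row-vector data via SVD.

    Returns (components, magnitudes): rows of `components` are unit
    principal directions; `magnitudes` are the matching singular values.
    """
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing
    # singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return vt, singular_values
```

With sklearn, `sklearn.decomposition.PCA().fit(points)` exposes the same quantities as `components_` and `singular_values_`, though component signs may differ.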

Compute the metric principal components of a set of points with associated values under a given metric. Each vector between two points is scaled by the magnitude of the difference (under the metric) between their associated values. Use sklearn to compute the principal components of these scaled between-point vectors. Return the vectors and their magnitudes.
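A sketch of the procedure described above, with several assumptions: the between-point vectors are first normalized to unit length, the default metric is the absolute difference, and an uncentered SVD replaces the sklearn call (a difference vector and its negation are equivalent, so centering is skipped). The name `mpca` is hypothetical.

```python
import numpy as np

def mpca(points, values, metric=lambda a, b: abs(a - b)):
    """Hypothetical sketch of metric principal components.

    Each unit vector between a pair of points is scaled by the metric
    difference of their values; the principal directions of these scaled
    vectors are then found with an (uncentered) SVD.
    """
    X = np.asarray(points, dtype=float)
    vecs = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            diff = X[j] - X[i]
            length = np.linalg.norm(diff)
            if length > 0:
                vecs.append(diff / length * metric(values[i], values[j]))
    _, magnitudes, components = np.linalg.svd(np.array(vecs), full_matrices=False)
    return components, magnitudes
```

If the values change only along one coordinate, that coordinate emerges as the leading component.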

Given a Plot object (from util.plot), add the functions that draw a "percentile cloud" (shaded regions that are darker in the middle) for the stacked y values associated with each x value.
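The data behind such a cloud can be sketched as nested percentile bands per x value. Drawing them with util.plot is left out because its API is not documented here; the function names, the default percentiles, and the linear-interpolation percentile rule are all assumptions.

```python
def percentile(sorted_vals, p):
    # Linearly interpolated percentile (p in [0, 100]) of a sorted list.
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def percentile_bands(y_stacks, percentiles=(0, 20, 40, 60, 80, 100)):
    """Hypothetical sketch: (low_p, high_p, lower, upper) for each band.

    y_stacks[j] holds the y values at the j-th x value. Band i spans
    percentiles[i] .. percentiles[-1 - i]; inner bands cover the middle
    of the data and would be shaded darker.
    """
    columns = [sorted(ys) for ys in y_stacks]
    bands = []
    for i in range(len(percentiles) // 2):
        low_p, high_p = percentiles[i], percentiles[-1 - i]
        lower = [percentile(col, low_p) for col in columns]
        upper = [percentile(col, high_p) for col in columns]
        bands.append((low_p, high_p, lower, upper))
    return bands
```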

Insert a value into a sorted list of tuples (by default, tuples are compared by their last element). The list remains sorted after the insert.
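A minimal sketch using binary search on the sort keys. The name `insert_sorted` and the `key` parameter are hypothetical. On Python 3.10+, `bisect.insort(lst, item, key=key)` does the same without rebuilding the key list.

```python
from bisect import bisect_right

def insert_sorted(lst, item, key=lambda t: t[-1]):
    """Hypothetical sketch: insert `item` into `lst`, kept sorted by `key`.

    By default tuples are ordered by their last element. The key list is
    rebuilt on each call, so each insert costs O(n).
    """
    keys = [key(x) for x in lst]
    lst.insert(bisect_right(keys, key(item)), item)
    return lst
```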

Compute the product of all values in an iterable.

Given a target list of numbers (empirical distribution A) and a list of lists of numbers (empirical distributions B1, B2, ...), determine the probability that a value drawn from the target list will have rank k when one value is selected at random from each list (assuming the order of selection does not matter). In other words, return the probability that, after a random draw from each of A, B1, B2, ... is sorted, the value from A lands in position k.
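This probability can be computed exactly without simulation. For each possible draw a from A, every other list independently falls below a with some probability p_i, so the count of smaller draws follows a Poisson-binomial distribution, built up here one list at a time. The name, the 0-indexed rank, and the strict "<" tie rule are assumptions.

```python
def rank_probability(target, others, k):
    """Hypothetical sketch: P(the draw from `target` has rank k).

    One value is drawn uniformly from `target` and from each list in
    `others`; rank k (0-indexed) means exactly k of the other draws are
    strictly smaller. Ties are not counted as smaller.
    """
    total = 0.0
    for a in target:
        # Probability that each other list's draw falls below `a`.
        ps = [sum(b < a for b in other) / len(other) for other in others]
        # Poisson-binomial distribution of how many draws fall below `a`.
        dist = [1.0]
        for p in ps:
            new = [0.0] * (len(dist) + 1)
            for count, prob in enumerate(dist):
                new[count] += prob * (1 - p)
                new[count + 1] += prob * p
            dist = new
        total += dist[k] if k < len(dist) else 0.0
    return total / len(target)
```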

Return the proportion of items in a list that are less than or equal to a given value.

Compute the data profiles for a given set of performances. A data profile gives the probability (the fraction of cases) that a performance is within some factor of the best possible performance.
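A sketch of the factor-of-best computation described above, assuming lower performance values are better, all values are positive, and "the best possible performance" means the best result any solver achieved on that problem. The function name and input layout are hypothetical.

```python
def data_profile(performances, factors):
    """Hypothetical sketch (lower, positive performance values are better).

    performances[p][s] is solver s's result on problem p. For each factor t,
    return, per solver, the fraction of problems where its result is within
    a factor t of the best result on that problem (perf <= t * best).
    """
    num_solvers = len(performances[0])
    profile = {}
    for t in factors:
        profile[t] = [
            sum(row[s] <= t * min(row) for row in performances) / len(performances)
            for s in range(num_solvers)
        ]
    return profile
```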

Given any two of a sample size, a maximum error, and a confidence, compute and return the one that was not provided. If all three are given, return True or False to indicate whether that sample size meets those criteria.

This estimate is distribution-free and relies only on combinatorics, so it will be weaker (more conservative) than an estimate that assumes a parametric form.
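To illustrate the solve-for-the-missing-value interface, the sketch below swaps in a different distribution-free bound: the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality, P(sup |F_n - F| > error) <= 2 * exp(-2 * size * error^2). This is not the module's combinatorial method, so its numbers will differ; the function name and parameters are hypothetical.

```python
import math

def samples(size=None, error=None, confidence=None):
    """Hypothetical sketch using the DKW inequality (not the module's method).

    Given two of size, error, and confidence, solve for the third.
    Given all three, return whether the bound meets the confidence.
    """
    if size is None:
        # Smallest n with 2 * exp(-2 n e^2) <= 1 - confidence.
        return math.ceil(math.log(2 / (1 - confidence)) / (2 * error ** 2))
    if error is None:
        return math.sqrt(math.log(2 / (1 - confidence)) / (2 * size))
    achieved = max(0.0, 1 - 2 * math.exp(-2 * size * error ** 2))
    if confidence is None:
        return achieved
    return achieved >= confidence
```

For example, bounding the whole empirical CDF to within 0.05 at 95% confidence needs 738 samples under DKW.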