## Statistics
 - This module was introduced in [Python 3.4](https://docs.python.org/3/library/statistics.html).
 - As of Python 3.8, these functions support int, float, Decimal and Fraction.

In [2]:
import statistics

In [3]:
print(dir(statistics))

['Counter', 'Decimal', 'Fraction', 'LinearRegression', 'NormalDist', 'StatisticsError', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_coerce', '_convert', '_exact_ratio', '_fail_neg', '_find_lteq', '_find_rteq', '_isfinite', '_normal_dist_inv_cdf', '_ss', '_sum', 'bisect_left', 'bisect_right', 'correlation', 'covariance', 'erf', 'exp', 'fabs', 'fmean', 'fsum', 'geometric_mean', 'groupby', 'harmonic_mean', 'hypot', 'itemgetter', 'linear_regression', 'log', 'math', 'mean', 'median', 'median_grouped', 'median_high', 'median_low', 'mode', 'multimode', 'namedtuple', 'numbers', 'pstdev', 'pvariance', 'quantiles', 'random', 'repeat', 'sqrt', 'stdev', 'tau', 'variance']


In [4]:
print(statistics.__doc__)


Basic statistics module.

This module provides functions for calculating statistics of data, including
averages, variance, and standard deviation.

Calculating averages
--------------------

Function            Description
mean                Arithmetic mean (average) of data.
fmean               Fast, floating point arithmetic mean.
geometric_mean      Geometric mean of data.
harmonic_mean       Harmonic mean of data.
median              Median (middle value) of data.
median_low          Low median of data.
median_high         High median of data.
median_grouped      Median, or 50th percentile, of grouped data.
mode                Mode (most common value) of data.
multimode           List of modes (most common values of data).
quantiles           Divide data into intervals with equal probability.

Calculate the arithmetic mean ("the average") of data:

>>> mean([-1.0, 2.5, 3.25, 5.75])
2.625


Calculate the standard median of discrete data:

>>> median([2, 3, 4, 5])
3.5


Calculate t

### Averages and measures of central location

```
mean()               Arithmetic mean (“average”) of data.
fmean()              Fast, floating point arithmetic mean.
geometric_mean()     Geometric mean of data.
harmonic_mean()      Harmonic mean of data.
median()             Median (middle value) of data.
median_low()         Low median of data.
median_high()        High median of data.
median_grouped()     Median, or 50th percentile, of grouped data.
mode()               Single mode (most common value) of discrete or nominal data.
multimode()          List of modes (most common values) of discrete or nominal data.
quantiles()          Divide data into intervals with equal probability.
```

#### 1. statistics.mean
    - Arithmetic mean (or average).
    - The sum of the data divided by the number of data points.
    - It is a measure of the central location of the data.

In [5]:
statistics.mean([1, 2, 3, 4, 4])

2.8

In [6]:
statistics.mean([-1.0, 2.5, 3.25, 5.75])

2.625

In [7]:
statistics.mean([1, 2, 3, 4, -1.0, 2.5, 3.25, 5.75])

2.5625

In [8]:
from fractions import Fraction as F

print(F(3, 7))

3/7


In [9]:
statistics.mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])

Fraction(13, 21)

In [10]:
from decimal import Decimal as D

print(D("0.5"))
print(D("0.53432423434234343243434"))

0.5
0.53432423434234343243434


In [11]:
statistics.mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])

Decimal('0.5625')

#### 2. statistics.fmean
    - Convert data to floats and compute the arithmetic mean.
    - This runs faster than the mean() function and it always returns a float. 
    - The data may be a sequence or iterable.

In [12]:
statistics.fmean([1, 2, 3, 4, 4])

2.8

In [13]:
statistics.fmean([-1.0, 2.5, 3.25, 5.75])

2.625
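The difference in return type is easiest to see side by side. The following is a minimal sketch (the variable names are only illustrative): `mean()` keeps the input type where it can, while `fmean()` always converts to float.

```python
import statistics
from fractions import Fraction

ints = [1, 2, 3]
fracs = [Fraction(1, 2), Fraction(3, 2)]

mean_of_ints = statistics.mean(ints)      # stays an int when the result is whole
fmean_of_ints = statistics.fmean(ints)    # always a float
mean_of_fracs = statistics.mean(fracs)    # stays a Fraction
fmean_of_fracs = statistics.fmean(fracs)  # converted to float

print(type(mean_of_ints).__name__, type(fmean_of_ints).__name__)
print(type(mean_of_fracs).__name__, type(fmean_of_fracs).__name__)
```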

#### 3. statistics.geometric_mean
    - Convert data to floats and compute the geometric mean.
    - Indicates the central tendency or typical value of the data using the product 
       of the values (as opposed to the arithmetic mean which uses their sum).
    - Raises a StatisticsError if the input dataset is empty, if it contains a zero,
      or if it contains a negative value. 
    - The data may be a sequence or iterable.

In [14]:
statistics.mean([1, 2, 3, 4, 4])

2.8

In [15]:
statistics.geometric_mean([1, 2, 3, 4, 4])

2.491461879231035

In [16]:
round(statistics.geometric_mean([1, 2, 3, 4, 4]), 1)

2.5

In [17]:
statistics.mean([1, 2, 3, 4, 4, 0])

2.3333333333333335

In [18]:
try:
    statistics.geometric_mean([1, 2, 3, 4, 4, 0])
except statistics.StatisticsError as ex:
    print(ex)

geometric mean requires a non-empty dataset containing positive numbers
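To make the "product of the values" description concrete, here is a rough cross-check that computes the geometric mean by hand in two equivalent ways. This is only a sketch for comparison, not how the module computes it internally.

```python
import math
import statistics

data = [1, 2, 3, 4, 4]
n = len(data)

# nth root of the product of the values
root_of_product = math.prod(data) ** (1 / n)

# equivalently, exp of the arithmetic mean of the logarithms
exp_mean_log = math.exp(statistics.fmean(math.log(x) for x in data))

gm = statistics.geometric_mean(data)
print(gm, root_of_product, exp_mean_log)
```

All three values should agree up to floating-point rounding.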


#### 4. statistics.harmonic_mean
    - Return the harmonic mean of data, a sequence or iterable of real-valued numbers.
    - Also called the subcontrary mean.
    - It is the reciprocal of the arithmetic mean() of the reciprocals of the data.
    - For example, the harmonic mean of three values a, b and c is 3/(1/a + 1/b + 1/c).
    - StatisticsError is raised if data is empty, or any element is less than zero.

    - The harmonic mean is a type of average, a measure of the central location of the data.
    - It is often appropriate when averaging rates or ratios.

__Question:__ Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr. What is the average speed?

In [19]:
statistics.harmonic_mean([40, 60])

48.0

__Question:__ Suppose an investor purchases an equal value of shares in each of three companies, with P/E (price/earning) ratios of 2.5, 3 and 10. What is the average P/E ratio for the investor’s portfolio?

In [20]:
statistics.harmonic_mean([2.5, 3, 10])  # For an equal investment portfolio.

3.6
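Both answers can be checked by hand from the formula above. The helper `harmonic_by_hand` below is a hypothetical name used only for this sketch; the car question is also cross-checked as total distance divided by total time.

```python
import math
import statistics

def harmonic_by_hand(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

speeds = [40, 60]
pe_ratios = [2.5, 3, 10]

# Car question: 20 km driven in 10/40 + 10/60 hours.
total_distance = 10 + 10
total_time = 10 / 40 + 10 / 60
average_speed = total_distance / total_time

print(harmonic_by_hand(speeds), average_speed)
print(harmonic_by_hand(pe_ratios))
```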

#### 5. statistics.median
    - Return the median (middle value) of numeric data, using the common “mean of middle two” method.
    - This is suited for when your data is discrete, and you don’t mind that the median may not be an actual data point.

Calculate the standard median of discrete data:

In [21]:
statistics.median([1, 2, 3, 8, 9])

3

In [22]:
statistics.median([2, 3, 4, 5])

3.5

#### 6. statistics.median_low
    - Return the low median of numeric data.
    - The low median is always a member of the data set. 
    - When the number of data points is odd, the middle value is returned. 
    - When it is even, the smaller of the two middle values is returned.

In [23]:
statistics.median_low([1, 2, 3, 8, 9])

3

In [24]:
statistics.median_low([2, 3, 4, 5])

3

#### 7. statistics.median_high
    - Return the high median of data.
    - The high median is always a member of the data set. 
    - When the number of data points is odd, the middle value is returned. 
    - When it is even, the larger of the two middle values is returned.
    

In [25]:
statistics.median_high([1, 2, 3, 8, 9])

3

In [26]:
statistics.median_high([2, 3, 4, 5])

4
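The three median variants differ only when the number of data points is even. A small side-by-side sketch on the same data, with illustrative variable names:

```python
import statistics

data = [2, 3, 4, 5]  # even number of points; the middle values are 3 and 4

mid = statistics.median(data)        # mean of the two middle values
low = statistics.median_low(data)    # smaller middle value
high = statistics.median_high(data)  # larger middle value

print(low, mid, high)
print(low in data, mid in data, high in data)
```

Only `median()` can return a value that is not an actual data point.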

#### 8. statistics.median_grouped(data, interval=1)
    - Return the median of grouped continuous data, calculated as the 50th percentile, using interpolation. 
    - If data is empty, StatisticsError is raised. 
    - Data can be a sequence or iterable.

In [27]:
statistics.median_grouped([1, 2, 3, 8, 9])

3.0

In [28]:
statistics.median_grouped([2, 3, 4, 5])

3.5

Optional argument interval represents the class interval, and defaults to 1. Changing the class interval naturally will change the interpolation:

In [29]:
statistics.median_grouped([2, 3, 4, 5], interval=1)

3.5

In [30]:
statistics.median_grouped([2, 3, 4, 5], interval=2)

3.0

In [31]:
statistics.median_grouped([2, 3, 4, 5], interval=3)

2.5

__NOTE:__ This function does not check whether the data points are at least interval apart.
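The interpolation can be reproduced by hand with the usual grouped-median formula `L + interval * (n/2 - cf) / f`, where `L` is the lower limit of the median class, `cf` the number of values below it and `f` the number of values in it. The helper below is a hypothetical sketch written for this comparison, not the module's own code, and it assumes non-empty numeric data.

```python
import math
import statistics

def grouped_median_by_hand(data, interval=1):
    values = sorted(data)
    n = len(values)
    x = values[n // 2]                    # midpoint of the median class
    lower = x - interval / 2              # lower limit of that class
    cf = sum(1 for v in values if v < x)  # values below the median class
    f = values.count(x)                   # values inside the median class
    return lower + interval * (n / 2 - cf) / f

for interval in (1, 2, 3):
    print(interval, grouped_median_by_hand([2, 3, 4, 5], interval))
```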

#### 9. statistics.mode(data)
    - Return the single most common data point from discrete or nominal data.
    - The mode (when it exists) is the most typical value and serves as a measure of central location.

In [32]:
statistics.mode([1, 1, 2, 3, 3, 3, 3, 4])  # Single-mode data

3

In [33]:
statistics.mode(
    [1, 1, 2, 2, 3, 3, 4]
)  # Multimodal data (1, 2 and 3 each occur twice); the first mode encountered is returned

1

In [34]:
statistics.mode(
    [2, 2, 3, 1, 1, 3, 4]
)  # Multimodal data (1, 2 and 3 each occur twice); the first mode encountered is returned

2

In [35]:
statistics.mode(["red", "blue", "blue", "red", "green", "red", "red"])

'red'

In [36]:
statistics.mode(["blue", "blue", "red", "green", "red"])

'blue'
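One case the cells above do not cover is empty input: `mode()` has no value to return, so it raises `StatisticsError`. A minimal sketch of handling that case (the `raised` flag is only there to make the outcome explicit):

```python
import statistics

raised = False
try:
    statistics.mode([])
except statistics.StatisticsError as ex:
    raised = True
    print(f"StatisticsError: {ex}")
```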

#### 10. statistics.multimode(data)
    - Return a list of the most frequently occurring values in the order they were 
      first encountered in the data. 
    - Will return more than one result if there are multiple modes or an empty list if the data is empty:

In [37]:
statistics.multimode(["red", "blue", "blue", "red", "green", "red", "red"])

['red']

In [38]:
statistics.multimode(["red", "blue", "blue", "red", "green"])

['red', 'blue']

In [39]:
statistics.multimode(
    [1, 1, 2, 2, 3, 3, 4]
)  # Multimodal data (1, 2 and 3 each occur twice)

[1, 2, 3]

In [40]:
# To get the minimum mode data point
min(statistics.multimode([1, 1, 2, 2, 3, 3, 4]))

1

In [41]:
# To get the maximum mode data point
max(statistics.multimode([1, 1, 2, 2, 3, 3, 4]))

3
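Two further cases from the description, sketched briefly: a string is treated as an iterable of characters, and empty data gives an empty list rather than an error.

```python
import statistics

# A string is iterated character by character.
letters = statistics.multimode("aabbbbccddddeeffffgg")

# Unlike mode(), multimode() does not raise on empty data.
empty = statistics.multimode([])

print(letters)
print(empty)
```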

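#### 11. statistics.variance(data, xbar=None)
    - Variance, or second moment about the mean, is a measure of the variability (spread or dispersion) of data.
    - A large variance indicates that the data is spread out; a small variance indicates that it is clustered closely around the mean.
    - variance() returns the sample variance; the optional xbar argument is the mean of data, if already known.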

In [42]:
statistics.variance([2, 5, 3, 2, 8, 3, 9, 4, 2, 5, 6])

5.872727272727273
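The same number can be reproduced from the textbook definition of sample variance: the sum of squared deviations from the mean, divided by `n - 1`. This is a hand-rolled sketch for comparison only; the module itself uses more careful arithmetic.

```python
import math
import statistics

data = [2, 5, 3, 2, 8, 3, 9, 4, 2, 5, 6]
n = len(data)

xbar = statistics.fmean(data)
sum_of_squares = sum((x - xbar) ** 2 for x in data)
sample_variance = sum_of_squares / (n - 1)

print(sample_variance, statistics.variance(data))
```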

#### 12. statistics.pvariance(data, mu=None)
    - Return the population variance of data, a non-empty sequence or iterable of real-valued numbers.
    - Use variance() when the data is a sample drawn from a larger population: it divides the sum of
      squared deviations by n - 1.
    - Use pvariance() when the data is the entire population: it divides by n.

In [43]:
statistics.pvariance([2, 5, 3, 2, 8, 3, 9, 4, 2, 5, 6])

5.338842975206612

In [44]:
data = [2, 5, 3, 2, 8, 3, 9, 4, 2, 5, 6]
m = statistics.mean(data)
statistics.pvariance(data, mu=m)

5.338842975206612
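Because the two functions differ only in the divisor, one can be converted into the other. A quick sketch of that relationship on the same data:

```python
import math
import statistics

data = [2, 5, 3, 2, 8, 3, 9, 4, 2, 5, 6]
n = len(data)

sample = statistics.variance(data)       # divides by n - 1
population = statistics.pvariance(data)  # divides by n

# Rescaling the population variance by n / (n - 1) gives the sample variance.
rescaled = population * n / (n - 1)
print(sample, rescaled)
```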

__NOTE:__ The function does not check whether the value passed for mu is the actual mean of the data set. Passing an arbitrary value can lead to invalid or impossible results.

In [45]:
data = [1, 2, 2, 4, 4, 4, 5, 6]
mu = statistics.mean(data)
print(f"mean value:{mu}")

pv = statistics.pvariance(data, mu)
print(f"pvariance :{pv}")

mean value:3.5
pvariance :2.5
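To illustrate the note about mu, the sketch below passes a deliberately wrong value (0, chosen arbitrarily for this example) and compares the result with the correct one. The function accepts it silently.

```python
import math
import statistics

data = [1, 2, 2, 4, 4, 4, 5, 6]
true_mu = statistics.mean(data)

correct = statistics.pvariance(data, true_mu)
wrong = statistics.pvariance(data, 0)  # 0 is deliberately NOT the mean

print(f"with the true mean: {correct}")
print(f"with mu=0         : {wrong}")
```

No exception is raised; the second value is simply not the variance of the data.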


#### 13. statistics.stdev(data, xbar=None)

In [46]:
statistics.stdev([2.5, 3.25, 5.5, 11.25, 11.75])

4.389618434442793
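The sample standard deviation is the square root of the sample variance, which can be checked directly. A minimal sketch:

```python
import math
import statistics

data = [2.5, 3.25, 5.5, 11.25, 11.75]

sd = statistics.stdev(data)
sqrt_of_variance = math.sqrt(statistics.variance(data))

print(sd, sqrt_of_variance)
```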

#### 14. statistics.pstdev(data, mu=None)
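`pstdev()` is the population counterpart of `stdev()`: the square root of the population variance. A short sketch with made-up sample data:

```python
import math
import statistics

data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]  # illustrative values only

population_sd = statistics.pstdev(data)
sqrt_of_pvariance = math.sqrt(statistics.pvariance(data))

print(population_sd, sqrt_of_pvariance)
```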

#### 15. statistics.quantiles(data, *, n=4, method='exclusive')
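`quantiles()` returns the `n - 1` cut points that divide the data into `n` intervals of equal probability. A small sketch with illustrative data, showing both values of `method`:

```python
import statistics

data = list(range(1, 12))  # the integers 1..11

# Default: n=4 (quartiles), method='exclusive'
quartiles_exclusive = statistics.quantiles(data, n=4)

# 'inclusive' treats the data as containing the population's minimum and maximum
quartiles_inclusive = statistics.quantiles(data, n=4, method="inclusive")

# n=10 gives deciles: nine cut points
deciles = statistics.quantiles(data, n=10)

print(quartiles_exclusive)
print(quartiles_inclusive)
print(len(deciles))
```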

If you have already calculated the mean, you can pass it as the optional second argument (xbar or mu) to the four "spread" functions (variance, stdev, pvariance and pstdev) to avoid recalculating it.
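A minimal sketch of reusing a precomputed mean across the spread functions (variable names are illustrative):

```python
import math
import statistics

data = [2.5, 3.25, 5.5, 11.25, 11.75]
m = statistics.mean(data)  # computed once

sample_var = statistics.variance(data, m)       # second argument is xbar
sample_sd = statistics.stdev(data, m)           # second argument is xbar
population_var = statistics.pvariance(data, m)  # second argument is mu
population_sd = statistics.pstdev(data, m)      # second argument is mu

print(sample_var, sample_sd, population_var, population_sd)
```

The value passed must be the true mean of the data, since the functions do not verify it.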


In [47]:
help(statistics)

Help on module statistics:

NAME
    statistics - Basic statistics module.

MODULE REFERENCE
    https://docs.python.org/3.10/library/statistics.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides functions for calculating statistics of data, including
    averages, variance, and standard deviation.
    
    Calculating averages
    --------------------
    
    Function            Description
    mean                Arithmetic mean (average) of data.
    fmean               Fast, floating point arithmetic mean.
    geometric_mean      Geometric mean of data.
    harmonic_mean       Harmonic mean of data.
    median              Median (middle value) of data.
    median_low  