
# Statistics Functions Demonstration

This notebook demonstrates functions from Python's built-in `statistics` module. Each code cell calls one function, with a short comment describing it, followed by the printed result.


In [None]:

import statistics
import numpy as np
import pandas as pd

# Prepare data for demonstration
data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
data_with_duplicates = [2, 2, 3, 4, 5, 5, 6, 7, 8, 8, 8, 9, 10, 11, 11]
data_with_negatives = [2, -3, 4, -5, 6, -7, 8, -9, 10, -11]


In [None]:
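from fractions import Fraction  # statistics.Fraction is just fractions.Fraction, imported internally by the module
Fraction(4, 4)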

Fraction(1, 1)

In [None]:

# 1. Mean
# The mean() function calculates the arithmetic mean (average) of data.
# The arithmetic mean is the sum of the data divided by the number of data points.

mean_value = statistics.mean(data)
print(f"Mean: {mean_value}")


Mean: 6.5
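As a quick cross-check, the arithmetic mean can be recomputed directly from its definition. This is a minimal sketch that assumes the same `data` list as the setup cell; `manual_mean` is just an illustrative name. It also shows that `statistics.mean()` raises `statistics.StatisticsError` for empty input instead of returning a value.

```python
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Arithmetic mean: sum of the values divided by how many there are (65 / 10)
manual_mean = sum(data) / len(data)
print(manual_mean, statistics.mean(data))   # 6.5 6.5

# An empty dataset has no mean, so statistics.mean() raises StatisticsError
try:
    statistics.mean([])
except statistics.StatisticsError as exc:
    print("StatisticsError:", exc)
```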


In [None]:

# 2. Median
# The median() function calculates the median (middle value) of data.
# The median is the value separating the higher half from the lower half of a data sample.

median_value = statistics.median(data)
print(f"Median: {median_value}")


Median: 6.5
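The median rule can be reproduced with a sort and an index lookup. `manual_median` below is a hypothetical helper for illustration, not part of the `statistics` module; it mirrors the documented behaviour of averaging the two middle values when the number of points is even, which is why the result above (6.5) is not itself a member of `data`.

```python
import statistics

def manual_median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(manual_median(data), statistics.median(data))            # 6.5 6.5  (even n)
print(manual_median(data[:-1]), statistics.median(data[:-1]))  # 6 6  (odd n)
```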


In [None]:

# 3. Mode
# The mode() function finds the most common value in the data.
# The mode is the value that appears most frequently in the data set.

mode_value = statistics.mode(data_with_duplicates)
print(f"Mode: {mode_value}")


Mode: 8
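The same answer can be found by counting occurrences with `collections.Counter`. This sketch also shows what happens on a tie: since Python 3.8, `mode()` returns the first of several equally common values instead of raising an error (older versions raised `StatisticsError`).

```python
import statistics
from collections import Counter

data_with_duplicates = [2, 2, 3, 4, 5, 5, 6, 7, 8, 8, 8, 9, 10, 11, 11]

# Count how often each value occurs and take the most frequent one
counts = Counter(data_with_duplicates)
value, frequency = counts.most_common(1)[0]
print(value, frequency)                 # 8 3

# Tie between 1 and 2: Python 3.8+ returns the first mode encountered
print(statistics.mode([1, 1, 2, 2]))    # 1
```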


In [None]:

# 4. Standard Deviation
# The stdev() function calculates the sample standard deviation of data (it divides by n - 1).
# The standard deviation measures how much the values vary, or are dispersed, around their mean.

stdev_value = statistics.stdev(data)
print(f"Standard Deviation: {stdev_value}")


Standard Deviation: 3.0276503540974917


## How `statistics.stdev(data)` calculates the standard deviation

**1. Calculate the Mean:**

   - The function first computes the arithmetic mean (average) of the data, just as `statistics.mean(data)` did in the Mean example.

**2. Find the Squared Differences:**

   - For each data point in your list (`data`), it subtracts the mean from that data point. This gives you the difference between each data point and the average.
   - Then, it squares each of these differences. Squaring makes every difference non-negative, so deviations above and below the mean cannot cancel each other out, and larger deviations carry more weight.

**3. Calculate the Variance:**

   - The variance is the sum of the squared differences divided either by the number of data points `n` (the population variance) or by `n - 1` (the sample variance). `statistics.stdev()` always divides by `n - 1` (Bessel's correction), while `statistics.pstdev()` divides by `n`. Dividing by `n - 1` makes the variance an unbiased estimate of the population variance when your data is a sample drawn from a larger population; the difference matters most for small samples.

**4. Take the Square Root:**

   - Finally, it takes the square root of the variance. This gives you the standard deviation, which is a measure of how spread out the data is around the mean.


**In simpler terms:**

The standard deviation measures how much the data points typically deviate from the average. A larger standard deviation means the data is more spread out, while a smaller standard deviation means the data is more clustered around the average.


**Example:**

Let's say your data is `[2, 4, 6]`.

1. **Mean:** (2 + 4 + 6) / 3 = 4
2. **Squared Differences:**
   - (2 - 4)^2 = 4
   - (4 - 4)^2 = 0
   - (6 - 4)^2 = 4
3. **Variance:** (4 + 0 + 4) / (3 - 1) = 4
4. **Standard Deviation:** √4 = 2
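The four steps above translate almost line for line into Python. `manual_stdev` is a hypothetical helper written for illustration, not the module's implementation (which computes the variance with exact fractional arithmetic internally), so on larger inputs its result may differ from `statistics.stdev()` in the last decimal place.

```python
import math
import statistics

def manual_stdev(values):
    """Sample standard deviation following the four steps above (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two data points")
    mean = sum(values) / n                              # 1. mean
    squared_diffs = [(x - mean) ** 2 for x in values]   # 2. squared differences
    variance = sum(squared_diffs) / (n - 1)             # 3. sample variance
    return math.sqrt(variance)                          # 4. square root

print(manual_stdev([2, 4, 6]))   # 2.0, matching the worked example

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(manual_stdev(data), statistics.stdev(data))
```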



In [None]:

# 5. Variance
# The variance() function calculates the sample variance of data, dividing the sum of squared deviations by n - 1.
# Variance is the expected squared deviation of a random variable from its mean; the sample variance estimates it from a sample.

variance_value = statistics.variance(data)
print(f"Variance: {variance_value}")


Variance: 9.166666666666666
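A short sketch of where 9.1666... comes from, and of its relationship to `stdev()`. It also uses the optional `xbar` argument of `statistics.variance()`, which lets you pass a mean you have already computed.

```python
import math
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
n = len(data)
mean = statistics.mean(data)   # 6.5

# Sum of squared deviations from the mean, divided by n - 1 = 9
ss = sum((x - mean) ** 2 for x in data)
print(ss, ss / (n - 1))        # 82.5 9.166666666666666

# variance() is the square of stdev()
print(math.isclose(statistics.variance(data), statistics.stdev(data) ** 2))   # True

# If the mean is already known, pass it as xbar so it is not recomputed
print(statistics.variance(data, xbar=mean))
```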


In [None]:

# 6. Median Low
# The median_low() function finds the low median of data.
# The low median is the lower of two middle values if the data size is even.

median_low_value = statistics.median_low(data)
print(f"Median Low: {median_low_value}")


Median Low: 6


In [None]:

# 7. Median High
# The median_high() function finds the high median of data.
# The high median is the higher of two middle values if the data size is even.

median_high_value = statistics.median_high(data)
print(f"Median High: {median_high_value}")


Median High: 7
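Both the low and high medians are always actual members of the data, which is useful for discrete data where an averaged value such as 6.5 makes no sense. A minimal sketch of the index arithmetic, assuming the same `data` list:

```python
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
ordered = sorted(data)
n = len(ordered)

# Low median: the smaller middle element; high median: the larger one
low = ordered[(n - 1) // 2]
high = ordered[n // 2]
print(low, high)   # 6 7
print(low == statistics.median_low(data), high == statistics.median_high(data))   # True True

# With an odd number of points, both collapse to the ordinary median
odd = data[:-1]
print(statistics.median_low(odd), statistics.median(odd), statistics.median_high(odd))   # 6 6 6
```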


In [None]:

# 8. Median Grouped
# The median_grouped() function estimates the median of grouped continuous data.
# It is useful for data presented in the form of a frequency distribution.

median_grouped_value = statistics.median_grouped(data)
print(f"Median Grouped: {median_grouped_value}")


Median Grouped: 6.5
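`median_grouped()` treats each value as the midpoint of a class interval of width `interval` (default 1) and interpolates inside the class that holds the middle observation, using L + interval × (n/2 − cf) / f, where L is that class's lower boundary, cf the number of observations below it, and f its frequency. The function below is a hypothetical re-implementation of that formula for illustration, not the library source; it reproduces the result above and the documented `[1, 3, 3, 5, 7]` examples.

```python
import math
import statistics
from bisect import bisect_left, bisect_right

def grouped_median_sketch(values, interval=1.0):
    """Interpolated grouped median: L + interval * (n/2 - cf) / f."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no median for empty data")
    x = ordered[n // 2]                 # midpoint of the class holding the middle observation
    cf = bisect_left(ordered, x)        # cumulative frequency below that class
    f = bisect_right(ordered, x) - cf   # frequency of that class
    lower = x - interval / 2            # lower class boundary L
    return lower + interval * (n / 2 - cf) / f

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(grouped_median_sketch(data), statistics.median_grouped(data))   # 6.5 6.5
print(grouped_median_sketch([1, 3, 3, 5, 7]))                         # 3.25
print(grouped_median_sketch([1, 3, 3, 5, 7], interval=2))             # 3.5
```

With `interval=1`, data in which every value occurs once interpolates to the ordinary median; repeated values (larger f) shift the estimate within their class.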


In [None]:

# 9. Harmonic Mean
# The harmonic_mean() function calculates the harmonic mean of data.
# The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data.

harmonic_mean_value = statistics.harmonic_mean(data)
print(f"Harmonic Mean: {harmonic_mean_value}")


Harmonic Mean: 4.950795663588791
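The definition in the comment can be checked by hand as `n / sum(1/x)`. The second call is the classic use case: the average speed over two equal distances driven at 40 km/h and 60 km/h is their harmonic mean, not their arithmetic mean.

```python
import math
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Reciprocal of the arithmetic mean of the reciprocals: n / sum(1/x)
manual_hm = len(data) / sum(1 / x for x in data)
print(manual_hm, statistics.harmonic_mean(data))

# Equal distances at 40 km/h and 60 km/h average 48 km/h, not 50 km/h
print(statistics.harmonic_mean([40, 60]))   # 48.0
```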


In [None]:

# 10. Geometric Mean
# The geometric_mean() function calculates the geometric mean of data.
# The geometric mean is the nth root of the product of n numbers, useful for data with exponential growth.

geometric_mean_value = statistics.geometric_mean(data)
print(f"Geometric Mean: {geometric_mean_value}")


Geometric Mean: 5.755930902871147
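The geometric mean can be computed either as the nth root of the product or as the exponential of the mean of the logarithms; the second form avoids building a potentially huge product. This sketch also uses the otherwise unused `data_with_negatives` list to show that the `statistics` module rejects negative inputs for both the geometric and the harmonic mean by raising `StatisticsError`.

```python
import math
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
data_with_negatives = [2, -3, 4, -5, 6, -7, 8, -9, 10, -11]

# nth root of the product of the n values ...
via_product = math.prod(data) ** (1 / len(data))
# ... or exp of the mean of the natural logarithms
via_logs = math.exp(statistics.fmean(math.log(x) for x in data))
print(via_product, via_logs, statistics.geometric_mean(data))

# Negative inputs are rejected by both functions
rejected = []
for func in (statistics.geometric_mean, statistics.harmonic_mean):
    try:
        func(data_with_negatives)
    except statistics.StatisticsError:
        rejected.append(func.__name__)
print(rejected)   # ['geometric_mean', 'harmonic_mean']
```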


In [None]:

# 11. Pvariance (Population Variance)
# The pvariance() function calculates the population variance of data, dividing by n rather than n - 1.
# Use it when the data is the entire population rather than a sample drawn from one.

pvariance_value = statistics.pvariance(data)
print(f"Population Variance: {pvariance_value}")


Population Variance: 8.25
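The population variance uses the same sum of squared deviations (82.5) as `variance()`, but divides by n = 10 instead of 9. A minimal sketch of that relationship, plus the optional `mu` argument for a population mean you already know:

```python
import math
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
n = len(data)

# 82.5 / 10
print(statistics.pvariance(data))                # 8.25

# Rescaling the sample variance by (n - 1) / n gives the same value, up to float rounding
print(statistics.variance(data) * (n - 1) / n)

# If the population mean is already known, it can be passed as mu
print(statistics.pvariance(data, mu=6.5))        # 8.25
```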


In [None]:

# 12. Pstdev (Population Standard Deviation)
# The pstdev() function calculates the population standard deviation: the square root of pvariance().
# Like pvariance(), it divides by n, so use it when the data is the entire population.

pstdev_value = statistics.pstdev(data)
print(f"Population Standard Deviation: {pstdev_value}")


Population Standard Deviation: 2.8722813232690143
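A quick check of how `pstdev()` relates to `pvariance()` and to the sample version `stdev()`:

```python
import math
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# pstdev() is the square root of pvariance()
print(math.isclose(statistics.pstdev(data), math.sqrt(statistics.pvariance(data))))   # True

# Dividing by n instead of n - 1 always gives a smaller spread than stdev()
print(statistics.pstdev(data) < statistics.stdev(data))   # True
```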


In [None]:

# 13. Quantiles
# The quantiles() function returns the n - 1 cut points that divide the data into n intervals of equal probability.
# With n=4 these are the three quartiles; the default method is 'exclusive'.

quantiles_value = statistics.quantiles(data, n=4)
print(f"Quantiles: {quantiles_value}")


Quantiles: [3.75, 6.5, 9.25]
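`quantiles()` also takes a `method` argument. The default `'exclusive'` treats the data as a sample from a population that may contain more extreme values, while `'inclusive'` treats the minimum and maximum as the 0th and 100th percentiles, which pulls the outer quartiles inward. Asking for `n=10` returns the nine decile cut points.

```python
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

print(statistics.quantiles(data, n=4))                       # [3.75, 6.5, 9.25]
print(statistics.quantiles(data, n=4, method="inclusive"))   # [4.25, 6.5, 8.75]
print(len(statistics.quantiles(data, n=10)))                 # 9
```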


In [None]:

# 14. Fmean (Floating Point Mean)
# The fmean() function converts the data to floats and computes the arithmetic mean.
# It runs faster than mean() and always returns a float, whereas mean() uses exact arithmetic and preserves input types such as Fraction and Decimal.

fmean_value = statistics.fmean(data)
print(f"Floating Point Mean: {fmean_value}")


Floating Point Mean: 6.5
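The practical difference between `mean()` and `fmean()` is the return type. A minimal sketch using integers and `fractions.Fraction` values:

```python
import statistics
from fractions import Fraction

# mean() keeps the input type where it can; fmean() always returns a float
print(statistics.mean([1, 2, 3]), statistics.fmean([1, 2, 3]))   # 2 2.0

quarters = [Fraction(1, 4), Fraction(3, 4)]
print(statistics.mean(quarters), statistics.fmean(quarters))     # 1/2 0.5
```

Prefer `mean()` when exact results matter (fractions, decimals) and `fmean()` when you only need a float and want speed.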


In [None]:

# 15. Multimode
# The multimode() function finds the most common values in the data.
# It always returns a list: one element when a single value is most frequent, all tied values in order of first occurrence, or an empty list for empty data.

multimode_value = statistics.multimode(data_with_duplicates)
print(f"Multimode: {multimode_value}")


Multimode: [8]
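`data_with_duplicates` has a single most frequent value, so the list above has one element. With ties, `multimode()` returns every tied value, whereas `mode()` keeps only the first:

```python
import statistics

print(statistics.multimode([2, 2, 3, 3, 4]))          # [2, 3]
print(statistics.mode([2, 2, 3, 3, 4]))               # 2
print(statistics.multimode("aabbbbccddddeeffffgg"))   # ['b', 'd', 'f']
print(statistics.multimode([]))                       # []
```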


In [None]:
# 16. Collect all results in a dictionary
results = {
    "Mean": statistics.mean(data),
    "Median": statistics.median(data),
    "Mode": statistics.mode(data_with_duplicates),
    "Standard Deviation": statistics.stdev(data),
    "Variance": statistics.variance(data),
    "Median Low": statistics.median_low(data),
    "Median High": statistics.median_high(data),
    "Median Grouped": statistics.median_grouped(data),
    "Harmonic Mean": statistics.harmonic_mean(data),
    "Geometric Mean": statistics.geometric_mean(data),
    "Population Variance": statistics.pvariance(data),
    "Population Standard Deviation": statistics.pstdev(data),
    "Quantiles": statistics.quantiles(data, n=4),
    "Floating Point Mean": statistics.fmean(data),
    "Multimode": statistics.multimode(data_with_duplicates)
}

# Convert results to a DataFrame for better readability
results_df = pd.DataFrame(list(results.items()), columns=["Statistic", "Value"])

# display() is provided by IPython/Jupyter, not pandas; in a plain Python script use print(results_df) instead.
display(results_df)

|   | Statistic | Value |
|---|---|---|
| 0 | Mean | 6.5 |
| 1 | Median | 6.5 |
| 2 | Mode | 8 |
| 3 | Standard Deviation | 3.02765 |
| 4 | Variance | 9.166667 |
| 5 | Median Low | 6 |
| 6 | Median High | 7 |
| 7 | Median Grouped | 6.5 |
| 8 | Harmonic Mean | 4.950796 |
| 9 | Geometric Mean | 5.755931 |
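`display()` only exists inside IPython/Jupyter. If you run this outside a notebook and without pandas, a plain aligned text table can be printed with the standard library alone. This is a minimal sketch limited to a few of the statistics above; the column layout is an arbitrary choice.

```python
import statistics

data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

results = {
    "Mean": statistics.mean(data),
    "Median": statistics.median(data),
    "Standard Deviation": statistics.stdev(data),
    "Quantiles": statistics.quantiles(data, n=4),
}

# Left-align the names in a column as wide as the longest name
width = max(len(name) for name in results)
lines = [f"{name:<{width}}  {value}" for name, value in results.items()]
print("\n".join(lines))
```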
