# Statistics Tutorial - Lesson 1
# Mean, Median and Mode

## Mean
[Mean](https://en.wikipedia.org/wiki/Mean), usually called the arithmetic mean, is one of the basic concepts in statistics. It represents the central value of a finite set of numbers. It is computed as the sum of all the numbers divided by how many numbers there are, i.e.
$$\mu = \frac{\sum_{i=1}^{n}x_{i}}{n}$$
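
As a quick worked instance of the formula, take the five stock prices used in the examples of this section (40, 45, 46, 42 and 39), so that $n = 5$:
$$\mu = \frac{40 + 45 + 46 + 42 + 39}{5} = \frac{212}{5} = 42.4$$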

In [1]:
# Example 1
def get_mean(given_list):
    """
    Function for calculating arithmetic mean
    """
    return sum(given_list)/len(given_list)

# Calculate the mean of 5 stock price values
stock_prices = [40, 45, 46, 42, 39]
mean_value = get_mean(stock_prices)
print('Mean is {:.2f}'.format(mean_value))

Mean is 42.40


In [2]:
# Example 2
# by built-in statistics library
from statistics import mean
mean_value = mean(stock_prices)
print('Mean is {:.2f}'.format(mean_value))

Mean is 42.40


For data science, [NumPy](https://numpy.org/), a library that provides multidimensional arrays and a large collection of mathematical functions operating on them, is often used.

In [3]:
# Example 3
# by NumPy
import numpy as np
stock_price_array = np.array(stock_prices)
mean_value = np.mean(stock_price_array)
print('Mean is {:.2f}'.format(mean_value))

Mean is 42.40


## Median
[Median](https://en.wikipedia.org/wiki/Median) is the value separating the higher half from the lower half of a data set. It is often thought of as the midpoint of the data once the values are sorted. If the data set has an odd number of values, the middle one is selected; if it has an even number of values, the median is usually defined as the average of the two middle values.

In [4]:
# Example 1
def get_median(sorted_list):
    """
    Calculate the median of a given list
    - the list must already be sorted, otherwise the midpoint is not meaningful
    """
    count = len(sorted_list)
    if count % 2 == 1:
        return sorted_list[count//2]
    else:
        return (sorted_list[count//2-1] + sorted_list[count//2]) / 2

stock_A_prices = sorted([40, 45, 46, 42, 39])    
median_A_value = get_median(stock_A_prices)
print('Median of A set is {}'.format(median_A_value))

stock_B_prices = sorted([30, 34, 36, 33, 42, 19])
median_B_value = get_median(stock_B_prices)
print('Median of B set is {}'.format(median_B_value))

Median of A set is 42
Median of B set is 33.5


In [5]:
# Example 2
# by built-in statistics library
from statistics import median

median_A_value = median(stock_A_prices)
print('Median of A set is {}'.format(median_A_value))

median_B_value = median(stock_B_prices)
print('Median of B set is {}'.format(median_B_value))

Median of A set is 42
Median of B set is 33.5


In [6]:
# Example 3
# by NumPy
stock_A_array = np.array(stock_A_prices) 
median_A_value = np.median(stock_A_array)
print('Median of A set is {}'.format(median_A_value))

stock_B_array = np.array(stock_B_prices) 
median_B_value = np.median(stock_B_array)
print('Median of B set is {}'.format(median_B_value))

Median of A set is 42.0
Median of B set is 33.5
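
The hand-written `get_median` above relies on the caller sorting the list first. As a minimal sketch of an alternative, the helper below sorts a copy internally so it can be called on unsorted data. The name `median_of` and the empty-input check are assumptions added for illustration and are not part of the original examples.

```python
def median_of(values):
    """Median of an iterable of numbers; sorts internally (hypothetical helper)."""
    ordered = sorted(values)  # work on a sorted copy; the input is left untouched
    count = len(ordered)
    if count == 0:
        raise ValueError("median_of() requires at least one value")
    middle = count // 2
    if count % 2 == 1:
        # Odd count: the single middle value
        return ordered[middle]
    # Even count: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_of([40, 45, 46, 42, 39]))      # unsorted input, odd count
print(median_of([30, 34, 36, 33, 42, 19]))  # unsorted input, even count
```

Sorting inside the function costs an extra pass over the data, but it removes the risk of silently returning a wrong midpoint when the input is not sorted.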


## Mode
[Mode](https://en.wikipedia.org/wiki/Mode_%28statistics%29) is the value that appears most often in a data set. A data set is said to be multimodal if two or more values tie for the highest count. In the extreme case where no value appears more than once, every value in the set is a valid mode.

In [7]:
# Example 1
def get_mode(given_list):
    """
    Return a mode: the unique value with the highest count in the list.
    If several values tie, one of them is returned.
    """
    return max(set(given_list), key=given_list.count)

signal_list = [5, 4, 3, 2, 1, 2, 3, 3]
mode_value = get_mode(signal_list)
print('Mode is {}'.format(mode_value))

Mode is 3


In [8]:
# Example 2
# by built-in statistics library
from statistics import mode
mode_value = mode(signal_list)
print('Mode is {}'.format(mode_value))

Mode is 3
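
When several values tie for the highest count, a single mode does not tell the whole story. The built-in statistics library also provides `multimode` (Python 3.8 and later), which returns every mode as a list. The sketch below uses two small sample lists that are made up purely for illustration.

```python
from statistics import multimode  # available from Python 3.8 onward

# Hypothetical sample data, chosen only for illustration
tied_signals = [1, 1, 2, 2, 3]   # 1 and 2 both appear twice
distinct_signals = [4, 7, 9]     # every value appears exactly once

tied_modes = multimode(tied_signals)
distinct_modes = multimode(distinct_signals)

print('Modes of tied set are {}'.format(tied_modes))
print('Modes of distinct set are {}'.format(distinct_modes))
```

The modes are listed in the order in which they first appear in the data, so the second call returns all three values.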


Another library often used in data science is [SciPy](https://scipy.org/). It provides common routines for scientific and technical computing.

In [9]:
# Example 3
# by SciPy
from scipy import stats
mode_result = stats.mode(signal_list)
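# Recent SciPy versions return a scalar mode for 1-D input, older ones a
# length-1 array; np.atleast_1d handles both (np was imported in In [3])
print('Mode is {}'.format(np.atleast_1d(mode_result.mode)[0]))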

Mode is 3
