# Unit 14 - Averages

## Mean

The formula for the **arithmetic mean** is

$$\text{Mean} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

where $N$ is the number of values $x_i$ in the sample.

As an example, we calculate the mean price for a set of houses:

In [1]:
from __future__ import print_function, division
import numpy as np

houses = np.array([190000, 170000, 165000, 180000, 165000])
print("Mean price: {}".format(houses.sum()/houses.size))

Mean price: 174000.0
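
As a quick cross-check, the sketch below compares the formula written out by hand with NumPy's built-in `np.mean` and the array method `mean()`. It assumes Python 3, where `/` is true division; the variable names `mean_by_formula` and `mean_builtin` are only illustrative.

```python
import numpy as np

houses = np.array([190000, 170000, 165000, 180000, 165000])

# Direct translation of the formula: sum the values, then divide by N.
mean_by_formula = houses.sum() / houses.size

# NumPy's built-in function computes the same quantity.
mean_builtin = np.mean(houses)

print("By formula: {}".format(mean_by_formula))
print("np.mean:    {}".format(mean_builtin))
print("ndarray.mean: {}".format(houses.mean()))
```

All three expressions should agree on 174000.0 for this sample.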


For a different set of houses, perhaps in a different part of town:

In [2]:
houses2 = np.array([2400000, 125000, 148000, 160000, 110000, 325000, 180000])
print("Mean price: {}".format(houses2.sum()/houses2.size))

Mean price: 492571.428571


Most of the houses are in the \$110,000 to \$180,000 range, but two houses with higher prices pull the mean well above this range.
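
To see how strongly those two prices affect the result, the following sketch recomputes the mean after leaving them out. The cutoff `upper` is a hypothetical threshold taken from the range quoted above, not a general rule for detecting outliers.

```python
import numpy as np

houses2 = np.array([2400000, 125000, 148000, 160000, 110000, 325000, 180000])

# Hypothetical cutoff: the upper end of the range quoted in the text.
upper = 180000

# Boolean indexing keeps only the houses at or below the cutoff.
typical = houses2[houses2 <= upper]

print("Mean of all houses: {:.2f}".format(houses2.mean()))
print("Mean without the two expensive houses: {:.2f}".format(typical.mean()))
```

With the two expensive houses removed, the mean of the remaining five is 144600.0, which lies inside the range where most prices fall.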

## Median

To determine the median of a sample, sort the values and pick the one in the middle. If the sample has an even number of values, the median is the average of the two central ones.

In [3]:
np.median(houses2)

160000.0

As a typical price of the houses in the area, this is a more reasonable value than the mean.
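
The sort-and-pick procedure described at the start of this section can also be written out by hand. The helper `median_by_sorting` below is an illustrative name, not a NumPy function; it is a minimal sketch that also covers samples with an even number of values by averaging the two central ones.

```python
import numpy as np

def median_by_sorting(sample):
    """Median computed by an explicit sort (illustrative helper)."""
    ordered = np.sort(sample)   # sorted copy; the input is left unchanged
    n = ordered.size
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: a single central element exists.
        return float(ordered[middle])
    # Even number of values: average the two central elements.
    return (ordered[middle - 1] + ordered[middle]) / 2.0

houses2 = np.array([2400000, 125000, 148000, 160000, 110000, 325000, 180000])
print("Median by sorting: {}".format(median_by_sorting(houses2)))
print("np.median:         {}".format(np.median(houses2)))
```

Both lines should report 160000.0 for this sample.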

## Mode

Suppose we are at a children's birthday party, attended by both children and their parents. The array below contains the ages of the people present.

In [4]:
ages = np.array([4, 3, 32, 33, 4, 32, 3, 38, 4])

The mean age of people at the party is

In [5]:
print("Mean age: {}".format(ages.mean()))

Mean age: 17.0


The mean is not very informative - there are no teenagers at this party. The median is

In [6]:
print("Median age: {}".format(np.median(ages)))

Median age: 4.0


This is not a useful statistic either: it does not capture the ages of the parents. If two more parents show up:

In [7]:
ages2 = np.array([4, 3, 32, 33, 4, 32, 3, 38, 4, 35, 36])
print("Median age: {}".format(np.median(ages2)))

Median age: 32.0


Now the median shifts to the parents' age range.

The **mode** is defined as the most frequent value in a sample. In the party example,

In [10]:
from scipy import stats

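# Depending on the SciPy version, stats.mode returns an array or a scalar;
# np.atleast_1d handles both cases.
print("Mode: {}".format(np.atleast_1d(stats.mode(ages2)[0])[0]))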

Mode: 4
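
The definition of the mode can also be applied directly by counting how often each value occurs. The sketch below uses `np.unique` with `return_counts=True`; it is one possible way to do the counting and is not meant to describe how `stats.mode` is implemented.

```python
import numpy as np

ages2 = np.array([4, 3, 32, 33, 4, 32, 3, 38, 4, 35, 36])

# Distinct values (in sorted order) and the number of times each occurs.
unique_values, counts = np.unique(ages2, return_counts=True)

# np.argmax gives the position of the largest count
# (the first such position if several counts are equal).
mode_value = unique_values[np.argmax(counts)]

print("Mode: {} (appears {} times)".format(mode_value, counts.max()))
```

For the party sample, the age 4 appears three times, more often than any other age.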


Another example:

In [12]:
values = np.array([5, 9, 100, 9, 97, 6, 9, 98, 9])
mode = np.atleast_1d(stats.mode(values)[0])[0]  # works whether SciPy returns an array or a scalar
print("Mean, median, mode: {} {} {}".format(values.mean(), np.median(values), mode))

Mean, median, mode: 38.0 9.0 9


For the mode, ties may occur. `stats.mode` does not break them at random: it returns just one of the tied values, in practice the smallest. Yet another example:

In [13]:
values = np.array([3, 9, 3, 8, 2, 9, 1, 9, 2, 4])
mode = np.atleast_1d(stats.mode(values)[0])[0]  # works whether SciPy returns an array or a scalar
print("Mean, median, mode: {} {} {}".format(values.mean(), np.median(values), mode))

Mean, median, mode: 5.0 3.5 9


In this case there is an even number of values, so no single value sits exactly at the centre. The median returned is the average of the two central values, 3 and 4.
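
Returning to ties in the mode: one way to avoid depending on any tie-breaking rule is to list every value that reaches the highest count. The helper `all_modes` and the sample `tied` below are made up for illustration.

```python
import numpy as np

def all_modes(sample):
    """Return every value that attains the maximum count (illustrative helper)."""
    unique_values, counts = np.unique(sample, return_counts=True)
    # Keep all values whose count equals the largest count.
    return unique_values[counts == counts.max()]

# Made-up sample in which 2 and 3 both occur twice.
tied = np.array([1, 2, 2, 3, 3])
print("All modes of the tied sample: {}".format(all_modes(tied)))

# The sample from the last example has a single mode.
values = np.array([3, 9, 3, 8, 2, 9, 1, 9, 2, 4])
print("All modes of values: {}".format(all_modes(values)))
```

For `tied` the helper returns both 2 and 3, while for `values` it returns only 9.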