# An Investigation into Anscombe's Quartet

## 1. Introduction 

Anscombe's Quartet was first published in 1973 by Francis J. Anscombe in a paper called "Graphs in Statistical Analysis". Anscombe wrote the paper to rail against the textbooks of the day, which in his view indoctrinated statisticians with the notion that "numerical calculations are exact, but graphs are rough" (1). His ultimate goal was to show that relying on summary statistics and numerical calculations alone is not best practice. Summary statistics are invaluable because they describe a vast, complex dataset with just a few key numbers (3), giving statisticians something easy to optimize against and a barometer for further analysis. Anscombe wanted to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties (2).

To present his viewpoint effectively, Anscombe constructed four datasets that appear almost identical under typical summary statistics, yet tell four different stories when graphed (3). In every dataset the mean of the x values is 9.0 and the mean of the y values is 7.5, and the four datasets have nearly identical variances, correlations, and linear regression lines (to at least two decimal places) (2). It is not known how Anscombe created his datasets.
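
The claim that the four datasets share the same summary statistics is easy to check numerically. The sketch below hard-codes Anscombe's published values so that it runs offline (the notebook loads the same table with `sns.load_dataset("anscombe")`) and computes each statistic per dataset with a pandas `groupby`. The helper name `summarize` is illustrative, not part of any library.

```python
import numpy as np
import pandas as pd

# Anscombe's four datasets, hard-coded so the sketch runs without a download.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # datasets I-III share these x values
DATA = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Long format, one row per point, matching the layout printed by the notebook
anscombe = pd.DataFrame(
    [(name, x, y) for name, (xs, ys) in DATA.items() for x, y in zip(xs, ys)],
    columns=["dataset", "x", "y"],
)


def summarize(group: pd.DataFrame) -> pd.Series:
    """Summary statistics that Anscombe held (almost) constant across the quartet."""
    slope, intercept = np.polyfit(group["x"], group["y"], deg=1)  # least-squares line
    return pd.Series({
        "mean_x": group["x"].mean(),
        "mean_y": group["y"].mean(),
        "var_x": group["x"].var(),  # sample variance (ddof=1)
        "var_y": group["y"].var(),
        "corr": group["x"].corr(group["y"]),  # Pearson correlation
        "slope": slope,
        "intercept": intercept,
    })


summary = anscombe.groupby("dataset")[["x", "y"]].apply(summarize)
print(summary.round(3))
```

All four rows should agree to about two or three decimal places, which is exactly why these numbers alone cannot tell the datasets apart.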

Another interesting feature of the quartet is how it treats outliers. An outlier is an observation that lies far from the rest of the data, and even a single one can pull the regression line away from the trend followed by the other points. Anscombe uses the outliers in his datasets to highlight how important visualising a dataset can be in developing a sensible statistical model (4). The quartet challenged the old practice of relying on summary statistics alone, showing that data visualisation leads to more reliable and easier-to-interpret results.
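
To see how far a single outlier can move a fit, the sketch below fits dataset III with `scipy.stats.linregress` (the notebook imports `scipy.stats` for linear regression) with and without its most extreme point. Flagging the point with the largest absolute residual is a simple heuristic assumed here for illustration; it is not Anscombe's own procedure.

```python
import numpy as np
from scipy import stats

# Dataset III from the quartet: ten points on a straight line plus one outlier.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])

full = stats.linregress(x, y)  # fit using every point

# Heuristic (an assumption for this sketch): the outlier is the point with
# the largest absolute residual from the full fit.
residuals = y - (full.intercept + full.slope * x)
outlier = int(np.argmax(np.abs(residuals)))
keep = np.arange(len(x)) != outlier
trimmed = stats.linregress(x[keep], y[keep])  # refit without that point

print(f"outlier at x={x[outlier]:.0f}, y={y[outlier]:.2f}")
print(f"all points:      y = {full.intercept:.2f} + {full.slope:.3f}x, r = {full.rvalue:.3f}")
print(f"without outlier: y = {trimmed.intercept:.2f} + {trimmed.slope:.3f}x, r = {trimmed.rvalue:.3f}")
```

With all eleven points the fit matches the line shared by the whole quartet (slope about 0.5, r about 0.816); once the single outlier is dropped, the remaining ten points lie almost exactly on a flatter line with r close to 1.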

## 2. Plot the dataset

# Import the packages used to load, summarise and plot the dataset

import pandas as pd  # Data manipulation
import matplotlib.pyplot as plt  # Plots
import seaborn as sns  # Statistical plots and the bundled Anscombe dataset
from scipy import stats  # Linear regression
import numpy as np  # Quick summary statistics


# https://vknight.org/unpeudemath/mathematics/2016/10/29/anscombes-quartet-variability-and-ciw.html

# IPython magic: render matplotlib figures inline in the notebook
# https://stackoverflow.com/questions/19410042/how-to-make-ipython-notebook-matplotlib-plot-inline
%matplotlib inline

# Load the Anscombe dataset that seaborn provides
# https://seaborn.pydata.org/examples/anscombes_quartet.html
anscombe = sns.load_dataset("anscombe")
print(anscombe)



   dataset     x      y
0        I  10.0   8.04
1        I   8.0   6.95
2        I  13.0   7.58
3        I   9.0   8.81
4        I  11.0   8.33
5        I  14.0   9.96
6        I   6.0   7.24
7        I   4.0   4.26
8        I  12.0  10.84
9        I   7.0   4.82
10       I   5.0   5.68
11      II  10.0   9.14
12      II   8.0   8.14
13      II  13.0   8.74
14      II   9.0   8.77
15      II  11.0   9.26
16      II  14.0   8.10
17      II   6.0   6.13
18      II   4.0   3.10
19      II  12.0   9.13
20      II   7.0   7.26
21      II   5.0   4.74
22     III  10.0   7.46
23     III   8.0   6.77
24     III  13.0  12.74
25     III   9.0   7.11
26     III  11.0   7.81
27     III  14.0   8.84
28     III   6.0   6.08
29     III   4.0   5.39
30     III  12.0   8.15
31     III   7.0   6.42
32     III   5.0   5.73
33      IV   8.0   6.58
34      IV   8.0   5.76
35      IV   8.0   7.71
36      IV   8.0   8.84
37      IV   8.0   8.47
38      IV   8.0   7.04
39      IV   8.0   5.25
40      IV  19.0
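
Printing the table does not reveal the differences between the datasets; plotting does. A minimal sketch using plain matplotlib draws each dataset with its own least-squares line on a shared 2x2 grid (the seaborn example cited in the code cell builds a similar grid with `sns.lmplot`). The values are hard-coded so the sketch runs without downloading the dataset, and `plot_quartet` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np

# Anscombe's published values, hard-coded so the sketch runs offline.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # datasets I-III share these x values
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def plot_quartet(quartet):
    """Scatter each dataset with its least-squares line on a shared 2x2 grid."""
    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
    line_x = np.linspace(2, 20, 100)  # common x range covering every dataset
    for ax, (name, (xs, ys)) in zip(axes.flat, quartet.items()):
        slope, intercept = np.polyfit(xs, ys, deg=1)
        ax.scatter(xs, ys)
        ax.plot(line_x, intercept + slope * line_x, color="tab:red")
        ax.set_title(f"Dataset {name}")
    for ax in axes[1, :]:
        ax.set_xlabel("x")  # label only the bottom row
    for ax in axes[:, 0]:
        ax.set_ylabel("y")  # label only the left column
    fig.tight_layout()
    return fig


fig = plot_quartet(QUARTET)
```

In the notebook, `%matplotlib inline` displays the returned figure; outside a notebook, `fig.savefig(...)` writes it to a file. The four red lines are practically identical, while the scatter patterns are not.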

## References
1. https://eagereyes.org/criticism/anscombes-quartet 
2. https://en.wikipedia.org/wiki/Anscombe%27s_quartet
3. https://heapanalytics.com/blog/data-stories/anscombes-quartet-and-why-summary-statistics-dont-tell-the-whole-story
4. https://www.quora.com/What-is-the-significance-of-Anscombes-quartet
5. https://vknight.org/unpeudemath/mathematics/2016/10/29/anscombes-quartet-variability-and-ciw.html
6. https://seaborn.pydata.org/examples/anscombes_quartet.html
7. https://stackoverflow.com/questions/19410042/how-to-make-ipython-notebook-matplotlib-plot-inline