# Data visualizations 

Credits
- Ben Hamner (https://www.kaggle.com/benhamner/d/uciml/iris/python-data-visualizations)

### A Simple Example: the Iris Dataset

As an example of a simple dataset, we're going to take a look at the classic iris data, a copy of which is also bundled with scikit-learn.
The data consist of measurements of flowers from three different iris species, illustrated below:


<img src="Images/Iris_Type.jpg" width="100%">



### Loading the Iris Data with pandas

In [None]:
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd

In [None]:
# Next, we'll load the Iris flower dataset
iris = pd.read_csv("./Datasets/IRIS.csv") # the iris dataset is now a Pandas DataFrame
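If `./Datasets/IRIS.csv` is not at hand, a DataFrame with a similar layout can be built from the copy bundled with scikit-learn. This is a minimal sketch that assumes scikit-learn 0.23 or newer (for `as_frame=True`). It also assumes the CSV has an `ID` column, the four dotted measurement columns, and a `Species` column, which is inferred from the column names used later in this notebook. The species labels come from scikit-learn (`setosa`, `versicolor`, `virginica`) and may be spelled differently in the CSV. The helper name `load_iris_frame` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Hypothetical helper: build an iris DataFrame from scikit-learn's bundled copy."""
    bunch = load_iris(as_frame=True)  # requires scikit-learn >= 0.23
    df = bunch.data.copy()
    # scikit-learn names the columns like "sepal length (cm)"; rename them to the
    # dotted names this notebook uses (assumed to match the CSV)
    df.columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
    # bunch.target holds integer codes 0/1/2; map them to species names
    df["Species"] = bunch.target.map(dict(enumerate(bunch.target_names)))
    # Add a 1-based ID column, mirroring the ID column that is dropped later on
    df.insert(0, "ID", list(range(1, len(df) + 1)))
    return df


iris = load_iris_frame()
print(iris.head())
```

The rest of the notebook should then run unchanged, apart from any differences in how the species names are spelled.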


The data are very straightforward. Apart from an `ID` column, each row describes one flower and consists of
the following:

- Features in the Iris dataset:

  1. sepal length in cm
  2. sepal width in cm
  3. petal length in cm
  4. petal width in cm

- Target classes to predict:

  1. Iris Setosa
  2. Iris Versicolour
  3. Iris Virginica
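Before these columns are passed to a model, the four features are usually separated from the target column. This is a minimal sketch that assumes the dotted column names used elsewhere in this notebook. The two rows in `demo` are made-up values that only show the shapes, and `split_features_target` is a hypothetical helper.

```python
import pandas as pd

# Column names as used elsewhere in this notebook (assumed to match the CSV)
FEATURES = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
TARGET = "Species"


def split_features_target(df: pd.DataFrame):
    """Hypothetical helper: return the feature matrix X and target vector y."""
    X = df[FEATURES].to_numpy()  # shape (n_samples, 4), one column per measurement in cm
    y = df[TARGET].to_numpy()    # shape (n_samples,), one species label per flower
    return X, y


# Two made-up rows, only to illustrate the shapes
demo = pd.DataFrame({
    "Sepal.Length": [5.0, 6.5],
    "Sepal.Width": [3.4, 3.0],
    "Petal.Length": [1.5, 5.5],
    "Petal.Width": [0.2, 2.0],
    "Species": ["setosa", "virginica"],
})
X, y = split_features_target(demo)
print(X.shape, y.shape)  # (2, 4) (2,)
```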
  
  <img src="Images/Iris_Measure.png" width="100%">

In [None]:
# Let's see what's in the iris data; Jupyter displays the value of the last expression in a cell
iris

In [None]:
# Let's see how many examples we have of each species
iris["Species"].value_counts()

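In [None]:
# matplotlib is the plotting library that seaborn builds on;
# %matplotlib inline renders figures directly in the notebook
import matplotlib.pyplot as plt
%matplotlib inline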

In [None]:
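import seaborn as sns
# White background style; color_codes=True remaps shorthand colors such as "b" and "g" to seaborn's palette
sns.set(style="white", color_codes=True)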

In [None]:
# A plain scatterplot of sepal length against sepal width doesn't show which species each plant is
# We'll use seaborn's FacetGrid to color the scatterplot by species
# (height= sets the facet height in inches; older seaborn versions called it size=)
sns.FacetGrid(iris, hue="Species", height=5) \
   .map(plt.scatter, "Sepal.Length", "Sepal.Width") \
   .add_legend()
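Coloring by `hue` amounts to drawing one scatter layer per species and labeling each layer for the legend. Here is a rough plain-matplotlib sketch of the same idea. It is not how seaborn implements it, and it assumes the dotted column names used above. `scatter_by_group` is a hypothetical helper, and `demo` holds made-up values.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_group(df, x, y, hue="Species"):
    """Hypothetical helper: one scatter layer per group, each labeled for the legend."""
    fig, ax = plt.subplots(figsize=(5, 5))
    for name, group in df.groupby(hue):
        # Each call takes the next color from matplotlib's color cycle
        ax.scatter(group[x], group[y], label=name)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=hue)
    return fig, ax


# Made-up rows, only to exercise the function
demo = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    "Species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})
fig, ax = scatter_by_group(demo, "Sepal.Length", "Sepal.Width")
```

In the notebook, `scatter_by_group(iris, "Sepal.Length", "Sepal.Width")` should draw a chart comparable to the FacetGrid version.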

In [None]:
# We can look at an individual feature in seaborn through a boxplot, which summarizes
# each species by its median, quartiles, and outliers
sns.boxplot(x="Species", y="Petal.Length", data=iris)
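Each box spans the first to third quartile of petal length for one species, with a line at the median. The whiskers reach the most extreme points that lie within 1.5 times the interquartile range (IQR) of the box, and any points beyond that are drawn individually as outliers. The standard-library sketch below computes those numbers. It assumes the default `whis=1.5` and uses linear quantile interpolation (the `statistics` module's `"inclusive"` method, which matches NumPy's default percentile). `box_stats` is a hypothetical helper.

```python
import statistics


def box_stats(values, whis=1.5):
    """Hypothetical helper: the numbers a box-and-whisker plot draws."""
    data = sorted(values)
    # Quartiles with linear interpolation between data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Everything beyond the fences is drawn as an individual outlier point
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

For this input, the box runs from 3.0 to 7.0, the upper whisker stops at 8, and 100 is the only outlier.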

In [None]:
# One way we can extend this plot is by adding a layer of individual points on top of
# it through seaborn's stripplot
#
# We'll use jitter=True so that the points don't all fall in single vertical lines
# above each species; linewidth=1 makes the black edge color visible
#
# Passing ax=ax draws the strip plot on the same axes as the boxplot, so the points
# appear on top of the boxes
ax = sns.boxplot(x="Species", y="Petal.Length", data=iris)
ax = sns.stripplot(x="Species", y="Petal.Length", data=iris, jitter=True,
                   edgecolor="black", linewidth=1, ax=ax)
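Jitter spreads the points horizontally by a small random offset around each category's position. The standard-library sketch below shows the idea. It assumes categories sit at x = 0, 1, 2, as on seaborn's categorical axes. The offset width of 0.1 and the helper name `jitter_positions` are illustrative choices, not seaborn's exact defaults or implementation.

```python
import random


def jitter_positions(category_index, n_points, width=0.1, seed=None):
    """Hypothetical helper: x positions for n_points in one category, spread by uniform jitter."""
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    return [category_index + rng.uniform(-width, width) for _ in range(n_points)]


# Five points for the second category (x = 1), each nudged left or right by at most 0.1
xs = jitter_positions(1, 5, seed=42)
print([round(x, 3) for x in xs])
```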

In [None]:
# A violin plot combines a boxplot-style summary (drawn inside each violin by default)
# with an estimate of the full distribution: denser regions of the data are wider,
# and sparser regions thinner
sns.violinplot(x="Species", y="Petal.Length", data=iris)

In [None]:
# A final seaborn plot useful for looking at univariate distributions is the kdeplot,
# which creates and visualizes a kernel density estimate of the underlying feature
sns.FacetGrid(iris, hue="Species", height=6) \
   .map(sns.kdeplot, "Petal.Length") \
   .add_legend()
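A kernel density estimate places a small Gaussian bump on every observation and sums the bumps. The bandwidth controls how wide each bump is, and violin plots are built from the same kind of estimate. The standard-library sketch below picks the bandwidth with Scott's rule, which is seaborn's default `bw_method="scott"`. It leaves out seaborn's extra options such as `bw_adjust` and clipping, so treat it as an approximation of the idea rather than seaborn's code. `gaussian_kde_1d` is a hypothetical name.

```python
import math
import statistics


def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: evaluate a Gaussian KDE of `samples` at each point in `grid`."""
    n = len(samples)
    # Scott's rule: bandwidth = sample standard deviation * n ** (-1/5)
    h = statistics.stdev(samples) * n ** (-1 / 5)
    # Each Gaussian bump integrates to 1/n, so the whole curve integrates to 1
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in grid
    ]


samples = [1.0, 1.5, 2.0, 4.0, 4.5]
grid = [i / 100 for i in range(-1000, 1501)]  # -10.0 to 15.0 in steps of 0.01
density = gaussian_kde_1d(samples, grid)
print(round(sum(density) * 0.01, 3))  # Riemann sum of the curve, close to 1
```

The curve is tallest where the samples cluster (around 1.5) and dips between the two clusters.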

In [None]:
# Another useful seaborn plot is the pairplot, which shows the bivariate relation
# between each pair of features
#
# From the pairplot, we'll see that the Iris setosa species is separated from the other
# two across all feature combinations
sns.pairplot(iris.drop("ID", axis=1), hue="Species", height=3)

In [None]:
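# The diagonal elements in a pairplot show each feature's univariate distribution
# (histograms in older seaborn releases; recent releases choose a kde automatically when hue is set)
# Setting diag_kind="kde" requests the kernel density estimate explicitly
sns.pairplot(iris.drop("ID", axis=1), hue="Species", height=3, diag_kind="kde")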