# 1.7. Plotting With Python

### Warmup:
What is good or bad about the following data visualisations?

![Bad-Data-Visualization-Examples.jpg](attachment:Bad-Data-Visualization-Examples.jpg)
[source](https://blog.hubspot.com/marketing/great-data-visualization-examples) 

![Fox-News-bar-chart-1.jpeg](attachment:Fox-News-bar-chart-1.jpeg)
[source](https://flowingdata.com/2014/04/04/fox-news-bar-chart-gets-it-wrong/)

![unemployment-chart-by-fox-news.jpg](attachment:unemployment-chart-by-fox-news.jpg)
[source](https://flowingdata.com/2011/12/12/fox-news-still-makes-awesome-charts/)

![colours_map.png](attachment:colours_map.png)
[source](https://academy.datawrapper.de/article/140-what-to-consider-when-choosing-colors-for-data-visualization)

![mona_chalabi.jpeg](attachment:mona_chalabi.jpeg)
[source](https://medium.com/nightingale/power-to-the-powerless-an-interview-with-mona-chalabi-39d73647d80a)

### Why do we need plots? 



- Not everybody is keen on numbers
- Plots are simple and efficient
- Trends are visible immediately
- They give a broader perspective
- You can see all the data at once
- They are a good tool for presenting work
- They help to draw conclusions and spot correlations

### Anatomy of a Plot

![basic-elements-of-plot.png](attachment:basic-elements-of-plot.png)
[source](https://geo-python.github.io/2017/lessons/L7/plot-anatomy.html)

## Visualizing the penguin dataset

In [2]:
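# imports used throughout this notebook
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the penguins dataset: two ways, either as a CSV or from Seaborn directly: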
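A minimal sketch of the two loading routes mentioned above. The helper name `load_penguins` and the file name `penguins.csv` are assumptions made for illustration. `sns.load_dataset` downloads the data on first use, so that route needs an internet connection.

```python
import pandas as pd
import seaborn as sns


def load_penguins(csv_path=None):
    """Return the penguins data as a DataFrame.

    csv_path is a hypothetical local CSV file. Without it, seaborn
    fetches (and caches) its online example dataset.
    """
    if csv_path is not None:
        return pd.read_csv(csv_path)
    return sns.load_dataset("penguins")


# Usage (uncomment one):
# df = load_penguins("penguins.csv")  # hypothetical file name
# df = load_penguins()                # seaborn's online copy
# print(sns.get_dataset_names())      # lists seaborn's example datasets
```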


In [3]:
# check the shape


(344, 7)

In [None]:
# check for NaN values


In [None]:
# drop NaN values


# This is equivalent to doing this:
#df = df.dropna()


In [None]:
# check the shape again
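The check, drop, and re-check steps could look like the sketch below. The small DataFrame is a made-up stand-in (not real penguin measurements) so the example runs on its own. With the real data, `df` would be the loaded dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in rows (not real measurements); two rows are incomplete.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "body_mass_g": [3750.0, np.nan, 5000.0, 3800.0],
    "sex": ["male", "female", None, "female"],
})

print(df.shape)         # (rows, columns) before cleaning
print(df.isna().sum())  # number of missing values per column

df = df.dropna()        # keep only rows without any missing value
print(df.shape)         # fewer rows, same number of columns
```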


In [12]:
# The dataset is part of seaborn's example data (downloaded on first use) and can be loaded by name:

In [None]:
# from seaborn directly:
# list the datasets available in seaborn


In [None]:
# load directly from seaborn



## 1. Plotting with pandas

- Plotting with pandas is built on matplotlib, so a basic plot needs no explicit plt import.
- Plotting with pandas is very practical, because DataFrames and Series have built-in plotting methods.
- Plotting with pandas is limited, because it exposes fewer customisation options than matplotlib itself.


### Let's try to answer the following questions!

1. Do larger penguins have longer flippers?

In [None]:
df.columns

In [None]:
# plot one column against another ('body_mass_g', 'flipper_length_mm')


# Columns can also be referenced by position
#df.plot.scatter(x=df.columns[5], y='flipper_length_mm')
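A possible way to approach question 1 with the pandas plotting interface. The values below are made-up stand-ins that only mimic the penguin column names. The correlation coefficient is an extra that the notebook does not ask for.

```python
import pandas as pd

# Made-up stand-in rows (not real measurements).
df = pd.DataFrame({
    "body_mass_g": [3900, 3400, 5500, 4700, 3950, 3500],
    "flipper_length_mm": [190, 185, 220, 212, 200, 193],
})

# Scatter plot of one column against another; pandas returns the Axes.
ax = df.plot.scatter(x="body_mass_g", y="flipper_length_mm")
ax.set_title("Flipper length vs. body mass")

# A single number to go with the picture: Pearson correlation.
r = df["body_mass_g"].corr(df["flipper_length_mm"])
print(round(r, 2))
```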

2. Visualize the distribution of Bill Length

In [None]:
# we have multiple options: histogram, violin, kde (kernel density estimation), boxplot
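Two of the options listed above, sketched side by side. The numbers are made-up stand-ins, not real bill lengths. `plot.kde()` is called the same way but additionally needs SciPy, so it is left out here.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in values (not real measurements).
bill_length = pd.Series(
    [39.1, 38.0, 40.3, 47.5, 45.2, 46.1, 50.1, 46.3, 49.0],
    name="bill_length_mm",
)

# One figure with two axes: histogram on the left, boxplot on the right.
fig, (ax_hist, ax_box) = plt.subplots(nrows=1, ncols=2, figsize=(12, 6))
bill_length.plot.hist(bins=5, ax=ax_hist, title="Histogram")
bill_length.plot.box(ax=ax_box, title="Boxplot")
ax_hist.set_xlabel("Bill length (mm)")
```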


3. Are male penguins heavier than female penguins?

In [None]:
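plt.figure(figsize=(12,6))
# select rows by condition, draw both histograms in one plot, use alpha for transparency
# (the sex labels may be 'male' or 'Male' depending on the data source; check df['sex'].unique())

#df.loc[df['sex'] == 'male', 'body_mass_g'].plot.hist(bins=20, alpha=0.5)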
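One way to overlay the two groups is to loop over `groupby('sex')` instead of writing one filter per group. This is a sketch on made-up stand-in values, and the lowercase labels are an assumption about how the data source spells them.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in rows (not real measurements).
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female", "male", "female"],
    "body_mass_g": [3900, 3400, 5500, 4700, 3950, 3500],
})

fig, ax = plt.subplots(figsize=(12, 6))
for sex, group in df.groupby("sex"):
    # alpha < 1 makes the bars transparent so overlapping groups stay visible
    group["body_mass_g"].plot.hist(bins=5, alpha=0.5, ax=ax, label=sex)
ax.set_xlabel("Body mass (g)")
ax.legend()

# Compare the group means as a numeric check.
means = df.groupby("sex")["body_mass_g"].mean()
print(means)
```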


## 2. Plotting with Matplotlib

In [None]:
# print all available styles


In [9]:
# Select a style once at the beginning of your script or notebook.
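Listing and selecting a style could look like this. The choice of `'ggplot'` is only an example; any name from the printed list works.

```python
import matplotlib.pyplot as plt

# Names of all styles shipped with the installed matplotlib version.
print(plt.style.available)

# Applies to every figure created after this call.
plt.style.use("ggplot")
```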


In [None]:
# Create the figure first and set its size:
plt.figure(figsize=(12,6)) # always the first step: a figure of 12 x 6 inches


# or: plt.plot, plt.bar, plt.boxplot, plt.hist,... 
 

#plt.show()  

### Add title and labels

In [None]:
plt.figure(figsize=(12,8)) 

plt.scatter(df['flipper_length_mm'], df['body_mass_g']) 
plt.title('Flipper Length and Body Mass')
plt.xlabel('Flipper Length (mm)')
plt.ylabel('Body Mass (g)')
#plt.show()



### More formatting

In [None]:
plt.figure(figsize=(12,8)) 

plt.scatter(df['flipper_length_mm'], df['body_mass_g'], s=100, c='blue', marker='v') 
plt.title('Flipper Length and Body Mass', fontsize=18)
plt.xlabel('Flipper Length (mm)')
plt.ylabel('Body Mass (g)')
#plt.show()



In [None]:
## Plot different species in different colors, which species do we have?


In [None]:
## Filtering by species
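A short sketch of both steps: listing the species and filtering with a boolean mask. The rows are made-up stand-ins that reuse the species names from the dataset.

```python
import pandas as pd

# Made-up stand-in rows (not real measurements).
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"],
    "body_mass_g": [3900, 3400, 5500, 4700, 3950, 3500],
    "flipper_length_mm": [190, 185, 220, 212, 200, 193],
})

print(df["species"].unique())          # which species are in the data?

is_gentoo = df["species"] == "Gentoo"  # boolean mask, one entry per row
gentoo = df.loc[is_gentoo]             # keep only the Gentoo rows
print(gentoo[["body_mass_g", "flipper_length_mm"]])
```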


In [None]:
#set plot size
plt.figure(figsize= (12,8))



#plot first species
plt.scatter(df['body_mass_g'].loc[df['species'] == 'Gentoo'], df['flipper_length_mm'].loc[df['species'] == 'Gentoo'], label='Gentoo', c='red')

#plot second one

plt.scatter(df['body_mass_g'].loc[df['species'] == 'Adelie'], df['flipper_length_mm'].loc[df['species'] == 'Adelie'], label='Adelie', c='blue')

#plot third one

plt.scatter(df['body_mass_g'].loc[df['species'] == 'Chinstrap'], df['flipper_length_mm'].loc[df['species'] == 'Chinstrap'], label='Chinstrap', c='orange')


# plot title
plt.title('Flipper Length and Body Mass by Species', fontsize=18, loc='left')


# label the axes
plt.xlabel('Body mass (g)')
plt.ylabel('Flipper length (mm)')

# rotate the x-axis tick labels
plt.xticks(rotation=60)

# add legend, placed just outside the plot area

plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left')

## Save the plot as a png file with relative path
#plt.savefig('plot.png') 

## Save the plot as a png file with path of choice
#save_results_to = '/home/parvin/Desktop/images/'
#plt.savefig(save_results_to + 'image.png', dpi = 300)
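The three near-identical scatter calls above can also be written as a loop over `groupby('species')`. This is one possible restructuring, not the only one. The `colors` dictionary is a helper introduced here, and the data are made-up stand-in values.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in rows (not real measurements).
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"],
    "body_mass_g": [3900, 3400, 5500, 4700, 3950, 3500],
    "flipper_length_mm": [190, 185, 220, 212, 200, 193],
})

# Hypothetical helper: one colour per species.
colors = {"Gentoo": "red", "Adelie": "blue", "Chinstrap": "orange"}

fig, ax = plt.subplots(figsize=(12, 8))
for species, group in df.groupby("species"):
    # one scatter call per species, so each gets its own legend entry
    ax.scatter(group["body_mass_g"], group["flipper_length_mm"],
               label=species, c=colors[species])

ax.set_title("Flipper Length and Body Mass by Species", fontsize=18, loc="left")
ax.set_xlabel("Body mass (g)")
ax.set_ylabel("Flipper length (mm)")
ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
```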

## 3. Plotting with Seaborn

*More examples in the course material*

In [None]:
df['island'].unique()

In [None]:
plt.figure(figsize= (12,8))

sns.scatterplot(data=df, x='bill_length_mm', y='bill_depth_mm', hue='sex')  

plt.title('Bill Length and Depth by Sex')




# color palettes: https://seaborn.pydata.org/tutorial/color_palettes.html

#sns.despine()

In [None]:
sns.pairplot(df, hue='species', palette='husl')
#sns.despine()

### How to use colors:



https://academy.datawrapper.de/article/140-what-to-consider-when-choosing-colors-for-data-visualization 

https://projects.susielu.com/viz-palette

## More advanced plotting
### Subplots
In matplotlib, the "figure" is like a container that holds plots (called "axes"). You can create a grid of plots, e.g. with two plots in one figure like this:

In [None]:
plt.subplots(nrows=1, ncols=2) 

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1)
ax1.scatter(df['body_mass_g'], df['bill_length_mm'])  # first (top) plot
ax2.scatter(df['body_mass_g'], df['bill_depth_mm'])   # second (bottom) plot
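Each axes object has its own title and labels, set through `set_title`, `set_xlabel` and `set_ylabel`. The sketch below shows a side-by-side layout with a shared x-axis. The data are made-up stand-in values, and `sharex` and `tight_layout` are optional extras.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in rows (not real measurements).
df = pd.DataFrame({
    "body_mass_g": [3900, 3400, 5500, 4700, 3950, 3500],
    "bill_length_mm": [39.1, 38.0, 47.5, 45.2, 50.1, 46.3],
    "bill_depth_mm": [18.7, 17.9, 15.0, 14.2, 18.9, 17.5],
})

# One figure, two axes next to each other, sharing the x-axis range.
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(12, 5), sharex=True)

ax1.scatter(df["body_mass_g"], df["bill_length_mm"])
ax1.set_title("Bill length")
ax1.set_xlabel("Body mass (g)")
ax1.set_ylabel("Bill length (mm)")

ax2.scatter(df["body_mass_g"], df["bill_depth_mm"])
ax2.set_title("Bill depth")
ax2.set_xlabel("Body mass (g)")
ax2.set_ylabel("Bill depth (mm)")

fig.tight_layout()  # avoid overlapping labels between the two plots
```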

### 3D Plot

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # create 3D axes
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)  # 2D coordinate grids
R = np.sqrt(X**4 + Y**2)
Z = np.tan(R)  # tan has poles, so the surface shows sharp spikes
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=5, cmap=cm.coolwarm)
plt.show()

## Article:
https://towardsdatascience.com/matplotlib-cheat-sheet-f441c43971c4

## Exercise:
- Pick one of the seaborn plot functions below and try it out (20 minutes)



In [40]:
## Distplot (deprecated since seaborn 0.11; use histplot or displot instead)

## Violinplot

## Countplot

## Boxplot

## Take Home Message

1. Understand the context (who, what, how)
2. Choose an appropriate display (lines, bars, ...)
3. Eliminate clutter
4. Draw attention where you want your audience to focus
5. Think like a designer: form follows function
6. Tell a story