# Plotting with Matplotlib

## Programming and Data Management (EDI 3400)

### *Vegard H. Larsen (Department of Data Science and Analytics)*

## Plan for today: 

1. Introducing Matplotlib and Seaborn
2. Creating different types of plots
3. In-class exercises

# 1. Introducing Matplotlib

Matplotlib is a visualization library for Python that lets you create a wide range of plots, from simple line graphs and scatter plots to histograms and 3D visualizations. It is a natural starting point for data visualization, a crucial skill in today's data-driven world. Translating numbers into visual form helps you uncover patterns, grasp complex concepts, and communicate findings more effectively, which strengthens both your analysis and your presentations.

## What is Matplotlib?

- The most popular Python library for visualizing and plotting data
- Integrates very well with NumPy and Pandas
- Can create publication quality plots
- Large scope for customization
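
As a hedged illustration of the NumPy and Pandas integration mentioned above, the sketch below draws a NumPy array and a Pandas Series on the same axes. The sine and cosine data and all variable names are made up for this example and are not part of the course material.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical example data: 50 evenly spaced points between 0 and 2*pi
x = np.linspace(0, 2 * np.pi, 50)
series = pd.Series(np.sin(x), index=x, name="sin(x)")

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, np.cos(x), label="cos(x)")  # NumPy arrays can be passed directly
series.plot(ax=ax, label="sin(x)")     # a Pandas Series can draw on the same axes
ax.legend()
plt.show()
```

Passing `ax=ax` to the Pandas call is one way to make sure both curves end up in the same figure.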

In [None]:
## Importing Matplotlib

import matplotlib.pyplot as plt

## Let's create a simple plot

In [None]:
## Let's create some data

import numpy as np

age = np.arange(5, 20, 1)
height = np.arange(110,170, 4)
print(age)
print(height)

In [None]:
plt.plot(age, height)
plt.show()

# Introducing Seaborn

Seaborn is a visualization library built on top of Matplotlib, designed for creating attractive and informative statistical graphics. While Matplotlib offers granular control over plot elements, Seaborn simplifies complex visualization tasks with high-level, data-centric interfaces. It comes with several built-in themes and color palettes, so plots look good with little customization, which makes the path from raw data to an insightful plot more efficient.

- Seaborn is a library for making statistical graphics in Python
- It builds on Matplotlib and integrates nicely with Pandas
- Can be a faster way to get good-looking plots than relying on Matplotlib only
- Plotting can be used to understand the data better. When you get a new data set, always try to visualize it

In [None]:
# Importing seaborn 

import seaborn as sns

In [None]:
# Load an example data set for Seaborn (a Pandas DataFrame)

tips = sns.load_dataset("tips")

# Using .sample() is a good way of inspecting the DataFrame
tips.sample(5)

In [None]:
# The distinct party sizes that occur in the data
set(tips['size'])

## Can create very rich plots with very little code

In [None]:
# Create a visualization

sns.violinplot(data=tips, x="day", y="tip")
plt.show()

# 2. Different types of plots

## Line plots

In [None]:
## Let's create some data

import numpy as np

age = np.arange(5, 20, 1)
height = np.arange(110, 170, 4)
print(age)
print(height)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12,2))
plt.plot(age, height)
plt.show()

## Let's improve this plot

In [None]:
plt.figure(figsize=(8,3))
plt.title('Height is increasing with age', fontsize=18)
plt.plot(age, height, linestyle='-', lw=10, color='orange', label='Height by age')
plt.xlabel('Age', fontsize=14)
plt.ylabel('Height', fontsize=14)
plt.legend()
plt.show()

## Use markers instead of a line

In [None]:
plt.figure(figsize=(6,4))
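# The 'o' format draws markers only, so we set the marker size instead of a line width
plt.plot(age, height, 'o', markersize=8)

# We can also set which range of values to show on each axis
plt.xlim(age.min()-1, age.max()+1)
plt.ylim(height.min()-10, height.max()+10)
plt.grid()
plt.show()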

## Use two $y$-axes in the same plot

In [None]:
# Load the tips data set again

tips = sns.load_dataset("tips")
tips.head()

In [None]:
# Group the data by the size of the dinner

tips_by_size = tips.groupby('size')[['total_bill', 'tip']].mean()
tips_by_size

In [None]:
# Plot the total bill and the tip on the same axis 

tips_by_size.plot(figsize=(8,3))

In [None]:
# Let's plot total_bill and tip on different y-axes

fig, ax = plt.subplots(figsize=(8,3))

ax2 = ax.twinx()
ax2.grid(False)

p1, = ax.plot(tips_by_size.index, tips_by_size['total_bill'],
              lw=3, color='forestgreen', label='Total bill')
p2, = ax2.plot(tips_by_size.index, tips_by_size['tip'],
               lw=3, linestyle='--', label='Tip (right axis)')
ax.legend(handles=[p1, p2])
#ax.legend()
#ax2.legend()
plt.show()

In [None]:
# This is also possible with Pandas

tips_by_size['total_bill'].plot(figsize=(8,3), label='Total bill', lw=2, legend=True)
tips_by_size['tip'].plot(secondary_y=True, label='Tip', lw=2, legend=True)

## Scatter plots

In [None]:
plt.figure(figsize=(5,4))
plt.scatter(tips['total_bill'], tips['tip'], color='darkred', s=50, marker='^')
plt.xlabel('Value of total bill')
plt.ylabel('Value of tip')
plt.title('Total bill and tip')
plt.show()

## Bar plots

In [None]:
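# 'day' is a categorical column; passing observed explicitly avoids a FutureWarning in recent Pandas versions
tip_by_day = tips.groupby('day', observed=False)['tip'].mean()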
tip_by_day

In [None]:
plt.figure(figsize=(5,3))
plt.title('Tip by day')
plt.bar(tip_by_day.index, tip_by_day)
plt.xlabel('Day of the week')
plt.ylabel('Average tip')
plt.show()

## Saving the figure 

In [None]:
plt.figure(figsize=(5,3))
plt.title('Tip by day')
plt.bar(tip_by_day.index, tip_by_day)
plt.xlabel('Day of the week')
plt.ylabel('Average tip')
plt.savefig('barPlotExample.png')
plt.show()

In [None]:
# List the files in the working directory to check that the figure was saved
ls
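
A minimal sketch of a few optional `savefig` settings, assuming you want a higher-resolution file with tight margins. The bar values, labels, and the file name `barPlotSketch.png` are placeholders invented for this example. It also checks from Python that the file was written, as an alternative to listing the directory.

```python
from pathlib import Path

import matplotlib.pyplot as plt

# Placeholder data for illustration only
labels = ["A", "B", "C", "D"]
values = [1, 2, 3, 4]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(labels, values)
ax.set_title("Placeholder bar plot")

# dpi sets the resolution; bbox_inches="tight" trims surrounding whitespace
out = Path("barPlotSketch.png")  # hypothetical file name
fig.savefig(out, dpi=300, bbox_inches="tight")
plt.show()

# Confirm that the file exists and is not empty
print(out.exists(), out.stat().st_size > 0)
```

As in the cell above, the figure is saved before `plt.show()` is called.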