# Plotting with Matplotlib

## Programming and Data Management (EDI 3400)

### *Vegard H. Larsen (Department of Data Science and Analytics)*

## Imperative vs. declarative plotting

### Imperative Plotting (we will focus on this):
**How it Works:** You explicitly instruct the computer step-by-step on how to create the plot. Essentially, you're giving a series of commands to produce the desired visualization.

**Control:** Offers detailed control over the plotting process. You decide how each step of the process is handled.

**Common in:** Lower-level plotting libraries like Matplotlib in Python.

### Declarative Plotting:
**How it Works:** You describe what you want the plot to look like, and the library or framework takes care of the details. The focus is on the end result rather than the process to get there.

**Control:** While it abstracts away the detailed steps, it can be more intuitive and quicker for common tasks. Some customization might be less straightforward compared to the imperative approach.

**Common in:** Higher-level plotting libraries/frameworks like Altair.
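
To make the imperative style concrete, here is a minimal sketch in which every statement is one explicit instruction to Matplotlib. The data (`years`, `sales`) and the labels are hypothetical and only serve as an illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical example data (not part of the course data sets)
years = [2019, 2020, 2021, 2022]
sales = [10, 12, 9, 15]

# Imperative style: we spell out each step ourselves
fig, ax = plt.subplots(figsize=(6, 3))   # 1. create the figure and the axes
ax.plot(years, sales, marker="o")        # 2. draw the line with circle markers
ax.set_xticks(years)                     # 3. decide where the ticks go
ax.set_xlabel("Year")                    # 4. label the axes
ax.set_ylabel("Sales")
ax.set_title("Sales per year")           # 5. add a title
```

In a notebook the figure is typically displayed when the cell finishes; in a script you would call `plt.show()` at the end. A declarative library would instead take one description of the result (which column goes on which axis) and work out these steps itself.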

# 1. Introducing Matplotlib

Matplotlib is a visualization library for Python that lets you create a wide range of plots and graphs. From simple line graphs and scatter plots to histograms and 3D visualizations, it offers tools to depict data in insightful ways. Data visualization is a crucial skill in today's data-driven world: by translating numerical information into visual form, you can uncover patterns, grasp complex concepts, and communicate findings more effectively. With Matplotlib, you can turn raw data into compelling stories, which strengthens both your analysis and your presentations.

## What is Matplotlib?

- The most popular Python library for visualizing and plotting data
- Integrates very well with NumPy and Pandas
- Can create publication quality plots
- Large scope for customization

In [None]:
## Importing Matplotlib

import matplotlib.pyplot as plt

## Let's create a simple plot

In [None]:
## Let's create some data

import numpy as np

age = np.arange(5, 20, 1)
height = np.arange(110,170, 4)
print(age)
print(height)

In [None]:
plt.plot(age, height)
plt.show()

# 2. Introducing Seaborn

Seaborn is a visualization library built on top of Matplotlib, designed for creating informative and visually appealing statistical graphics. While Matplotlib offers granular control over plot elements, Seaborn simplifies complex visualization tasks with its high-level, data-centric interface. It comes with several built-in themes and color palettes that improve the look of plots. Seaborn thus lets you go from raw data to an insightful plot quickly, often without much customization.

- Seaborn is a library for making statistical graphics in Python
- It builds on Matplotlib and integrates nicely with Pandas
- Can be a faster way to get good-looking plots than relying on Matplotlib only
- Plotting can be used to understand the data better. When you get a new data set, always try to visualize it
- Seaborn is a library that lies in between imperative and declarative plotting

In [None]:
# Importing seaborn 

import seaborn as sns

In [None]:
# Load an example data set for Seaborn (a Pandas DataFrame)

tips = sns.load_dataset("tips")

# Using .sample() is a good way of inspecting the DataFrame
tips.sample(5)

In [None]:
set(tips['size'])

## Can create very rich plots with very little code

In [None]:
# Create a visualization

sns.violinplot(data=tips, x="day", y="tip")
plt.show()

# Altair example

Altair is a declarative statistical visualization library for Python. It is built on the Vega and Vega-Lite visualization grammars. With Altair, you specify what the visualization should represent in terms of data relationships (which column maps to which axis, color, or tooltip), rather than the details of the rendering process. This provides a concise and consistent interface for creating a wide range of visualizations with minimal code.

In [None]:
import altair as alt
import seaborn as sns

# Load the tips dataset from Seaborn
tips = sns.load_dataset("tips")

# Create a scatter plot
chart = alt.Chart(tips).mark_circle(size=60).encode(
    x='total_bill:Q',
    y='tip:Q',
    color='sex:N',
    tooltip=['total_bill:Q', 'tip:Q', 'sex:N']
).properties(
    title="Relationship between Total Bill and Tip",
    width=500,
    height=400
)

chart.display()

# 3. Different types of plots

## Line plots

In [None]:
## Let's create some data

import numpy as np

age = np.arange(5, 20, 1)
height = np.arange(110, 170, 4)
print(age)
print(height)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12,2))
plt.plot(age, height)
plt.show()

## Let's improve this plot

In [None]:
plt.figure(figsize=(8,3))
plt.title('Height is increasing with age', fontsize=18)
plt.plot(age, height, linestyle='-', lw=10, color='orange', label='Height by age')
plt.xlabel('Age', fontsize=14)
plt.ylabel('Height', fontsize=12)
plt.legend()
plt.show()

## Use markers instead of a line

In [None]:
plt.figure(figsize=(6,4))
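# 'o' draws circle markers only; markersize controls their size
# (lw would only affect a line, and no line is drawn here)
plt.plot(age, height, 'o', markersize=10)

# We can also set which range of values to show on each axis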
plt.xlim(age.min()-1, age.max()+1)
plt.ylim(height.min()-10, height.max()+10)
plt.grid()
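
The `'o'` in the cell above is a Matplotlib format string, a shorthand that combines marker, line style, and color in one short string. The sketch below, using hypothetical data, shows two format strings next to the equivalent, more explicit keyword arguments.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Hypothetical data for illustration
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots(figsize=(6, 4))

# "--g": dashed line ("--") in green ("g"), no markers
(dashed_line,) = ax.plot(x, y, "--g", label="dashed green line")

# "o": circle markers only; the string has no line style, so no line is drawn
(points,) = ax.plot(x, y, "o", color="darkred", markersize=8, label="circle markers")

# Keyword arguments spell out the same kind of choices explicitly
(explicit,) = ax.plot(x, [v + 5 for v in y], linestyle="-", marker="s",
                      color="steelblue", label="solid line with squares")
ax.legend()
```

Format strings are quick to type, while keyword arguments are easier to read and allow any named color.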

## Use two $y$-axes in the same plot

In [None]:
# Load the tips data set again

tips = sns.load_dataset("tips")
tips.head()

In [None]:
# Group the data by the size of the dinner

tips_by_size = tips.groupby('size')[['total_bill', 'tip']].mean()
tips_by_size
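
To see what the `groupby(...).mean()` step does, it can help to run it on a table small enough to check by hand. The mini data set below is hypothetical; it only reuses the column names of the tips data.

```python
import pandas as pd

# Hypothetical mini data set with the same column names as the tips data
mini = pd.DataFrame({
    "size": [2, 2, 3, 3, 3],
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.0, 3.0, 4.0, 5.0, 6.0],
})

# One row per party size; each cell is the mean of that column within the group
by_size = mini.groupby("size")[["total_bill", "tip"]].mean()
print(by_size)
```

For parties of size 2 the mean bill is (10 + 20) / 2 = 15 and the mean tip is (1 + 3) / 2 = 2; for size 3 the means are 40 and 5. The group key (`size`) becomes the index of the result, which is why the index is used as the x-values when plotting.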

In [None]:
# Plot the total bill and the tip on the same axis 

tips_by_size.plot(figsize=(8,3))

In [None]:
# Let's plot total_bill and tip on different y-axes

fig, ax = plt.subplots(figsize=(8,3))

ax2 = ax.twinx()
ax2.grid(False)

p1, = ax.plot(tips_by_size.index, tips_by_size['total_bill'],
              lw=3, color='forestgreen', label='Total bill')
p2, = ax2.plot(tips_by_size.index, tips_by_size['tip'],
               lw=3, linestyle='--', label='Tip (right axis)')
ax.legend(handles=[p1, p2])
#ax.legend()
#ax2.legend()
plt.show()
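
The cell above collects the line handles `p1` and `p2` by hand because each Axes object only knows about its own lines. The following sketch repeats the idea on hypothetical data with two very different scales; all variable names are chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical series on very different scales (not the tips data)
x = np.arange(1, 7)
large = np.array([100, 150, 230, 280, 300, 350])
small = np.array([1.5, 2.6, 3.4, 4.1, 4.0, 5.2])

fig, ax_left = plt.subplots(figsize=(8, 3))
ax_right = ax_left.twinx()   # second Axes: same x-axis, its own y-axis on the right

# ax.plot returns a list of lines; "name, = ..." unpacks the single line
line_left, = ax_left.plot(x, large, color="forestgreen", label="Large series")
line_right, = ax_right.plot(x, small, color="darkorange", linestyle="--",
                            label="Small series (right axis)")

# Pass both handles to one legend so both lines are listed together
ax_left.legend(handles=[line_left, line_right], loc="upper left")

ax_left.set_ylabel("Large series")
ax_right.set_ylabel("Small series")
```

Calling `ax_left.legend()` without handles would list only the line drawn on the left axes, which is why the handles are passed explicitly.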

In [None]:
# This is also possible with Pandas

tips_by_size['total_bill'].plot(figsize=(8,3), label='Total bill', lw=2, legend=True)
tips_by_size['tip'].plot(secondary_y=True, label='Tip', lw=2, legend=True)

## Scatter plots

In [None]:
plt.figure(figsize=(5,4))
plt.scatter(tips['total_bill'], tips['tip'], color='darkred', s=50, marker='^')
plt.xlabel('Value of total bill')
plt.ylabel('Value of tip')
plt.title('Total bill and tip')
plt.show()

## Bar plots

In [None]:
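# 'day' is a categorical column; observed=True keeps only the categories present in the data
tip_by_day = tips.groupby('day', observed=True)['tip'].mean()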
tip_by_day

In [None]:
plt.figure(figsize=(5,3))
plt.title('Tip by day')
plt.bar(tip_by_day.index, tip_by_day)
plt.xlabel('Day of the week')
plt.ylabel('Average tip')
plt.show()

## Saving the figure 

In [None]:
plt.figure(figsize=(5,3))
plt.title('Tip by day')
plt.bar(tip_by_day.index, tip_by_day)
plt.xlabel('Day of the week')
plt.ylabel('Average tip')
plt.savefig('barPlotExample.png')
plt.show()
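
Note that `plt.savefig` is called before `plt.show()` in the cell above; outside a notebook, the figure may already be closed after `plt.show()`, and saving afterwards can then produce an empty image. The sketch below shows a few commonly used `savefig` options. The file names and the temporary output folder are hypothetical.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(["A", "B", "C"], [1.0, 2.5, 1.8])   # hypothetical values
ax.set_title("Example bar plot")

# Hypothetical output folder; a temporary directory keeps the sketch self-contained
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "bar_plot_example.png")
pdf_path = os.path.join(out_dir, "bar_plot_example.pdf")

# dpi sets the resolution of raster formats such as PNG;
# bbox_inches="tight" trims the whitespace around the figure
fig.savefig(png_path, dpi=300, bbox_inches="tight")

# The file extension selects the format; PDF is a vector format
fig.savefig(pdf_path, bbox_inches="tight")

plt.close(fig)   # free the figure once it has been written to disk
```

A vector format such as PDF scales without loss of quality, which is often preferable for reports, while PNG is convenient for slides and web pages.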

# Linear equations

Let's consider two linear equations:

1) $ y = 2x + 1 $
2) $ y = -x + 5 $

The `np.linalg.solve` function solves a linear matrix equation $Ax = b$. Given a square 2D array (matrix) $A$ and a 1D array (vector) $b$, it returns the solution vector $x$.

To represent the system of equations as a matrix equation $Ax = b$, you'll first have to convert the system of equations into matrix form.

Given our two equations:

1) $y = 2x + 1$
2) $y = -x + 5$

We can rearrange the terms to get:
1) $-2x + y = 1$
2) $x + y = 5$

The matrix $A$ and vector $b$ for our system are:

$$ A = \begin{bmatrix} -2 & 1 \\ 1 & 1 \end{bmatrix} $$
$$ b = \begin{bmatrix} 1 \\ 5 \end{bmatrix} $$

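Now we can solve the system. Note that the solution vector returned by `np.linalg.solve` contains both unknowns: its first element is the value of $x$ and its second element is the value of $y$: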

In [None]:
import numpy as np

# Define the coefficient matrix A and vector b
A = np.array([[-2, 1], [1, 1]])
b = np.array([1, 5])

# Solve for x using np.linalg.solve
x = np.linalg.solve(A, b)

print(f"The solution is x = {x[0]} and y = {x[1]}")
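
The result can be checked by hand: setting $2x + 1 = -x + 5$ gives $3x = 4$, so $x = 4/3$ and $y = -4/3 + 5 = 11/3$. A minimal sketch of the same check in code, assuming the $A$ and $b$ defined above (the names `solution` and `expected` are chosen for illustration):

```python
import numpy as np

A = np.array([[-2, 1], [1, 1]])
b = np.array([1, 5])

# A non-zero determinant means the system has exactly one solution
det_A = np.linalg.det(A)

solution = np.linalg.solve(A, b)

# Check 1: plugging the solution back in reproduces b (up to rounding)
residual_ok = np.allclose(A @ solution, b)

# Check 2: the numerical result matches the hand calculation x = 4/3, y = 11/3
expected = np.array([4 / 3, 11 / 3])
matches_hand_calculation = np.allclose(solution, expected)

print(residual_ok, matches_hand_calculation)
```

`np.allclose` is used instead of `==` because floating-point results can differ from the exact fractions by a tiny rounding error.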

We'll plot both lines and then determine their intersection point. The intersection point is the solution to the system of equations.

Here's how you can do it using `matplotlib` and `numpy`:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Define the functions
def f1(x):
    return 2*x + 1

def f2(x):
    return -x + 5

# Generate x values
x = np.linspace(-2, 4, 400)

# Calculate y values
y1 = f1(x)
y2 = f2(x)

# Plotting both functions
plt.plot(x, y1, '-r', label='y=2x+1')
plt.plot(x, y2, '-b', label='y=-x+5')
plt.title('Intersection of two linear equations')
plt.xlabel('x', color='#1C2833')
plt.ylabel('y', color='#1C2833')
plt.axhline(0, color='black',linewidth=0.5)
plt.axvline(0, color='black',linewidth=0.5)
plt.grid(color = 'gray', linestyle = '--', linewidth = 0.5)

# Finding the intersection point with the same A and b as in the matrix form above
intersection = np.linalg.solve([[-2, 1], [1, 1]], [1, 5])
plt.plot(intersection[0], intersection[1], 'go')  # intersection point (x, y)

plt.legend(loc='upper left')
plt.show()

# Interactive plotting

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from ipywidgets import interact

# Load the tips dataset
tips = sns.load_dataset('tips')

def interactive_scatter_plot(x_col, y_col, color_col):
    # Plotting
    plt.figure(figsize=(10, 6))
    sns.scatterplot(data=tips, x=x_col, y=y_col, hue=color_col)
    plt.title(f'Scatter Plot of {y_col} vs {x_col}')
    plt.show()

# Use ipywidgets interact
interact(
    interactive_scatter_plot, 
    x_col=['total_bill', 'tip', 'size'], 
    y_col=['total_bill', 'tip', 'size'], 
    color_col=['sex', 'day', 'time', 'smoker']
);
