# Development Essentials course

## Python for Data Analysis - part 3

### Matplotlib

[Matplotlib](https://matplotlib.org/) is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

#### Lines and dots

In [None]:
# Start with the baseline: a single list is used as the y-values,
# and the x-values default to 0, 1, 2 and so on

plt.plot([1, 2, 3, 4, 5])
plt.ylabel('Some numbers')
plt.show()
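
When `plot()` receives a single sequence, Matplotlib treats it as the y-values and generates the x-values from the indices 0, 1, 2 and so on, which is why the x-axis above starts at 0. A minimal sketch that inspects the returned line object to confirm this:

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
# plot() returns a list of Line2D objects; unpack the single line
(line,) = ax.plot([1, 2, 3, 4, 5])

# Only y was given, so x was generated from the indices 0..4
x_values = np.asarray(line.get_xdata()).tolist()
y_values = np.asarray(line.get_ydata()).tolist()
print(x_values)
print(y_values)
plt.close(fig)  # the figure is only inspected here, not shown
```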

In [None]:
# Pass both 'x' and 'y' values to draw
# a line between the points (1, 3) and (50, 5)

xpoints = np.array([1, 50])
ypoints = np.array([3, 5])
plt.plot(xpoints, ypoints)
plt.show()
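
Each position in the two arrays forms one point, so `xpoints` and `ypoints` must have the same length, and `plot()` joins the points in the order given. A minimal sketch that pairs the values up and shows the `ValueError` raised when the lengths differ:

```python
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 50])
ypoints = np.array([3, 5])

# Each (x, y) pair is one point: here (1, 3) and (50, 5)
points = list(zip(xpoints.tolist(), ypoints.tolist()))
print(points)

# x and y must have the same length, otherwise plot() raises a ValueError
fig, ax = plt.subplots()
try:
    ax.plot([1, 2, 3], [3, 5])
    error_raised = False
except ValueError as err:
    error_raised = True
    print('ValueError:', err)
plt.close(fig)
```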

In [None]:
# Add some non-linearity...

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()

In [None]:
# ...or we can use `scatter` for points only

plt.scatter(xpoints, ypoints)
plt.show()
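
`scatter()` and `plot()` return different objects: `scatter()` gives a collection whose offsets are the (x, y) points, while `plot()` gives a line. A marker-only format string such as `'o'` makes `plot()` look similar to a scatter plot, because no connecting line is drawn. A minimal sketch comparing the two side by side:

```python
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 2))

# scatter() returns a PathCollection; its offsets are the (x, y) points
collection = ax1.scatter(xpoints, ypoints)
offsets = np.asarray(collection.get_offsets())
print(offsets.shape)

# plot() with the marker-only format 'o' gives a similar points-only look
(line,) = ax2.plot(xpoints, ypoints, 'o')
print(line.get_linestyle())  # 'None': no connecting line is drawn
plt.show()
```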

In [None]:
# `plot` has many parameters to change the layout;
# here the `marker` parameter draws a star at each point

plt.plot(xpoints, ypoints, marker='*')
plt.show()

In [None]:
# A format string sets the marker, line style and color at once:
# 'x' markers, '--' dashed line, 'r' for red

plt.plot(xpoints, ypoints, 'x--r')
plt.grid()  # draw grid lines on the plot
plt.show()
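
The short format string `'x--r'` is shorthand for the `marker`, `linestyle` and `color` keyword arguments. A minimal sketch that draws both spellings and compares the resulting lines; colors are compared as RGBA tuples via `matplotlib.colors.to_rgba`, since `'r'` and `'red'` are different names for the same color:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import numpy as np

xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])

fig, ax = plt.subplots()
# Short format string: marker 'x', line style '--', color 'r'
(short,) = ax.plot(xpoints, ypoints, 'x--r')
# The same style spelled out with keyword arguments
(explicit,) = ax.plot(xpoints, ypoints, marker='x', linestyle='--', color='red')

same_style = (
    short.get_marker() == explicit.get_marker()
    and short.get_linestyle() == explicit.get_linestyle()
    and to_rgba(short.get_color()) == to_rgba(explicit.get_color())
)
print('Same style:', same_style)
plt.close(fig)
```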

In [None]:
# The size of the plot can also be changed

plt.figure(figsize=(2, 2))  # `figsize` is (width, height) in inches
plt.plot(xpoints, ypoints, 'x--r')
plt.grid()  # draw grid lines on the plot
plt.show()
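
`figsize` is measured in inches, and the figure's `dpi` (dots per inch) converts that into pixels. A minimal sketch, assuming an explicit `dpi=100` (the default dpi can differ between setups, for example in notebook backends):

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is dots (pixels) per inch
fig = plt.figure(figsize=(2, 2), dpi=100)
width_in, height_in = fig.get_size_inches()

# Pixel size = size in inches * dpi
width_px, height_px = width_in * fig.dpi, height_in * fig.dpi
print(width_px, height_px)  # 200.0 200.0
plt.close(fig)
```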

In [None]:
# With the `subplot()` function you can draw
# multiple plots in one figure

plt.figure(figsize=(8, 2))

# plot 1
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)  # the total figure has 1 row, 2 columns, and this plot is the first plot
plt.plot(x, y)
plt.title('INCOME')

# plot 2
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)  # the total figure has 1 row, 2 columns, and this plot is the second plot
plt.plot(x, y)
plt.title('SALES')

plt.show()
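
The same two-panel figure can also be built with `plt.subplots()`, which creates the figure and all of its Axes in one call and returns them as objects. This object-oriented style is an alternative to the `plt.subplot()` calls above, not a replacement for them; a minimal sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])

# One call creates the figure and a 1x2 array of Axes
fig, axes = plt.subplots(1, 2, figsize=(8, 2))

axes[0].plot(x, np.array([3, 8, 1, 10]))
axes[0].set_title('INCOME')

axes[1].plot(x, np.array([10, 20, 30, 40]))
axes[1].set_title('SALES')

fig.tight_layout()  # reduce overlap between the panels
plt.show()
```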

#### Bars

In [None]:
x = np.array(['A', 'B', 'C', 'D'])
y = np.array([3, 8, 1, 10])
plt.bar(x, y)
plt.title('What a nice barplot')  # set the plot title
plt.show()
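
`bar()` returns a container with one rectangle per bar, which is handy for annotating the chart. A minimal sketch using `Axes.bar_label()` (available in Matplotlib 3.4 and later) to print each value above its bar:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array(['A', 'B', 'C', 'D'])
y = np.array([3, 8, 1, 10])

fig, ax = plt.subplots()
bars = ax.bar(x, y)  # a BarContainer holding one Rectangle per bar
ax.bar_label(bars)   # write each bar's value on top of it
ax.set_title('What a nice barplot')

heights = [float(bar.get_height()) for bar in bars]
print(heights)
plt.show()
```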

In [None]:
x = np.array(['A', 'B', 'C', 'D'])
y = np.array([3, 8, 1, 10])
plt.barh(x, y, color='#4CAF50')  # horizontal bar chart
plt.title('I am lying on my side')
plt.show()

#### Pie charts

In [None]:
y = np.array([35, 25, 25, 15])
x = ['part 1', 'part 2', 'part 3', 'part 4']
plt.pie(y)
plt.title('Pie chart')
plt.legend(x)  # you may add a legend for the plot
plt.show()
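
`pie()` divides each value by the total, so the wedges show shares of the whole rather than raw values. The `autopct` parameter prints those shares on the wedges; a minimal sketch (the format string `'%1.1f%%'` is one common choice, not the only one):

```python
import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])
labels = ['part 1', 'part 2', 'part 3', 'part 4']

# Each wedge's share is its value divided by the total
shares = y / y.sum()
print(shares)

fig, ax = plt.subplots()
# With autopct set, pie() returns wedges, label texts and percentage texts
wedges, texts, autotexts = ax.pie(y, labels=labels, autopct='%1.1f%%')
percent_labels = [t.get_text() for t in autotexts]
print(percent_labels)
plt.show()
```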

In [None]:
y = np.array([35, 25, 25, 15])
my_labels = ['Apples', 'Bananas', 'Cherries', 'Dates']
my_colors = ['orange', 'hotpink', 'b', '#4CAF50']
my_explode = [.2, 0, 0, 0]  # offset the 'Apples' wedge by 20% of the radius
plt.pie(
    y, 
    labels=my_labels, 
    colors=my_colors, 
    explode=my_explode
)
# plt.legend(title='Four Fruits:', loc='upper left')  # uncomment to add a legend
plt.show()

### Statistics

[This module](https://docs.python.org/3/library/statistics.html) provides functions for calculating mathematical statistics of numeric data.

In [None]:
import statistics as st

In [None]:
x = [1, 2, 3, 4, 4]
print('Mean for', x, 'is:', st.mean(x))
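
The mean is the sum of the values divided by their count. A quick check of that definition against `st.mean()`:

```python
import statistics as st

x = [1, 2, 3, 4, 4]

# The arithmetic mean is the sum divided by the number of values
manual_mean = sum(x) / len(x)  # 14 / 5
print(manual_mean, st.mean(x))  # 2.8 2.8
```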

In [None]:
x = [1, 2, 3, 4, 10000]
print('Median for', x, 'is:', st.median(x))
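
The median is the middle value of the sorted data, which is why the outlier 10000 above barely matters. A minimal sketch comparing it with the mean, plus the even-length case where `median()` averages the two middle values:

```python
import statistics as st

x = [1, 2, 3, 4, 10000]

# One extreme value drags the mean far away, but not the median
mean_x = st.mean(x)
median_x = st.median(x)
print('Mean:', mean_x)
print('Median:', median_x)  # 3

# With an even number of values, median() averages the two middle ones
even = [1, 2, 3, 4]
median_even = st.median(even)
print('Median of', even, 'is:', median_even)  # 2.5
```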

In [None]:
x = [1, 1, 2, 3, 3, 3, 3, 4]
print('Mode for', x, 'is:', st.mode(x))

In [None]:
x = ['red', 'blue', 'blue', 'red', 'green', 'red', 'red']
print('Mode for', x, 'is:', st.mode(x))
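
When several values tie for most common, `st.mode()` returns only the first one it encounters (behavior since Python 3.8), while `st.multimode()` (added in Python 3.8) returns all of them. A minimal sketch on a small made-up list:

```python
import statistics as st

data = [1, 1, 2, 2, 3]  # 1 and 2 both appear twice

# mode() returns a single value; on a tie, the first one encountered
single_mode = st.mode(data)
# multimode() returns every most-common value, in order of first appearance
all_modes = st.multimode(data)
print('mode:', single_mode)     # 1
print('multimode:', all_modes)  # [1, 2]
```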

In [None]:
x = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
print('Variance for', x, 'is:', st.variance(x))
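
`st.variance()` computes the sample variance, dividing the sum of squared deviations from the mean by n - 1; `st.pvariance()` gives the population variance, dividing by n instead. A minimal sketch that computes both by hand for the same data:

```python
import statistics as st

x = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
n = len(x)
m = sum(x) / n

# Sum of squared deviations from the mean
ss = sum((v - m) ** 2 for v in x)

sample_var = ss / (n - 1)  # what st.variance() computes
population_var = ss / n    # what st.pvariance() computes
print(sample_var, st.variance(x))
print(population_var, st.pvariance(x))
```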

### Seaborn

[Seaborn](https://seaborn.pydata.org/) is a Python data visualization library based on `matplotlib`. It provides a high-level interface for drawing attractive and informative statistical graphics.

In [None]:
import seaborn as sns

sns.__version__

#### Datasets

`Seaborn` has many built-in demo datasets that can be loaded with the `load_dataset()` function for quick access to example data. They are downloaded from an online repository on first use, so an internet connection is needed. There’s nothing special about these datasets: they are just pandas DataFrames.

In [None]:
# List available datasets

print('Here are datasets:', sns.get_dataset_names())

In [None]:
tips = sns.load_dataset('tips')  # Load an example dataset
tips.head()  # It is just a pandas dataframe

In [None]:
tips.info()

In [None]:
tips.describe()

#### Visualizations

In [None]:
sns.set_theme(style='darkgrid')  # other styles include 'whitegrid', 'white', 'dark'…
tips = sns.load_dataset('tips')  # Load an example dataset

In [None]:
# Start with a relational plot: one scatter plot per `time` value,
# with points colored by `smoker` and sized by party `size`

sns.relplot(
    data=tips,
    x='total_bill', 
    y='tip', 
    col='time',
    hue='smoker', 
    size='size'
)
plt.show()
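
`relplot()` returns a `FacetGrid` and draws one subplot per distinct value of the `col` variable. A minimal sketch on a small hand-made DataFrame (the values are made up for illustration and are not taken from `tips`) that checks the shape of the resulting grid:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up toy data with the same column names as `tips`
toy = pd.DataFrame({
    'total_bill': [10.0, 20.0, 15.0, 30.0],
    'tip': [1.5, 3.0, 2.0, 5.0],
    'time': ['Lunch', 'Lunch', 'Dinner', 'Dinner'],
    'smoker': ['Yes', 'No', 'Yes', 'No'],
})

g = sns.relplot(data=toy, x='total_bill', y='tip', col='time', hue='smoker')
print(g.axes.shape)  # (rows, columns) of the subplot grid
plt.show()
```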

In [None]:
# Add some statistical analysis: `lmplot()` enhances
# a scatter plot with a linear regression fit
# (and its uncertainty), here one fit per `smoker` group

sns.lmplot(
    data=tips, 
    x='total_bill', 
    y='tip', 
    col='time', 
    hue='smoker'
)
plt.show()
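
The straight line that `lmplot()` draws is, by default, an ordinary least-squares fit of `y` on `x`, and the shaded band is a bootstrapped confidence interval. A minimal sketch of the same kind of fit with NumPy's `np.polyfit` on made-up points (not the `tips` data), to show what the slope and intercept mean:

```python
import numpy as np

# Made-up points lying exactly on tip = 0.15 * total_bill + 0.5
total_bill = np.array([10.0, 20.0, 30.0, 40.0])
tip = 0.15 * total_bill + 0.5

# A degree-1 polyfit is a least-squares line; it returns [slope, intercept]
slope, intercept = np.polyfit(total_bill, tip, 1)
print(round(slope, 3), round(intercept, 3))  # 0.15 0.5

# Predicted tip for a 25.0 bill according to the fitted line
print(round(slope * 25.0 + intercept, 3))  # 4.25
```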

In [None]:
# Distribution plots show how a variable is spread out;
# `kde=True` adds a kernel density estimate curve to the histogram

sns.displot(
    data=tips, 
    x='total_bill', 
    col='time', 
    kde=True
)
plt.show()