# JOUR7280/COMM7780 Big Data Analytics for Media and Communication
# Tutorial: Introduction to matplotlib

# Introduction to matplotlib
The purpose of a plotting package is to help the programmer visualize data as easily as possible, with all the necessary
control: relatively high-level commands should suffice most of the time, while low-level commands remain available when needed.

## Matplotlib: Standard Python Visualization Library

The primary plotting library we will explore in the course is [Matplotlib](http://matplotlib.org/).  As mentioned on their website: 
>Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shell, the jupyter notebook, web application servers, and four graphical user interface toolkits.

If you aspire to create impactful visualizations with Python, Matplotlib is an essential tool to have at your disposal.

### Matplotlib.Pyplot

One of the core aspects of Matplotlib is `matplotlib.pyplot`. It is Matplotlib's scripting layer. It is a collection of command style functions that make Matplotlib work like MATLAB. Each `pyplot` function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc. In this lab, we will work with the scripting layer to learn how to generate line charts.

Let's start by importing `matplotlib.pyplot` as follows:

In [None]:
# Display plots inline, directly below the cell that produces them
%matplotlib inline

#import libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

`%matplotlib inline` sets the backend of matplotlib to the 'inline' backend:
>With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots will then also be stored in the notebook document. When using the 'inline' backend, your matplotlib graphs will be included in your notebook, next to the code.


Parts of a Figure
=================

<img src="../figs/matplotlib_figure_parts.png" alt="drawing" width="550"/>

`axes.Axes`
------------------------------

This is what you think of as 'a plot': the region of the image
that contains the data space. A given figure
can contain many Axes, but a given `axes.Axes`
object can only be in one `figure.Figure` (the whole figure).  The
Axes contains two (or three in the case of 3D)
`axis.Axis` objects (be aware of the difference
between **Axes** and **Axis**) which take care of the data limits.
Each `Axes` has a title, an x-label, and a y-label.
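
As a minimal sketch of these relationships, the cell below creates one `Figure` holding two `Axes` and gives the first one a title and axis labels. It uses `plt.subplots`, which this tutorial has not introduced yet; the variable names and the sample data are illustrative assumptions.

```python
import matplotlib.pyplot as plt

# Create one Figure containing two Axes, side by side.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Each Axes has its own title, x-label and y-label.
ax_left.plot([1, 2, 3], [1, 4, 9])
ax_left.set_title('Left Axes')
ax_left.set_xlabel('x')
ax_left.set_ylabel('x squared')

ax_right.plot([1, 2, 3], [3, 2, 1])
ax_right.set_title('Right Axes')

# Both Axes belong to the same Figure, and each owns an x Axis and a y Axis.
print(len(fig.axes))
print(ax_left.figure is fig)
print(type(ax_left.xaxis).__name__, type(ax_left.yaxis).__name__)

plt.show()
```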


`axis.Axis`
------------------------------

`Axis` objects take care of setting the graph limits and generating the ticks (the marks
on the axis) and ticklabels (strings labeling the ticks).
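
The sketch below shows one way to control limits, ticks and ticklabels through the `Axes` methods that delegate to its `Axis` objects. The data and the label strings are made up for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])

# Set the limits of the x Axis and the y Axis.
ax.set_xlim(0, 3)
ax.set_ylim(0, 10)

# Choose where the ticks go on the x Axis and which strings label them.
ax.set_xticks([0, 1, 2, 3])
ax.set_xticklabels(['zero', 'one', 'two', 'three'])

print(ax.get_xlim())
print(ax.get_xticks())

plt.show()
```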

`artist.Artist`
----------------------------------

Basically everything you can see on the figure is an artist (even the
`Figure`, `Axes`, and `Axis` objects).  This
includes `Text` objects, `Line2D` objects, and so on.  When the figure is
rendered, all of the artists are drawn to the **canvas**.  Most Artists are
tied to an Axes; such an Artist
cannot be shared by multiple Axes, or moved from one to another.
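
To make this concrete, the following sketch checks that several familiar objects are instances of `matplotlib.artist.Artist`, and that a plotted line is tied to the Axes it was drawn on. The variable names are illustrative.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [2, 4, 8])          # plot() returns a list of Line2D objects
title = ax.set_title('Everything is an Artist')  # set_title() returns a Text object

# Figure, Axes, Axis, Line2D and Text are all Artists.
for obj in (fig, ax, ax.xaxis, line, title):
    print(type(obj).__name__, isinstance(obj, Artist))

# The line is tied to exactly one Axes.
print(line.axes is ax)

plt.show()
```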


Types of inputs to plotting functions
=====================================

All plotting functions expect `np.array` or `np.ma.masked_array` as
input.  Classes that are 'array-like', such as `pandas` data objects
and `np.matrix`, may or may not work as intended.  It is best to
convert these to `np.array` objects prior to plotting.

For example, to convert a `pandas.DataFrame`:


In [None]:
a = pd.DataFrame(np.random.rand(4,5), columns = list('abcde'))
a_asarray = a.to_numpy() # Convert the DataFrame to a NumPy array.
print(a)
print(type(a))
print('---------')
print(a_asarray)
print(type(a_asarray))

To convert a `np.matrix`:

In [None]:
b = np.matrix([[1,2],[3,4]])
b_asarray = np.asarray(b)
print(b)
print(type(b))
print('---------')
print(b_asarray)
print(type(b_asarray))

Matplotlib, pyplot and pylab: how are they related?
====================================================

`Matplotlib` is the whole package and `matplotlib.pyplot` is a module in
Matplotlib.

For functions in the pyplot module, there is always a "current" figure and
axes (which are created automatically on request).  In the example below,
the first call to ``plt.plot`` creates the axes, then
subsequent calls to ``plt.plot`` add additional lines on the same axes, and
``plt.xlabel``, ``plt.ylabel``, ``plt.title`` and ``plt.legend`` set the
axes labels and title and add a legend.


In [None]:
x = np.linspace(0, 2, 100) # Return evenly spaced numbers over a specified interval.

plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

plt.xlabel('x label')
plt.ylabel('y label')

plt.title("Simple Plot")

plt.legend()

plt.show()

In [None]:
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()


# Pyplot


An introduction to the pyplot interface.


Intro to pyplot
===============

`matplotlib.pyplot` is a collection of command-style functions
that make matplotlib work like MATLAB.
Each ``pyplot`` function makes
some change to a figure: e.g., creates a figure, creates a plotting area
in a figure, plots some lines in a plotting area, decorates the plot
with labels, etc.

In `matplotlib.pyplot`, various states are preserved
across function calls, so that it keeps track of things like
the current figure and plotting area, and the plotting
functions are directed to the current axes. (Please note that "axes" here,
and in most places in the documentation, refers to the *Axes*
part of a figure described under "Parts of a Figure" above,
and not the strict mathematical term for more than one axis.)
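
The state that pyplot tracks can be inspected directly. The sketch below uses `plt.gcf()` ("get current figure") and `plt.gca()` ("get current axes"), which are not covered elsewhere in this tutorial, to show that `plt.title` acts on the current Axes.

```python
import matplotlib.pyplot as plt

plt.figure()            # this new figure becomes the current figure
plt.plot([1, 2, 3])     # creates the current Axes and draws a line on it

fig = plt.gcf()         # the current figure
ax = plt.gca()          # the current Axes

plt.title('Tracked by pyplot')   # applied to the current Axes

print(ax.get_title())
print(ax in fig.axes)
print(len(ax.lines))

plt.show()
```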

Generating visualizations with pyplot is very quick:



In [None]:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
plt.show()

You may be wondering why the x-axis ranges from 0 to 3 and the y-axis
from 1 to 4.  If you provide a single list or array to the
`pyplot.plot` command, matplotlib assumes it is a
sequence of y values, and automatically generates the x values for
you.  Since Python ranges start with 0, the default x vector has the
same length as y but starts with 0.  Hence the x data are
``[0,1,2,3]``.
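
One way to verify this is to inspect the `Line2D` object that `plt.plot` returns. This is a small sketch; the variable name `line` is illustrative.

```python
import matplotlib.pyplot as plt

(line,) = plt.plot([1, 2, 3, 4])   # only y values are given

print(line.get_xdata())   # x values generated by matplotlib
print(line.get_ydata())   # the y values that were passed in

plt.show()
```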

`pyplot.plot` is a versatile command, and will take
an arbitrary number of arguments.  For example, to plot x versus y,
you can issue the command:



In [None]:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.show()

Formatting the style of your plot
---------------------------------

For every x, y pair of arguments, there is an optional third argument
which is the format string that indicates the color and line type of
the plot.  The letters and symbols of the format string are from
MATLAB, and you concatenate a color string with a line style string.
The default format string is 'b-', which is a solid blue line.  For
example, to plot the above with red circles, you would issue:



In [None]:
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
plt.axis([0, 6, 0, 20])
plt.show()

See the `pyplot.plot` [documentation](https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.plot.html) for a complete
list of line styles and format strings.  The
`pyplot.axis` command in the example above takes a
list of ``[xmin, xmax, ymin, ymax]`` and specifies the viewport of the
axes.

If matplotlib were limited to working with lists, it would be fairly
useless for numeric processing.  Generally, you will use `numpy` arrays.  In fact, all sequences are
converted to numpy arrays internally.  The example below illustrates
plotting several lines with different format styles **in one command**
using arrays.



In [None]:
import numpy as np

# evenly sampled time at 200ms intervals
t = np.arange(0., 5., 0.2)

# red dashes, blue squares and green triangles
plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
plt.show()

## Subplots

Often we want to draw several plots within the same figure. For example, we might want to compare a pie chart with a line plot of immigration data.

To visualize multiple plots together, we can create a **`figure`** (overall canvas) and divide it into **`subplots`**, each containing a plot. With **subplots**, we usually work with the **artist layer** instead of the **scripting layer**. 

Typical syntax is:

```python
fig = plt.figure() # create figure
plt.subplot(nrows, ncols, plot_number) # create subplots
```
Where
- `nrows` and `ncols` are used to notionally split the figure into (`nrows` \* `ncols`) sub-axes,  
- `plot_number` is used to identify the particular subplot that this function is to create within the notional grid. `plot_number` starts at 1, increments across rows first and has a maximum of `nrows` * `ncols` as shown below.

<img src="../figs/Subplots.png" width=450 align="center">
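
The numbering scheme can be reproduced with a short loop. The sketch below assumes a 2 x 3 grid and writes each `plot_number` into the centre of its own subplot; the grid size and styling are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

nrows, ncols = 2, 3
fig = plt.figure(figsize=(9, 5))

# plot_number runs from 1 to nrows * ncols, left to right, then top to bottom.
for plot_number in range(1, nrows * ncols + 1):
    ax = plt.subplot(nrows, ncols, plot_number)
    ax.text(0.5, 0.5, str(plot_number), ha='center', va='center', fontsize=18)
    ax.set_xticks([])   # hide ticks so only the number is visible
    ax.set_yticks([])

plt.suptitle('Subplot numbering in a 2 x 3 grid')
plt.show()
```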


### Plotting with categorical variables

It is also possible to create a plot using categorical variables.
Matplotlib allows you to pass categorical variables directly to
many plotting functions. For example:



In [None]:
names = ['group_a', 'group_b', 'group_c']
values = [1, 10, 100]

plt.figure(figsize=(9, 3))

plt.subplot(1,3,1)
plt.bar(names, values)
plt.subplot(1,3,2)
plt.scatter(names, values)
plt.subplot(1,3,3)
plt.plot(names, values)
plt.suptitle('Categorical Plotting') # Add a centered title to the figure
plt.show()