<center><h1> Plotting and Visualization Notebook </h1></center>

__Created by:__ Vardhan Dongre (for personal use)<br>
__Based on:__ Wes McKinney's "Python for Data Analysis"


This notebook collects basic examples of using matplotlib to visualize data in a structured, organized way. matplotlib is a low-level plotting tool; seaborn and pandas offer higher-level plotting options that use matplotlib underneath. I created this notebook as a go-to reference for myself. If you use it for tutorial purposes, please credit both the original author and the creator. Thanks.

In [1]:
%matplotlib notebook

In [2]:
import numpy as np
import matplotlib.pyplot as plt

In [3]:
# Simple Line Plot
data = np.arange(10)
plt.plot(data)


[<matplotlib.lines.Line2D at 0x11cf18748>]

In [4]:
# Creating a figure object and sub-plots
fig = plt.figure() # fig is the figure object
# fig.add_subplot creates AxesSubplot objects
ax1 = fig.add_subplot(2,2,1)
ax2 = fig.add_subplot(2,2,2)
ax3 = fig.add_subplot(2,2,3)


In [5]:
# Notice where each plot lands in the figure:
# plt.plot draws on the current axes, i.e. the most recently created sub-plot,
# so successive calls accumulate on that same sub-plot

fig = plt.figure()
ax1 = fig.add_subplot(2,2,1)
# The format string carries only the line style; the color comes from the keyword
plt.plot(np.random.randn(50).cumsum(),'--',color='r')
ax2 = fig.add_subplot(2,2,2)
ax3 = fig.add_subplot(2,2,3)
plt.plot(np.random.randn(50).cumsum(),'--',color='g')
plt.plot(np.random.randn(50).cumsum(),'--',color='k')
plt.plot(np.random.randn(50).cumsum(),'--',color='r')


[<matplotlib.lines.Line2D at 0x11d07e550>]

In [6]:
# Let's add some histograms and scatter plots to sub-plots like those above, but in a new figure
fig = plt.figure()
ax1 = fig.add_subplot(2,2,1)
ax2 = fig.add_subplot(2,2,2)
ax3 = fig.add_subplot(2,2,3)
# Once we create the AxesSubplot objects we can directly use them to plot in the respective sub-plot
ax1.hist(np.random.randn(100),bins=20,color='k',alpha=0.3)


(array([ 2.,  1.,  2.,  3.,  5.,  5.,  2.,  4.,  9.,  6., 12., 12.,  7.,
         4.,  2.,  8.,  6.,  6.,  1.,  3.]),
 array([-2.3262903 , -2.11725115, -1.908212  , -1.69917285, -1.4901337 ,
        -1.28109456, -1.07205541, -0.86301626, -0.65397711, -0.44493796,
        -0.23589881, -0.02685966,  0.18217948,  0.39121863,  0.60025778,
         0.80929693,  1.01833608,  1.22737523,  1.43641438,  1.64545353,
         1.85449267]),
 <a list of 20 Patch objects>)

Calling hist not only draws the histogram in the specified sub-plot but also returns the bin counts, the bin edges, and the patch objects, which the notebook echoes as cell output. To avoid printing them, the next cell assigns the result to a throwaway variable.
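
As a minimal sketch of the point above, the return value of `hist` can be unpacked instead of discarded. The names `counts`, `edges`, and `patches`, the fixed seed, and the `Agg` backend line (used so the snippet also runs as a plain script without a display) are choices made for this illustration, not part of the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
fig, ax = plt.subplots()

# hist returns the bin counts, the bin edges, and the drawn bar patches
counts, edges, patches = ax.hist(rng.standard_normal(100), bins=20, color='k', alpha=0.3)

# 20 bins give 20 counts, 21 edges, and 20 bar patches
print(len(counts), len(edges), len(patches))
# every one of the 100 samples falls into some bin
print(int(counts.sum()))
```

Keeping the counts and edges can be handy when the binned values are needed later, for example to label the tallest bar.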

In [7]:
# Let's add some histograms and scatter plots to sub-plots like those above, but in a new figure
fig = plt.figure()
ax1 = fig.add_subplot(2,2,1)
ax2 = fig.add_subplot(2,2,2)
ax3 = fig.add_subplot(2,2,3)
# Once we create the AxesSubplot objects we can directly use them to plot in the respective sub-plot
_ = ax1.hist(np.random.randn(100),bins=20,color='k',alpha=0.3)
ax2.scatter(np.arange(30),np.arange(30)+3*np.random.randn(30))
ax3.plot(np.random.randn(50).cumsum(),'k--')


[<matplotlib.lines.Line2D at 0x11d205f98>]

In [8]:
# Making sub-plots using plt.subplots
# This creates both the figure and a NumPy array of AxesSubplot objects
fig, axes = plt.subplots(2,3)


In [9]:
from itertools import product
# Fill every (row, column) position in a loop, then remove the spacing between the subplots
fig, axes = plt.subplots(2,2,sharex=True,sharey=True)
for i, j in product(range(2), range(2)):
    axes[i,j].hist(np.random.randn(500),bins=50,color='g',alpha=0.5)
plt.subplots_adjust(wspace=0,hspace=0)



Notice how the loop automates the repeated plotting, and that the resulting figure has no spacing between the sub-plots.
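
An alternative way to visit every sub-plot, sketched here under the assumption that the panels do not need their row and column indices, is to iterate over `axes.flat`. The seed and the `Agg` backend line are illustrative additions so the snippet runs as a standalone script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)

# axes is a 2x2 NumPy array of Axes; .flat visits every panel in row-major order
for ax in axes.flat:
    ax.hist(rng.standard_normal(500), bins=50, color='g', alpha=0.5)

# wspace/hspace are expressed as fractions of the average axes size; 0 removes the gaps
fig.subplots_adjust(wspace=0, hspace=0)
```

When the indices matter (for instance, to label only the bottom row), the explicit `(i, j)` loop remains the better fit.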

Now let's check the docstring of the plot function to see the full set of options it accepts.

In [10]:
plt.plot?

In [11]:
# Marking the points on the line plot 
# Notice the format string: 'k' = black, 'o' = circle marker, '--' = dashed line
fig = plt.figure()
plt.plot(np.random.randn(30).cumsum(), 'ko--')


[<matplotlib.lines.Line2D at 0x11d6dc438>]

In [12]:
# By default, consecutive points are joined by straight lines (linear interpolation); drawstyle changes that
fig = plt.figure()
data = np.random.randn(30).cumsum()
plt.plot(data,'k--',label='Default')
plt.plot(data,'k--',drawstyle='steps-post', label='steps-post')
plt.legend(loc='best')


<matplotlib.legend.Legend at 0x11d71f080>

### Adjusting the Ticks, Labels and Legends <font color="red">(Imp.)</font>

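The pyplot interface provides functions such as:
<ol>
    <li>xlim - to control plot range</li>
    <li>xticks - to control tick locations</li>
    <li>xticklabels - to control tick labels (available on an Axes as <code>set_xticklabels</code>; in pyplot, labels are passed to <code>xticks</code>)</li>
</ol>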
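
A minimal sketch of these pyplot-level calls acting on the current axes. It relies on `plt.xticks` accepting the tick labels as its second argument; the label strings and the `Agg` backend line are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
plt.plot(np.arange(10))

default_limits = plt.xlim()   # no arguments: returns the current (left, right) limits
plt.xlim(0, 9)                # with arguments: sets the limits
plt.xticks([0, 3, 6, 9], ['start', 'a', 'b', 'end'])  # tick locations and their labels

print(default_limits, plt.xlim())
```

The same getter/setter pattern applies to the y axis through `plt.ylim` and `plt.yticks`.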
    

In [13]:
plt.xlim()
# Calling xlim without arguments returns the current limits; here they come from the previous figure

(-1.4500000000000002, 30.45)

In [14]:
# Setting the ticks and ticklabels
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(np.random.randn(1000).cumsum())
ticks = ax.set_xticks([0,250,500,750,1000])
labels = ax.set_xticklabels(['one','two','three','four','five'],rotation=30,fontsize='small')
# ax.set_title('A random matplotlib plot :/ ')
# ax.set_xlabel('Stages')

# An alternative: set several properties at once by passing them to ax.set
props = {
    'title': 'A random matplotlib plot :D',
    'xlabel': 'Stages'
}
ax.set(**props)


[Text(0.5, 0, 'Stages'), Text(0.5, 1.0, 'A random matplotlib plot :D')]

In [15]:
# Adding legends
fig = plt.figure(); ax = fig.add_subplot(1,1,1)
ax.plot(np.random.randn(1000).cumsum(), 'g', label='one')
ax.plot(np.random.randn(1000).cumsum(), 'r', label='two')
ax.plot(np.random.randn(1000).cumsum(), 'b', label='three')
ax.plot(np.random.randn(1000).cumsum(), 'y', label='_nolegend_')
ax.legend(loc='best')
props = {
    'title':"A Random Plot",
    'ylabel': 'Some y unit',
    'xlabel':"Some X unit"
}
ax.set(**props)


[Text(0, 0.5, 'Some y unit'),
 Text(0.5, 0, 'Some X unit'),
 Text(0.5, 1.0, 'A Random Plot')]

Notice: the output size of a plot can be controlled with the figsize argument of plt.figure. Legend entries appear for only three lines and not for the fourth (yellow) one, because its label is '_nolegend_'. With loc='best' the legend is placed automatically where it fits best; several other placements are available.
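
The sketch below illustrates these three points together: an explicit `figsize`, a line excluded from the legend, and a fixed legend location. The size `(6, 4)`, the seed, the choice of `'upper left'`, and the `Agg` backend line are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, chosen only for reproducibility
fig, ax = plt.subplots(figsize=(6, 4))  # figsize is (width, height) in inches

ax.plot(rng.standard_normal(100).cumsum(), 'g', label='one')
ax.plot(rng.standard_normal(100).cumsum(), 'r', label='two')
ax.plot(rng.standard_normal(100).cumsum(), 'y', label='_nolegend_')  # left out of the legend

legend = ax.legend(loc='upper left')  # a fixed location instead of 'best'
entries = [text.get_text() for text in legend.get_texts()]
print(entries)  # → ['one', 'two']
```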

In [16]:
# Check the docstring of plt.legend to find more locations for placing the legend
plt.legend?

### Annotating Plots and Patching

In [17]:
# For this I have used the dataset provided by Wes on his GitHub repo for the book

In [18]:
import pandas as pd
from datetime import datetime
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(1,1,1)

data = pd.read_csv('spx.csv', index_col=0, parse_dates=True)
spx = data['SPX']

spx.plot(ax=ax, style='k')

crisis_data = [
    (datetime(2007,10,11), 'Peak of Bull Market'),
    (datetime(2008,3,12), 'Bear Stearns Fails'),
    (datetime(2008,9,15), 'Lehman Bankruptcy')
]

for date, label in crisis_data:
    ax.annotate(label, xy=(date,spx.asof(date)+75),xytext=(date,spx.asof(date)+225),
                arrowprops=dict(facecolor='red',headwidth=4,width=3,headlength=4),
               horizontalalignment='left', verticalalignment='top')
    
# Zooming in on the specific region
ax.set_xlim(['1/1/2007','1/1/2011'])
ax.set_ylim([600,1800])

ax.set_title('Important dates in the 2008-2009 financial crisis')


Text(0.5, 1.0, 'Important dates in the 2008-2009 financial crisis')

Notice that the annotation itself is done by ax.annotate; the rest of the code selects the appropriate data. It is important to focus on the region of interest by zooming: set_xlim and set_ylim bring that portion of the plot into view.
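
For readers without `spx.csv` at hand, here is a self-contained sketch of the same two ideas, annotating a point and then zooming with `set_xlim`/`set_ylim`, on a synthetic sine curve. The data, the label text, the offsets, and the `Agg` backend line are all made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 5, 200)   # synthetic data with a single peak in this range
y = np.sin(x)
ax.plot(x, y, 'k')

peak = int(np.argmax(y))     # index of the highest point
note = ax.annotate('maximum',
                   xy=(x[peak], y[peak]),                # the point being annotated
                   xytext=(x[peak] + 1, y[peak] + 0.3),  # where the label text is placed
                   arrowprops=dict(facecolor='red', headwidth=4, width=3, headlength=4),
                   horizontalalignment='left', verticalalignment='top')

# Zoom in on the region around the annotated point
ax.set_xlim(0, 5)
ax.set_ylim(0, 1.5)
```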

In [19]:
# Adding Shapes/Patches on the plot
fig = plt.figure()
ax = fig.add_subplot(1,1,1)

rect = plt.Rectangle((0.2,0.75),0.4,0.15,color='k',alpha=0.3)
circ = plt.Circle((0.7,0.2),0.15,color='b',alpha=0.3)
poly = plt.Polygon([[0.15,0.15],[0.35,0.4],[0.2,0.6]],color='g',alpha=0.5)

ax.add_patch(rect)
ax.add_patch(circ)
ax.add_patch(poly)


<matplotlib.patches.Polygon at 0x11fc5c278>

### Saving Plots to a File

In [20]:
plt.savefig('patch.pdf')
# This saves the current figure (the last one we generated) to the working directory, here the notebook's folder

In [21]:
# In order to maintain a specific dpi and whitespace we can use:
plt.savefig('patch.png',dpi=400,bbox_inches='tight')
# Here bbox_inches='tight' trims the surplus whitespace around the plot
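
A small sketch of saving through the Figure object rather than `plt.savefig`, which avoids depending on which figure happens to be current. The temporary output directory, the file names, and the `Agg` backend line are hypothetical choices for this example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(10))

out_dir = tempfile.mkdtemp()  # hypothetical output location for this sketch
png_path = os.path.join(out_dir, 'line.png')
pdf_path = os.path.join(out_dir, 'line.pdf')

# The file format is inferred from the extension
fig.savefig(png_path, dpi=400, bbox_inches='tight')
fig.savefig(pdf_path)

print(os.path.exists(png_path), os.path.exists(pdf_path))
```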

#### Customizations : Format, Font, Color Schemes etc.
To generate plots suitable for publication, we can customize the default parameters that govern the figure size, subplot spacing, colors, font sizes, grid styles, etc.

One method is to programmatically customize using: __<font color='red'>"plt.rc"</font>__

In [28]:
# Example
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(np.random.randn(50).cumsum(), 'k--')
# To customize, collect the options in a dictionary and pass them to plt.rc
font_options = {
    'family': 'monospace',
    'weight': 'bold',
    'size': 9
}
plt.rc('font', **font_options)


A more extensive method is to edit the configuration file <font color='red'>'matplotlibrc'</font>, which can be found in the matplotlib/mpl-data directory. If a customized copy is placed in our home directory under the name <font color='red'>'.matplotlibrc'</font>, it is loaded every time we use matplotlib (the exact locations searched depend on the platform and the matplotlib version).
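
The following sketch shows how to inspect and change these settings from code: `matplotlib.matplotlib_fname()` reports which configuration file is in effect, `plt.rc` changes settings globally, and `plt.rc_context` limits a change to a `with` block. The specific values and the `Agg` backend line are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt

# Path of the matplotlibrc file that matplotlib actually loaded
config_path = matplotlib.matplotlib_fname()
print(config_path)

# Global change: affects text created from now on
plt.rc('font', family='monospace', weight='bold', size=9)
font_size_after_rc = plt.rcParams['font.size']

# Temporary change: only in effect inside the with block
with plt.rc_context({'lines.linewidth': 3}):
    width_inside = plt.rcParams['lines.linewidth']
width_outside = plt.rcParams['lines.linewidth']

# Restore matplotlib's built-in defaults
plt.rcdefaults()
```

Scoping changes with `rc_context` can be preferable in a long notebook, since a global `plt.rc` call silently affects every later cell.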

In [30]:
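# Other interesting customizations: applying a style temporarily
# Create the figure inside the context so the style applies to the figure as well as the axes
with plt.style.context('dark_background'):
    fig = plt.figure()
    plt.plot(np.sin(np.linspace(0, 2 * np.pi)), 'r-o')
    plt.grid(c='w')
plt.show()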


<center><h1><font color='green'> ************End of Textbook reference************** </font></h1></center>
<h2> Below this, I will be continuously adding interesting and advanced things that are rarely needed </h2>

In [None]:
# updating......

In [31]:
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(1,1,1, aspect=1)

ax.scatter([.5],[.5], c='#FFCC00', s=120000, label="face")
ax.scatter([.35, .65], [.63, .63], c='k', s=1000, label="eyes")

X = np.linspace(.3, .7, 100)
Y = 2* (X-.5)**2 + 0.30

ax.plot(X, Y, c='k', linewidth=8, label="smile")

ax.set_xlim(0,1)
ax.set_ylim(0,1)

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.set_xticks([])
ax.set_yticks([])
# Code Ref:https://gist.github.com/bbengfort/dd9d8027a37f3a96c44323a8303520f0


[]