
<img src="https://github.com/urcuqui/Data-Science/blob/master/Utilities/AVisual2.png?raw=true" width="500">

__Author__: Christian Urcuqui

__Date__: 2 Sept 2018

__Last Updated__: 3 Sept 2018

# Plotting and Visualization

This is an important step in a data science project, especially when you are applying an exploratory data analysis (EDA) approach. Visual analysis allows us to understand the data, find patterns and outliers, form ideas about possible models, gain insights about the datasets, and finally communicate the results effectively to anyone.

![image](https://github.com/urcuqui/Data-Science/blob/master/Utilities/AVisual1.png?raw=true)

Python has many libraries for making static and dynamic visualizations; in this notebook we will see _matplotlib_ and _seaborn_. To review these tools, this notebook is divided into the following sections:
+ [Matplotlib](#Matplotlib)
+ [Visual analytics applications](#Visual-Analytics-Applications)





## Matplotlib

Matplotlib is a desktop plotting package created by John Hunter in 2002 to enable a MATLAB-like plotting interface in Python. Thanks to the collaboration between the Matplotlib and IPython communities, we now have an easy way to make interactive plots from the IPython shell (and Jupyter notebooks). We will see that we can make many types of plots and export them in different graphic formats (like PDF, SVG, PNG, GIF).

https://matplotlib.org/
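
As a small illustration of exporting, the sketch below renders one figure into several formats with `savefig`. It writes to in-memory buffers rather than files, and the `Agg` backend line is an assumption made so the snippet also runs outside a notebook (drop it inside Jupyter). The variable names are illustrative only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(10))

# Render the same figure once per format into an in-memory buffer.
buffers = {}
for fmt in ("png", "svg", "pdf"):
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt)
    buffers[fmt] = buf.getvalue()

plt.close(fig)
```

To write to disk instead, pass a file name such as `fig.savefig("plot.png")`; the format is then inferred from the extension.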

Matplotlib has spawned a number of add-on toolkits for data visualization, one of these is seaborn.

http://seaborn.pydata.org/

Remember that to get interactive plots inside a Jupyter notebook we first need to run an IPython magic command:

In [1]:
%matplotlib notebook

In [2]:
# next we can import the library
import matplotlib.pyplot as plt

In [11]:
# we will make a simple line plot, our first "hello world"
import numpy as np

data = np.arange(10)
print(data)
plt.plot(data)

[0 1 2 3 4 5 6 7 8 9]



[<matplotlib.lines.Line2D at 0x2b0e340e780>]

## Figures and Subplots

Plots in matplotlib reside within a _Figure_ object. You can make a new figure with _plt.figure_:

In [14]:
fig = plt.figure()


Note that the new figure is just an empty window; nothing is drawn until we add a plot to it. The function plt.figure has a number of options; for example, _figsize_ allows us to define the size and aspect ratio.
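
A minimal sketch of the _figsize_ option, assuming a width of 8 inches and a height of 4 inches (both values are arbitrary choices for illustration). The `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt

# figsize is given as (width, height) in inches.
wide_fig = plt.figure(figsize=(8, 4))

width, height = wide_fig.get_size_inches()
aspect_ratio = width / height  # twice as wide as it is tall
```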

Through the _Figure_ object we can add subplots (windows); the method to do this is add_subplot:

In [16]:
fig2 = plt.figure()
# the figure is laid out as a 2 x 2 grid of subplots, and the last argument selects
# position 1 in that grid (numbered left to right, top to bottom)
ax1 = fig2.add_subplot(2, 2, 1)


In [20]:
fig3 = plt.figure()

ax1 = fig3.add_subplot(2, 2, 1) # first position
ax2 = fig3.add_subplot(2, 2, 2) # second position
ax3 = fig3.add_subplot(2, 2, 3) # third position

# Pay attention: a plain plt.plot call draws on the most recently created subplot (here ax3)
# 'k--' is a style option (in this case, a black dashed line).
plt.plot(np.random.randn(50).cumsum(), 'k--')



[<matplotlib.lines.Line2D at 0x2b0e4b9deb8>]

If we want to plot in a specific subplot, we can use the objects returned by __fig.add_subplot__; these are _AxesSubplot_ objects, and we can call their plotting methods directly.

In [22]:
fig4 = plt.figure()

ax1_4 = fig4.add_subplot(2, 2, 1) # first position
ax2_4 = fig4.add_subplot(2, 2, 2) # second position
ax3_4 = fig4.add_subplot(2, 2, 3) # third position

plt.plot(np.random.randn(50).cumsum(), 'k--')

_ = ax1_4.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)

ax2_4.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))


<matplotlib.collections.PathCollection at 0x2b0e5297668>

In [24]:
fig, axes = plt.subplots(2,3)

axes


array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000002B0E56CA0B8>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000002B0E4E9E4E0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000002B0E48A82B0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x000002B0E4490BE0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000002B0E41DDD30>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x000002B0E3465240>]],
      dtype=object)

Note that the axes array can be easily indexed like a two-dimensional array.

In [25]:
axes[0,1]

<matplotlib.axes._subplots.AxesSubplot at 0x2b0e4e9e4e0>

We can also indicate that subplots should share the same x- or y-axis using sharex and sharey, respectively. This is important when we want to compare data on the same scale; otherwise, matplotlib autoscales plot limits independently.

### Adjusting the spacing around subplots

By default, Matplotlib leaves a certain amount of padding between subplots. This spacing is specified relative to the height and width of the plot, but we can change it through the method __subplots_adjust__:

In [26]:
# we will make a 2 x 2 grid of subplots that share the same x- and y-axis
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.randn(500), bins=50, color='k', alpha=0.5)
plt.subplots_adjust(wspace=0, hspace=0)


## Colors, Markers, and Line Styles

The main plot function in Matplotlib (available both as plt.plot and as a method of each AxesSubplot) accepts arrays of x and y coordinates and, optionally, a string abbreviation indicating color and line style. For example, if we define a plot like this
```
ax.plot(x, y, 'g--')
```
we get a plot of x versus y drawn with green dashes. An equivalent, more explicit way is to pass the line style and the color separately
```
ax.plot(x, y, linestyle='--', color='g')
```
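
The two calls above can be compared directly. The sketch below is a minimal check, with made-up `x` and `y` data, that the string abbreviation and the explicit keyword arguments produce lines with the same color and line style. The `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data, not taken from the notebook.
x = np.arange(10)
y = x ** 2

fig, ax = plt.subplots()

# ax.plot returns a list of Line2D objects; unpack the single line from each call.
(short_form,) = ax.plot(x, y, 'g--')
(long_form,) = ax.plot(x, y + 5, linestyle='--', color='g')

styles = (short_form.get_linestyle(), long_form.get_linestyle())
```

The explicit form is usually easier to read and to build programmatically, while the string form is convenient for quick interactive work.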




In [10]:
?plt.plot

Sometimes it is necessary to use markers in order to highlight the actual data points.

In [14]:
import numpy as np

plt.plot(np.random.randn(30).cumsum(), color='k', linestyle='--', marker='o')


[<matplotlib.lines.Line2D at 0x19e900c89e8>]

In the next example we will plot the same data twice, once with the default linear interpolation between points and once with the _steps-post_ draw style, and show a legend built from each plot's label:

In [3]:
import numpy as np

figure = plt.figure()

data = np.random.randn(30).cumsum()
plt.plot(data, 'k--', label='Default')
plt.plot(data, 'k-', drawstyle='steps-post', label='steps-post')

# automatic 'best' placement is an Axes-level feature, so the legend is created through pyplot
plt.legend(loc='best')




<matplotlib.legend.Legend at 0x2ca5aa244e0>

### Ticks, Labels, and Legends

There are two main ways to apply most kinds of plot decorations: the procedural _pyplot_ interface (i.e., matplotlib.pyplot) and the object-oriented matplotlib API.

The _pyplot_ interface consists of functions like _xlim_ and _xticks_. These control the plot range and the tick locations and labels, respectively. Called with no arguments, they return the current value; called with arguments, they set it.

All of these functions act on the active or most recently created AxesSubplot.
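
A minimal sketch of this behavior using `plt.xlim`, with illustrative data and limits. It reads the current range, sets a new one, and then confirms through the object-oriented getter `ax.get_xlim` that the pyplot call acted on the active subplot. The `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(100))

limits_before = plt.xlim()  # no arguments: returns the current x range
plt.xlim(0, 10)             # with arguments: sets the x range
limits_after = plt.xlim()

# The pyplot call changed the active Axes, so the Axes getter reports the same range.
same_as_oo = ax.get_xlim() == limits_after
```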

In [4]:
fig = plt.figure()

ax = fig.add_subplot(1, 1, 1)

ax.plot(np.random.randn(1000).cumsum())


[<matplotlib.lines.Line2D at 0x2ca5ad6b470>]

To change the x-axis ticks, we can use set_xticks and set_xticklabels. The former tells matplotlib where to place the ticks along the data range; the latter lets us replace the default tick labels with any values we choose.

In [6]:
fig = plt.figure()

ax = fig.add_subplot(1, 1, 1)

ax.plot(np.random.randn(1000).cumsum())

ticks = ax.set_xticks([0, 250, 500, 750, 1000])
# The rotation option sets the x tick labels at a 30-degree rotation.
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'], rotation=30, fontsize='small')

ax.set_title('My first matplotlib plot')
ax.set_xlabel('Stages')



Text(0.5,0,'Stages')

The y-axis is modified in the same way, substituting _y_ for _x_ in the above. The axes class also has a set method that allows batch setting of plot properties, like this:
```
props = {
    'title': 'My first matplotlib plot',
    'xlabel': 'Stages'
}
ax.set(**props)
```
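
As a sketch of the y-axis variant, the example below sets tick positions and labels with `set_yticks` and `set_yticklabels`, then uses `ax.set` for the label and title. The tick positions, label texts, and title are arbitrary values chosen for illustration, and the `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.random.randn(1000).cumsum())

# Same pattern as for the x-axis, with y substituted for x.
ax.set_yticks([-40, -20, 0, 20, 40])
ax.set_yticklabels(['very low', 'low', 'zero', 'high', 'very high'], fontsize='small')

# Batch-set several properties in one call.
ax.set(ylabel='Cumulative sum', title='Y-axis variant')
```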

### Adding legends
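
A legend identifies each plotted series. The sketch below is a minimal example, assuming the common pattern of passing a `label` to each plot call and then calling `ax.legend`; the label names are illustrative. A label that starts with an underscore (here `'_nolegend_'`) is left out of the legend. The `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

# Each series gets a label; the legend is built from these labels.
ax.plot(np.random.randn(1000).cumsum(), 'k', label='one')
ax.plot(np.random.randn(1000).cumsum(), 'k--', label='two')
ax.plot(np.random.randn(1000).cumsum(), 'k.', label='_nolegend_')  # excluded from the legend

# loc='best' lets matplotlib pick the corner that overlaps the data least.
legend = ax.legend(loc='best')
legend_labels = [text.get_text() for text in legend.get_texts()]
```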

In [4]:
%load_ext rpy2.ipython

In [5]:
%R require(ggplot2)

array([0], dtype=int32)

## References

+ McKinney, W. (2012). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.