# Working with Visualization - Matplotlib

<h2> Matplotlib </h2>
<a id = "Matplotlib"> </a>

<p>Matplotlib is a very powerful plotting library useful for those working with Python, Pandas, and NumPy. The most used module of Matplotlib is Pyplot, which provides a convenient interface to the Matplotlib object-oriented plotting library. It is modeled closely after Matlab(TM), so the majority of plotting commands in Pyplot have Matlab(TM) analogs with similar arguments. Important commands are explained with interactive examples.</p>

<p> You can find more documentation and examples about Matplotlib on <a href="https://matplotlib.org/index.html"> Matplotlib.org </a> </p>


## General Concepts

### Types of inputs to plotting functions
All plotting functions expect <b>numpy.array</b> or <b>list</b> objects as input. Classes that are 'array-like', such as <b>pandas</b> data objects and <b>numpy.matrix</b>, may or may not work as intended. It is best to convert these to numpy.array objects prior to plotting.
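
As a minimal sketch of that conversion, the cell below turns a <b>numpy.matrix</b> into a plain one-dimensional array with np.asarray() before plotting. The sample values are made up for illustration; for a pandas object, the to_numpy() method plays the same role.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data only: a 1 x 5 numpy.matrix (a matrix is always two-dimensional)
m = np.matrix([[1, 4, 9, 16, 25]])

# np.asarray() gives a plain ndarray; ravel() flattens it to one dimension
y = np.asarray(m).ravel()
x = np.arange(len(y))

plt.plot(x, y)
plt.title("Plot from a converted numpy.matrix")
plt.show()
```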

## Getting Started with Pyplot
Pyplot is a module of Matplotlib which provides simple functions to add plot elements like lines, images, text, etc. to the current axes in the current figure.
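
The idea of a "current figure" and "current axes" can be checked directly. The following sketch, which uses only the Pyplot functions gcf() and gca(), shows that plot() draws on whichever axes are current at the time of the call.

```python
import matplotlib.pyplot as plt

fig = plt.figure()          # becomes the current figure
ax = plt.gca()              # current axes (created on demand if none exist yet)

line, = plt.plot([0, 1, 2], [0, 1, 4])   # drawn on the current axes

print(plt.gcf() is fig)     # is the figure we created the current one?
print(line.axes is ax)      # did plot() draw on the current axes?
plt.show()
```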

### Make a simple plot
Here we import Matplotlib’s Pyplot module and the NumPy library, since most of the data that we will be working with will be in the form of arrays.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Displays graphic output inline in Jupyter Notebook
%matplotlib inline

# %matplotlib is a magic command which performs the necessary behind-the-scenes setup for IPython
# to work correctly hand-in-hand with matplotlib
# It does not execute any Python import commands, that is, no names are added to the namespace.

In [None]:
x = np.linspace(0, 2, 100)

plt.plot(x, x, label='linear')  # Plot some data on the (implicit) axes.
plt.plot(x, x**2, label='quadratic')  # Plot more data on the axes...
plt.plot(x, x**3, label='cubic') # Plot more data on the axes...
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.title("Simple Plot") # Add a title to the axes.
plt.legend() # Add a legend.
plt.show()  # Show the plot

We pass two arrays as our input arguments to Pyplot’s <i>plot()</i> method and call the <i>show()</i> method to display the plot. Note that the first array is plotted on the x-axis and the second array on the y-axis. The title and the names of the x-axis and y-axis are set with the methods <i>title()</i>, <i>xlabel()</i> and <i>ylabel()</i> respectively, and <i>legend()</i> builds a legend from the <i>label</i> given to each <i>plot()</i> call.

We can also specify the size of the figure using the method <i>figure()</i>, passing a tuple of the form (width, height), in inches, to the argument <i>figsize</i>.

In [None]:
x = np.linspace(0, 2, 100)

plt.figure(figsize = (15,10))  # Set Figure size 
plt.plot(x, x, label='linear')  # Plot some data on the (implicit) axes.
plt.plot(x, x**2, label='quadratic')  # Plot more data on the axes...
plt.plot(x, x**3, label='cubic') # Plot more data on the axes...
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.title("Simple Plot") # Add a title to the axes.
plt.legend() # Add a legend.
plt.show()  # Show the plot
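
To confirm how <i>figsize</i> is interpreted, the short sketch below reads the size back from the figure. It relies only on the figure's get_size_inches() method and its dpi attribute; the pixel size depends on the dpi configured on your system.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(15, 10))   # width = 15 inches, height = 10 inches
width, height = fig.get_size_inches()
print(width, height)

# Size in pixels when rendered = size in inches * dots per inch
print(width * fig.dpi, height * fig.dpi)
plt.close(fig)                       # nothing to draw, so discard the figure
```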

In [None]:
# Example to plot csv-format data 
def placeRecordsIntoList(fileName):
    ## Read every record of a CSV file into a list of lists
    with open(fileName, 'r') as infile:
        listOfRecords = [line.rstrip().split(',') for line in infile]
    for record in listOfRecords:
        record[2] = float(record[2])  # population
        record[3] = float(record[3])  # area
    return listOfRecords

def extractField(fileName, n):
    ## Extract the nth field (counting from 1) from each record of a CSV file
    ## and place the data into a list
    with open(fileName, 'r') as infile:
        r_list = [line.rstrip().split(',')[n-1] for line in infile]

    # Convert to numbers when the first value looks numeric
    if r_list[0][-1].isdigit():
        return [float(x) for x in r_list]
    else:
        return r_list

## Extract the population and area of each country
population = extractField("UN.txt", 3)  # Get population
area = extractField("UN.txt", 4)  # Get area 

# Plot population vs area
plt.figure(figsize = (15,10))  # Set Figure size 
plt.plot(area, population, 'go', label='population vs area')  # Plot some data on the (implicit) axes.
plt.xlabel('Area') # Add an x-label to the axes.
plt.ylabel('Population') # Add a y-label to the axes.
plt.title("Population vs Area for UN countries") # Add a title to the axes.
plt.legend() # Add a legend.
plt.show()  # Show the plot
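
As an alternative sketch, Python's standard csv module can do the splitting, which also copes with quoted fields that contain commas. The function name extract_field and the two sample records are hypothetical; they only assume the layout implied by the calls above (name first, population third, area fourth).

```python
import csv
import io

# Hypothetical sample standing in for a file such as UN.txt
sample = io.StringIO(
    "Country A,Region X,10.5,1000\n"
    "Country B,Region Y,2.25,500\n"
)

def extract_field(lines, n, convert=str):
    """Return the n-th field (counting from 1) of every CSV record."""
    return [convert(record[n - 1]) for record in csv.reader(lines)]

names = extract_field(sample, 1)
sample.seek(0)                      # rewind before reading the data again
populations = extract_field(sample, 3, float)
print(names, populations)
```

With a real file, the same function can be called on the object returned by open() inside a with block.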


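With every pair of X and Y arguments, you can also pass an optional third argument, a format string that sets the colour, marker and line type of the plot. For example, <b>b-</b> means a solid blue line; when no format string is given, plot() draws a solid line in the next colour of the default colour cycle. In the population plot above we used <b>go</b>, which means green circles. Likewise, many such combinations are possible: in the example below, <b>v-</b> draws triangle markers joined by a solid line and <b>--</b> draws a dashed line.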

<p> You can find more documentation and examples about color and line types on <a href="https://matplotlib.org/2.1.1/api/_as_gen/matplotlib.pyplot.plot.html"> Color and Line Types</a> </p>

In [None]:
x = np.linspace(0, 2, 100)

plt.figure(figsize = (15,10))  # Set Figure size 
plt.plot(x, x, 'v-', label='linear')  # Plot some data on the (implicit) axes.
plt.plot(x, x**2, 'go', label='quadratic')  # Plot more data on the axes...
plt.plot(x, x**3, '--', label='cubic') # Plot more data on the axes...
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.title("Simple Plot") # Add a title to the axes.
plt.legend() # Add a legend.
plt.show()  # Show the plot
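
A format string is shorthand for separate style settings. The sketch below draws one line with the format string <b>go</b> and another with the equivalent keyword arguments, then prints the resulting properties of both so they can be compared.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 10)

# Format string: colour 'g', marker 'o', no connecting line
short, = plt.plot(x, x**2, 'go')

# The same style written out with keyword arguments
verbose, = plt.plot(x, x**3, color='g', marker='o', linestyle='None')

print(short.get_color(), short.get_marker(), short.get_linestyle())
print(verbose.get_color(), verbose.get_marker(), verbose.get_linestyle())
plt.show()
```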

## Multiple plots in one figure
We can use the <i>subplot()</i> method to add more than one plot to a figure. In the example below, we use this method to separate the graphs that we plotted on the same axes in the previous example. The <i>subplot()</i> method takes three arguments: <i>nrows</i>, <i>ncols</i> and <i>index</i>, which give the number of rows, the number of columns and the index number of the sub-plot. For instance, to create two sub-plots side by side in one row and two columns, we pass the arguments <i>(1,2,1)</i> and <i>(1,2,2)</i> to subplot(). Note that we call <i>title()</i> separately for each sub-plot, and use <i>suptitle()</i> to add a centered title for the whole figure.

<b>subplot(nrows, ncols, index)</b> - Three integers (nrows, ncols, index). The subplot will take the index position on a grid with nrows rows and ncols columns. index starts at 1 in the upper left corner and increases to the right. index can also be a two-tuple specifying the (first, last) indices (1-based, and including last) of the subplot, e.g., fig.add_subplot(3, 1, (1, 2)) makes a subplot that spans the upper 2/3 of the figure.

In [None]:
x = np.linspace(0, 2, 100)

plt.figure(figsize = (10,5))  # Set Figure size 

plt.subplot(1,2,1)
plt.plot(x, x, 'v-', label='linear')  # Plot some data on the (implicit) axes.
plt.title("1st subplot")
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.legend() # Add a legend.

plt.subplot(1,2,2)
plt.plot(x, x**2, 'go', label='quadratic')  # Plot more data on the axes...
plt.plot(x, x**3, '--', label='cubic') # Plot more data on the axes...
plt.title("2nd subplot")
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.legend() # Add a legend.

plt.suptitle("My Sub-Plot") # Add a centered title to the whole figure.

plt.show()  # Show the plot
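
The two-tuple form of <i>index</i> described above can be sketched as follows: on a grid of three rows and one column, the first sub-plot spans cells 1 and 2 and the second occupies cell 3. The titles and figure size are arbitrary choices for the illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 100)
fig = plt.figure(figsize=(8, 8))

top = fig.add_subplot(3, 1, (1, 2))   # cells 1 and 2 of a 3 x 1 grid
top.plot(x, x**2, 'go', label='quadratic')
top.set_title("Spans two rows")
top.legend()

bottom = fig.add_subplot(3, 1, 3)     # cell 3 only
bottom.plot(x, x, 'v-', label='linear')
bottom.set_title("One row")
bottom.legend()

fig.tight_layout()                    # keep titles and tick labels from overlapping
plt.show()
```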

If we want our sub-plots in two rows and a single column, we can pass the arguments <i>(2,1,1)</i> and <i>(2,1,2)</i>.

In [None]:
x = np.linspace(0, 2, 100)

plt.figure(figsize = (10,10))  # Set Figure size 

plt.subplot(2,1,1)
plt.plot(x, x, 'v-', label='linear')  # Plot some data on the (implicit) axes.
plt.title("1st subplot")
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.legend() # Add a legend.

plt.subplot(2,1,2)
plt.plot(x, x**2, 'go', label='quadratic')  # Plot more data on the axes...
plt.plot(x, x**3, '--', label='cubic') # Plot more data on the axes...
plt.title("2nd subplot")
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.legend() # Add a legend.

plt.suptitle("My Sub-Plot") # Add a centered title to the whole figure.

plt.show()  # Show the plot

In [None]:
x = np.linspace(0, 2, 100)

plt.figure(figsize = (10,5))  # Set Figure size 

plt.subplot(2,2,1)
plt.plot(x, x, 'v-', label='linear')  # Plot some data on the (implicit) axes.
plt.title("1st subplot")
plt.ylabel('y label') # Add a y-label to the axes.
plt.legend() # Add a legend.

plt.subplot(2,2,2)
plt.plot(x, x**2, 'go', label='quadratic')  # Plot more data on the axes...
plt.title("2nd subplot")
plt.ylabel('y label') # Add a y-label to the axes.
plt.legend() # Add a legend.

plt.subplot(2,2,3)
plt.plot(x, x**3, '--', label='cubic') # Plot more data on the axes...
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.legend() # Add a legend.

plt.subplot(2,2,4)
plt.plot(x, x**4, '--', label='4th') # Plot more data on the axes...
plt.xlabel('x label') # Add an x-label to the axes.
plt.ylabel('y label') # Add a y-label to the axes.
plt.legend() # Add a legend.

plt.suptitle("My Sub-Plot") # Add a centered title to the whole figure.

plt.show()  # Show the plot
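
When a grid has many cells, repeating subplot() for each one becomes verbose. As a sketch of an alternative, Pyplot's subplots() function creates the figure and the whole grid of axes in one call, and the axes can then be filled in a loop. The list named powers is just a helper for this illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2, 100)
powers = [1, 2, 3, 4]

fig, axes = plt.subplots(2, 2, figsize=(10, 5))   # axes is a 2 x 2 array

# axes.flat walks the grid row by row, in the same order as the subplot index
for ax, p in zip(axes.flat, powers):
    ax.plot(x, x**p, label=f'x**{p}')
    ax.set_ylabel('y label')
    ax.legend()

fig.suptitle("My Sub-Plot")   # centered title for the whole figure
plt.show()
```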


## Creating Different Types of Graphs with Pyplot

### 1) Histogram
Histograms are a very common type of plot for data that are continuous in nature, such as height and weight, stock prices, or customer waiting times. A histogram divides the range of the data into intervals (bins) and plots the number of values that fall into each bin. Histograms occur frequently in probability and statistics and are the usual way to visualize distributions such as the normal distribution and the t-distribution. In the following example, we generate 1000 random values and plot their frequencies with the data divided into 10 equal-width bins. We use NumPy’s random.randn() function, which draws samples from a standard normal distribution (mean = 0 and standard deviation = 1), so the histogram approximates the bell shape of a normal curve.

In [None]:
# Plot Histogram
x = np.random.randn(1000)

plt.title('Histogram')
plt.xlabel('Random Data') # Add an x-label to the axes.
plt.ylabel('Frequency') # Add a y-label to the axes.
plt.hist(x, 10)  # 10 is the number of bins
plt.show()
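
Besides drawing the bars, hist() returns the numbers behind them. The sketch below captures the bin counts and bin edges and checks the properties described above; the seeded generator is used only so that the sketch is repeatable.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)       # seeded only to make the sketch repeatable
data = rng.standard_normal(1000)

counts, edges, patches = plt.hist(data, 10)

print(len(counts), len(edges))       # 10 bins are delimited by 11 edges
print(counts.sum())                  # every sample falls in exactly one bin
print(np.allclose(np.diff(edges), edges[1] - edges[0]))  # are all bins equally wide?
plt.show()
```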

In [None]:
# Population Histogram
plt.title('Population Histogram')
plt.xlabel('Population') # Add an x-label to the axes.
plt.ylabel('Frequency') # Add a y-label to the axes.
plt.hist(population, 100)  # 100 is the number of bins
plt.show()

In [None]:
# Area Histogram
plt.title('Area Histogram')
plt.xlabel('Area') # Add an x-label to the axes.
plt.ylabel('Frequency') # Add a y-label to the axes.
plt.hist(area, 50)  # 50 is the number of bins
plt.show()

In [None]:
# Plot Histogram
x = np.random.randn(1000)

plt.title('Histogram')
plt.xlabel('Random Data') # Add an x-label to the axes.
plt.ylabel('Frequency') # Add a y-label to the axes.
plt.hist(x, 100)  # 100 is the number of bins
plt.show()

### 2) Scatter Plots 
Scatter plots are widely used and are especially handy for visualizing a regression problem. In the following example, we plot arbitrarily created height and weight data against each other. We use the xlim() and ylim() methods to set the limits of the X-axis and Y-axis respectively; any point that lies outside these limits is not drawn.

In [None]:
# Scatter Plot
height = np.array([167,170,149,165,155,180,166,146,159,185,145,168,172,181,169])
weight = np.array([86,74,66,78,68,79,90,73,76,8,66,84,67,84,77])

plt.xlim(140,200)
plt.ylim(60,100)
plt.scatter(height, weight)  # Scatter Plot 
plt.title("Scatter Plot")
plt.xlabel('Height') # Add an x-label to the axes.
plt.ylabel('Weight') # Add a y-label to the axes.
plt.show()
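
To connect the scatter plot with regression, the sketch below overlays a straight line fitted by least squares using NumPy's polyfit(). The measurements are made up for this illustration and are not the data from the example above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up illustrative measurements (not taken from the example above)
height = np.array([150, 155, 160, 165, 170, 175, 180])
weight = np.array([55, 58, 63, 64, 70, 72, 78])

slope, intercept = np.polyfit(height, weight, 1)   # degree-1 least-squares fit

plt.scatter(height, weight, label='data')
plt.plot(height, slope * height + intercept, 'r-', label='least-squares line')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.legend()
plt.show()
```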

### 3) Pie Charts
One more basic type of chart is the pie chart, which can be made using the method <i>pie()</i>. We can also pass in arguments to customize the pie chart: show a shadow, explode (offset) one of its slices, or rotate it by a start angle, as follows:

In [None]:
# Pie Chart
firms = ['Firm A','Firm B','Firm C','Firm D','Firm E']
marketShare = [25,25,15,10,20]
Explode = [0,0.1,0,0,0]   # only "explode" the 2nd slice (i.e. 'Firm B')

plt.figure(figsize = (10,10))  # Set Figure size
plt.pie(marketShare,      # data
        explode=Explode,  # only "explode" the 2nd slice (i.e. 'Firm B')
        labels=firms,     # labels
        shadow=True,      # drop-shadow
        startangle=45)    # rotate the start of the pie 45 degrees counter-clockwise from the x-axis
plt.title("Pie Chart")
plt.axis('equal')  # Equal aspect ratio so that the pie is drawn as a circle.
plt.legend(title='List of Firms') # Add a legend with a title.
plt.show()
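
It is often useful to print the share of each slice on the chart. As a sketch, the <i>autopct</i> argument of pie() takes a format string for the percentages. Note that pie() normalizes the data by its sum, so the values above (which add up to 95) are shown as fractions of 95.

```python
import matplotlib.pyplot as plt

firms = ['Firm A', 'Firm B', 'Firm C', 'Firm D', 'Firm E']
marketShare = [25, 25, 15, 10, 20]

# With autopct, pie() also returns the percentage labels as a third list
patches, texts, autotexts = plt.pie(marketShare, labels=firms, autopct='%1.1f%%')

# Percentages are relative to the sum of the data (95 here, not 100)
print([t.get_text() for t in autotexts])
plt.axis('equal')   # draw the pie as a circle
plt.show()
```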

### 4) Bar Graphs
Bar graphs are one of the most common types of graphs and are used to show data associated with categorical variables. Pyplot provides the method <i>bar()</i> to make bar graphs; it takes the categories, their values, and optionally a color as arguments.

In [None]:
# Bar Graph
x = ['Nuclear', 'Hydro', 'Gas', 'Oil', 'Coal', 'Biofuel']
energy = [5, 6, 15, 22, 24, 8]

plt.figure(figsize = (10,5))  # Set Figure size
plt.bar(x, energy, color='green')
plt.xlabel("Energy Source")
plt.ylabel("Energy Output (GJ)")
plt.title("Energy output from various fuel sources")

plt.show()
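
The bar() method returns a container holding one rectangle per bar, which makes it possible to read the heights back or annotate them. The sketch below assumes Matplotlib 3.4 or newer, where the bar_label() function is available.

```python
import matplotlib.pyplot as plt

x = ['Nuclear', 'Hydro', 'Gas', 'Oil', 'Coal', 'Biofuel']
energy = [5, 6, 15, 22, 24, 8]

bars = plt.bar(x, energy, color='green')   # bar() returns a container of rectangles
labels = plt.bar_label(bars)               # write each bar's height above it (Matplotlib >= 3.4)

print([bar.get_height() for bar in bars])
plt.ylabel("Energy Output (GJ)")
plt.show()
```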

In [None]:
# Bar Graph for Country Area

# Get list of countries
country = extractField("UN.txt", 1)  # Get country

plt.figure(figsize = (20,5))  # Set Figure size
plt.bar(country[0:10], area[0:10],  color='green')
plt.xlabel("Country")
plt.ylabel("Area")
plt.title("Area of the First 10 Countries")

plt.show()

In [None]:
# Bar Graph - Error Bar
x = ['Nuclear', 'Hydro', 'Gas', 'Oil', 'Coal', 'Biofuel']
energy = [5, 6, 15, 22, 24, 8]
variance = [1, 2, 7, 4, 2, 3]

plt.figure(figsize = (10,5))  # Set Figure size
plt.bar(x, energy, color='green',  yerr=variance)
plt.xlabel("Energy Source")
plt.ylabel("Energy Output (GJ)")
plt.title("Energy output from various fuel sources")

plt.show()

In [None]:
# Horizontal Bar Graph - Error Bar
x = ['Nuclear', 'Hydro', 'Gas', 'Oil', 'Coal', 'Biofuel']
energy = [5, 6, 15, 22, 24, 8]
variance = [1, 2, 7, 4, 2, 3]

plt.figure(figsize = (10,5))  # Set Figure size
plt.barh(x, energy, color='green',  xerr=variance)
plt.xlabel("Energy Output (GJ)")  # values run along the x-axis in a horizontal bar graph
plt.ylabel("Energy Source")       # categories are listed on the y-axis
plt.title("Energy output from various fuel sources")

plt.show()

### Bar Chart with Multiple X’s
To place bars side by side (a grouped bar chart), we call the bar() method twice, passing the position (index) and the width of the bars, and shift the second set of bars to the right by one bar width. Also notice two other methods: legend(), which shows the legend of the graph, and xticks(), which labels the x-axis at positions centered between each pair of bars.

In [None]:
# Bar Graph
x = ['Nuclear', 'Hydro', 'Gas', 'Oil', 'Coal', 'Biofuel']
energy2019 = [5, 6, 15, 22, 24, 8]
energy2018 = [4, 5, 11, 24, 28, 6]

index = np.arange(6)
width = 0.35

plt.figure(figsize = (10,5))  # Set Figure size
plt.bar(index, energy2018, width, color='green', label='2018')
plt.bar(index + width, energy2019, width, color='blue', label='2019')

plt.title("Energy output from various fuel sources")
plt.xlabel("Energy Source")
plt.ylabel("Energy Output (GJ)")
plt.xticks(index + width/2, x)
plt.legend(loc='best')

plt.show()
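
The same positioning arithmetic extends to any number of series. In the sketch below, the dictionary named series and the choice of sharing 80% of each slot among the bars are assumptions made for the illustration; the tick positions are placed midway between the first and last bar of each group.

```python
import numpy as np
import matplotlib.pyplot as plt

x = ['Nuclear', 'Hydro', 'Gas', 'Oil', 'Coal', 'Biofuel']
series = {'2018': [4, 5, 11, 24, 28, 6],
          '2019': [5, 6, 15, 22, 24, 8]}

index = np.arange(len(x))
width = 0.8 / len(series)          # share 80% of each slot among the series

# Shift each series to the right by one bar width per position in the group
for i, (label, values) in enumerate(series.items()):
    plt.bar(index + i * width, values, width, label=label)

# Centre of each group: midway between the first and last bar centres
centres = index + width * (len(series) - 1) / 2
plt.xticks(centres, x)
plt.legend(loc='best')
plt.show()
```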

### Stacked Bar Charts
Similarly, to stack the bars vertically, we pass the argument bottom with the values of the series that should sit underneath, so that each new bar starts where the lower one ends.

In [None]:
# Bar Graph
x = ['Nuclear', 'Hydro', 'Gas', 'Oil', 'Coal', 'Biofuel']
energy2019 = [5, 6, 15, 22, 24, 8]
energy2018 = [4, 5, 11, 24, 28, 6]

index = np.arange(6)
width = 0.35

plt.figure(figsize = (10,5))  # Set Figure size
plt.bar(index, energy2018, width, color='green', label='2018')
plt.bar(index, energy2019, width, color='blue', label='2019', bottom=energy2018)

plt.title("Energy output from various fuel sources")
plt.xlabel("Energy Source")
plt.ylabel("Energy Output (GJ)")
plt.xticks(index, x)
plt.legend(loc='best')

plt.show()
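
With more than two series, each layer must start at the combined height of all the layers below it. One way to sketch this is to keep a running total in an array (named bottom here, an illustrative choice) and update it after every call to bar().

```python
import numpy as np
import matplotlib.pyplot as plt

x = ['Nuclear', 'Hydro', 'Gas', 'Oil', 'Coal', 'Biofuel']
series = {'2018': [4, 5, 11, 24, 28, 6],
          '2019': [5, 6, 15, 22, 24, 8]}

bottom = np.zeros(len(x))          # running height of each stack so far
for label, values in series.items():
    plt.bar(x, values, 0.35, bottom=bottom, label=label)
    bottom += np.array(values)     # the next series starts where this one ends

plt.ylabel("Energy Output (GJ)")
plt.legend(loc='best')
plt.show()
```

After the loop, bottom holds the total height of each stack, which can be handy for setting the y-axis limit.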