## Plotting and analysing data
In this notebook we introduce plotting data stored in arrays using the Matplotlib package.
We also plot histograms of vein thickness and spacing, and a vein log diagram.

### Matplotlib
Matplotlib is one of the most popular Python packages (or *libraries*) used for data visualization. It is a cross-platform library for making 2D plots from data in arrays. To get started you just need to make the necessary imports, prepare some data, and you can start plotting with the help of the `plot()` function. When you’re done, remember to show your plot using the `show()` function.

Matplotlib is written in Python and makes use of NumPy, the numerical mathematics extension of Python. Here we show some of its basic functionality, including line, bar, scatter, and histogram plots. The same building blocks can be combined into much more complex figures.

To get started, we import the plotting capabilities of the matplotlib library using the `import` statement. We import only the *pyplot* module of the library, and we do so in a way that lets us refer to it by the shorter name *plt* instead of typing *pyplot* every time, using `import ... as`.

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

First, we generate some dummy data to show how **Matplotlib** works.

In [None]:
x = [3, 4, 7, 7, 10]
y = [2, 8, 5, 10, 3]

To show this data as a basic line graph, we use `plot()`.

In [None]:
plt.plot(x, y)
plt.show()

Or we can make this into a scatter plot using `scatter()`.

In [None]:
plt.scatter(x, y)
plt.show()

Or we can plot a histogram using `hist()`.

In [None]:
plt.hist(x)
plt.show()
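`hist()` sorts the values into bins and counts how many values fall into each bin. It also returns those counts, which is handy for checking what the plot actually shows. A minimal sketch using the same dummy `x` list; the choice of `bins=3` is arbitrary and only for illustration.

```python
from matplotlib import pyplot as plt

x = [3, 4, 7, 7, 10]

# hist() returns the count per bin, the bin edges, and the drawn bar patches
counts, bin_edges, patches = plt.hist(x, bins=3)
plt.show()

print(counts)     # number of values in each bin
print(bin_edges)  # there is always one more edge than there are bins
```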

### How to find out how to use these functions
You may wonder what the difference is between `hist()` and `scatter()`, and which arguments each one takes. When learning to use functions and methods in a new library, or learning Python or any other programming language, it is very useful to consult the documentation to learn what a function is intended for and which arguments it needs, in which format. For example:

[Library info on scatter()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html)

[Library info on hist()](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html)

Or you can just use the `help()` function.

In [None]:
help(plt.scatter)
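`help()` prints the whole docstring, which can be long. If you only want a quick look at the argument names, Python's built-in `inspect` module can show a function's signature. A small sketch:

```python
import inspect
from matplotlib import pyplot as plt

# Print only the call signature of scatter(): argument names and defaults
print(inspect.signature(plt.scatter))

# The same information as a list of parameter names
params = list(inspect.signature(plt.scatter).parameters)
print(params[:4])
```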

Let's now label our axes and give the plot a title using the functions `title()`, `xlabel()`, and `ylabel()`.

In [None]:
plt.scatter(x, y)
plt.title("Some Random Values")
plt.xlabel("X axis (m)")
plt.ylabel("Y axis (m)")
plt.show()

A simple way of showing several graphs is to plot them one after the other, so that they appear underneath each other.

In [None]:
plt.scatter(x, y)
plt.title("Scatter plot")
plt.xlabel("X axis (m)")
plt.ylabel("Y axis (m)")
plt.show()
plt.bar(y,x)
plt.title("Bar plot")
plt.xlabel("X axis (m)")
plt.ylabel("Y axis (m)")
plt.show()

What happens when you comment out the first `plt.show()`? You can comment out a line by putting a hash symbol `#` at its start.
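Pyplot keeps drawing into the *current* figure until that figure is shown or a new one is started, which is why the first `plt.show()` matters. A minimal sketch of an alternative: call `plt.figure()` to start a fresh figure explicitly before each plot, so each plot gets its own axes regardless of where `plt.show()` is called.

```python
from matplotlib import pyplot as plt

x = [3, 4, 7, 7, 10]
y = [2, 8, 5, 10, 3]

# plt.figure() starts a new, empty figure; later pyplot calls draw into it
fig1 = plt.figure()
plt.scatter(x, y)
plt.title("Scatter plot")

fig2 = plt.figure()
plt.bar(y, x)
plt.title("Bar plot")

plt.show()  # shows both figures, each with its own axes
```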

### Creating more complex figures
At its core, Matplotlib is *object-oriented*. For more control and customization of your plots, you will often create a Figure (*object*) and one or more Axes (*objects*) using `pyplot.subplots()`, and then modify these individual *objects*.
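In the object-oriented style, the pyplot functions used above have equivalent *methods* on the Axes object, usually with a `set_` prefix: `plt.title()` becomes `ax.set_title()`, and `plt.xlabel()` becomes `ax.set_xlabel()`. A sketch of the earlier labelled scatter plot written this way:

```python
from matplotlib import pyplot as plt

x = [3, 4, 7, 7, 10]
y = [2, 8, 5, 10, 3]

# subplots() with no arguments returns one Figure and one Axes
fig, ax = plt.subplots()
ax.scatter(x, y)
# Axes methods replace the pyplot title(), xlabel() and ylabel() functions
ax.set_title("Some Random Values")
ax.set_xlabel("X axis (m)")
ax.set_ylabel("Y axis (m)")
plt.show()
```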



#### subplots
To create simple grids of plots, *Matplotlib* has several convenience routines that make rows and columns of subplots easy to create. The most common one is `subplots()`: called without arguments, it returns a Figure and a single Axes; called with a number of rows and columns, it returns a Figure and an array of Axes. Using this approach we can combine different graphs, each with its own axes, in one figure. Setting up figures with `subplots()` is standard practice, even for simple figures.
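What `subplots()` hands back as `ax` depends on the rows and columns you ask for: a single Axes object, a one-dimensional array for a single row or column, or a two-dimensional array for a grid. A small sketch that inspects these (the variable names `axs_row` and `axs_grid` are just illustrative):

```python
from matplotlib import pyplot as plt

fig, ax = plt.subplots()            # a single Axes object, not an array
print(isinstance(ax, plt.Axes))

fig, axs_row = plt.subplots(1, 3)   # one row: 1D array of 3 Axes
print(axs_row.shape)

fig, axs_grid = plt.subplots(2, 2)  # grid: 2D array, indexed [row, column]
print(axs_grid.shape)

plt.close("all")  # we only inspected the objects, so discard the empty figures
```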

Let's create an empty two-by-two grid of plots.

In [None]:
fig, ax = plt.subplots(2,2)

This single call has created a **fig** *object* and an associated two-by-two array of **ax** *objects*. We can refer to each object in the **ax** array in the same way as to the elements of any two-dimensional array, using `[row, column]` indices that start at 0.

In [None]:
fig, ax = plt.subplots(2,2)
ax[0,0].scatter(y,x)
ax[0,1].plot(x,y)
ax[1,0].bar(y,x)
ax[1,1].scatter(y,x)
plt.show()
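When every panel needs the same treatment, such as a title or axis label, it can be shorter to loop over the Axes than to index each one by hand. A sketch using NumPy's `.flat` iterator on the Axes array; the panel titles are placeholders for illustration only.

```python
from matplotlib import pyplot as plt

x = [3, 4, 7, 7, 10]
y = [2, 8, 5, 10, 3]

fig, ax = plt.subplots(2, 2)
ax[0, 0].scatter(y, x)
ax[0, 1].plot(x, y)
ax[1, 0].bar(y, x)
ax[1, 1].scatter(y, x)

# ax.flat visits the panels row by row: [0,0], [0,1], [1,0], [1,1]
for i, panel in enumerate(ax.flat):
    panel.set_title(f"Panel {i + 1}")
    panel.set_xlabel("X axis (m)")

fig.tight_layout()  # keep titles and labels from overlapping
plt.show()
```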

### Using the numpy library
Let's now apply some mathematical functions to our data inside these same plotting statements. The `numpy` Python library is frequently used to *"add support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays"*.

As with *matplotlib*, we import `numpy` using an `import` statement and give it a shorter name using `as`.

In [None]:
import numpy as np
fig, ax = plt.subplots(2,2)
ax[0,0].scatter(np.negative(y),np.negative(x))
ax[0,1].plot(x,y)
ax[1,0].bar(np.sqrt(y),x)
ax[1,1].scatter(y,np.negative(y))
plt.show()
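The reason to reach for NumPy is that its functions, and arithmetic on arrays, act on every element at once. Plain Python lists do not behave like this: `*` repeats a list rather than multiplying its values. A short comparison:

```python
import numpy as np

y = [2, 8, 5, 10, 3]

print(y * 2)           # list: repeated, [2, 8, 5, 10, 3, 2, 8, 5, 10, 3]
y_arr = np.array(y)
print(y_arr * 2)       # array: each element doubled
print(np.negative(y))  # NumPy functions accept lists and return arrays
print(np.sqrt(y_arr))  # element-wise square root
```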

## Plotting the veins dataset
Now onto our geological dataset. First we import our data using Pandas and put the data in a dataframe.

In [None]:
import pandas as pd
dataframe = pd.read_csv('Vein_dataset_large.csv')
dataframe.head()
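The dataset already contains a `Spacing` column. If you want to derive spacing from the vein positions yourself, one option is the pandas `diff()` method. This is a minimal sketch under an assumption: spacing is taken as the distance from each vein's `Position` to the previous one along the transect, which may differ from how `Spacing` is defined in the dataset. The values are made up for illustration, and `Spacing_calc` is a hypothetical column name chosen so the original column is not overwritten.

```python
import io
import pandas as pd

# Made-up values for illustration only; the real data is in Vein_dataset_large.csv
csv_text = """Position,Thickness
5.0,1.2
12.5,0.4
20.0,2.0
41.0,0.8
"""
veins = pd.read_csv(io.StringIO(csv_text))

# Assumption: spacing = distance from each vein to the previous one along the transect
veins = veins.sort_values("Position")
veins["Spacing_calc"] = veins["Position"].diff()  # first vein has no predecessor: NaN
print(veins)
```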

### Plotting histograms
We now plot histograms of vein thickness and vein spacing, and label each plot. We also import *pyplot* from the matplotlib library as `plt` again, so that this cell works on its own.

In [None]:
from matplotlib import pyplot as plt
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
# Use the same number of bins for both histograms
num_bins = 15
ax[0].hist(dataframe["Thickness"], bins=num_bins)
ax[1].hist(dataframe["Spacing"], bins=num_bins)
ax[0].set_ylabel("Amount (n)")
ax[1].set_ylabel("Amount (n)")
ax[0].set_xlabel("Thickness (mm)")
ax[1].set_xlabel("Spacing (mm)")
fig.suptitle('Vein thickness and vein spacing histograms', fontsize=16)
plt.show()
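`hist()` computes its bins with NumPy, so `np.histogram()` gives the same counts as numbers instead of a plot, which is useful for checking which thickness range is most common. A sketch with made-up thickness values, for illustration only:

```python
import numpy as np

# Made-up thickness values (mm), for illustration only
thickness = np.array([0.4, 0.8, 1.2, 2.0, 0.5, 0.9, 3.1, 0.7])

num_bins = 4
# Equal-width bins between the smallest and largest value; returns counts and edges
counts, edges = np.histogram(thickness, bins=num_bins)
for n, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"{lo:.2f}-{hi:.2f} mm: {n} veins")
```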

## Produce vein logs
Here we reproduce a vein log, which shows each vein as a bar at its position along the transect (in mm). The height of each bar is the vein thickness (in mm), and its width is also scaled by the thickness, exaggerated by a factor of three. To draw it, we use the `bar()` *method* on the **ax** *object* we create.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(10, 3))
# Bars sit at each vein's position; height = thickness, width = 3 x thickness, in black
ax.bar(dataframe["Position"], dataframe["Thickness"], width=3*dataframe["Thickness"], linewidth=0, color='k')
# Label the axes
ax.set_xlabel('Vein Distribution Transect Line (mm)', fontsize='large')
ax.set_ylabel('Thickness (mm)', fontsize='large')
# Limit the axes to the extent of the data, so they adapt to the size of the dataset
ax.set_ylim([0, dataframe["Thickness"].max()])
ax.set_xlim([0, dataframe["Position"].max()])
# Reduce the padding around the plot
fig.tight_layout(pad=1.05)
plt.show()
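If you want to draw vein logs for several transects, the cell above can be wrapped in a function. This is a sketch only: `plot_vein_log` is a hypothetical helper name, the `exaggeration` parameter simply replaces the fixed factor of three, and the data are made up for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt

def plot_vein_log(positions, thickness, exaggeration=3):
    """Hypothetical helper: draw a vein log as black bars along a transect.

    Bar height is the vein thickness; bar width is the thickness multiplied
    by `exaggeration`.
    """
    positions = np.asarray(positions, dtype=float)
    thickness = np.asarray(thickness, dtype=float)
    fig, ax = plt.subplots(figsize=(10, 3))
    ax.bar(positions, thickness, width=exaggeration * thickness,
           linewidth=0, color="k")
    ax.set_xlabel("Vein Distribution Transect Line (mm)", fontsize="large")
    ax.set_ylabel("Thickness (mm)", fontsize="large")
    ax.set_xlim(0, positions.max())
    ax.set_ylim(0, thickness.max())
    fig.tight_layout(pad=1.05)
    return fig, ax

# Made-up values for illustration only
fig, ax = plot_vein_log([5.0, 12.5, 20.0, 41.0], [1.2, 0.4, 2.0, 0.8])
plt.show()
```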

## Next notebook

In the next notebook we will analyse and plot the data within the datafile we just read in.

[Click here to open the next notebook](7_Calculating_veins.ipynb)