# Visualising data

> _"[...] the most important part of [...] research [is] being able to successfully communicate [...] results to clinicians"_
>
> -- Dr. Matthias Stahl<sup>1</sup>

This chapter will be your first taste of the enormous "ecosystem" of third-party Python libraries. These live outside the standard library that comes packaged with Python itself. We will start by introducing how to plot in general, then move on to creating plots for the climate data.

`matplotlib` is one of several Python plotting libraries, but it is possibly the most commonly used. You already installed it when you began the course two weeks ago, so you don't need to install it now.

When using `matplotlib` in Jupyter notebooks, the following line makes figures appear in the notebook output, so make sure you execute this cell:

In [None]:
%matplotlib widget

`matplotlib.pyplot` is a collection of command-style functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: for example, it creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, or decorates the plot with labels. `matplotlib.pyplot` preserves state across function calls. It keeps track of things like the current figure and plotting area, and directs plotting functions to the current plotting area.

First, of course, we have to import the library.

In [None]:
import matplotlib.pyplot as plt

## 1. Creating plots

In [None]:
plt.figure()
plt.plot([1, 2, 3, 2.5])
plt.ylabel('some numbers')

`plot()` is a versatile function that accepts a variable number of arguments. For example, to plot x versus y, pass both lists:

In [None]:
# list ranging from 1 to 9
x_list = list(range(1, 10))
# list with the squares of the values in x_list
y_list = [1, 4, 9, 16, 25, 36, 49, 64, 81]

In [None]:
plt.figure()
plt.plot(x_list, y_list)
plt.title("Title of the plot")
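As a small illustration of those extra arguments, `plot()` also accepts an optional format string after the x and y data. It combines colour, marker and line style in one short code, so `'ro--'` below means red circle markers joined by a dashed line. The squares are recomputed here so the cell runs on its own.

```python
import matplotlib.pyplot as plt

x_values = list(range(1, 10))
y_values = [x ** 2 for x in x_values]  # the squares 1, 4, ..., 81

plt.figure()
# 'r' = red, 'o' = circle markers, '--' = dashed line
lines = plt.plot(x_values, y_values, 'ro--')
plt.title("Same data, drawn with a format string")
```

`plt.plot()` returns a list of the line objects it drew, which is handy if you want to inspect or change them afterwards.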

Using the pyplot interface, you build a graph by calling a sequence of functions and all of them are applied to the *current plot*, like so:

In [None]:
plt.figure()
plt.plot([1, 2, 3, 4], [10, 20, 25, 30], color='lightblue', linewidth=3)
plt.scatter([0.3, 3.8, 1.2, 2.5], [11, 25, 9, 26], color='darkgreen', marker='^')
plt.xlim(0.5, 4.5)
plt.title("Title of the plot")
plt.xlabel("This is the x-label")
plt.ylabel("This is the y-label")
# Uncomment the line below to save the figure in your current directory
# plt.savefig('examplefigure.png')
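Behind the scenes, pyplot keeps track of a *current figure* and *current axes* (the plotting area), which is why every call above lands on the same plot. The sketch below makes that state visible with `plt.gcf()` and `plt.gca()`. It then draws the same plot through matplotlib's object-oriented interface, where you call methods on an explicit `Axes` object instead of relying on the current one. The data values are copied from the cell above.

```python
import matplotlib.pyplot as plt

# pyplot style: functions act on whatever figure/axes is "current"
fig1 = plt.figure()
plt.plot([1, 2, 3, 4], [10, 20, 25, 30])
current_axes = plt.gca()      # the axes pyplot is currently drawing on
print(plt.gcf() is fig1)      # the figure we just created is the current one

# object-oriented style: create the figure and axes explicitly
fig2, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], color='lightblue', linewidth=3)
ax.scatter([0.3, 3.8, 1.2, 2.5], [11, 25, 9, 26], color='darkgreen', marker='^')
ax.set_xlim(0.5, 4.5)
ax.set_title("Title of the plot")
ax.set_xlabel("This is the x-label")
ax.set_ylabel("This is the y-label")
print(plt.gcf() is fig2)      # subplots() also makes the new figure current
```

Both styles produce the same kind of figure; the explicit `ax` version becomes easier to follow once a figure contains several plotting areas.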

## 2. Barplots

Barplots are made with the `plt.bar` function:

In [None]:
plt.figure()
# heights of the bars are the nucleotide percentages of data/gene.fa: [A_perc, C_perc, G_perc, T_perc]
height = [17.627944760357433, 33.22502030869212, 30.300568643379368, 18.846466287571083]
# Names of bars
bars = ('A','C','G','T')
# making a barplot
plt.bar(bars, height)
# adding the axis labels and a title
plt.xlabel('Nucleotide')
plt.ylabel('Percentage of occurrence (%)')
plt.title('Distribution of nucleotides in fasta sequence')
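The bar heights above are hard-coded percentages for `data/gene.fa`. As a sketch of how such numbers could be computed, the hypothetical helper `nucleotide_percentages()` below counts each base in a plain DNA string and converts the counts to percentages. It does not read the FASTA file itself (header lines starting with `>` would need to be skipped first), and the short sequence in the example is made up.

```python
def nucleotide_percentages(sequence, bases=('A', 'C', 'G', 'T')):
    """Return the percentage of each base in `sequence`, in the order of `bases`.

    Hypothetical helper: assumes `sequence` is non-empty and contains only
    the letters A, C, G and T.
    """
    sequence = sequence.upper()
    total = len(sequence)
    return [100 * sequence.count(base) / total for base in bases]

# made-up example sequence, not the contents of data/gene.fa
example_heights = nucleotide_percentages("ACGGCTTACG")
print(example_heights)  # [20.0, 30.0, 30.0, 20.0]
```

The returned list has the same `[A_perc, C_perc, G_perc, T_perc]` order as `height`, so it could be passed to `plt.bar(bars, ...)` directly.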

In [None]:
plt.figure()
# heights of the bars are the nucleotide percentages of data/gene.fa: [A_perc, C_perc, G_perc, T_perc]
height = [17.627944760357433, 33.22502030869212, 30.300568643379368, 18.846466287571083]
# Names of bars
bars = ('A','C','G','T')
# colours can be given by name...
#plt.bar(bars, height, color=('green','red','yellow','blue'))
# ...or as hex codes
plt.bar(bars, height, color=('#1f77b4','#ff7f0e','#2ca02c','#d62728'))

plt.xlabel('Nucleotide')
plt.ylabel('Percentage of occurrence (%)')
plt.title('Distribution of nucleotides in fasta sequence')

In [None]:
plt.figure()

# width of the bars
barWidth = 0.3

# height of the blue bars
experimentA = [10, 9, 2]

# height of the cyan bars
experimentB = [10.8, 9.5, 4.5]

# size (plus/minus) of the error bars for experiment A
yer1 = [0.5, 0.4, 0.5]

# size (plus/minus) of the error bars for experiment B
yer2 = [1, 0.7, 1]

# x positions of the bars: experiment B is shifted right by one bar width
r1 = list(range(len(experimentA)))
r2 = [x + barWidth for x in r1]

# create blue bars (capsize is the length of the error bar caps, in points)
plt.bar(r1, experimentA, width=barWidth, color='blue', edgecolor='black', yerr=yer1, capsize=5, label='Experiment A')

# create cyan bars
plt.bar(r2, experimentB, width=barWidth, color='cyan', edgecolor='black', yerr=yer2, capsize=7, label='Experiment B')

# general layout: put each tick label in the middle of its pair of bars
plt.xticks([x + barWidth/2 for x in r1], ['cond_A', 'cond_B', 'cond_C'])
plt.ylabel('effect')
plt.legend()

# show graphic
plt.show()
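The cell above places two series of bars side by side by shifting the second set of x positions by `barWidth`. The same offset idea extends to any number of series. The hypothetical helper `grouped_bar_positions()` below (not part of matplotlib) returns one list of x positions per series, plus tick positions in the middle of each group, which you could pass to `plt.bar(..., width=bar_width)` and `plt.xticks(...)`.

```python
def grouped_bar_positions(n_groups, n_series, bar_width):
    """Return (positions, tick_positions) for a grouped bar chart.

    positions[s] holds the x positions of series s; tick_positions holds
    the centre of each group. Hypothetical helper, not part of matplotlib.
    """
    # series s is shifted right by s bar widths within every group
    positions = [[group + series * bar_width for group in range(n_groups)]
                 for series in range(n_series)]
    # the group centre lies halfway between its first and last bar
    offset = (n_series - 1) * bar_width / 2
    tick_positions = [group + offset for group in range(n_groups)]
    return positions, tick_positions

positions, ticks = grouped_bar_positions(n_groups=3, n_series=2, bar_width=0.3)
print(positions)
print(ticks)
```

With two series this reproduces `r1`, `r2` and the tick positions `x + barWidth/2` from the cell above; with three or more series only the arguments change.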


## 3. Plotting climate data

First, we load the climate data we cleaned in the previous chapter into this notebook:

In [None]:
import csv
from datetime import datetime

def load_data(filename):
    with open(filename) as file_resource:
        reader = csv.DictReader(file_resource)
    
        data = []
        for reading in reader:
            data = data + [{'Time': datetime.fromisoformat(reading['Time']), 'Temperature': float(reading['Temperature'])}]
    
    return data

global_data = load_data("data/global_data.csv")
bel_data = load_data("data/bel_data.csv")
rus_data = load_data("data/rus_data.csv")
aus_data = load_data("data/aus_data.csv")

Use an empty cell to check that the loaded data are what you expect, for example by printing the first few readings of each list.
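One way to do that check is to print how many readings were loaded, the first and last dates, and the temperature range. The hypothetical helper `summarise()` below does this for any list of `{'Time': ..., 'Temperature': ...}` dictionaries, the shape that `load_data()` returns, and it assumes the readings are in time order. The two readings in the example are made up so the cell runs on its own.

```python
from datetime import datetime

def summarise(data):
    """Return a short text summary of a list of temperature readings.

    Hypothetical helper: each reading is assumed to be a dict with a
    'Time' (datetime) and a 'Temperature' (float) entry, sorted by time.
    """
    temperatures = [reading['Temperature'] for reading in data]
    return (f"{len(data)} readings from {data[0]['Time']:%Y-%m} to {data[-1]['Time']:%Y-%m}, "
            f"temperatures between {min(temperatures)} and {max(temperatures)}")

# made-up readings with the same structure as the loaded data
example = [
    {'Time': datetime(1900, 1, 1), 'Temperature': -0.4},
    {'Time': datetime(1900, 2, 1), 'Temperature': -0.2},
]
print(summarise(example))
```

In the notebook you would call `summarise(global_data)`, `summarise(bel_data)` and so on.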

Now we're ready to plot it. First, we separate the dates, which go on the x-axis, from the temperatures, which go on the y-axis:

In [None]:
dates = []
temperatures = []
for reading in global_data:
    dates = dates + [reading['Time']]
    temperatures = temperatures + [reading['Temperature']]

In [None]:
plt.figure()
plt.plot(dates, temperatures, color='lightblue', linewidth=3)
plt.title("Global climate anomaly relative to 1960-1991 reference")
plt.xlabel("Time")
plt.ylabel("Temperature anomaly")

Beautiful! However, the data are quite noisy, so it might be useful to display smoothed data<sup>2</sup>. One way of smoothing these data is to take a moving average over a window of, say, 2 years (24 months). Let's do that now:
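Written out, the moving average computed by the exercise code below, for a window of $w$ readings (here $w = 24$ months), averages $w$ consecutive temperatures $T_j$ and assigns the result to the time near the middle of the window:

$$
\bar{T}_i = \frac{1}{w} \sum_{j=i}^{i+w-1} T_j, \qquad \text{plotted at } t_{i + \lfloor w/2 \rfloor}, \qquad i = 0, 1, \dots, N - w
$$

Here $N$ is the number of readings, so the smoothed series has $N - w + 1$ points, which is $w - 1$ fewer than the original.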

---

### Exercise 9-1: Plotting smooth data
Check that the code below is correct, then generate a plot with the smoothed data overlaid on the unsmoothed data.

In [None]:
def moving_average(data, window_size):
    smoothed = []
    for window_begin in range(len(data) - window_size + 1):
        # collect the temperatures that fall inside the current window
        temperatures = []
        for index in range(window_begin, window_begin + window_size):
            temperatures = temperatures + [data[index]['Temperature']]
            
        avg = sum(temperatures) / window_size
        
        smoothed = smoothed + [{
            'Temperature': avg,
            'Time': data[window_begin + (window_size // 2)]['Time']
        }]
    
    return smoothed

glo_average = moving_average(global_data, 24)
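The function above re-adds all `window_size` temperatures for every window. As an optional aside, not needed for the exercise, the same averages can be computed with a running sum: add the reading that enters the window and subtract the one that leaves it. The hypothetical `moving_average_running()` below is a sketch of that alternative on a plain list of numbers. Up to floating-point rounding, it gives the same values as the original, but without the times.

```python
def moving_average_running(values, window_size):
    """Moving average of a list of numbers using a running sum (hypothetical helper).

    Returns len(values) - window_size + 1 averages, or [] if the window
    does not fit into the data.
    """
    if window_size < 1 or window_size > len(values):
        return []
    window_sum = sum(values[:window_size])  # sum of the first window
    averages = [window_sum / window_size]
    for i in range(window_size, len(values)):
        # slide the window one step: add the new value, drop the oldest one
        window_sum = window_sum + values[i] - values[i - window_size]
        averages.append(window_sum / window_size)
    return averages

print(moving_average_running([1, 2, 3, 4, 5], 2))  # [1.5, 2.5, 3.5, 4.5]
```

The trade-off is that each step costs a constant amount of work instead of `window_size` additions, at the price of slightly more bookkeeping.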

In [None]:
glo_avg_dates = []
glo_avg_temperatures = []
for reading in glo_average:
    glo_avg_dates = glo_avg_dates + [reading['Time']]
    glo_avg_temperatures = glo_avg_temperatures + [reading['Temperature']]
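This loop repeats the one used earlier to separate `dates` and `temperatures` from `global_data`. A small function can remove that repetition, which also comes in handy for Exercise 9-2. The hypothetical `split_readings()` below is one way to write it, using the same dictionary keys as `load_data()`; the example readings are made up.

```python
from datetime import datetime

def split_readings(readings):
    """Split a list of {'Time': ..., 'Temperature': ...} dicts into two lists.

    Hypothetical helper: returns (times, temperatures) in the original order.
    """
    times = [reading['Time'] for reading in readings]
    temperatures = [reading['Temperature'] for reading in readings]
    return times, temperatures

# made-up readings with the same structure as the loaded data
example = [
    {'Time': datetime(2000, 1, 1), 'Temperature': 0.3},
    {'Time': datetime(2000, 2, 1), 'Temperature': 0.5},
]
example_times, example_temperatures = split_readings(example)
print(example_temperatures)  # [0.3, 0.5]
```

In the notebook, `glo_avg_dates, glo_avg_temperatures = split_readings(glo_average)` would replace the loop above.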

In [None]:
plt.figure()
plt.plot(dates, temperatures, color='lightblue', linewidth=3)
plt.plot(glo_avg_dates, glo_avg_temperatures, color='darkblue', linewidth=3)
plt.title("Global climate anomaly relative to 1960-1991 reference")
plt.xlabel("Time")
plt.ylabel("Temperature anomaly")

---

### Exercise 9-2: Plot country data
Create plots like the ones above for each country. Remember that functions can save you from repeating work.

---

## 4. Chapter Review
In this chapter we used the `matplotlib` third-party Python library to generate plots in the notebook. We learned how to use the `plot()`, `scatter()` and `bar()` commands. Finally, we generated plots from the climate data.

### Review Questions

1. Is it possible to change the plot colour?
<details>
    <summary>Answer</summary>
    Yes, using the <code>color</code> argument, which <code>plot()</code>, <code>scatter()</code> and <code>bar()</code> all accept.
</details>


2. What does the <code>figure()</code> command do?
<details>
    <summary>Answer</summary>
    Creates a new, empty figure and makes it the current one, so that subsequent commands such as <code>plot()</code> draw on it rather than on the previous figure.
</details>


3. Can you use <code>matplotlib</code> to plot a histogram?
<details>
    <summary>Answer</summary>
    Yes, using the <code>hist()</code> function (<code>plt.hist()</code>).
</details>


4. Can an axis scale be non-linear (e.g. logarithmic scale)?
<details>
    <summary>Answer</summary>
    Yes, using the <code>xscale()</code> or <code>yscale()</code> functions, e.g. <code>plt.yscale('log')</code>.
</details>
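To see the answers to questions 3 and 4 in practice, the sketch below draws a histogram with `plt.hist()` and then switches the y-axis to a logarithmic scale with `plt.yscale('log')`. The values are randomly generated with a fixed seed purely for illustration; they are not climate data.

```python
import random
import matplotlib.pyplot as plt

random.seed(0)  # fixed seed so the example is reproducible
values = [random.gauss(0, 1) for _ in range(1000)]  # illustrative random data

plt.figure()
# hist() returns the bin counts, the bin edges and the drawn bar patches
counts, bin_edges, patches = plt.hist(values, bins=20, color='lightblue', edgecolor='black')
plt.yscale('log')  # non-linear (logarithmic) y-axis
plt.xlabel("Value")
plt.ylabel("Count (log scale)")
plt.title("Histogram with a logarithmic y-axis")
```

A logarithmic y-axis makes the small counts in the tails of the distribution visible next to the large counts in the middle.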

## 5. References

1. [Matthias Stahl's personal website](https://www.higsch.com/about/)
2. [Data smoothing](https://www.climate4you.com/DataSmoothing.htm)

## 6. Supporting material

* [Become a Python Data Analyst](https://www.packtpub.com/eu/big-data-and-business-intelligence/become-python-data-analyst)
* [Add confidence interval on barplot](https://python-graph-gallery.com/8-add-confidence-interval-on-barplot/)
* [Matplotlib Cheatsheets](https://github.com/matplotlib/cheatsheets#cheatsheets)

## 7. Next session

Go to our [next chapter](10_Conclusion.ipynb). 