## Plotting Data



Creating graphical output with Python is typically achieved by using third-party libraries. Each of these libraries has its own way of describing graphical output, which can be very confusing. Furthermore, some libraries provide an object-oriented interface as well as a procedural interface. To top it all off, some libraries like pandas implement their own interface to some of the graphics libraries (and vice versa). So if you start looking for help, you will find all sorts of messy or conflicting advice.

The matplotlib library (see [https://matplotlib.org/](https://matplotlib.org/)) is large and consists of several modules that address various graphing needs. In the sciences, Matlab-style X-Y data plotting is handled by the `pyplot` module. Because of this Matlab heritage, `matplotlib.pyplot` provides two different interfaces: a procedural one that mimics Matlab, and an object-oriented one. I will show a trivial example of the procedural interface below so that you can recognize the syntax when you see examples on the web. For the remainder of the course, we will use the object-oriented interface to `matplotlib.pyplot`. First, let's import some data:



In [1]:
import pandas as pd  # data frames and Excel import
import pathlib as pl  # file system paths

fn: str = "Yao_2018-b.xlsx"  # file name
sn: str = "both"  # sheet name
cwd: pl.Path = pl.Path.cwd()  # current working directory
fqfn: pl.Path = cwd / fn  # fully qualified file name

if not fqfn.exists():  # check if the file is actually there
    raise FileNotFoundError(f"Cannot find file {fqfn}")

os_peak: pd.DataFrame = pd.read_excel(fqfn, sheet_name=sn)

display(os_peak.head()) # do a visual confirmation
Depth: pd.Series = os_peak.iloc[:, 1]
Age: pd.Series = os_peak.iloc[:, 2]
d34S: pd.Series = os_peak.iloc[:, 3]
d34Serror: pd.Series = os_peak.iloc[:, 4]

### The procedural interface



In the following example, I give a typical command sequence. Line 4 creates a scatter plot (i.e., each coordinate pair is represented by a point), line 5 saves the figure as a PDF file, and line 6 makes the plot appear. You may ask, why would we need a command to make the plot appear? Plotting can be computationally expensive if you have a large dataset, so you don't want to redraw your plot window each time you change a label or some other detail. We therefore first create all plot elements and then request that the plot be rendered with the `plt.show()` command.



In [1]:
# plotting with the procedural interface
import matplotlib.pyplot as plt

plt.scatter(Age,d34S)      # create a scatter plot
plt.savefig(f"{cwd}/test.pdf")
plt.show()

Using the procedural interface is straightforward but limited. Imagine you have two plots in the same figure. If you try to set the x-axis label with `plt.xlabel`, which of the two x-labels will you change?
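To make the ambiguity concrete, here is a minimal sketch. The numbers in `x` and `y` are made up for illustration and are not the Yao et al. data. With the procedural interface, `plt.xlabel()` acts on whichever plot was created or selected last (the "current" axes), so only one of the two plots receives the label:

```python
import matplotlib.pyplot as plt

# made-up example data (an assumption for illustration only)
x = [1, 2, 3, 4]
y = [2, 4, 1, 3]

fig = plt.figure()  # start with a fresh, empty figure

plt.subplot(1, 2, 1)  # select the left plot
plt.scatter(x, y)
plt.subplot(1, 2, 2)  # select the right plot, which is now "current"
plt.plot(x, y)

plt.xlabel("Age [Ma]")  # only the current (right) plot gets this label

left, right = fig.get_axes()  # the two plots, in creation order
print(repr(left.get_xlabel()), repr(right.get_xlabel()))
plt.show()
```

The left plot keeps an empty label. To label it, you would have to select it again with `plt.subplot(1, 2, 1)` before calling `plt.xlabel()`.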



### The object-oriented interface



Using the object-oriented approach avoids this problem quite elegantly. However, the `matplotlib` community uses confusing terminology.  First, we create (instantiate) a `canvas` object. Think of it as the page holding one or more of your figures. Weirdly enough, this object is not called `canvas`, but `Figure`.  Next, we need to create an object that holds the actual plot. The logical name for this would be `figure`, but alas, `matplotlib` calls the actual figure `axes`.  Have a look at this example:



In [1]:
# Create a canvas with one figure object
import matplotlib.pyplot as plt

fig: plt.Figure  # this variable  will hold the canvas object
ax1: plt.Axes  # this variable will hold the axis object

fig, ax1 = plt.subplots()  # create canvas and axis objects
ax1.scatter(Age, d34S)  # create a scatter plot on ax1
fig.savefig("plt_scatter.pdf")  # save the figure before showing it
fig.show()

So what happens here? The `subplots()` function creates a canvas (aka `fig`) as well as a figure object (aka `ax1`). The canvas contains everything (potentially more than one figure), and the axes object holds the data, labels, etc. Since we want a scatter plot, we call the `scatter` method of the axes object.

Note that there is nothing magical about the variable names. We might as well write



In [1]:
canvas, figure = plt.subplots()

However, everyone else uses `fig` and `ax`, so this will become confusing once you look up online materials.

Note that the above code uses `fig.show()`, whereas many online examples use `plt.show()`. The main difference is that `fig.show()` only shows the figure it is called on, whereas `plt.show()` shows all open figures at once. Most people thus do not bother with `fig.show()`, since `plt.show()` will always work. Going forward, we will use the `plt.show()` function.

One more caveat: `matplotlib.pyplot` knows both the `plt.subplots()` and the `plt.subplot()` functions. Note that `plt.subplots()` is very different from `plt.subplot()`. Use the help system to see just how different their meanings really are.
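As a rough sketch of the difference (no data needed): `plt.subplots()` creates a new figure together with all of its axes, whereas `plt.subplot()` adds a single axes object to the current figure and returns only that object.

```python
import matplotlib.pyplot as plt

# plt.subplots() (plural) creates a new figure AND all of its axes at once
fig, axes = plt.subplots(nrows=1, ncols=2)
print(type(fig).__name__, axes.shape)

# plt.subplot() (singular) adds ONE axes object to the current figure
# and returns only that axes object
plt.figure()  # start a second, empty figure
ax_left = plt.subplot(1, 2, 1)  # 1 row, 2 columns, position 1
ax_right = plt.subplot(1, 2, 2)  # 1 row, 2 columns, position 2
print(type(ax_left).__name__)
plt.show()
```

With the singular form, you get no figure handle back; you would need `plt.gcf()` to obtain it.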



#### Placing more than one graph into a figure



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure  # figure (canvas)
ax1: plt.Axes  # first plot object
ax2: plt.Axes  # second plot object

# Create a figure with 1 row of 2 plots
fig, [ax1, ax2] = plt.subplots(nrows=1, ncols=2)

ax1.scatter(Age, d34S, color="C0")  # scatter plot in the left plot
ax2.plot(Age, d34S, color="C1")  # line plot in the right plot
plt.show()

Note that the resulting plots might overlap. See the "Visual candy" section below for how to remedy this. And now the same for four plots:



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax1: plt.Axes
ax2: plt.Axes
ax3: plt.Axes
ax4: plt.Axes

# Create a figure canvas with 2 x 2 plot objects
fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(nrows=2, ncols=2)

ax1.scatter(Age, d34S, color="C0")  # top left: scatter plot
ax2.plot(Age, d34S, color="C1")  # top right: line plot
ax3.scatter(Age, d34S, color="C2")  # bottom left: scatter plot
ax4.plot(Age, d34S, color="C3")  # bottom right: line plot
plt.show()

Note how the axes objects returned by the `plt.subplots()` function reflect the output geometry: they come back as a two-dimensional NumPy array that can be unpacked and indexed like a list of lists. In other words, each row is a sequence of axes handles, and those rows are the elements of an outer sequence. So we could rewrite the code more elegantly as



In [1]:
from __future__ import annotations
import matplotlib.pyplot as plt
import numpy as np

fig: plt.Figure  # canvas object
ax: np.ndarray  # 2-D array of axes objects

# Create a figure canvas with 2 x 2 plot objects
fig, ax = plt.subplots(nrows=2, ncols=2)

ax[0][0].scatter(Age, d34S, color="C0")  # top left: scatter plot
ax[0][1].plot(Age, d34S, color="C1")  # top right: line plot
ax[1][0].scatter(Age, d34S, color="C2")  # bottom left: scatter plot
ax[1][1].plot(Age, d34S, color="C3")  # bottom right: line plot
plt.show()

Or you can use a nested loop if you happen to have many similar plots:



In [1]:
fig, ax = plt.subplots(nrows=2, ncols=2)

k = 0  # running plot number, used to pick the color
for row in ax:  # loop over the rows
    for a in row:  # loop over the plots in each row
        a.scatter(Age, d34S, color=f"C{k}")
        k = k + 1

plt.show()
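
Since the axes come back as a NumPy array, a single loop over its `flat` iterator is often enough. The following is a small sketch with made-up data (`x` and `y` are placeholders, not the data set used above):

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up example data (an assumption for illustration only)
x = np.linspace(0, 10, 20)
y = np.sin(x)

fig, ax = plt.subplots(nrows=2, ncols=2)
print(type(ax), ax.shape)  # a NumPy array with 2 rows and 2 columns

# ax.flat visits all axes row by row, so one loop is enough
for k, a in enumerate(ax.flat):
    a.scatter(x, y, color=f"C{k}")
    a.set_title(f"Plot {k}")

fig.tight_layout()  # keep titles and tick labels from overlapping
plt.show()
```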

### Controlling figure size



Using the canvas (fig) handle, we can control almost any aspect of our figure (try querying the fig handle with `help()`). Here we use the command in line seven to set the figure size to 6 by 4 inches:



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax: plt.Axes

fig, ax = plt.subplots()
fig.set_size_inches(6, 4)

ax.scatter(Age, d34S)  # create a scatter plot on ax

fig.savefig("test_figure.pdf")
plt.show()
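
As an alternative sketch, the size can also be passed via the `figsize` keyword when the figure is created, and queried again with `get_size_inches()`:

```python
import matplotlib.pyplot as plt

# pass the size (width, height in inches) when creating the figure
fig, ax = plt.subplots(figsize=(6, 4))

width, height = fig.get_size_inches()  # query the current size
print(width, height)
plt.show()
```

Both approaches set the same property, so which one you use is a matter of taste.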

### Labels, title, and math symbols



Now, let's add a few bells and whistles to our plot:



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax: plt.Axes

fig, ax = plt.subplots()
fig.set_size_inches(6, 4)

ax.scatter(Age, d34S)

ax.set_title("My first plot")
ax.set_xlabel("Age [Ma]")
ax.set_ylabel("d34S")

plt.show()

Lots of typing, but otherwise straightforward (again, save a template you can re-use next time.)



#### Math symbols



We can also add math symbols to the text. Matplotlib understands most LaTeX math symbols and will render them correctly if they are enclosed in dollar signs (see below). Prefix such strings with `r` (a raw string) so that Python passes the backslashes on to matplotlib unchanged. [See this link for a fairly complete list.](https://oeis.org/wiki/List_of_LaTeX_mathematical_symbols)



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax: plt.Axes

fig, ax = plt.subplots()
fig.set_size_inches(6, 4)

ax.scatter(Age, d34S)

ax.set_title("My first plot")
ax.set_xlabel("Age [Ma]")
ax.set_ylabel(r"$\delta^{34}$S [$^0/_{00}$ VCDT] ")
# ax.set_ylabel(r"$\delta^{34}$S [mUr VCDT] ")  # mUr instead of permil
plt.show()

### Legends, arbitrary text, and visual clutter



This is now straightforward. The only noteworthy thing is that the location of the arbitrary text is given in data coordinates, i.e., in the units of the x- and y-axis.



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax: plt.Axes

fig, ax = plt.subplots()
fig.set_size_inches(6, 4)

ax.scatter(Age, d34S)

ax.set_title("My first plot")
ax.set_xlabel("Age [Ma]")
ax.set_ylabel(r"$\delta^{34}$S [mUr VCDT] ")
ax.legend(["Yao et al. 2018"])  # The legend
ax.text(55, 18.75, "Some text")  # Some arbitrary text

plt.show()

The black frame around the figure (and the legend) is a figure element without function. A better word is visual clutter, which distracts from the actual information. Let's remove the elements we do not need. For the legend, we can simply add an option to suppress the frame, and for the top and right spines, we can render them invisible by setting their color to `"none"`.



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax: plt.Axes

fig, ax = plt.subplots()
fig.set_size_inches(6, 4)

ax.scatter(Age, d34S)

ax.set_title("My first plot")
ax.set_xlabel("Age [Ma]")
ax.set_ylabel(r"$\delta^{34}$S [VCDT] ")
ax.legend(["Yao et al. 2018"], frameon=False)
ax.text(55, 18.75, "Some text")

# remove unneeded frame elements
ax.spines["right"].set_color("none")
ax.spines["top"].set_color("none")

plt.show()

### Adding a second data set with the same y-axis



Adding a second data set is as simple as calling a plot method of the axes object again. Here we use the same dataset but with a different plot method.



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax: plt.Axes

fig, ax = plt.subplots()
fig.set_size_inches(6, 4)

ax.scatter(Age, d34S)
ax.plot(Age, d34S)

ax.set_title("My first plot")
ax.set_xlabel("Age [Ma]")
ax.set_ylabel(r"$\delta^{34}$S [VCDT] ")
ax.legend(["Yao et al. 2018"], frameon=False)
ax.text(55, 18.75, "Some text")
ax.spines["right"].set_color("none")
ax.spines["top"].set_color("none")

plt.show()

You can control whether the scatter symbol is on top of the line, or behind the line, by using the `zorder` keyword. See [https://stackoverflow.com/questions/2314379/how-to-plot-the-lines-first-and-points-last-in-matplotlib](https://stackoverflow.com/questions/2314379/how-to-plot-the-lines-first-and-points-last-in-matplotlib)
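A minimal sketch of the `zorder` keyword, again with made-up data: artists with a higher `zorder` are drawn later and therefore end up on top. Here the points are explicitly placed above the line.

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up example data (an assumption for illustration only)
x = np.linspace(0, 10, 15)
y = np.cos(x)

fig, ax = plt.subplots()
line = ax.plot(x, y, color="C1", zorder=1)[0]  # drawn first (below)
points = ax.scatter(x, y, color="C0", zorder=2)  # drawn last (on top)

print(line.get_zorder(), points.get_zorder())
plt.show()
```

Swapping the two `zorder` values puts the line on top of the points instead.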



### Data with a shared x-axis, but an independent y-axis



At times, you will need to plot data that shares the x-axis but has an independent y-axis (say concentration versus isotope ratio). This can be achieved by creating a twin of the axes object that shares its x-axis. See line 20 below:



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax1: plt.Axes
ax2: plt.Axes

fig, ax1 = plt.subplots()  # create plot canvas
fig.set_size_inches(6, 4)  # set figure size

ax1.scatter(Age, d34S, color="C0")  # scatter plot
ax1.plot(Age, d34S, color="C1")  # line plot

ax1.set_title("Using a secondary y-axis")  # title
ax1.set_xlabel("Age [Ma]")  # x label
ax1.set_ylabel(r"$\delta^{34}$S [VCDT] ")  # y-label
ax1.legend([r"$\delta^{34}$S Yao et al. 2018"], frameon=False)  # the legend without frame

# create new axes object that shares the x-axis, but has an
# independent y-axis
ax2 = ax1.twinx()
ax2.plot(Age, d34Serror, color="C2")
plt.show()  # render the figure

Not very pretty yet, so let's control the plot scale and add a legend 



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax1: plt.Axes
ax2: plt.Axes

fig, ax1 = plt.subplots()  # create plot canvas
fig.set_size_inches(6, 4)  # set figure size

ax1.scatter(Age, d34S, color="C0")  # scatter plot
ax1.plot(Age, d34S, color="C1")  # line plot

ax1.set_title("Using a secondary y-axis")  # title
ax1.set_xlabel("Age [Ma]")  # x label
ax1.set_ylabel(r"$\delta^{34}$S [VCDT] ")  # y-label
ax1.legend([r"$\delta^{34}$S Yao et al. 2018"], frameon=False)  # the legend without frame

ax2 = ax1.twinx()
ax2.plot(Age, d34Serror, color="C2")
ax2.set_ylim([0, 0.4])
ax2.legend(["Analytical error"], frameon=False)
plt.show()  # render the figure

Somewhat better, but the legend is messy since each axes object has its own legend.
There are a couple of ways around this, and they are summarized in this StackOverflow post:

-   [secondary-axis-with-twinx-how-to-add-to-legend](https://stackoverflow.com/questions/5484922/secondary-axis-with-twinx-how-to-add-to-legend). Note that the solutions often depend on the type of plot (i.e., a line plot versus a scatter plot).
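
One common approach for line plots can be sketched as follows: give each line a `label`, collect the handles and labels from both axes, and draw a single legend. The data and the label names below are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up example data (an assumption for illustration only)
age = np.linspace(40, 60, 25)
signal = 17 + np.sin(age / 3)
error = np.full_like(age, 0.2)

fig, ax1 = plt.subplots()
ax1.plot(age, signal, color="C1", label="signal")

ax2 = ax1.twinx()
ax2.plot(age, error, color="C2", label="error")

# collect the labeled artists of both axes and draw a single legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, frameon=False)
plt.show()
```

The legend is attached to `ax2` here because the twin axes is drawn on top of `ax1`.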

For our purposes, it is enough to position the legends manually. The location is given as a fraction (0 to 1) of the axes width and height, measured from the lower-left corner of the plot. A bit of a hack, but it works.



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax1: plt.Axes
ax2: plt.Axes

fig, ax1 = plt.subplots()  # create plot canvas
fig.set_size_inches(6, 4)  # set figure size

ax1.scatter(Age, d34S, color="C0")  # scatter plot
ax1.plot(Age, d34S, color="C1")  # line plot

ax1.set_title("Using a secondary y-axis")  # title
ax1.set_xlabel("Age [Ma]")  # x label
ax1.set_ylabel(r"$\delta^{34}$S [VCDT] ")  # y-label

ax1.legend(
    [r"$\delta^{34}$S Yao et al. 2018"],
    loc=(0.02, 0.9),  # lower-left legend corner in axes fractions
    frameon=False,  # the legend without frame
)

ax2 = ax1.twinx()
ax2.plot(Age, d34Serror, color="C2")

ax2.set_ylim([0, 0.4])
ax2.set_ylabel("Error [$^{0}/_{00}$ VCDT]")
ax2.legend(["Analytical error"], loc=(0.02, 0.82), frameon=False)
plt.show()  # render the figure

I should also note that there is a `twiny()` command, which creates an independent x-axis, and, to take things even further, you can create mapping functions between paired axes.
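One way to set up such a mapping is the `secondary_xaxis()` method of the axes object, which takes a pair of conversion functions. The sketch below uses made-up data, and the helper names `ma_to_ka` and `ka_to_ma` are my own choice:

```python
import matplotlib.pyplot as plt
import numpy as np

# made-up example data (an assumption for illustration only)
age_ma = np.linspace(40, 60, 25)  # age in million years
value = np.sin(age_ma / 3)


def ma_to_ka(x):
    """Convert million years to thousand years."""
    return x * 1000


def ka_to_ma(x):
    """Convert thousand years back to million years."""
    return x / 1000


fig, ax = plt.subplots()
ax.plot(age_ma, value)
ax.set_xlabel("Age [Ma]")

# second x-axis on top, tied to the bottom axis by the two functions
top = ax.secondary_xaxis("top", functions=(ma_to_ka, ka_to_ma))
top.set_xlabel("Age [ka]")
plt.show()
```

Unlike `twiny()`, the secondary axis is not independent: it always shows the bottom axis converted by the first function, and the second function must be its inverse.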



### Visual candy



The default plot style is utilitarian and not particularly pretty. Fortunately, the plot style is independent of the plot commands. We can use the `plt.style.use('style')` command to achieve a specific look.
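If you are unsure which style names exist on your installation, the following sketch lists them. It also shows `plt.style.context()`, which applies a style only inside a `with` block instead of changing the global settings (the plotted numbers are made up):

```python
import matplotlib.pyplot as plt

# names of the styles that ship with your matplotlib installation
print(plt.style.available)

# apply a style only inside the with-block, leaving the global
# settings untouched
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 1])  # made-up numbers for illustration
plt.show()
```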

Below we use the `ggplot` style that is popular with the R crowd. Note that you need to set the plot style before you call `fig, ax = plt.subplots()`.



In [1]:
import matplotlib.pyplot as plt

fig: plt.Figure
ax: plt.Axes

plt.style.use("ggplot")
fig, ax = plt.subplots()
fig.set_size_inches(6, 4)

ax.scatter(Age, d34S, color="C0")
ax.plot(Age, d34S, color="C1")

ax.set_title("My first plot")
ax.set_xlabel("Age [Ma]")
ax.set_ylabel(r"$\delta^{34}$S [VCDT] ")
ax.legend(["Yao et al. 2018"], frameon=False)
ax.text(55, 18.75, "Some text")
ax.spines["right"].set_color("none")
ax.spines["top"].set_color("none")

fig.savefig("test_figure.pdf")
plt.show()

See [https://matplotlib.org/3.1.1/gallery/style_sheets/style_sheets_reference.html](https://matplotlib.org/3.1.1/gallery/style_sheets/style_sheets_reference.html) for examples of the default styles. If you check the actual PDF file that is saved by the above code, you may find that some of the label text is cut off. To prevent this, we add the `fig.tight_layout()` command before saving and showing the figure. This is particularly important if you place more than one plot into a figure.

So now, we have a pretty good template we can use in our code! It is highly recommended to add this to your collection of useful code snippets.



In [1]:
import matplotlib.pyplot as plt

plt.style.use("ggplot")
fig: plt.Figure
ax: plt.Axes

fig, ax = plt.subplots()

# plot data
ax.scatter(Age, d34S, color="C0")
ax.plot(Age, d34S, color="C1")

# plot options
fig.set_size_inches(6, 4)
ax.set_title("My first plot")
ax.set_xlabel("Age [Ma]")
ax.set_ylabel(r"$\delta^{34}$S [$^0/_{00}$ VCDT] ")
ax.legend(["Yao et al. 2018"], frameon=False)
ax.text(55, 18.75, "Some text")
ax.spines["right"].set_color("none")
ax.spines["top"].set_color("none")
fig.tight_layout()
fig.savefig("test_figure.pdf")

plt.show()

### Adding error bars



Quite often, it is important to show error bars with your measurements.  



In [1]:
import matplotlib.pyplot as plt

plt.style.use("ggplot")
fig: plt.Figure
ax: plt.Axes

fig, ax = plt.subplots()  # create plot canvas

ax.scatter(Age, d34S, color="C0")
ax.plot(Age, d34S, color="C1")

# Add error bars
ax.errorbar(
    Age,  # x values
    d34S,  # y values
    yerr=d34Serror,  # y-error, single value or list of values
    color="C0",  # color
    fmt="none",  # plot only error bars
)

# set title and labels
fig.set_size_inches(6, 4)  # set figure size
ax.set_title("Plotting error bars")  # the plot title
ax.set_xlabel("Age [Ma]")  # x label
ax.set_ylabel(r"$\delta^{34}$S [VCDT] ")  # y-label
ax.legend(
    [r"$\delta^{34}$S Yao et al. 2018"], loc=(0.02, 0.9), frameon=False
)  # the legend without frame
ax.spines["top"].set_color("none")  # do not display the top spine
fig.tight_layout()  # tighten up layout
plt.show()  # render the figure

### Recap



-   Matplotlib refers to the figure canvas as `Figure` and to
    individual plot(s) within a figure as `Axes`.
-   Matplotlib supports a procedural as well as an object-oriented
    interface. Commands used in both approaches often have similar
    names but different meanings. Care must be taken to differentiate
    between both methods when perusing examples found in books or
    on the internet.
-   The object-oriented interface allows you to modify plot elements
    in great detail (there is basically no limit), but it may be
    tedious to do so.
-   Matplotlib plots can be styled, see: [style-sheets-reference.html](https://matplotlib.org/3.1.1/gallery/style_sheets/style_sheets_reference.html)
-   No one can remember all the different plot commands. So it is best
    to keep a generic code template.

