# CSS 120: Environmental Data Science

## Paleoclimate

### Umberto Mignozzetti (UCSD)

(Based on Project Pythia and ClimateMatch)

## Today's Lecture

1. Paleoclimate proxies (physical characteristics of the environment that can stand in for direct measurements)
2. Reconstructions of paleo temperatures

## Packages


In [None]:
# To install
# !pip install LiPD --quiet
# !pip install pyleoclim --quiet
# !pip install climlab --quiet

## Packages


In [None]:
# imports
import os
import sys
from io import StringIO
import pandas as pd
import numpy as np
import pooch
import matplotlib.pyplot as plt
from matplotlib import patches
import tempfile
import lipd
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shapereader
import pyleoclim as pyleo

##  Helper functions


In [None]:
# Pooch Load
def pooch_load(filelocation=None, filename=None, processor=None):
    shared_location = "~/"
    user_temp_cache = tempfile.gettempdir()

    if os.path.exists(os.path.join(shared_location, filename)):
        file = os.path.join(shared_location, filename)
    else:
        file = pooch.retrieve(
            filelocation,
            known_hash=None,
            fname=os.path.join(user_temp_cache, filename),
            processor=processor,
        )

    return file

##  Helper functions


In [None]:
# Function to convert the PAGES2k LiPD files into a pandas.DataFrame
def lipd2df(
    lipd_dirpath,
    pkl_filepath=None,
    col_str=[
        "paleoData_pages2kID", "dataSetName", "archiveType", "geo_meanElev", "geo_meanLat",
        "geo_meanLon", "year", "yearUnits", "paleoData_variableName", "paleoData_units",
        "paleoData_values", "paleoData_proxy",
    ],
):
    """
    Convert a bunch of PAGES2k LiPD files to a `pandas.DataFrame` to boost data loading.

    If `pkl_filepath` isn't `None`, save the DataFrame as a pickle file.

    Parameters:
    ----------
        lipd_dirpath: str
          Path of the PAGES2k LiPD files
        pkl_filepath: str or None
          Path of the converted pickle file. Default: `None`
        col_str: list of str
          Name of the variables to extract from the LiPD files

    Returns:
    -------
        df: `pandas.DataFrame`
          Converted Pandas DataFrame
    """

    # Save the current working directory for later use, as the LiPD utility will change it in the background
    work_dir = os.getcwd()
    # LiPD utility requires the absolute path
    lipd_dirpath = os.path.abspath(lipd_dirpath)
    # Load LiPD files
    lipds = lipd.readLipd(lipd_dirpath)
    # Extract timeseries from the list of LiPD objects
    ts_list = lipd.extractTs(lipds)
    # Recover the working directory
    os.chdir(work_dir)
    # Create an empty pandas.DataFrame with the number of rows to be the number of the timeseries (PAGES2k records),
    # and the columns to be the variables we'd like to extract
    df_tmp = pd.DataFrame(index=range(len(ts_list)), columns=col_str)
    # Loop over the timeseries and pick those for global temperature analysis
    i = 0
    for ts in ts_list:
        if (
            "paleoData_useInGlobalTemperatureAnalysis" in ts.keys()
            and ts["paleoData_useInGlobalTemperatureAnalysis"] == "TRUE"
        ):
            for name in col_str:
                try:
                    df_tmp.loc[i, name] = ts[name]
                except:
                    df_tmp.loc[i, name] = np.nan
            i += 1
    # Drop the rows with all NaNs (those not for global temperature analysis)
    df = df_tmp.dropna(how="all")
    # Save the dataframe to a pickle file for later use
    if pkl_filepath:
        save_path = os.path.abspath(pkl_filepath)
        print(f"Saving pickle file at: {save_path}")
        df.to_pickle(save_path)
    return df

##  Helper functions


In [None]:
class SupressOutputs(list):
    def __enter__(self):
        self._stdout = sys.stdout
        sys.stdout = self._stringio = StringIO()
        return self

    def __exit__(self, *args):
        self.extend(self._stringio.getvalue().splitlines())
        del self._stringio  # free up some memory
        sys.stdout = self._stdout


## Introduction to PAGES2k

There are various types of [paleoclimate archives and proxies](http://wiki.linked.earth/Climate_Proxy) that can be used to reconstruct past changes in Earth's climate. For example:

- **Sediment Cores**: Sediments deposited in layers within lakes and oceans serve as a record of climate variations over time.

- **Ice Cores**: Like sediment cores, ice cores capture past climate changes in layers of ice accumulated over time.
    + Common proxies for reconstructing past climate from ice cores include water isotopes, greenhouse gas concentrations in air bubbles trapped in the ice, and dust.

- **Corals**: Corals form annual growth bands within their carbonate skeletons, recording temperature changes over time.

- **Speleothems**: These are cave formations that result from the deposition of minerals from groundwater. 

- **Tree Rings**: Each year, trees add a new layer of growth, known as a tree ring.

## Introduction to PAGES2k

Many paleoclimate reconstructions exist, spanning a variety of timescales and locations around the globe.

One way to deal with these multiple sources is to compile all existing paleoclimate records for a single climate variable over a specific time period.

This is what the **PAGES2k network** does.

The database consists of:

1. 692 records
2. 648 locations
3. A variety of archives (e.g., trees, ice, sediment, corals, speleothems)
4. Coverage of the Common Era (1 CE to present, i.e., the past ~2,000 years)

You can read more about the PAGES2k network in [PAGES 2k Consortium (2017)](https://www.nature.com/articles/sdata201788).

## Get PAGES2k LiPD Files

The PAGES2k network is stored in a specific file format known as [Linked Paleo Data format (LiPD)](http://wiki.linked.earth/Linked_Paleo_Data).

LiPD files contain time series information in addition to supporting metadata (e.g., root metadata, location).

Pyleoclim (and its dependency package LiPD) leverages this additional information using LiPD-specific functionality.

Data stored in the `.lpd` format can be loaded directly as a LiPD object.

If the path points to *one LiPD file*, `lipd.readLipd()` will load that specific record.

If the path points to a *folder of LiPD files*, `lipd.readLipd()` will load the full set of records.

This call is embedded in the `lipd2df()` helper function defined above, which reads the data in and converts it to a more usable format.

The first thing we need to do is download the data.

## Get PAGES2k LiPD Files

In [None]:
# Set the name to save the PAGES2K data
fname = "pages2k_data"

# Download the data
lipd_file_path = pooch.retrieve(
    url="https://ndownloader.figshare.com/files/8119937",
    known_hash=None,
    path="./",
    fname=fname,
    processor=pooch.Unzip(),
)

## Get PAGES2k LiPD Files

Now we can use our helper function `lipd2df()` to convert the LiPD files to a [Pandas dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

In [None]:
# convert all the lipd files into a DataFrame
fname = "pages2k_data"

with SupressOutputs():
    pages2k_data = lipd2df(
        lipd_dirpath=os.path.join(".", f"{fname}.unzip", "LiPD_Files")
    )

## Get PAGES2k LiPD Files

The PAGES2k data is now stored as a dataframe and we can view the data to understand different attributes it contains.


In [None]:
# print the top few rows of the PAGES2K data
pages2k_data.head()

## Plotting a Map of Proxy Reconstruction Locations

Now that we have converted the data into a Pandas dataframe, we can plot the PAGES2k network on a map to understand the spatial distribution of the temperature records and the types of proxies that were measured.
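
As a minimal sketch of how such a table can be summarized before plotting, the snippet below counts records per archive type on a toy data frame. The rows are hypothetical stand-ins; only the column names `archiveType`, `geo_meanLat`, and `geo_meanLon` come from the helper function above.

```python
import pandas as pd

# Toy stand-in for the PAGES2k table (hypothetical rows, same column names)
toy = pd.DataFrame(
    {
        "archiveType": ["tree", "tree", "coral", "glacier ice"],
        "geo_meanLat": [45.0, 60.2, 5.9, -75.1],
        "geo_meanLon": [10.5, -120.3, -162.1, 123.4],
    }
)

# Number of records per archive type
counts = toy["archiveType"].value_counts()
print(counts)

# Latitude range covered by each archive type
lat_range = toy.groupby("archiveType")["geo_meanLat"].agg(["min", "max"])
print(lat_range)
```

The same `value_counts()` call could be applied to `pages2k_data["archiveType"]` once the real data are loaded.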

Before generating the plot, we have to define the colours and the marker types that we want to use in the plot.

We also need a list of the distinct `archiveType` values that appear in the data frame.
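
One possible way to pair each archive type with a marker and a colour is a dictionary keyed by name, which avoids relying on list positions. This is only a sketch: the archive names and style choices below are hypothetical, not the ones used for the PAGES2k map.

```python
# Hypothetical archive types; in the lecture these come from the data frame
archive_types = ["tree", "coral", "glacier ice"]

# Candidate markers and RGB colours (arbitrary illustrative choices)
markers = ["p", "o", "d"]
colors = [(0.2, 0.8, 0.2), (1.0, 0.55, 0.0), (0.53, 0.80, 0.98)]

# Build one style entry per archive type
styles = {
    name: {"marker": marker, "color": color}
    for name, marker, color in zip(archive_types, markers, colors)
}

for name, style in styles.items():
    print(name, style)
```

A lookup such as `styles["coral"]["marker"]` then stays correct even if the order of the archive types changes.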

## Plotting a Map of Proxy Reconstruction Locations

In [None]:
# Markers and colors
markers = ["p", "p", "o", "v", "d", "*", "s", "s", "8", "D", "^"]
colors = [
    np.array([1.0, 0.83984375, 0.0]),
    np.array([0.73828125, 0.71484375, 0.41796875]),
    np.array([1.0, 0.546875, 0.0]),
    np.array([0.41015625, 0.41015625, 0.41015625]),
    np.array([0.52734375, 0.8046875, 0.97916667]),
    np.array([0.0, 0.74609375, 1.0]),
    np.array([0.25390625, 0.41015625, 0.87890625]),
    np.array([0.54296875, 0.26953125, 0.07421875]),
    np.array([1, 0, 0]),
    np.array([1.0, 0.078125, 0.57421875]),
    np.array([0.1953125, 0.80078125, 0.1953125]),
]

## Plotting a Map of Proxy Reconstruction Locations

In [None]:
# Plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.Robinson())
ax.set_title(f"PAGES2k Network (n={len(pages2k_data)})", fontsize=20, fontweight="bold")
ax.set_global()
ax.coastlines()
ax.add_feature(cfeature.LAND, facecolor="gray", alpha=0.3)
ax.gridlines(edgecolor="gray", linestyle=":")

# plot the different archive types
archive_types = pages2k_data.archiveType.unique()
for i, type_i in enumerate(archive_types):
    df = pages2k_data[pages2k_data["archiveType"] == type_i]
    # count the number of appearances of the same archive_type
    count = df["archiveType"].count()
    # generate the plot
    ax.scatter(df["geo_meanLon"],df["geo_meanLat"],marker=markers[i],c=colors[i],
        edgecolor="k",s=50,transform=ccrs.Geodetic(),label=f"{type_i} (n = {count})",)

# Legend
ax.legend(scatterpoints=1, bbox_to_anchor=(0, -0.4),loc="lower left",ncol=3,fontsize=15,)

# Reconstructing Past Changes in Ocean Climate

## Overview of Isotopes in Paleoclimate

Oxygen isotopes of corals can record changes in temperature associated with the phase of the El Niño-Southern Oscillation (ENSO) further back in time.

The two oxygen isotopes that are most commonly used in paleoclimate are oxygen 16 (<sup>16</sup>O), which is the **"lighter"** oxygen isotope, and oxygen 18 (<sup>18</sup>O), which is the **"heavier"** oxygen isotope.

![image-1.png](https://github.com/ClimateMatchAcademy/course-content/blob/main/tutorials/W1D4_Paleoclimate/images/t2_image1.png?raw=true)

Credit: [NASA Climate Science Investigations](https://www.ces.fau.edu/nasa/module-3/how-is-temperature-measured/isotopes.php)

## Overview of Isotopes in Paleoclimate

The two hydrogen isotopes that are most commonly used in paleoclimate are hydrogen (H), which is the **"lighter"** hydrogen isotope, and deuterium (D), which is the **"heavier"** hydrogen isotope.

![image-2.png](https://github.com/ClimateMatchAcademy/course-content/blob/main/tutorials/W1D4_Paleoclimate/images/t2_image2.png?raw=true)

Credit: [NASA Climate Science Investigations](https://www.ces.fau.edu/nasa/module-3/how-is-temperature-measured/isotopes.php)

## Overview of Isotopes in Paleoclimate

Changes in the ratio of the heavy to light isotope can reflect changes in different climate variables, depending on geographic location and the material being measured.

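The ratio is expressed in delta notation (δ), in units of per mille (‰), relative to a reference standard. For oxygen it is calculated as follows (the same form applies to the ratio of the heavy and light hydrogen isotopes):

$$
\delta^{18}\mathrm{O} = \left( \frac{\left(^{18}\mathrm{O}/^{16}\mathrm{O}\right)_{\mathrm{sample}}}{\left(^{18}\mathrm{O}/^{16}\mathrm{O}\right)_{\mathrm{standard}}} - 1 \right) \times 1000
$$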
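
As a minimal sketch of the delta calculation, the snippet below converts a measured isotope ratio to delta notation. The function name `delta_per_mille` and the sample ratio are hypothetical, and the reference ratio is an approximate value for the VSMOW water standard, assumed here only for illustration (the appropriate standard depends on the material being measured).

```python
# Approximate 18O/16O ratio of the VSMOW standard (assumed for illustration)
R_VSMOW = 0.0020052


def delta_per_mille(r_sample, r_standard=R_VSMOW):
    """Return the delta value (per mille) of a sample ratio relative to a standard."""
    return (r_sample / r_standard - 1.0) * 1000.0


# Hypothetical sample that is slightly depleted in the heavy isotope
r_sample = 0.0019952
d18o = delta_per_mille(r_sample)
print(f"d18O = {d18o:.2f} per mille")
```

A sample with the same ratio as the standard gives a delta of zero; samples depleted in the heavy isotope give negative values.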

## Assessing Variability Related to El Niño Using Pyleoclim Series

ENSO is a recurring climate pattern involving changes in sea surface temperature (SST) in the central and eastern tropical Pacific Ocean.

Oxygen isotopes ([&delta;<sup>18</sup>O](https://en.wikipedia.org/wiki/Δ18O)) of corals are a commonly used proxy for reconstructing changes in tropical Pacific SST and ENSO.

Long-lived corals are well-suited for studying paleo-ENSO variability because they store decades to centuries of sub-annually resolved proxy information in the tropical Pacific. 

The oxygen isotopes of corals are useful for studying ENSO because they record changes in sea-surface temperature (SST), with more positive values of &delta;<sup>18</sup>O corresponding to colder SSTs, and vice-versa.

## Assessing Variability Related to El Niño Using Pyleoclim Series

One approach for detecting ENSO from coral isotope data is to apply a 2- to 7-year bandpass filter to the &delta;<sup>18</sup>O records to highlight ENSO-related variability, and then quantitatively compare the bandpassed coral records to the Oceanic Niño Index (ONI).

While we won't be going into this amount of detail, you may use the methods sections of these papers as a guide: [Cobb et al. (2003)](https://www.nature.com/articles/nature01779), [Cobb et al. (2013)](https://www.science.org/doi/10.1126/science.1228246).
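
For intuition only, here is a rough sketch of what a 2- to 7-year bandpass does, applied to a synthetic series. It uses a simple FFT mask in NumPy, which is an assumption made for brevity and not the filtering method of the papers above; the function name `fft_bandpass` and all data are hypothetical.

```python
import numpy as np

# Synthetic monthly series (hypothetical data, not a coral record):
# a 4-year "ENSO-like" cycle plus a slower, larger 20-year cycle
dt = 1.0 / 12.0                      # time step in years
t = np.arange(1200) * dt             # 100 years of monthly time steps
enso_like = np.sin(2 * np.pi * t / 4.0)
slow = 2.0 * np.sin(2 * np.pi * t / 20.0)
signal = enso_like + slow


def fft_bandpass(x, dt, min_period, max_period):
    """Keep only Fourier components with periods in [min_period, max_period]."""
    freqs = np.fft.rfftfreq(len(x), d=dt)       # frequencies in cycles per year
    spectrum = np.fft.rfft(x)
    keep = (freqs >= 1.0 / max_period) & (freqs <= 1.0 / min_period)
    return np.fft.irfft(spectrum * keep, n=len(x))


filtered = fft_bandpass(signal, dt, min_period=2.0, max_period=7.0)

# The filtered series should track the 4-year cycle and drop the 20-year one
r = np.corrcoef(filtered, enso_like)[0, 1]
print(f"correlation with the 4-year cycle: {r:.3f}")
```

Real coral records are noisier and often unevenly sampled, so they would typically need interpolation and a more carefully designed filter than this mask.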

Let us look at the &delta;<sup>18</sup>O records and compare them to a plot of the ONI without this band-pass filtering, in part to highlight why the filtering is needed.

## Load coral oxygen isotope proxy reconstructions

The two coral records we'll look at are from [Palmyra Atoll](https://en.wikipedia.org/wiki/Palmyra_Atoll) and the [Line Islands](https://en.wikipedia.org/wiki/Line_Islands), both of which are in the tropical central Pacific Ocean.

Let's plot these approximate locations as well as the Niño 3.4 region.

In [None]:
fig = plt.figure(figsize=(9, 6))
ax = plt.axes(projection=ccrs.Robinson(central_longitude=180))
ax.stock_img()
ax.add_feature(cfeature.COASTLINE)
# Niño 3.4 region: 5°S-5°N, 170°W-120°W (i.e., 190°E-240°E)
rectangle = patches.Rectangle((190, -5),50,10,transform=ccrs.Geodetic(),
    edgecolor="k",facecolor="none",linewidth=3,)
ax.add_patch(rectangle)

rx, ry = rectangle.get_xy()
cx = rx + rectangle.get_width() / 2.0
cy = ry + rectangle.get_height() / 2.0

# add labels
ax.annotate(
    "Nino 3.4",
    (cx - 10, cy),
    color="w",
    transform=ccrs.PlateCarree(),
    weight="bold",
    fontsize=10,
    ha="center",
    va="center",
)

# add the proxy locations
ax.scatter(
    [-162.078333], [5.883611], transform=ccrs.Geodetic(), s=50, marker="v", color="w"
)
ax.scatter([-157.2], [1.7], transform=ccrs.Geodetic(), s=50, marker="v", color="w")

# add labels
ax.annotate("Palmyra Atoll", (-170, 10),
    color="w",transform=ccrs.Geodetic(),
    weight="bold",fontsize=10,
    ha="center",va="center",
)

ax.annotate(
    "Line Islands", (-144, -1),
    color="w",transform=ccrs.Geodetic(),
    weight="bold",fontsize=10,
    ha="center", va="center",
)

# change the map view to zoom in on central Pacific
ax.set_extent((120, 300, -25, 25), crs=ccrs.PlateCarree())

## Load coral oxygen isotope proxy reconstructions

To analyze and visualize paleoclimate proxy time series, we will be using [Pyleoclim](https://pyleoclim-util.readthedocs.io/en/latest/).

Pyleoclim is specifically designed for the analysis of paleoclimate data.

The package is built around `Series` objects, which can be directly manipulated for plotting and for time series-appropriate analysis and operations.

The `Series` object describes the fundamentals of a time series. 

## Load coral oxygen isotope proxy reconstructions

To create a Pyleoclim `Series`, we first need to load the data set, and then specify values for its various properties:

*   `time`: Time values for the time series
*   `value`: Paleo values for the time series
*   `time_name` (optional): Name of the time vector, (e.g., 'Time', 'Age'). This is used to label the x-axis on plots
*   `time_unit` (optional): The units of the time axis (e.g., 'years')
*   `value_name` (optional): The name of the paleo variable (e.g., 'Temperature')
*   `value_unit` (optional): The units of the paleo variable (e.g., 'deg C')
*   `label` (optional): Name of the time series (e.g., 'Nino 3.4')
*   `clean_ts` (optional): If True (default), remove NaNs and set an increasing time axis

## Load coral oxygen isotope proxy reconstructions

A common data format for datasets downloaded from the NOAA Paleoclimate Database is a templated text file, which contains helpful data and metadata. 

Take a look at our two datasets [here](https://www.ncei.noaa.gov/pub/data/paleo/coral/east_pacific/palmyra_2003.txt) and [here](https://www.ncei.noaa.gov/pub/data/paleo/coral/east_pacific/cobb2013-fan-modsplice-noaa.txt).

pandas lets us skip the descriptive information at the beginning of these text files and load the data directly into a `pandas.DataFrame` using `pd.read_csv()`.
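
A minimal sketch of this idea on a made-up, in-memory text file is shown below. The header lines and values are hypothetical and much shorter than the real NOAA files; only the `skiprows`, `sep`, `names`, and `header` arguments mirror the approach used in this lecture.

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature of a templated text file: header notes, then data
raw_text = """# Example archive header
# Description: made-up values for illustration only
# Columns: year, d18O
1990.04  -4.85
1990.13  -4.91
1990.21  -5.02
"""

df = pd.read_csv(
    StringIO(raw_text),
    skiprows=3,                 # skip the three header lines
    sep=r"\s+",                 # values are separated by whitespace
    names=["year", "d18O"],     # column names are supplied by hand
    header=None,
)
print(df)
```

For the real files, the number of rows to skip has to be found by inspecting the file, since the header length differs between datasets.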

## Load Palmyra coral data

In [None]:
# download the data using the url
filename_Palmyra = "palmyra_2003.txt"
url_Palmyra = (
    "https://www.ncei.noaa.gov/pub/data/paleo/coral/east_pacific/palmyra_2003.txt"
)
data_path = pooch_load(
    filelocation=url_Palmyra, filename=filename_Palmyra
)  # open the file

# from the data set, we only want the data related to Modern Living Coral.
# this data is between rows 6190 and 7539 of the dataset
rows = set(range(6190, 7540))

# use pandas to read in the text file
palmyra = pd.read_csv(
    data_path,
    skiprows=lambda x: x not in rows,  # skip every row outside the range above
    sep=r"\s+",  # values are separated (delimited) by whitespace
    encoding="ISO-8859-1",
    names=["CalendarDate", "d180"],
    header=None,
)

## Load Palmyra coral data

In [None]:
palmyra.head()

## Load Palmyra coral data

Now that we have the data in a dataframe, we can pull the relevant columns of this dataframe into a `Series` object in Pyleoclim, which will allow us to organize the relevant metadata so that we can get a well-labeled, publication-quality plot:

In [None]:
ts_palmyra = pyleo.Series(
    time=palmyra["CalendarDate"],
    value=palmyra["d180"],
    time_name="Calendar date",
    time_unit="Years",
    value_name=r"$d18O$",
    value_unit="per mille",
    label="Palmyra Coral",
)

## Load Palmyra coral data

Since we want to compare datasets based on different measurements (coral &delta;<sup>18</sup>O and the ONI, i.e., a temperature anomaly), it's helpful to standardize the data by subtracting its estimated mean and dividing by its estimated standard deviation.
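
As a quick sketch of what standardizing means, the snippet below computes z-scores by hand with NumPy. The values are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption made here; a library routine may use a different convention.

```python
import numpy as np

# Hypothetical d18O values (per mille), for illustration only
values = np.array([-4.8, -5.1, -4.9, -5.3, -5.0])

# z-score: subtract the sample mean, divide by the sample standard deviation
z = (values - values.mean()) / values.std(ddof=1)

print(z.round(2))
```

After this step the series has zero mean and unit standard deviation, so records in different units can share an axis.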

Thankfully Pyleoclim has a [function](https://pyleoclim-util.readthedocs.io/en/v0.7.4/core/ui.html#pyleoclim.core.ui.Series.standardize) to do that for us.

In [None]:
palmyra_stnd = ts_palmyra.standardize()
palmyra_stnd

## Load Line Island coral data

We will repeat these steps for the other dataset.

In [None]:
# Download the data using the url
filename_cobb2013 = "cobb2013-fan-modsplice-noaa.txt"
url_cobb2013 = "https://www.ncei.noaa.gov/pub/data/paleo/coral/east_pacific/cobb2013-fan-modsplice-noaa.txt"
data_path2 = pooch_load(
    filelocation=url_cobb2013, filename=filename_cobb2013
)  # open the file

# From the data set, we only want rows 127 to 800 of the file
rows = set(range(127, 801))
line = pd.read_csv(
    data_path2,
    skiprows=lambda x: x not in rows,  # skip every row outside the range above
    sep=r"\s+",  # values are separated (delimited) by whitespace
    encoding="ISO-8859-1",
    names=["age", "d18O"],
    header=None,
)

## Load Line Island coral data

In [None]:
line.head()

## Load Line Island coral data

In [None]:
ts_line = pyleo.Series(
    time=line["age"],
    value=line["d18O"],
    time_name="Calendar date",
    time_unit="Years",
    value_name=r"$d18O$",
    value_unit="per mille",
    label="Line Island Coral",
)

## Load Line Island coral data

In [None]:
line_stnd = ts_line.standardize()
line_stnd

## Plot the Data Using Multiseries

We will use the built-in features of a [`MultipleSeries`](https://pyleoclim-util.readthedocs.io/en/latest/core/api.html#multipleseries-pyleoclim-multipleseries) object to plot our coral proxy data side by side.

To create a `pyleo.MultipleSeries`, we first create a list with our `pyleo.Series` objects and then pass this into a `pyleo.MultipleSeries`.

In [None]:
# combine into a list
nino_comparison = [palmyra_stnd, line_stnd]

## Plot the Data Using Multiseries

In [None]:
# create multiseries
nino = pyleo.MultipleSeries(nino_comparison, name="El Nino Comparison")

## Plot the Data Using Multiseries

In [None]:
# plot the time series of both datasets
fig, ax = nino.stackplot(time_unit="year", xlim=[1960, 2000], colors=["b", "g"])

## Your turn

Recall that as SST becomes colder, &delta;<sup>18</sup>O becomes more positive, and vice versa. Compare the figure below of the ONI to the time series of coral &delta;<sup>18</sup>O you plotted above and answer the questions below.

![](https://climatedataguide.ucar.edu/sites/default/files/styles/extra_large/public/2022-03/indices_oni_2_2_lg.png?itok=Zew3VK_4)

Credit: [UCAR](https://climatedataguide.ucar.edu/sites/default/files/styles/extra_large/public/2022-03/indices_oni_2_2_lg.png?itok=Zew3VK_4)

1. Do the ENSO events recorded by the ONI agree with the coral data?
2. What are some considerations you can think of when comparing proxies such as this to the ONI?

## Make Warming Stripes Plot

We can also make a warming stripes plot for these series using the [`.stripes()`](https://pyleoclim-util.readthedocs.io/en/latest/core/api.html#pyleoclim.core.multipleseries.MultipleSeries.stripes) method, where darker red stripes indicate a warmer eastern Pacific and possibly an El Niño phase, and darker blue stripes indicate a cooler eastern Pacific and possibly a La Niña phase.

Can you see the trend present in the data?

In [None]:
fig, ax = nino.stripes(
    ref_period=(1960, 1990), time_unit="year", show_xaxis=True, cmap="RdBu"
)
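
As a rough illustration of what a reference period does in a stripes plot, the sketch below computes anomalies relative to the 1960 to 1990 mean of a synthetic series. This reflects the general idea of anomalies only; it is an assumption, not a description of Pyleoclim's internal implementation, and the data are hypothetical.

```python
import numpy as np

# Synthetic annual series (hypothetical): a slow upward drift from 1950 to 2000
years = np.arange(1950, 2001)
values = 0.02 * (years - 1950)

# Mean over the reference period (1960 to 1990, inclusive)
in_ref = (years >= 1960) & (years <= 1990)
ref_mean = values[in_ref].mean()

# Anomalies: positive means above the reference mean, negative below
anomalies = values - ref_mean
print(anomalies[[0, -1]])
```

Each stripe's colour would then be chosen from the sign and size of the anomaly for that time step.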

## Questions?

## See you next class!