# pystrat-tutorial

This Jupyter notebook is intended to provide a minimal working example (MWE) of core pystrat functionality. The pystrat package can be found here: https://github.com/yuempark/pystrat

The user is encouraged to read the docstrings for each function for further information.

Note that all data used in this notebook have been modified or fabricated for instructional purposes and do not reflect real measurements.

In [1]:
# third-party scientific modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# pystrat
from pystrat import pystrat

To plot figures inline with text/code, we use the Jupyter "magic" command below.

In [2]:
%matplotlib inline 

You can save any figure by adding the following line before the `plt.show()` command:
```python
plt.savefig('<name_of_figure>.pdf')
```
The format of the saved figure is specified with the extension of the file name (example above uses pdf).

By default, matplotlib exports figures with all labels and text annotations converted to vector paths. However, it is often preferable to preserve these as text objects so that font, font size, etc. can easily be manipulated when importing the figure into Illustrator, for example. To do this, add one of the following `rcParams` lines (depending on your preferred format) to a new cell and execute it.
```python
import matplotlib

matplotlib.rcParams['svg.fonttype'] = 'none'  # SVG: keep text as text
matplotlib.rcParams['pdf.fonttype'] = 42      # PDF: embed TrueType (Type 42) fonts
```

In [3]:
litho = pd.read_csv('example-data/lithostratigraphy.csv')
litho.head()

Unnamed: 0,THICKNESS,LITHOLOGY,GRAIN_SIZE,COLOUR,FEATURES
0,14.8,si,sts,purple,
1,0.2,tuff,tuff,tuff,
2,1.5,si,sts,purple,
3,4.8,cover,cover,cover,
4,3.7,si,sts,purple,


In [4]:
test = pystrat.Section(litho['THICKNESS'], litho['LITHOLOGY'])

In [5]:
test.total_thickness

2435.5
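
As a rough illustration of what a total thickness represents (this is not pystrat's implementation), the sketch below sums the five unit thicknesses shown in `litho.head()` and derives the base and top height of each unit. The helper name `unit_boundaries` is hypothetical.

```python
from itertools import accumulate

# thicknesses (m) of the first five units shown in litho.head()
thicknesses = [14.8, 0.2, 1.5, 4.8, 3.7]

def unit_boundaries(thicknesses):
    """Return (base, top) heights of each unit, stacking units upward from 0 m."""
    tops = list(accumulate(thicknesses))
    bases = [0.0] + tops[:-1]
    return list(zip(bases, tops))

boundaries = unit_boundaries(thicknesses)
total = boundaries[-1][1]   # top of the last unit equals the summed thickness

for (base, top), t in zip(boundaries, thicknesses):
    print(f'{base:6.1f} m to {top:6.1f} m (thickness {t} m)')
print(f'total thickness of these five units: {total:.1f} m')
```

The value returned by `test.total_thickness` above covers every unit in the file, not just these five.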

In [6]:
test.add_facies_attribute('grain_size', litho['GRAIN_SIZE'])
test.add_facies_attribute('color', litho['COLOUR'])
test.add_facies_attribute('features', litho['FEATURES'])

In [7]:
test.features

array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, 'tectonized', nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, 'tectonized', nan, nan,
       nan, 'tectonized', nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, 'tectonized', nan, nan, nan, 'tectonized',
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 'tectonized',
       nan, nan, nan, nan, 'tectonized', nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, 'tectonized', nan, nan, nan,
       nan, 'tectonized', nan, 'tectonized', nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, 'tectonized', nan, 'tectonized', nan, nan,
       nan, nan, 'tectonized', nan, nan, nan, nan, nan, nan, 'tectonized',
       nan, nan], dtype=object)

In [8]:
chemo = pd.read_csv('example-data/chemostratigraphy.csv')
chemo.head()

Unnamed: 0,CARB_SAMPLE,CARB_REMARKS,CARB_HEIGHT,CARB_UNIT,CARB_d13C,CARB_d18O
0,1.5,,1.5,0,0.766,-0.814
1,5.6,,5.6,0,1.737,0.893
2,12.0,,12.0,0,1.882,0.732
3,21.4,,21.4,4,1.398,-0.682
4,25.5,,25.5,6,1.661,-0.516


In [9]:
test.add_data_attribute('d13C', chemo['CARB_HEIGHT'], chemo['CARB_d13C'])

In [11]:
test.d13C.values

array([ 0.766,  1.737,  1.882,  1.398,  1.661,  0.744,  0.911,  1.485,
        1.284,  0.788,  1.7  ,  2.366,  2.595,  0.37 ,  1.965,  1.584,
        1.19 ,  1.978,  1.85 ,  1.925,  1.752,  2.305,  2.032, -3.53 ,
        4.127,  3.945,  4.716,  4.903,  5.565,  5.994,  5.968,  6.284,
        5.2  ,  6.365,  7.059,  7.019,  6.85 ,  7.035,  7.22 ,  7.146,
        5.267,  7.005,  7.071,  6.598,  7.178,  5.789,  5.561,  5.677,
        5.666,  4.176,  5.937,  5.512,  5.538,  5.437,  4.612,  5.955,
        4.503,  5.339,  5.683,  4.12 ,  6.018,  5.719,  6.641,  6.595,
        6.463,  6.454,  6.727,  6.719,  6.799,  6.61 ,  6.701,  6.89 ,
        6.097,  5.819,  6.525,  6.44 ,  6.65 ,  6.892,  7.264,  6.887,
        6.995,  6.428,  6.253,  7.002,  6.891,  7.109,  7.089,  6.816,
       16.429,  7.078,  5.266,  6.553,  6.682,  6.829,  6.946,  6.863,
        5.859,  5.841,  4.913,  5.165,  5.12 ,  6.534,  6.513,  6.363,
        6.535,  6.446,  6.611,  6.909,  6.51 ,  6.749,  6.658,  6.457,
      

# Needs Review

The sections below predate the new OOP system (the `Section` class used above) and still need to be reviewed against it.

## Plot a Stratigraphic Section

Import the data. The file must follow the formatting of `data_template.csv`: the data .csv must contain at least two headers, one of which MUST be named `THICKNESS`. The other columns may be named whatever the user desires.

pystrat functions used:

* `read_data`

In [None]:
data = read_data('templates/data_template.csv', header=4)
data.head()

Import the formatting - note that it must follow the formatting of `formatting_template.csv`:

* Columns 1-4 are used to set the colour of the boxes
    * columns 1-3 must be called `r`, `g`, and `b` (for red, green, and blue), and their values must be between 0 and 255
    * the header of column 4 must match one of the headers used in the data .csv, and every value in that data column must appear in this column
* Column 5 should be left blank for readability
* Columns 6-7 are used to set the width of the boxes
    * column 6 must be called `width`
    * the header of column 7 must match one of the headers used in the data .csv, and every value in that data column must appear in this column
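
Matplotlib expects RGB colours as floats between 0 and 1, so the 0-255 values in the `r`, `g`, and `b` columns have to be rescaled somewhere before plotting. The following is a minimal sketch of that conversion; the helper `rgb255_to_unit` and the example rows are hypothetical and are not taken from pystrat or from the template file.

```python
def rgb255_to_unit(r, g, b):
    """Convert 0-255 channel values to the 0-1 floats matplotlib accepts."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f'channel value {channel} is outside 0-255')
    return (r / 255, g / 255, b / 255)

# hypothetical formatting rows: (r, g, b, label)
rows = [(128, 0, 128, 'purple'), (255, 255, 255, 'cover')]

# map each label to a colour tuple that can be passed to matplotlib
colours = {label: rgb255_to_unit(r, g, b) for r, g, b, label in rows}
print(colours)
```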

pystrat functions used:

* `read_formatting`

In [None]:
formatting = read_formatting('templates/formatting_template.csv')

Integrity check - check that values in the data are a subset of values in the formatting:

pystrat functions used:

* `integrity_check`

In [None]:
integrity_check(data, formatting)
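
The subset requirement can be pictured with plain Python sets. This is only a sketch of the idea, not the code inside `integrity_check`; the function `missing_values` and the column contents are hypothetical.

```python
def missing_values(data_values, formatting_values):
    """Return the data values that have no entry in the formatting, sorted."""
    return sorted(set(data_values) - set(formatting_values))

# hypothetical column contents, not the real template files
data_colours = ['purple', 'tuff', 'purple', 'cover']
formatting_colours = ['purple', 'cover', 'grey']

missing = missing_values(data_colours, formatting_colours)
if missing:
    print('values missing from the formatting:', missing)
else:
    print('all data values are covered by the formatting')
```

Here `'tuff'` appears in the data but not in the formatting, so it would need a row in the formatting .csv before plotting.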

Plot:

pystrat functions used:

* `initiate_figure`
* `add_data_axis`

In [None]:
# set up the strat ratio which sets the vertical scale of the section
strat_ratio = 0.004

# initiate the figure and set size
fig, ax = initiate_figure(data, formatting, strat_ratio, figwidth=12,
                          width_ratios=[1,1,1,1,0.5,0.5], linewidth=0.5)

# add data for d13C
add_data_axis(fig, ax, 2, data['CARB_d13C'], data['CARB_HEIGHT'], 'scatter')
ax[2].set_xlabel(r'$\delta^{13}$C')
ax[2].set_xlim(-15,10)
ax[2].set_xticks([-15,-10,-5,0,5,10])
ax[2].xaxis.grid(ls='--')

# add data for d18O
add_data_axis(fig, ax, 3, data['CARB_d18O'], data['CARB_HEIGHT'], 'scatter',
              color='orange')
ax[3].set_xlabel(r'$\delta^{18}$O')
ax[3].set_xlim(-20,5)
ax[3].set_xticks([-20,-15,-10,-5,0,5])
ax[3].xaxis.grid(ls='--')

# add height of paleomag samples
add_data_axis(fig, ax, 4, np.zeros(len(data['PM_HEIGHT'])), data['PM_HEIGHT'],
              'scatter', color=[0.8, 0.2, 0.6])
ax[4].set_xticks([])
ax[4].set_title('pmag sites')

# add height of ash samples
add_data_axis(fig, ax, 5, np.zeros(len(data['ASH_HEIGHT'])), data['ASH_HEIGHT'],
              'scatter', color='#AF4A34',marker='*')
ax[5].set_xticks([])
ax[5].set_title('ash')

plt.show()

## LOWESS Fitting

Locally weighted scatter plot smoothing. To perform the basic LOWESS:

pystrat functions used:

* `lowess_fit`

In [None]:
# the LOWESS fit
height_LOWESS, d13C_LOWESS = lowess_fit(data['CARB_HEIGHT'], data['CARB_d13C'], frac=0.5)

# plot the results
fig, ax = plt.subplots(figsize=(10,3))

ax.scatter(data['CARB_HEIGHT'], data['CARB_d13C'], c='C7', label='data')
ax.plot(height_LOWESS, d13C_LOWESS, c='C1', lw=3, label='LOWESS')
ax.yaxis.grid(ls='--')
ax.set_xlabel('stratigraphic height [m]')
ax.set_ylabel(r'$\delta^{13}$C')
ax.legend()

plt.show()
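
For readers unfamiliar with the method, the sketch below shows the core of LOWESS in pure Python: for each point, a straight line is fitted to its nearest neighbours using tricube weights, and the fitted value at that point is kept. It is a single-pass version without the robustness iterations of full LOWESS, and it is not necessarily what `lowess_fit` does internally. The function name `lowess_sketch` and the toy data are hypothetical.

```python
def lowess_sketch(x, y, frac=0.5):
    """Single-pass LOWESS: a tricube-weighted straight-line fit around each x.

    Assumes the x values are distinct enough that every local bandwidth is > 0.
    """
    n = len(x)
    k = max(3, int(round(frac * n)))      # number of neighbours per local fit
    fitted = []
    for x0 in x:
        # distance to the k-th nearest neighbour sets the local bandwidth
        dists = [abs(xi - x0) for xi in x]
        bandwidth = sorted(dists)[k - 1]
        # tricube weights: 1 at x0, falling to 0 at the bandwidth
        w = [(1 - (d / bandwidth) ** 3) ** 3 if d < bandwidth else 0.0
             for d in dists]
        # weighted least-squares line through the neighbourhood
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0   # fall back to the weighted mean
        fitted.append(my + slope * (x0 - mx))
    return fitted

heights = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
values = [2.0 * h + 1.0 for h in heights]       # exactly linear toy data
smooth = lowess_sketch(heights, values, frac=0.5)
```

Because every local fit is a straight line, exactly linear input is returned unchanged. A larger `frac` uses more neighbours per fit and gives a smoother curve.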

To normalize the data against the LOWESS fit:

pystrat functions used:

* `lowess_normalize`

In [None]:
# the LOWESS fit
height_LOWESS, d13C_LOWESS, d13C_norm = lowess_normalize(data['CARB_HEIGHT'], data['CARB_d13C'], frac=0.5)

# plot the results
fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(10,7))

ax[0].scatter(data['CARB_HEIGHT'], data['CARB_d13C'], c='C7', label='data')
ax[0].plot(height_LOWESS, d13C_LOWESS, c='C1', lw=3, label='LOWESS')
ax[0].yaxis.grid(ls='--')
ax[0].set_ylabel(r'$\delta^{13}$C')
ax[0].legend()

ax[1].scatter(data['CARB_HEIGHT'], d13C_norm, c='C1', label='normalized data')
ax[1].yaxis.grid(ls='--')
ax[1].set_ylabel(r'normalized $\delta^{13}$C')
ax[1].legend()

plt.show()

## Calculate Stratigraphic Thickness

Calculate the stratigraphic thickness between two points, given the following for both the start and end points:

* latitude (decimal degrees)
* longitude (decimal degrees)
* elevation (m)
* strike of bedding (right-hand rule, degrees)
* dip of bedding (degrees)

pystrat functions used:

* `calculate_stratigraphic_thickness`

In [None]:
# the data
lat = (28.00181863, 28.00386025)
lon = (108.7997006, 108.8009878)
elev = (1159, 1167)
strike = (342, 335)
dip = (41, 36)

d = calculate_stratigraphic_thickness(lat, lon, elev, strike, dip)

print('The stratigraphic distance between the two points is: ' + str(np.round(d,2)) + ' m')
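
The geometry behind such a calculation can be sketched as follows: convert the two positions to east, north, and up offsets in metres, then project that offset onto the pole (normal) to bedding. This is an illustrative sketch under stated assumptions, not pystrat's implementation: it uses a spherical Earth with a local flat-map approximation, and it simply averages the two strike and dip measurements (which would be wrong if the strikes straddled 0/360). All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0   # mean Earth radius; spherical approximation

def local_offsets(lat, lon, elev):
    """East, north, and up offsets (m) from the first point to the second."""
    lat0, lat1 = (math.radians(v) for v in lat)
    lon0, lon1 = (math.radians(v) for v in lon)
    mean_lat = 0.5 * (lat0 + lat1)
    east = EARTH_RADIUS_M * math.cos(mean_lat) * (lon1 - lon0)
    north = EARTH_RADIUS_M * (lat1 - lat0)
    return east, north, elev[1] - elev[0]

def thickness_from_offsets(east, north, up, strike, dip):
    """Distance between the two points measured perpendicular to bedding."""
    dip_direction = math.radians(strike + 90.0)   # right-hand rule
    dip_rad = math.radians(dip)
    # upward-pointing unit normal (pole) to the bedding plane
    pole = (math.sin(dip_rad) * math.sin(dip_direction),
            math.sin(dip_rad) * math.cos(dip_direction),
            math.cos(dip_rad))
    return abs(east * pole[0] + north * pole[1] + up * pole[2])

# same inputs as the cell above
lat = (28.00181863, 28.00386025)
lon = (108.7997006, 108.8009878)
elev = (1159, 1167)
strike = (342, 335)
dip = (41, 36)

# assumption: average the two bedding measurements
mean_strike = sum(strike) / 2
mean_dip = sum(dip) / 2

d_sketch = thickness_from_offsets(*local_offsets(lat, lon, elev),
                                  mean_strike, mean_dip)
print(round(d_sketch, 2), 'm')
```

Two limiting cases make the projection easy to check: with horizontal beds the thickness is just the elevation difference, and a step that stays on the bedding plane gives zero thickness. The result may differ from `calculate_stratigraphic_thickness` if that function handles the two bedding measurements or the Earth model differently.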

If you have a .csv with a number of stratigraphic thicknesses that need to be calculated, there is a function for that too. Note that the .csv must follow the formatting of `covers_template.csv`:

pystrat functions used:

* `calculate_stratigraphic_thickness_csv`

In [None]:
covers = calculate_stratigraphic_thickness_csv('templates/covers_template.csv')
covers

## Calculate Distance to Specified Units

Calculate the closest stratigraphic distance of each sample to a set of units:

pystrat functions used:

* `distance_to_units`

In [None]:
# let's say we are interested in seeing how close the samples are to either 'cover' or 'tuff' units
units = ['cover', 'tuff']

# the header of the column in 'data' in which to find the units in 'units'
unit_header = 'GRAIN_SIZE'

# run the function
unit_d = distance_to_units(data, data['CARB_HEIGHT'], units, unit_header)
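
The idea of a "closest stratigraphic distance" can be sketched in pure Python: stack the unit thicknesses into base and top heights, keep the intervals whose label is one of the units of interest, and take the smallest gap between each sample and those intervals. This is a hedged sketch, not the body of `distance_to_units`; the function name is hypothetical, the thicknesses and grain sizes are the five rows shown in `litho.head()`, and the sample heights are made up.

```python
from itertools import accumulate

def distance_to_target_units(sample_heights, thicknesses, labels, targets):
    """Smallest distance (m) from each sample to any unit labelled in targets."""
    tops = list(accumulate(thicknesses))
    bases = [0.0] + tops[:-1]
    intervals = [(b, t) for b, t, lab in zip(bases, tops, labels)
                 if lab in targets]
    distances = []
    for h in sample_heights:
        # 0 inside a target unit, otherwise the gap to its nearer boundary
        gaps = [max(b - h, h - t, 0.0) for b, t in intervals]
        distances.append(min(gaps) if gaps else float('inf'))
    return distances

thicknesses = [14.8, 0.2, 1.5, 4.8, 3.7]
grain_sizes = ['sts', 'tuff', 'sts', 'cover', 'sts']
sample_heights = [12.0, 15.7, 18.0]          # hypothetical sample heights (m)

sketch_d = distance_to_target_units(sample_heights, thicknesses,
                                    grain_sizes, ['cover', 'tuff'])
print([round(d, 2) for d in sketch_d])
```

In this toy section the tuff spans 14.8-15.0 m and the cover spans 16.5-21.3 m, so the sample at 18.0 m sits inside the cover and has a distance of zero.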

We can plot the data, with samples within, say, 60 cm of the specified units coloured differently:

In [None]:
# initiate the figure
fig, ax = initiate_figure(data, formatting, strat_ratio, 6, [1,1], linewidth=0.5, features=False)

# add data for d13C that is below the threshold
add_data_axis(fig, ax, 1, data[unit_d<0.6]['CARB_d13C'], data[unit_d<0.6]['CARB_HEIGHT'], 'scatter',
              color='C1', label='<60cm')

# add data for d13C that is above the threshold
add_data_axis(fig, ax, 1, data[unit_d>=0.6]['CARB_d13C'], data[unit_d>=0.6]['CARB_HEIGHT'], 'scatter',
              color='C7', label='>=60cm')

# prettify
ax[1].set_xlabel(r'$\delta^{13}$C')
ax[1].set_xlim(-15,10)
ax[1].set_xticks([-15,-10,-5,0,5,10])
ax[1].xaxis.grid(ls='--')
ax[1].legend()

plt.show()

## Assign Units to Samples

The functionality described here is useful if the user wishes to assign a stratigraphic unit (taken from the data .csv) to collected samples. This function also corrects for addition/subtraction errors made in the field, and assigns the correct stratigraphic height to collected samples.

Note that this function has been designed specifically around sample collection/logging conventions used in the Swanson-Hysell Group, and may not apply if your conventions differ:

* the `recorded_height` parameter is the height of each sample as recorded in the field
* the `remarks` parameter is used to denote calculation errors made in the field
    * if the true height of the sample is X m above the recorded height, write `ADD X`
    * if the true height of the sample is X m below the recorded height, write `SUB X`
    * these corrections only need to be noted at the first sample where the correction comes into effect (i.e. it will apply to all following samples until a new remark is found)
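
The remark convention above can be sketched as a running offset applied to the recorded heights. This is an illustration, not the code inside `sample_curate`. It assumes that a new remark replaces the previous correction rather than adding to it (the description above does not say which), and the function name and example values are hypothetical.

```python
def corrected_heights(recorded_heights, remarks):
    """Apply 'ADD X' / 'SUB X' remarks to the recorded sample heights.

    Assumption: each new remark replaces the previous correction.
    """
    offset = 0.0
    corrected = []
    for height, remark in zip(recorded_heights, remarks):
        # empty cells (None or NaN) leave the current correction unchanged
        if isinstance(remark, str) and remark.strip():
            action, amount = remark.split()
            if action == 'ADD':
                offset = float(amount)
            elif action == 'SUB':
                offset = -float(amount)
            else:
                raise ValueError(f'unrecognised remark: {remark!r}')
        corrected.append(height + offset)
    return corrected

recorded = [1.0, 2.0, 3.0, 4.0]
remarks = [None, 'ADD 0.5', None, 'SUB 1']
true_heights = corrected_heights(recorded, remarks)
print(true_heights)
```

The third sample has no remark of its own but inherits the `ADD 0.5` noted on the second sample.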
    
Below is an example of the recommended way to apply these functions:

pystrat functions used:

* `sample_curate`

In [None]:
# do the calculations
data_sample_info = sample_curate(data, data['CARB_SAMPLE'], data['CARB_REMARKS'])

The code will flag samples that are on unit boundaries by adding 0.5 to the unit number. User input is required to correctly identify which unit these samples belong to:

* if the sample comes from the lower unit, subtract 0.5
* if the sample comes from the upper unit, add 0.5
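
The flagging convention can be illustrated with the sketch below, which assigns a unit number to a sample height and returns a half-integer when the sample falls on the contact between two units. It assumes units are numbered from 0 at the base of the section and that a boundary sample is reported as the lower unit's number plus 0.5; both are assumptions made for this example, as are the function name and the toy thicknesses.

```python
import math

def assign_unit(height, thicknesses, tol=1e-9):
    """Return the unit number for a sample, or n + 0.5 on the n / n+1 contact."""
    top = 0.0
    last = len(thicknesses) - 1
    for i, thickness in enumerate(thicknesses):
        top += thickness
        # sample sits on the contact between unit i and unit i + 1
        if i < last and math.isclose(height, top, abs_tol=tol):
            return i + 0.5
        if height < top:
            return float(i)
    return float(last)          # at or above the top of the section

thicknesses = [2.0, 3.0, 1.0]   # toy section with contacts at 2 m and 5 m
units = [assign_unit(h, thicknesses) for h in (1.0, 2.0, 4.0, 5.0, 6.0)]
print(units)
```

A sample flagged as 0.5 would then be edited by hand to 0 (lower unit) or 1 (upper unit), as described above.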

To make this manual process easier, the following snippet of code prints the code necessary to make adjustments to the unit number:

pystrat functions used:

* `print_unit_edit_code`

In [None]:
# print the code needed to edit samples that are on unit boundaries
print_unit_edit_code(data_sample_info, 'data_sample_info')

# also show the recorded height and true height of these samples
mask = (data_sample_info['unit'] != np.floor(data_sample_info['unit']))
data_sample_info[mask]

Copy and paste the code that was printed by `print_unit_edit_code` into a new cell (as below), and edit the unit numbers as appropriate:

In [None]:
data_sample_info.loc[101,'unit'] = 77
data_sample_info.loc[120,'unit'] = 81
data_sample_info.loc[257,'unit'] = 126

To save the corrected heights and sample units, display the full table and copy and paste it into the original data .csv:

In [None]:
pd.set_option("display.max_rows",9999)
data_sample_info