# Creating figures and tables: source code

This notebook contains the code used in the 'creating figures and tables' activity.

#### Update pandas version

Run each of the next two cells once, in sequence, then comment them out and restart the kernel. The first pins pandas to version 1.2.4; the second installs the `tabulate` package, which pandas relies on to render tables in markdown with `.to_markdown()`.

In [1]:
# pip install pandas==1.2.4

In [2]:
# pip install tabulate
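
After restarting the kernel, a quick check along the following lines can confirm which pandas version is active and whether `tabulate` can be found. This is only a sketch: the variable name `has_tabulate` is made up for illustration and is not used elsewhere in the notebook.

```python
import importlib.util

import pandas as pd

# version of pandas the kernel is currently using
print('pandas version:', pd.__version__)

# look for tabulate without importing it (hypothetical helper variable)
has_tabulate = importlib.util.find_spec('tabulate') is not None
print('tabulate installed:', has_tabulate)
```

If the second line prints `False`, `.to_markdown()` will raise an import error until `tabulate` is installed.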

In [3]:
import numpy as np
import pandas as pd
import altair as alt
from sklearn.decomposition import PCA

### Data

Import and tidy diatom relative abundance data.

In [4]:
# import diatom data
diatoms_raw = pd.read_csv('data/barron-diatoms.csv')
diatoms_raw.head(5)

# replace NaNs by 0
diatoms_mod1 = diatoms_raw.fillna(0)
diatoms_mod1.loc[4:5, :]

# set depth, age to indices
diatoms_mod2 = diatoms_mod1.set_index(['Depth', 'Age'])

# store sample sizes
sampsize = diatoms_mod2['Num.counted']

# divide counts by sample size to get relative abundances
diatoms_mod3 = diatoms_mod2.div(sampsize, axis = 0)

# drop num.counted and reset index
diatoms = diatoms_mod3.drop(columns = 'Num.counted').reset_index()
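
To see what the tidying steps do, it can help to run them on a few rows where the arithmetic is easy to check by hand. The sketch below uses made-up counts (the columns `taxon_a` and `taxon_b` and all values are hypothetical, not taken from `barron-diatoms.csv`) and chains the same operations as the cell above.

```python
import pandas as pd

# hypothetical toy counts: two taxa, three samples
toy = pd.DataFrame({
    'Depth': [0.0, 0.05, 0.10],
    'Age': [1.33, 1.37, 1.42],
    'taxon_a': [5, 8, None],      # missing count in the third sample
    'taxon_b': [15, 12, 10],
    'Num.counted': [200, 200, 100],
})

# treat missing counts as zero and move depth, age into the index
toy_indexed = toy.fillna(0).set_index(['Depth', 'Age'])

# divide each row by its sample size, then drop the sample size column
toy_rel = (
    toy_indexed
    .div(toy_indexed['Num.counted'], axis=0)
    .drop(columns='Num.counted')
    .reset_index()
)

print(toy_rel)
```

Here 5 of 200 counted becomes 0.025, and the missing count becomes 0. The rows need not sum to 1, because only some of the counted taxa appear as columns.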

### Figure 1

Sea surface temperature plot to include with introduction.

In [5]:
# import sea surface temp reconstruction
seatemps = pd.read_csv('data/barron-sst.csv')

# line plot of time series
line = alt.Chart(seatemps).mark_line().encode(
    x = alt.X('Age', title = 'Thousands of years before present'),
    y = 'SST'
)

# highlight region with large variations (11 to 15 thousand years before present)
highlight = alt.Chart(
    pd.DataFrame(
        {'SST': np.linspace(0, 14, 100),
         'lwr': np.repeat(11, 100),
         'upr': np.repeat(15, 100)}
    )
).mark_area(opacity = 0.2, color = 'orange').encode(
    y = 'SST',
    x = alt.X('lwr', title = 'Thousands of years before present'),
    x2 = 'upr'
)

# add smooth trend
smooth = line.transform_loess(
    on = 'Age',
    loess = 'SST',
    bandwidth = 0.2
).mark_line(color = 'black')

# layer
fig1_temp = line + highlight + smooth

# display
fig1_temp

To save this figure, click the '...' icon next to the chart and save in the format of your choice; this will initiate a download. (SVG is a vector format, so it scales without losing resolution; PNG can look grainy when enlarged.) Then upload the file to your 'figures' directory.

Good practices:
* store your figure as its own chart in your code notebook (here named `fig1_temp`);
* name the chart to match the name the figure will have in the report.

A naming convention that works for me is '`fig` + [figure number] + `_` + [descriptive name]'.

### Table 1

Example rows to include in report with data description.

Good practices:
* store your table exactly as you want it to appear as its own dataframe;
* name the dataframe to match the name the table will have in the report.

A naming convention that works for me is '`tbl` + [table number] + `_` + [descriptive name]'. If you were to export a table as a .csv, give it the same name.

In [6]:
# render for report
tbl1_datarows = diatoms.head()

print(tbl1_datarows.to_markdown())

|    |   Depth |   Age |    A_curv |    A_octon |   ActinSpp |   A_nodul |   CoscinSpp |   CyclotSpp |   Rop_tess |   StephanSpp |
|---:|--------:|------:|----------:|-----------:|-----------:|----------:|------------:|------------:|-----------:|-------------:|
|  0 |    0    |  1.33 | 0.0248756 | 0.00995025 |   0.159204 | 0.0696517 |    0.104478 |    0.109453 | 0.00497512 |   0.00497512 |
|  1 |    0.05 |  1.37 | 0.04      | 0.01       |   0.155    | 0.08      |    0.1      |    0.08     | 0.035      |   0.01       |
|  2 |    0.1  |  1.42 | 0.04      | 0.03       |   0.165    | 0.09      |    0.145    |    0.035    | 0.005      |   0.005      |
|  3 |    0.15 |  1.46 | 0.055     | 0.005      |   0.105    | 0.005     |    0.06     |    0.14     | 0.125      |   0.015      |
|  4 |    0.2  |  1.51 | 0.0366667 | 0.00333333 |   0.126667 | 0.01      |    0.06     |    0.08     | 0.01       |   0          |
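
If the table were also exported as a .csv under the same name, the step might look like the sketch below. It uses three columns from the first two rows shown above, and a temporary directory stands in for wherever tables are kept; `tbl_toy`, `csv_path`, and `tbl_check` are illustrative names. Rounding before export is an optional extra, not something the cell above does.

```python
import os
import tempfile

import pandas as pd

# stand-in for tbl1_datarows: three columns from the first two rows above
tbl_toy = pd.DataFrame({
    'Depth': [0.0, 0.05],
    'Age': [1.33, 1.37],
    'A_curv': [0.0248756, 0.04],
})

# optional: round so every column shows a consistent number of digits
tbl_toy_rounded = tbl_toy.round(3)

# file name matches the dataframe name used for the report table
csv_path = os.path.join(tempfile.mkdtemp(), 'tbl1_datarows.csv')
tbl_toy_rounded.to_csv(csv_path, index=False)

# read the file back to confirm what was written
tbl_check = pd.read_csv(csv_path)
print(tbl_check)
```

Passing `index=False` keeps the row labels (0, 1, ...) out of the file, which is usually what a report table needs.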


### Figure 2

Mean relative abundance of each taxon, plus or minus 2 standard deviations, computed separately for samples older and younger than 11 thousand years.

In [None]:
# summary statistics conditional on age > 11
grouped_summary = diatoms.iloc[:, 2:10].groupby(
    diatoms.Age > 11
).aggregate(
    ['mean', 'std']
).transpose().melt(
    ignore_index = False
).reset_index().pivot(
    index = ['level_0', 'Age'],
    columns = 'level_1',
    values = 'value'
).reset_index().rename(
    columns = {'level_0': 'taxon'}
)

# means before and after 11k yrs bp
points = alt.Chart(grouped_summary).mark_point().encode(
    x = alt.X('mean', title = 'Average relative abundance'),
    y = alt.Y('Age', title = '', axis = None),
    color = alt.Color('Age', title = 'Before 11KyrBP')
)

# variability about means
bars = alt.Chart(grouped_summary).transform_calculate(
    lwr = 'datum.mean - 2*datum.std',
    upr = 'datum.mean + 2*datum.std'
).mark_errorbar().encode(
    x = alt.X('lwr:Q', title = 'Average relative abundance'), 
    x2 = 'upr:Q',
    y = alt.Y('Age', title = '', axis = None),
    color = alt.Color('Age', title = 'Before 11KyrBP')
)

# layer
fig2 = (points + bars).facet(
    row = alt.Row('taxon', 
                  title = None, 
                  header = alt.Header(labelAngle = 0, 
                                      labelAlign = 'left'))
).configure_facet(spacing = 0)

# display
fig2
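
The reshaping chain behind `grouped_summary` is fairly dense. As a point of comparison, the sketch below reaches a table of the same general shape (one row per taxon and age group, with `mean` and `std` columns) by melting to long format first. It is an alternative route on made-up data, not the method used above; the column names `taxon_a`, `taxon_b`, `abundance`, and `before_11k` are hypothetical.

```python
import pandas as pd

# hypothetical relative abundances for two taxa at four ages
toy = pd.DataFrame({
    'Age': [9.0, 10.0, 12.0, 13.0],
    'taxon_a': [0.10, 0.20, 0.30, 0.50],
    'taxon_b': [0.05, 0.05, 0.10, 0.20],
})

# long format: one row per (sample, taxon)
toy_long = toy.melt(id_vars='Age', var_name='taxon', value_name='abundance')

# flag samples older than 11 thousand years, then summarize each taxon by flag
toy_summary = (
    toy_long
    .assign(before_11k=toy_long['Age'] > 11)
    .groupby(['taxon', 'before_11k'])['abundance']
    .agg(['mean', 'std'])
    .reset_index()
)

# interval endpoints, computed here in pandas rather than in the chart
toy_summary['lwr'] = toy_summary['mean'] - 2 * toy_summary['std']
toy_summary['upr'] = toy_summary['mean'] + 2 * toy_summary['std']

print(toy_summary)
```

Computing `lwr` and `upr` in pandas makes the interval endpoints easy to inspect before plotting; the figure above computes them inside the chart with `transform_calculate` instead.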