<a href="https://colab.research.google.com/github/zliobaite/TBIteaching2024/blob/main/Practical1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Practical 1: NOW database**

In this exercise you will explore the NOW database of fossil mammals.

Task: compare the number of mammalian fossil localities in Kenya, Tanzania and Ethiopia over the time period from 34 million years ago to today. Does the number of localities increase, decrease or stay similar over time? How does the number of localities compare across countries through time?


**Step 1:** explore the database

Go to the NOW database at https://nowdatabase.luomus.fi/ and open the "Locality" tab. Enter "Kenya" in the "Country or Continent" search box. How many localities are found? Open one or a few of them and see what information they contain.

**Step 2:** download a csv with a list of localities from Kenya, Tanzania and Ethiopia

Enter "Kenya,Tanzania,Ethiopia" in the "Country or Continent" search box. How many localities are found?

Click "Export". In the pop up box tick "Export to file" and click "Selected localities/species". Rename the file with a shorter name for convenience, for example "now_KeTaEt.csv". Open the csv, explore it.

**Step 3:** read the data from the csv

In [None]:
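# delete a previous upload, if there is one
# (os.remove raises FileNotFoundError when the file does not exist)
import os
if os.path.exists("now_KeTaEt.csv"):
    os.remove("now_KeTaEt.csv")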

from google.colab import files
uploaded = files.upload()

In [None]:
import pandas as pd
import io

df = pd.read_csv(io.StringIO(uploaded['now_KeTaEt.csv'].decode('utf-8')),delimiter='\t')
df.head()
#df.columns
#df.loc[0].at['MAX_AGE']
#df.loc[0].at['MIN_AGE']
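
The cell above passes `delimiter='\t'`, which means the export is treated as tab-separated even though the file name ends in `.csv`. The same call can be tried on a small in-memory text to see how the columns come out. This is only a sketch: the two rows and the `NAME` column are made up for illustration, and only `COUNTRY`, `MAX_AGE` and `MIN_AGE` are column names used elsewhere in this practical.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the exported file (tab-separated values)
sample_text = (
    "NAME\tCOUNTRY\tMAX_AGE\tMIN_AGE\n"
    "Site A\tKenya\t17.5\t16.0\n"
    "Site B\tEthiopia\t4.2\t3.8\n"
)

# Same call pattern as in Step 3, but reading from a string instead of an upload
sample_df = pd.read_csv(io.StringIO(sample_text), delimiter="\t")

print(sample_df.columns.tolist())  # column names found in the header
print(sample_df.dtypes)            # the age columns should be numeric
```

If all the data ends up in a single column, the delimiter most likely does not match the file.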


**Step 4:** compute middle age point for each locality

In [None]:
MID_AGE = (df['MAX_AGE'] + df['MIN_AGE'])/2
MID_AGE
# add MID_AGE column
df.insert(8,'MID_AGE',MID_AGE,True)
df.head()
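
To check the mid-age formula by hand, it can help to run it on a few made-up rows first. The sketch below uses invented ages, not values from the NOW database, and adds a hypothetical `AGE_RANGE` column to show how wide each age estimate is.

```python
import pandas as pd

# Toy localities with invented age limits (Myr)
toy = pd.DataFrame({
    "COUNTRY": ["Kenya", "Tanzania", "Ethiopia"],
    "MAX_AGE": [17.5, 3.6, 4.2],
    "MIN_AGE": [16.0, 1.8, 3.8],
})

# Mid point of the age interval, as in Step 4
toy["MID_AGE"] = (toy["MAX_AGE"] + toy["MIN_AGE"]) / 2

# Width of the age interval (hypothetical helper column, not used later)
toy["AGE_RANGE"] = toy["MAX_AGE"] - toy["MIN_AGE"]

print(toy)
```

A locality with a wide age range is placed in a single bin by its mid point, even though its true age could lie in a neighbouring bin.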

**Step 5:** assign localities to time bins (1 Myr bins)

In [None]:
import numpy as np

# find the maximum mid age
max_mid_age_total = df['MID_AGE'].max()

# upper edge of the last bin: the next whole million above the maximum,
# so that the oldest locality still falls inside a left-closed bin
max_mid_age_total = np.floor(max_mid_age_total) + 1

# create the bin edges 0, 1, ..., max_mid_age_total
bin_edges = np.arange(0, max_mid_age_total + 1)

# add a column indicating the bin assignment ([a, b): left edge included)
df['bin_number'] = pd.cut(df['MID_AGE'], bins=bin_edges, right=False)
df.head()
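
Step 5 calls `pd.cut` with `right=False`, so each bin includes its left edge and excludes its right edge. The sketch below, on invented ages, compares this with the default `right=True` and shows what happens to a value that sits exactly on an edge.

```python
import pandas as pd

ages = pd.Series([0.5, 1.0, 2.7, 3.0])  # toy mid ages, Myr
edges = [0, 1, 2, 3]

# labels=False returns the integer position of the bin instead of an interval
left_closed = pd.cut(ages, bins=edges, right=False, labels=False)   # bins [a, b)
right_closed = pd.cut(ages, bins=edges, right=True, labels=False)   # bins (a, b]

comparison = pd.DataFrame({
    "age": ages,
    "right=False": left_closed,
    "right=True": right_closed,
})
print(comparison)
```

An age of 1.0 moves from the second bin to the first when `right` changes, and an age equal to the last edge gets no bin at all with `right=False`. Using the same setting everywhere keeps counts comparable between plots.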




**Step 6:** plotting the number of localities per bin over time

In [None]:
# Group by bin_number and count occurrences;
# observed=False keeps empty bins so the counts line up with bin_midpoints
bin_counts = df.groupby('bin_number', observed=False).size()
bin_counts

bin_midpoints = [(bin_edges[i] + bin_edges[i+1]) / 2 for i in range(len(bin_edges)-1)]

import matplotlib.pyplot as plt
# Plotting
plt.figure(figsize=(10, 6))  # Adjust figure size as needed
plt.plot(bin_midpoints, bin_counts.values, marker='o', linestyle='-', color='g')
plt.xlabel('Time bin mid point, Myr')
plt.ylabel('Number of localities')
plt.show()
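
The task asks about the period from 34 million years ago to today, while the bins above run up to the oldest locality in the file. One possible way to restrict the counts to that window is sketched below on invented mid ages. The cut-off of 34 comes from the task statement; the names `window` and `edges` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy mid ages (Myr); the last one is older than the window of interest
toy = pd.DataFrame({"MID_AGE": [0.4, 2.7, 2.9, 16.75, 40.0]})

# Keep only localities younger than 34 Myr
window = toy[toy["MID_AGE"] < 34].copy()

# 1 Myr bins covering 0 to 34 Myr: [0, 1), [1, 2), ..., [33, 34)
edges = np.arange(0, 35)
window["bin"] = pd.cut(window["MID_AGE"], bins=edges, right=False)

# observed=False keeps the empty bins, so there is one count per bin
counts = window.groupby("bin", observed=False).size()

print(counts[counts > 0])  # show only the non-empty bins
```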

**Step 7:** plotting a separate time series for each country

In [None]:
# Plotting separate time series for each country
plt.figure(figsize=(12, 8))

for country in df['COUNTRY'].unique():
    country_df = df[df['COUNTRY'] == country]
    # Group by the bin_number column from Step 5 (same left-closed bins) and count;
    # observed=False keeps empty bins so the counts line up with bin_midpoints
    bin_counts = country_df.groupby('bin_number', observed=False).size()
    plt.plot(bin_midpoints, bin_counts.values, marker='o', linestyle='-', label=country)

plt.xlabel('Time bin mid point, Myr')
plt.ylabel('Number of localities')
plt.title('Locality count by country')
plt.legend()
plt.show()
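
Besides the plot, the per-country counts can be put side by side in one table, which makes it easier to read off exact numbers for a given bin. The following is a sketch on invented data, not the method used in the cell above: it bins with integer labels, counts by bin and country, and fills in the empty bins with zeros.

```python
import numpy as np
import pandas as pd

# Toy localities with invented mid ages (Myr)
toy = pd.DataFrame({
    "COUNTRY": ["Kenya", "Kenya", "Tanzania", "Ethiopia", "Ethiopia"],
    "MID_AGE": [0.4, 2.7, 2.9, 2.1, 3.5],
})

edges = np.arange(0, 5)   # 0, 1, 2, 3, 4
n_bins = len(edges) - 1

# Integer bin index 0..n_bins-1 for left-closed bins [a, b)
toy["bin"] = pd.cut(toy["MID_AGE"], bins=edges, right=False, labels=False)

# Rows are bins, columns are countries; bins with no localities get zeros
table = (
    toy.groupby(["bin", "COUNTRY"])
       .size()
       .unstack(fill_value=0)
       .reindex(range(n_bins), fill_value=0)
)
print(table)
```

Row `i` of the table corresponds to the bin from `edges[i]` to `edges[i+1]`.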

**Step 8:** normalize by the area of the country

In [None]:
# approximate areas of the countries in thousands of km2
area_data = {
    'Kenya': 580,
    'Tanzania': 974,
    'Ethiopia': 1112
}

plt.figure(figsize=(12, 8))

for country in df['COUNTRY'].unique():
    country_df = df[df['COUNTRY'] == country]
    # Count per bin using the bin_number column from Step 5 (same left-closed bins)
    bin_counts = country_df.groupby('bin_number', observed=False).size()
    # Normalize by country area (thousands of km2)
    country_area = area_data[country]
    normalized_counts = bin_counts / country_area
    plt.plot(bin_midpoints, normalized_counts.values, marker='o', linestyle='-', label=country)

plt.xlabel('Time bin mid point, Myr')
plt.ylabel('Localities per 1000 km2')
plt.title('Normalized density of localities by country')
plt.legend()
plt.show()
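
The area figures 580, 974 and 1112 are on the scale of thousands of km2 (Kenya covers roughly 580,000 km2), so dividing a count by them gives localities per 1000 km2. A small worked example, with invented locality counts, shows the arithmetic and the conversion to localities per km2.

```python
import pandas as pd

# Areas as used in Step 8, read as thousands of km2
area_thousand_km2 = pd.Series({"Kenya": 580, "Tanzania": 974, "Ethiopia": 1112})

# Hypothetical locality counts for one time bin
toy_counts = pd.Series({"Kenya": 58, "Tanzania": 10, "Ethiopia": 20})

# Division aligns the two Series by country name
per_1000_km2 = toy_counts / area_thousand_km2

# Same density expressed per single km2
per_km2 = per_1000_km2 / 1000

print(per_1000_km2.round(3))
```

With these invented counts, Kenya has 58 / 580 = 0.1 localities per 1000 km2, the highest density of the three even though the normalization changes only the scale of each curve, not its shape.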