## Notebook 4 - Comparison of RCT output to inventory data

This notebook compares the distributions of DBH (diameter at breast height) and tree height between the inventory data and the RCT output. It is a quick way to check the quality of the RCT output and shows where improvements are needed.

In [None]:
# Suppress deprecation warnings
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Import the required modules
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats
import helper_functions

### Read in tree data from the parsed and translated RCT data.

In [None]:
rct_df = pd.read_csv('../data/rct_translated_joined_data.csv')
rct_df

### Load in Inventory data.

In [None]:
# Read in as pandas dataframe
inventory_df = pd.read_csv('../data/inventory_data.csv')

# Filter to only include the data for Mak1 (copy so the assignment below does not act on a view)
inventory_df = inventory_df[inventory_df['plot_id'] == 'Mak1'].copy()

# Convert the inventory DBH from centimeters to meters
inventory_df['dbh'] = inventory_df['dbh'] / 100

inventory_df

### Scatter plot of Inventory and RCT DBH / Height

In [None]:
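plt.figure(figsize=(10, 10))
# Label each series directly so the legend cannot fall out of sync with the plot order
plt.scatter(inventory_df['dbh'], inventory_df['height'], label='Inventory')
plt.scatter(rct_df['DBH'], rct_df['height'], label='RCT')
plt.xlabel('DBH (meters)')
plt.ylabel('Height (meters)')
plt.title('DBH vs Height')
plt.legend()
plt.show()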

### Plot comparison of DBH and height distributions

In [None]:
# Plotting the DBH histogram
plt.figure(figsize=(10, 6))
plt.hist(rct_df['DBH'], bins=30, alpha=0.5, label='RCT DBH', color='blue')
plt.hist(inventory_df['dbh'], bins=30, alpha=0.5, label='Inventory DBH', color='green')
plt.xlabel('DBH (meters)')
plt.ylabel('Frequency')
plt.title('Distribution of DBH')
plt.legend()
plt.show()

In [None]:
# Plotting the height histogram
plt.figure(figsize=(10, 6))
plt.hist(rct_df['height'], bins=30, alpha=0.5, label='RCT Height', color='blue')
plt.hist(inventory_df['height'], bins=30, alpha=0.5, label='Inventory Height', color='green')
plt.xlabel('Height (meters)')
plt.ylabel('Frequency')
plt.title('Distribution of Tree Height')
plt.legend()
plt.show()

To compare the distributions as smooth curves rather than raw histograms, you can use kernel density estimation (KDE). KDE produces a smooth curve that estimates the probability density function of a variable, here DBH first and then tree height.
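As a minimal sketch of what KDE does, the cell below fits `scipy.stats.gaussian_kde` to a synthetic sample. The sample, the grid padding, and the `bw_method=0.1` value are illustrative assumptions, not values taken from the RCT or inventory data. It checks that the estimated density integrates to roughly 1 and shows how the bandwidth factor controls the amount of smoothing.

```python
import numpy as np
from scipy import stats

# Synthetic, made-up sample standing in for DBH values in meters (not real data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.3, scale=0.1, size=500)

# Fit a Gaussian KDE; by default the bandwidth factor follows Scott's rule
kde = stats.gaussian_kde(sample)

# Evaluate on a grid that extends beyond the data so the tails are included
grid = np.linspace(sample.min() - 0.5, sample.max() + 0.5, 2000)
density = kde(grid)

# A probability density should integrate to about 1 (trapezoid rule)
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
print(f"Area under the KDE curve: {area:.3f}")

# A smaller bandwidth factor gives a less smooth, more detailed curve
kde_narrow = stats.gaussian_kde(sample, bw_method=0.1)
print(f"Default factor: {kde.factor:.3f}, narrow factor: {kde_narrow.factor:.3f}")
```

Because a Gaussian kernel has unbounded tails, a curve evaluated only on a grid starting at 0 can omit a small amount of density that the estimate places below zero. For a visual comparison this is usually harmless, but it is worth keeping in mind when the data sit close to zero.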

In [None]:
# Estimate the DBH density with a KDE
# Define the range over which to plot the KDE
x = np.linspace(0, max(rct_df['DBH'].max(), inventory_df['dbh'].max()), 1000)

# Calculate KDE for each dataset
rct_kde = stats.gaussian_kde(rct_df['DBH'])
inv_kde = stats.gaussian_kde(inventory_df['dbh'])

# Plotting the KDEs
plt.figure(figsize=(10, 6))
plt.plot(x, rct_kde(x), label='RCT DBH', color='blue')
plt.plot(x, inv_kde(x), label='Inventory DBH', color='green')
plt.xlabel('DBH (meters)')
plt.ylabel('Density')
plt.title('Smoothed Distribution of DBH')
plt.legend()
plt.show()

In [None]:
# Estimate the height density with a KDE
# Define the range over which to plot the KDE
x = np.linspace(0, max(rct_df['height'].max(), inventory_df['height'].max()), 1000)

# Calculate KDE for each dataset
rct_kde = stats.gaussian_kde(rct_df['height'])
inv_kde = stats.gaussian_kde(inventory_df['height'])

# Plotting the KDEs
plt.figure(figsize=(10, 6))
plt.plot(x, rct_kde(x), label='RCT Height', color='blue')
plt.plot(x, inv_kde(x), label='Inventory Height', color='green')
plt.xlabel('Height (meters)')
plt.ylabel('Density')
plt.title('Smoothed Distribution of Tree Height')
plt.legend()
plt.show()
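The plots above give a visual comparison. One possible way to put a number on it, which is not part of the original workflow, is a two-sample Kolmogorov-Smirnov test via `scipy.stats.ks_2samp`. The sketch below is an assumption-laden illustration: `summarize_difference` is a hypothetical helper, and the height samples are synthetic rather than read from the CSV files.

```python
import numpy as np
from scipy import stats


def summarize_difference(reference, estimate):
    """Compare two 1-D samples (hypothetical helper, not from helper_functions)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)

    # Drop missing values so they do not affect the statistics
    reference = reference[~np.isnan(reference)]
    estimate = estimate[~np.isnan(estimate)]

    # Two-sample KS test: the statistic is the largest gap between the two
    # empirical cumulative distribution functions (0 = identical, 1 = disjoint)
    result = stats.ks_2samp(reference, estimate)
    return {
        "median_reference": float(np.median(reference)),
        "median_estimate": float(np.median(estimate)),
        "ks_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
    }


# Synthetic heights in meters, for illustration only
rng = np.random.default_rng(42)
field_heights = rng.normal(20.0, 5.0, size=300)
similar_heights = rng.normal(20.0, 5.0, size=300)
shifted_heights = rng.normal(25.0, 5.0, size=300)

similar = summarize_difference(field_heights, similar_heights)
shifted = summarize_difference(field_heights, shifted_heights)
print("Similar sample:", similar)
print("Shifted sample:", shifted)
```

With the dataframes loaded earlier in this notebook, the same helper could be called as `summarize_difference(inventory_df['height'], rct_df['height'])` or with the `dbh` and `DBH` columns. A large KS statistic would suggest the RCT distribution differs noticeably from the inventory, although the test does not say which trees are responsible.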