# CHM Fractal Analysis: Complete Hypothesis Testing

**Site:** Barro Colorado Island, Panama  
**Forest Type:** Tropical Moist Forest (Neotropical Lowland Rainforest)  
**Data Source:** [Smithsonian ALS Panama 2023](https://smithsonian.dataone.org/datasets/ALS_Panama_2023/)

This notebook performs comprehensive fractal dimension analysis on Canopy Height Models (CHM) to test **all nine research hypotheses** from the Fractal Self-Affinity in Nature framework.

## Five Primary Testable Hypotheses

1. **Optimal Filling** - Old-growth forests maximize light interception, producing higher fractal dimensions
2. **Scale Invariance** - Steady-state forests show scale-invariant gap distributions (power-law decay)
3. **Zeta Distribution** - Canopy gaps follow power-law with exponent α ≈ 2.0, related to ζ(2)
4. **Universal Repulsion** - Dominant tree spacing follows Wigner-Dyson distribution (GUE statistics)
5. **Biotic Decoupling** - Established ecosystems show weak correlation with topographic variables

## Four Additional Spatial Distribution Hypotheses

6. **Fractal String Gap** - Gap sizes follow fractal string spectrum predictions
7. **Prime Number Repulsion (GUE)** - Large tree spacing shows prime-like repulsion patterns
8. **Complex Dimension Oscillation** - Log-periodic oscillations in canopy structure
9. **Riemann Gas Density** - Tree density follows Riemann gas statistical mechanics

## Tropical Forest Expectations

Based on BCI's multi-layered structure and high biodiversity:
- **Higher fractal dimension** (D > 2.5) due to 3-4 canopy layers
- **Strong scale invariance** (mature, undisturbed forest)
- **Power-law gap distribution** (natural disturbance dynamics)
- **Biotic decoupling** from topography (self-organized structure)

## 1. Environment Setup

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import rioxarray as rxr
from pathlib import Path
from datetime import datetime
import json
from scipy import ndimage
from scipy.stats import ks_2samp, pearsonr, spearmanr
from scipy.spatial import cKDTree
from scipy.special import zeta
from rasterio.enums import Resampling
import warnings
warnings.filterwarnings('ignore')

print(f"Environment ready at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Environment ready at 2025-12-25 19:14:31


## 2. Configuration

In [2]:
# Site configuration
SITE_NAME = "bci_panama"
SITE_DESCRIPTION = "Barro Colorado Island - Tropical Moist Forest"

# Data paths
DATA_BASE = Path.home() / "data-store/data/output/smithsonian"
RAW_DATA = DATA_BASE / "raw"

# Input files
CHM_PATH = RAW_DATA / "chm" / "BCI_whole_2023_05_26_chm.tif"
DTM_PATH = RAW_DATA / "dtm" / "BCI_whole_2023_05_26_dtm.tif"

# Output directory
OUTPUT_DIR = DATA_BASE / "analysis" / "fractal_hypotheses"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Analysis parameters
GAP_THRESHOLD = 2.0  # meters - pixels below this are gaps
DOMINANT_TREE_THRESHOLD = 35.0  # meters - emergent tree detection (higher for tropical)
LOCAL_MAX_WINDOW = 21  # pixels - for tree top detection
NODATA_VALUE = -9999

# Verify files exist
print("Input files:")
for name, path in [("CHM", CHM_PATH), ("DTM", DTM_PATH)]:
    exists = path.exists()
    size = f"{path.stat().st_size / 1e9:.2f} GB" if exists else "NOT FOUND"
    print(f"  [{('OK' if exists else 'MISSING')}] {name}: {path.name} ({size})")

print(f"\nOutput: {OUTPUT_DIR}")

Input files:
  [OK] CHM: BCI_whole_2023_05_26_chm.tif (1.14 GB)
  [OK] DTM: BCI_whole_2023_05_26_dtm.tif (1.14 GB)

Output: /home/jovyan/data-store/data/output/smithsonian/analysis/fractal_hypotheses


## 3. Load and Subset CHM

In [3]:
# Load CHM
print(f"Loading CHM from: {CHM_PATH.name}")
print("This may take a moment for the 1.1 GB file...")

chm_full = rxr.open_rasterio(CHM_PATH, masked=True)

print(f"\nFull CHM loaded:")
print(f"  Shape: {chm_full.shape}")
print(f"  CRS: {chm_full.rio.crs}")
print(f"  Resolution: {abs(chm_full.rio.resolution()[0]):.2f}m")

Loading CHM from: BCI_whole_2023_05_26_chm.tif
This may take a moment for the 1.1 GB file...

Full CHM loaded:
  Shape: (1, 11000, 13000)
  CRS: EPSG:32617
  Resolution: 0.50m


In [4]:
# Use subset for analysis (full island would take too long)
USE_SUBSET = True
SUBSET_SIZE = 2000  # pixels (1km x 1km at 0.5m resolution)

if USE_SUBSET:
    # Get center region of the island
    full_shape = chm_full.shape
    center_y = full_shape[1] // 2
    center_x = full_shape[2] // 2
    half_size = SUBSET_SIZE // 2
    
    chm = chm_full[:, 
                   center_y - half_size : center_y + half_size,
                   center_x - half_size : center_x + half_size]
    
    print(f"Using center subset: {SUBSET_SIZE}x{SUBSET_SIZE} pixels")
    print(f"  = {SUBSET_SIZE * 0.5 / 1000:.1f} km x {SUBSET_SIZE * 0.5 / 1000:.1f} km")
else:
    chm = chm_full
    print("Using full CHM")

# Get the data array
chm_array = chm.values.squeeze().astype(np.float64)

# Get NoData value
if chm.rio.nodata is not None:
    NODATA_VALUE = chm.rio.nodata

# Create masks
nodata_mask = np.isnan(chm_array) | (chm_array == NODATA_VALUE)
valid_mask = ~nodata_mask

# Get pixel resolution
pixel_resolution = abs(chm.rio.resolution()[0])

print(f"\nAnalysis array:")
print(f"  Shape: {chm_array.shape}")
print(f"  Valid pixels: {valid_mask.sum():,} ({100*valid_mask.sum()/chm_array.size:.1f}%)")
print(f"  Resolution: {pixel_resolution:.3f}m")

Using center subset: 2000x2000 pixels
  = 1.0 km x 1.0 km

Analysis array:
  Shape: (2000, 2000)
  Valid pixels: 3,999,996 (100.0%)
  Resolution: 0.500m


In [5]:
# Basic height statistics
valid_heights = chm_array[valid_mask]
valid_heights = valid_heights[(valid_heights >= 0) & (valid_heights <= 80)]

print("Height Statistics:")
print(f"  Min:    {np.min(valid_heights):.2f} m")
print(f"  Max:    {np.max(valid_heights):.2f} m")
print(f"  Mean:   {np.mean(valid_heights):.2f} m")
print(f"  Median: {np.median(valid_heights):.2f} m")
print(f"  Std:    {np.std(valid_heights):.2f} m")
print(f"  P95:    {np.percentile(valid_heights, 95):.2f} m")
print(f"  P99:    {np.percentile(valid_heights, 99):.2f} m")

Height Statistics:
  Min:    0.01 m
  Max:    54.88 m
  Mean:   24.18 m
  Median: 24.26 m
  Std:    8.63 m
  P95:    38.51 m
  P99:    43.38 m


## 4. Analysis Functions

Core functions for all 9 hypotheses.

In [6]:
def differential_box_counting(img, valid_mask, scales=None, min_valid_frac=0.8):
    """Differential Box Counting for self-affine surfaces (H1)."""
    rows, cols = img.shape
    M = min(rows, cols)

    valid_values = img[valid_mask]
    z_min_global, z_max_global = valid_values.min(), valid_values.max()
    z_range_global = z_max_global - z_min_global

    if z_range_global <= 0:
        return np.nan, 0.0, np.array([]), np.array([])

    if scales is None:
        max_scale = M // 4
        scales = [2**i for i in range(1, 12) if 2**i <= max_scale]

    Ns_list = []
    valid_scales = []

    for s in scales:
        nx, ny = cols // s, rows // s
        if nx < 1 or ny < 1:
            continue

        N_total = 0
        valid_boxes = 0

        for i in range(ny):
            for j in range(nx):
                y_start, y_end = i * s, (i + 1) * s
                x_start, x_end = j * s, (j + 1) * s

                box_data = img[y_start:y_end, x_start:x_end]
                box_valid = valid_mask[y_start:y_end, x_start:x_end]

                if box_valid.sum() / (s * s) < min_valid_frac:
                    continue

                valid_boxes += 1
                valid_heights = box_data[box_valid]
                z_min_box, z_max_box = valid_heights.min(), valid_heights.max()

                z_min_scaled = (z_min_box - z_min_global) / z_range_global * M
                z_max_scaled = (z_max_box - z_min_global) / z_range_global * M

                k = int(np.floor(z_min_scaled / s))
                l = int(np.floor(z_max_scaled / s))
                N_total += l - k + 1

        if valid_boxes > 0:
            Ns_list.append(N_total)
            valid_scales.append(s)

    if len(valid_scales) < 3:
        return np.nan, 0.0, np.array([]), np.array([])

    log_inv_s = np.log(1.0 / np.array(valid_scales))
    log_Ns = np.log(np.array(Ns_list))

    coeffs = np.polyfit(log_inv_s, log_Ns, 1)
    D = coeffs[0]

    predicted = np.polyval(coeffs, log_inv_s)
    ss_res = np.sum((log_Ns - predicted) ** 2)
    ss_tot = np.sum((log_Ns - np.mean(log_Ns)) ** 2)
    r2 = 1 - (ss_res / ss_tot) if ss_tot > 0 else 0

    return D, r2, np.array(valid_scales), np.array(Ns_list)

In [7]:
def gliding_box_lacunarity(img, valid_mask, box_sizes=None, min_valid_frac=0.5):
    """Lacunarity analysis for scale invariance (H2)."""
    rows, cols = img.shape
    img_clean = img.copy()
    img_clean[~valid_mask] = np.nan
    
    if box_sizes is None:
        max_size = min(rows, cols) // 4
        box_sizes = [2**i for i in range(2, 10) if 2**i <= max_size]
    
    lacunarity_values = []
    valid_sizes = []
    
    for r in box_sizes:
        box_sums = []
        step = max(1, r // 2)
        
        for i in range(0, rows - r + 1, step):
            for j in range(0, cols - r + 1, step):
                box = img_clean[i:i+r, j:j+r]
                box_valid = valid_mask[i:i+r, j:j+r]
                
                if box_valid.sum() / (r * r) >= min_valid_frac:
                    box_sums.append(np.nansum(box))
        
        if len(box_sums) >= 10:
            box_sums = np.array(box_sums)
            mu = np.mean(box_sums)
            L = (np.var(box_sums) / (mu ** 2)) + 1 if mu > 0 else 1.0
            lacunarity_values.append(L)
            valid_sizes.append(r)
    
    if len(valid_sizes) < 3:
        return np.array([]), np.array([]), 0.0, 0.0
    
    log_r = np.log(np.array(valid_sizes))
    log_L = np.log(np.array(lacunarity_values))
    
    coeffs = np.polyfit(log_r, log_L, 1)
    slope = coeffs[0]
    
    predicted = np.polyval(coeffs, log_r)
    ss_res = np.sum((log_L - predicted) ** 2)
    ss_tot = np.sum((log_L - np.mean(log_L)) ** 2)
    r2 = 1 - (ss_res / ss_tot) if ss_tot > 0 else 0
    
    return np.array(lacunarity_values), np.array(valid_sizes), r2, slope

In [8]:
def identify_gaps(chm, valid_mask, gap_threshold=2.0, min_gap_pixels=4):
    """Identify contiguous gap regions (H3)."""
    gap_mask = (chm < gap_threshold) & valid_mask
    labeled, num_features = ndimage.label(gap_mask)
    if num_features == 0:
        return np.array([]), labeled
    gap_areas = ndimage.sum(gap_mask, labeled, range(1, num_features + 1))
    return gap_areas[gap_areas >= min_gap_pixels], labeled


def fit_gap_distribution(gap_areas, resolution=1.0, min_fit_area=1.0):
    """Fit power law and exponential to gap sizes (H3)."""
    areas = gap_areas * (resolution ** 2)
    areas = areas[areas > 0]
    
    if len(areas) < 20:
        return {'error': f'Insufficient gaps ({len(areas)})'}
    
    sorted_areas = np.sort(areas)
    ecdf = np.arange(1, len(sorted_areas) + 1) / len(sorted_areas)
    ccdf = 1 - ecdf
    
    fit_mask = sorted_areas >= min_fit_area
    if np.sum(fit_mask) < 10:
        return {'error': f'Insufficient gaps >= {min_fit_area} m²'}
    
    fit_areas = sorted_areas[fit_mask]
    fit_ccdf = ccdf[fit_mask]
    
    try:
        mask = fit_ccdf > 0.01
        log_x = np.log(fit_areas[mask])
        log_ccdf = np.log(fit_ccdf[mask])
        
        pl_coeffs = np.polyfit(log_x, log_ccdf, 1)
        alpha_pl = -pl_coeffs[0] + 1
        
        predicted_pl = np.polyval(pl_coeffs, log_x)
        ss_res_pl = np.sum((log_ccdf - predicted_pl) ** 2)
        ss_tot = np.sum((log_ccdf - np.mean(log_ccdf)) ** 2)
        r2_pl = 1 - (ss_res_pl / ss_tot) if ss_tot > 0 else 0
        
        exp_coeffs = np.polyfit(fit_areas[mask], log_ccdf, 1)
        predicted_exp = np.polyval(exp_coeffs, fit_areas[mask])
        ss_res_exp = np.sum((log_ccdf - predicted_exp) ** 2)
        r2_exp = 1 - (ss_res_exp / ss_tot) if ss_tot > 0 else 0
    except:
        alpha_pl, r2_pl, r2_exp = np.nan, 0, 0
    
    return {
        'n_gaps': len(areas),
        'n_gaps_fitted': int(np.sum(fit_mask)),
        'min_area': float(areas.min()),
        'max_area': float(areas.max()),
        'median_area': float(np.median(areas)),
        'power_law': {'alpha': float(alpha_pl), 'r2': float(r2_pl)},
        'exponential': {'r2': float(r2_exp)},
        'zeta_2': np.pi**2 / 6,
        'sorted_areas': sorted_areas,
        'ccdf': ccdf
    }

In [9]:
def detect_local_maxima(img, valid_mask, window_size=21, height_threshold=35.0):
    """Detect tree tops as local maxima (H4, H7, H9)."""
    img_masked = np.where(valid_mask & (img >= height_threshold), img, -np.inf)
    local_max = ndimage.maximum_filter(img_masked, size=window_size)
    is_peak = (img_masked == local_max) & (img_masked > -np.inf)
    peak_coords = np.array(np.where(is_peak)).T
    peak_heights = img[is_peak]
    return peak_coords, peak_heights


def wigner_dyson_gue(s):
    """Wigner-Dyson distribution for GUE."""
    return (32 / np.pi**2) * s**2 * np.exp(-4 * s**2 / np.pi)


def compute_nearest_neighbor_spacing(coords, resolution=1.0):
    """Compute NN spacing distribution (H4)."""
    if len(coords) < 10:
        return np.array([]), 0.0
    
    coords_physical = coords * resolution
    tree = cKDTree(coords_physical)
    distances, _ = tree.query(coords_physical, k=2)
    nn_distances = distances[:, 1]
    mean_spacing = np.mean(nn_distances)
    return nn_distances / mean_spacing, mean_spacing


def fit_spacing_distribution(spacings):
    """Compare spacing to Wigner-Dyson and Poisson (H4)."""
    if len(spacings) < 10:
        return {'error': 'Insufficient trees'}
    
    n_samples = len(spacings) * 10
    
    # Wigner-Dyson samples
    wd_samples = []
    while len(wd_samples) < n_samples:
        s = np.random.uniform(0, 4, 1000)
        p = wigner_dyson_gue(s)
        accept = np.random.uniform(0, 1, 1000) < p / 0.5
        wd_samples.extend(s[accept].tolist())
    wd_samples = np.array(wd_samples[:n_samples])
    
    poisson_samples = np.random.exponential(1.0, n_samples)
    
    ks_wd, p_wd = ks_2samp(spacings, wd_samples)
    ks_poisson, p_poisson = ks_2samp(spacings, poisson_samples)
    
    return {
        'n_trees': len(spacings),
        'ks_wigner_dyson': float(ks_wd),
        'p_wigner_dyson': float(p_wd),
        'ks_poisson': float(ks_poisson),
        'p_poisson': float(p_poisson),
        'spacings': spacings
    }

In [10]:
def compute_chm_structure_metrics(chm_array, valid_mask, resolution=1.0):
    """Compute CHM structure metrics for H5."""
    window_size = 15
    chm_roughness = ndimage.generic_filter(
        np.where(valid_mask, chm_array, np.nan),
        lambda x: np.nanstd(x) if np.sum(~np.isnan(x)) > 5 else np.nan,
        size=window_size, mode='constant', cval=np.nan
    )
    return {'roughness': chm_roughness}


def compute_topographic_variables(dem_array, valid_mask, resolution=1.0):
    """Compute slope, TPI, roughness from DEM (H5)."""
    gy, gx = np.gradient(np.where(valid_mask, dem_array, np.nan), resolution)
    slope = np.degrees(np.arctan(np.sqrt(gx**2 + gy**2)))
    slope[~valid_mask] = np.nan
    
    window_size = 15
    local_mean = ndimage.uniform_filter(
        np.where(valid_mask, dem_array, np.nan), size=window_size, mode='constant', cval=np.nan
    )
    tpi = dem_array - local_mean
    tpi[~valid_mask] = np.nan
    
    roughness = ndimage.generic_filter(
        np.where(valid_mask, dem_array, np.nan),
        lambda x: np.nanstd(x) if np.sum(~np.isnan(x)) > 5 else np.nan,
        size=window_size, mode='constant', cval=np.nan
    )
    
    return {'elevation': dem_array, 'slope': slope, 'tpi': tpi, 'roughness': roughness}


def analyze_biotic_decoupling(chm_array, valid_mask, dem_array=None, resolution=1.0):
    """Analyze CHM-topography correlation (H5)."""
    chm_metrics = compute_chm_structure_metrics(chm_array, valid_mask, resolution)
    roughness = chm_metrics['roughness']
    
    if dem_array is not None:
        topo_vars = compute_topographic_variables(dem_array, valid_mask, resolution)
        has_dem = True
    else:
        rows, cols = chm_array.shape
        y_coords, x_coords = np.meshgrid(np.arange(rows), np.arange(cols), indexing='ij')
        topo_vars = {'x_position': x_coords / cols, 'y_position': y_coords / rows}
        has_dem = False
    
    valid_indices = np.where(valid_mask & ~np.isnan(roughness))
    n_valid = len(valid_indices[0])
    
    if n_valid < 100:
        return {'error': 'Insufficient valid pixels'}
    
    max_samples = 50000
    if n_valid > max_samples:
        idx = np.random.choice(n_valid, max_samples, replace=False)
        sample_rows, sample_cols = valid_indices[0][idx], valid_indices[1][idx]
    else:
        sample_rows, sample_cols = valid_indices[0], valid_indices[1]
    
    roughness_samples = roughness[sample_rows, sample_cols]
    correlations = {}
    
    var_names = ['elevation', 'slope', 'tpi', 'roughness'] if has_dem else ['x_position', 'y_position']
    for var_name in var_names:
        if var_name in topo_vars:
            var_data = topo_vars[var_name][sample_rows, sample_cols]
            valid_both = ~(np.isnan(roughness_samples) | np.isnan(var_data))
            if np.sum(valid_both) > 50:
                r, p = pearsonr(roughness_samples[valid_both], var_data[valid_both])
                correlations[f'chm_roughness_vs_{var_name}'] = {'r': float(r), 'p': float(p)}
    
    r_values = [abs(v['r']) for v in correlations.values() if not np.isnan(v['r'])]
    mean_abs_r = float(np.mean(r_values)) if r_values else np.nan
    
    return {'n_samples': len(roughness_samples), 'dem_available': has_dem, 
            'correlations': correlations, 'mean_abs_correlation': mean_abs_r}

In [11]:
def analyze_log_periodic_oscillations(lacunarity_values, box_sizes):
    """Test for log-periodic oscillations (H8)."""
    if len(lacunarity_values) < 5:
        return {'error': 'Insufficient scales'}
    
    log_r = np.log(box_sizes)
    log_L = np.log(lacunarity_values)
    
    coeffs = np.polyfit(log_r, log_L, 1)
    trend = np.polyval(coeffs, log_r)
    residuals = log_L - trend
    
    n = len(residuals)
    if n >= 5:
        autocorr = np.correlate(residuals, residuals, mode='full')
        autocorr = autocorr[n-1:] / autocorr[n-1]
        peaks = [(i, autocorr[i]) for i in range(1, len(autocorr) - 1) 
                 if autocorr[i] > autocorr[i-1] and autocorr[i] > autocorr[i+1]]
        has_oscillation = len(peaks) > 0 and any(p[1] > 0.3 for p in peaks)
    else:
        has_oscillation = False
    
    rms_residual = float(np.sqrt(np.mean(residuals**2)))
    return {'rms_residual': rms_residual, 'has_oscillation': has_oscillation}


def analyze_riemann_gas_density(tree_coords, valid_mask, resolution=1.0, n_bins=20):
    """Test Riemann gas statistics (H9)."""
    if len(tree_coords) < 20:
        return {'error': 'Insufficient trees'}
    
    rows, cols = valid_mask.shape
    cell_size = max(rows, cols) // n_bins
    
    densities = []
    for i in range(n_bins):
        for j in range(n_bins):
            y_start, y_end = i * cell_size, (i + 1) * cell_size
            x_start, x_end = j * cell_size, (j + 1) * cell_size
            
            in_cell = ((tree_coords[:, 0] >= y_start) & (tree_coords[:, 0] < y_end) &
                       (tree_coords[:, 1] >= x_start) & (tree_coords[:, 1] < x_end))
            n_trees = np.sum(in_cell)
            
            cell_valid = valid_mask[y_start:y_end, x_start:x_end]
            valid_area = cell_valid.sum() * resolution**2
            
            if valid_area > 0:
                density = n_trees / valid_area * 10000
                densities.append(density)
    
    densities = np.array([d for d in densities if d > 0])
    
    if len(densities) < 10:
        return {'error': 'Insufficient non-empty cells'}
    
    mean_density = np.mean(densities)
    fano_factor = float(np.var(densities) / mean_density) if mean_density > 0 else np.nan
    
    interp = 'repulsion' if fano_factor < 0.8 else ('clustering' if fano_factor > 1.2 else 'random')
    return {'n_cells': len(densities), 'mean_density_per_ha': float(mean_density),
            'fano_factor': fano_factor, 'interpretation': interp}

---

# HYPOTHESIS 1: Optimal Filling

**Prediction:** Tropical old-growth forests show higher fractal dimensions (D > 2.5) due to multi-layered canopy.

In [12]:
print("="*70)
print("HYPOTHESIS 1: OPTIMAL FILLING")
print("="*70)

D_dbc, r2_dbc, scales_dbc, Ns_dbc = differential_box_counting(chm_array, valid_mask)

print(f"\nDBC Fractal Dimension: D = {D_dbc:.4f}")
print(f"R-squared: {r2_dbc:.4f}")

if np.isnan(D_dbc):
    h1_class, h1_interp = "INSUFFICIENT_DATA", "Insufficient data"
elif D_dbc >= 2.5:
    h1_class = "OLD_GROWTH"
    h1_interp = f"D = {D_dbc:.2f} indicates OLD GROWTH (high complexity)"
elif D_dbc >= 2.3:
    h1_class = "MODERATE"
    h1_interp = f"D = {D_dbc:.2f} indicates MODERATE complexity"
else:
    h1_class = "LOW"
    h1_interp = f"D = {D_dbc:.2f} indicates LOW complexity"

print(f"\n{h1_interp}")

h1_results = {'hypothesis': 'H1: Optimal Filling', 'D_dbc': float(D_dbc) if not np.isnan(D_dbc) else None,
              'r_squared': float(r2_dbc), 'classification': h1_class, 'interpretation': h1_interp,
              'supports_hypothesis': h1_class == 'OLD_GROWTH'}

HYPOTHESIS 1: OPTIMAL FILLING

DBC Fractal Dimension: D = 2.3114
R-squared: 0.9906

D = 2.31 indicates MODERATE complexity


---

# HYPOTHESIS 2: Scale Invariance

**Prediction:** Mature tropical forests show scale-invariant gap structure (R² > 0.95).

In [13]:
print("="*70)
print("HYPOTHESIS 2: SCALE INVARIANCE")
print("="*70)

lacunarity, lac_sizes, r2_lac, lac_slope = gliding_box_lacunarity(chm_array, valid_mask)

print(f"\nLacunarity R² (log-log): {r2_lac:.4f}")
print(f"Slope: {lac_slope:.4f}")

if r2_lac > 0.95:
    h2_class = "SCALE_INVARIANT"
    h2_interp = f"R² = {r2_lac:.2f} indicates STRONG scale invariance (OLD GROWTH)"
elif r2_lac > 0.80:
    h2_class = "MODERATE"
    h2_interp = f"R² = {r2_lac:.2f} indicates MODERATE scale invariance"
else:
    h2_class = "CHARACTERISTIC_SCALES"
    h2_interp = f"R² = {r2_lac:.2f} indicates CHARACTERISTIC SCALES present"

print(f"\n{h2_interp}")

h2_results = {'hypothesis': 'H2: Scale Invariance', 'r_squared': float(r2_lac),
              'slope': float(lac_slope), 'classification': h2_class, 'interpretation': h2_interp,
              'supports_hypothesis': h2_class == 'SCALE_INVARIANT'}

HYPOTHESIS 2: SCALE INVARIANCE

Lacunarity R² (log-log): 0.9775
Slope: -0.0264

R² = 0.98 indicates STRONG scale invariance (OLD GROWTH)


---

# HYPOTHESIS 3: Zeta Distribution

**Prediction:** Gap sizes follow power-law with α ≈ 2.0, related to ζ(2) = π²/6.

In [14]:
print("="*70)
print("HYPOTHESIS 3: ZETA DISTRIBUTION")
print("="*70)

gap_areas, labeled_gaps = identify_gaps(chm_array, valid_mask, GAP_THRESHOLD)
gap_results = fit_gap_distribution(gap_areas, pixel_resolution, min_fit_area=1.0)

if 'error' in gap_results:
    print(f"Error: {gap_results['error']}")
    h3_class, h3_interp = "INSUFFICIENT_DATA", gap_results['error']
else:
    print(f"\nGaps: {gap_results['n_gaps']} (fitted: {gap_results['n_gaps_fitted']})")
    print(f"Power Law α: {gap_results['power_law']['alpha']:.4f}")
    print(f"Power Law R²: {gap_results['power_law']['r2']:.4f}")
    print(f"Exponential R²: {gap_results['exponential']['r2']:.4f}")
    print(f"\nζ(2) = π²/6 ≈ {gap_results['zeta_2']:.4f}")
    print(f"|α - 2| = {abs(gap_results['power_law']['alpha'] - 2.0):.4f}")
    
    pl_r2, alpha = gap_results['power_law']['r2'], gap_results['power_law']['alpha']
    exp_r2 = gap_results['exponential']['r2']
    
    if pl_r2 > exp_r2 and pl_r2 > 0.8:
        if 1.8 <= alpha <= 2.2:
            h3_class = "ZETA_DISTRIBUTED"
            h3_interp = f"α ≈ {alpha:.2f} close to 2.0 (OLD GROWTH signature)"
        else:
            h3_class = "POWER_LAW_MODIFIED"
            h3_interp = f"α = {alpha:.2f} deviates from 2.0 (DISTURBANCE modified)"
    else:
        h3_class = "EXPONENTIAL"
        h3_interp = f"Exponential fits better (R²={exp_r2:.2f})"
    
    print(f"\n{h3_interp}")

h3_results = {'hypothesis': 'H3: Zeta Distribution', 'n_gaps': gap_results.get('n_gaps', 0),
              'power_law_alpha': gap_results.get('power_law', {}).get('alpha'),
              'power_law_r2': gap_results.get('power_law', {}).get('r2'),
              'classification': h3_class, 'interpretation': h3_interp,
              'supports_hypothesis': h3_class == 'ZETA_DISTRIBUTED'}

HYPOTHESIS 3: ZETA DISTRIBUTION

Gaps: 172 (fitted: 172)
Power Law α: 1.9512
Power Law R²: 0.9110
Exponential R²: 0.9600

ζ(2) = π²/6 ≈ 1.6449
|α - 2| = 0.0488

Exponential fits better (R²=0.96)


---

# HYPOTHESIS 4: Universal Repulsion

**Prediction:** Emergent tree spacing follows Wigner-Dyson (GUE) distribution.

In [15]:
print("="*70)
print("HYPOTHESIS 4: UNIVERSAL REPULSION")
print("="*70)

tree_coords, tree_heights = detect_local_maxima(chm_array, valid_mask, LOCAL_MAX_WINDOW, DOMINANT_TREE_THRESHOLD)
print(f"\nEmergent trees detected: {len(tree_coords)}")
print(f"Height threshold: >= {DOMINANT_TREE_THRESHOLD}m")

if len(tree_coords) >= 10:
    print(f"Height range: {tree_heights.min():.1f} - {tree_heights.max():.1f}m")
    spacings, mean_spacing = compute_nearest_neighbor_spacing(tree_coords, pixel_resolution)
    print(f"Mean NN distance: {mean_spacing:.2f}m")
    
    spacing_results = fit_spacing_distribution(spacings)
    print(f"\nWigner-Dyson KS: {spacing_results['ks_wigner_dyson']:.4f} (p={spacing_results['p_wigner_dyson']:.4f})")
    print(f"Poisson KS: {spacing_results['ks_poisson']:.4f} (p={spacing_results['p_poisson']:.4f})")
    
    ks_wd, ks_po = spacing_results['ks_wigner_dyson'], spacing_results['ks_poisson']
    if ks_wd < ks_po and spacing_results['p_wigner_dyson'] > 0.05:
        h4_class = "WIGNER_DYSON"
        h4_interp = f"Spacing follows Wigner-Dyson (p={spacing_results['p_wigner_dyson']:.3f}) -> REPULSION"
    elif ks_po < ks_wd and spacing_results['p_poisson'] > 0.05:
        h4_class = "POISSON"
        h4_interp = f"Spacing follows Poisson (p={spacing_results['p_poisson']:.3f}) -> RANDOM"
    else:
        h4_class = "NEITHER"
        h4_interp = "Neither distribution fits well -> Complex structure"
else:
    spacing_results = {'error': 'Insufficient trees'}
    h4_class, h4_interp = "INSUFFICIENT_DATA", f"Only {len(tree_coords)} trees detected"
    mean_spacing = np.nan
    spacings = np.array([])

print(f"\n{h4_interp}")

h4_results = {'hypothesis': 'H4: Universal Repulsion', 'n_trees': len(tree_coords),
              'mean_spacing_m': float(mean_spacing) if not np.isnan(mean_spacing) else None,
              'classification': h4_class, 'interpretation': h4_interp,
              'supports_hypothesis': h4_class == 'WIGNER_DYSON'}

HYPOTHESIS 4: UNIVERSAL REPULSION

Emergent trees detected: 721
Height threshold: >= 35.0m
Height range: 35.0 - 54.9m
Mean NN distance: 16.72m

Wigner-Dyson KS: 0.1524 (p=0.0000)
Poisson KS: 0.2951 (p=0.0000)

Neither distribution fits well -> Complex structure


---

# HYPOTHESIS 5: Biotic Decoupling

**Prediction:** Mature tropical forests show weak correlation (|r| < 0.3) between canopy and topography.

In [16]:
print("="*70)
print("HYPOTHESIS 5: BIOTIC DECOUPLING")
print("="*70)

# Load DEM if available
dem_array = None
if DTM_PATH.exists():
    print(f"\nLoading DTM from: {DTM_PATH.name}")
    dem_full = rxr.open_rasterio(DTM_PATH, masked=True)
    
    if USE_SUBSET:
        dem = dem_full[:, center_y - half_size : center_y + half_size,
                       center_x - half_size : center_x + half_size]
    else:
        dem = dem_full
    
    dem_array = dem.values.squeeze().astype(np.float64)
    
    if dem_array.shape != chm_array.shape:
        print(f"Resampling DEM from {dem_array.shape} to match CHM {chm_array.shape}")
        dem = dem.rio.reproject_match(chm)
        dem_array = dem.values.squeeze().astype(np.float64)
    
    dem_nodata = dem.rio.nodata if dem.rio.nodata is not None else NODATA_VALUE
    dem_valid_mask = ~np.isnan(dem_array) & (dem_array != dem_nodata)
    combined_mask = valid_mask & dem_valid_mask
    print(f"DEM range: {dem_array[dem_valid_mask].min():.1f} - {dem_array[dem_valid_mask].max():.1f} m")
else:
    print("\nNo DEM available - using position metrics")
    combined_mask = valid_mask

topo_results = analyze_biotic_decoupling(chm_array, combined_mask, dem_array, pixel_resolution)

if 'error' in topo_results:
    print(f"Error: {topo_results['error']}")
    h5_class, h5_interp = "INSUFFICIENT_DATA", topo_results['error']
else:
    print(f"\nSamples: {topo_results['n_samples']:,}")
    print(f"DEM used: {topo_results['dem_available']}")
    print("\nCorrelations:")
    for var, stats in topo_results['correlations'].items():
        sig = "***" if stats['p'] < 0.001 else "**" if stats['p'] < 0.01 else "*" if stats['p'] < 0.05 else ""
        print(f"  {var}: r = {stats['r']:+.4f} {sig}")
    
    mean_r = topo_results['mean_abs_correlation']
    print(f"\nMean |r|: {mean_r:.4f}")
    
    if mean_r < 0.3:
        h5_class = "DECOUPLED"
        h5_interp = f"Mean |r| = {mean_r:.2f} indicates BIOTIC DECOUPLING (OLD GROWTH)"
    elif mean_r < 0.6:
        h5_class = "PARTIAL"
        h5_interp = f"Mean |r| = {mean_r:.2f} indicates PARTIAL coupling"
    else:
        h5_class = "COUPLED"
        h5_interp = f"Mean |r| = {mean_r:.2f} indicates strong TOPOGRAPHIC control"
    
    print(f"\n{h5_interp}")

h5_results = {'hypothesis': 'H5: Biotic Decoupling', 'dem_available': topo_results.get('dem_available', False),
              'mean_abs_correlation': topo_results.get('mean_abs_correlation'),
              'classification': h5_class, 'interpretation': h5_interp,
              'supports_hypothesis': h5_class == 'DECOUPLED'}

HYPOTHESIS 5: BIOTIC DECOUPLING

Loading DTM from: BCI_whole_2023_05_26_dtm.tif
DEM range: 39.0 - 166.8 m

Samples: 50,000
DEM used: True

Correlations:
  chm_roughness_vs_elevation: r = -0.1156 ***
  chm_roughness_vs_slope: r = +0.0692 ***
  chm_roughness_vs_roughness: r = +0.0735 ***

Mean |r|: 0.0861

Mean |r| = 0.09 indicates BIOTIC DECOUPLING (OLD GROWTH)


---

# HYPOTHESES 6-9: Spatial Distribution (Fractal String Theory)

In [17]:
print("="*70)
print("HYPOTHESES 6-9: SPATIAL DISTRIBUTION")
print("="*70)

# H6: Fractal String Gap
print("\n--- H6: Fractal String Gap ---")
if 'power_law' in gap_results and gap_results['power_law']['r2'] > 0.8:
    alpha = gap_results['power_law']['alpha']
    h6_class = "FRACTAL_STRING" if 1.5 <= alpha <= 2.5 else "NON_FRACTAL"
    h6_interp = f"α = {alpha:.2f} {'consistent with' if h6_class == 'FRACTAL_STRING' else 'outside'} fractal string predictions"
else:
    h6_class, h6_interp = "NO_POWER_LAW", "Gap distribution does not follow power law"
print(h6_interp)
h6_results = {'hypothesis': 'H6: Fractal String Gap', 'classification': h6_class,
              'interpretation': h6_interp, 'supports_hypothesis': h6_class == 'FRACTAL_STRING'}

# H7: Prime Number Repulsion
print("\n--- H7: Prime Number Repulsion (GUE) ---")
if h4_class == 'WIGNER_DYSON':
    h7_class, h7_interp = "GUE_REPULSION", "Tree spacing consistent with GUE statistics"
elif h4_class == 'REPULSION_PARTIAL':
    h7_class, h7_interp = "PARTIAL_REPULSION", "Partial evidence for prime-like repulsion"
else:
    h7_class, h7_interp = "NO_REPULSION", "Tree spacing does not show prime-like repulsion"
print(h7_interp)
h7_results = {'hypothesis': 'H7: Prime Number Repulsion', 'classification': h7_class,
              'interpretation': h7_interp, 'supports_hypothesis': h7_class == 'GUE_REPULSION'}

# H8: Complex Dimension Oscillation
print("\n--- H8: Complex Dimension Oscillation ---")
if len(lacunarity) >= 5:
    osc_results = analyze_log_periodic_oscillations(lacunarity, lac_sizes)
    if 'error' not in osc_results:
        if osc_results['has_oscillation']:
            h8_class = "OSCILLATION_PRESENT"
            h8_interp = f"Log-periodic oscillations detected (RMS = {osc_results['rms_residual']:.4f})"
        else:
            h8_class = "NO_OSCILLATION"
            h8_interp = f"No log-periodic oscillations (RMS = {osc_results['rms_residual']:.4f})"
    else:
        h8_class, h8_interp = "INSUFFICIENT_DATA", osc_results['error']
else:
    h8_class, h8_interp = "INSUFFICIENT_DATA", "Insufficient lacunarity data"
print(h8_interp)
h8_results = {'hypothesis': 'H8: Complex Dimension Oscillation', 'classification': h8_class,
              'interpretation': h8_interp, 'supports_hypothesis': h8_class == 'OSCILLATION_PRESENT'}

# H9: Riemann Gas Density
print("\n--- H9: Riemann Gas Density ---")
if len(tree_coords) >= 20:
    riemann_results = analyze_riemann_gas_density(tree_coords, valid_mask, pixel_resolution)
    if 'error' not in riemann_results:
        fano = riemann_results['fano_factor']
        interp = riemann_results['interpretation']
        if interp == 'repulsion':
            h9_class = "RIEMANN_GAS"
            h9_interp = f"Fano = {fano:.3f} < 1 indicates repulsion (Riemann gas)"
        elif interp == 'clustering':
            h9_class = "CLUSTERING"
            h9_interp = f"Fano = {fano:.3f} > 1 indicates clustering (not Riemann gas)"
        else:
            h9_class = "POISSON"
            h9_interp = f"Fano = {fano:.3f} ≈ 1 indicates random/Poisson"
    else:
        h9_class, h9_interp = "INSUFFICIENT_DATA", riemann_results['error']
        fano = np.nan
else:
    h9_class, h9_interp = "INSUFFICIENT_DATA", f"Only {len(tree_coords)} trees"
    fano = np.nan
print(h9_interp)
h9_results = {'hypothesis': 'H9: Riemann Gas Density', 'classification': h9_class,
              'fano_factor': float(fano) if not np.isnan(fano) else None,
              'interpretation': h9_interp, 'supports_hypothesis': h9_class == 'RIEMANN_GAS'}

HYPOTHESES 6-9: SPATIAL DISTRIBUTION

--- H6: Fractal String Gap ---
α = 1.95 consistent with fractal string predictions

--- H7: Prime Number Repulsion (GUE) ---
Tree spacing does not show prime-like repulsion

--- H8: Complex Dimension Oscillation ---
No log-periodic oscillations (RMS = 0.0056)

--- H9: Riemann Gas Density ---
Fano = 5.056 > 1 indicates clustering (not Riemann gas)


---

# COMPREHENSIVE SUMMARY

In [18]:
all_results = [h1_results, h2_results, h3_results, h4_results, h5_results,
               h6_results, h7_results, h8_results, h9_results]

print("="*80)
print("           COMPREHENSIVE HYPOTHESIS TESTING SUMMARY - BCI TROPICAL FOREST")
print("="*80)
print(f"\nSite: {SITE_DESCRIPTION}")
print(f"Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Subset: {USE_SUBSET} ({SUBSET_SIZE}x{SUBSET_SIZE} pixels)")
print(f"Valid pixels: {valid_mask.sum():,}")

print("\n" + "-"*80)
print("PRIMARY HYPOTHESES (1-5)")
print("-"*80)

for i, result in enumerate(all_results[:5], 1):
    status = "SUPPORTED" if result['supports_hypothesis'] else "NOT SUPPORTED"
    print(f"\nH{i}: {result['hypothesis'].split(': ')[1]}")
    print(f"    Classification: {result['classification']}")
    print(f"    {result['interpretation']}")
    print(f"    Status: {status}")

print("\n" + "-"*80)
print("SPATIAL DISTRIBUTION HYPOTHESES (6-9)")
print("-"*80)

for i, result in enumerate(all_results[5:], 6):
    status = "SUPPORTED" if result['supports_hypothesis'] else "NOT SUPPORTED"
    print(f"\nH{i}: {result['hypothesis'].split(': ')[1]}")
    print(f"    Classification: {result['classification']}")
    print(f"    {result['interpretation']}")
    print(f"    Status: {status}")

# Overall assessment
supported = sum(1 for r in all_results if r['supports_hypothesis'])
primary_supported = sum(1 for r in all_results[:5] if r['supports_hypothesis'])

print("\n" + "="*80)
print("OVERALL ASSESSMENT")
print("="*80)
print(f"\nPrimary hypotheses supported: {primary_supported}/5")
print(f"All hypotheses supported: {supported}/9")

if primary_supported >= 4:
    forest_class = "OLD-GROWTH / MATURE TROPICAL FOREST"
    desc = "Strong evidence for self-organized, mature tropical forest structure."
elif primary_supported >= 2:
    forest_class = "RECOVERING / TRANSITIONAL FOREST"
    desc = "Mixed evidence suggesting forest transition toward old-growth."
else:
    forest_class = "YOUNG / DISTURBED FOREST"
    desc = "Limited support for old-growth hypotheses."

print(f"\nFOREST CLASSIFICATION: {forest_class}")
print(f"\n{desc}")

print("\n" + "-"*80)
print("KEY METRICS")
print("-"*80)
print(f"  DBC Fractal Dimension (D):     {D_dbc:.4f}" + (" [HIGH]" if D_dbc >= 2.5 else ""))
print(f"  Lacunarity R² (scale inv.):    {r2_lac:.4f}" + (" [SCALE INVARIANT]" if r2_lac > 0.95 else ""))
alpha_val = gap_results.get('power_law', {}).get('alpha', np.nan)
print(f"  Gap power-law exponent (α):    {alpha_val:.4f}" + (" [NEAR ζ(2)]" if 1.8 <= alpha_val <= 2.2 else ""))
print(f"  Tree spacing:                  {h4_class}")
mean_corr = topo_results.get('mean_abs_correlation', np.nan)
print(f"  Topographic correlation:       {mean_corr:.4f}" + (" [DECOUPLED]" if mean_corr < 0.3 else ""))

           COMPREHENSIVE HYPOTHESIS TESTING SUMMARY - BCI TROPICAL FOREST

Site: Barro Colorado Island - Tropical Moist Forest
Analysis Date: 2025-12-25 19:20:52
Subset: True (2000x2000 pixels)
Valid pixels: 3,999,996

--------------------------------------------------------------------------------
PRIMARY HYPOTHESES (1-5)
--------------------------------------------------------------------------------

H1: Optimal Filling
    Classification: MODERATE
    D = 2.31 indicates MODERATE complexity
    Status: NOT SUPPORTED

H2: Scale Invariance
    Classification: SCALE_INVARIANT
    R² = 0.98 indicates STRONG scale invariance (OLD GROWTH)
    Status: SUPPORTED

H3: Zeta Distribution
    Classification: EXPONENTIAL
    Exponential fits better (R²=0.96)
    Status: NOT SUPPORTED

H4: Universal Repulsion
    Classification: NEITHER
    Neither distribution fits well -> Complex structure
    Status: NOT SUPPORTED

H5: Biotic Decoupling
    Classification: DECOUPLED
    Mean |r| = 0.09 indicat

In [19]:
# Save results to JSON
output_data = {
    'site': {'name': SITE_NAME, 'description': SITE_DESCRIPTION,
             'subset_used': USE_SUBSET, 'subset_size': SUBSET_SIZE,
             'resolution_m': float(pixel_resolution),
             'analysis_date': datetime.now().isoformat()},
    'height_stats': {'mean_m': float(np.mean(valid_heights)), 'max_m': float(np.max(valid_heights)),
                     'p95_m': float(np.percentile(valid_heights, 95))},
    'hypotheses': {f'H{i+1}': r for i, r in enumerate(all_results)},
    'summary': {'primary_supported': primary_supported, 'total_supported': supported,
                'forest_classification': forest_class}
}

results_path = OUTPUT_DIR / f"{SITE_NAME}_hypothesis_results.json"
with open(results_path, 'w') as f:
    json.dump(output_data, f, indent=2)

print(f"\nResults saved to: {results_path}")


Results saved to: /home/jovyan/data-store/data/output/smithsonian/analysis/fractal_hypotheses/bci_panama_hypothesis_results.json


In [20]:
# Cleanup
chm_full.close()
if USE_SUBSET:
    chm.close()
if dem_array is not None:
    dem_full.close()

import gc
gc.collect()
print("Memory cleaned up.")

Memory cleaned up.


---

## References

1. **Sarkar & Chaudhuri (1994)** - Differential Box Counting
2. **Plotnick et al. (1996)** - Lacunarity analysis
3. **Clark & Evans (1954)** - Nearest neighbor analysis
4. **Wigner-Dyson** - Random matrix theory
5. **BCI Forest Dynamics** - https://forestgeo.si.edu/sites/neotropics/barro-colorado-island
6. **Smithsonian ALS 2023** - https://smithsonian.dataone.org/datasets/ALS_Panama_2023/