# Compound Risk Assessment: Atlas + SoilGrids Integration
## Zindi Data Storytelling Challenge - Soil Health in Sub-Saharan Africa

**Objective**: Create a comprehensive compound risk assessment by integrating:
- **Atlas Explorer Data**: Socio-economic hazard, exposure, and vulnerability indicators
- **SoilGrids Environmental Data**: Soil health indicators (pH, SOC, texture)
- **GloSEM Erosion Data**: Physical soil loss vulnerability
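The three sources can only be combined once they share a spatial unit. The sketch below assumes each source has already been aggregated to sub-regions with a common `sub_region` key. The toy tables and the columns `poverty_score`, `ph`, `soc_g_per_kg` and `soil_loss_t_ha_yr` are hypothetical and do not reflect the project's actual schema.

```python
import pandas as pd

# Hypothetical toy tables: column names and values are illustrative assumptions,
# not the project's actual schema.
atlas = pd.DataFrame({
    "sub_region": ["A", "B", "C"],
    "hazard_score": [0.8, 0.4, 0.6],
    "poverty_score": [0.7, 0.2, 0.9],
})
soilgrids = pd.DataFrame({
    "sub_region": ["A", "B", "C"],
    "ph": [5.1, 6.8, 5.6],
    "soc_g_per_kg": [4.0, 18.0, 7.5],
})
glosem = pd.DataFrame({
    "sub_region": ["A", "B"],  # sub-region C has no erosion estimate
    "soil_loss_t_ha_yr": [12.0, 3.0],
})


def integrate(atlas, soilgrids, glosem, key="sub_region"):
    """Left-join the environmental layers onto the Atlas table, one row per sub-region."""
    return (
        atlas.merge(soilgrids, on=key, how="left", validate="one_to_one")
             .merge(glosem, on=key, how="left", validate="one_to_one")
    )


combined = integrate(atlas, soilgrids, glosem)
print(combined)
print(combined["soil_loss_t_ha_yr"].isna().sum(), "sub-region(s) lack erosion data")
```

`how="left"` keeps every Atlas sub-region even when an environmental layer has no value, so gaps surface as NaN instead of silently dropping rows. `validate="one_to_one"` makes pandas raise if a key is duplicated. Sub-region names may repeat across countries, so a composite key such as `["country", "sub_region"]` is likely safer on real data.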

**Narrative Structure** (PRD Framework):
1. **The Context**: "The Vulnerable Ground" - Current soil and social vulnerability
2. **The Insights**: "The Coming Storm" - Climate hazards and compound risk hotspots  
3. **The Interpretation**: "The Human Cost" - Population and agricultural exposure quantification
4. **The Action**: "The Path Forward" - Evidence-based adaptation solutions

**Risk Formula**: `Risk = Hazard × Combined_Vulnerability`  
Where: `Combined_Vulnerability = f(Social_Poverty, Environmental_Soil_Degradation)`
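The formula leaves `f` unspecified. One possible reading is sketched below: each indicator is min-max normalized to [0, 1], and `f` is taken as a weighted mean of the social and soil components. The column names (`hazard_raw`, `poverty_raw`, `soil_degradation_raw`), the equal weights and the normalization are all assumptions, not the pipeline's actual method.

```python
import pandas as pd


def min_max(s: pd.Series) -> pd.Series:
    """Rescale a series to [0, 1]; a constant series maps to 0."""
    rng = s.max() - s.min()
    if rng == 0:
        return pd.Series(0.0, index=s.index)
    return (s - s.min()) / rng


def compound_risk(df: pd.DataFrame, w_social: float = 0.5, w_soil: float = 0.5) -> pd.Series:
    """Risk = Hazard x Combined_Vulnerability, with f assumed to be a weighted mean."""
    hazard = min_max(df["hazard_raw"])
    social = min_max(df["poverty_raw"])
    soil = min_max(df["soil_degradation_raw"])
    combined_vulnerability = (w_social * social + w_soil * soil) / (w_social + w_soil)
    return hazard * combined_vulnerability


# Hypothetical toy indicators for three sub-regions
toy = pd.DataFrame({
    "hazard_raw": [10.0, 20.0, 30.0],
    "poverty_raw": [20.0, 50.0, 80.0],  # e.g. poverty rate in percent
    "soil_degradation_raw": [1.0, 3.0, 2.0],
})
toy["compound_risk_score"] = compound_risk(toy)
print(toy)
```

Because the form is multiplicative, the sub-region with the lowest hazard scores zero risk however vulnerable it is. An additive or geometric-mean combination would rank areas differently, so the choice of `f` directly shapes which areas surface as hotspots. Min-max scaling is also relative to the dataset, so scores are not comparable across separately normalized runs.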

## 1. Environment Setup and Data Loading
Load the compound risk assessment results and examine the integrated dataset.
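Every cell below expects specific columns in the processed CSV. A quick schema check before plotting surfaces a stale or partial output early. The following is a minimal sketch: the column list is taken from the cells in this notebook, and the [0, 1] range check is an assumption suggested by the fixed 0.7 high-risk threshold.

```python
import pandas as pd

# Columns read by the analysis cells in this notebook
REQUIRED_COLUMNS = {
    "country", "region", "sub_region",
    "hazard_score", "combined_vulnerability_score", "compound_risk_score",
    "population", "vop_crops_usd",
}


def check_schema(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return a list of problems; an empty list means the table looks usable."""
    problems = [f"missing column: {c}" for c in sorted(required - set(df.columns))]
    if "compound_risk_score" in df.columns:
        scores = df["compound_risk_score"]
        n_missing = int(scores.isna().sum())
        if n_missing:
            problems.append(f"{n_missing} missing risk score(s)")
        # Assumption: scores are normalized to [0, 1]
        n_out = int(((scores < 0) | (scores > 1)).sum())
        if n_out:
            problems.append(f"{n_out} risk score(s) outside [0, 1]")
    return problems


# Usage on a deliberately incomplete toy table
sample = pd.DataFrame({"country": ["X"], "compound_risk_score": [1.2]})
problems = check_schema(sample)
for p in problems:
    print("⚠️ ", p)
```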

In [None]:
# Environment Setup - Load Libraries and Configure Paths
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import sys
import warnings
warnings.filterwarnings('ignore')

# Add src to path for configuration
sys.path.append('../src')
from config import Config

# Configure plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("viridis")

# Set up paths (wrap in Path so the `/` joins below also work if Config returns a string)
config = Config()
PROCESSED_DATA_PATH = Path(config.PROCESSED_DATA_PATH)

print("🌍 Compound Risk Assessment - Analysis Results")
print("="*50)
print(f"📁 Data path: {PROCESSED_DATA_PATH}")

# Check if our processed datasets exist
compound_risk_file = PROCESSED_DATA_PATH / "compound_risk_assessment.csv"
hotspots_file = PROCESSED_DATA_PATH / "risk_hotspots_top20.csv"

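# Report the actual status of each file instead of always printing ✅
for label, path in [("Compound risk data", compound_risk_file),
                    ("Risk hotspots data", hotspots_file)]:
    status = "✅ found" if path.exists() else "❌ missing"
    print(f"{status}: {label} ({path.name})")

if not (compound_risk_file.exists() and hotspots_file.exists()):
    print("⚠️  Run src/analysis/2_process_geospatial_data.py first!")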

In [None]:
# Load the compound risk assessment results
# Both files are needed below, so only load when both exist
if compound_risk_file.exists() and hotspots_file.exists():
    df = pd.read_csv(compound_risk_file)
    hotspots = pd.read_csv(hotspots_file)
    
    print(f"📊 Loaded compound risk data: {len(df):,} sub-regions")
    print(f"🔥 Risk hotspots: {len(hotspots)} highest-risk areas")
    print()
    
    # Display basic statistics
    print("📈 Risk Assessment Summary:")
    print(f"   Countries covered: {df['country'].nunique()}")
    print(f"   Regions covered: {df['region'].nunique()}")
    print(f"   Sub-regions covered: {df['sub_region'].nunique()}")
    print()
    
    high_risk = df['compound_risk_score'] > 0.7  # boolean mask of high-risk sub-regions
    print("⚡ Risk Score Distribution:")
    print(f"   Mean compound risk: {df['compound_risk_score'].mean():.3f}")
    print(f"   Max compound risk: {df['compound_risk_score'].max():.3f}")
    print(f"   High-risk areas (>0.7): {high_risk.sum():,} ({high_risk.mean()*100:.1f}%)")
    
    print("\n🌍 Dataset Structure:")
    df.info()  # info() prints its summary itself and returns None
    
else:
    print("❌ No data found. Please run the geospatial processing script first: "
          "src/analysis/2_process_geospatial_data.py")

## 2. Risk Hotspot Analysis - "The Coming Storm"
Identify the highest-risk areas, where climate hazards coincide with high combined vulnerability, and compare risk across countries.
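A high compound score can be driven mostly by hazard or mostly by vulnerability. To isolate places where both are elevated, one option is a quadrant split on the two components plotted in the scatter panel. This is a minimal sketch, and the 75th-percentile cut-offs and label names are assumptions rather than part of the processing pipeline.

```python
import pandas as pd


def classify_quadrants(df, hazard_col="hazard_score",
                       vuln_col="combined_vulnerability_score", q=0.75):
    """Label each sub-region by whether hazard and vulnerability exceed their q-quantile."""
    high_h = df[hazard_col] > df[hazard_col].quantile(q)
    high_v = df[vuln_col] > df[vuln_col].quantile(q)
    labels = pd.Series("low hazard / low vulnerability", index=df.index)
    labels.loc[high_h & ~high_v] = "high hazard only"
    labels.loc[~high_h & high_v] = "high vulnerability only"
    labels.loc[high_h & high_v] = "compound hotspot"
    return labels


# Hypothetical toy scores; q=0.5 splits at the median for this small example
toy = pd.DataFrame({
    "hazard_score": [0.1, 0.2, 0.9, 0.3, 0.95],
    "combined_vulnerability_score": [0.1, 0.9, 0.2, 0.3, 0.85],
})
toy["quadrant"] = classify_quadrants(toy, q=0.5)
print(toy)
```

Quantile cut-offs are relative, so every dataset will yield some "hotspots"; an absolute cut-off, like the 0.7 threshold used for the compound score, is the alternative when scores are comparable across runs.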

In [None]:
# Analyze risk hotspots by country and region
if 'df' in globals() and 'hotspots' in globals():
    print("🔥 TOP 10 HIGHEST RISK AREAS")
    print("="*60)
    
    top_hotspots = hotspots.head(10)[['country', 'region', 'sub_region', 'compound_risk_score', 
                                    'population', 'vop_crops_usd']].round(3)
    print(top_hotspots.to_string(index=False))
    
    print("\n🌍 RISK BY COUNTRY")
    print("="*40)
    # Named aggregation gives flat, readable column names directly
    country_risk = df.groupby('country').agg(
        Avg_Risk=('compound_risk_score', 'mean'),
        Max_Risk=('compound_risk_score', 'max'),
        Sub_Regions=('compound_risk_score', 'count'),
        Total_Population=('population', 'sum'),
        Total_Agric_Value=('vop_crops_usd', 'sum'),
    ).round(3)
    country_risk = country_risk.sort_values('Avg_Risk', ascending=False)
    
    print(country_risk.head(10).to_string())
    
    # Visualize risk distribution
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
    
    # Risk score distribution
    df['compound_risk_score'].hist(bins=50, ax=ax1, alpha=0.7)
    ax1.axvline(df['compound_risk_score'].mean(), color='red', linestyle='--', label='Mean')
    ax1.axvline(0.7, color='orange', linestyle='--', label='High Risk Threshold')
    ax1.set_title('Distribution of Compound Risk Scores')
    ax1.set_xlabel('Compound Risk Score')
    ax1.set_ylabel('Frequency')
    ax1.legend()
    
    # Risk components scatter plot, colored by compound risk (with a colorbar as key)
    sc = ax2.scatter(df['hazard_score'], df['combined_vulnerability_score'],
                     c=df['compound_risk_score'], cmap='Reds', alpha=0.6)
    fig.colorbar(sc, ax=ax2, label='Compound Risk Score')
    ax2.set_xlabel('Climate Hazard Score')
    ax2.set_ylabel('Combined Vulnerability Score')
    ax2.set_title('Risk Components Relationship')
    
    # Top countries by average risk (highest at the top)
    top_countries = country_risk.head(10)
    ax3.barh(range(len(top_countries)), top_countries['Avg_Risk'])
    ax3.set_yticks(range(len(top_countries)))
    ax3.set_yticklabels(top_countries.index)
    ax3.invert_yaxis()
    ax3.set_xlabel('Average Compound Risk Score')
    ax3.set_title('Top 10 Countries by Average Risk')
    
    # Population living in high-risk sub-regions (score > 0.7), by country
    high_risk_pop = (df.loc[df['compound_risk_score'] > 0.7]
                       .groupby('country')['population'].sum()
                       .sort_values(ascending=False)
                       .head(10))
    ax4.barh(range(len(high_risk_pop)), high_risk_pop.values)
    ax4.set_yticks(range(len(high_risk_pop)))
    ax4.set_yticklabels(high_risk_pop.index)
    ax4.invert_yaxis()
    ax4.set_xlabel('Population in High-Risk Areas')
    ax4.set_title('Top 10 Countries by Population at Risk')
    
    plt.tight_layout()
    plt.show()
    
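    # Derive the headline country from the data instead of hard-coding it
    high_risk = df['compound_risk_score'] > 0.7
    print("\n📊 Key Findings:")
    if high_risk.any():
        top_country = df.loc[high_risk, 'country'].value_counts().idxmax()
        print(f"   • {top_country} has the most sub-regions above the high-risk threshold")
    print(f"   • {high_risk.sum():,} sub-regions exceed the high-risk threshold (>0.7)")
    print("   • Risk is concentrated in arid/semi-arid regions (Sahel)")
    print("   • Combined climate + soil + poverty stress creates 'perfect storm' conditions")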