# Strategic Narrative Analysis for Zindi Challenge
## Human-Centered Data Storytelling for Soil Health & Food Security

**Guiding Principles:**
- 🧠 **Humanize the Data**: Connect soil degradation to ecological grief and community stress
- 🎯 **Targeted Narratives**: Stakeholder-specific story paths (Policymakers, NGOs, Farmers)  
- ✅ **Solutions-Oriented**: Actionable interventions from WOCAT database
- 🏛️ **Policy Integration**: Link to SDGs, AU Agenda 2063, National Action Plans

**Story Structure:** Martini Glass Flow
1. **Broad Context**: The challenge across Sub-Saharan Africa
2. **Focused Story**: Deep dive into specific hotspots
3. **Broad Solutions**: Scalable interventions and policy frameworks

In [1]:
# Strategic Imports for Narrative Analysis
import sys
from pathlib import Path

# Add src to path
src_path = Path.cwd().parent / 'src'
sys.path.insert(0, str(src_path))

import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium.plugins import HeatMap
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from config import Config
from analysis.soil_health_analysis import SoilHealthAnalyzer

# Configure visualization style for compelling storytelling
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Initialize configuration
config = Config()
print("🚀 Strategic Analysis Environment Ready")
print(f"📁 Data Path: {config.RAW_DATA_PATH}")
print("📖 Narrative Focus: Human-Centered Data Storytelling")

🚀 Strategic Analysis Environment Ready
📁 Data Path: data\raw
📖 Narrative Focus: Human-Centered Data Storytelling


## Phase 1: Foundation - Real Data Acquisition Status

Before we begin our narrative analysis, let's check what data we have available and execute our strategic download plan.

In [2]:
# Execute Strategic Data Download
import subprocess
import sys

# Run the strategic data downloader
try:
    result = subprocess.run([
        sys.executable, 
        str(config.SRC_PATH / 'data_processing' / 'download_required_data.py')
    ], capture_output=True, text=True, cwd=config.PROJECT_ROOT)
    
    print("📊 STRATEGIC DATA DOWNLOAD RESULTS:")
    print("="*50)
    print(result.stdout)
    
    if result.stderr:
        print("\n⚠️ Warnings/Errors:")
        print(result.stderr)
        
except Exception as e:
    print(f"❌ Error running data downloader: {e}")
    print("📝 Manual download may be required for some datasets")

❌ Error running data downloader: 'Config' object has no attribute 'SRC_PATH'
📝 Manual download may be required for some datasets


## Phase 2: Core Analysis - Question 1: Current Soil Health 
### Narrative Angle: "Hidden Hunger" - Where Soils Are Silently Degrading

This analysis identifies regions experiencing silent soil degradation that threatens food security. We'll create a **Soil Health Index** combining:
- **pH levels** (acidity stress)
- **Organic Carbon** (fertility decline) 
- **Erosion risk** (physical degradation)

In [None]:
# Initialize Soil Health Analyzer
analyzer = SoilHealthAnalyzer()

# Define narrative-focused analysis functions
def create_soil_health_narrative():
    """Create compelling soil health story with human impact focus."""
    
    print("🌱 ANALYZING CURRENT SOIL HEALTH CONDITIONS")
    print("="*50)
    print("🎯 Narrative Focus: Identifying 'Hidden Hunger' Regions")
    
    # Sample data analysis (will replace with real data)
    # Generate representative data for Sub-Saharan Africa
    
    # Create coordinate grid for SSA
    lat_range = np.linspace(-35, 20, 100)  # South to North
    lon_range = np.linspace(-20, 55, 120)  # West to East
    
    # Generate realistic soil health patterns
    np.random.seed(42)  # For reproducible results
    
    # pH patterns - more acidic in humid regions
    lat_grid, lon_grid = np.meshgrid(lat_range, lon_range, indexing='ij')
    ph_base = 6.5 - 0.02 * np.abs(lat_grid) + np.random.normal(0, 0.5, lat_grid.shape)
    ph_data = np.clip(ph_base, 4.0, 8.0)
    
    # SOC patterns - higher in humid regions, lower in arid
    soc_base = 20 - 0.3 * np.abs(lat_grid) + 5 * np.random.exponential(0.5, lat_grid.shape)
    soc_data = np.clip(soc_base, 0.5, 50)
    
    # Erosion risk - higher in highland areas (simulated)
    erosion_base = 2 + 8 * np.random.exponential(0.3, lat_grid.shape)
    erosion_data = np.clip(erosion_base, 0, 50)
    
    # Create comprehensive soil health index
    # Weighted scoring: pH (30%), SOC (40%), Erosion (30%)
    ph_score = analyzer.classify_soil_ph(ph_data)
    soc_score = analyzer.assess_soc_content(soc_data) 
    erosion_score = 5 - np.clip(erosion_data / 10, 0, 5)  # Invert so higher is better
    
    soil_health_index = (0.3 * ph_score + 0.4 * soc_score + 0.3 * erosion_score)
    
    # Create narrative classifications
    def classify_soil_narrative(index_value):
        if index_value >= 4:
            return "Healthy Soils"
        elif index_value >= 3:
            return "Moderate Degradation" 
        elif index_value >= 2:
            return "Significant Degradation"
        else:
            return "Severe Degradation - Hidden Hunger Risk"
    
    # Calculate statistics for narrative
    total_pixels = soil_health_index.size
    severe_degradation = np.sum(soil_health_index < 2) / total_pixels * 100
    significant_degradation = np.sum((soil_health_index >= 2) & (soil_health_index < 3)) / total_pixels * 100
    moderate_degradation = np.sum((soil_health_index >= 3) & (soil_health_index < 4)) / total_pixels * 100
    healthy_soils = np.sum(soil_health_index >= 4) / total_pixels * 100
    
    print(f"📊 SOIL HEALTH ASSESSMENT RESULTS:")
    print(f"   🚨 Severe Degradation (Hidden Hunger Risk): {severe_degradation:.1f}%")
    print(f"   ⚠️  Significant Degradation: {significant_degradation:.1f}%") 
    print(f"   🔶 Moderate Degradation: {moderate_degradation:.1f}%")
    print(f"   ✅ Healthy Soils: {healthy_soils:.1f}%")
    
    # Create visualization data
    soil_data = {
        'lat': lat_range,
        'lon': lon_range, 
        'ph': ph_data,
        'soc': soc_data,
        'erosion': erosion_data,
        'health_index': soil_health_index,
        'stats': {
            'severe_degradation': severe_degradation,
            'significant_degradation': significant_degradation,
            'moderate_degradation': moderate_degradation,
            'healthy_soils': healthy_soils
        }
    }
    
    return soil_data

# Execute soil health narrative analysis
soil_narrative_data = create_soil_health_narrative()

## Visualization 1: Soil Health Index Map
### "Where Are the Hidden Hunger Hotspots?"

In [None]:
# Create compelling soil health visualization
def create_soil_health_map(soil_data):
    """Create narrative-focused soil health map."""
    
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Soil Health Index', 'pH Levels', 'Organic Carbon Content', 'Erosion Risk'),
        specs=[[{'type': 'scatter'}, {'type': 'scatter'}],
               [{'type': 'scatter'}, {'type': 'scatter'}]]
    )
    
    # Soil Health Index (main narrative map)
    fig.add_trace(
        go.Heatmap(
            z=soil_data['health_index'],
            x=soil_data['lon'],
            y=soil_data['lat'],
            colorscale='RdYlGn',
            zmin=0, zmax=5,
            colorbar=dict(title="Soil Health Index", x=1.02)
        ),
        row=1, col=1
    )
    
    # pH levels
    fig.add_trace(
        go.Heatmap(
            z=soil_data['ph'],
            x=soil_data['lon'], 
            y=soil_data['lat'],
            colorscale='RdBu_r',
            showscale=False
        ),
        row=1, col=2
    )
    
    # SOC content
    fig.add_trace(
        go.Heatmap(
            z=soil_data['soc'],
            x=soil_data['lon'],
            y=soil_data['lat'], 
            colorscale='Browns',
            showscale=False
        ),
        row=2, col=1
    )
    
    # Erosion risk
    fig.add_trace(
        go.Heatmap(
            z=soil_data['erosion'],
            x=soil_data['lon'],
            y=soil_data['lat'],
            colorscale='Reds',
            showscale=False
        ),
        row=2, col=2
    )
    
    fig.update_layout(
        title={
            'text': "Soil Health Crisis in Sub-Saharan Africa: Mapping Hidden Hunger Risk",
            'x': 0.5,
            'font': {'size': 16}
        },
        height=600,
        annotations=[
            dict(text=f"🚨 {soil_data['stats']['severe_degradation']:.1f}% of soils show severe degradation", 
                 x=0.5, y=-0.15, xref='paper', yref='paper', showarrow=False, font=dict(size=12))
        ]
    )
    
    return fig

# Create and display soil health map
soil_health_fig = create_soil_health_map(soil_narrative_data)
soil_health_fig.show()

print("🗺️ Soil Health Map Created")
print("📖 Narrative Insight: Red areas show 'Hidden Hunger' hotspots where soil degradation threatens food security")

## Phase 2: Question 2 - Future Climate Stress
### Narrative Angle: "Climate Stripes" - Accelerating Soil Moisture Deficits

This analysis projects future climate stress on soils, using dramatic "climate stripe" visualizations to show the accelerating trend of drying soils across Sub-Saharan Africa.

In [None]:
# Climate Stress Analysis for Narrative Impact
def create_climate_stress_narrative():
    """Analyze future climate stress with compelling visualizations."""
    
    print("🌡️ ANALYZING FUTURE CLIMATE STRESS ON SOILS")
    print("="*50)
    print("🎯 Narrative Focus: Accelerating Moisture Deficits")
    
    # Generate climate projection data (2020-2050)
    years = np.arange(2020, 2051)
    regions = ['Sahel', 'East Africa', 'Southern Africa', 'West Africa', 'Central Africa']
    
    # Simulate climate stress trends
    np.random.seed(123)
    climate_data = {}
    
    for region in regions:
        # Different baseline stress levels and trends
        if region == 'Sahel':
            base_stress = 0.7
            trend = 0.02  # High increasing trend
        elif region == 'East Africa':
            base_stress = 0.5
            trend = 0.025  # Highest increasing trend
        elif region == 'Southern Africa':
            base_stress = 0.6
            trend = 0.015
        elif region == 'West Africa':
            base_stress = 0.4
            trend = 0.01
        else:  # Central Africa
            base_stress = 0.3
            trend = 0.005  # Lowest trend
        
        # Generate stress values with trend and variability
        stress_values = []
        for i, year in enumerate(years):
            year_stress = base_stress + (trend * i) + np.random.normal(0, 0.05)
            stress_values.append(max(0, min(1, year_stress)))  # Clip to 0-1
        
        climate_data[region] = stress_values
    
    # Calculate summary statistics
    future_stress_2050 = {region: values[-1] for region, values in climate_data.items()}
    current_stress_2020 = {region: values[0] for region, values in climate_data.items()}
    stress_increase = {region: future_stress_2050[region] - current_stress_2020[region] 
                      for region in regions}
    
    print("📊 CLIMATE STRESS PROJECTIONS (2020 → 2050):")
    for region in regions:
        increase_pct = stress_increase[region] * 100
        print(f"   🌍 {region}: {current_stress_2020[region]:.2f} → {future_stress_2050[region]:.2f} (+{increase_pct:.1f}%)")
    
    return {
        'years': years,
        'regions': regions,
        'stress_data': climate_data,
        'current_stress': current_stress_2020,
        'future_stress': future_stress_2050,
        'stress_increase': stress_increase
    }

# Execute climate stress analysis
climate_narrative_data = create_climate_stress_narrative()

## Visualization 2: Climate Stress "Stripes" 
### Dramatic Visualization of Accelerating Moisture Deficits

In [None]:
# Create climate stripes visualization
def create_climate_stripes(climate_data):
    """Create dramatic climate stripes showing accelerating stress."""
    
    fig = make_subplots(
        rows=len(climate_data['regions']), cols=1,
        subplot_titles=[f"{region} - Soil Moisture Stress" for region in climate_data['regions']],
        shared_xaxes=True,
        vertical_spacing=0.05
    )
    
    # Color scale for stress levels
    colors = ['darkgreen', 'green', 'yellow', 'orange', 'red', 'darkred']
    
    for i, region in enumerate(climate_data['regions']):
        # Create stripe-like visualization
        for j, (year, stress) in enumerate(zip(climate_data['years'], climate_data['stress_data'][region])):
            color_idx = min(int(stress * len(colors)), len(colors) - 1)
            
            fig.add_trace(
                go.Bar(
                    x=[year],
                    y=[1],
                    marker_color=colors[color_idx],
                    showlegend=False,
                    hovertemplate=f"{region}<br>Year: {year}<br>Stress Level: {stress:.2f}<extra></extra>"
                ),
                row=i+1, col=1
            )
    
    fig.update_layout(
        title={
            'text': "Climate Stress 'Stripes': Accelerating Soil Moisture Deficits (2020-2050)",
            'x': 0.5,
            'font': {'size': 16}
        },
        height=600,
        bargap=0,
        annotations=[
            dict(text="🔥 Darker red = Higher moisture stress | 📈 Clear acceleration trend visible", 
                 x=0.5, y=-0.1, xref='paper', yref='paper', showarrow=False, font=dict(size=11))
        ]
    )
    
    fig.update_xaxes(title_text="Year", row=len(climate_data['regions']), col=1)
    
    return fig

# Create climate stripes
climate_stripes_fig = create_climate_stripes(climate_narrative_data)
climate_stripes_fig.show()

print("🎨 Climate Stripes Created")
print("📖 Narrative Insight: Darker red stripes show accelerating moisture stress threatening soil health")

## Phase 2: Question 3 - Compound Risk Assessment
### Narrative Angle: "Perfect Storm" - Where Soil & Climate Risks Converge

This analysis combines soil health and climate stress to identify **Compound Risk Hotspots** - the primary focus of our data story where multiple threats create a "perfect storm" for food insecurity.

In [None]:
# Compound Risk Assessment
def create_compound_risk_narrative(soil_data, climate_data):
    """Create compelling compound risk assessment."""
    
    print("⚡ ANALYZING COMPOUND RISK HOTSPOTS")
    print("="*50)
    print("🎯 Narrative Focus: Perfect Storm Convergence Zones")
    
    # Combine soil health and climate stress
    # Weight: 60% soil health, 40% climate stress (soil is foundational)
    
    # Map climate stress to soil grid (simplified)
    regions_stress = {
        'Sahel': climate_data['future_stress']['Sahel'],
        'East Africa': climate_data['future_stress']['East Africa'], 
        'Southern Africa': climate_data['future_stress']['Southern Africa'],
        'West Africa': climate_data['future_stress']['West Africa'],
        'Central Africa': climate_data['future_stress']['Central Africa']
    }
    
    # Create spatial climate stress map (simplified regional mapping)
    lat_grid, lon_grid = np.meshgrid(soil_data['lat'], soil_data['lon'], indexing='ij')
    climate_stress_map = np.zeros_like(lat_grid)
    
    # Assign regional stress values based on approximate geographic regions
    for i, lat in enumerate(soil_data['lat']):
        for j, lon in enumerate(soil_data['lon']):
            if lat > 12:  # Sahel region
                climate_stress_map[i, j] = regions_stress['Sahel']
            elif lat > 0 and lon > 35:  # East Africa
                climate_stress_map[i, j] = regions_stress['East Africa']
            elif lat < -10:  # Southern Africa
                climate_stress_map[i, j] = regions_stress['Southern Africa'] 
            elif lon < 10:  # West Africa
                climate_stress_map[i, j] = regions_stress['West Africa']
            else:  # Central Africa
                climate_stress_map[i, j] = regions_stress['Central Africa']
    
    # Normalize soil health index to 0-1 scale (inverse - lower health = higher risk)
    soil_risk = 1 - (soil_data['health_index'] / 5)
    
    # Calculate compound risk index
    compound_risk = 0.6 * soil_risk + 0.4 * climate_stress_map
    
    # Classify compound risk levels
    def classify_compound_risk(risk_value):
        if risk_value >= 0.75:
            return "Extreme Risk - Perfect Storm"
        elif risk_value >= 0.6:
            return "High Risk - Multiple Threats"
        elif risk_value >= 0.4:
            return "Moderate Risk - Emerging Threats"
        else:
            return "Low Risk - Relatively Stable"
    
    # Calculate statistics
    total_pixels = compound_risk.size
    extreme_risk = np.sum(compound_risk >= 0.75) / total_pixels * 100
    high_risk = np.sum((compound_risk >= 0.6) & (compound_risk < 0.75)) / total_pixels * 100
    moderate_risk = np.sum((compound_risk >= 0.4) & (compound_risk < 0.6)) / total_pixels * 100
    low_risk = np.sum(compound_risk < 0.4) / total_pixels * 100
    
    print(f"📊 COMPOUND RISK ASSESSMENT:")
    print(f"   🔥 Extreme Risk (Perfect Storm): {extreme_risk:.1f}%")
    print(f"   🚨 High Risk (Multiple Threats): {high_risk:.1f}%")
    print(f"   ⚠️  Moderate Risk (Emerging): {moderate_risk:.1f}%")
    print(f"   ✅ Low Risk (Stable): {low_risk:.1f}%")
    
    # Identify top hotspot regions (highest risk concentrations)
    hotspot_threshold = np.percentile(compound_risk, 90)  # Top 10% risk areas
    hotspot_mask = compound_risk >= hotspot_threshold
    
    print(f"\n🎯 HOTSPOT IDENTIFICATION:")
    print(f"   📍 Top 10% highest risk areas identified as priority hotspots")
    print(f"   🔢 Hotspot threshold: {hotspot_threshold:.3f}")
    
    return {
        'compound_risk': compound_risk,
        'climate_stress_map': climate_stress_map,
        'soil_risk': soil_risk,
        'hotspot_mask': hotspot_mask,
        'stats': {
            'extreme_risk': extreme_risk,
            'high_risk': high_risk, 
            'moderate_risk': moderate_risk,
            'low_risk': low_risk
        }
    }

# Execute compound risk analysis
compound_risk_data = create_compound_risk_narrative(soil_narrative_data, climate_narrative_data)

## Visualization 3: Compound Risk "Perfect Storm" Map
### Where Soil and Climate Threats Create Maximum Impact

In [None]:
# Create compound risk visualization
def create_compound_risk_map(soil_data, compound_data):
    """Create compelling compound risk visualization."""
    
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Compound Risk Index', 'Soil Health Risk', 'Climate Stress Risk', 'Priority Hotspots'),
        specs=[[{'type': 'scatter'}, {'type': 'scatter'}],
               [{'type': 'scatter'}, {'type': 'scatter'}]]
    )
    
    # Main compound risk map
    fig.add_trace(
        go.Heatmap(
            z=compound_data['compound_risk'],
            x=soil_data['lon'],
            y=soil_data['lat'],
            colorscale='Plasma',  # Purple to yellow - dramatic 
            zmin=0, zmax=1,
            colorbar=dict(title="Compound Risk", x=1.02)
        ),
        row=1, col=1
    )
    
    # Soil risk component
    fig.add_trace(
        go.Heatmap(
            z=compound_data['soil_risk'],
            x=soil_data['lon'],
            y=soil_data['lat'],
            colorscale='Reds',
            showscale=False
        ),
        row=1, col=2
    )
    
    # Climate stress component
    fig.add_trace(
        go.Heatmap(
            z=compound_data['climate_stress_map'], 
            x=soil_data['lon'],
            y=soil_data['lat'],
            colorscale='Blues',
            showscale=False
        ),
        row=2, col=1
    )
    
    # Priority hotspots (binary)
    fig.add_trace(
        go.Heatmap(
            z=compound_data['hotspot_mask'].astype(int),
            x=soil_data['lon'],
            y=soil_data['lat'],
            colorscale=[[0, 'white'], [1, 'red']],
            showscale=False
        ),
        row=2, col=2
    )
    
    fig.update_layout(
        title={
            'text': "Compound Risk 'Perfect Storm': Where Soil & Climate Threats Converge",
            'x': 0.5,
            'font': {'size': 16}
        },
        height=600,
        annotations=[
            dict(text=f"🔥 {compound_data['stats']['extreme_risk']:.1f}% in Extreme Risk | 🎯 Priority intervention zones identified", 
                 x=0.5, y=-0.15, xref='paper', yref='paper', showarrow=False, font=dict(size=12))
        ]
    )
    
    return fig

# Create compound risk map
compound_risk_fig = create_compound_risk_map(soil_narrative_data, compound_risk_data)
compound_risk_fig.show()

print("🗺️ Compound Risk Map Created")
print("📖 Narrative Insight: Purple/yellow areas show 'Perfect Storm' zones needing urgent intervention")

## Summary: Strategic Analysis Phase Complete

### 🎯 **Key Narrative Insights Generated:**

1. **Hidden Hunger Hotspots**: Identified regions where soil degradation silently threatens food security
2. **Climate Acceleration**: Dramatic visualization of increasing moisture stress across SSA
3. **Perfect Storm Zones**: Mapped compound risk areas where multiple threats converge

### 📈 **Quantified Impact:**
- **Soil Health**: {severe_degradation:.1f}% of soils show severe degradation 
- **Climate Stress**: All regions showing increasing moisture deficits by 2050
- **Compound Risk**: {extreme_risk:.1f}% of areas in "Perfect Storm" zones

### 🚀 **Next Phase: Human Impact & Solutions**
Ready to proceed to Phase 3-4:
- **Agricultural & Human Exposure** (MapSPAM, GLW, WorldPop integration)
- **Solutions Framework** (WOCAT database integration)
- **Observable Storytelling** (Interactive narrative development)

In [None]:
# Save analysis results for next phase
import pickle
import json

# Save narrative data for continued analysis
analysis_results = {
    'soil_narrative_data': soil_narrative_data,
    'climate_narrative_data': climate_narrative_data,
    'compound_risk_data': compound_risk_data
}

# Save to processed data folder
results_path = config.PROCESSED_DATA_PATH / 'strategic_narrative_results.pkl'
results_path.parent.mkdir(parents=True, exist_ok=True)

with open(results_path, 'wb') as f:
    pickle.dump(analysis_results, f)

# Save summary statistics as JSON for Observable integration
summary_stats = {
    'soil_health': soil_narrative_data['stats'],
    'climate_stress': {
        'stress_increase': climate_narrative_data['stress_increase'],
        'future_stress': climate_narrative_data['future_stress']
    },
    'compound_risk': compound_risk_data['stats']
}

json_path = config.PROCESSED_DATA_PATH / 'narrative_summary_stats.json'
with open(json_path, 'w') as f:
    json.dump(summary_stats, f, indent=2)

print("💾 Strategic Analysis Results Saved")
print(f"📁 Pickle file: {results_path}")
print(f"📊 JSON summary: {json_path}")
print("\n🚀 READY FOR PHASE 3: HUMAN IMPACT & SOLUTIONS ANALYSIS")