# Urban Health Risk Mapping
## Archetype 5: The Built & Natural Environment Specialist

This notebook uses synthetic geospatial data to create health risk maps for an urban area, demonstrating how urban design and the built environment influence disease transmission patterns. Students will analyze population density, building characteristics, transportation networks, and green spaces to identify neighborhoods that may be more vulnerable during respiratory virus outbreaks.

### Learning Objectives:
- Understand how urban design affects disease transmission patterns
- Learn geospatial analysis techniques for public health applications
- Analyze the relationship between built environment features and health risks
- Create risk maps to inform public health planning and interventions
- Evaluate the role of green spaces and urban infrastructure in health outcomes
- Design evidence-based urban planning recommendations

### Key Concepts:
- **Urban density**: Population and building density effects on transmission
- **Ventilation**: Indoor air quality and airborne disease transmission
- **Transportation hubs**: Mass transit as a setting that concentrates contacts and spreads infection between neighborhoods
- **Green spaces**: Parks and natural areas as health-promoting infrastructure
- **Spatial analysis**: Geographic information systems (GIS) for health research

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Circle
import warnings
warnings.filterwarnings('ignore')

# For spatial analysis and mapping
from scipy.spatial.distance import cdist
from scipy import ndimage
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Set up plotting style
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 8)
sns.set_palette("husl")

print("🏙️ Libraries loaded successfully for urban health risk mapping")

## Part 1: Understanding Urban Environment and Health

The built environment significantly influences disease transmission through multiple pathways:

### Physical Infrastructure:
- **Building density**: Crowded buildings facilitate close contact
- **Ventilation systems**: Poor ventilation increases airborne transmission risk
- **Housing quality**: Overcrowded housing prevents isolation
- **Transportation**: Public transit creates transmission opportunities

### Social Infrastructure:
- **Green spaces**: Parks provide safe outdoor gathering spaces
- **Commercial areas**: Shops and restaurants vary in transmission risk
- **Educational facilities**: Schools and universities as transmission hubs
- **Healthcare access**: Proximity to medical facilities affects outcomes
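
To make the ventilation pathway concrete, the sketch below uses the Wells-Riley equation, a standard steady-state model of airborne infection risk: P = 1 - exp(-I·q·p·t / Q). It is not part of this notebook's risk model, and every parameter value here (quanta emission rate, breathing rate, exposure time, ventilation rates) is a hypothetical choice for illustration only.

```python
import math

def wells_riley_infection_probability(infectors, quanta_per_hour, breathing_rate_m3h,
                                      exposure_hours, ventilation_m3h):
    """Wells-Riley estimate: P = 1 - exp(-I * q * p * t / Q)."""
    if ventilation_m3h <= 0:
        raise ValueError("ventilation_m3h must be positive")
    # Expected number of infectious quanta inhaled by one susceptible person
    dose = (infectors * quanta_per_hour * breathing_rate_m3h * exposure_hours
            / ventilation_m3h)
    return 1.0 - math.exp(-dose)

# Hypothetical room: one infectious occupant, two hours of shared exposure
for ventilation in (100.0, 400.0, 1600.0):  # clean-air supply in m^3/h (assumed values)
    p = wells_riley_infection_probability(1, 20.0, 0.5, 2.0, ventilation)
    print(f"Ventilation {ventilation:>6.0f} m^3/h -> infection probability {p:.3f}")
```

Under these assumptions the estimated probability falls as the clean-air supply rises, which is the qualitative relationship the `ventilation_quality` score is meant to capture.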

In [None]:
# Generate synthetic urban data for a metropolitan area
# This simulates what you might get from urban planning databases, census data, and GIS systems

def generate_urban_environment_data(grid_size=50, seed=42):
    """
    Generate synthetic urban environment data
    Each cell represents a neighborhood or city block
    """
    np.random.seed(seed)
    
    # Create coordinate grid
    x_coords, y_coords = np.meshgrid(range(grid_size), range(grid_size))
    
    urban_data = []
    
    # Define city structure with multiple centers (polycentric city)
    city_centers = [(15, 15), (35, 35), (20, 40), (40, 15)] # Business districts
    
    for i in range(grid_size):
        for j in range(grid_size):
            # Calculate distance to nearest city center
            distances_to_centers = [np.sqrt((i-cx)**2 + (j-cy)**2) for cx, cy in city_centers]
            min_distance_to_center = min(distances_to_centers)
            
            # Population density decreases with distance from centers
            base_density = max(50, 500 - min_distance_to_center * 15)
            population_density = max(10, base_density + np.random.normal(0, 50))
            
            # Building characteristics
            if min_distance_to_center < 5: # Urban core
                avg_building_height = 15 + np.random.normal(0, 5)
                building_age = 30 + np.random.normal(0, 15) # Mix of old and new
                ventilation_quality = 0.7 + np.random.normal(0, 0.15) # Better in newer buildings
            elif min_distance_to_center < 15: # Urban
                avg_building_height = 8 + np.random.normal(0, 3)
                building_age = 40 + np.random.normal(0, 20)
                ventilation_quality = 0.6 + np.random.normal(0, 0.2)
            else: # Suburban
                avg_building_height = 3 + np.random.normal(0, 1)
                building_age = 25 + np.random.normal(0, 15)
                ventilation_quality = 0.8 + np.random.normal(0, 0.1) # Newer suburban development
            
            # Ensure reasonable bounds
            avg_building_height = max(1, avg_building_height)
            building_age = max(1, min(100, building_age))
            ventilation_quality = max(0.1, min(1.0, ventilation_quality))
            
            # Transportation infrastructure
            # Major transit lines
            transit_distance = min([
                abs(i - 25), # Line x = 25
                abs(j - 25), # Line y = 25
                abs((i + j) - 50) / np.sqrt(2), # Diagonal x + y = 50 (perpendicular distance)
                abs(i - j) / np.sqrt(2)   # Diagonal x = y (perpendicular distance)
            ])
            
            transit_accessibility = max(0.1, 1.0 - transit_distance / 10)
            
            # Green space (parks tend to be planned in residential areas)
            # Create some large parks
            park_centers = [(10, 30), (40, 10), (30, 45)]
            distance_to_park = min([np.sqrt((i-px)**2 + (j-py)**2) for px, py in park_centers])
            
            if distance_to_park < 3:
                green_space_access = 0.9 + np.random.normal(0, 0.05)
            elif distance_to_park < 8:
                green_space_access = 0.6 + np.random.normal(0, 0.1)
            else:
                green_space_access = 0.3 + np.random.normal(0, 0.15)
            
            green_space_access = max(0.0, min(1.0, green_space_access))
            
            # Socioeconomic factors (correlated with distance from center and infrastructure)
            base_income = 50000 + max(0, (20 - min_distance_to_center) * 3000)
            median_income = base_income + np.random.normal(0, 15000)
            median_income = max(20000, median_income)
            
            # Education level (correlated with income)
            college_education_pct = min(80, max(10, (median_income - 30000) / 1000 + np.random.normal(0, 10)))
            
            # Healthcare access
            hospital_centers = [(12, 18), (38, 32), (25, 40)]
            distance_to_hospital = min([np.sqrt((i-hx)**2 + (j-hy)**2) for hx, hy in hospital_centers])
            healthcare_access = max(0.2, 1.0 - distance_to_hospital / 25)
            
            urban_data.append({
                'x': i,
                'y': j,
                'population_density': population_density,
                'avg_building_height': avg_building_height,
                'building_age': building_age,
                'ventilation_quality': ventilation_quality,
                'transit_accessibility': transit_accessibility,
                'green_space_access': green_space_access,
                'median_income': median_income,
                'college_education_pct': college_education_pct,
                'healthcare_access': healthcare_access,
                'distance_to_center': min_distance_to_center
            })
    
    return pd.DataFrame(urban_data)

# Generate the urban dataset
urban_df = generate_urban_environment_data()

print(f"🏘️ Generated urban environment dataset: {len(urban_df)} neighborhoods")
print(f"📐 Grid size: {urban_df['x'].max()+1} x {urban_df['y'].max()+1}")
print(f"\n📊 Dataset summary:")
print(urban_df.describe().round(2))
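
The generator above finds each cell's distance to the nearest city center inside a nested Python loop. As a cross-check, the same quantity can be computed for the whole grid at once with NumPy broadcasting. This is a minimal sketch of the distance step only; the rest of the generator draws random numbers cell by cell, so vectorizing it would change the random stream and is not attempted here.

```python
import numpy as np

grid_size = 50
city_centers = np.array([(15, 15), (35, 35), (20, 40), (40, 15)])  # same centers as above

# All (x, y) cell coordinates, with x varying slowest to match the nested i/j loops
xs, ys = np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij")
cells = np.column_stack([xs.ravel(), ys.ravel()])            # shape (2500, 2)

# Pairwise Euclidean distances via broadcasting: shape (2500, 4)
diffs = cells[:, None, :] - city_centers[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=2))

# Distance from every cell to its nearest center, in generator row order
min_distance_to_center = distances.min(axis=1)

print(min_distance_to_center.reshape(grid_size, grid_size).round(1)[:3, :3])
```

`scipy.spatial.distance.cdist(cells, city_centers)` would produce the same distance matrix; broadcasting is used here only to keep the sketch dependent on NumPy alone.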

## Part 2: Basic Urban Environment Mapping

In [None]:
# Create basic urban environment maps
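def create_spatial_map(data, column, title, cmap='viridis'):
    """
    Reshape one column of the urban data into a 2D grid for imshow.

    Assumes rows are ordered with x as the outer loop and y as the inner loop,
    as produced by generate_urban_environment_data. `title` and `cmap` are kept
    for call compatibility but are not used here.
    """
    # Reshape to grid[x, y], then transpose so rows follow y and columns follow x.
    # imshow then draws x horizontally, so overlays placed at (x, y) line up.
    grid_size = int(np.sqrt(len(data)))
    values = data[column].values.reshape(grid_size, grid_size).T
    
    return values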

# Create comprehensive urban environment visualization
fig, axes = plt.subplots(2, 4, figsize=(20, 10))
axes = axes.flatten()

# Define the maps to create
maps_config = [
    ('population_density', 'Population Density', 'Reds'),
    ('avg_building_height', 'Avg Building Height', 'Blues'),
    ('ventilation_quality', 'Ventilation Quality', 'Greens'),
    ('transit_accessibility', 'Transit Accessibility', 'Purples'),
    ('green_space_access', 'Green Space Access', 'YlGn'),
    ('median_income', 'Median Income', 'RdYlBu'),
    ('healthcare_access', 'Healthcare Access', 'Oranges'),
    ('building_age', 'Average Building Age', 'copper')
]

# Create each map
for i, (column, title, cmap) in enumerate(maps_config):
    values = create_spatial_map(urban_df, column, title, cmap)
    
    im = axes[i].imshow(values, cmap=cmap, interpolation='bilinear')
    axes[i].set_title(title, fontsize=12, fontweight='bold')
    axes[i].set_xticks([])
    axes[i].set_yticks([])
    
    # Add colorbar
    plt.colorbar(im, ax=axes[i], shrink=0.8)

plt.tight_layout()
plt.show()

print("🗺️ Urban environment maps show spatial patterns across the metropolitan area")
print(f"\n🔍 Key observations:")
print(f"- Population density ranges from {urban_df['population_density'].min():.0f} to {urban_df['population_density'].max():.0f} people/km²")
print(f"- Building heights vary from {urban_df['avg_building_height'].min():.1f} to {urban_df['avg_building_height'].max():.1f} floors")
print(f"- Ventilation quality scores range from {urban_df['ventilation_quality'].min():.2f} to {urban_df['ventilation_quality'].max():.2f}")
print(f"- Median income ranges from ${urban_df['median_income'].min():,.0f} to ${urban_df['median_income'].max():,.0f}")
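
`create_spatial_map` relies on a row-major reshape, so the orientation of every map depends on the order in which the generator appended cells. The sketch below uses a hypothetical 3 x 3 grid whose values encode their own coordinates to show which array axis ends up holding x and which holds y. `imshow` draws the first array axis (rows) vertically and the second (columns) horizontally.

```python
import numpy as np

grid_size = 3
# Records in the same order the generator appends them: x outer loop, y inner loop
records = [(x, y) for x in range(grid_size) for y in range(grid_size)]
labels = np.array([10 * x + y for x, y in records])   # encode each cell as the number "xy"

grid_xy = labels.reshape(grid_size, grid_size)  # indexed [x, y]: rows follow x
grid_yx = grid_xy.T                              # indexed [y, x]: rows follow y

print("reshape only (rows = x):")
print(grid_xy)
print("transposed (rows = y, columns = x):")
print(grid_yx)
```

Markers drawn at `(x, y)` in data coordinates line up with a heatmap only when the underlying array is indexed `[y, x]`.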

## Part 3: Health Risk Assessment Model

In [None]:
# Develop a comprehensive health risk assessment model
def calculate_health_risk_scores(data):
    """
    Calculate various health risk scores based on built environment factors
    """
    df = data.copy()
    
    # Risk factors (higher values = higher risk)
    risk_factors = ['population_density', 'avg_building_height', 'building_age', 'transit_accessibility']
    
    # Protective factors (higher values = lower risk)
    protective_factors = ['ventilation_quality', 'green_space_access', 'median_income', 
                          'healthcare_access', 'college_education_pct']
    
    # Normalize risk factors (0 = low risk, 1 = high risk)
    for factor in risk_factors:
        min_val = df[factor].min()
        max_val = df[factor].max()
        df[f'{factor}_norm'] = (df[factor] - min_val) / (max_val - min_val)
    
    # Normalize protective factors (0 = high risk, 1 = low risk)
    for factor in protective_factors:
        min_val = df[factor].min()
        max_val = df[factor].max()
        df[f'{factor}_norm'] = 1 - (df[factor] - min_val) / (max_val - min_val)
    
    # Calculate transmission risk (airborne disease)
    transmission_risk = (
        df['population_density_norm'] * 0.3 +  # Crowding
        df['avg_building_height_norm'] * 0.2 +  # Vertical density
        df['ventilation_quality_norm'] * 0.25 + # Poor ventilation
        df['transit_accessibility_norm'] * 0.15 + # Transit exposure
        df['building_age_norm'] * 0.1  # Older buildings, worse ventilation
    )
    
    # Calculate vulnerability risk (susceptible populations)
    vulnerability_risk = (
        df['median_income_norm'] * 0.35 + # Economic vulnerability
        df['healthcare_access_norm'] * 0.30 + # Limited healthcare access
        df['college_education_pct_norm'] * 0.25 + # Health literacy proxy
        df['green_space_access_norm'] * 0.10  # Mental health/exercise proxy
    )
    
    # Calculate overall health risk
    overall_risk = (transmission_risk * 0.6 + vulnerability_risk * 0.4)
    
    # Add calculated risks to dataframe
    df['transmission_risk'] = transmission_risk
    df['vulnerability_risk'] = vulnerability_risk
    df['overall_health_risk'] = overall_risk
    
    return df

# Calculate health risks
urban_risk_df = calculate_health_risk_scores(urban_df)

# Create risk assessment visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

risk_maps = [
    ('transmission_risk', 'Disease Transmission Risk', 'Reds'),
    ('vulnerability_risk', 'Population Vulnerability Risk', 'Oranges'),
    ('overall_health_risk', 'Overall Health Risk', 'RdYlBu_r')
]

# Create risk maps
for i, (column, title, cmap) in enumerate(risk_maps):
    row = i // 2
    col = i % 2
    
    values = create_spatial_map(urban_risk_df, column, title)
    
    im = axes[row, col].imshow(values, cmap=cmap, interpolation='bilinear')
    axes[row, col].set_title(title, fontsize=14, fontweight='bold')
    axes[row, col].set_xticks([])
    axes[row, col].set_yticks([])
    
    # Add colorbar
    cbar = plt.colorbar(im, ax=axes[row, col], shrink=0.8)
    cbar.set_label('Risk Score (0=Low, 1=High)', fontsize=10)

# Fourth plot: Risk distribution histogram
axes[1,1].hist(urban_risk_df['overall_health_risk'], bins=30, alpha=0.7, color='red', edgecolor='black')
axes[1,1].axvline(urban_risk_df['overall_health_risk'].mean(), color='blue', linestyle='--', 
                  linewidth=2, label=f"Mean: {urban_risk_df['overall_health_risk'].mean():.2f}")
axes[1,1].axvline(urban_risk_df['overall_health_risk'].median(), color='green', linestyle='--', 
                  linewidth=2, label=f"Median: {urban_risk_df['overall_health_risk'].median():.2f}")
axes[1,1].set_xlabel('Overall Health Risk Score')
axes[1,1].set_ylabel('Number of Neighborhoods')
axes[1,1].set_title('Distribution of Health Risk Scores')
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print risk assessment summary
print(f"\n⚕️ Health Risk Assessment Summary:")
print(f"Average transmission risk: {urban_risk_df['transmission_risk'].mean():.3f}")
print(f"Average vulnerability risk: {urban_risk_df['vulnerability_risk'].mean():.3f}")
print(f"Average overall health risk: {urban_risk_df['overall_health_risk'].mean():.3f}")

# Identify high-risk areas
high_risk_threshold = urban_risk_df['overall_health_risk'].quantile(0.9)
high_risk_areas = urban_risk_df[urban_risk_df['overall_health_risk'] >= high_risk_threshold]

print(f"\n🚨 High-Risk Areas (top 10%):")
print(f"Number of high-risk neighborhoods: {len(high_risk_areas)}")
print(f"Risk threshold: {high_risk_threshold:.3f}")
print(f"Summed population density in high-risk areas (population proxy): {high_risk_areas['population_density'].sum():,.0f}")

# Analyze characteristics of high-risk areas
print(f"\n📊 High-Risk Area Characteristics:")
print(f"Average population density: {high_risk_areas['population_density'].mean():,.0f} vs {urban_risk_df['population_density'].mean():,.0f} (overall)")
print(f"Average income: ${high_risk_areas['median_income'].mean():,.0f} vs ${urban_risk_df['median_income'].mean():,.0f} (overall)")
print(f"Average ventilation quality: {high_risk_areas['ventilation_quality'].mean():.2f} vs {urban_risk_df['ventilation_quality'].mean():.2f} (overall)")
print(f"Average green space access: {high_risk_areas['green_space_access'].mean():.2f} vs {urban_risk_df['green_space_access'].mean():.2f} (overall)")
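
The composite score above combines min-max normalization with fixed weights. The sketch below reproduces the transmission-risk calculation by hand for three hypothetical neighborhoods with made-up values, so the arithmetic can be followed line by line. The helper `min_max` and its guard for constant columns are illustrative additions, not part of the notebook's function.

```python
import numpy as np

def min_max(values, protective=False):
    """Scale to [0, 1]; flip protective factors so 1 always means higher risk."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)  # a constant column carries no ranking information
    scaled = (values - values.min()) / span
    return 1.0 - scaled if protective else scaled

# Three hypothetical neighborhoods, ordered from low-risk to high-risk inputs
density = min_max([100, 300, 500])
height = min_max([2, 8, 20])
ventilation = min_max([0.9, 0.6, 0.3], protective=True)
transit = min_max([0.1, 0.5, 1.0])
age = min_max([10, 40, 70])

# Same weights as the transmission-risk formula above; they sum to 1
weights = {"density": 0.30, "height": 0.20, "ventilation": 0.25,
           "transit": 0.15, "age": 0.10}
transmission_risk = (weights["density"] * density
                     + weights["height"] * height
                     + weights["ventilation"] * ventilation
                     + weights["transit"] * transit
                     + weights["age"] * age)

print(transmission_risk.round(3))
```

Because every normalized factor lies in [0, 1] and the weights sum to 1, the composite score also lies in [0, 1]. The scores are relative to the grid being analyzed, so they rank neighborhoods within one city rather than measure absolute risk.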

## Part 4: Transportation and Mobility Analysis

In [None]:
# Analyze transportation networks and mobility patterns
def analyze_transportation_risk(data):
    """
    Analyze transportation-related health risks
    """
    df = data.copy()
    
    # Simulate transportation hubs (major stations, airports, bus terminals)
    transport_hubs = [
        {'x': 25, 'y': 25, 'type': 'metro_station', 'capacity': 50000},
        {'x': 15, 'y': 35, 'type': 'bus_terminal', 'capacity': 20000},
        {'x': 40, 'y': 15, 'type': 'airport', 'capacity': 100000},
        {'x': 35, 'y': 40, 'type': 'train_station', 'capacity': 30000},
        {'x': 10, 'y': 10, 'type': 'metro_station', 'capacity': 35000}
    ]
    
    # Calculate exposure risk from every transportation hub, vectorized over neighborhoods
    total_exposure = np.zeros(len(df))
    for hub in transport_hubs:
        distance = np.sqrt((df['x'] - hub['x'])**2 + (df['y'] - hub['y'])**2)
        
        # Risk decreases with distance but increases with hub capacity;
        # hubs outside the 10-cell influence zone contribute nothing
        exposure_risk = (hub['capacity'] / 100000) * np.exp(-distance / 5)
        total_exposure += np.where(distance < 10, exposure_risk, 0.0)
    
    # Cap cumulative exposure at 1.0
    df['transport_hub_exposure'] = np.minimum(1.0, total_exposure)
    
    # Simulate traffic density based on proximity to city centers and transport hubs
    df['traffic_density'] = (
        df['transit_accessibility'] * 0.4 +
        df['transport_hub_exposure'] * 0.3 +
        (1 - df['distance_to_center'] / df['distance_to_center'].max()) * 0.3
    )
    
    # Calculate commuting patterns (people travel toward city centers)
    df['commute_flow'] = np.where(
        df['distance_to_center'] > 10,
        df['population_density'] * 0.6, # Suburban residents commute in
        df['population_density'] * 0.3   # Urban residents may commute less
    )
    
    return df, transport_hubs

# Analyze transportation risks
urban_transport_df, hubs = analyze_transportation_risk(urban_risk_df)

# Create transportation analysis visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Plot 1: Transportation hub exposure
transport_values = create_spatial_map(urban_transport_df, 'transport_hub_exposure', 'Transport Hub Exposure')
im1 = axes[0,0].imshow(transport_values, cmap='Reds', interpolation='bilinear')
axes[0,0].set_title('Transportation Hub Exposure Risk', fontsize=14, fontweight='bold')
axes[0,0].set_xticks([])
axes[0,0].set_yticks([])

# Add hub locations
hub_colors = {'metro_station': 'blue', 'bus_terminal': 'green', 'airport': 'red', 'train_station': 'purple'}
for hub in hubs:
    circle = Circle((hub['x'], hub['y']), 2, color=hub_colors[hub['type']], alpha=0.8)
    axes[0,0].add_patch(circle)
    axes[0,0].text(hub['x'], hub['y']-3, hub['type'].replace('_', ' ').title(), 
                   ha='center', fontsize=8, fontweight='bold')

plt.colorbar(im1, ax=axes[0,0], shrink=0.8)

# Plot 2: Traffic density
traffic_values = create_spatial_map(urban_transport_df, 'traffic_density', 'Traffic Density')
im2 = axes[0,1].imshow(traffic_values, cmap='Oranges', interpolation='bilinear')
axes[0,1].set_title('Traffic Density', fontsize=14, fontweight='bold')
axes[0,1].set_xticks([])
axes[0,1].set_yticks([])
plt.colorbar(im2, ax=axes[0,1], shrink=0.8)

# Plot 3: Commute flow patterns
commute_values = create_spatial_map(urban_transport_df, 'commute_flow', 'Commute Flow')
im3 = axes[1,0].imshow(commute_values, cmap='Blues', interpolation='bilinear')
axes[1,0].set_title('Daily Commute Flow Intensity', fontsize=14, fontweight='bold')
axes[1,0].set_xticks([])
axes[1,0].set_yticks([])
plt.colorbar(im3, ax=axes[1,0], shrink=0.8)

# Plot 4: Transportation risk vs income
axes[1,1].scatter(urban_transport_df['median_income'], urban_transport_df['transport_hub_exposure'], 
                  c=urban_transport_df['traffic_density'], cmap='viridis', alpha=0.6, s=30)
axes[1,1].set_xlabel('Median Income ($)')
axes[1,1].set_ylabel('Transportation Hub Exposure')
axes[1,1].set_title('Transportation Risk vs Income\n(Color = Traffic Density)')
axes[1,1].grid(True, alpha=0.3)

# Add trend line
z = np.polyfit(urban_transport_df['median_income'], urban_transport_df['transport_hub_exposure'], 1)
p = np.poly1d(z)
axes[1,1].plot(urban_transport_df['median_income'], p(urban_transport_df['median_income']), 
               "r--", alpha=0.8, linewidth=2)

plt.colorbar(axes[1,1].collections[0], ax=axes[1,1], shrink=0.8, label='Traffic Density')

plt.tight_layout()
plt.show()

# Calculate transportation risk statistics
print(f"\n🚌 Transportation Risk Analysis:")
print(f"Average transport hub exposure: {urban_transport_df['transport_hub_exposure'].mean():.3f}")
print(f"Average traffic density: {urban_transport_df['traffic_density'].mean():.3f}")

# Analyze correlation between income and transportation exposure
from scipy.stats import pearsonr
corr_income_transport, p_value = pearsonr(urban_transport_df['median_income'], 
                                            urban_transport_df['transport_hub_exposure'])

print(f"\n📊 Income vs Transportation Exposure:")
print(f"Correlation coefficient: {corr_income_transport:.3f}")
print(f"P-value: {p_value:.3g}")
if p_value < 0.05:
    direction = "positive" if corr_income_transport > 0 else "negative"
    print(f"Significant {direction} correlation between income and transportation exposure")

# Identify transportation risk hotspots
transport_hotspots = urban_transport_df[
    urban_transport_df['transport_hub_exposure'] > urban_transport_df['transport_hub_exposure'].quantile(0.9)
]

print(f"\n🎯 Transportation Risk Hotspots:")
print(f"Number of high-exposure areas: {len(transport_hotspots)}")
print(f"Average income in hotspots: ${transport_hotspots['median_income'].mean():,.0f}")
print(f"Average population density in hotspots: {transport_hotspots['population_density'].mean():,.0f}")
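
The hub exposure used above is an exponential decay in distance, scaled by hub capacity and cut off at the edge of a 10-cell influence zone. The sketch below isolates that formula for a single hub so its shape is easy to inspect. The function name `hub_exposure` and its keyword parameters are hypothetical; the default values mirror the constants in the cell above.

```python
import math

def hub_exposure(distance, capacity, influence_radius=10.0, decay_length=5.0,
                 reference_capacity=100_000):
    """Exposure contribution of one hub at a given distance (in grid cells)."""
    if distance >= influence_radius:
        return 0.0  # outside the influence zone
    return (capacity / reference_capacity) * math.exp(-distance / decay_length)

for d in (0, 5, 9.9, 10):
    airport = hub_exposure(d, 100_000)
    bus_terminal = hub_exposure(d, 20_000)
    print(f"distance {d:>4}: airport {airport:.3f}, bus terminal {bus_terminal:.3f}")
```

One consequence of the hard cutoff is a step at the boundary: the largest hub still contributes roughly 0.14 just inside 10 cells and nothing at 10 cells or beyond, which can appear as a visible ring around hubs on the exposure map.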

## Part 6: Neighborhood Clustering and Risk Typology

In [None]:
# Use clustering to identify neighborhood typologies based on health risk factors
def cluster_neighborhoods(data, n_clusters=5):
    """
    Cluster neighborhoods based on health-related characteristics
    """
    # Select features for clustering
    features = [
        'population_density', 'median_income', 'green_space_access',
        'ventilation_quality', 'transit_accessibility', 'healthcare_access',
        'transport_hub_exposure'
    ]
    
    # Prepare data for clustering
    X = data[features].copy()
    
    # Standardize features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    # Perform K-means clustering
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    clusters = kmeans.fit_predict(X_scaled)
    
    # Add cluster labels to data
    data_clustered = data.copy()
    data_clustered['cluster'] = clusters
    
    # Calculate cluster characteristics
    cluster_profiles = data_clustered.groupby('cluster')[features + ['overall_health_risk']].mean()
    
    # Name clusters based on characteristics
    cluster_names = {}
    for cluster_id in range(n_clusters):
        profile = cluster_profiles.loc[cluster_id]
        
        if profile['median_income'] > data['median_income'].quantile(0.7) and profile['green_space_access'] > 0.6:
            cluster_names[cluster_id] = 'Affluent Green Suburbs'
        elif profile['population_density'] > data['population_density'].quantile(0.8) and profile['median_income'] < data['median_income'].quantile(0.4):
            cluster_names[cluster_id] = 'Dense Low-Income Urban'
        elif profile['transport_hub_exposure'] > 0.5 and profile['population_density'] > data['population_density'].quantile(0.6):
            cluster_names[cluster_id] = 'Transit-Oriented High-Density'
        elif profile['median_income'] > data['median_income'].quantile(0.6) and profile['population_density'] > data['population_density'].quantile(0.7):
            cluster_names[cluster_id] = 'Affluent Urban Core'
        else:
            cluster_names[cluster_id] = f'Mixed-Use Residential {cluster_id}'
    
    return data_clustered, cluster_profiles, cluster_names

# Perform neighborhood clustering
urban_clustered_df, cluster_profiles, cluster_names = cluster_neighborhoods(urban_transport_df)

# Create clustering visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Plot 1: Cluster map
cluster_values = create_spatial_map(urban_clustered_df, 'cluster', 'Neighborhood Clusters')
im1 = axes[0,0].imshow(cluster_values, cmap='tab10', interpolation='nearest')
axes[0,0].set_title('Neighborhood Typology Clusters', fontsize=14, fontweight='bold')
axes[0,0].set_xticks([])
axes[0,0].set_yticks([])

# Add cluster legend
unique_clusters = sorted(urban_clustered_df['cluster'].unique())
legend_text = [f"C{i}: {cluster_names[i]}" for i in unique_clusters]
axes[0,0].text(1.02, 0.5, '\n'.join(legend_text), transform=axes[0,0].transAxes, 
               fontsize=10, verticalalignment='center', bbox=dict(boxstyle="round", facecolor='wheat', alpha=0.8))

# Plot 2: Cluster characteristics (bar chart)
features_for_plot = ['population_density', 'median_income', 'green_space_access', 'overall_health_risk']
feature_labels = ['Pop Density', 'Income', 'Green Space', 'Health Risk']

# Normalize for comparison
cluster_profiles_norm = cluster_profiles.copy()
for feature in features_for_plot:
    max_val = urban_clustered_df[feature].max()
    min_val = urban_clustered_df[feature].min()
    cluster_profiles_norm[feature] = (cluster_profiles_norm[feature] - min_val) / (max_val - min_val)

# Create grouped bar chart
x = np.arange(len(feature_labels))
width = 0.15
colors = plt.cm.tab10(np.linspace(0, 1, len(unique_clusters)))

for i, cluster_id in enumerate(unique_clusters):
    values = [cluster_profiles_norm.loc[cluster_id, feature] for feature in features_for_plot]
    axes[0,1].bar(x + i*width, values, width, label=f'C{cluster_id}', color=colors[i], alpha=0.8)

axes[0,1].set_xlabel('Characteristics')
axes[0,1].set_ylabel('Normalized Score (0-1)')
axes[0,1].set_title('Cluster Characteristic Profiles')
axes[0,1].set_xticks(x + width * (len(unique_clusters)-1) / 2)
axes[0,1].set_xticklabels(feature_labels)
axes[0,1].legend()
axes[0,1].grid(True, alpha=0.3)

# Plot 3: Health risk by cluster
cluster_risk_data = []
for cluster_id in unique_clusters:
    cluster_data = urban_clustered_df[urban_clustered_df['cluster'] == cluster_id]
    cluster_risk_data.append(cluster_data['overall_health_risk'].values)

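box_plot = axes[1,0].boxplot(cluster_risk_data, patch_artist=True)
# Set tick labels separately so the call does not depend on the name of
# boxplot's label keyword, which differs between Matplotlib versions
axes[1,0].set_xticklabels([f'C{i}' for i in unique_clusters])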
for patch, color in zip(box_plot['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)

axes[1,0].set_xlabel('Cluster')
axes[1,0].set_ylabel('Overall Health Risk Score')
axes[1,0].set_title('Health Risk Distribution by Cluster')
axes[1,0].grid(True, alpha=0.3)

# Plot 4: Cluster population and area statistics
cluster_stats = urban_clustered_df.groupby('cluster').agg(
    total_population_proxy=('population_density', 'sum'),  # Summed density as a population proxy
    neighborhood_count=('population_density', 'count')     # Number of neighborhoods
)

cluster_names_short = [cluster_names[i][:15] + '...' if len(cluster_names[i]) > 15 else cluster_names[i] for i in unique_clusters]

bars = axes[1,1].bar(range(len(unique_clusters)), cluster_stats['neighborhood_count'], 
                       color=colors, alpha=0.7)
axes[1,1].set_xlabel('Cluster Type')
axes[1,1].set_ylabel('Number of Neighborhoods')
axes[1,1].set_title('Neighborhood Count by Cluster')
axes[1,1].set_xticks(range(len(unique_clusters)))
axes[1,1].set_xticklabels(cluster_names_short, rotation=45, ha='right')
axes[1,1].grid(True, alpha=0.3)

# Add value labels on bars
for bar, value in zip(bars, cluster_stats['neighborhood_count']):
    height = bar.get_height()
    axes[1,1].text(bar.get_x() + bar.get_width()/2., height + 0.5,
                   f'{int(value)}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

# Print cluster analysis results
print(f"\n🏘️ Neighborhood Cluster Analysis:")
print(f"Number of clusters identified: {len(unique_clusters)}")

for cluster_id in unique_clusters:
    cluster_data = urban_clustered_df[urban_clustered_df['cluster'] == cluster_id]
    print(f"\n{cluster_names[cluster_id]} (Cluster {cluster_id}):")
    print(f"  Neighborhoods: {len(cluster_data)}")
    print(f"  Avg health risk: {cluster_data['overall_health_risk'].mean():.3f}")
    print(f"  Avg income: ${cluster_data['median_income'].mean():,.0f}")
    print(f"  Avg population density: {cluster_data['population_density'].mean():,.0f}")
    print(f"  Avg green space access: {cluster_data['green_space_access'].mean():.3f}")

# Identify highest and lowest risk cluster types
cluster_avg_risk = urban_clustered_df.groupby('cluster')['overall_health_risk'].mean()
highest_risk_cluster = cluster_avg_risk.idxmax()
lowest_risk_cluster = cluster_avg_risk.idxmin()

print(f"\n🚨 Risk Assessment Summary:")
print(f"Highest risk cluster: {cluster_names[highest_risk_cluster]} (Risk: {cluster_avg_risk[highest_risk_cluster]:.3f})")
print(f"Lowest risk cluster: {cluster_names[lowest_risk_cluster]} (Risk: {cluster_avg_risk[lowest_risk_cluster]:.3f})")
print(f"Risk ratio (highest/lowest): {cluster_avg_risk[highest_risk_cluster] / cluster_avg_risk[lowest_risk_cluster]:.1f}x")
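
The clustering step standardizes features before running K-means because K-means uses Euclidean distance, and a feature measured in dollars would otherwise swamp one measured on a 0 to 1 scale. The sketch below shows the effect on four hypothetical neighborhoods with made-up values. It computes z-scores by hand with the population standard deviation, which is what `StandardScaler` does with its default settings.

```python
import numpy as np

# Two features on very different scales for four hypothetical neighborhoods
income = np.array([40_000.0, 42_000.0, 50_000.0, 90_000.0])
green_space = np.array([0.90, 0.10, 0.85, 0.20])
X = np.column_stack([income, green_space])

# Z-score by hand: subtract the column mean, divide by the population std (ddof=0)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

raw_01, raw_02 = distance(X[0], X[1]), distance(X[0], X[2])
scaled_01, scaled_02 = distance(X_scaled[0], X_scaled[1]), distance(X_scaled[0], X_scaled[2])

print(f"raw:    d(0,1) = {raw_01:,.1f}   d(0,2) = {raw_02:,.1f}")
print(f"scaled: d(0,1) = {scaled_01:.2f}   d(0,2) = {scaled_02:.2f}")
```

On the raw values, neighborhood 0 looks closest to neighborhood 1 because their incomes are similar, even though their green space access is very different. After scaling, neighborhood 2 becomes the nearer neighbor, since both features now contribute on comparable terms.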

## Part 7: Urban Planning Recommendations

In [None]:
# Generate evidence-based urban planning recommendations
def generate_planning_recommendations(cluster_profiles, cluster_names):
    """
    Generate targeted urban planning recommendations based on analysis
    """
    recommendations = {}
    highest_risk_cluster = cluster_profiles['overall_health_risk'].idxmax()
    
    for cluster_id, name in cluster_names.items():
        recs = []
        profile = cluster_profiles.loc[cluster_id]
        
        if name == 'Dense Low-Income Urban':
            recs.append("**Priority 1: High-Risk Intervention**")
            recs.append("- **Improve Ventilation**: Subsidize HVAC upgrades and mandate better ventilation standards in rental housing.")
            recs.append("- **Increase Green Space**: Develop pocket parks and community gardens to improve mental health and provide safe recreation.")
            recs.append("- **Enhance Healthcare Access**: Establish mobile health clinics to serve neighborhoods with poor healthcare access.")
        
        elif name == 'Transit-Oriented High-Density':
            recs.append("**Priority 2: Transmission Hotspot Mitigation**")
            recs.append("- **Upgrade Transit Hubs**: Improve ventilation and crowd management systems in metro and bus stations.")
            recs.append("- **Promote Mixed-Use Zoning**: Encourage development of local amenities to reduce long-distance commuting.")
        
        elif name == 'Affluent Green Suburbs':
            recs.append("**Priority 4: Preserve Protective Factors**")
            recs.append("- **Protect Green Spaces**: Strengthen zoning laws to prevent the loss of existing parks and natural areas.")
            recs.append("- **Maintain Infrastructure**: Ensure continued investment in high-quality housing and community facilities.")
            
        elif name == 'Affluent Urban Core':
            recs.append("**Priority 3: Monitor and Maintain**")
            recs.append("- **Building Audits**: Conduct regular audits of ventilation systems in older high-rise buildings.")
            recs.append("- **Public Health Campaigns**: Ensure health information reaches even affluent populations who may have high mobility.")
            
        else: # Mixed-Use Residential
            recs.append("**General Recommendations**")
            recs.append("- **Community Engagement**: Work with residents to identify specific local health concerns and co-design solutions.")
            recs.append("- **Targeted Investments**: Use health risk data to guide investments in infrastructure like sidewalks, lighting, and public services.")
            
        recommendations[name] = recs
        
    return recommendations

# Generate and print recommendations
planning_recommendations = generate_planning_recommendations(cluster_profiles, cluster_names)

print("📋 Urban Planning Recommendations by Neighborhood Type:")
for name, recs in planning_recommendations.items():
    print(f"\n### {name}")
    for rec in recs:
        print(rec)

## Conclusion

This notebook walks through a workflow for urban health risk mapping on a synthetic city grid. It combines population density, building characteristics, socioeconomic factors, and transportation networks into composite risk scores, groups neighborhoods into typologies, and attaches planning recommendations to each type. Because the data are simulated and the factor weights are illustrative rather than empirically fitted, the maps show how the method works, not where real risk lies. Applied to real census, building, and transit data with validated weights, the same steps can help urban planners and public health officials target interventions and make the link between the built environment and public health concrete.