Click here to open this notebook in Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tsekatm/aws-python-data-engineering-challenge/blob/main/Super_imposing__data_visualization_on_Google_maps.ipynb)

# 🌍 AWS Data Engineering Challenge
## Superimposing Analysed Loadshedding Data on Google Maps

**Host**: Tebogo Tseka

**Presenter**: Joyous Konyana

📅 **Date**: 24 July 2025

**Time**: 19:00 SAST

**Zoom Link**: [bit.ly/3VmV3CK](https://bit.ly/3VmV3CK)

**Meetup Link / Register here**: [meetup.com/mzansi-aws](https://meetup.com/mzansi-aws)


🔗 **GitHub:** [your GitHub link]

---

🎯 **Objective**:
Visualize and superimpose real-world, analysed data points on Google Maps using Python.

This session demonstrates the power of geospatial analysis and interactive visual storytelling for AWS Data Engineering workloads.

🚀 **Features**


- Geospatial data analysis with pandas and folium
- Interactive visual maps with real coordinates
- Python data manipulation for mapping overlays
- Contextual population-based visualizations
- Ready for integration into AWS analytics pipelines

## Step 1: Install required packages with version constraints for stability

In [None]:
import subprocess
import sys

def install_packages():
    """Install required packages with version constraints for stability"""
    packages = [
        'boto3>=1.26.0',
        'pandas>=1.3.0',
        'numpy>=1.21.0',
        'matplotlib>=3.3.0',
        'seaborn>=0.11.0',
        'plotly>=5.0.0',
        'scikit-learn>=1.0.0',
        'folium',  # Interactive maps (Steps 10-13)
        'geopy',  # Geocoding province names (Step 7)
        'tqdm',  # For progress bars
        'psutil'  # For system monitoring
    ]

    print("📦 Installing required packages...")
    for package in packages:
        try:
            subprocess.check_call([sys.executable, '-m', 'pip', 'install', package, '-q'])
            print(f"✅ {package.split('>=')[0]}")
        except Exception as e:
            print(f"❌ Failed to install {package}: {e}")

    print("\n🎉 Package installation complete!")

install_packages()
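
Because `>=` constraints only set a lower bound, it can be useful to check which versions were actually installed. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `report_versions` and the package names passed to it are illustrative and not part of the notebook.

```python
from importlib import metadata

def report_versions(names):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Example names only; adjust to whatever the session actually needs.
for name, version in report_versions(['boto3', 'pandas', 'folium', 'geopy']).items():
    print(f"{name}: {version or 'not installed'}")
```

A `None` entry suggests the package should be installed before the later cells are run.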


In [None]:

# Import core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import boto3
from botocore.exceptions import ClientError, NoCredentialsError
import json
from datetime import datetime, timedelta
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.preprocessing import StandardScaler
from io import BytesIO
import joblib
import warnings
from tqdm import tqdm
import psutil
import gc
import os

# Configure plotting for Colab
pio.renderers.default = "colab"
plt.rcParams['figure.dpi'] = 100
%matplotlib inline
warnings.filterwarnings('ignore')

print("✅ All packages imported successfully!")
print("🚀 Ready to start our data engineering journey!")

## Step 2: Setup AWS credentials for Google Colab environment


In [None]:
def setup_aws_credentials():
    """Setup AWS credentials for Google Colab environment"""
    print("🔐 Setting up AWS credentials for Google Colab...")

    try:
        # Option 1: Try using Colab secrets (recommended)
        from google.colab import userdata
        os.environ['AWS_ACCESS_KEY_ID'] = userdata.get('AWS_ACCESS_KEY_ID')
        os.environ['AWS_SECRET_ACCESS_KEY'] = userdata.get('AWS_SECRET_ACCESS_KEY')
        print("✅ Using Colab secrets for AWS credentials")
        print("💡 This is the most secure method!")
        return True

    except Exception as e:
        print("ℹ️ Colab secrets not configured, using manual input...")

        # Option 2: Manual input (fallback)
        from getpass import getpass
        print("\n🔒 Please enter your AWS credentials:")
        print("   (Input is hidden; values are kept only in this session's environment variables)")

        access_key = getpass('Enter AWS Access Key ID: ')
        secret_key = getpass('Enter AWS Secret Access Key: ')

        if access_key and secret_key:
            os.environ['AWS_ACCESS_KEY_ID'] = access_key
            os.environ['AWS_SECRET_ACCESS_KEY'] = secret_key
            print("✅ AWS credentials configured successfully!")
            return True
        else:
            print("❌ Invalid credentials provided")
            return False

# Setup credentials
if setup_aws_credentials():
    print("\n📚 Learning checkpoint: AWS credentials configured!")
    print("💡 You can now proceed with cloud operations.")
else:
    print("\n⚠️ Please configure AWS credentials before proceeding")
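
A quick way to confirm the setup worked, without ever printing a secret, is to check only whether the two environment variables are present. This is a small sketch under that assumption; `missing_aws_vars` is a hypothetical helper name.

```python
import os

REQUIRED_AWS_VARS = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')

def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]

missing = missing_aws_vars()
if missing:
    print(f"Missing: {', '.join(missing)}")
else:
    print("Both AWS credential variables are set (values not shown).")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.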

## Step 3: Create the AWS Configuration Class

This class demonstrates best practices for:

- Cloud service initialization
- Error handling and user feedback
- Resource management in constrained environments

In [None]:
class AWSConfig:
    """
    Educational AWS Configuration Manager - Colab Optimized

    This class demonstrates best practices for:
    - Cloud service initialization
    - Error handling and user feedback
    - Resource management in constrained environments
    """

    def __init__(self):
        # 🌍 AWS Region selection - affects latency and compliance
        self.region = 'us-east-1'  # Northern Virginia - often cheapest for learning

        # 📦 Unique bucket name (S3 bucket names are globally unique)
        timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
        process_id = os.getpid()
        self.bucket_name = f'sa-loadshedding-{timestamp}-{process_id}'

        try:
            # 🔗 Initialize S3 client with error handling
            self.s3_client = boto3.client('s3', region_name=self.region)

            # ✅ Test connection by listing buckets
            self.s3_client.list_buckets()
            print(f"✅ AWS S3 connected successfully!")
            print(f"📍 Region: {self.region}")
            print(f"📦 Target bucket: {self.bucket_name}")

        except NoCredentialsError:
            print("❌ AWS credentials not configured!")
            print("🔧 Please run the credentials setup cell first")
            raise

        except ClientError as e:
            if e.response['Error']['Code'] == 'InvalidAccessKeyId':
                 print("❌ AWS connection error: Invalid Access Key ID.")
                 print("💡 Tip: Please ensure your AWS Access Key ID is correct.")
            elif e.response['Error']['Code'] == 'SignatureDoesNotMatch':
                 print("❌ AWS connection error: Signature Does Not Match.")
                 print("💡 Tip: Please ensure your AWS Secret Access Key is correct.")
            else:
                print(f"❌ AWS connection error: {e}")
                print("💡 Tip: Check your internet connection and AWS credentials")
            raise


        except Exception as e:
            print(f"❌ An unexpected AWS connection error occurred: {e}")
            print("💡 Tip: Check your internet connection and AWS credentials")
            raise


    def create_bucket(self):
        """Creates S3 bucket with proper error handling"""
        try:
            # 🔍 Check if bucket already exists
            self.s3_client.head_bucket(Bucket=self.bucket_name)
            print(f"✅ Bucket already exists: {self.bucket_name}")

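        except ClientError as e:
            # Keep the code as a string: head_bucket usually reports '404',
            # but other errors can carry non-numeric codes that int() would reject
            error_code = e.response['Error']['Code']

            if error_code in ('404', 'NoSuchBucket'):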
                # 🏗️ Bucket doesn't exist, create it
                try:
                    if self.region == 'us-east-1':
                        # us-east-1 doesn't need LocationConstraint
                        self.s3_client.create_bucket(Bucket=self.bucket_name)
                    else:
                        # Other regions need LocationConstraint
                        self.s3_client.create_bucket(
                            Bucket=self.bucket_name,
                            CreateBucketConfiguration={'LocationConstraint': self.region}
                        )

                    print(f"✅ Created new bucket: {self.bucket_name}")
                    print(f"🌍 Location: {self.region}")

                except Exception as create_error:
                    print(f"❌ Failed to create bucket: {create_error}")
                    print("💡 Tip: Bucket names must be globally unique")
                    raise
            else:
                print(f"❌ Bucket access error: {e}")
                raise

# 🚀 Initialize AWS configuration
print("🔧 Setting up AWS infrastructure...")
aws = AWSConfig()
aws.create_bucket()

print("\n📚 Learning checkpoint: AWS S3 is now ready for our data pipeline!")
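
Since bucket names are globally unique and must follow S3 naming rules, it may help to validate a candidate name locally before calling AWS. The sketch below is an assumption-laden simplification: it checks only the basic rules (3 to 63 characters, lowercase letters, digits and hyphens, starting and ending with a letter or digit) and swaps the timestamp-plus-process-ID suffix for a random one. The helper names `is_valid_bucket_name` and `make_bucket_name` are hypothetical.

```python
import re
import uuid
from datetime import datetime

# Simplified check: S3 also permits dots and has further restrictions
# that are deliberately left out of this sketch.
_BUCKET_RE = re.compile(r'^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$')

def is_valid_bucket_name(name):
    """Return True if `name` passes the basic naming rules above."""
    return bool(_BUCKET_RE.match(name))

def make_bucket_name(prefix='sa-loadshedding'):
    """Build a name from a prefix, a date stamp, and a random 8-character suffix."""
    stamp = datetime.now().strftime('%Y%m%d')
    suffix = uuid.uuid4().hex[:8]
    return f'{prefix}-{stamp}-{suffix}'

name = make_bucket_name()
print(name, is_valid_bucket_name(name))
```

A random suffix makes a collision with someone else's bucket unlikely, though only the S3 API can confirm that a name is actually free.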

## Step 4: Optimize DataFrame memory usage for Colab environment



In [None]:
def optimize_memory_usage(df):
    """Optimize DataFrame memory usage for Colab environment"""
    print("🧠 Optimizing memory usage...")

    # Store original memory usage
    original_memory = df.memory_usage(deep=True).sum() / 1024**2

    # Optimize integer columns
    for col in df.select_dtypes(include=['int64']).columns:
        df[col] = pd.to_numeric(df[col], downcast='integer')

    # Optimize float columns
    for col in df.select_dtypes(include=['float64']).columns:
        df[col] = pd.to_numeric(df[col], downcast='float')

    # Calculate memory savings
    new_memory = df.memory_usage(deep=True).sum() / 1024**2
    savings = original_memory - new_memory

    print(f"💾 Memory optimized: {original_memory:.1f}MB → {new_memory:.1f}MB")
    print(f"✅ Saved: {savings:.1f}MB ({savings/original_memory*100:.1f}%)")

    return df

def monitor_system_resources():
    """Monitor system resources in Colab"""
    # Memory usage
    memory = psutil.virtual_memory()
    print(f"💾 Memory: {memory.used/1024**3:.1f}GB / {memory.total/1024**3:.1f}GB ({memory.percent:.1f}%)")

    # Disk usage
    disk = psutil.disk_usage('/')
    print(f"💿 Disk: {disk.used/1024**3:.1f}GB / {disk.total/1024**3:.1f}GB ({disk.used/disk.total*100:.1f}%)")

    if memory.percent > 80:
        print("⚠️ High memory usage detected!")
        print("💡 Consider restarting runtime if performance degrades")
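
To see what the downcasting step does, the sketch below applies the same `pd.to_numeric(..., downcast=...)` calls to a tiny made-up DataFrame and compares dtypes before and after. The column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'stage': np.array([0, 1, 2, 3, 4], dtype='int64'),
    'reading': np.array([0.5, 1.5, 2.5, 3.5, 4.5], dtype='float64'),
})

before = demo.dtypes.astype(str).to_dict()
demo['stage'] = pd.to_numeric(demo['stage'], downcast='integer')
demo['reading'] = pd.to_numeric(demo['reading'], downcast='float')
after = demo.dtypes.astype(str).to_dict()

print(before, '->', after)
```

Small integers shrink to `int8` and floats to `float32`. Note that `float32` keeps only about seven significant digits, so downcasting may not be appropriate for columns that need full precision.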

## Step 5: Generate realistic load shedding data for South Africa

**This function demonstrates:**

- Time series data generation
- Incorporating domain knowledge into synthetic data
- Seasonal and daily pattern modeling
- Statistical distribution selection

**Args:**

- `start_date` (str): Start date for data generation
- `end_date` (str): End date for data generation
- `freq` (str): Frequency of observations (`'6H'` = every 6 hours)

**Returns:**

- `pd.DataFrame`: Generated load shedding data
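
The seasonal, peak-hour and yearly patterns all rely on one small technique: draw a 0/1 "bump" from a binomial distribution, add it to the base stage, and cap the result at Stage 4. The sketch below isolates that step. It uses NumPy's `default_rng` generator rather than `np.random.seed`, which is a substitution made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Base stages drawn with the same probabilities as the notebook's Stage 0-4 weights
base = rng.choice([0, 1, 2, 3, 4], size=1_000, p=[0.70, 0.15, 0.08, 0.05, 0.02])

bump = rng.binomial(1, 0.4, size=base.size)   # 1 with probability 0.4, else 0
bumped = np.minimum(base + bump, 4)           # never exceed Stage 4

print("mean stage before:", base.mean(), "after:", bumped.mean())
```

Because several such bumps are applied in sequence, the final stage distribution ends up noticeably heavier than the base probabilities suggest.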

In [None]:
def generate_loadshedding_data(start_date='2018-01-01', end_date='2024-12-31', freq='6H'):
    """
    Generate realistic load shedding data for South Africa

    This function demonstrates:
    - Time series data generation
    - Incorporating domain knowledge into synthetic data
    - Seasonal and daily pattern modeling
    - Statistical distribution selection

    Args:
        start_date (str): Start date for data generation
        end_date (str): End date for data generation
        freq (str): Frequency of observations ('6H' = every 6 hours)

    Returns:
        pd.DataFrame: Generated load shedding data
    """

    print("🔧 Generating realistic load shedding data...")
    print(f"📅 Period: {start_date} to {end_date}")
    print(f"⏰ Frequency: Every {freq}")

    # 🗓️ Create date range
    np.random.seed(42)  # For reproducible results
    dates = pd.date_range(start_date, end_date, freq=freq)

    print(f"📊 Total time points: {len(dates):,}")

    # 🎲 Base probability distribution for load shedding stages
    # Based on South African historical patterns (Stage 0 most common)
    base_probabilities = [0.70, 0.15, 0.08, 0.05, 0.02]  # Stages 0-4

    # 📊 Initialize DataFrame
    data = pd.DataFrame({
        'timestamp': dates,
        'stage': np.random.choice([0, 1, 2, 3, 4], len(dates), p=base_probabilities)
    })

    # 🌡️ Add seasonal effects (Winter = June, July, August in Southern Hemisphere)
    print("❄️ Adding seasonal patterns...")
    data['month'] = data['timestamp'].dt.month
    data['is_winter'] = data['month'].isin([6, 7, 8])

    # Increase load shedding probability in winter
    winter_mask = data['is_winter']
    winter_increase = np.random.binomial(1, 0.4, winter_mask.sum())  # 40% chance of stage increase
    data.loc[winter_mask, 'stage'] = np.minimum(
        data.loc[winter_mask, 'stage'] + winter_increase, 4
    )
    # ⏰ Add daily peak hour effects
    print("🌅 Adding peak hour patterns...")
    data['hour'] = data['timestamp'].dt.hour
    # Peak demand falls in the morning (6-9) and evening (17-20). With 6-hour
    # sampling the only hours present are 0, 6, 12 and 18, so the 6, 12 and 18
    # slots are all treated as peak here.
    peak_mask = data['hour'].isin([6, 12, 18])
    peak_increase = np.random.binomial(1, 0.25, peak_mask.sum())  # 25% chance of increase
    data.loc[peak_mask, 'stage'] = np.minimum(
        data.loc[peak_mask, 'stage'] + peak_increase, 4
    )
    # 📈 Add yearly trend (crisis worsening over time)
    print("📈 Adding temporal trends...")
    data['year'] = data['timestamp'].dt.year

    # Gradual worsening: no extra bump before 2020, 20% chance in 2020, 30% from 2021 onwards
    for year in data['year'].unique():
        if year >= 2020:  # Crisis escalation period
            year_mask = data['year'] == year
            escalation_factor = min((year - 2018) * 0.1, 0.3)  # Max 30% increase
            year_increase = np.random.binomial(1, escalation_factor, year_mask.sum())
            data.loc[year_mask, 'stage'] = np.minimum(
                data.loc[year_mask, 'stage'] + year_increase, 4
            )

    # 🎯 Add economic factors (simplified)
    print("💼 Adding economic factors...")
    # Higher load shedding during economic stress periods
    economic_stress_periods = [
        ('2020-03-01', '2020-06-30'),  # COVID-19 impact
        ('2022-01-01', '2022-12-31'),  # Energy crisis peak
    ]

    for start, end in economic_stress_periods:
        stress_mask = (data['timestamp'] >= start) & (data['timestamp'] <= end)
        stress_increase = np.random.binomial(1, 0.35, stress_mask.sum())
        data.loc[stress_mask, 'stage'] = np.minimum(
            data.loc[stress_mask, 'stage'] + stress_increase, 4
        )

    # 🧹 Clean up temporary columns
    data = data[['timestamp', 'stage']].copy()

    # 📊 Generate summary statistics
    stage_counts = data['stage'].value_counts().sort_index()
    total_events = len(data)
    active_events = (data['stage'] > 0).sum()

    print("\n📊 Generated Data Summary:")
    print(f"   📅 Time range: {data['timestamp'].min()} to {data['timestamp'].max()}")
    print(f"   📈 Total observations: {total_events:,}")
    print(f"   ⚡ Active load shedding: {active_events:,} ({100*active_events/total_events:.1f}%)")
    print(f"   📊 Stage distribution:")
    for stage, count in stage_counts.items():
        percentage = 100 * count / total_events
        print(f"      Stage {stage}: {count:,} ({percentage:.1f}%)")

    return data

# 🚀 Generate our dataset
print("🔋 Creating South African Load Shedding Dataset...")
loadshedding_data = generate_loadshedding_data()

# 🧠 Optimize memory usage
loadshedding_data = optimize_memory_usage(loadshedding_data)

print("\n📚 Learning checkpoint: Realistic load shedding data generated!")
print("💭 Notice how we incorporated domain knowledge into our data generation process.")

# 🎯 Progress checkpoint
print("\n" + "="*50)
print("🎯 PROGRESS CHECKPOINT - Data Generation Complete")
print("="*50)
print(f"✅ Dataset size: {loadshedding_data.shape}")
print(f"💾 Memory usage: {loadshedding_data.memory_usage(deep=True).sum() / 1024**2:.1f} MB")
monitor_system_resources()
print("\n🎯 Next: Upload to AWS S3")
print("="*50)



## Step 6: Augment the existing loadshedding_data with geographical information (Province)
- This step assumes the `loadshedding_data` DataFrame from Step 5 is already loaded and contains the `timestamp` and `stage` columns.

In [None]:


# Augment the existing loadshedding_data with geographical information (Province)
# Assuming loadshedding_data DataFrame is already loaded and contains 'timestamp' and 'stage'

# List of South African Provinces
provinces = [
    'Eastern Cape', 'Free State', 'Gauteng', 'KwaZulu-Natal',
    'Limpopo', 'Mpumalanga', 'North West', 'Northern Cape', 'Western Cape'
]

In [None]:
# Assign a random province to each loadshedding event
np.random.seed(42) # for reproducibility
loadshedding_data['province'] = np.random.choice(provinces, len(loadshedding_data))

In [None]:
print("Augmented loadshedding data with geographical information (Province):")
# Display the first few rows with the new 'province' column
display(loadshedding_data.head())

In [None]:
# Display the columns of the DataFrame
print("\nDataFrame columns:")
print(loadshedding_data.columns)
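
Because provinces are assigned uniformly at random, each one should receive roughly the same share of events, and any regional difference in the later maps is noise rather than signal. A minimal sketch of how to check this with `value_counts` follows; the `events` DataFrame is a small stand-in, not the notebook's data.

```python
import numpy as np
import pandas as pd

provinces = [
    'Eastern Cape', 'Free State', 'Gauteng', 'KwaZulu-Natal',
    'Limpopo', 'Mpumalanga', 'North West', 'Northern Cape', 'Western Cape'
]

rng = np.random.default_rng(42)
events = pd.DataFrame({'province': rng.choice(provinces, size=1_000)})

# Share of events per province; with uniform assignment each is close to 1/9
shares = events['province'].value_counts(normalize=True).sort_index()
print(shares.round(3))
```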

## Step 7: Geocoding (if necessary)

Convert the province names in the `loadshedding_data` DataFrame into geographical coordinates (latitude and longitude) for mapping.


In [None]:
from geopy.geocoders import Nominatim

# Use a 10-second timeout so slow responses from the Nominatim service do not fail immediately
geolocator = Nominatim(user_agent="geo_loadshedding_app", timeout=10)

def get_province_coordinates(province_name):
    """Gets latitude and longitude for a given province name."""
    try:
        # Geocode the province name in South Africa
        location = geolocator.geocode(province_name + ", South Africa")
        if location:
            return location.latitude, location.longitude
        else:
            print(f"Could not geocode province: {province_name}")
            return np.nan, np.nan
    except Exception as e:
        print(f"Error geocoding {province_name}: {e}")
        return np.nan, np.nan


In [None]:

print("Converting province names to geographical coordinates...")

# Get unique province names
unique_provinces = loadshedding_data['province'].unique()

# Create a dictionary to store coordinates for each province
province_coordinates = {}

print("Geocoding unique provinces...")
for province in unique_provinces:
    lat, lon = get_province_coordinates(province)
    province_coordinates[province] = {'latitude': lat, 'longitude': lon}
    print(f"Geocoded {province}: Lat={lat}, Lon={lon}")

# Map the coordinates back to the DataFrame
loadshedding_data['latitude'] = loadshedding_data['province'].map(lambda x: province_coordinates[x]['latitude'])
loadshedding_data['longitude'] = loadshedding_data['province'].map(lambda x: province_coordinates[x]['longitude'])


In [None]:
# Display the first few rows with the new columns
print("\nDataFrame with Latitude and Longitude:")
display(loadshedding_data.head())
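
Geocoding can fail and return `NaN` coordinates, so it is worth separating rows that can be mapped from those that cannot. The sketch below shows one possible approach using a lookup table and `DataFrame.join`. The region names and coordinate values are placeholders invented for the example and are not real locations.

```python
import numpy as np
import pandas as pd

# Placeholder values only: these are NOT real coordinates.
coords = {
    'Region A': {'latitude': -1.0, 'longitude': 1.0},
    'Region B': {'latitude': np.nan, 'longitude': np.nan},  # simulates a failed lookup
}
events = pd.DataFrame({'province': ['Region A', 'Region B', 'Region A', 'Region C']})

# One row per region, indexed by name, with latitude/longitude columns
lookup = pd.DataFrame.from_dict(coords, orient='index')

# A left join leaves NaN for failed or unknown regions instead of raising KeyError
located = events.join(lookup, on='province')

unresolved = located.loc[located['latitude'].isna(), 'province'].unique().tolist()
mappable = located.dropna(subset=['latitude', 'longitude'])

print("Unresolved:", unresolved)
print("Rows ready for mapping:", len(mappable))
```

Dropping the unresolved rows before plotting avoids passing missing coordinates to the mapping library.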

## Step 8: Acquire or generate loadshedding data with geographical information (e.g., location points, areas, or regions).

In [None]:

from datetime import datetime, timedelta

def generate_geo_loadshedding_data(start_date='2023-01-01', end_date='2024-01-01', freq='H', num_locations=100):
    """
    Generates synthetic load shedding data with simulated geographical locations within South Africa.

    Args:
        start_date (str): Start date for data generation.
        end_date (str): End date for data generation.
        freq (str): Frequency of observations (e.g., 'H' for hourly).
        num_locations (int): Number of unique simulated locations.

    Returns:
        pd.DataFrame: Generated loadshedding data with timestamp, stage, latitude, and longitude.
    """
    print("🔧 Generating synthetic load shedding data with geographical information...")

    dates = pd.date_range(start_date, end_date, freq=freq)
    total_time_points = len(dates)

    # Simulate locations within a simplified bounding box for South Africa
    # South Africa approximate bounding box: Latitude (-35, -22), Longitude (17, 33)
    min_lat, max_lat = -35, -22
    min_lon, max_lon = 17, 33

    np.random.seed(42) # for reproducibility

    # Generate random locations
    locations = pd.DataFrame({
        'location_id': range(num_locations),
        'latitude': np.random.uniform(min_lat, max_lat, num_locations),
        'longitude': np.random.uniform(min_lon, max_lon, num_locations)
    })

    # Create a dataframe with all combinations of dates and locations
    from itertools import product
    data = pd.DataFrame(list(product(dates, locations['location_id'])), columns=['timestamp', 'location_id'])

    # Merge with location coordinates
    data = pd.merge(data, locations, on='location_id')

    # Simulate load shedding stages (simplified logic for demonstration)
    # Based on time of day and a general trend
    data['hour'] = data['timestamp'].dt.hour
    data['stage'] = 0 # Start with no loadshedding

    # Increase stage during peak hours (e.g., 6-9 and 17-20)
    peak_hours_mask = data['hour'].isin(range(6, 10)) | data['hour'].isin(range(17, 21))
    data.loc[peak_hours_mask, 'stage'] = np.random.choice([0, 1, 2], size=peak_hours_mask.sum(), p=[0.4, 0.3, 0.3])

    # Further increase stage randomly for some events to simulate variability
    random_increase_mask = np.random.rand(len(data)) < 0.1 # 10% chance of random increase
    data.loc[random_increase_mask, 'stage'] = np.minimum(data.loc[random_increase_mask, 'stage'] + np.random.randint(0, 3, size=random_increase_mask.sum()), 4)

    # Ensure stage is integer
    data['stage'] = data['stage'].astype(int)

    # Select relevant columns
    data = data[['timestamp', 'latitude', 'longitude', 'stage']]

    print(f"\n📊 Generated Data Summary:")
    print(f"   📅 Time range: {data['timestamp'].min()} to {data['timestamp'].max()}")
    print(f"   📈 Total observations: {len(data):,}")
    print(f"   📍 Number of locations: {num_locations}")
    print(f"   ⚡ Active load shedding events: {(data['stage'] > 0).sum():,}")

    return data

# Generate synthetic data with geo-information
geo_loadshedding_data = generate_geo_loadshedding_data()

# Display the first few rows
print("\nSample of generated geo-loadshedding data:")
display(geo_loadshedding_data.head())

# Display info to check data types and non-nulls
print("\nDataFrame Info:")
geo_loadshedding_data.info()
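
Building every timestamp-location pair is a cross join. As an alternative to `itertools.product` followed by a merge, pandas (1.2 or later) can do this in one step with `merge(how='cross')`. The sketch below uses a tiny grid with arbitrary coordinates to show the shape of the result.

```python
import pandas as pd

dates = pd.DataFrame({'timestamp': pd.date_range('2023-01-01', periods=3, freq='D')})
locations = pd.DataFrame({
    'location_id': [0, 1],
    'latitude': [-30.0, -25.0],    # arbitrary points inside the bounding box
    'longitude': [20.0, 30.0],
})

# Every date paired with every location: 3 x 2 = 6 rows
grid = dates.merge(locations, how='cross')
print(grid)
```

With the notebook's defaults (hourly data for a year and 100 locations) the full grid has several hundred thousand rows, so memory use grows quickly as either dimension increases.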

## Step 9: Data Cleaning

- Identify and remove any duplicate data entries.

In [None]:
print("🔍 Identifying and removing duplicate data entries...")

# Check for duplicate rows based on all columns
initial_rows = len(geo_loadshedding_data)
duplicate_rows = geo_loadshedding_data.duplicated().sum()

if duplicate_rows > 0:
    print(f"❗ Found {duplicate_rows:,} duplicate rows.")
    # Remove duplicate rows
    geo_loadshedding_data = geo_loadshedding_data.drop_duplicates()
    print(f"✅ Removed duplicate rows. New number of rows: {len(geo_loadshedding_data):,}")
else:
    print("✅ No duplicate rows found.")

print("\nDataFrame after duplicate removal:")
display(geo_loadshedding_data.head())
print(f"📊 Final dataset shape: {geo_loadshedding_data.shape}")
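
Checking all columns only catches exact copies. Two readings for the same time and place with different stages would pass that check, so a key-based check may be more informative. A minimal sketch on made-up rows:

```python
import pandas as pd

readings = pd.DataFrame({
    'timestamp': pd.to_datetime(['2023-01-01 06:00', '2023-01-01 06:00', '2023-01-01 07:00']),
    'latitude': [-30.0, -30.0, -30.0],
    'longitude': [20.0, 20.0, 20.0],
    'stage': [1, 2, 1],
})

key = ['timestamp', 'latitude', 'longitude']

exact_dupes = int(readings.duplicated().sum())          # identical in every column
key_dupes = int(readings.duplicated(subset=key).sum())  # same time and place

# Keep the last reading for each time-and-place key
deduped = readings.drop_duplicates(subset=key, keep='last')

print(exact_dupes, key_dupes, len(deduped))
```

Which row to keep (`'first'`, `'last'`, or an aggregate such as the maximum stage) is a modelling choice that depends on how the data was collected.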

## Step 10: Map Visualization

- Create an interactive map of South Africa using a library called folium.

In [None]:
import folium

print("Creating a basic interactive map of South Africa...")

# Approximate coordinates for the center of South Africa
south_africa_center = [-28.5, 24.5]

# Create a Folium map centered on South Africa
m = folium.Map(location=south_africa_center, zoom_start=5)

print("✅ Map created. Displaying the map:")

# Display the map
m
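
Instead of hard-coding the map centre, it can be derived from the data's bounding box. The sketch below computes the south-west and north-east corners with pandas; the three points are illustrative and simply span the bounding box used for the synthetic locations.

```python
import pandas as pd

points = pd.DataFrame({
    'latitude': [-35.0, -22.0, -28.0],
    'longitude': [17.0, 33.0, 25.0],
})

south_west = [float(points['latitude'].min()), float(points['longitude'].min())]
north_east = [float(points['latitude'].max()), float(points['longitude'].max())]
bounds = [south_west, north_east]

# Midpoint of the box, usable as the map's starting location
center = [(south_west[0] + north_east[0]) / 2, (south_west[1] + north_east[1]) / 2]

print(bounds, center)
```

With a folium map `m`, calling `m.fit_bounds(bounds)` should zoom the view to this box.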

## Step 11: Map Visualization

- Superimpose loadshedding data visually onto the map.

In [None]:
print("📍 Superimposing loadshedding data onto the map...")

# Define a color mapping for loadshedding stages
stage_colors = {
    0: 'green',   # No loadshedding
    1: 'yellow',  # Stage 1
    2: 'orange',  # Stage 2
    3: 'red',     # Stage 3
    4: 'darkred'  # Stage 4
}

# Add markers for a sample of the data to the map
# Use a random sample of up to 1,000 rows for performance
sample_data = geo_loadshedding_data.sample(min(1000, len(geo_loadshedding_data)), random_state=42)

for index, row in sample_data.iterrows():
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        radius=5,
        color=stage_colors.get(row['stage'], 'gray'),
        fill=True,
        fill_color=stage_colors.get(row['stage'], 'gray'),
        fill_opacity=0.7,
        tooltip=f"Stage: {row['stage']}<br>Timestamp: {row['timestamp']}"
    ).add_to(m)

print(f"✅ Added {len(sample_data)} markers to the map.")
print("💡 Displaying the map with loadshedding data:")

# Display the map with markers
m
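
A random sample mixes readings from many different times, so several markers with different stages can land on the same spot. One alternative, sketched below on made-up rows, is to plot a snapshot for a single timestamp so that each location appears once.

```python
import pandas as pd

readings = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2023-01-01 06:00', '2023-01-01 06:00',
        '2023-01-01 18:00', '2023-01-01 18:00',
    ]),
    'latitude': [-30.0, -25.0, -30.0, -25.0],
    'longitude': [20.0, 30.0, 20.0, 30.0],
    'stage': [1, 0, 2, 3],
})

# Keep only the most recent timestamp: one row per location
latest = readings['timestamp'].max()
snapshot = readings[readings['timestamp'] == latest]

print(snapshot)
```

The resulting `snapshot` frame can be passed to the same marker loop in place of the random sample.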

## Step 12: Add interactivity and refine the visualization



In [None]:
print("✨ Adding interactivity and refining map visualization...")

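# Start from a fresh map: the markers added directly to `m` in Step 11
# are not part of any layer, so they would stay visible when layers are toggled off
m = folium.Map(location=south_africa_center, zoom_start=5)

# Create a FeatureGroup for each loadshedding stage
stage_layers = {}
for stage in stage_colors:
    stage_layers[stage] = folium.FeatureGroup(name=f'Stage {stage}')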

# Add markers to their respective FeatureGroups
for index, row in sample_data.iterrows():
    stage = row['stage']
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        radius=5,
        color=stage_colors.get(stage, 'gray'),
        fill=True,
        fill_color=stage_colors.get(stage, 'gray'),
        fill_opacity=0.7,
        tooltip=f"Stage: {stage}<br>Timestamp: {row['timestamp']}"
    ).add_to(stage_layers[stage])

# Add FeatureGroups to the map
for stage, layer in stage_layers.items():
    layer.add_to(m)

# Add a layer control to the map
folium.LayerControl().add_to(m)

print("✅ Interactivity added. Displaying the enhanced map:")
print("💡 You can now toggle different loadshedding stages on/off using the layer control.")

# Display the enhanced map
m

## Step 13: Map Visualization (Heatmap)

- Create a heatmap visualization of loadshedding data.

In [None]:
import folium.plugins

print("🔥 Creating a heatmap visualization of loadshedding data...")

# Ensure data is in the correct format for HeatMap (list of lists or numpy array)
# [latitude, longitude, intensity] - using stage as intensity
heatmap_data = geo_loadshedding_data[['latitude', 'longitude', 'stage']].values.tolist()

# Create a Folium map centered on South Africa (reuse the existing map object 'm' or create a new one if needed)
# For this example, we will reuse the existing map 'm' from previous steps.
# If you run this cell independently, you might need to create a new map:
# south_africa_center = [-28.5, 24.5]
# m_heatmap = folium.Map(location=south_africa_center, zoom_start=5)

# Add the heatmap layer to the map
folium.plugins.HeatMap(heatmap_data).add_to(m)

print("✅ Heatmap layer added to the map.")
print("💡 Displaying the map with heatmap:")

# Display the map with the heatmap layer
m
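
Passing every row to the heatmap embeds a very large list in the page, even though the synthetic data has only a limited number of distinct locations. A lighter option is to aggregate first, for example to the mean stage per location, and drop points with zero weight. This is a sketch on made-up rows; the choice of the mean as the weight is an assumption.

```python
import pandas as pd

readings = pd.DataFrame({
    'latitude':  [-30.0, -30.0, -25.0, -25.0, -33.0],
    'longitude': [20.0, 20.0, 30.0, 30.0, 18.0],
    'stage':     [0, 2, 4, 4, 0],
})

# One row per location, weighted by its average stage
weights = (
    readings.groupby(['latitude', 'longitude'], as_index=False)['stage']
    .mean()
    .rename(columns={'stage': 'mean_stage'})
)

# Points with zero weight contribute nothing to the heatmap
weights = weights[weights['mean_stage'] > 0]

heat_points = weights[['latitude', 'longitude', 'mean_stage']].values.tolist()
print(heat_points)
```

The `heat_points` list has the same `[latitude, longitude, intensity]` layout as `heatmap_data` and can be used in its place.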

## ✅ Key Takeaways
- Generating and inspecting geospatial data in pandas
- Using Folium for interactive base maps
- Plotting color-coded circle markers and toggling them by stage with a layer control
- (Optional) Visualizing density via heatmaps
