# 🌍 Milieuschutz (Social Preservation) Zones - Database Analysis

## 📚 Project Overview
This notebook analyzes Berlin's **Milieuschutz** data and prepares it for loading into our collaborative database. Milieuschutz areas (literally "milieu protection": social preservation areas, not environmental zones) are districts in Berlin designated to preserve the social composition of the resident population and to limit displacement through gentrification.

### 🎯 **Mission Objectives:**
1. **Connect** to the existing collaborative database
2. **Investigate** current schema and data structure  
3. **Analyze** existing neighborhood foundation
4. **Prepare** for Milieuschutz data integration

### 🏗️ **Database Foundation:**
Building on our team's collaborative work:
- **Districts & Neighborhoods**: Foundation tables (already populated)
- **Crime Statistics**: Safety analysis data
- **Hospitals, Schools, Transport**: Infrastructure data
- **Rental Statistics**: Housing market data

---

## 1. 📦 Import Required Libraries (Step 1/5)

### 🎯 **What We'll Do in This Step:**
Import all necessary libraries for database connectivity, spatial data processing, and data analysis.

### 📚 **Key Libraries:**
- **pandas/geopandas**: Data manipulation and spatial analysis
- **sqlalchemy**: Database connectivity and ORM
- **psycopg2**: PostgreSQL adapter for Python
- **datetime**: Timestamp functionality

### 🔧 **What This Step Accomplishes:**
- Load all required dependencies
- Verify library availability
- Prepare for database operations

**Ready to import our data science toolkit?**
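
The "verify library availability" goal can be checked before the imports run. The following is a minimal sketch using only the standard library; the helper name `missing_libraries` is hypothetical and not part of the notebook.

```python
from importlib.util import find_spec


def missing_libraries(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


# Top-level packages this notebook imports
required = ["pandas", "geopandas", "sqlalchemy", "psycopg2"]
missing = missing_libraries(required)
print("Missing libraries:", missing if missing else "none")
```

Running this first gives one readable list of what to install instead of a traceback from the first failing import.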

In [4]:
import pandas as pd
import geopandas as gpd
import os
from datetime import datetime
from sqlalchemy import create_engine, text, inspect
from sqlalchemy.exc import SQLAlchemyError
import psycopg2

print("✅ Libraries imported successfully!")

✅ Libraries imported successfully!


---

## 1.2 🔌 Database Connection Setup (Step 1/5)

### 🎯 **What We'll Do in This Step:**
Establish a secure connection to our collaborative Neon PostgreSQL database and confirm PostGIS support.

### 🔧 **Key Components:**
1. **Connection String Setup** - Secure database URL handling
2. **Database Validation** - Testing connectivity and extensions
3. **PostGIS Verification** - Confirming spatial capabilities

### 🧠 **Why This Step Matters:**
- **Security First**: Proper credential management
- **Validation**: Confirm all systems operational
- **Foundation**: Essential for all subsequent operations

### 🗄️ **Database Details:**
Our collaborative database contains multiple interconnected tables following our ERD design. We'll connect to verify we can integrate our Milieuschutz data seamlessly.

### 🔧 **What This Step Accomplishes:**
- Secure connection establishment
- PostGIS extension verification
- Database readiness confirmation

**Ready to connect to our collaborative data infrastructure? 🚀**
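
One way to follow the "Security First" point is to keep the connection URL out of the notebook and to mask the password whenever the URL is displayed. The sketch below uses only the standard library; the environment variable name `DATABASE_URL` and both helper names are assumptions for illustration.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def load_database_url(env_var="DATABASE_URL"):
    """Read the connection URL from the environment instead of hard-coding it."""
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return url


def mask_password(url):
    """Return the URL with the password replaced by '***' for safe display."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `print(mask_password(load_database_url()))` shows host, port, and database without exposing the secret in saved cell output.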

In [5]:
# 🔌 Step 2: Database Connection Setup

print("🔌 DATABASE CONNECTION SETUP")
print("=" * 40)

# Database connection parameters
# 🔒 SECURITY NOTE: never hard-code credentials in a notebook; read them from the environment.
print("📋 Setting up connection parameters...")

from sqlalchemy.engine import make_url

# Neon database connection: export DATABASE_URL before starting the notebook, in the form
# postgresql+psycopg2://<user>:<password>@<host>:5432/<database>?sslmode=require
DATABASE_URL = os.environ["DATABASE_URL"]

# For display purposes, parse the URL components
url = make_url(DATABASE_URL)
DB_CONFIG = {
    'host': url.host,
    'port': str(url.port),
    'database': url.database,
    'username': url.username,
    'password': url.password or ''
}

print(f"   🖥️  Host: {DB_CONFIG['host']}")
print(f"   🔌 Port: {DB_CONFIG['port']}")
print(f"   🗄️  Database: {DB_CONFIG['database']}")
print(f"   👤 Username: {DB_CONFIG['username']}")
print(f"   🔒 Password: {'*' * len(DB_CONFIG['password'])}")

# Create connection string
connection_string = DATABASE_URL
engine = create_engine(connection_string, echo=False)

print(f"\n🔗 Connection String Format:")
print(f"   postgresql+psycopg2://username:password@host:port/database")

# Test the connection
print(f"\n🧪 TESTING CONNECTION SETUP:")
try:
    # create_engine() is lazy: it parses the URL but opens no connection until first use
    engine = create_engine(connection_string, echo=False)
    print("✅ Connection string format is valid!")
    
    # Test if we can actually connect
    print("🔍 Testing actual database connection...")
    
    with engine.connect() as conn:
        # Test basic connection
        result = conn.execute(text("SELECT version();"))
        version = result.fetchone()[0]
        print(f"✅ Connected successfully!")
        print(f"   📊 PostgreSQL version: {version[:50]}...")
        
        # Check if PostGIS is available
        try:
            result = conn.execute(text("SELECT PostGIS_version();"))
            postgis_version = result.fetchone()[0]
            print(f"✅ PostGIS is available!")
            print(f"   🗺️  PostGIS version: {postgis_version}")
        except Exception as e:
            print(f"⚠️  PostGIS not detected - you may need to enable it")
            print(f"   💡 Run: CREATE EXTENSION IF NOT EXISTS postgis;")
            
except SQLAlchemyError as e:
    print(f"❌ Database connection failed!")
    print(f"   Error type: {type(e).__name__}")
    print(f"   Details: {str(e)[:100]}...")
    print(f"\n💡 TROUBLESHOOTING TIPS:")
    print(f"   1. Check if PostgreSQL is running")
    print(f"   2. Verify host, port, username, password")
    print(f"   3. Ensure database '{DB_CONFIG['database']}' exists")
    print(f"   4. Check firewall/network settings")
    
except Exception as e:
    print(f"❌ Unexpected error: {str(e)[:100]}...")

print(f"💡 TIP: If connection failed, fix the issue before proceeding!")

🔌 DATABASE CONNECTION SETUP
📋 Setting up connection parameters...
   🖥️  Host: ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech
   🔌 Port: 5432
   🗄️  Database: neondb
   👤 Username: neondb_owner
   🔒 Password: ****************

🔗 Connection String Format:
   postgresql+psycopg2://username:password@host:port/database

🧪 TESTING CONNECTION SETUP:
✅ Connection string format is valid!
🔍 Testing actual database connection...
❌ Database connection failed!
   Error type: OperationalError
   Details: (psycopg2.OperationalError) connection to server at "ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aw...

💡 TROUBLESHOOTING TIPS:
   1. Check if PostgreSQL is running
   2. Verify host, port, username, password
   3. Ensure database 'neondb' exists
   4. Check firewall/network settings
💡 TIP: If connection failed, fix the issue before proceeding!
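
The generic tips above do not cover the failure the later cells actually report ("Your project has exceeded the data transfer quota"), and truncating the message to 100 characters hides that reason. A small helper can map the full message to a more specific hint. This is a sketch under assumptions: the function name is hypothetical and the matched substrings are common driver messages, not an exhaustive list.

```python
def connection_hint(error_message):
    """Map a database driver error message to a short troubleshooting hint."""
    msg = error_message.lower()
    if "exceeded the data transfer quota" in msg:
        return "Quota exhausted on the hosting plan; retrying is unlikely to help until it resets."
    if "password authentication failed" in msg:
        return "Check the username and password in DATABASE_URL."
    if "could not translate host name" in msg:
        return "Check the host name and your DNS/network connection."
    if "connection refused" in msg or "timeout" in msg:
        return "Check that the server is reachable (port, firewall, VPN)."
    return "Unrecognized error; read the full message instead of a truncated one."
```

Inside the `except SQLAlchemyError as e:` branch, `print(connection_hint(str(e)))` would replace the fixed list of tips.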

---

## 2. 🔍 Comprehensive Database Schema Investigation (Step 2/5)

### 🎯 **What We'll Do in This Step:**
Explore and understand our collaborative database structure to ensure seamless integration of Milieuschutz data.

### 🧠 **Why This Step Matters:**
- **ERD Compliance**: Understand existing relationships and constraints
- **Data Integration**: Identify connection points for our new data
- **Quality Assurance**: Verify foreign key requirements and data types

### 🔧 **Investigation Areas:**
1. **Existing Tables**: What tables are already populated?
2. **Schema Structure**: Columns, data types, and constraints
3. **Relationships**: Foreign keys and referential integrity
4. **District Data**: Foundation for our geographic linkages

### 🏗️ **Expected Table Structure:**
Based on our ERD design, we expect tables like:
- `districts` - Administrative districts (our foreign key target!)
- `neighborhoods` - Detailed neighborhood boundaries
- `schools` - Educational facilities by district
- `hospitals` - Healthcare facilities
- `crime_statistics` - Safety data by area
- `transport_stations` - Public transit infrastructure
- `rental_statistics` - Housing market data

### 🎯 **Key Questions We'll Answer:**
- Which tables exist and are populated?
- What districts are available for foreign key relationships?
- Are there any constraints that will affect our data insertion?
- What spatial reference systems are being used?

**Ready to investigate our collaborative database foundation? 🕵️**

In [6]:
# 🔍 Step 1: Schema Existence Check

print("🔍 STEP 1: CHECKING SCHEMA EXISTENCE")
print("=" * 40)

try:
    with engine.connect() as conn:
        # Check if test_berlin_data schema exists
        schema_exists = conn.execute(text("""
            SELECT schema_name 
            FROM information_schema.schemata 
            WHERE schema_name = 'test_berlin_data'
        """)).fetchone()
        
        if schema_exists:
            print("✅ test_berlin_data schema EXISTS!")
        else:
            print("❌ test_berlin_data schema NOT FOUND")
            print("💡 We may need to create it or use a different schema")

except Exception as e:
    print(f"❌ Error checking schema existence: {e}")
    print("💡 This might indicate connection issues or permission problems")

print("✅ Schema existence check complete!")

🔍 STEP 1: CHECKING SCHEMA EXISTENCE
❌ Error checking schema existence: (psycopg2.OperationalError) connection to server at "ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech" (3.23.186.13), port 5432 failed: ERROR:  Your project has exceeded the data transfer quota. Upgrade your plan to increase limits.

(Background on this error at: https://sqlalche.me/e/20/e3q8)
💡 This might indicate connection issues or permission problems
✅ Schema existence check complete!
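
Note that the cell prints "✅ Schema existence check complete!" even though the check failed, because the final `print` sits outside the `try` block. One possible fix is to report success only when the step did not raise. A minimal sketch, with a hypothetical helper name:

```python
def run_step(name, action):
    """Run `action()` and print a success line only if it did not raise.

    Returns True on success and False on failure, so later cells can stop early.
    """
    try:
        action()
    except Exception as exc:
        print(f"❌ {name} failed: {exc}")
        return False
    print(f"✅ {name} complete!")
    return True
```

Each investigation step could then be wrapped in a function and called as `run_step("Schema existence check", check_schema)`, where `check_schema` is whatever function holds the query logic.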


In [23]:
# 🔍 Step 2: List All Tables in Schema

print("🔍 STEP 2: TABLES IN test_berlin_data SCHEMA")
print("=" * 40)

try:
    with engine.connect() as conn:
        # List all tables in test_berlin_data schema
        tables_result = conn.execute(text("""
            SELECT table_name, table_type
            FROM information_schema.tables 
            WHERE table_schema = 'test_berlin_data'
            ORDER BY table_name
        """))
        
        tables = tables_result.fetchall()
        
        if tables:
            print(f"Found {len(tables)} tables in test_berlin_data schema:")
            for table in tables:
                print(f"   📋 {table[0]} ({table[1]})")
        else:
            print("❌ No tables found in test_berlin_data schema")

except Exception as e:
    print(f"❌ Error listing tables: {e}")

print("✅ Table listing complete!")

🔍 STEP 2: TABLES IN test_berlin_data SCHEMA
Found 17 tables in test_berlin_data schema:
   📋 crime_statistics (BASE TABLE)
   📋 districts (BASE TABLE)
   📋 green_spaces (BASE TABLE)
   📋 hospitals (BASE TABLE)
   📋 land_prices (BASE TABLE)
   📋 long_term_rentals (BASE TABLE)
   📋 milieuschutz_protection_zones (BASE TABLE)
   📋 neighborhood (BASE TABLE)
   📋 neighborhood_pop_stat (BASE TABLE)
   📋 neighborhoods (BASE TABLE)
   📋 playgrounds (BASE TABLE)
   📋 regional_statistics (BASE TABLE)
   📋 rent_stats_per_neighborhood (BASE TABLE)
   📋 rent_stats_per_street (BASE TABLE)
   📋 rent_stats_per_street_kai (BASE TABLE)
   📋 short_time_listings (BASE TABLE)
   📋 ubahn (BASE TABLE)
✅ Table listing complete!


In [7]:
# 🔍 Step 3: Neighborhoods Table Analysis

print("🔍 STEP 3: NEIGHBORHOODS TABLE ANALYSIS")
print("=" * 40)

try:
    with engine.connect() as conn:
        # Check if neighborhoods table exists and its structure
        neighborhoods_cols = conn.execute(text("""
            SELECT column_name, data_type, is_nullable, column_default
            FROM information_schema.columns 
            WHERE table_schema = 'test_berlin_data' AND table_name = 'neighborhoods'
            ORDER BY ordinal_position
        """)).fetchall()
        
        if neighborhoods_cols:
            print("✅ neighborhoods table found! Structure:")
            for col in neighborhoods_cols:
                nullable = "NULL" if col[2] == "YES" else "NOT NULL"
                default = f" DEFAULT {col[3]}" if col[3] else ""
                print(f"      • {col[0]}: {col[1]} {nullable}{default}")
            
            # Check data count
            row_count = conn.execute(text("""
                SELECT COUNT(*) FROM test_berlin_data.neighborhoods
            """)).scalar()
            print(f"\n   📊 Records in neighborhoods table: {row_count}")
            
            # Show sample data
            if row_count > 0:
                sample_data = conn.execute(text("""
                    SELECT * FROM test_berlin_data.neighborhoods LIMIT 3
                """)).fetchall()
                print("   📋 Sample data:")
                for row in sample_data:
                    print(f"      {dict(row._mapping)}")
                    
        else:
            print("❌ neighborhoods table not found in test_berlin_data schema")
            
except Exception as e:
    print(f"⚠️ Error checking neighborhoods table: {str(e)[:60]}...")

print("✅ Neighborhoods table analysis complete!")

🔍 STEP 3: NEIGHBORHOODS TABLE ANALYSIS
⚠️ Error checking neighborhoods table: (psycopg2.OperationalError) connection to server at "ep-fall...
✅ Neighborhoods table analysis complete!
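
Steps 3 and 4 repeat the same column-printing logic for `neighborhoods` and `districts`. That logic could live in one small function; the sketch below mirrors the formatting used in the cells, and the function name is an assumption.

```python
def format_column(name, data_type, is_nullable, default=None):
    """Format one information_schema.columns row the way the analysis cells print it."""
    nullable = "NULL" if is_nullable == "YES" else "NOT NULL"
    suffix = f" DEFAULT {default}" if default else ""
    return f"{name}: {data_type} {nullable}{suffix}"


# Example rows shaped like (column_name, data_type, is_nullable, column_default)
rows = [
    ("district", "character varying", "NO", None),
    ("geometry_str", "text", "YES", None),
]
for row in rows:
    print("      •", format_column(*row))
```

With this helper, both table checks reduce to one query plus a loop over `format_column(*col)`.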


In [6]:
# 🔍 Step 4: Districts Table Analysis

print("🔍 STEP 4: DISTRICTS TABLE ANALYSIS")
print("=" * 40)

try:
    with engine.connect() as conn:
        # Check if districts table exists and its structure
        districts_cols = conn.execute(text("""
            SELECT column_name, data_type, is_nullable, column_default
            FROM information_schema.columns 
            WHERE table_schema = 'test_berlin_data' AND table_name = 'districts'
            ORDER BY ordinal_position
        """)).fetchall()
        
        if districts_cols:
            print("✅ districts table found! Structure:")
            for col in districts_cols:
                nullable = "NULL" if col[2] == "YES" else "NOT NULL"
                default = f" DEFAULT {col[3]}" if col[3] else ""
                print(f"      • {col[0]}: {col[1]} {nullable}{default}")
            
            # Check data count
            row_count = conn.execute(text("""
                SELECT COUNT(*) FROM test_berlin_data.districts
            """)).scalar()
            print(f"\n   📊 Records in districts table: {row_count}")
            
            # Show sample data
            if row_count > 0:
                sample_data = conn.execute(text("""
                    SELECT * FROM test_berlin_data.districts LIMIT 3
                """)).fetchall()
                print("   📋 Sample data:")
                for row in sample_data:
                    print(f"      {dict(row._mapping)}")
                    
        else:
            print("❌ districts table not found in test_berlin_data schema")
        
        # unique districts
        districts_unique = conn.execute(text("""
            SELECT DISTINCT district FROM test_berlin_data.districts
        """)).fetchall()
        print("\n🔍 Unique districts in test_berlin_data:")
        for d in districts_unique:
            print(f"   • {d[0]}")
            
except Exception as e:
    print(f"⚠️ Error checking districts table: {str(e)[:60]}...")

print("✅ Districts table analysis complete!")

🔍 STEP 4: DISTRICTS TABLE ANALYSIS
✅ districts table found! Structure:
      • district: character varying NOT NULL
      • geometry: USER-DEFINED NOT NULL
      • geometry_str: text NULL

   📊 Records in districts table: 12
   📋 Sample data:
      {'district': 'Reinickendorf', 'geometry': '0106000020E610000001000000010300000001000000840900008B3CBC9938A42A409DEDEA6534504A40...' [hex-encoded EWKB geometry; remaining output truncated]
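
The hex string in the sample row is PostGIS extended WKB (EWKB), and its first nine bytes already answer the question about spatial reference systems. The sketch below decodes that header with the standard library only; the function name is hypothetical.

```python
import struct


def ewkb_header(hex_wkb):
    """Return (geometry_type_code, srid) from the header of a hex EWKB string.

    srid is None when the geometry carries no embedded SRID.
    """
    raw = bytes.fromhex(hex_wkb[:18])           # 1 byte order + 4 type + 4 SRID
    byte_order = "<" if raw[0] == 1 else ">"    # 1 = little-endian, 0 = big-endian
    (type_word,) = struct.unpack(byte_order + "I", raw[1:5])
    has_srid = bool(type_word & 0x20000000)     # EWKB flag: an SRID follows the type
    geom_type = type_word & 0xFF                # 6 = MultiPolygon
    srid = struct.unpack(byte_order + "I", raw[5:9])[0] if has_srid else None
    return geom_type, srid


print(ewkb_header("0106000020E6100000"))
```

For the `districts` sample this yields geometry type 6 (MultiPolygon) with SRID 4326, whereas the Milieuschutz files loaded later use EPSG:25833. Any spatial comparison between the two would therefore need a reprojection first, for example `to_crs` in GeoPandas or `ST_Transform` in PostGIS.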

In [8]:
# 🔍 Step 5: Foreign Key Relationships Analysis

print("🔍 STEP 5: FOREIGN KEY RELATIONSHIPS")
print("=" * 40)

try:
    with engine.connect() as conn:
        # Check foreign key relationships
        fk_result = conn.execute(text("""
            SELECT 
                tc.table_name as child_table,
                kcu.column_name as child_column,
                ccu.table_name AS parent_table,
                ccu.column_name AS parent_column,
                rc.delete_rule,
                rc.update_rule
            FROM information_schema.table_constraints AS tc 
            JOIN information_schema.key_column_usage AS kcu
                ON tc.constraint_name = kcu.constraint_name
                AND tc.table_schema = kcu.table_schema
            JOIN information_schema.constraint_column_usage AS ccu
                ON ccu.constraint_name = tc.constraint_name
                AND ccu.table_schema = tc.table_schema
            JOIN information_schema.referential_constraints AS rc
                ON tc.constraint_name = rc.constraint_name
                AND tc.table_schema = rc.constraint_schema
            WHERE tc.constraint_type = 'FOREIGN KEY'
                AND tc.table_schema = 'test_berlin_data'
            ORDER BY tc.table_name
        """))
        
        foreign_keys = fk_result.fetchall()
        
        if foreign_keys:
            print(f"Found {len(foreign_keys)} foreign key relationships:")
            for fk in foreign_keys:
                print(f"   🔗 {fk[0]}.{fk[1]} → {fk[2]}.{fk[3]} (DEL: {fk[4]}, UPD: {fk[5]})")
        else:
            print("❌ No foreign key relationships found")

except Exception as e:
    print(f"❌ Error checking foreign keys: {e}")

print("✅ Foreign key analysis complete!")

🔍 STEP 5: FOREIGN KEY RELATIONSHIPS
❌ Error checking foreign keys: (psycopg2.OperationalError) connection to server at "ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech" (3.143.47.40), port 5432 failed: ERROR:  Your project has exceeded the data transfer quota. Upgrade your plan to increase limits.

(Background on this error at: https://sqlalche.me/e/20/e3q8)
✅ Foreign key analysis complete!


In [9]:
# 🔍 Step 6: Summary and Recommendations

print("🔍 STEP 6: SUMMARY & RECOMMENDATIONS")
print("=" * 40)

try:
    with engine.connect() as conn:
        # Get final status for recommendations
        schema_exists = conn.execute(text("""
            SELECT schema_name 
            FROM information_schema.schemata 
            WHERE schema_name = 'test_berlin_data'
        """)).fetchone()
        
        tables_count = conn.execute(text("""
            SELECT COUNT(*) 
            FROM information_schema.tables 
            WHERE table_schema = 'test_berlin_data'
        """)).scalar()
        
        print("💡 ANALYSIS SUMMARY:")
        print(f"   🏗️  Schema Exists: {'✅ YES' if schema_exists else '❌ NO'}")
        print(f"   📊 Total Tables: {tables_count}")
        
        print("\n💡 RECOMMENDATIONS:")
        print("-" * 40)
        
        if schema_exists and tables_count > 0:
            print("✅ SCHEMA EXISTS - We should work with existing structure!")
            print("📋 Options:")
            print("   A) Use existing neighborhoods & districts tables for spatial relationships")
            print("   B) Add data to existing tables if they're empty") 
            print("   C) Coordinate with team about existing structure")
            print("\n🎯 NEXT STEP: Check if existing geographic data matches your cleaned data")
        else:
            print("⚠️ Schema or tables missing - may need to create them")
            print("💡 Consider creating test_berlin_data schema if it doesn't exist")

except Exception as e:
    print(f"❌ Error generating recommendations: {e}")

print(f"\n🔍 COMPREHENSIVE SCHEMA INVESTIGATION COMPLETE!")
print(f"💬 Ready for next phase of Milieuschutz development!")

🔍 STEP 6: SUMMARY & RECOMMENDATIONS
❌ Error generating recommendations: (psycopg2.OperationalError) connection to server at "ep-falling-glitter-a5m0j5gk-pooler.us-east-2.aws.neon.tech" (3.143.47.40), port 5432 failed: ERROR:  Your project has exceeded the data transfer quota. Upgrade your plan to increase limits.

(Background on this error at: https://sqlalche.me/e/20/e3q8)

🔍 COMPREHENSIVE SCHEMA INVESTIGATION COMPLETE!
💬 Ready for next phase of Milieuschutz development!


---

## 3. 📂 Load Clean Milieuschutz Data (Step 3/5)

### 🎯 **What We'll Do in This Step:**
Load our pre-processed, clean Milieuschutz data that's ready for database integration.

### 📁 **Data Sources:**
- **EM Zones**: Residential protection zones (prevents displacement)
- **ES Zones**: Urban character preservation zones (protect the built character of a neighborhood)

### 🧹 **What Makes Our Data "Clean":**
- **Standardized columns**: All German names converted to English
- **Proper data types**: Dates, numerics, and categories optimized
- **Spatial validation**: All geometries verified and valid
- **Database-ready formats**: Perfect for PostgreSQL + PostGIS

### 🔧 **Key Tasks:**
- Load both EM and ES zone datasets
- Verify data structure and quality
- Prepare for database insertion

### 🧠 **Why This Step Matters:**
Understanding our data before insertion prevents errors and ensures smooth database population.

**Ready to load our clean, analysis-ready Milieuschutz data? 📊**

In [10]:
# 📂 Load Clean Milieuschutz Data

print("📂 LOADING CLEAN MILIEUSCHUTZ DATA")
print("=" * 40)

# Load the clean GeoJSON files
gdf_em = gpd.read_file("../sources/milieuschutz_residential_protection_zones_em_clean.geojson")
gdf_es = gpd.read_file("../sources/milieuschutz_urban_character_preservation_zones_es_clean.geojson")

# Display summary
print(f"✅ EM Zones (Residential Protection): {len(gdf_em)} records")
print(f"✅ ES Zones (Urban Character): {len(gdf_es)} records")
print(f"📈 Total Protection Zones: {len(gdf_em) + len(gdf_es)}")

# Show coordinate systems
print(f"\n🗺️ Coordinate Systems:")
print(f"   EM CRS: {gdf_em.crs}")
print(f"   ES CRS: {gdf_es.crs}")

# Show column structure
print(f"\n📋 Columns: {list(gdf_em.columns)}")

print(f"\n✅ Clean data loaded and ready for database insertion!")

📂 LOADING CLEAN MILIEUSCHUTZ DATA
✅ EM Zones (Residential Protection): 81 records
✅ ES Zones (Urban Character): 94 records
📈 Total Protection Zones: 175

🗺️ Coordinate Systems:
   EM CRS: EPSG:25833
   ES CRS: EPSG:25833

📋 Columns: ['protection_zone_id', 'protection_zone_key', 'district', 'district_id', 'protection_zone_name', 'date_announced', 'date_effective', 'amendment_announced', 'amendment_effective', 'area_ha', 'geometry']

✅ Clean data loaded and ready for database insertion!
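
Because EPSG:25833 (ETRS89 / UTM zone 33N) uses metres, the `area_ha` column can be cross-checked against the geometry: area in square metres divided by 10,000 gives hectares. The sketch below shows the arithmetic on a plain coordinate ring with the shoelace formula; it is an illustration, not how the source data computed `area_ha`.

```python
def ring_area_ha(coords):
    """Area in hectares of a simple polygon ring given as (x, y) pairs in metres."""
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to the first
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2 / 10_000


# A 100 m x 100 m square is exactly one hectare
square = [(0, 0), (100, 0), (100, 100), (0, 100)]
print(ring_area_ha(square))
```

With GeoPandas the equivalent check would be comparing `gdf_em.geometry.area / 10_000` against `gdf_em["area_ha"]`.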


---

## 3.2 🔗 Combine Our Two Datasets (Step 3/5)

### 🎯 **What We'll Do in This Step:**
Merge EM and ES zone datasets into a unified dataset ready for database insertion.

### 🔧 **Combination Strategy:**
1. **Add zone_type column**: Distinguish between EM and ES zones
2. **Concatenate datasets**: Stack both datasets vertically  
3. **Validate structure**: Ensure consistent schema across both datasets
4. **Final preparation**: Ready for database table creation

### 🧠 **Why Combine Instead of Separate Tables:**
- **Simplified queries**: One table for all Milieuschutz zones
- **Consistent structure**: Both zone types share identical attributes
- **ERD compliance**: Matches our collaborative database design
- **Analysis efficiency**: Easier comparative analysis

### 🎯 **Expected Output:**
- **Single combined dataset** with all 175 zones (81 EM + 94 ES)
- **zone_type column** to differentiate protection types
- **Consistent data structure** ready for PostgreSQL insertion
- **Spatial data preserved** in proper format

**Ready to unify our Milieuschutz protection zones? 🔗**
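
The "validate structure" step can be made explicit before concatenating: if the two column sets differ, `pd.concat` silently fills the gaps with missing values. A minimal sketch with a hypothetical helper name:

```python
def schema_diff(columns_a, columns_b):
    """Return (only_in_a, only_in_b) as sorted lists of column names."""
    a, b = set(columns_a), set(columns_b)
    return sorted(a - b), sorted(b - a)


em_cols = ["protection_zone_id", "district", "area_ha", "geometry"]
es_cols = ["protection_zone_id", "district", "area_ha", "geometry"]
only_em, only_es = schema_diff(em_cols, es_cols)
if only_em or only_es:
    raise ValueError(f"Column mismatch: EM only {only_em}, ES only {only_es}")
```

In the notebook the inputs would be `gdf_em.columns` and `gdf_es.columns`.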

In [11]:
# 🔍 Quick EDA - Simple Data Check

print("🔍 QUICK DATA OVERVIEW")
print("=" * 40)

print("📋 EM ZONES - Column Names:")
print(f"   {list(gdf_em.columns)}")

print(f"\n📋 ES ZONES - Column Names:")
print(f"   {list(gdf_es.columns)}")

print(f"\n📊 EM ZONES - First Few Records:")
print(gdf_em.head(2))

print(f"\n📊 ES ZONES - First Few Records:")
print(gdf_es.head(2))

print(f"\n📏 Data Shapes:")
print(f"   EM: {gdf_em.shape}")
print(f"   ES: {gdf_es.shape}")

print(f"\n✅ Quick overview complete!")

🔍 QUICK DATA OVERVIEW
📋 EM ZONES - Column Names:
   ['protection_zone_id', 'protection_zone_key', 'district', 'district_id', 'protection_zone_name', 'date_announced', 'date_effective', 'amendment_announced', 'amendment_effective', 'area_ha', 'geometry']

📋 ES ZONES - Column Names:
   ['protection_zone_id', 'protection_zone_key', 'district', 'district_id', 'protection_zone_name', 'date_announced', 'date_effective', 'amendment_announced', 'amendment_effective', 'area_ha', 'geometry']

📊 EM ZONES - First Few Records:
    protection_zone_id protection_zone_key district district_id  \
0  erhaltgeb_em.EM0105              EM0105    Mitte          01   
1  erhaltgeb_em.EM0106              EM0106    Mitte          01   

  protection_zone_name date_announced date_effective amendment_announced  \
0           Sparrplatz     2016-05-24     2016-05-25                 NaT   
1         Leopoldplatz     2016-05-24     2016-05-25                 NaT   

  amendment_effective  area_ha  \
0              

---

## 4. 🏗️ Create Database Table Following ERD Schema (Step 4/5)

### 🎯 **What We'll Do in This Step:**
Create the `milieuschutz_protection_zones` table in our collaborative database following our ERD design specifications.

### 🏗️ **Table Design Specifications:**
- **ERD Compliance**: Follows our collaborative database schema
- **Foreign Key Constraints**: Links to existing `districts` table
- **Spatial Support**: PostGIS geometry column for GIS operations
- **Data Integrity**: Proper constraints and validation rules

### 🔧 **Key Components:**
1. **Schema Definition**: Column names, types, and constraints
2. **Spatial Column**: PostGIS geometry with SRID 25833  
3. **Foreign Key Setup**: Referential integrity with districts table
4. **Index Creation**: Spatial indexing for query performance

### 📋 **Table Structure:**
```sql
CREATE TABLE test_berlin_data.milieuschutz_protection_zones (
    id VARCHAR(50) PRIMARY KEY,
    protection_zone_key VARCHAR(50),
    district VARCHAR(100) REFERENCES test_berlin_data.districts(district),
    protection_zone_name VARCHAR(200),
    zone_type VARCHAR(2) CHECK (zone_type IN ('EM', 'ES')),
    date_announced DATE,
    date_effective DATE,
    amendment_announced DATE,
    amendment_effective DATE,
    area_ha DECIMAL(10,2),
    geometry GEOMETRY(MULTIPOLYGON, 25833)
);
```
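
The table definition above does not yet include the spatial index listed under "Index Creation". A GiST index on the geometry column is the usual PostGIS choice. The sketch below only builds the statement; the index naming scheme and the helper name are assumptions.

```python
def spatial_index_sql(schema, table, column="geometry"):
    """Build a CREATE INDEX statement for a GiST index on a geometry column."""
    index_name = f"idx_{table}_{column}"  # hypothetical naming convention
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name} "
        f"ON {schema}.{table} USING GIST ({column});"
    )


print(spatial_index_sql("test_berlin_data", "milieuschutz_protection_zones"))
```

The resulting string could be run with `conn.execute(text(...))` inside an `engine.begin()` block after the table exists.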

### 🧠 **Why This Step Matters:**
- **Data Integrity**: Proper constraints prevent bad data
- **Performance**: Spatial indexing enables fast GIS queries
- **Collaboration**: ERD compliance ensures team compatibility
- **Scalability**: Proper design supports future enhancements

*🖖 "The logical structure of a well-designed database table reflects both current needs and future possibilities. Most fascinating - the intersection of spatial geometry and relational integrity." - Spock*

**Ready to build our ERD-compliant Milieuschutz table? 🏗️**

In [11]:
# 🔄 Step 5: Combine Our Two Datasets

print("🔄 COMBINING OUR DATASETS")
print("=" * 30)

# Step 1: Make copies and add labels
print("1️⃣ Making copies and adding labels...")
em_data = gdf_em.copy()
es_data = gdf_es.copy()

# Add a column to identify which type each zone is
em_data['zone_type'] = 'EM'
es_data['zone_type'] = 'ES'

print(f"   ✅ EM data: {len(em_data)} zones labeled")
print(f"   ✅ ES data: {len(es_data)} zones labeled")

# Step 2: Combine them together
print("\n2️⃣ Combining datasets...")
combined_data = pd.concat([em_data, es_data], ignore_index=True)

print(f"   ✅ Combined! Total zones: {len(combined_data)}")

# Step 3: Check our work
print("\n3️⃣ Checking our combined data...")
print(f"   📊 Total records: {len(combined_data)}")
print(f"   📋 Total columns: {len(combined_data.columns)}")

# Count how many of each type
zone_counts = combined_data['zone_type'].value_counts()
print(f"\n   📈 Zone counts:")
print(f"      • EM zones: {zone_counts['EM']}")
print(f"      • ES zones: {zone_counts['ES']}")

# Show a quick preview
print(f"\n   📋 Quick preview:")
preview = combined_data[['zone_type', 'district', 'protection_zone_name']].head(3)
print(preview)

print(f"\n✅ Success! Our data is now combined and ready!")

🔄 COMBINING OUR DATASETS
1️⃣ Making copies and adding labels...
   ✅ EM data: 81 zones labeled
   ✅ ES data: 94 zones labeled

2️⃣ Combining datasets...
   ✅ Combined! Total zones: 175

3️⃣ Checking our combined data...
   📊 Total records: 175
   📋 Total columns: 11

   📈 Zone counts:
      • EM zones: 81
      • ES zones: 94

   📋 Quick preview:
  zone_type district protection_zone_name
0        EM    Mitte           Sparrplatz
1        EM    Mitte         Leopoldplatz
2        EM    Mitte           Waldstraße

✅ Success! Our data is now combined and ready!
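
The combined frame's columns do not match the table structure in section 4 one to one: the data has `protection_zone_id` and `district_id`, while the table has `id` and no `district_id`. The sketch below shows one possible mapping. Renaming to `id` and dropping `district_id` are assumptions read off the table definition, not a confirmed team decision.

```python
RENAME = {"protection_zone_id": "id"}  # assumed mapping to the table's primary key
DROP = {"district_id"}                 # assumed: the table links by district name only


def to_table_columns(columns):
    """Map data column names to table column names, skipping dropped columns."""
    return [RENAME.get(col, col) for col in columns if col not in DROP]


data_columns = [
    "protection_zone_id", "protection_zone_key", "district", "district_id",
    "protection_zone_name", "date_announced", "date_effective",
    "amendment_announced", "amendment_effective", "area_ha", "geometry",
    "zone_type",
]
print(to_table_columns(data_columns))
```

With pandas the same mapping would be `combined_data.rename(columns=RENAME).drop(columns=list(DROP))`.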


## 3.3 💾 Quick Export Our Combined Data (Step 3/5)

### 🎯 **What we're doing:**
Save our combined dataset to the `sources` folder for backup and sharing!

### 📁 **Where we're saving:**
- **Location**: `../sources/` folder (same as our input files)
- **GeoJSON**: Complete spatial data
- **CSV with WKT**: Geometry as Well-Known Text for easy use

### 🤔 **Why WKT format?**
- **WKT** = Well-Known Text (standard geometry format)
- **Easy to read**: Can see coordinates as text
- **Database ready**: Perfect for importing to databases

**Ready to save our work?**
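
To make the WKT format concrete, the sketch below writes a polygon ring as WKT by hand. It only illustrates the text layout; in the notebook the conversion is done by the geometry objects themselves, and the function name here is hypothetical.

```python
def polygon_wkt(ring):
    """Format a ring of (x, y) pairs as a WKT polygon, closing the ring if needed."""
    points = list(ring)
    if points[0] != points[-1]:
        points.append(points[0])  # WKT rings repeat the first point at the end
    body = ", ".join(f"{x} {y}" for x, y in points)
    return f"POLYGON (({body}))"


print(polygon_wkt([(0, 0), (1, 0), (1, 1)]))
```

Note that WKT carries no coordinate reference system, so the CSV export relies on readers knowing the data is in EPSG:25833.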

In [12]:
# 💾 Step 5.5: Quick Export Our Combined Data

print("💾 EXPORTING TO SOURCES FOLDER")
print("=" * 30)

# Step 1: Set up file paths
print("1️⃣ Setting up file paths...")
sources_folder = "../sources"
geojson_file = f"{sources_folder}/milieuschutz_combined.geojson"
csv_file = f"{sources_folder}/milieuschutz_combined.csv"

print(f"   📁 Target folder: {sources_folder}")
print(f"   🗺️ GeoJSON file: milieuschutz_combined.geojson")
print(f"   📊 CSV file: milieuschutz_combined.csv")

# Step 2: Export as GeoJSON
print("\n2️⃣ Saving GeoJSON...")
combined_data.to_file(geojson_file, driver='GeoJSON')
print(f"   ✅ Saved: {geojson_file}")

# Step 3: Export CSV with WKT geometry
print("\n3️⃣ Saving CSV with WKT geometry...")
# Create copy for CSV export
csv_data = combined_data.copy()
# Convert geometry to WKT (Well-Known Text)
csv_data['geometry_wkt'] = csv_data['geometry'].apply(lambda x: x.wkt)
# Remove the original geometry column for CSV
csv_data = csv_data.drop('geometry', axis=1)
# Save to CSV
csv_data.to_csv(csv_file, index=False)
print(f"   ✅ Saved: {csv_file}")
print(f"   📋 Geometry saved as WKT in 'geometry_wkt' column")

# Step 4: Quick summary
print("\n4️⃣ Export summary...")
print(f"   📊 Records exported: {len(combined_data)}")
print(f"   📁 Files created: 2")
print(f"   📍 Location: {sources_folder}/")

print(f"\n✅ Export complete! Files ready in sources folder!")

💾 EXPORTING TO SOURCES FOLDER
1️⃣ Setting up file paths...
   📁 Target folder: ../sources
   🗺️ GeoJSON file: milieuschutz_combined.geojson
   📊 CSV file: milieuschutz_combined.csv

2️⃣ Saving GeoJSON...


NameError: name 'combined_data' is not defined

## 4.1 🗺️ Enable PostGIS Extension (Step 4/5)

### 🎯 **What we're doing:**
Enable the PostGIS extension to give our database spatial data support!

### 🔧 **What is PostGIS?**
- **Spatial Extension**: Adds geography and geometry support to PostgreSQL
- **Required for**: Storing polygons, points, lines, and spatial queries
- **Essential**: Without PostGIS, we can't store our protection zone shapes

### 🧠 **Why we need this:**
- **Store geometry**: Our protection zones are spatial polygons
- **Spatial queries**: Find zones by location, intersections, etc.
- **Performance**: Spatial indexes for fast location searches
- **Standards**: Industry standard for spatial databases
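
To make "find zones by location" concrete, here is a pure-Python sketch of the two stages a spatial lookup typically goes through: a cheap bounding-box filter (the kind of test a spatial index accelerates), then an exact point-in-polygon test. This illustrates the idea only; it is not how PostGIS is implemented, and the function names and coordinates are made up.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(point, ring):
    """Exact test using ray casting: count edge crossings of a horizontal ray."""
    px, py = point
    inside = False
    # Pair each vertex with the next one (wrapping around to close the ring)
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def contains(ring, point):
    """Bounding-box pre-filter first, exact test only if the box matches."""
    min_x, min_y, max_x, max_y = bounding_box(ring)
    px, py = point
    if not (min_x <= px <= max_x and min_y <= py <= max_y):
        return False
    return point_in_ring(point, ring)


# Made-up square zone (units arbitrary)
zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(contains(zone, (5, 5)))    # point inside the square
print(contains(zone, (15, 5)))   # rejected by the bounding box
```

In PostGIS the equivalent query would use `ST_Contains` (or `ST_Intersects`), with the GIST index handling the bounding-box stage.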

### ⚠️ **Important Note:**
This step must be completed before creating tables with geometry columns!

**Ready to enable spatial superpowers? 🌍**

In [13]:
# 🗺️ Step 4.1: Enable PostGIS Extension

print("🗺️ ENABLING POSTGIS EXTENSION")
print("=" * 35)

# Step 1: Check current PostGIS status
print("1️⃣ Checking current PostGIS status...")
try:
    with engine.connect() as conn:
        # Try to get PostGIS version
        try:
            result = conn.execute(text("SELECT PostGIS_version();"))
            postgis_version = result.fetchone()[0]
            print(f"   ✅ PostGIS is already enabled!")
            print(f"   🗺️ PostGIS version: {postgis_version}")
            postgis_enabled = True
        except Exception as e:
            print(f"   ⚠️ PostGIS not detected - needs to be enabled")
            print(f"   💡 We'll enable it now...")
            postgis_enabled = False

except Exception as e:
    print(f"   ❌ Error checking PostGIS status: {e}")
    postgis_enabled = False

# Step 2: Enable PostGIS if needed
if not postgis_enabled:
    print("\n2️⃣ Enabling PostGIS extension...")
    try:
        with engine.connect() as conn:
            # Enable PostGIS extension
            conn.execute(text("CREATE EXTENSION IF NOT EXISTS postgis;"))
            conn.commit()
            print(f"   ✅ PostGIS extension enabled!")
            
            # Verify it's working
            result = conn.execute(text("SELECT PostGIS_version();"))
            postgis_version = result.fetchone()[0]
            print(f"   🗺️ PostGIS version: {postgis_version}")
            
    except Exception as e:
        print(f"   ❌ Error enabling PostGIS: {e}")
        print(f"   💡 This might be a permissions issue")
        print(f"   💡 Contact your database administrator")
else:
    print("\n2️⃣ PostGIS already enabled - skipping!")

# Step 3: Test spatial functions
print("\n3️⃣ Testing spatial functions...")
try:
    with engine.connect() as conn:
        # Test basic spatial function
        test_query = "SELECT ST_GeomFromText('POINT(13.404954 52.520008)', 4326) as test_point;"
        result = conn.execute(text(test_query))
        test_result = result.fetchone()
        
        if test_result:
            print(f"   ✅ Spatial functions working!")
            print(f"   🎯 Test point created successfully")
        else:
            print(f"   ❌ Spatial functions not working")
            
except Exception as e:
    print(f"   ⚠️ Error testing spatial functions: {e}")

# Step 4: Check available spatial reference systems
print("\n4️⃣ Checking spatial reference systems...")
try:
    with engine.connect() as conn:
        # Check if EPSG:25833 (Berlin coordinate system) is available
        epsg_check = conn.execute(text("""
            SELECT auth_name, auth_srid, srtext 
            FROM spatial_ref_sys 
            WHERE auth_srid = 25833 
            LIMIT 1;
        """)).fetchone()
        
        if epsg_check:
            print(f"   ✅ EPSG:25833 (Berlin CRS) available!")
            print(f"   🗺️ Authority: {epsg_check[0]} SRID: {epsg_check[1]}")
        else:
            print(f"   ⚠️ EPSG:25833 not found - might need to be added")

except Exception as e:
    print(f"   ⚠️ Error checking spatial reference systems: {e}")

print(f"\n✅ PostGIS setup complete! Ready for spatial tables!")

🗺️ ENABLING POSTGIS EXTENSION
1️⃣ Checking current PostGIS status...
   ✅ PostGIS is already enabled!
   🗺️ PostGIS version: 3.5 USE_GEOS=1 USE_PROJ=1 USE_STATS=1

2️⃣ PostGIS already enabled - skipping!

3️⃣ Testing spatial functions...
   ✅ Spatial functions working!
   🎯 Test point created successfully

4️⃣ Checking spatial reference systems...
   ✅ EPSG:25833 (Berlin CRS) available!
   🗺️ Authority: EPSG SRID: 25833

✅ PostGIS setup complete! Ready for spatial tables!


## 4.2 🏗️ Create Database Table Following ERD Schema (Step 4/5)

### 🎯 **What we're doing:**
Create a properly structured database table that follows our team's ERD (Entity Relationship Diagram) patterns and foreign key relationships!

### 📊 **ERD Analysis - Key Findings:**

#### **🔑 Primary Reference Table:**
- **`districts`**: Master table with PRIMARY KEY `district` (VARCHAR(100), NN, UQ)
- **Pattern**: All other tables reference `districts.district` as FOREIGN KEY

#### **🔗 Foreign Key Pattern in Our Database:**
Based on the ERD, the major tables all reference `districts.district` through a foreign key, usually from a column that is itself named `district`:
- `regional_statistics.district` → `districts.district`
- `crime_statistics_table.neighborhood` → `districts.district` 
- `hospitals.district` → `districts.district`
- `rent_stats_per_neighborhood.district` → `districts.district`
- `playground_area.district` → `districts.district`
- And many more...

#### **📋 Our Milieuschutz Table Design (ERD-Compliant):**

**Table Name**: `milieuschutz_protection_zones`

**🔑 Constraints & Restrictions Following ERD Pattern:**

1. **Primary Key**: `protection_zone_key` (VARCHAR(20), NN, UQ)
2. **Foreign Key**: `district` → `districts.district` (VARCHAR(100), NN)
3. **Zone Type Validation**: `zone_type` CHECK constraint ("EM" or "ES")
4. **Business Rules**: Area validation (`area_ha > 0`); date columns stay nullable, as in the source data
5. **Spatial Geometry**: PostGIS POLYGON with EPSG:25833
6. **Performance Indexes**: Spatial (GIST), FK (district), zone_type, area
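
The effect of constraints 2 and 3 can be tried out without touching the shared database. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL. That is an assumption made purely for illustration: SQLite needs `PRAGMA foreign_keys = ON`, has no PostGIS geometry type, and the table is reduced to three columns. It shows a valid row being accepted and two invalid rows being rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.execute("CREATE TABLE districts (district VARCHAR(100) PRIMARY KEY)")
conn.execute("""
    CREATE TABLE milieuschutz_protection_zones (
        protection_zone_key VARCHAR(20) PRIMARY KEY,
        district VARCHAR(100) NOT NULL
            REFERENCES districts(district)
            ON DELETE RESTRICT ON UPDATE CASCADE,
        zone_type VARCHAR(2) NOT NULL CHECK (zone_type IN ('EM', 'ES'))
    )
""")
conn.execute("INSERT INTO districts VALUES ('Mitte')")


def try_insert(key, district, zone_type):
    """Return 'ok' or a message describing why the row was rejected."""
    try:
        conn.execute(
            "INSERT INTO milieuschutz_protection_zones VALUES (?, ?, ?)",
            (key, district, zone_type),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Keys and the unknown district name are made up for the demo
print(try_insert("DEMO-01", "Mitte", "EM"))      # valid row
print(try_insert("DEMO-02", "Atlantis", "EM"))   # district not in master table
print(try_insert("DEMO-03", "Mitte", "XX"))      # zone_type fails the CHECK
```

PostgreSQL reports the same kinds of violations, only with different error messages.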

#### **🎯 ERD Integration Benefits:**
- **Referential Integrity**: Ensures all districts exist in master table
- **Consistent Joins**: Same pattern as other team tables
- **Data Quality**: Prevents orphaned records
- **Team Collaboration**: Follows established database architecture

### 🧠 **Why ERD Compliance Matters:**
- **Team Standards**: Matches existing table patterns
- **Data Integrity**: Foreign key prevents invalid districts
- **Query Performance**: Consistent join patterns across all tables
- **Future Analysis**: Easy cross-table analysis with other datasets

**Ready to create an ERD-compliant Milieuschutz table? 🏗️**

In [14]:
# 🏗️ Step 4.2: Create ERD-Compliant Database Table

print("🏗️ CREATING ERD-COMPLIANT MILIEUSCHUTZ TABLE")
print("=" * 50)

# Step 1: Verify districts table exists (our FK target)
print("1️⃣ Verifying districts table (FK target)...")
try:
    with engine.connect() as conn:
        # Check districts table exists
        districts_check = conn.execute(text("""
            SELECT table_name 
            FROM information_schema.tables 
            WHERE table_schema = 'test_berlin_data' 
            AND table_name = 'districts'
        """)).fetchone()
        
        if districts_check:
            print("   ✅ districts table found - FK target exists!")
            
            # Check unique districts in our data vs database
            db_districts = conn.execute(text("""
                SELECT DISTINCT district FROM test_berlin_data.districts
                ORDER BY district
            """)).fetchall()
            
            # Get districts from our data
            our_districts = combined_data['district'].unique()
            
            print(f"   📊 Database districts: {len(db_districts)}")
            print(f"   📊 Our data districts: {len(our_districts)}")
            
            # Check for mismatches
            db_district_names = [d[0] for d in db_districts]
            missing_districts = [d for d in our_districts if d not in db_district_names]
            
            if missing_districts:
                print(f"   ⚠️ Districts in our data but NOT in database:")
                for district in missing_districts:
                    print(f"      • {district}")
                print(f"   💡 These may cause FK constraint violations!")
            else:
                print(f"   ✅ All our districts exist in database!")
                
        else:
            print("   ❌ districts table NOT found!")
            print("   💡 Cannot create FK constraint without target table")
            
except Exception as e:
    print(f"   ❌ Error checking districts table: {e}")

# Step 2: Define ERD-compliant table structure
print("\n2️⃣ Defining ERD-compliant table structure...")

# Table creation SQL following ERD patterns
create_table_sql = """
CREATE TABLE IF NOT EXISTS test_berlin_data.milieuschutz_protection_zones (
    -- 🔑 Primary Key (following ERD pattern)
    protection_zone_key VARCHAR(20) PRIMARY KEY,
    
    -- 🔗 Foreign Key to districts table (ERD compliance)
    district VARCHAR(100) NOT NULL,
    
    -- 📊 Zone Classification and Details  
    zone_type VARCHAR(2) NOT NULL 
        CHECK (zone_type IN ('EM', 'ES')),
    protection_zone_name TEXT NOT NULL,
    
    -- 📅 Date Information (nullable as per original data)
    date_announced DATE,
    date_effective DATE,
    amendment_announced DATE,
    amendment_effective DATE,
    
    -- 📏 Area Information with validation
    area_ha DECIMAL(10,4) NOT NULL 
        CHECK (area_ha > 0),
    
    -- 🗺️ Spatial Geometry (PostGIS with Berlin CRS)
    geometry GEOMETRY(POLYGON, 25833) NOT NULL,
    
    -- 📅 Metadata (following ERD timestamp patterns)
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
    -- 🔗 Foreign Key Constraint (ERD compliance)
    CONSTRAINT fk_milieuschutz_district 
        FOREIGN KEY (district) 
        REFERENCES test_berlin_data.districts(district)
        ON DELETE RESTRICT 
        ON UPDATE CASCADE
);
"""

print("   ✅ ERD-compliant table structure defined:")
print("      🔑 PRIMARY KEY: protection_zone_key (VARCHAR(20))")
print("      🔗 FOREIGN KEY: district → districts.district (VARCHAR(100))")
print("      ✅ CHECK: zone_type IN ('EM', 'ES')")
print("      ✅ CHECK: area_ha > 0")
print("      ⚠️ FK CONSTRAINT: ON DELETE RESTRICT, ON UPDATE CASCADE")
print("      🗺️ SPATIAL: geometry POLYGON with EPSG:25833")

# Step 3: Create the table with FK constraints
print("\n3️⃣ Creating table with foreign key constraints...")
try:
    with engine.connect() as conn:
        # Execute table creation
        conn.execute(text(create_table_sql))
        conn.commit()
        print("   ✅ Table 'milieuschutz_protection_zones' created successfully!")
        print("   🔗 Foreign key constraint to districts table established!")
        
except Exception as e:
    print(f"   ❌ Error creating table: {e}")
    if "foreign key constraint" in str(e).lower():
        print("   💡 FK constraint issue - check districts table exists")
    elif "already exists" in str(e).lower():
        print("   💡 Table already exists - that's okay!")

# Step 4: Create performance indexes (ERD best practices)
print("\n4️⃣ Creating performance indexes...")

indexes_sql = [
    # Spatial index (most important for geometry queries)
    """
    CREATE INDEX IF NOT EXISTS idx_milieuschutz_geometry 
    ON test_berlin_data.milieuschutz_protection_zones 
    USING GIST (geometry);
    """,
    
    # Foreign key index (critical for joins with districts)
    """
    CREATE INDEX IF NOT EXISTS idx_milieuschutz_district_fk 
    ON test_berlin_data.milieuschutz_protection_zones (district);
    """,
    
    # Zone type index for EM/ES filtering
    """
    CREATE INDEX IF NOT EXISTS idx_milieuschutz_zone_type 
    ON test_berlin_data.milieuschutz_protection_zones (zone_type);
    """,
    
    # Area index for size-based queries
    """
    CREATE INDEX IF NOT EXISTS idx_milieuschutz_area 
    ON test_berlin_data.milieuschutz_protection_zones (area_ha);
    """,
    
    # Composite index for common queries (district + zone_type)
    """
    CREATE INDEX IF NOT EXISTS idx_milieuschutz_district_zone 
    ON test_berlin_data.milieuschutz_protection_zones (district, zone_type);
    """
]

try:
    with engine.connect() as conn:
        for i, index_sql in enumerate(indexes_sql, 1):
            conn.execute(text(index_sql))
            index_name = index_sql.split('idx_milieuschutz_')[1].split(' ')[0]
            print(f"   ✅ Index {i}: idx_milieuschutz_{index_name}")
        
        conn.commit()
        print("   🚀 All performance indexes created!")
        
except Exception as e:
    print(f"   ❌ Error creating indexes: {e}")

# Step 5: Verify ERD compliance
print("\n5️⃣ Verifying ERD compliance...")
try:
    with engine.connect() as conn:
        # Check table exists
        table_check = conn.execute(text("""
            SELECT table_name 
            FROM information_schema.tables 
            WHERE table_schema = 'test_berlin_data' 
            AND table_name = 'milieuschutz_protection_zones'
        """)).fetchone()
        
        if table_check:
            print("   ✅ Table exists in database!")
            
            # Check foreign key constraints
            fk_constraints = conn.execute(text("""
                SELECT 
                    tc.constraint_name,
                    ccu.table_name AS foreign_table_name,
                    ccu.column_name AS foreign_column_name,
                    rc.delete_rule,
                    rc.update_rule
                FROM information_schema.table_constraints AS tc 
                JOIN information_schema.constraint_column_usage AS ccu
                    ON ccu.constraint_name = tc.constraint_name
                    AND ccu.constraint_schema = tc.constraint_schema
                JOIN information_schema.referential_constraints AS rc
                    ON tc.constraint_name = rc.constraint_name
                    AND tc.constraint_schema = rc.constraint_schema
                WHERE tc.constraint_type = 'FOREIGN KEY'
                    AND tc.table_schema = 'test_berlin_data'
                    AND tc.table_name = 'milieuschutz_protection_zones'
            """)).fetchall()
            
            if fk_constraints:
                print("   🔗 Foreign key constraints:")
                for fk in fk_constraints:
                    print(f"      • {fk[0]} → {fk[1]}.{fk[2]} (DEL: {fk[3]}, UPD: {fk[4]})")
            else:
                print("   ⚠️ No foreign key constraints found")
                
        else:
            print("   ❌ Table not found!")
            
except Exception as e:
    print(f"   ❌ Error verifying ERD compliance: {e}")

# Step 6: ERD compliance summary
print("\n6️⃣ ERD compliance summary...")
print("   🏗️ Table: milieuschutz_protection_zones")
print("   📊 Schema: test_berlin_data (following team standard)")
print("   🔑 Primary Key: protection_zone_key (VARCHAR(20))")
print("   🔗 Foreign Key: district → districts.district (VARCHAR(100))")
print("   ✅ Constraints: zone_type validation, area validation, NOT NULL fields")
print("   🗺️ Spatial: PostGIS geometry with EPSG:25833 (Berlin CRS)")
print("   🚀 Indexes: Spatial (GIST), FK (district), zone_type, area, composite")
print("   📋 ERD Pattern: Matches existing table foreign key structure")

print(f"\n✅ ERD-COMPLIANT DATABASE TABLE READY FOR DATA INSERTION!")
print(f"🔗 Foreign key ensures referential integrity with districts table!")

🏗️ CREATING ERD-COMPLIANT MILIEUSCHUTZ TABLE
1️⃣ Verifying districts table (FK target)...
   ✅ districts table found - FK target exists!
   📊 Database districts: 12
   📊 Our data districts: 11
   ✅ All our districts exist in database!

2️⃣ Defining ERD-compliant table structure...
   ✅ ERD-compliant table structure defined:
      🔑 PRIMARY KEY: protection_zone_key (VARCHAR(20))
      🔗 FOREIGN KEY: district → districts.district (VARCHAR(100))
      ✅ CHECK: zone_type IN ('EM', 'ES')
      ✅ CHECK: area_ha > 0
      ⚠️ FK CONSTRAINT: ON DELETE RESTRICT, ON UPDATE CASCADE
      🗺️ SPATIAL: geometry POLYGON with EPSG:25833

3️⃣ Creating table with foreign key constraints...
   ✅ Table 'milieuschutz_protection_zones' created successfully!
   🔗 Foreign key constraint to districts table established!

4️⃣ Creating performance indexes...
   ✅ Index 1: idx_milieuschutz_geometry
   ✅ Index 2: idx_milieuschutz_district_fk
   ✅ Index 3: idx_milieuschutz_zone_type
   ✅ Index 4: idx_milieuschutz_are

In [15]:
# 🔍 Simple Geometry Check - Student Version

print("🔍 SIMPLE GEOMETRY COMPATIBILITY CHECK")
print("=" * 40)

# Step 1: Check our combined_data geometry basics
print("1️⃣ Our combined_data geometry info...")
print(f"   📊 Records: {len(combined_data)}")
print(f"   🗺️ CRS: {combined_data.crs}")

# Sample geometry
sample_geom = combined_data['geometry'].iloc[0]
print(f"   📐 First geometry type: {sample_geom.geom_type}")
print(f"   📝 First 50 chars: {str(sample_geom)[:50]}...")

# Step 2: Check districts table geometry in database
print("\n2️⃣ Database districts table geometry...")
try:
    with engine.connect() as conn:
        # Simple check - does districts table have geometry?
        result = conn.execute(text("""
            SELECT column_name, data_type 
            FROM information_schema.columns 
            WHERE table_schema = 'test_berlin_data' 
            AND table_name = 'districts'
            AND column_name LIKE '%geom%'
        """)).fetchall()
        
        if result:
            for col in result:
                print(f"   ✅ Found geometry column: {col[0]} ({col[1]})")
        else:
            print(f"   ❌ No geometry column found in districts table")
            
        # Quick sample from districts
        sample = conn.execute(text("""
            SELECT district FROM test_berlin_data.districts LIMIT 3
        """)).fetchall()
        
        print(f"   📋 Sample districts in database:")
        for row in sample:
            print(f"      • {row[0]}")
            
except Exception as e:
    print(f"   ❌ Error: {e}")

print(f"\n✅ Simple compatibility check complete!")

🔍 SIMPLE GEOMETRY COMPATIBILITY CHECK
1️⃣ Our combined_data geometry info...
   📊 Records: 175
   🗺️ CRS: EPSG:25833
   📐 First geometry type: MultiPolygon
   📝 First 50 chars: MULTIPOLYGON (((13.347047220754405 52.540337908351...

2️⃣ Database districts table geometry...
   ✅ Found geometry column: geometry (USER-DEFINED)
   ✅ Found geometry column: geometry_str (text)
   📋 Sample districts in database:
      • Reinickendorf
      • Charlottenburg-Wilmersdorf
      • Treptow-Köpenick

✅ Simple compatibility check complete!
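
One thing worth double-checking in the output above: the CRS is reported as EPSG:25833, a projected system with coordinates in metres, while the sample geometry starts with `13.347… 52.540…`, which reads like longitude and latitude in degrees. A minimal sketch of a plausibility check on the bounds; the function name and the example bounds are assumptions chosen for illustration, not part of GeoPandas:

```python
def looks_like_degrees(bounds):
    """Heuristic: True if (min_x, min_y, max_x, max_y) fits lon/lat ranges.

    UTM coordinates around Berlin are six- and seven-digit metre values,
    so they cannot fall inside [-180, 180] x [-90, 90].
    """
    min_x, min_y, max_x, max_y = bounds
    return -180 <= min_x <= max_x <= 180 and -90 <= min_y <= max_y <= 90


# Rough, made-up bounds for illustration only
degree_bounds = (13.1, 52.3, 13.8, 52.7)                    # lon/lat around Berlin
metre_bounds = (370000.0, 5800000.0, 415000.0, 5840000.0)   # UTM zone 33N style values

print(looks_like_degrees(degree_bounds))
print(looks_like_degrees(metre_bounds))
```

With the notebook's data this would be called as `looks_like_degrees(combined_data.total_bounds)`. If it returns `True` while `combined_data.crs` says EPSG:25833, the CRS label and the coordinates disagree, and either the label needs correcting (`set_crs(..., allow_override=True)`) or the data needs reprojecting (`to_crs`) before spatial results can be trusted.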


---

## 5. 🚀 Final Data Insertion and Database Population (Step 5/5)

### 🎯 **Mission Objective:**
Complete the database population by inserting all 175 Milieuschutz protection zones into our ERD-compliant table with proper error handling and data validation.

### 🔧 **Insertion Strategy:**
1. **Geometry Handling**: Convert GeoPandas geometries to PostGIS-compatible format
2. **Row-by-Row Insertion**: Insert each zone individually inside a single transaction that is committed once at the end
3. **Error Recovery**: Robust handling of any insertion issues
4. **Validation**: Post-insertion verification of data integrity
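
A column declared as `GEOMETRY(MULTIPOLYGON, ...)` rejects plain `POLYGON` values, so mixed data has to be promoted to one type before insertion. A minimal sketch of that promotion on GeoJSON-style geometry dictionaries (the helper name `promote_to_multi` is hypothetical):

```python
def promote_to_multi(geometry):
    """Return a MultiPolygon dict; Polygons are wrapped, MultiPolygons pass through."""
    if geometry["type"] == "MultiPolygon":
        return geometry
    if geometry["type"] == "Polygon":
        # A MultiPolygon's coordinates are a list of Polygon coordinate arrays
        return {"type": "MultiPolygon", "coordinates": [geometry["coordinates"]]}
    raise ValueError(f"Unsupported geometry type: {geometry['type']}")


# Made-up square for illustration
square = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
}
multi = promote_to_multi(square)
print(multi["type"], len(multi["coordinates"]))
```

On the database side, PostGIS offers `ST_Multi` for the same purpose, for example by wrapping the geometry expression in the `INSERT` statement as `ST_Multi(ST_GeomFromWKB(:geometry, 25833))`.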

### 🛡️ **Quality Assurance:**
- **Spatial Data Integrity**: Ensure geometries are properly formatted
- **Foreign Key Validation**: Verify all district references are valid
- **Transaction Safety**: Rollback capability if errors occur
- **Complete Verification**: Confirm all 175 records inserted successfully
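
The transaction-safety point can be illustrated with a small all-or-nothing insert. The sketch uses in-memory SQLite as a stand-in for the PostgreSQL connection (an assumption for illustration; the table is reduced to two columns and the rows are made up). If any row violates a constraint, the whole batch is rolled back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE zones (
        protection_zone_key TEXT PRIMARY KEY,
        area_ha REAL NOT NULL CHECK (area_ha > 0)
    )
""")


def insert_all_or_nothing(rows):
    """Insert every row in one transaction; return True on success, False on rollback."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block
        with conn:
            conn.executemany("INSERT INTO zones VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


good_rows = [("DEMO-01", 12.5), ("DEMO-02", 40.0)]
bad_rows = [("DEMO-03", 8.0), ("DEMO-04", -1.0)]  # second row fails the CHECK

print(insert_all_or_nothing(bad_rows))   # nothing from this batch is kept
print(insert_all_or_nothing(good_rows))
print(conn.execute("SELECT COUNT(*) FROM zones").fetchone()[0])
```

With SQLAlchemy, `with engine.begin() as conn:` gives the same commit-or-rollback behaviour.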

### 🏆 **Success Metrics:**
- **175/175 records inserted** (100% success rate target)
- **Zero spatial errors** (all geometries valid)
- **Complete foreign key integrity** (all districts properly referenced)
- **Optimal performance** (spatial indexing operational)

*🖖 "This final step represents the culmination of our logical approach to database population. The systematic handling of spatial data types, referential integrity, and error recovery demonstrates both technical precision and collaborative excellence. Most fascinating - the transformation of raw WFS data into production-ready collaborative database infrastructure." - Spock*

**Ready to complete our Milieuschutz database population mission? 🚀**

In [19]:
# 🔧 Fix Table Geometry Type

print("🔧 FIXING TABLE GEOMETRY TYPE")
print("=" * 35)

# Step 1: Drop and recreate table with correct geometry type
print("1️⃣ Updating table to accept MultiPolygon...")
try:
    with engine.connect() as conn:
        # Drop existing table
        conn.execute(text("DROP TABLE IF EXISTS test_berlin_data.milieuschutz_protection_zones;"))
        
        # Create table with MULTIPOLYGON instead of POLYGON
        create_table_sql = """
        CREATE TABLE test_berlin_data.milieuschutz_protection_zones (
            protection_zone_key VARCHAR(20) PRIMARY KEY,
            district VARCHAR(100) NOT NULL,
            zone_type VARCHAR(2) NOT NULL CHECK (zone_type IN ('EM', 'ES')),
            protection_zone_name TEXT NOT NULL,
            date_announced DATE,
            date_effective DATE,
            amendment_announced DATE,
            amendment_effective DATE,
            area_ha DECIMAL(10,4) NOT NULL CHECK (area_ha > 0),
            geometry GEOMETRY(MULTIPOLYGON, 25833) NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            CONSTRAINT fk_milieuschutz_district 
                FOREIGN KEY (district) 
                REFERENCES test_berlin_data.districts(district)
                ON DELETE RESTRICT ON UPDATE CASCADE
        );
        """
        
        conn.execute(text(create_table_sql))
        conn.commit()
        print("   ✅ Table recreated with MULTIPOLYGON geometry type!")
        
except Exception as e:
    print(f"   ❌ Error updating table: {e}")

# Step 2: Now insert the data
print("\n2️⃣ Inserting data with correct geometry type...")
try:
    with engine.connect() as conn:
        for idx, row in combined_data.iterrows():
            insert_sql = text("""
                INSERT INTO test_berlin_data.milieuschutz_protection_zones 
                (protection_zone_key, district, zone_type, protection_zone_name, 
                 date_announced, date_effective, amendment_announced, amendment_effective, 
                 area_ha, geometry)
                VALUES (:protection_zone_key, :district, :zone_type, :protection_zone_name,
                        :date_announced, :date_effective, :amendment_announced, :amendment_effective,
                        :area_ha, ST_GeomFromWKB(:geometry, 25833))
            """)
            
            row_data = {
                'protection_zone_key': row['protection_zone_key'],
                'district': row['district'],
                'zone_type': row['zone_type'],
                'protection_zone_name': row['protection_zone_name'],
                'date_announced': None if pd.isna(row['date_announced']) else row['date_announced'],
                'date_effective': None if pd.isna(row['date_effective']) else row['date_effective'],
                'amendment_announced': None if pd.isna(row['amendment_announced']) else row['amendment_announced'],
                'amendment_effective': None if pd.isna(row['amendment_effective']) else row['amendment_effective'],
                'area_ha': row['area_ha'],
                'geometry': row['geometry'].wkb
            }
            
            conn.execute(insert_sql, row_data)
        
        conn.commit()
        print("   ✅ Data inserted successfully!")
        
except Exception as e:
    print(f"   ❌ Insertion failed: {e}")

# Step 3: Verify insertion
print("\n3️⃣ Verifying insertion...")
try:
    with engine.connect() as conn:
        count = conn.execute(text("""
            SELECT COUNT(*) FROM test_berlin_data.milieuschutz_protection_zones
        """)).scalar()
        
        print(f"   📊 Records in database: {count}")
        print(f"   📊 Expected records: {len(combined_data)}")
        
        if count == len(combined_data):
            print("   ✅ All data inserted successfully!")
        else:
            print("   ⚠️ Insertion incomplete")
            
except Exception as e:
    print(f"   ❌ Error verifying: {e}")

print(f"\n🖖 Fixed and inserted!")

🔧 FIXING TABLE GEOMETRY TYPE
1️⃣ Updating table to accept MultiPolygon...
   ✅ Table recreated with MULTIPOLYGON geometry type!

2️⃣ Inserting data with correct geometry type...
   ✅ Data inserted successfully!

3️⃣ Verifying insertion...
   📊 Records in database: 175
   📊 Expected records: 175
   ✅ All data inserted successfully!

🖖 Fixed and inserted!
