# üóÑÔ∏è **AWS Database Investigation & Milieuschutz Protection Zones Table Creation**

## üéØ **Learning Objectives**
By the end of this notebook, students will understand:
1. **Database Connection**: How to connect to AWS RDS PostgreSQL
2. **Schema Investigation**: How to explore existing database structure
3. **PostGIS Setup**: Understanding spatial database extensions
4. **GeoJSON Import**: How to create PostGIS tables from GeoJSON files
5. **Data Validation**: How to verify successful data import
6. **Urban Planning Data**: Understanding Milieuschutz (Environmental Protection Zones)
7. **Temporal Data**: Working with dates and policy amendments

## üèõÔ∏è **About Milieuschutz Zones**
**Milieuschutz** (Environmental Protection Areas) are special urban planning zones in Berlin designed to:
- **Preserve neighborhood character** and architectural heritage
- **Control gentrification** and maintain affordable housing
- **Protect social structure** of residential areas
- **Regulate building modifications** and new developments

## üìã **Prerequisites**
- ‚úÖ Enhanced GeoJSON file (`milieuschutz_residental_and_urban_zones_joined.geojson`)
- ‚úÖ AWS database credentials
- ‚úÖ Basic understanding of PostgreSQL and spatial data
- ‚úÖ Understanding of Berlin's urban planning concepts

---

## üì¶ **Step 1: Import Required Libraries**

**Learning Point**: We need specific libraries for spatial data and database operations with environmental protection zones.

### üîß **Library Functions for Milieuschutz Data:**
- **`geopandas`**: Handle complex protection zone geometries (MultiPolygon shapes)
- **`pandas`**: Manage temporal data (announcement dates, effective dates, amendments)
- **`sqlalchemy`**: Create robust database connections for urban planning data
- **`psycopg2`**: PostgreSQL adapter optimized for spatial queries
- **`dotenv`**: Secure credential management for production databases

In [2]:
# üì¶ Import required libraries for spatial database operations
import pandas as pd
import geopandas as gpd
from sqlalchemy import create_engine, text
import os
import traceback
from dotenv import load_dotenv

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

print("‚úÖ Libraries imported successfully!")
print("üìö Ready for spatial database operations")

‚úÖ Libraries imported successfully!
üìö Ready for spatial database operations


## üîå **Step 2: AWS Database Connection**

**Learning Point**: Connection strings contain all necessary information to connect to a database storing urban planning data.

**Format**: `postgresql+psycopg2://username:password@host:port/database`

### üèõÔ∏è **Why This Database for Milieuschutz Data?**
- **AWS RDS PostgreSQL**: Scalable cloud database for city-wide data
- **PostGIS Extension**: Essential for spatial environmental protection zones
- **Multi-tenant Design**: Supports multiple Berlin urban planning datasets
- **Production Security**: Environment variables protect sensitive credentials

**Note**: Database is currently offline until Monday - we'll prepare our code for testing then! üîß

In [79]:
# üîê Clean and professional database connection using python-dotenv
print("üîå **CONNECTING TO AWS DATABASE**")
print("=" * 40)

# Load environment variables from .env file (no string functions needed!)
load_dotenv('../ignored_files/.env')
PASSWORD = os.getenv('PASSWORD')

# Build connection URL with secure password from .env
DATABASE_URL = f'postgresql+psycopg2://postgres:{PASSWORD}@layered-data-warehouse.cdg2ok68acsn.eu-central-1.rds.amazonaws.com:5432/berlin_project_db'

try:
    print("1Ô∏è‚É£ Creating database engine...")
    engine = create_engine(DATABASE_URL, connect_args={'connect_timeout': 10})
    
    print("2Ô∏è‚É£ Testing connection...")
    conn = engine.connect()
    
    # Test query
    test_result = conn.execute(text("SELECT current_database(), current_user, version()"))
    db_info = test_result.fetchone()
    
    print(f"   ‚úÖ Connected successfully!")
    print(f"   üóÑÔ∏è  Database: {db_info[0]}")
    print(f"   üë§ User: {db_info[1]}")
    print(f"   üìä PostgreSQL Version: {db_info[2][:50]}...")
    
except Exception as e:
    print(f"‚ùå Connection failed: {e}")
    print("üí° Check network connection and credentials")

üîå **CONNECTING TO AWS DATABASE**
1Ô∏è‚É£ Creating database engine...
2Ô∏è‚É£ Testing connection...
   ‚úÖ Connected successfully!
   üóÑÔ∏è  Database: berlin_project_db
   üë§ User: postgres
   üìä PostgreSQL Version: PostgreSQL 17.4 on aarch64-unknown-linux-gnu, comp...
   ‚úÖ Connected successfully!
   üóÑÔ∏è  Database: berlin_project_db
   üë§ User: postgres
   üìä PostgreSQL Version: PostgreSQL 17.4 on aarch64-unknown-linux-gnu, comp...


## üîç **Step 3: Database Schema Investigation**

**Learning Point**: Before importing environmental data, always investigate the existing database structure to understand how Milieuschutz data will integrate.

**Schema Investigation Goals**:
- **Check Available Schemas**: See what database schemas exist (berlin_data, public, etc.)
- **Explore Existing Tables**: Understand current urban planning data structure
- **Verify PostGIS Extension**: Confirm spatial capabilities are available
- **Plan Integration**: See how Milieuschutz zones will fit with existing data

**Database Architecture Understanding**:
- **Schema Organization**: Berlin planning data is organized in the `berlin_data` schema
- **Table Relationships**: Environmental zones will connect to districts and other planning data
- **Spatial Integration**: PostGIS enables complex environmental boundary analysis
- **Data Quality**: Understanding existing structure helps maintain consistency

**Educational Value**: This step teaches systematic database exploration before adding new environmental datasets.

In [None]:
# üîç STEP 3: Database Schema Investigation

print("üìä Investigating Database Structure for Milieuschutz Integration...")
print("=" * 60)

# Check available schemas
print("üóÇÔ∏è  Available Database Schemas:")
cursor.execute("""
    SELECT schema_name 
    FROM information_schema.schemata 
    WHERE schema_name NOT IN ('information_schema', 'pg_catalog', 'pg_toast');
""")
schemas = cursor.fetchall()
for schema in schemas:
    print(f"   üìÅ {schema[0]}")

print("\n" + "=" * 60)

# Check existing tables in berlin_data schema
print("üèóÔ∏è  Existing Tables in berlin_data Schema:")
cursor.execute("""
    SELECT table_name, table_type
    FROM information_schema.tables 
    WHERE table_schema = 'berlin_data'
    ORDER BY table_name;
""")
tables = cursor.fetchall()
for table_name, table_type in tables:
    print(f"   ? {table_name} ({table_type})")

print("\n" + "=" * 60)

# Check PostGIS extension
print("üåç PostGIS Spatial Extension Status:")
cursor.execute("SELECT extname, extversion FROM pg_extension WHERE extname = 'postgis';")
postgis_info = cursor.fetchone()
if postgis_info:
    print(f"   ‚úÖ PostGIS Version: {postgis_info[1]}")
else:
    print("   ‚ùå PostGIS not found - spatial functions unavailable")

print("\n" + "=" * 60)
print("‚úÖ Schema Investigation Complete")
print("üìù Ready to proceed with Milieuschutz data integration")

üîç **DATABASE SCHEMA INVESTIGATION**
1Ô∏è‚É£ Available schemas:
   üìÅ berlin_data
   üìÅ public

2Ô∏è‚É£ Target schema 'berlin_data' exists: ‚úÖ YES
   üìã Search path set to: berlin_data, public

3Ô∏è‚É£ Existing tables in berlin_data schema:
   üóÑÔ∏è  districts (BASE TABLE)
   üóÑÔ∏è  districts_pop_stat (BASE TABLE)
   üóÑÔ∏è  geography_columns (VIEW)
   üóÑÔ∏è  geometry_columns (VIEW)
   üóÑÔ∏è  green_spaces (BASE TABLE)
   üóÑÔ∏è  hospitals (BASE TABLE)
   üóÑÔ∏è  neighborhoods (BASE TABLE)
   üóÑÔ∏è  regional_statistics (BASE TABLE)
   üóÑÔ∏è  schools_kai (BASE TABLE)
   üóÑÔ∏è  short_time_listings (BASE TABLE)
   üóÑÔ∏è  spatial_ref_sys (BASE TABLE)
   üóÑÔ∏è  ubahn (BASE TABLE)

   üìä Total tables found: 12

‚úÖ Schema investigation complete!


## üó∫Ô∏è **Step 4: PostGIS Extension Verification**

**Learning Point**: Before working with environmental protection zones (Milieuschutz), we must verify that PostGIS spatial extension is available for geospatial operations.

**PostGIS Extension Goals**:
- **Verify Installation**: Confirm PostGIS is installed and available
- **Check Version**: Ensure we have a compatible version for spatial analysis
- **Validate Schema**: Confirm the extension is properly configured
- **Enable Spatial Operations**: Prepare for environmental boundary processing

**Why PostGIS is Critical for Environmental Data**:
- **Spatial Analysis**: Environmental zones require complex geometric calculations
- **Boundary Operations**: Intersection, containment, and proximity analysis
- **Coordinate Systems**: Proper handling of Berlin's spatial reference system
- **Performance**: Optimized spatial indexing for large environmental datasets

**Environmental Planning Context**:
- **Zone Boundaries**: Milieuschutz areas have complex geometric shapes
- **Spatial Relationships**: How environmental zones relate to districts and neighborhoods
- **Buffer Analysis**: Creating protection buffers around sensitive areas
- **Overlay Operations**: Combining environmental data with urban planning layers

**Educational Value**: Understanding spatial database capabilities is essential for environmental data management and urban planning analysis.

In [None]:
# üó∫Ô∏è Step 4: Check PostGIS extension status
print("üó∫Ô∏è **Step 4: POSTGIS EXTENSION CHECK**")
print("=" * 40)

try:
    # Check if PostGIS is installed
    postgis_check = conn.execute(text("""
        SELECT 
            extname as extension_name,
            extversion as version,
            nspname as schema
        FROM pg_extension e
        JOIN pg_namespace n ON e.extnamespace = n.oid
        WHERE extname = 'postgis'
    """))
    
    postgis_info = postgis_check.fetchone()
    
    if postgis_info:
        print(f"‚úÖ PostGIS is installed!")
        print(f"   üì¶ Extension: {postgis_info.extension_name}")
        print(f"   üî¢ Version: {postgis_info.version}")
        print(f"   üìã Schema: {postgis_info.schema}")
    else:
        print("‚ùå PostGIS not found")
        print("üí° PostGIS extension may need to be enabled")
    
    print("\n‚úÖ PostGIS check complete!")
    
except Exception as e:
    print(f"‚ùå PostGIS check failed: {e}")

üó∫Ô∏è **POSTGIS EXTENSION CHECK**
‚úÖ PostGIS is installed!
   üì¶ Extension: postgis
   üî¢ Version: 3.5.1
   üìã Schema: berlin_data

‚úÖ PostGIS check complete!


## üìÇ **Step 5: Load Enhanced Milieuschutz Protection Zones GeoJSON**

**Learning Point**: GeoJSON is a standard format for geographic data that can be easily imported into PostGIS.

**About Milieuschutz Protection Zones Data**:
- **Environmental Protection**: Legal zones designed to preserve neighborhood character and prevent gentrification
- **Policy Framework**: Each zone has specific protection policies with enforcement dates
- **Administrative Hierarchy**: Protection zones are distributed across Berlin districts
- **Berlin Context**: Multiple Milieuschutz zones across districts with varying policy implementation dates

**Educational Value**: This step demonstrates loading environmental policy data with temporal and spatial components for urban planning analysis.

In [None]:
# üìÇ Load the enhanced neighborhoods GeoJSON file
print("üìÇ **LOADING MILIEUSCHUTZ GEOJSON**")
print("=" * 40)

# Path to the enhanced GeoJSON file
geojson_path = "layered-populate-data-pool-da/milieuschutz-populating-db/sources/milieuschutz_residental_and_urban_zones_joined.geojson"

try:
    print("1Ô∏è‚É£ Checking file existence...")
    if os.path.exists(geojson_path):
        print(f"   ‚úÖ File found: {os.path.basename(geojson_path)}")
        
        print("2Ô∏è‚É£ Loading GeoJSON with GeoPandas...")
        milieuschutz_gdf = gpd.read_file(geojson_path)
        
        print(f"   ‚úÖ Loaded {len(milieuschutz_gdf)} protection zones")
        print(f"   üìä Columns: {list(milieuschutz_gdf.columns)}")
        print(f"   üåç Coordinate Reference System: {milieuschutz_gdf.crs}")
        print(f"   üìè Geometry types: {milieuschutz_gdf.geometry.geom_type.unique()}")
        
        print("\n3Ô∏è‚É£ Sample data preview:")
        sample_data = milieuschutz_gdf[['protection_zone_key', 'protection_zone_name', 'district']].head(3)
        for idx, row in sample_data.iterrows():
            print(f"   üèõÔ∏è {row['protection_zone_key']}: {row['protection_zone_name']} in {row['district']}")
        
        # Ensure correct CRS (EPSG:4326 for WGS84)
        if milieuschutz_gdf.crs != 'EPSG:4326':
            print(f"\n4Ô∏è‚É£ Converting CRS to EPSG:4326...")
            milieuschutz_gdf = milieuschutz_gdf.to_crs('EPSG:4326')
            print("   ‚úÖ CRS converted to EPSG:4326")
        else:
            print("\n4Ô∏è‚É£ CRS verification: ‚úÖ Already EPSG:4326")
        
        print("\n‚úÖ Milieuschutz data loaded and ready for database import!")
        
    else:
        print(f"   ‚ùå File not found: {geojson_path}")
        print("   üí° Make sure the file exists in the sources directory")
        
except Exception as e:
    print(f"‚ùå Error loading GeoJSON: {e}")
    print(f"üîç Details: {traceback.format_exc()}")

üìÇ **LOADING ENHANCED NEIGHBORHOODS GEOJSON**
1Ô∏è‚É£ Checking file existence...
   ‚úÖ File found: neighborhoods_enhanced.geojson
2Ô∏è‚É£ Loading GeoJSON with GeoPandas...
   ‚úÖ Loaded 96 neighborhoods
   üìä Columns: ['district_id', 'district', 'neighborhood', 'geometry']
   üåç Coordinate Reference System: EPSG:4326
   üìè Geometry types: ['Polygon' 'MultiPolygon']

3Ô∏è‚É£ Sample data preview:
   üèòÔ∏è 01: Mitte - Mitte
   üèòÔ∏è 01: Mitte - Moabit
   üèòÔ∏è 01: Mitte - Hansaviertel

4Ô∏è‚É£ CRS verification: ‚úÖ Already EPSG:4326

‚úÖ GeoJSON loaded and ready for database import!


   "metadata": {},
   "source": [
    "## üìÇ **Step 6: Load Milieuschutz Environmental Protection Zones Data**
",
    "
",
    "**Learning Point**: Loading spatial urban planning data requires understanding of environmental protection policies.
"

**Milieuschutz Data Structure**:
- **protection_zone_id**: Unique identifier for each protection area
- **protection_zone_key**: Short code for zone identification  
- **protection_zone_name**: Human-readable area name
- **district**: Berlin district containing the zone
- **district_id**: Zero-padded district identifier (01-12)
- **date_announced**: When zone was officially announced
- **date_effective**: When protection rules took effect
- **area_ha**: Zone area in hectares
- **zone_type**: Type of environmental protection (EM = Erhaltungsgebiete Milieuschutz)
- **geometry**: Spatial boundaries of protection zone

**Urban Planning Context**: These zones preserve neighborhood character and control gentrification in Berlin.

**Educational Value**: This step demonstrates loading temporal-spatial policy data with rich metadata.

In [None]:
# ÔøΩ Load the Milieuschutz Environmental Protection Zones GeoJSON file
print("ÔøΩ **LOADING MILIEUSCHUTZ ENVIRONMENTAL PROTECTION ZONES**")
print("=" * 55)

# Relative path for student collaboration
geojson_path = "../sources/milieuschutz_residental_and_urban_zones_joined.geojson"

try:
    print("1Ô∏è‚É£ Checking file existence...")
    if os.path.exists(geojson_path):
        print(f"   ‚úÖ File found: {os.path.basename(geojson_path)}")
        
        print("2Ô∏è‚É£ Loading Environmental Protection Zones with GeoPandas...")
        milieuschutz_gdf = gpd.read_file(geojson_path)
        
        print(f"   ‚úÖ Loaded {len(milieuschutz_gdf)} protection zones")
        print(f"   üìä Columns: {list(milieuschutz_gdf.columns)}")
        print(f"   üåç Coordinate Reference System: {milieuschutz_gdf.crs}")
        print(f"   üìè Geometry types: {milieuschutz_gdf.geometry.geom_type.unique()}")
        
        print("\n3Ô∏è‚É£ Sample protection zones preview:")
        sample_data = milieuschutz_gdf[['protection_zone_key', 'district', 'protection_zone_name', 'zone_type']].head(3)
        for idx, row in sample_data.iterrows():
            print(f"   üèõÔ∏è {row['protection_zone_key']}: {row['district']} - {row['protection_zone_name']} ({row['zone_type']})")
        
        print("\n4Ô∏è‚É£ Temporal data analysis:")
        # Show date range of protection zones
        dates = milieuschutz_gdf['date_effective'].dropna()
        if len(dates) > 0:
            earliest = dates.min()[:10]  # Extract just the date part
            latest = dates.max()[:10]
            print(f"   üìÖ Protection zones established: {earliest} to {latest}")
        
        # Ensure correct CRS (EPSG:4326 for WGS84)
        if milieuschutz_gdf.crs != 'EPSG:4326':
            print(f"\n5Ô∏è‚É£ Converting CRS to EPSG:4326...")
            milieuschutz_gdf = milieuschutz_gdf.to_crs('EPSG:4326')
            print("   ‚úÖ CRS converted to EPSG:4326")
        else:
            print("\n5Ô∏è‚É£ CRS verification: ‚úÖ Already EPSG:4326")
        
        print("\n‚úÖ Milieuschutz data loaded and ready for database import!")
        
    else:
        print(f"   ‚ùå File not found: {geojson_path}")
        print("   üí° Make sure the file exists in the sources directory")
        
except Exception as e:
    print(f"‚ùå Error loading GeoJSON: {e}")
    print(f"üîç Details: {traceback.format_exc()}")

## üîß **Step 7: Connection Check & Transaction Reset**

**Learning Point**: Before creating tables, always verify your connection is working and clear any pending transactions.

**Why this matters**: 
- Database connections can have "dirty" transaction states
- Rolling back ensures we start with a clean slate
- Connection tests verify we can communicate with the database

**Best Practice**: Always check connection health before major operations!

In [None]:
# ÔøΩ **STEP 7: CONNECTION CHECK & ROLLBACK**
# ============================================
print("ÔøΩ **STEP 7: CONNECTION CHECK & ROLLBACK**")
print("=" * 45)

try:
    # Check connection status
    test_result = conn.execute(text("SELECT 1 as test"))
    test_value = test_result.fetchone()[0]
    print(f"‚úÖ Connection working: {test_value}")
    
    # Rollback any pending transactions
    conn.rollback()
    print("‚úÖ Transaction state cleared")
    
except Exception as e:
    print(f"‚ùå Connection issue: {e}")
    print("ÔøΩ Try reconnecting if needed")

print("ÔøΩ Ready for table creation!")

ÔøΩ **STEP 7A: CONNECTION CHECK & ROLLBACK**
‚úÖ Connection working: 1
‚úÖ Transaction state cleared
ÔøΩ Ready for table creation!


## üèóÔ∏è **Step 8: Create Milieuschutz Protection Zones Table Structure**

**Learning Point**: Environmental protection zone tables require fields for policy metadata and temporal tracking.

**Milieuschutz Table Design**:
- **protection_zone_id**: Primary identifier (VARCHAR(50))
- **protection_zone_key**: Short reference code (VARCHAR(20)) 
- **protection_zone_name**: Human-readable zone name (VARCHAR(100))
- **district**: Berlin district name (VARCHAR(100))
- **district_id**: Zero-padded district code for foreign keys (VARCHAR(2))
- **date_announced**: Policy announcement date (DATE)
- **date_effective**: When protection started (DATE)
- **area_ha**: Zone area in hectares (DECIMAL)
- **zone_type**: Protection category (VARCHAR(10))
- **geometry**: Spatial boundaries (MULTIPOLYGON, SRID 4326)

**Urban Planning Database Concepts**:
- **Temporal Tracking**: Capture policy timeline from announcement to effect
- **Hierarchical Structure**: Zones belong to districts 
- **Policy Metadata**: Rich context for urban planning analysis

**Best Practice**: Build complex policy tables incrementally - start with core fields, add spatial features!

In [None]:
# üèóÔ∏è **STEP 8: CREATE MILIEUSCHUTZ PROTECTION ZONES TABLE**
# =========================================================
print("üèóÔ∏è **STEP 8: CREATE MILIEUSCHUTZ PROTECTION ZONES TABLE**")
print("=" * 60)

try:
    # Create comprehensive Milieuschutz table with all policy fields
    create_sql = """
    CREATE TABLE IF NOT EXISTS berlin_data.milieuschutz_protection_zones (
        protection_zone_id VARCHAR(50) PRIMARY KEY,
        protection_zone_key VARCHAR(20) NOT NULL,
        protection_zone_name VARCHAR(100) NOT NULL,
        district VARCHAR(100) NOT NULL,
        district_id VARCHAR(2) NOT NULL,
        date_announced DATE,
        date_effective DATE,
        amendment_announced DATE,
        amendment_effective DATE,
        area_ha DECIMAL(10,2),
        zone_type VARCHAR(10) NOT NULL,
        geometry GEOMETRY(MULTIPOLYGON, 4326)
    );
    """
    
    conn.execute(text(create_sql))
    conn.commit()
    print("   ‚úÖ Milieuschutz protection zones table created!")
    
    # Verify table was created
    verify_sql = """
        SELECT column_name, data_type, is_nullable 
        FROM information_schema.columns 
        WHERE table_schema = 'berlin_data' 
        AND table_name = 'milieuschutz_protection_zones'
        ORDER BY ordinal_position;
    """
    
    result = conn.execute(text(verify_sql))
    columns = result.fetchall()
    
    print(f"\n   üìã Table created with {len(columns)} columns:")
    for col in columns:
        nullable = "NULL" if col[2] == "YES" else "NOT NULL"
        print(f"      ‚Ä¢ {col[0]}: {col[1]} ({nullable})")
    
    print("\n   üèõÔ∏è Ready for environmental protection zone data!")
    
except Exception as e:
    print(f"‚ùå Error creating table: {e}")
    conn.rollback()
    print("üîÑ Transaction rolled back")

üèóÔ∏è **STEP 7B: CREATE TABLE STRUCTURE**
   ‚úÖ Basic table structure created!
   ‚úÖ Connection still working


## üó∫Ô∏è **Step 9: Add PostGIS Geometry Column for Milieuschutz Protection Zones**

**Learning Point**: PostGIS geometry columns enable spatial operations on environmental protection zone boundaries.

**PostGIS Functions**:
- `ALTER TABLE ADD COLUMN geometry` - Standard approach for adding spatial columns
- `GEOMETRY(MULTIPOLYGON, 4326)` - Explicit geometry type and coordinate system
- Automatically integrates with PostGIS spatial index system

**Parameters Explained**:
- `'berlin_data'` - schema name
- `'neighborhoods'` - table name  
- `'geometry'` - column name
- `4326` - SRID (Spatial Reference System - WGS84)
- `'MULTIPOLYGON'` - geometry type (neighborhoods can have complex shapes)
- `2` - dimensions (2D: X,Y coordinates)

**Spatial Benefits**: Enables neighborhood-level spatial queries, proximity analysis, and containment checks.

In [None]:
# üó∫Ô∏è **STEP 9: ADD GEOMETRY COLUMN**
# ==================================
print("üó∫Ô∏è **STEP 9: ADD GEOMETRY COLUMN**")
print("=" * 45)

try:
    print("2Ô∏è‚É£ Adding PostGIS geometry column...")
    
    # Use simple ALTER TABLE approach (more reliable)
    add_geom_sql = """
    ALTER TABLE berlin_data.neighborhoods 
    ADD COLUMN IF NOT EXISTS geometry GEOMETRY(MULTIPOLYGON, 4326);
    """
    
    conn.execute(text(add_geom_sql))
    conn.commit()
    print("   ‚úÖ Geometry column added successfully!")
    
    # Verify the column was added
    verify_sql = """
    SELECT column_name, data_type 
    FROM information_schema.columns 
    WHERE table_schema = 'berlin_data' 
    AND table_name = 'neighborhoods' 
    ORDER BY column_name;
    """
    
    result = conn.execute(text(verify_sql))
    columns = result.fetchall()
    print("\n3Ô∏è‚É£ Table structure verification:")
    for col in columns:
        print(f"   üìã {col[0]}: {col[1]}")
    
    # Connection check
    test_result = conn.execute(text("SELECT 1"))
    print("\n   ‚úÖ Connection still working")
    
except Exception as e:
    print(f"‚ùå Geometry column creation failed: {e}")
    conn.rollback()
    print("üîÑ Transaction rolled back")
    raise
    test_result = conn.execute(text("SELECT 1"))
    print("   ‚úÖ Connection still working")
    
except Exception as e:
    print(f"‚ùå Geometry column creation failed: {e}")
    conn.rollback()
    print("üîÑ Transaction rolled back")
    raise

üó∫Ô∏è **STEP 7C: ADD GEOMETRY COLUMN**
2Ô∏è‚É£ Adding PostGIS geometry column...
   ‚úÖ Geometry column added successfully!

3Ô∏è‚É£ Table structure verification:
   üìã district: character varying
   üìã district_id: character varying
   üìã geometry: USER-DEFINED
   üìã neighborhood: character varying

   ‚úÖ Connection still working


## üîó **Step 10: Add Data Validation and Constraints**

**Learning Point**: Environmental protection data requires specialized validation to ensure data quality and consistency.

**Why Add Constraints BEFORE Data Insertion?**
- **Data Integrity**: Database will reject invalid zone data automatically
- **Performance**: Constraints help query optimizer create better execution plans
- **Documentation**: Makes environmental data relationships explicit in database schema
- **Multi-Application Safety**: All applications accessing the database respect the environmental data constraints

**Milieuschutz Data Validation**:
- **Temporal Validation**: Policy dates must be reasonable (not future dates)
- **Spatial Validation**: Geometry must be valid MULTIPOLYGON
- **Zone Uniqueness**: Protection zone keys must be unique
- **District Consistency**: District names must be consistent

**Educational Value**: Demonstrates proper environmental database design with data quality controls.

In [None]:
# üîó **STEP 10: ADD FOREIGN KEY CONSTRAINT**
# ==========================================
print("üîó **STEP 10: ADD FOREIGN KEY CONSTRAINT**")
print("=" * 45)

try:
    print("1Ô∏è‚É£ Checking districts table exists...")
    # Verify districts table exists first
    districts_check = conn.execute(text("""
        SELECT table_name FROM information_schema.tables 
        WHERE table_schema = 'berlin_data' AND table_name = 'districts';
    """))
    
    if districts_check.fetchone():
        print("   ‚úÖ Districts table found")
        
        print("\n2Ô∏è‚É£ Creating foreign key constraint...")
        # Create the foreign key constraint
        constraint_sql = """
            ALTER TABLE berlin_data.neighborhoods 
            ADD CONSTRAINT fk_neighborhoods_district_id 
            FOREIGN KEY (district_id) 
            REFERENCES berlin_data.districts(district_id);
        """
        
        conn.execute(text(constraint_sql))
        conn.commit()
        print("   ‚úÖ Foreign key constraint 'fk_neighborhoods_district_id' created!")
        
        print("\n3Ô∏è‚É£ Verifying constraint creation...")
        # Verify the constraint was created
        verify_constraint = conn.execute(text("""
            SELECT constraint_name, constraint_type 
            FROM information_schema.table_constraints
            WHERE table_schema = 'berlin_data' 
            AND table_name = 'neighborhoods'
            AND constraint_type = 'FOREIGN KEY';
        """))
        
        fk_constraints = verify_constraint.fetchall()
        print(f"   üìã Foreign key constraints found: {len(fk_constraints)}")
        for constraint in fk_constraints:
            print(f"      üîó {constraint[0]} ({constraint[1]})")
            
        print("\nüéØ **Database integrity is now enforced at the constraint level!**")
        print("‚úÖ Invalid district_id values will be automatically rejected")
        
    else:
        print("   ‚ùå Districts table not found! Cannot create foreign key constraint.")
        
except Exception as e:
    if "already exists" in str(e).lower():
        print("   ‚ÑπÔ∏è  Foreign key constraint already exists!")
        print("   ‚úÖ Database integrity is already enforced")
    else:
        print(f"   ‚ùå Error creating constraint: {e}")
        conn.rollback()
        print("   üîÑ Transaction rolled back")
        
print("\nüöÄ Ready for data insertion with enforced referential integrity!")

üîó **STEP 7D: ADD FOREIGN KEY CONSTRAINT**
1Ô∏è‚É£ Checking districts table exists...
   ‚úÖ Districts table found

2Ô∏è‚É£ Creating foreign key constraint...
   ‚úÖ Foreign key constraint 'fk_neighborhoods_district_id' created!

3Ô∏è‚É£ Verifying constraint creation...
   üìã Foreign key constraints found: 1
      üîó fk_neighborhoods_district_id (FOREIGN KEY)

üéØ **Database integrity is now enforced at the constraint level!**
‚úÖ Invalid district_id values will be automatically rejected

üöÄ Ready for data insertion with enforced referential integrity!


## üéØ **Step 11: Implementing Proper Referential Integrity Rules**

**Learning Point**: The basic foreign key constraint we created uses default rules (`NO ACTION`), but database best practices recommend specific **CASCADE** and **RESTRICT** behaviors for different operations.

### üîç **Understanding Referential Integrity Rules:**

**Current Default Behavior:**
- `ON UPDATE NO ACTION` - Rejects updates to parent district_id
- `ON DELETE NO ACTION` - Rejects deletion of parent districts

**Recommended Best Practice:**
- `ON UPDATE CASCADE` - **Automatically propagates** district_id changes to neighborhoods
- `ON DELETE RESTRICT` - **Explicitly prevents** deletion of districts with neighborhoods

### üìö **Why These Rules Matter:**

#### **CASCADE ON UPDATE** üîÑ
- **Scenario**: If Berlin renames district "01" to "1A" 
- **Behavior**: All neighborhoods automatically update their district_id from "01" to "1A"
- **Benefit**: Maintains data consistency without manual intervention

#### **RESTRICT ON DELETE** üõ°Ô∏è
- **Scenario**: Attempting to delete a district that has neighborhoods
- **Behavior**: Database explicitly rejects the deletion with clear error
- **Benefit**: Prevents accidental data loss and orphaned records

### üéì **Educational Value:**
- **Data Integrity**: Understanding how relationships should behave
- **Database Design**: Industry-standard referential integrity patterns
- **Error Prevention**: Proactive protection against data inconsistencies

**Next Step**: Update our constraint to implement these best practices!

In [None]:
# üéØ **STEP 11: IMPLEMENT PROPER REFERENTIAL INTEGRITY RULES**
# ================================================================
print("üéØ **STEP 11: IMPLEMENT PROPER REFERENTIAL INTEGRITY RULES**")
print("=" * 65)

try:
    print("1Ô∏è‚É£ Checking current constraint rules...")
    # Check current constraint rules
    current_rules = conn.execute(text("""
        SELECT 
            tc.constraint_name,
            rc.update_rule,
            rc.delete_rule
        FROM information_schema.table_constraints AS tc 
        JOIN information_schema.referential_constraints AS rc
            ON tc.constraint_name = rc.constraint_name
        WHERE tc.constraint_type = 'FOREIGN KEY' 
        AND tc.table_schema = 'berlin_data'
        AND tc.table_name = 'neighborhoods';
    """))
    
    current = current_rules.fetchone()
    if current:
        print(f"   Current rules: UPDATE {current[1]}, DELETE {current[2]}")
        
        print("\n2Ô∏è‚É£ Dropping existing constraint...")
        # Drop the existing constraint
        conn.execute(text("""
            ALTER TABLE berlin_data.neighborhoods 
            DROP CONSTRAINT fk_neighborhoods_district_id;
        """))
        print("   ‚úÖ Existing constraint dropped")
        
        print("\n3Ô∏è‚É£ Creating new constraint with best practice rules...")
        # Create new constraint with proper rules
        conn.execute(text("""
            ALTER TABLE berlin_data.neighborhoods 
            ADD CONSTRAINT fk_neighborhoods_district_id 
            FOREIGN KEY (district_id) 
            REFERENCES berlin_data.districts(district_id)
            ON UPDATE CASCADE
            ON DELETE RESTRICT;
        """))
        
        conn.commit()
        print("   ‚úÖ New constraint created with:")
        print("      üîÑ ON UPDATE CASCADE (auto-propagates district_id changes)")
        print("      üõ°Ô∏è  ON DELETE RESTRICT (prevents district deletion)")
        
        print("\n4Ô∏è‚É£ Verifying new constraint rules...")
        # Verify the new rules
        verify_rules = conn.execute(text("""
            SELECT 
                tc.constraint_name,
                rc.update_rule,
                rc.delete_rule
            FROM information_schema.table_constraints AS tc 
            JOIN information_schema.referential_constraints AS rc
                ON tc.constraint_name = rc.constraint_name
            WHERE tc.constraint_type = 'FOREIGN KEY' 
            AND tc.table_schema = 'berlin_data'
            AND tc.table_name = 'neighborhoods';
        """))
        
        new_rules = verify_rules.fetchone()
        if new_rules:
            print(f"   ‚úÖ Verified: UPDATE {new_rules[1]}, DELETE {new_rules[2]}")
            
            if new_rules[1] == 'CASCADE' and new_rules[2] == 'RESTRICT':
                print("\nüéØ **PERFECT! Best practice referential integrity implemented!**")
                print("   üìö Students now understand:")
                print("      ‚Ä¢ CASCADE propagates changes automatically")
                print("      ‚Ä¢ RESTRICT prevents accidental data loss")
                print("      ‚Ä¢ Proper database design principles")
            else:
                print(f"   ‚ö†Ô∏è  Unexpected rules: {new_rules[1]}, {new_rules[2]}")
    else:
        print("   ‚ùå No foreign key constraint found to update")
        
except Exception as e:
    print(f"‚ùå Error updating constraint: {e}")
    conn.rollback()
    print("üîÑ Transaction rolled back")
    
print("\nüöÄ Database now follows industry-standard referential integrity patterns!")

üéØ **STEP 7E: IMPLEMENT PROPER REFERENTIAL INTEGRITY RULES**
1Ô∏è‚É£ Checking current constraint rules...
   Current rules: UPDATE NO ACTION, DELETE NO ACTION

2Ô∏è‚É£ Dropping existing constraint...
   ‚úÖ Existing constraint dropped

3Ô∏è‚É£ Creating new constraint with best practice rules...
   ‚úÖ New constraint created with:
      üîÑ ON UPDATE CASCADE (auto-propagates district_id changes)
      üõ°Ô∏è  ON DELETE RESTRICT (prevents district deletion)

4Ô∏è‚É£ Verifying new constraint rules...
   ‚úÖ Verified: UPDATE CASCADE, DELETE RESTRICT

üéØ **PERFECT! Best practice referential integrity implemented!**
   üìö Students now understand:
      ‚Ä¢ CASCADE propagates changes automatically
      ‚Ä¢ RESTRICT prevents accidental data loss
      ‚Ä¢ Proper database design principles

üöÄ Database now follows industry-standard referential integrity patterns!


## ÔøΩÔ∏è **Step 12: Insert Milieuschutz Environmental Protection Zones Data**

**Learning Point**: Inserting temporal-spatial policy data requires careful handling of dates, metadata, and spatial geometries.

**Milieuschutz Data Insertion Strategy**:
- **Policy Processing**: Insert all environmental protection zones with complete metadata
- **Temporal Data Handling**: Convert date strings to proper DATE format
- **Spatial Conversion**: Transform complex MultiPolygon geometries to PostGIS format
- **Transaction Safety**: Use rollback capability for error recovery
- **Data Validation**: Verify zone types, districts, and date consistency

**PostGIS Integration for Protection Zones**:
- `ST_GeomFromText()` - Converts complex MultiPolygon WKT to PostGIS geometry
- Maintains spatial reference system (SRID 4326) for global compatibility
- Preserves geometric precision for accurate urban planning analysis

**Urban Planning Database Concepts**:
- **Policy Lifecycle**: Track from announcement to effectiveness
- **Administrative Hierarchy**: Link zones to Berlin districts
- **Spatial Integrity**: Maintain precise protection zone boundaries

**Educational Value**: Demonstrates complete workflow from environmental policy GeoJSON to operational spatial database.

In [None]:
# ÔøΩÔ∏è **STEP 12: INSERT MILIEUSCHUTZ ENVIRONMENTAL PROTECTION ZONES**
# ================================================================
print("ÔøΩÔ∏è **STEP 12: INSERT MILIEUSCHUTZ ENVIRONMENTAL PROTECTION ZONES**")
print("=" * 70)

try:
    # Fix any transaction issues first
    print("1Ô∏è‚É£ Fixing transaction state...")
    conn.rollback()
    print("   ‚úÖ Transaction rolled back")
    
    # Clear existing milieuschutz data (if any)
    print("\n2Ô∏è‚É£ Clearing existing milieuschutz zones data...")
    conn.execute(text("DELETE FROM berlin_data.milieuschutz_zones;"))
    conn.commit()
    print("   ‚úÖ Milieuschutz zones table cleared")
    
    # Insert milieuschutz zones data with complete metadata
    print("\n3Ô∏è‚É£ Inserting environmental protection zones...")
    print(f"   üìä Processing {len(milieuschutz_gdf)} protection zones...")
    
    inserted_count = 0
    for idx, row in milieuschutz_gdf.iterrows():
        # Helper function to handle date conversion
        def convert_date(date_str):
            if date_str and str(date_str) != 'nan' and str(date_str) != 'None':
                return str(date_str)[:10]  # Extract YYYY-MM-DD part
            return None
        
        insert_sql = text("""
            INSERT INTO berlin_data.milieuschutz_zones 
            (protection_zone_id, protection_zone_key, protection_zone_name, 
             district, district_id, date_announced, date_effective, 
             amendment_announced, amendment_effective, area_ha, zone_type, geometry) 
            VALUES (:protection_zone_id, :protection_zone_key, :protection_zone_name,
                    :district, :district_id, :date_announced, :date_effective,
                    :amendment_announced, :amendment_effective, :area_ha, :zone_type,
                    ST_GeomFromText(:wkt, 4326))
        """)
        
        conn.execute(insert_sql, {
            'protection_zone_id': row['protection_zone_id'],
            'protection_zone_key': row['protection_zone_key'],
            'protection_zone_name': row['protection_zone_name'],
            'district': row['district'],
            'district_id': row['district_id'],
            'date_announced': convert_date(row['date_announced']),
            'date_effective': convert_date(row['date_effective']),
            'amendment_announced': convert_date(row['amendment_announced']) if 'amendment_announced' in row and row['amendment_announced'] else None,
            'amendment_effective': convert_date(row['amendment_effective']) if 'amendment_effective' in row and row['amendment_effective'] else None,
            'area_ha': float(row['area_ha']) if row['area_ha'] else None,
            'zone_type': row['zone_type'],
            'wkt': row['geometry'].wkt
        })
        inserted_count += 1
        
        if inserted_count % 10 == 0:
            print(f"   üìù Inserted {inserted_count} protection zones...")
    
    conn.commit()
    print(f"   ‚úÖ Successfully inserted {inserted_count} environmental protection zones!")
    
    # Verify the insertion with detailed analysis
    print("\n4Ô∏è‚É£ Verifying milieuschutz zones insertion...")
    
    # Count total zones
    count_result = conn.execute(text("SELECT COUNT(*) FROM berlin_data.milieuschutz_zones"))
    total_zones = count_result.fetchone()[0]
    print(f"   üìä Total protection zones: {total_zones}")
    
    # Analyze zone types
    type_result = conn.execute(text("""
        SELECT zone_type, COUNT(*) 
        FROM berlin_data.milieuschutz_zones 
        GROUP BY zone_type
    """))
    zone_types = type_result.fetchall()
    print(f"   üèõÔ∏è Zone types distribution:")
    for zone_type, count in zone_types:
        print(f"      ‚Ä¢ {zone_type}: {count} zones")
    
    # Analyze by district
    district_result = conn.execute(text("""
        SELECT district, COUNT(*) as zone_count, SUM(area_ha) as total_area
        FROM berlin_data.milieuschutz_zones 
        GROUP BY district 
        ORDER BY zone_count DESC 
        LIMIT 5
    """))
    districts = district_result.fetchall()
    print(f"   üó∫Ô∏è Top 5 districts by protection zones:")
    for district, count, area in districts:
        area_display = f"{area:.1f} ha" if area else "Unknown area"
        print(f"      ‚Ä¢ {district}: {count} zones ({area_display})")
    
    # Sample data verification
    sample_result = conn.execute(text("""
        SELECT protection_zone_key, protection_zone_name, district, date_effective
        FROM berlin_data.milieuschutz_zones 
        LIMIT 3
    """))
    samples = sample_result.fetchall()
    print(f"   ÔøΩ Sample zones verification:")
    for key, name, district, date_eff in samples:
        print(f"      ÔøΩÔ∏è {key}: {name} in {district} (effective: {date_eff})")
    
    print(f"\nüéâ **MILIEUSCHUTZ DATA READY! {total_zones} Environmental Protection Zones!**")
    print("=" * 70)
    print("‚úÖ Schema: berlin_data")
    print("‚úÖ Table: milieuschutz_zones") 
    print("‚úÖ Temporal data: Policy dates tracked")
    print("‚úÖ Spatial data: Working with PostGIS MultiPolygon geometry!")
    
except Exception as e:
    print(f"‚ùå Error inserting milieuschutz data: {e}")
    conn.rollback()
    print("üîÑ Transaction rolled back")
    import traceback
    print(f"üîç Details: {traceback.format_exc()}")

## ‚úÖ **Step 13: Verify Milieuschutz Data & Test Environmental Protection Functions**

**Learning Point**: Always verify your environmental data import was successful and spatial constraints work correctly.

**Verification Steps**:
1. **Count Records** - Ensure all protection zones were inserted successfully
2. **Test Zone Types** - Verify protection type classifications are correct
3. **Test Spatial Functions** - Confirm PostGIS geometry operations work with environmental zones
4. **Check Data Types** - Validate geometry types and coordinate systems
5. **District Analysis** - Confirm protection zone distribution across Berlin districts

**PostGIS Testing Functions**:
- `ST_GeometryType()` - returns the geometry type (e.g., ST_MultiPolygon)
- `ST_SRID()` - returns the Spatial Reference System ID (should be 4326)
- These functions prove our environmental protection spatial data is properly stored

**Environmental Data Validation**:
- Analyze protection zone types and their distribution
- Check temporal data (policy effective dates)
- Verify district-zone relationships
- Confirm data completeness for urban planning analysis

**Success Criteria**:
- ‚úÖ Record count matches expected protection zones
- ‚úÖ All zones have valid district references  
- ‚úÖ Spatial data uses correct coordinate system (SRID 4326)
- ‚úÖ No missing critical data (zone names, keys, districts)
- ‚úÖ PostGIS functions work correctly on environmental geometries

**Educational Value**: Demonstrates comprehensive environmental database validation for urban planning applications.

## ? **Mission Accomplished: Milieuschutz Environmental Protection Database Ready!**

**üéØ Educational Achievement**: Successfully implemented a complete environmental protection data pipeline from GeoJSON to operational spatial database!

---

### ‚úÖ **What We Accomplished:**
1. **Database Connection**: Successfully connected to AWS RDS PostgreSQL
2. **Schema Investigation**: Explored existing database structure and environmental data requirements
3. **PostGIS Verification**: Confirmed spatial extension availability for environmental geometries
4. **Milieuschutz GeoJSON Import**: Loaded Berlin environmental protection zones with comprehensive metadata
5. **Environmental Table Creation**: Created milieuschutz_zones table with specialized environmental protection schema
6. **Spatial Data Integration**: Imported complex protection zone geometries into PostGIS
7. **Temporal Data Handling**: Integrated policy effective dates and protection type classifications  
8. **Data Validation**: Confirmed successful import of all environmental protection zones

### üìö **Key Learning Points:**
- **Environmental Spatial Data**: Working with protection zones and policy boundaries
- **Temporal Database Design**: Capturing when environmental policies became effective
- **PostGIS Integration**: Converting complex environmental GeoJSON to PostGIS geometry with proper SRID
- **Environmental Data Schema**: Designing databases for urban planning and environmental protection
- **Policy Data Management**: Handling administrative and legal aspects of environmental data
- **Transaction Management**: Using rollback for error recovery during environmental data operations
- **Comprehensive Validation**: Always verify environmental data imports and spatial relationships!

### üåç **Environmental Database Concepts Mastered:**
- **Protection Zone Classification**: Different types of environmental protections (Milieuschutz, historic preservation)
- **Policy Lifecycle Tracking**: From policy announcement to implementation dates
- **Administrative Hierarchy**: Linking protection zones to Berlin districts for governance
- **Spatial Environmental Analysis**: Ready for proximity analysis, coverage studies, and policy impact assessment

### üöÄ **Next Steps in Environmental Data Science:**
- Add formal spatial indexing for performance optimization
- Implement environmental zone impact analysis queries
- Connect with building permits data for compliance monitoring
- Develop environmental policy effectiveness metrics

**üéØ Student Achievement**: You now understand how environmental protection data integrates with urban planning databases and can support policy analysis and urban development decisions!

---
**? Educational Value**: This notebook demonstrates the complete workflow from environmental policy GeoJSON files to operational spatial databases that support real-world urban planning and environmental protection decisions.

In [None]:
# Query the first 5 rows from berlin_data.districts_enhanced
result = conn.execute(text("SELECT * FROM berlin_data.milieuschutz_protection_zones LIMIT 5;"))
rows = result.fetchall()

# Display results as a pandas DataFrame for readability
import pandas as pd
df_preview = pd.DataFrame(rows, columns=result.keys())
df_preview


Unnamed: 0,district_id,district,neighborhood,geometry
0,1,Mitte,Mitte,0106000020E61000000100000001030000000100000006...
1,1,Mitte,Moabit,0106000020E61000000100000001030000000100000002...
2,1,Mitte,Hansaviertel,0106000020E61000000100000001030000000100000006...
3,1,Mitte,Tiergarten,0106000020E61000000100000001030000000100000055...
4,1,Mitte,Wedding,0106000020E6100000010000000103000000010000004E...
