# [Data Integration] Milieuschutz in Berlin #67

This issue outlines the process for integrating a new data layer on the topic of Milieuschutz (social conservation areas) in Berlin into the database. The work should be delivered in three PRs, one per major step.

# Project Overview & Current Status

This notebook is part of a data analysis project focused on Berlin's social conservation areas (Milieuschutzgebiete). The main goals are to collect, clean, and analyze geospatial and tabular data related to these protected areas, using a variety of open data sources and APIs.

**Key Data Sources:**
- Official Berlin city planning and Milieuschutz information
- OGC WMS/WFS geospatial services
- GENESIS-Online statistical API
- PDF and CSV data on protected areas and street lists

**Current Progress:**
- Data sources have been identified and documented (see readme.md)
- Geospatial data from WFS services has been downloaded and imported
- PDF street lists are being extracted and converted to CSV for analysis
- Data cleaning and merging steps are underway

**Next Steps:**
- Complete data cleaning and standardization
- Merge geospatial and tabular datasets
- Begin exploratory data analysis and visualization

_This cell provides a quick summary for colleagues to understand the project scope, data sources, and current status. For more details, see the readme.md or the following notebook cells._

## 🧪 Step 1: Research & Data Modelling


### 0.1 Importing essential Python libraries

The next three code cells import the Python libraries used for data analysis and geospatial processing:

- **First cell:** imports `numpy`, `pandas`, and `seaborn` for numerical operations, data manipulation, and visualization.
- **Second cell:** imports the database and SQL libraries (`psycopg2`, `sqlalchemy`) and suppresses all warnings.
- **Third cell:** imports geospatial and web-service libraries (`owslib`, `geopandas`, `shapely`, `requests`, `io`, `zipfile`, `bs4`, `tabula`, `jpype`) for handling geospatial data, HTTP requests, and PDF table extraction.

In [222]:
import numpy as np
import pandas as pd
import seaborn as sns

In [223]:
import psycopg2
from sqlalchemy import create_engine, text, inspect
import warnings
warnings.filterwarnings("ignore")

In [224]:
from owslib.wfs import WebFeatureService # For web feature services 
import geopandas as gpd # For geospatial data handling
from owslib.wms import WebMapService # For web map services
from owslib.fes import PropertyIsEqualTo, BBox # For filtering and bounding box queries
from owslib.util import ServiceException # For handling service exceptions
from owslib.wcs import WebCoverageService # For web coverage services 
import requests # For HTTP requests
from shapely.geometry import shape, mapping # For geometry operations 
from io import BytesIO # For handling byte streams 
from zipfile import ZipFile # For handling zip files 
from bs4 import BeautifulSoup # For parsing HTML/XML content
import tabula # For reading tables from PDFs
import jpype # Python-Java bridge; tabula-py needs a Java runtime

### 1.1 Data Source Discovery

#### Research Focus
- Conducted research on the topic **Milieuschutz** (social conservation areas) in Berlin.

**Relevant Data Sources Identified:**

1. **Berlin Erhaltungsverordnungsgebiete (WFS)**
    - **Source & Origin:** Official Berlin Geoportal, OGC WFS service  
      [https://gdi.berlin.de/services/wfs/erhaltungsverordnungsgebiete](https://gdi.berlin.de/services/wfs/erhaltungsverordnungsgebiete)
    - **Update Frequency:** Irregular, updated as new conservation areas are designated or changed
    - **Data Type:** Dynamic (API-based, supports ongoing updates)

2. **Milieuschutzgebiet Street Lists (PDF/CSV)**
    - **Source & Origin:** Berlin Senate Department for Urban Development, downloadable PDF  
      [https://www.berlin.de/sen/stadtentwicklung/wohnen/wohnraum/milieuschutz/](https://www.berlin.de/sen/stadtentwicklung/wohnen/wohnraum/milieuschutz/)
    - **Update Frequency:** Irregular, updated with new designations or changes
    - **Data Type:** Static (manual download and conversion to CSV)

3. **GENESIS-Online Statistical API**
    - **Source & Origin:** Statistisches Bundesamt (Federal Statistical Office), API access  
      [https://www-genesis.destatis.de/genesis/online](https://www-genesis.destatis.de/genesis/online)
    - **Update Frequency:** Regular (monthly/quarterly)
    - **Data Type:** Dynamic (API-based)

4. **Berlin Open Data Portal**
    - **Source & Origin:** [https://daten.berlin.de/](https://daten.berlin.de/)
    - **Update Frequency:** Varies by dataset
    - **Data Type:** Static and dynamic datasets available

**Documentation Approach:**

For each data source:
- **Source and origin:** Documented above
- **Update frequency:** As specified per source
- **Data type:** Noted as static or dynamic

_These sources collectively enable comprehensive integration and enrichment of listings and neighborhood datasets for Milieuschutz analysis in Berlin._
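The documentation fields listed above (source, update frequency, data type) could also be kept in code so they travel with the pipeline. A minimal sketch, assuming a hypothetical `DataSource` dataclass and `sources_by_type` helper that do not exist in the project; the values are copied from the list above, and `"mixed"` is an assumed label for the Open Data Portal's "static and dynamic" entry:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DataSource:
    """One documented data source (hypothetical structure for illustration)."""
    name: str
    origin_url: str
    update_frequency: str
    data_type: str  # "static", "dynamic", or "mixed" (assumed vocabulary)


SOURCES = [
    DataSource("Berlin Erhaltungsverordnungsgebiete (WFS)",
               "https://gdi.berlin.de/services/wfs/erhaltungsverordnungsgebiete",
               "irregular", "dynamic"),
    DataSource("Milieuschutzgebiet street lists (PDF/CSV)",
               "https://www.berlin.de/sen/stadtentwicklung/wohnen/wohnraum/milieuschutz/",
               "irregular", "static"),
    DataSource("GENESIS-Online statistical API",
               "https://www-genesis.destatis.de/genesis/online",
               "regular (monthly/quarterly)", "dynamic"),
    DataSource("Berlin Open Data Portal", "https://daten.berlin.de/",
               "varies by dataset", "mixed"),
]


def sources_by_type(sources, data_type):
    """Return the names of all sources with the given data type."""
    return [s.name for s in sources if s.data_type == data_type]


# Plain dicts are easy to dump to JSON or to pass to pd.DataFrame(records)
records = [asdict(s) for s in SOURCES]
print(sources_by_type(SOURCES, "dynamic"))
```

Keeping the registry in code means a source's update frequency can be checked when deciding whether a layer needs a scheduled refresh.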



#### ELT ➡ OGC WFS service to pandas DataFrame:


In [225]:
wfs_url = "https://gdi.berlin.de/services/wfs/erhaltungsverordnungsgebiete?service=WFS&version=2.0.0&request=GetCapabilities"
wfs = WebFeatureService(url=wfs_url, version='2.0.0')
print(list(wfs.contents))  # Lists the feature types (layers) offered by the service

['erhaltungsverordnungsgebiete:erhaltgeb_em', 'erhaltungsverordnungsgebiete:erhaltgeb_es']


In [226]:
response = wfs.getfeature(typename='erhaltungsverordnungsgebiete:erhaltgeb_em', outputFormat='application/json')
with open('erhaltgeb_em.geojson', 'wb') as f:
    f.write(response.read())

In [227]:
from owslib.wfs import WebFeatureService

wfs_url = "https://gdi.berlin.de/services/wfs/erhaltungsverordnungsgebiete?service=WFS&version=2.0.0&request=GetCapabilities"
wfs = WebFeatureService(url=wfs_url, version='2.0.0')

response = wfs.getfeature(
    typename='erhaltungsverordnungsgebiete:erhaltgeb_em',
    outputFormat='application/json'
)

with open('erhaltgeb_em.geojson', 'wb') as f:
    f.write(response.read())
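`wfs.getfeature(...)` hides the underlying HTTP request. The same GetFeature request can also be written as a plain key-value-pair (KVP) URL, which is what the later cell `In [257]` does with f-strings. A minimal sketch using only the standard library, assuming a hypothetical helper `getfeature_url`; `urlencode` percent-escapes characters such as `:` and `/`, which servers are expected to decode:

```python
from urllib.parse import parse_qs, urlencode, urlparse

WFS_BASE = "https://gdi.berlin.de/services/wfs/erhaltungsverordnungsgebiete"


def getfeature_url(base_url, type_name, output_format="application/json", version="2.0.0"):
    """Build a WFS GetFeature URL from KVP query parameters (hypothetical helper)."""
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0.0 spells this parameter typeNames
        "outputFormat": output_format,
    }
    # urlencode escapes reserved characters such as ':' and '/'
    return f"{base_url}?{urlencode(params)}"


em_request_url = getfeature_url(WFS_BASE, "erhaltungsverordnungsgebiete:erhaltgeb_em")

# Decoding the query string recovers the original parameter values
decoded = parse_qs(urlparse(em_request_url).query)
print(decoded["typeNames"])  # ['erhaltungsverordnungsgebiete:erhaltgeb_em']
```

The resulting URL could be passed to `gpd.read_file(...)` in the same way as the f-string URLs; the helper simply avoids hand-assembling query strings for each layer.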

In [228]:
import geopandas as gpd

# Read the GeoJSON file into a GeoDataFrame
gdf = gpd.read_file('erhaltgeb_em.geojson')

# If you want a regular pandas DataFrame (without geometry):
df_em = gdf.drop(columns='geometry')

print(df_em.head())

                    id schluessel  \
0  erhaltgeb_em.EM0105     EM0105   
1  erhaltgeb_em.EM0106     EM0106   
2  erhaltgeb_em.EM0107     EM0107   
3  erhaltgeb_em.EM0108     EM0108   
4  erhaltgeb_em.EM0109     EM0109   

                                            pdf_link bezirk   gebietsname  \
0  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte    Sparrplatz   
1  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte  Leopoldplatz   
2  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte    Waldstraße   
3  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte  Birkenstraße   
4  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte     Seestraße   

   f_gvbl_dat  f_in_kraft ae_gvbldat ae_inkraft fl_in_ha  
0  24.05.2016  25.05.2016        NaN        NaN     51.3  
1  24.05.2016  25.05.2016        NaN        NaN     62.1  
2  24.05.2016  25.05.2016        NaN        NaN     72.6  
3  24.05.2016  25.05.2016        NaN        NaN     81.6  
4  24.05.2016  25.0

In [229]:
df_em.shape

(81, 10)

In [230]:
df_em.columns

Index(['id', 'schluessel', 'pdf_link', 'bezirk', 'gebietsname', 'f_gvbl_dat',
       'f_in_kraft', 'ae_gvbldat', 'ae_inkraft', 'fl_in_ha'],
      dtype='object')

In [231]:
df_em['pdf_link'].iloc[0]

'https://www.berlin.de/sen/stadtentwicklung/_assets/quartiersentwicklung/stadterneuerung/soziales-erhaltungsrecht/gebiete/em0105.pdf'
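The PDF file name in this link (`em0105.pdf`) matches the area key in `schluessel` (`EM0105`) for the first row. A small consistency check could verify that pattern for every row. This is a sketch assuming the naming convention holds generally, which has only been observed for this one link; `pdf_key` and `link_matches_key` are hypothetical helpers:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def pdf_key(pdf_link):
    """Return the upper-cased file stem of a PDF URL, e.g. '.../em0105.pdf' -> 'EM0105'."""
    return PurePosixPath(urlparse(pdf_link).path).stem.upper()


def link_matches_key(pdf_link, schluessel):
    """True if the PDF file name corresponds to the area key."""
    return pdf_key(pdf_link) == schluessel


link = ("https://www.berlin.de/sen/stadtentwicklung/_assets/quartiersentwicklung/"
        "stadterneuerung/soziales-erhaltungsrecht/gebiete/em0105.pdf")
print(pdf_key(link), link_matches_key(link, "EM0105"))  # EM0105 True
```

Applied to the whole table, e.g. `df_em.apply(lambda r: link_matches_key(r["pdf_link"], r["schluessel"]), axis=1)`, any `False` would point to rows whose PDF link deserves a manual look. Rows without a link would need to be filtered out first.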

In [232]:
bezirk_unique = df_em['bezirk'].unique()
print(bezirk_unique)
bezirk_unique.shape # Number of unique districts is 11

['Mitte' 'Friedrichshain-Kreuzberg' 'Pankow' 'Charlottenburg-Wilmersdorf'
 'Spandau' 'Steglitz-Zehlendorf' 'Tempelhof-Schöneberg' 'Neukölln'
 'Treptow-Köpenick' 'Lichtenberg' 'Reinickendorf']


(11,)

In [233]:
# group gebietsname by bezirk
districts_areas = df_em.groupby('bezirk')['gebietsname'].unique().reset_index()
districts_areas.columns = ['bezirk', 'gebietsname_list']
print(districts_areas)


                        bezirk  \
0   Charlottenburg-Wilmersdorf   
1     Friedrichshain-Kreuzberg   
2                  Lichtenberg   
3                        Mitte   
4                     Neukölln   
5                       Pankow   
6                Reinickendorf   
7                      Spandau   
8          Steglitz-Zehlendorf   
9         Tempelhof-Schöneberg   
10            Treptow-Köpenick   

                                     gebietsname_list  
0   [Mierendorff-Insel, Gierkeplatz, Klausenerplat...  
1   [Graefestraße, Luisenstadt, Bergmannstraße Nor...  
2     [Kaskelstraße, Weitlingstraße, Fanningerstraße]  
3   [Sparrplatz, Leopoldplatz, Waldstraße, Birkens...  
4   [Schillerpromenade, Reuterplatz, Flughafenstra...  
5   [Falkplatz, Arnimplatz, Humannplatz, Ostseestr...  
6          [Letteplatz, Scharnweberstraße/Klixstraße]  
7                  [Wilhelmstadt, Spandauer Neustadt]  
8   [Feuerbachstraße, Gritznerstraße Nord, Mittels...  
9   [Barbarossaplatz / Bayerisc
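The list view above is truncated. A count of areas per district is often easier to scan. A minimal sketch on a few toy rows copied from the output above (in the notebook, `toy` would be replaced by `df_em`):

```python
import pandas as pd

# Toy subset for illustration only; the real data is df_em
toy = pd.DataFrame({
    "bezirk": ["Mitte", "Mitte", "Spandau", "Reinickendorf", "Reinickendorf"],
    "gebietsname": ["Sparrplatz", "Leopoldplatz", "Wilhelmstadt",
                    "Letteplatz", "Scharnweberstraße/Klixstraße"],
})

# Number of distinct area names per district, largest first
areas_per_district = (
    toy.groupby("bezirk")["gebietsname"]
    .nunique()
    .sort_values(ascending=False)
    .rename("n_areas")
)
print(areas_per_district)
```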

In [234]:
response = wfs.getfeature(typename='erhaltungsverordnungsgebiete:erhaltgeb_es', outputFormat='application/json')
# Write the ES layer to its own file so the EM GeoJSON is not overwritten
with open('erhaltgeb_es.geojson', 'wb') as f:
    f.write(response.read())

In [235]:
wfs_url = "https://gdi.berlin.de/services/wfs/erhaltungsverordnungsgebiete?service=WFS&version=2.0.0&request=GetCapabilities"
wfs = WebFeatureService(url=wfs_url, version='2.0.0')

response = wfs.getfeature(
    typename='erhaltungsverordnungsgebiete:erhaltgeb_es',
    outputFormat='application/json'
)

with open('erhaltgeb_es.geojson', 'wb') as f:
    f.write(response.read())

In [236]:
gdf = gpd.read_file('erhaltgeb_es.geojson')
df_es = gdf.drop(columns='geometry')
print(df_es.head())


                    id schluessel bezirk  \
0  erhaltgeb_es.ES0101     ES0101  Mitte   
1  erhaltgeb_es.ES0102     ES0102  Mitte   
2  erhaltgeb_es.ES0103     ES0103  Mitte   
3  erhaltgeb_es.ES0104     ES0104  Mitte   
4  erhaltgeb_es.ES0105     ES0105  Mitte   

                                         gebietsname  f_gvbl_dat  f_in_kraft  \
0                                        Poststadion  30.12.1988  31.12.1988   
1                                 Spandauer Vorstadt  25.06.1993  26.06.1993   
2  Südliche Brunnenstraße Teile der Rosenthaler V...  09.12.1995  10.12.1995   
3                            Friedrich-Wilhelm-Stadt  31.08.1996  01.09.1996   
4                     Dorotheenstadt, Friedrichstadt  10.04.1997  11.04.1997   

  ae_gvbldat ae_inkraft fl_in_ha  
0        NaN        NaN     53.2  
1        NaN        NaN    109.1  
2        NaN        NaN     18.8  
3        NaN        NaN     69.5  
4        NaN        NaN     98.7  


In [237]:
df_es.shape

(94, 9)

In [238]:
df_es.columns

Index(['id', 'schluessel', 'bezirk', 'gebietsname', 'f_gvbl_dat', 'f_in_kraft',
       'ae_gvbldat', 'ae_inkraft', 'fl_in_ha'],
      dtype='object')

In [239]:
df_em.head()

|   | id | schluessel | pdf_link | bezirk | gebietsname | f_gvbl_dat | f_in_kraft | ae_gvbldat | ae_inkraft | fl_in_ha |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | erhaltgeb_em.EM0105 | EM0105 | https://www.berlin.de/sen/stadtentwicklung/_as... | Mitte | Sparrplatz | 24.05.2016 | 25.05.2016 | NaN | NaN | 51.3 |
| 1 | erhaltgeb_em.EM0106 | EM0106 | https://www.berlin.de/sen/stadtentwicklung/_as... | Mitte | Leopoldplatz | 24.05.2016 | 25.05.2016 | NaN | NaN | 62.1 |
| 2 | erhaltgeb_em.EM0107 | EM0107 | https://www.berlin.de/sen/stadtentwicklung/_as... | Mitte | Waldstraße | 24.05.2016 | 25.05.2016 | NaN | NaN | 72.6 |
| 3 | erhaltgeb_em.EM0108 | EM0108 | https://www.berlin.de/sen/stadtentwicklung/_as... | Mitte | Birkenstraße | 24.05.2016 | 25.05.2016 | NaN | NaN | 81.6 |
| 4 | erhaltgeb_em.EM0109 | EM0109 | https://www.berlin.de/sen/stadtentwicklung/_as... | Mitte | Seestraße | 24.05.2016 | 25.05.2016 | NaN | NaN | 48.6 |


In [240]:
df_em['gebietsname'].unique()  # Not displayed: a cell only shows its last expression
df_es['gebietsname'].shape  # Number of rows in the ES layer (94); counts rows, not unique names

(94,)
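Because `.shape` on a column counts rows, it only equals the number of distinct area names if no name repeats. A minimal illustration on toy values, showing the difference between row count and distinct count:

```python
import pandas as pd

names = pd.Series(["Luisenstadt", "Luisenstadt", "Graefestraße"])  # toy data

n_rows = names.shape[0]             # counts every row, duplicates included
n_unique = names.nunique()          # counts distinct values, NaN excluded by default
n_unique_alt = len(names.unique())  # also distinct values, but unique() keeps NaN
print(n_rows, n_unique, n_unique_alt)  # 3 2 2
```

For the real layer, `df_es['gebietsname'].nunique()` would give the number of distinct area names.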

In [257]:
# Download both available layers from the WFS endpoint as GeoDataFrames
wfs_url = "https://gdi.berlin.de/services/wfs/erhaltungsverordnungsgebiete"
layers = [
    "erhaltungsverordnungsgebiete:erhaltgeb_em",  # Erhaltung der Zusammensetzung der Wohnbevölkerung
    "erhaltungsverordnungsgebiete:erhaltgeb_es",  # Erhaltung der städtebaulichen Eigenart
]

# Download as GeoJSON
em_url = f"{wfs_url}?service=WFS&version=2.0.0&request=GetFeature&typeNames={layers[0]}&outputFormat=application/json"
es_url = f"{wfs_url}?service=WFS&version=2.0.0&request=GetFeature&typeNames={layers[1]}&outputFormat=application/json"

gdf_em = gpd.read_file(em_url)
gdf_es = gpd.read_file(es_url)

print(f"Downloaded {len(gdf_em)} features from erhaltgeb_em")
print(f"Downloaded {len(gdf_es)} features from erhaltgeb_es")
gdf_em.head(), gdf_es.head()

Downloaded 81 features from erhaltgeb_em
Downloaded 94 features from erhaltgeb_es


(                    id schluessel  \
 0  erhaltgeb_em.EM0105     EM0105   
 1  erhaltgeb_em.EM0106     EM0106   
 2  erhaltgeb_em.EM0107     EM0107   
 3  erhaltgeb_em.EM0108     EM0108   
 4  erhaltgeb_em.EM0109     EM0109   
 
                                             pdf_link bezirk   gebietsname  \
 0  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte    Sparrplatz   
 1  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte  Leopoldplatz   
 2  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte    Waldstraße   
 3  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte  Birkenstraße   
 4  https://www.berlin.de/sen/stadtentwicklung/_as...  Mitte     Seestraße   
 
    f_gvbl_dat  f_in_kraft ae_gvbldat ae_inkraft fl_in_ha  \
 0  24.05.2016  25.05.2016        NaN        NaN     51.3   
 1  24.05.2016  25.05.2016        NaN        NaN     62.1   
 2  24.05.2016  25.05.2016        NaN        NaN     72.6   
 3  24.05.2016  25.05.2016        NaN        NaN     81.

- gdf_em.head() displays the first few rows of the GeoDataFrame for the layer "erhaltgeb_em" (Erhaltung der Zusammensetzung der Wohnbevölkerung – social preservation/milieuschutz).
- gdf_es.head() displays the first few rows of the GeoDataFrame for the layer "erhaltgeb_es" (Erhaltung der städtebaulichen Eigenart – preservation of urban character).

Both are tables with geospatial features, but each comes from a different legal basis and may cover different areas or have different attributes.
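Whether the two layers really differ in attributes and coverage can be checked directly. A minimal sketch with a hypothetical `compare_layers` helper, run here on toy frames standing in for `df_em` and `df_es`. Based on the column listings above, the real comparison should report `pdf_link` as the only column unique to the EM layer:

```python
import pandas as pd


def compare_layers(a, b, key="bezirk"):
    """Summarise column and district differences between two attribute tables."""
    a_keys, b_keys = set(a[key].dropna()), set(b[key].dropna())
    return {
        "columns_only_in_a": sorted(set(a.columns) - set(b.columns)),
        "columns_only_in_b": sorted(set(b.columns) - set(a.columns)),
        "districts_only_in_a": sorted(a_keys - b_keys),
        "districts_only_in_b": sorted(b_keys - a_keys),
    }


# Toy frames for illustration only (the real ones have more rows and columns)
toy_em = pd.DataFrame({"schluessel": ["EM0105"], "pdf_link": ["em0105.pdf"], "bezirk": ["Mitte"]})
toy_es = pd.DataFrame({"schluessel": ["ES0101", "ES0102"], "bezirk": ["Mitte", "Pankow"]})

summary = compare_layers(toy_em, toy_es)
print(summary)
```

With the real data, the call would be `compare_layers(df_em, df_es)`.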

#### ELT ➡ .pdf to .csv:
In the next cells, the workflow focuses on extracting, loading, and exploring tabular data about Berlin's social conservation areas (Milieuschutzgebiete) from a PDF street list:

- Reads the previously converted CSV file (from the PDF street list) into a DataFrame called `milieuschutzgebiete_df` and displays its first few rows for a quick check.
- Prints detailed information about the DataFrame, including column types, non-null counts, and memory usage, to assess data quality and completeness.
- Outputs descriptive statistics for the DataFrame, providing insights into the distribution and summary of the data.

These steps are essential for verifying the successful extraction of street-level data, understanding its structure, and preparing for further cleaning, merging, or analysis with the geospatial datasets loaded earlier in the notebook.
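One verification step worth adding: the street list's `describe()` output in section 1.2 shows the header text `Straße und Hausnummern` as the most frequent value (30 occurrences) and `Bezirk` as a district value. This suggests the PDF header line was extracted once per page as a data row. A minimal sketch of removing such rows, assuming a hypothetical `drop_repeated_headers` helper and a toy CSV built from the list's first rows. It assumes no genuine street, district, or area value equals a column name:

```python
import io

import pandas as pd


def drop_repeated_headers(df):
    """Drop rows in which any cell equals its own column name.

    Tables extracted page by page from a multi-page PDF often repeat the
    header line on every page, and those lines end up as data rows.
    """
    mask = pd.Series(False, index=df.index)
    for col in df.columns:
        mask |= df[col].astype(str).str.strip().eq(col)
    return df.loc[~mask].reset_index(drop=True)


# Toy CSV mimicking the extracted street list
csv_text = """Straße und Hausnummern,Bezirk,Gebiet
Aachener Str. 17-26,Charl.-Wilm.,Brabanter Platz
Straße und Hausnummern,Bezirk,Gebiet
Achenbachstraße: alle Nummern,Spandau,Spandauer Neustadt
"""
raw = pd.read_csv(io.StringIO(csv_text))
clean = drop_repeated_headers(raw)
print(len(raw), "->", len(clean))  # 3 -> 2
```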

In [241]:
# Which version of tabula-py is installed?
print(tabula.__version__)

2.10.0


In [242]:
import jpype
print(jpype.getDefaultJVMPath())


/Library/Java/JavaVirtualMachines/temurin-24.jdk/Contents/Home/lib/libjli.dylib


In [243]:
import jpype

# tabula-py needs a Java runtime. If it fails with an error saying that the
# Java Virtual Machine (JVM) could not be found, check whether Java is
# installed and whether JPype can locate the JVM.


try:
    jvm_path = jpype.getDefaultJVMPath()
    print("JVM found at:", jvm_path)
except jpype.JVMNotFoundException as e:
    print("JVM not found. Please ensure Java is installed and JAVA_HOME is set correctly.")
    print(e)

JVM found at: /Library/Java/JavaVirtualMachines/temurin-24.jdk/Contents/Home/lib/libjli.dylib
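JPype finds the JVM library above. A complementary pre-flight check is whether a working `java` executable is on `PATH`. A minimal sketch using only the standard library; `java_version` is a hypothetical helper, and treating a non-zero exit code as "unavailable" is an assumption meant to catch launcher stubs that exist without an installed JDK:

```python
import shutil
import subprocess


def java_version():
    """Return the first line of `java -version`, or None if Java is unavailable."""
    java = shutil.which("java")
    if java is None:
        return None
    try:
        result = subprocess.run([java, "-version"], capture_output=True,
                                text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # e.g. a launcher stub without an installed JDK
    # `java -version` prints its banner to stderr rather than stdout
    lines = (result.stderr or result.stdout).strip().splitlines()
    return lines[0] if lines else None


print(java_version() or "No working java executable found on PATH")
```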


In [244]:
# Check if JPype is installed correctly
print(jpype.__version__)

1.5.2


In [245]:
# Convert the PDF street list to CSV with tabula.
# Kept disabled after the initial conversion; uncomment to regenerate the CSV.
# tabula.convert_into(
#     "../sources/.pdf/strassenliste-milieuschutzgebiete-berlin.pdf",
#     "strassenliste-milieuschutzgebiete-berlin.csv",
#     output_format="csv",
#     pages="all",
# )

### 1.2 Modelling & Planning


#### Select and document key parameters/columns from raw data relevant to the use case:

Following the guidance to use `neighborhoods` as the primary unifying column, the street-list DataFrame renames its `Bezirk` column to `neighborhoods` (see the rename cell below). The WFS-derived DataFrames `df_em` and `df_es` still use `bezirk` at this point. A single column name across datasets is more intuitive for users, since most people identify locations by neighborhood rather than by administrative district. It also makes joining the datasets easier.
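One way to apply the naming convention to every DataFrame, not just the street list, is a single shared rename map. A minimal sketch, assuming a hypothetical `DISTRICT_COLUMN_ALIASES` mapping and `unify_district_column` helper, shown on toy frames. `DataFrame.rename` ignores keys that are not present by default, so the same map works for both column spellings:

```python
import pandas as pd

# Hypothetical shared mapping: each source's district column -> "neighborhoods"
DISTRICT_COLUMN_ALIASES = {"Bezirk": "neighborhoods", "bezirk": "neighborhoods"}


def unify_district_column(df):
    """Rename whichever district column a DataFrame has to `neighborhoods`."""
    return df.rename(columns=DISTRICT_COLUMN_ALIASES)


toy_street_list = pd.DataFrame({"Bezirk": ["Spandau"], "Gebiet": ["Spandauer Neustadt"]})
toy_wfs = pd.DataFrame({"bezirk": ["Mitte"], "gebietsname": ["Sparrplatz"]})

frames = [unify_district_column(df) for df in (toy_street_list, toy_wfs)]
print([list(f.columns) for f in frames])
```

In the notebook the same helper could be applied to `df_em`, `df_es`, and `milieuschutzgebiete_df`.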


In [246]:
# Read the CSV file into a DataFrame
milieuschutzgebiete_df = pd.read_csv(
    "strassenliste-milieuschutzgebiete-berlin.csv")
# Display the first few rows of the DataFrame
print(milieuschutzgebiete_df.head())

                              Straße und Hausnummern        Bezirk  \
0                                Aachener Str. 17-26  Charl.-Wilm.   
1                       Aachener Straße alle Nummern  Charl.-Wilm.   
2  Aalesunder Straße: 1-12 fortlaufend, alle Nummern        Pankow   
3                      Achenbachstraße: alle Nummern       Spandau   
4                           Ackerstraße: 2-19, 25-41       Spandau   

               Gebiet  
0     Brabanter Platz  
1     Brabanter Platz  
2          Arnimplatz  
3  Spandauer Neustadt  
4  Spandauer Neustadt  


In [247]:
milieuschutzgebiete_df.info()  # info() prints the summary itself and returns None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1907 entries, 0 to 1906
Data columns (total 3 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Straße und Hausnummern  1907 non-null   object
 1   Bezirk                  1905 non-null   object
 2   Gebiet                  1905 non-null   object
dtypes: object(3)
memory usage: 44.8+ KB


In [248]:
print(milieuschutzgebiete_df.describe())

        Straße und Hausnummern       Bezirk       Gebiet
count                     1907         1905         1905
unique                    1876           25           97
top     Straße und Hausnummern  Fhain-Krzb.  Luisenstadt
freq                        30          309           57


In [249]:
# copy the DataFrame to a new variable
social_preservation_areas_df = milieuschutzgebiete_df.copy()

# columns of the DataFrame
print(social_preservation_areas_df.columns) 

# Rename the German columns 'Straße und Hausnummern', 'Bezirk', 'Gebiet' to English column names
social_preservation_areas_df = social_preservation_areas_df.rename(
    columns={
        'Straße und Hausnummern': 'street_and_house_nr',
        'Bezirk': 'neighborhoods',
        'Gebiet': 'area'
    }
)

print(social_preservation_areas_df.columns)


Index(['Straße und Hausnummern', 'Bezirk', 'Gebiet'], dtype='object')
Index(['street_and_house_nr', 'neighborhoods', 'area'], dtype='object')


In [250]:
social_preservation_areas_df.head()

|   | street_and_house_nr | neighborhoods | area |
|---|---|---|---|
| 0 | Aachener Str. 17-26 | Charl.-Wilm. | Brabanter Platz |
| 1 | Aachener Straße alle Nummern | Charl.-Wilm. | Brabanter Platz |
| 2 | Aalesunder Straße: 1-12 fortlaufend, alle Nummern | Pankow | Arnimplatz |
| 3 | Achenbachstraße: alle Nummern | Spandau | Spandauer Neustadt |
| 4 | Ackerstraße: 2-19, 25-41 | Spandau | Spandauer Neustadt |


In [251]:
bezirk_unique

array(['Mitte', 'Friedrichshain-Kreuzberg', 'Pankow',
       'Charlottenburg-Wilmersdorf', 'Spandau', 'Steglitz-Zehlendorf',
       'Tempelhof-Schöneberg', 'Neukölln', 'Treptow-Köpenick',
       'Lichtenberg', 'Reinickendorf'], dtype=object)

In [252]:
neighborhoods_unique = social_preservation_areas_df['neighborhoods'].unique()
print(neighborhoods_unique) 

['Charl.-Wilm.' 'Pankow' 'Spandau' 'Fhain-Krzb.' 'Mitte' 'Stg-Zehlend'
 'Reinickendorf' nan 'Thf-Schönb.' 'Bezirk' 'Lichtenberg' 'Neukölln'
 'Schöneberg' 'Treptow' 'Prenzl Berg' 'Trept.-Köp.' 'Charl.- Wilmersd.'
 'Weissensee' 'Tempelholf-\rSchöneberg' 'Neuköln' 'Charl.-Wilm..'
 'Friedrichs.-Krzb.' 'Friedrichshain' 'Prenzl. Berg' 'Pankow\r(Weißensee)'
 'Prenzl-Berg']


In [253]:
# export the DataFrame to a CSV file
#social_preservation_areas_df.to_csv("social_preservation_areas.csv", index=False) 


### Downloading and Inspecting Milieuschutz GeoDataFrames

The WFS download code (cell `In [257]`) retrieves both layers of the Berlin WFS endpoint as GeoDataFrames using GeoPandas:

- **Layer 1:** `erhaltgeb_em` (Erhaltung der Zusammensetzung der Wohnbevölkerung)  
    Focuses on preserving the social composition of the resident population within the protected areas.

- **Layer 2:** `erhaltgeb_es` (Erhaltung der städtebaulichen Eigenart)  
    Focuses on preserving the distinctive urban character of neighborhoods.

Each layer is requested through a GetFeature URL with `outputFormat=application/json` and loaded into its own GeoDataFrame (`gdf_em` and `gdf_es`). The code prints the number of features (areas) downloaded per layer (81 and 94) and displays the first rows for a quick inspection.

This step makes both geospatial datasets available for further analysis, comparison, and integration with the tabular data sources in the project.

## Data Transformation Plan

**1. Key Parameters/Columns:**
- Select columns such as street name, district (Bezirk), area name (Gebietsname), and any unique identifiers relevant to the use case.

**2. Data Connections:**
- Link data to existing tables using coordinates, district names, or neighborhood codes to enable spatial joins and enrich analysis.

**3. Planned Schema:**
- Draft a new table schema with fields: `id`, `street_name`, `district`, `area_name`, `coordinates`, and any additional attributes needed for the project.

**4. Known Data Issues:**
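- Inconsistent naming conventions: the street list contains 25 distinct non-null district values, versus 11 districts in the WFS layers. These include abbreviations (`Charl.-Wilm.`, `Fhain-Krzb.`), former district names (`Prenzl Berg`, `Weissensee`), typos (`Neuköln`), and embedded line breaks (`Tempelholf-\rSchöneberg`)
- Page headers extracted as data rows (`Straße und Hausnummern` appears 30 times as a street value; `Bezirk` appears as a district value)
- Missing values (2 each in `Bezirk` and `Gebiet`) and missing coordinate data (the street list itself has no coordinate columns)
- Duplicate entries or overlapping areas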

**5. Transformation Steps:**
- Clean and standardize street and district names
- Normalize area names and codes
- Remove duplicates and handle missing values
- Structure the data to match the planned schema
- Validate connections to existing tables (spatial and tabular)
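The first transformation step, standardizing district names, can be sketched as a lookup from the raw spellings printed for `neighborhoods` above to the 11 names used by the WFS layers (`bezirk_unique`). This is a minimal sketch under stated assumptions: `DISTRICT_ALIASES`, `clean_whitespace`, and `normalize_district` are hypothetical. Mapping former districts (Prenzlauer Berg, Weißensee, Friedrichshain, Schöneberg, Treptow) to the current districts that contain them is a modelling choice, not something the source data states:

```python
import re

# The 11 district names used by the WFS layers (bezirk_unique)
CANONICAL_DISTRICTS = {
    "Mitte", "Friedrichshain-Kreuzberg", "Pankow", "Charlottenburg-Wilmersdorf",
    "Spandau", "Steglitz-Zehlendorf", "Tempelhof-Schöneberg", "Neukölln",
    "Treptow-Köpenick", "Lichtenberg", "Reinickendorf",
}

# Hypothetical mapping from cleaned PDF spellings to canonical names
DISTRICT_ALIASES = {
    "Charl.-Wilm.": "Charlottenburg-Wilmersdorf",
    "Charl.-Wilm..": "Charlottenburg-Wilmersdorf",
    "Charl.-Wilmersd.": "Charlottenburg-Wilmersdorf",
    "Fhain-Krzb.": "Friedrichshain-Kreuzberg",
    "Friedrichs.-Krzb.": "Friedrichshain-Kreuzberg",
    "Friedrichshain": "Friedrichshain-Kreuzberg",
    "Stg-Zehlend": "Steglitz-Zehlendorf",
    "Thf-Schönb.": "Tempelhof-Schöneberg",
    "Tempelholf-Schöneberg": "Tempelhof-Schöneberg",
    "Schöneberg": "Tempelhof-Schöneberg",
    "Trept.-Köp.": "Treptow-Köpenick",
    "Treptow": "Treptow-Köpenick",
    "Neuköln": "Neukölln",
    "Prenzl Berg": "Pankow",
    "Prenzl. Berg": "Pankow",
    "Prenzl-Berg": "Pankow",
    "Weissensee": "Pankow",
    "Pankow (Weißensee)": "Pankow",
}


def clean_whitespace(value):
    """Turn line breaks into spaces, collapse runs, and drop spaces after hyphens."""
    value = re.sub(r"\s+", " ", value).strip()
    return re.sub(r"-\s+", "-", value)


def normalize_district(value):
    """Map a raw spelling to its canonical district; None if missing, header, or unknown."""
    if not isinstance(value, str):
        return None  # NaN from the PDF extraction
    cleaned = clean_whitespace(value)
    if cleaned in CANONICAL_DISTRICTS:
        return cleaned
    return DISTRICT_ALIASES.get(cleaned)  # e.g. the header text "Bezirk" -> None


print(normalize_district("Tempelholf-\rSchöneberg"))  # Tempelhof-Schöneberg
```

Usage on the renamed street list could look like `social_preservation_areas_df["neighborhoods"].map(normalize_district)`. Rows that come back as `None` (missing values, header rows, or spellings not yet in the mapping) can be inspected with `.isna()` before they are dropped.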

_This plan will guide the cleaning, normalization, and integration of the latest dataframe into the project database for further analysis._
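The planned schema could be drafted as DDL before any data is loaded. The sketch below uses an in-memory SQLite database purely as a self-contained stand-in for the project database; the `psycopg2`/`sqlalchemy` imports suggest PostgreSQL is the real target. The table name `milieuschutz_streets`, the `UNIQUE` constraint used to reject duplicate entries, and storing `coordinates` as WKT text are all assumptions. Column names follow the draft schema:

```python
import sqlite3

# Hypothetical DDL for the planned table; SQLite stands in for the real database
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS milieuschutz_streets (
    id          INTEGER PRIMARY KEY,
    street_name TEXT NOT NULL,
    district    TEXT NOT NULL,
    area_name   TEXT NOT NULL,
    coordinates TEXT,  -- WKT text, e.g. 'POINT(x y)'; NULL until coordinates are linked
    UNIQUE (street_name, district, area_name)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_SQL)

rows = [
    ("Aachener Str. 17-26", "Charlottenburg-Wilmersdorf", "Brabanter Platz", None),
    ("Achenbachstraße: alle Nummern", "Spandau", "Spandauer Neustadt", None),
    ("Achenbachstraße: alle Nummern", "Spandau", "Spandauer Neustadt", None),  # duplicate
]
# INSERT OR IGNORE skips rows that would violate the UNIQUE constraint
conn.executemany(
    "INSERT OR IGNORE INTO milieuschutz_streets "
    "(street_name, district, area_name, coordinates) VALUES (?, ?, ?, ?)",
    rows,
)
count = conn.execute("SELECT COUNT(*) FROM milieuschutz_streets").fetchone()[0]
print(count)  # 2
```

In PostgreSQL the same idea would use `ON CONFLICT DO NOTHING`, and a spatial column type could replace the WKT text if a spatial extension is available. Whether the project uses one is not stated here.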