---
title: Survey of existing GeoJSON files for Germany at the municipality level
date: now
author: Jan Cap
---

We found one source of GeoJSON files for Germany on opendatalab.de: https://opendatalab.de/projects/geojson-utilities/#
Let's load the state-level boundaries first, then the municipality-level boundaries.

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt

## State level boundaries load attempt

In [None]:
# Load the German state (Bundesländer) boundaries from a local GeoJSON file

try:
    # Load GeoJSON data using geopandas
    gdf_state = gpd.read_file("../data/bundeslaender_simplify200.geojson")

    print(f"Loaded GeoDataFrame with {len(gdf_state)} rows")
    print(f"Columns: {list(gdf_state.columns)}")
    print(f"CRS: {gdf_state.crs}")

    # Display the full table (one row per state)
    display(gdf_state)

except Exception as e:
    print(f"Error loading GeoJSON file: {e}")
    print("Let's try a different approach...")

Data columns explanation:
- RS (Regional key): length depends on the administrative level; 2 digits at state level
- GEN (Geographical name): official name of the administrative unit
- BEZ (Official designation): official designation of the administrative unit like "Stadt", "Landkreis", etc.
- Destatis (Destatis data): includes area in square meters and population numbers
- SDV_RS: meaning not yet determined (12-digit)
- RS_0: meaning not yet determined (12-digit)

In [None]:
# Visualize the GeoJSON data
if "gdf_state" in locals() and not gdf_state.empty:
    # Create a simple plot
    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    gdf_state.plot(ax=ax, color="lightblue", edgecolor="black", linewidth=0.5)
    ax.set_title("German Federal States (Bundesländer)")
    ax.set_axis_off()
    plt.tight_layout()
    plt.show()

    # Show some basic statistics
    print("\nGeoDataFrame Info:")
    print(f"Shape: {gdf_state.shape}")
    print(f"Geometry type: {gdf_state.geometry.geom_type.unique()}")
    print(f"Bounds: {gdf_state.total_bounds}")
else:
    print("No geodata loaded to visualize")

## Municipality boundaries load attempt

In [None]:
# Load the German municipality (Gemeinden) boundaries from a local GeoJSON file

try:
    # Load GeoJSON data using geopandas
    gdf_mun = gpd.read_file("../data/gemeinden_simplify200.geojson")

    print(f"Loaded GeoDataFrame with {len(gdf_mun)} rows")
    print(f"Columns: {list(gdf_mun.columns)}")
    print(f"CRS: {gdf_mun.crs}")

    # Display first few rows
    print("\nFirst 3 rows:")
    display(gdf_mun.head(3))

except Exception as e:
    print(f"Error loading GeoJSON file: {e}")
    print("Let's try a different approach...")

In [None]:
print("Rows with different RS and RS_0:", len(gdf_mun[gdf_mun["RS"] != gdf_mun["RS_0"]]))
print("Rows with different RS and SDV_RS:", len(gdf_mun[gdf_mun["RS"] != gdf_mun["SDV_RS"]]))
print("Rows with different AGS and AGS_0:", len(gdf_mun[gdf_mun["AGS"] != gdf_mun["AGS_0"]]))

RS_0 and SDV_RS are identical to RS, and AGS_0 is identical to AGS, so we can drop them.

In [None]:
gdf_mun = gdf_mun.drop(columns=["RS_0", "SDV_RS", "AGS_0"])

Data columns explanation:
- RS (Regional key): 12 digits, composed of 2 for Land, 1 for Regierungsbezirk, 2 for Kreis, 4 for Verwaltungsgemeinschaft, and 3 for Gemeinde
- AGS (Official municipality key): 8-digit official municipality key
- GEN (Geographical name): official name of the administrative unit
- BEZ (Official designation): official designation of the administrative unit like "Stadt", "Landkreis", etc.
- Destatis (Destatis data): includes area in square meters and population numbers
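
The RS layout described in the list can be sketched as a small helper that slices a key into its 2+1+2+4+3 parts. The names `RegionalKey` and `split_rs` and the sample key are hypothetical and used only for illustration; they are not part of the dataset or of `geoscore_de`.

```python
from typing import NamedTuple


class RegionalKey(NamedTuple):
    land: str              # 2 digits
    regierungsbezirk: str  # 1 digit
    kreis: str             # 2 digits
    verband: str           # 4 digits (Verwaltungsgemeinschaft)
    gemeinde: str          # 3 digits


def split_rs(rs: str) -> RegionalKey:
    """Split a 12-digit RS string into its 2+1+2+4+3 components."""
    if len(rs) != 12 or not rs.isdigit():
        raise ValueError(f"expected a 12-digit key, got {rs!r}")
    return RegionalKey(rs[0:2], rs[2:3], rs[3:5], rs[5:9], rs[9:12])


# Hypothetical key, chosen only to make the slices easy to see.
parts = split_rs("010020003004")
print(parts)
```

Keeping the key as a string matters here, because leading zeros are part of the code.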

In [None]:
# Inspect the entries whose AGS starts with "0200" (state code 02 is Hamburg)
gdf_mun[gdf_mun["AGS"].str.startswith("0200")]

In [None]:
# Visualize the GeoJSON data
# Create a simple plot
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
gdf_mun.plot(ax=ax, color="lightblue", edgecolor="black", linewidth=0.5)
ax.set_title("German Municipalities (Gemeinden)")
ax.set_axis_off()
plt.tight_layout()
plt.show()

# Show some basic statistics
print("\nGeoDataFrame Info:")
print(f"Shape: {gdf_mun.shape}")
print(f"Geometry type: {gdf_mun.geometry.geom_type.unique()}")
print(f"Bounds: {gdf_mun.total_bounds}")

## Link to municipality data

### Using RS (Regional key) for mapping

In [None]:
from geoscore_de.data_flow.municipality import load_municipality_data

df_muni = load_municipality_data("../data/raw/municipalities_2022.csv")
df_muni.head()

In [None]:
df_merged = gdf_mun.merge(df_muni, left_on="RS", right_on="MU_ID", how="outer", indicator=True)

In [None]:
df_merged.drop_duplicates(subset=["RS", "MU_ID"])["_merge"].value_counts()

Many municipalities remain unmapped. This is probably because of the Verbandsgemeinde level in the RS key. The data also contains the AGS key, which has no Verbandsgemeinde level, so let's try mapping on AGS instead of RS.
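
One way to test this hypothesis is to remove the four Verband digits from both keys and compare what is left. The sketch below assumes that both `RS` and `MU_ID` are 12-digit strings, which the notebook has not verified for `MU_ID`. The helper `rs_to_ags` and the sample keys are hypothetical.

```python
import pandas as pd


def rs_to_ags(rs: pd.Series) -> pd.Series:
    """Drop the 4-digit Verband block (string positions 5-8) from 12-digit keys."""
    return rs.str[:5] + rs.str[9:]


# Hypothetical keys: the same municipality encoded with different Verband digits.
geo = pd.DataFrame({"RS": ["010020003004"]})
csv = pd.DataFrame({"MU_ID": ["010020000004"]})

full_match = (geo["RS"] == csv["MU_ID"]).all()
reduced_match = (rs_to_ags(geo["RS"]) == rs_to_ags(csv["MU_ID"])).all()
print(full_match, reduced_match)
```

If the reduced keys match where the full keys do not, the Verband digits are the likely cause of the failed join.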

### Using AGS mapping

In [None]:
df_merged = gdf_mun.merge(df_muni, on="AGS", how="outer", indicator=True)
df_merged.columns

In [None]:
df_merged.drop_duplicates(subset=["AGS"])["_merge"].value_counts()

The counts are much better now. There are still 472 municipalities that appear only in the GeoJSON data.
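
To see where the unmatched rows are concentrated, one option is to group them by the first two AGS digits, which identify the state. This is a minimal sketch on a hypothetical frame `toy` that mimics the `AGS` and `_merge` columns of the merge result. It is not output from the real data.

```python
import pandas as pd

# Hypothetical merge result; only the AGS and _merge columns matter here.
toy = pd.DataFrame(
    {
        "AGS": ["01001000", "01002000", "03101000", "03102000"],
        "_merge": ["both", "left_only", "left_only", "left_only"],
    }
)

# Keep rows found only on the left (GeoJSON) side and count them per state prefix.
left_only = toy[toy["_merge"] == "left_only"]
per_state = left_only["AGS"].str[:2].value_counts().sort_index()
print(per_state)
```

Applied to `df_merged`, the same grouping would show whether the unmatched municipalities cluster in a few states or are spread evenly.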

## Municipalities only in the GeoJSON data

In [None]:
df_merged[df_merged["_merge"] == "left_only"][["RS", "AGS", "GEN", "BEZ"]]

## Municipalities only in the municipality data

In [None]:
df_merged[df_merged["_merge"] == "right_only"][
    ["AGS", "Municipality", "Persons", "Area", "Population density", "_merge"]
].sort_values("AGS")