---
title: Research on the 2021 federal election
date: now
author: Jan Cap
---

Results of the 2021 federal election in Germany are available at:
- https://www.bundeswahlleiterin.de/en/bundestagswahlen/2021/ergebnisse/weitere-ergebnisse.html
- granularity: voting district level

In [None]:
import pandas as pd

dtype_mappings = {
    "Land": str,
    "Regierungsbezirk": str,
    "Kreis": str,
    "Verbandsgemeinde": str,
    "Gemeinde": str,
}

df_elec = pd.read_csv("../../data/raw/features/election_2021/btw21_wbz_ergebnisse.csv", sep=";", dtype=dtype_mappings)
df_elec = df_elec.dropna()

# Build the municipality code as a string from the Land, Regierungsbezirk, Kreis, Verbandsgemeinde, and Gemeinde columns
df_elec["municipality_code"] = (
    df_elec["Land"]
    + df_elec["Regierungsbezirk"]
    + df_elec["Kreis"]
    + df_elec["Verbandsgemeinde"].str.zfill(4)
    + df_elec["Gemeinde"].str.zfill(3)
)

# AGS (Amtlicher Gemeindeschlüssel): the same components without Verbandsgemeinde
df_elec["AGS"] = df_elec["Land"] + df_elec["Regierungsbezirk"] + df_elec["Kreis"] + df_elec["Gemeinde"].str.zfill(3)

print(f"Number of municipalities in election data: {df_elec['municipality_code'].nunique()}")
print(f"Number of rows in election data: {len(df_elec)}")

df_elec.head(10)
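
The code construction above relies on each component already having its full width. The AGS has 8 digits (Land 2, Regierungsbezirk 1, Kreis 2, Gemeinde 3), and the longer code with Verbandsgemeinde has 12. As a minimal sketch, assuming some components might have lost their leading zeros, every part can be padded before concatenation. The toy values and the `widths` mapping below are illustrative and not taken from the real file.

```python
import pandas as pd

# Toy rows with made-up component codes; the first row lacks leading zeros.
parts = pd.DataFrame(
    {
        "Land": ["1", "01"],
        "Regierungsbezirk": ["0", "0"],
        "Kreis": ["51", "51"],
        "Verbandsgemeinde": ["5163", "5163"],
        "Gemeinde": ["74", "074"],
    }
)

# Expected width of each component (hypothetical helper mapping).
widths = {"Land": 2, "Regierungsbezirk": 1, "Kreis": 2, "Verbandsgemeinde": 4, "Gemeinde": 3}
padded = {col: parts[col].str.zfill(width) for col, width in widths.items()}

# 8-digit AGS: Land + Regierungsbezirk + Kreis + Gemeinde
ags = padded["Land"] + padded["Regierungsbezirk"] + padded["Kreis"] + padded["Gemeinde"]

# 12-digit code that additionally contains the Verbandsgemeinde
municipality_code = (
    padded["Land"]
    + padded["Regierungsbezirk"]
    + padded["Kreis"]
    + padded["Verbandsgemeinde"]
    + padded["Gemeinde"]
)

print(ags.tolist())
```

Both toy rows end up with the same 8-digit AGS, which is what makes the later join on `AGS` reliable.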

## Load municipality data and compare unique municipality codes

In [None]:
from geoscore_de.data_flow.municipality import load_municipality_data

df_muni = load_municipality_data("../../data/raw/municipalities_2022.csv")
df_muni

In [None]:
print(f"Unique municipality codes in municipality data: {df_muni['AGS'].nunique()}")

In [None]:
# Collapse the voting districts to one row per AGS; .first() keeps only the first district's values
df_elec_grouped = df_elec.groupby("AGS").first().reset_index()
print(f"Number of rows after grouping election data: {len(df_elec_grouped)}")
# Outer-join with the municipality data; the _merge indicator shows which side each AGS comes from
df_merged = df_elec_grouped.merge(df_muni, on="AGS", how="outer", indicator=True)
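
Keeping the first row per AGS is enough for comparing codes, but it discards the vote counts of all other voting districts in the same municipality. As a hedged sketch of the alternative, the toy example below contrasts `.first()` with a sum over districts. The `valid_votes` column and all values are made up for illustration.

```python
import pandas as pd

# Made-up voting-district rows: two districts share one AGS.
districts = pd.DataFrame(
    {
        "AGS": ["01051074", "01051074", "01051099"],
        "Gemeinde Name": ["A", "A", "B"],
        "valid_votes": [100, 250, 80],  # hypothetical column name
    }
)

# Deduplication only: one row per AGS, values of the first district.
first_only = districts.groupby("AGS").first().reset_index()

# Aggregation: keep the name, sum the counts over all districts.
summed = (
    districts.groupby("AGS")
    .agg({"Gemeinde Name": "first", "valid_votes": "sum"})
    .reset_index()
)

print(first_only)
print(summed)
```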

In [None]:
df_merged["_merge"].value_counts()
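
For reading the counts above: with `indicator=True`, pandas labels each row of an outer merge as `left_only`, `right_only`, or `both`. A small self-contained illustration with made-up keys:

```python
import pandas as pd

# Made-up keys: "01" exists only on the left, "04" and "05" only on the right.
left = pd.DataFrame({"AGS": ["01", "02", "03"]})
right = pd.DataFrame({"AGS": ["02", "03", "04", "05"]})

merged = left.merge(right, on="AGS", how="outer", indicator=True)

# Convert the categorical indicator to plain strings before counting.
counts = merged["_merge"].astype(str).value_counts().to_dict()
print(counts)
```

In the notebook, `left_only` corresponds to codes that occur only in the election data and `right_only` to codes that occur only in the municipality data.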

### Extra municipalities in election data

Let's check the municipalities that are present in the election data but missing from the official municipality data.

In [None]:
df_merged[df_merged["_merge"] == "left_only"]

In [None]:
# Check how many of the extra municipalities have a 9 at the sixth position of the AGS code
df_merged[(df_merged["_merge"] == "left_only") & (df_merged["AGS"].str[5] == "9")]

In [None]:
# Check Michaelisdonn municipalities in voting data
df_elec_grouped[df_elec_grouped["Gemeinde Name"].str.contains("Michaelisdonn")]

In [None]:
# check Michaelisdonn in municipality data
df_muni[df_muni["Municipality"].str.contains("Michaelisdonn")]

The election data contains two entries for `Michaelisdonn`: the first represents the municipality itself, the second the surrounding area. The municipality data has only one entry for `Michaelisdonn`, representing the municipality itself, so the second election entry finds no match when the datasets are merged.
Codes with a 9 at the sixth position of the AGS appear to represent such surrounding areas, so we can try ignoring them.

Of the 645 extra municipalities in the election data, 621 have a 9 at the sixth position of the AGS code. Ignoring these leaves only 24 unmatched codes, which is an almost perfect match between the datasets.
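
The filter suggested above could look like the following sketch. It assumes, as hypothesised in this notebook, that a 9 at the sixth position marks a surrounding-area entry. The codes are toy values.

```python
import pandas as pd

# Toy AGS codes; the second and third have a 9 at the sixth position.
codes = pd.DataFrame({"AGS": ["01051074", "01051974", "01051999"]})

# .str[5] is the sixth character because string positions start at 0.
is_surrounding = codes["AGS"].str[5] == "9"

kept = codes[~is_surrounding].reset_index(drop=True)
print(f"Dropped {is_surrounding.sum()} of {len(codes)} codes")
```

Applied to `df_elec_grouped` before the merge, the same mask would remove these entries from the `left_only` group.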


### Missing municipalities in election data

Now we turn to the municipalities that are present in the municipality data but missing from the election data. There are 151 of them.

In [None]:
df_merged[(df_merged["_merge"] == "right_only")]

#### Missing Friedrichsgabekoog

In [None]:
df_elec_grouped[df_elec_grouped["Gemeinde Name"].str.contains("Friedrichsgabekoog")]

In [None]:
df_muni[df_muni["Municipality"].str.contains("Friedrichsgabekoog")]

In [None]:
df_muni[df_muni["Municipality"].str.contains("Reinsbüttel")]

Here a single entry in the voting data, `Reinsbüttel (einschl. Friedrichsgabekoog)`, covers two municipalities in the official municipality data: `Reinsbüttel` and `Friedrichsgabekoog`. The AGS code of the combined entry matches one of the two municipalities, so the other one shows up as missing after the merge.
Should we assign the voting results for `Reinsbüttel (einschl. Friedrichsgabekoog)` to both municipalities in the official municipality data?
It would be more complicated to implement, but would lead to more accurate results overall.
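
A first step towards assigning results to both municipalities is to split a combined name into its parts. The sketch below is only an illustration: the function `split_combined_name` and the pattern are hypothetical, and it assumes that several included names, if they occur, are separated by commas or by "und". That separator format has not been verified against the data.

```python
import re

# Matches "<main> (einschl. <included>)"; the pattern is an assumption about the name format.
EINSCHL_PATTERN = re.compile(r"^(?P<main>.+?)\s*\(einschl\.\s*(?P<included>.+)\)\s*$")


def split_combined_name(name: str) -> list[str]:
    """Return all municipality names covered by one election-data entry."""
    match = EINSCHL_PATTERN.match(name)
    if match is None:
        # Plain name without an "einschl." part.
        return [name.strip()]
    # Assumed separators between several included names: comma or "und".
    included = re.split(r",\s*|\s+und\s+", match.group("included"))
    return [match.group("main").strip()] + [part.strip() for part in included if part.strip()]


print(split_combined_name("Reinsbüttel (einschl. Friedrichsgabekoog)"))
print(split_combined_name("Michaelisdonn"))
```

Each returned name could then be looked up in `df_muni["Municipality"]`, and the combined entry's results assigned to every AGS that is found.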

#### Missing Hillgroven

In [None]:
df_elec_grouped[df_elec_grouped["Gemeinde Name"].str.contains("Hillgroven")]

In [None]:
df_muni[df_muni["Municipality"].str.contains("Norddeich")]

In [None]:
df_muni[df_muni["Municipality"].str.contains("Hillgroven")]

The same applies here: the voting data entry `Norddeich (einschl. Hillgroven)` covers two municipalities in the official municipality data, `Norddeich` and `Hillgroven`.


### Other merged municipalities

In [None]:
# regex=False so that the dot in "einschl." is matched literally
df_elec_grouped[df_elec_grouped["Gemeinde Name"].str.contains("einschl.", regex=False)]

There are 240 municipalities in the election data with `einschl.` in their name, each representing multiple municipalities in the official municipality data.
That is more than the number of missing municipalities (151), so it is plausible that all missing municipalities are contained in these combined entries.
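
Comparing the two counts does not prove that every missing municipality is covered. As a rough sketch of how this could be checked, the example below tests whether each missing name appears inside the parenthesised part of some combined entry. It uses simple substring matching, so a name that is contained in another name would produce a false positive. The third missing name is made up for illustration.

```python
import pandas as pd

# Combined entries as they appear in the election data.
combined_names = pd.Series(
    [
        "Reinsbüttel (einschl. Friedrichsgabekoog)",
        "Norddeich (einschl. Hillgroven)",
    ]
)

# Names of municipalities missing from the election data; "Beispieldorf" is made up.
missing = pd.Series(["Friedrichsgabekoog", "Hillgroven", "Beispieldorf"])

# Text inside the parentheses after "einschl."
included_text = combined_names.str.extract(r"\(einschl\.\s*(.+)\)", expand=False)

# True if the missing name occurs in any combined entry (literal substring match).
covered = missing.apply(lambda name: included_text.str.contains(name, regex=False).any())

uncovered = missing[~covered].tolist()
print(f"Not explained by combined entries: {uncovered}")
```

In the notebook, `combined_names` would come from the `einschl.` rows of `df_elec_grouped["Gemeinde Name"]` and `missing` from the `Municipality` column of the `right_only` rows.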