# Weather data from the German Weather Service (Deutscher Wetterdienst): are there trends?
Assigned by Dr. Lauer-Baré for course 21DSB, Advanced Programming, 2022

* Display rain and temperature both raw and "smoothed"
* Compare example data from states in the south, north, east, and west of Germany
* Check the average

## Notebook by: Jan Henrik Bertrand, David Hoffmann, Leonard Jung

## Tools

* pandas: DataFrame, drop, read_csv, rolling, index, plot, etc.
* matplotlib.pyplot: plot labeling, font size, saving plots, ...
* plotly: interactive plotting backend for pandas' .plot()
* geopandas: pandas extension for geospatial data.
* numpy: scientific calculations with Python.
* pathlib: object-oriented filesystem paths.
* os: system-level functionality, such as the current working directory.
* sklearn: used here for its validation metrics.

## Sources

* [Pandas Doc](https://pandas.pydata.org/)
* [Matplotlib subplot](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.subplots.html)
* [Matplotlib plot](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.plot.html)
* [Geopandas](https://geopandas.org/en/stable/docs.html)
* [Pathlib](https://docs.python.org/3/library/pathlib.html)
* [os](https://docs.python.org/3/library/os.html)
* [scikit-learn metrics](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_absolute_error.html)
* [State areas, used as weights (Statista)](https://de.statista.com/statistik/daten/studie/154868/umfrage/flaeche-der-deutschen-bundeslaender/)

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#import geopandas as gpd
from pathlib import Path
import os
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

pd.options.plotting.backend = 'plotly' 
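The tools list says scikit-learn is used for its validation metrics. As a minimal illustration with made-up numbers (not DWD data), `mean_absolute_error` averages the absolute differences and `mean_squared_error` averages the squared differences between observed and predicted values:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up observed values and predictions, for illustration only
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.0 + 1.0) / 3
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.0 + 1.0) / 3
print(f"MAE: {mae:.4f}, MSE: {mse:.4f}")
```

Because the errors are squared, MSE weights large deviations more heavily than MAE does.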


First, we need to load the data into the notebook:
* geo_rain_df holds data from 1881 to 2019 with the average annual rainfall for each German state and for Germany as a whole.
* geo_air_df holds data from 1881 to 2019 with the average annual air temperature for each German state and for Germany as a whole.

In [3]:
# Load the data into pandas DataFrames (a DataFrame is a table, similar to an Excel sheet)
# header=1: the column names are on the second line; index_col='Jahr': use the year as row index
geo_rain_df = pd.read_csv('data/rainGermanyHistorical.csv', sep=';', header=1, index_col='Jahr')
geo_air_df = pd.read_csv('data/airGermany_historical.csv', sep=';', header=1, index_col='Jahr')

In [None]:
# Prepare DataFrame
geo_rain_df.replace(" ", "", inplace=True) # replaces cells consisting of exactly one space with an empty string
geo_air_df.replace(" ", "", inplace=True)

# Drop column "Jahr.1" (the duplicate "Jahr" column, containing the string "year") and the unnamed empty column at the end
geo_rain_df.drop(labels=["Jahr.1", "Unnamed: 19"], axis=1, inplace=True) # axis=1 makes .drop() remove columns instead of rows
geo_air_df.drop(labels=["Jahr.1", "Unnamed: 19"], axis=1, inplace=True) # inplace=True modifies the DataFrame itself instead of returning a modified copy
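The loading and cleanup steps depend on the layout of the DWD files. The following sketch reproduces that layout with a tiny made-up excerpt (the title line, the values and the two-column selection are assumptions for illustration, not the real file contents), so that the effect of `header=1`, `index_col='Jahr'` and pandas' automatic column renaming can be inspected:

```python
import io

import pandas as pd

# Hypothetical excerpt in the assumed layout: a title line, a header with "Jahr"
# twice, and a trailing ";" on every line. The values are made up.
sample = """Hypothetical title line
Jahr;Jahr;Bayern;Deutschland;
1881;year;6.5;7.4;
1882;year;7.1;8.0;
"""

# header=1 takes the column names from the second line (the title line is skipped);
# index_col='Jahr' turns the first "Jahr" column into the row index
df = pd.read_csv(io.StringIO(sample), sep=';', header=1, index_col='Jahr')
print(list(df.columns))  # the duplicate "Jahr" becomes "Jahr.1", the empty name becomes "Unnamed: 4"

# Same cleanup as in the notebook, with the column label of this smaller sample
df = df.drop(labels=["Jahr.1", "Unnamed: 4"], axis=1)
print(df)
```

The number in "Unnamed: …" is the position of the empty column in the header line, which is why the full files, with more columns, produce "Unnamed: 19".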

In [None]:
geo_rain_df

In [None]:
geo_rain_df.plot()

In [None]:
geo_rain_df.rolling(40, 3).mean().plot.bar()
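`rolling(40, 3)` is shorthand for `rolling(window=40, min_periods=3)`: each result is the mean of the current value and up to 39 preceding values, and a result is only produced once at least 3 values are available. A scaled-down sketch with made-up numbers (window 3, minimum 2) shows the effect:

```python
import pandas as pd

# Made-up yearly values, for illustration only
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=[2015, 2016, 2017, 2018, 2019])

# Window of 3 values; emit a mean as soon as at least 2 values are in the window
smoothed = values.rolling(window=3, min_periods=2).mean()
print(smoothed.tolist())  # [nan, 1.5, 2.0, 3.0, 4.0]
```

Because the window is trailing by default, the smoothed curve lags behind the raw data; passing `center=True` to `rolling()` centers the window instead.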

In [None]:
geo_rain_df.T.plot.bar()


# 1.) Visualizing rain and temperature, raw and smoothed

In [None]:
def visualize(x_air_iterable, y_air_iterable, x_rain_iterable, y_rain_iterable, col_title=None,
              rolling_window_size=40, mov_avg=True, reg=True, pred_year=None) -> None:
    """
    Creates a plot of air temperature and rainfall data using matplotlib.
    It also draws a moving average and a polynomial regression of degree 2 for both series;
    if pred_year is set, the regression curves are extrapolated up to that year.
    """

    # Extend the x-values up to pred_year if pred_year lies beyond the last year of at least one of the two series
    if pred_year and (pred_year > min(max(x_air_iterable), max(x_rain_iterable))):
        # Generate additional x-values for all years to be predicted
        x_air_extension = [i+max(x_air_iterable)+1 for i in range(pred_year-max(x_air_iterable))]
        x_rain_extension = [i+max(x_rain_iterable)+1 for i in range(pred_year-max(x_rain_iterable))]

        # Merge the extensions with the existing x-values after converting them into a pandas Series (similar to a list)
        x_air_extended = pd.concat([x_air_iterable, pd.Series(x_air_extension)])
        x_rain_extended = pd.concat([x_rain_iterable, pd.Series(x_rain_extension)])

        if not reg:
            print("WARN: You are trying to get predictions with regression disabled!")
    else:
        # Use the raw iterables for visualization if no predictions should be made
        x_air_extended = x_air_iterable
        x_rain_extended = x_rain_iterable


    # Create Figure, Axes and Title
    fig, (ax1, ax2) = plt.subplots(2, 1) # .subplots() creates two plots inside the matplotlib window, arranged in two rows and one column
    fig.suptitle(col_title) # Sets the title for the figure (which contains the plots)
    fig.canvas.manager.set_window_title(f"{col_title} - Air Temp. & Rainfall") # Sets the matplotlib window (containing the figure) title

    # Regression for air temperature
    model_air = np.poly1d(np.polyfit(x_air_iterable, y_air_iterable, deg=2)) # np.polyfit() does the polynomial regression, while np.poly1d() converts the regression results to a model that can be used for predictions

    # Plot according to function parameters
    ax1.plot(x_air_iterable, y_air_iterable, color='red', marker='.', label=f"{col_title} - Air Temp.") # plots temperature values on the first and upper plot
    if mov_avg:
        ax1.plot(x_air_iterable, y_air_iterable.rolling(rolling_window_size, 3).mean(),
                 color='black', marker='', label=f"{col_title} - Air Temp. Moving Avg.") # Plots the moving average using a rolling window (.rolling())
                                                                                         # For each window, the mean value is calculated, which is then used to plot the moving average.
    if reg:
        ax1.plot(x_air_extended, model_air(x_air_extended), color='orange', linewidth=2, label=f"{col_title} - Air Temp. Regression") # Uses the regression model to create regression line
    ax1.set_ylabel("Avg. Temperature") # Setting the label for the y axis
    ax1.legend(loc='upper left') # Generating the legend located in the upper left of the plot
    ax1.grid() # Grid for better readability

    # Regression for rainfall - similar use of methods & functions as above
    model_rain = np.poly1d(np.polyfit(x_rain_iterable, y_rain_iterable, deg=2))

    # Plot according to function parameters
    ax2.plot(x_rain_iterable, y_rain_iterable, color='blue', marker='.', label=f"{col_title} - Rainfall")
    if mov_avg:
        ax2.plot(x_rain_iterable, y_rain_iterable.rolling(rolling_window_size, 3).mean(),
                 color='black', marker='', label=f"{col_title} - Rainfall Moving Avg.")
    if reg:
        ax2.plot(x_rain_extended, model_rain(x_rain_extended), color='orange', linewidth=2, label=f"{col_title} - Rainfall Regression")
    ax2.set_xlabel("Year") # Setting a label for the x axis - only below the lower plot to save space
    ax2.set_ylabel("Avg. Rainfall")
    ax2.legend(loc='upper left')
    ax2.grid() # Grid for better readability
    

    # Save plot into plots directory - create directory if it does not exist
    try:
        cwd = os.getcwd() # retrieves the current working directory which is where the script is executed in our case
        Path(f"{cwd}/plots").mkdir(parents=True, exist_ok=True) # Creates a new folder "plots" using mkdir if it does not exist already
        fig.set_size_inches(15, 10) # Enlarge the figure before saving so that the legend hides less of the plots
        col_title_save = str(col_title).replace("/", "-") # Replace "/" in the title, since it would be interpreted as a directory separator in the file name
        plt.savefig(f"{cwd}/plots/air-rainfall-plot-{col_title_save}", dpi=72) # Saves the figure with the plots in it
        print(f"Figure for {col_title} saved to \"<execution directory>/plots\"")
    except Exception as e:
        print("Exception: ", e)
        print("Creating/accessing the directory \"<execution directory>/plots\" failed. Please ensure you have write permissions and are running the script in the right directory.")

    # Display
    fig.set_size_inches(14, 7) # Reset the figure size for additional display using the matplotlib window
    plt.show() # Displays the plots using matplotlib
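`visualize()` fits a degree-2 polynomial with `np.polyfit()` and wraps the coefficients with `np.poly1d()`, so that the model can be evaluated at arbitrary x-values, including years beyond the data. A minimal sketch on synthetic data that follows an exact quadratic (made up, not DWD data) shows that the fit recovers the coefficients and can be extrapolated:

```python
import numpy as np

# Synthetic data following y = 2x^2 - 3x + 1 exactly (illustration only)
x = np.arange(10, dtype=float)
y = 2 * x**2 - 3 * x + 1

coeffs = np.polyfit(x, y, deg=2)  # coefficients, highest power first
model = np.poly1d(coeffs)         # callable polynomial built from the coefficients

print(np.round(coeffs, 6))        # approximately [ 2. -3.  1.]
print(model(12.0))                # extrapolated value, approximately 2*144 - 36 + 1 = 253
```

An extrapolated quadratic simply continues the curve shape of the fitted period, so values far beyond the data (such as 2100) should be read as a continuation of the trend, not as a forecast.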

In [None]:
# Set X and Y iterables for the first state
col_title_state = geo_air_df.columns[0] # Name of the first column - "Jahr" is the index, so column 0 is the first state
x_air = geo_air_df.index.to_series() # The years are stored in the index (index_col='Jahr'); convert them to a Series
y_air_state = geo_air_df[col_title_state] # Extract the temperature data for the given state

x_rain = geo_rain_df.index.to_series() # Same for the rain data
y_rain_state = geo_rain_df[col_title_state]

visualize(x_air, y_air_state, x_rain, y_rain_state, col_title_state, pred_year=2100) # Visualize the data for the first state, with predictions up to 2100
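With `pred_year=2100`, `visualize()` appends one x-value for every year after the last observed year up to and including 2100, so that the regression curve can be drawn into the future. The same logic, pulled out into a hypothetical helper `extend_years()` (not part of the notebook), could look like this:

```python
import pandas as pd


def extend_years(years: pd.Series, pred_year: int) -> pd.Series:
    """Hypothetical helper: append all years after years.max() up to pred_year."""
    last_year = int(years.max())
    # Empty if pred_year <= last_year, so nothing is appended in that case
    extension = pd.Series(range(last_year + 1, pred_year + 1))
    return pd.concat([years, extension], ignore_index=True)


observed = pd.Series([2017, 2018, 2019])
print(extend_years(observed, 2022).tolist())  # [2017, 2018, 2019, 2020, 2021, 2022]
```

`range(last_year + 1, pred_year + 1)` yields the same values as the list comprehension `[i+max(x)+1 for i in range(pred_year-max(x))]` used in `visualize()`.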

In [None]:
# Set X and Y iterables for Germany - the x-axis remains the same
col_title_country = "Deutschland" # Same as above, for Germany as a whole
y_air_state = geo_air_df[col_title_country]
y_rain_state = geo_rain_df[col_title_country]

visualize(x_air, y_air_state, x_rain, y_rain_state, col_title_country, pred_year=2100)
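The task list asks to check the average, and the sources list state areas as weights. One possible check, sketched here under assumptions (two hypothetical states with made-up areas and values, not the notebook's own implementation), compares an area-weighted mean of the state columns with the national "Deutschland" column using the mean absolute error:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Made-up yearly values for two hypothetical states and the national column
df = pd.DataFrame(
    {"StateA": [8.0, 10.0], "StateB": [4.0, 6.0], "Deutschland": [5.0, 7.5]},
    index=[2018, 2019],
)
# Hypothetical state areas used as weights
areas = pd.Series({"StateA": 1.0, "StateB": 3.0})

# Area-weighted mean per year: sum(value * area) / sum(area)
weighted = (df[areas.index] * areas).sum(axis=1) / areas.sum()
print(weighted.tolist())  # [5.0, 7.0]

# Average deviation of the national column from the weighted mean
mae = mean_absolute_error(df["Deutschland"], weighted)
print(mae)  # 0.25
```

Some column names in the DWD data contain "/" and appear to combine several states; such columns would have to be left out of the weighting so that no area is counted twice.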