# Practicals for lecture 1.4

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec/blob/main/practicals/Practicals_1.4.ipynb)

## More on `pandas`

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

#### 1.4.0 `.groupby()` and index broadcasting

In [None]:
# Load the meteo dataset using the function below
def get_meteo_dataset():
    """Get a meteo dataset from the open-meteo API using a fixed window.
    Note how easy it is to get data from the web with pandas! As long as we give the URL of the csv data, pandas can read it.
    """
    np.random.seed(42)
    URL = "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m,relativehumidity_2m,precipitation,windspeed_10m,winddirection_10m&start_date=2023-02-01&end_date=2023-05-28&format=csv"
    df = pd.read_csv(URL, skiprows=3)  # read the csv file, skipping the first 3 rows (a header)
    df.columns = [col.split(" ")[0] for col in df.columns]  # simplify column names
    df["time"] = pd.to_datetime(df["time"])  # convert the time column to datetime
    df["hour"], df["dayofyear"] = df["time"].dt.hour, df["time"].dt.dayofyear  # extract the hour and day of year

    # Here we artificially corrupt some of the data to make it more interesting
    missing_idx = np.random.choice(df.index[:1000], 100)
    df.loc[missing_idx, :] = np.nan
    return df

# This new meteo dataset has extra columns for the hour of the day and the day of the year.
# Check out the data and find out whether it contains missing values!
ds = get_meteo_dataset()
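
A minimal sketch of one way to inspect missing values, shown on a small made-up frame rather than on `ds` (the name `toy` and its values are hypothetical, chosen only for illustration). The same calls should apply to the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `ds`
toy = pd.DataFrame({
    "temperature_2m": [1.0, np.nan, 3.0, 4.0],
    "precipitation": [0.0, 0.2, np.nan, 0.0],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Rows that contain at least one missing value
rows_with_missing = toy[toy.isna().any(axis=1)]

# One possible way to handle them: drop every incomplete row
toy_clean = toy.dropna()
```

Dropping rows is only one option; depending on the analysis, interpolating or leaving the `NaN` values in place may be preferable.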

In [None]:
# Use the .groupby() method to compute the mean temperature for each hour of the day:
ds
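
As a hint, the pattern can be sketched on a tiny made-up frame (the `toy` frame and its numbers are hypothetical, only the column names mirror the dataset). Grouping by a column and selecting another column before calling `.mean()` returns one value per group.

```python
import pandas as pd

# Hypothetical toy frame: two days, two hourly samples each
toy = pd.DataFrame({
    "hour": [0, 1, 0, 1],
    "dayofyear": [32, 32, 33, 33],
    "temperature_2m": [2.0, 4.0, 6.0, 8.0],
})

# Mean temperature for each hour of the day; the result is a Series indexed by hour
hourly_mean = toy.groupby("hour")["temperature_2m"].mean()
```

The result is indexed by the grouping key (`hour`), which is what makes the index broadcasting in the next exercise possible.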

In [None]:
# Using pandas index broadcasting, subtract from each temperature sample the average temperature of its day of the year:
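
One possible approach, sketched on a made-up frame (`toy` and its values are hypothetical). When two Series are subtracted, pandas aligns them on their index labels, so a Series indexed by `dayofyear` with repeated labels can be combined with a Series holding one mean per day.

```python
import pandas as pd

# Hypothetical toy frame: two days, two samples each
toy = pd.DataFrame({
    "dayofyear": [32, 32, 33, 33],
    "temperature_2m": [2.0, 4.0, 6.0, 8.0],
})

# One mean per day, indexed by dayofyear
daily_mean = toy.groupby("dayofyear")["temperature_2m"].mean()

# Re-index the samples by dayofyear so that the labels match those of daily_mean
temp_by_day = toy.set_index("dayofyear")["temperature_2m"]

# Index alignment pairs every sample with the mean of its own day
anomaly = temp_by_day - daily_mean
```

An alternative that keeps the original row index is `toy.groupby("dayofyear")["temperature_2m"].transform("mean")`, which returns the group mean repeated for every row.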


#### 1.4.1 `.rolling()`

In [None]:
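# Compute the maximum and the minimum of the temperature using a rolling window of 24 samples, and show them in a plot.
# The starting code below plots the rolling mean; adapt it to show the rolling maximum and minimum instead.
plt.figure()
plt.plot(ds["temperature_2m"])  # raw temperature series
plt.plot(ds["temperature_2m"].rolling(24).mean())  # rolling mean over 24 samples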
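
A sketch of the rolling maximum and minimum on a synthetic signal (the series `toy_temp` is made up: a daily sine wave plus noise, standing in for the temperature column). The first 23 values of each rolling result are `NaN` because the window is not yet full.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical hourly signal over 5 days: daily cycle plus noise
rng = np.random.default_rng(0)
hours = np.arange(24 * 5)
toy_temp = pd.Series(10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size))

# Rolling statistics over a window of 24 samples
rolling_max = toy_temp.rolling(24).max()
rolling_min = toy_temp.rolling(24).min()

fig, ax = plt.subplots()
ax.plot(toy_temp, label="raw")
ax.plot(rolling_max, label="rolling max (24 samples)")
ax.plot(rolling_min, label="rolling min (24 samples)")
ax.legend()
```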


#### 1.4.2 Plotting with `matplotlib`


In [None]:
# Use matplotlib's object-oriented interface to create a figure and some axes, and scatter the wind speed vs. the relative humidity.
# Specify an alpha value < 1 (transparency) to get a better sense of the density of the points.

# Use methods of the axes object to set the axis labels, and restrict the wind speed axis to between 0 and 30 km/h
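
A minimal sketch of the object-oriented pattern on random data (the `toy` frame is hypothetical, and putting wind speed on the x axis is an assumption, since the exercise does not fix the orientation).

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical random data standing in for the two dataset columns
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "windspeed_10m": rng.uniform(0, 40, 200),
    "relativehumidity_2m": rng.uniform(30, 100, 200),
})

# Create the figure and axes explicitly, then call methods on the axes object
fig, ax = plt.subplots()
ax.scatter(toy["windspeed_10m"], toy["relativehumidity_2m"], alpha=0.3)

# Label the axes and restrict the wind speed (x) axis
ax.set_xlabel("Wind speed (km/h)")
ax.set_ylabel("Relative humidity (%)")
ax.set_xlim(0, 30)
```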


In [None]:
# Create two plots, one on top of the other, using matplotlib's subplots function.

# In the top plot, show both the unsmoothed and the smoothed temperature series.

# In the bottom plot, show the relative humidity series.

# Hint: you can pass the sharex=True argument to the subplots function to make the two plots share the same x axis:
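
The layout can be sketched as follows on synthetic series (`toy_temp` and `toy_hum` are made up, and using a 24-sample rolling mean as the smoothed curve is an assumption carried over from the previous section).

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical hourly signals over 5 days
rng = np.random.default_rng(0)
hours = np.arange(24 * 5)
toy_temp = pd.Series(10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size))
toy_hum = pd.Series(70 - 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size))

# Two rows, one column; sharex=True links the x axes of the two plots
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)

# Top: raw and smoothed temperature (smoothing assumed to be a rolling mean)
ax_top.plot(toy_temp, label="unsmoothed")
ax_top.plot(toy_temp.rolling(24).mean(), label="smoothed")
ax_top.set_ylabel("Temperature")
ax_top.legend()

# Bottom: relative humidity
ax_bottom.plot(toy_hum)
ax_bottom.set_ylabel("Relative humidity")
ax_bottom.set_xlabel("Sample")
```

Because the x axes are shared, zooming or setting limits on one plot updates the other as well.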



In [None]:
# Create a 2 x 2 grid of plots using matplotlib's subplots function.

# In each plot, we will show the correlation of some variables with the temperature. Temperature will always be on the x axis,
# and the other variable on the y axis.

# Top left should show the correlation of temperature with relative humidity, top right with wind speed,
# bottom left with precipitation, bottom right with the hour of the day.

# Try to use a for loop to iterate over the axes of the grid, and use the .scatter() method to plot the data!
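
A sketch of the loop on random data (the `toy` frame is hypothetical, only its column names mirror the dataset). Iterating over `axs.flat` visits the axes row by row, so the order of the column list determines which variable lands in which panel.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical random data with the same column names as the dataset
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "temperature_2m": rng.normal(10, 5, n),
    "relativehumidity_2m": rng.uniform(30, 100, n),
    "windspeed_10m": rng.uniform(0, 40, n),
    "precipitation": rng.exponential(0.5, n),
    "hour": rng.integers(0, 24, n),
})

# Order: top left, top right, bottom left, bottom right
y_columns = ["relativehumidity_2m", "windspeed_10m", "precipitation", "hour"]

fig, axs = plt.subplots(2, 2, figsize=(8, 6))
for ax, col in zip(axs.flat, y_columns):
    # Temperature always on the x axis, the other variable on the y axis
    ax.scatter(toy["temperature_2m"], toy[col], alpha=0.3)
    ax.set_xlabel("temperature_2m")
    ax.set_ylabel(col)

fig.tight_layout()
```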
