# Practicals for lecture 1.4

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec/blob/main/practicals/Practicals_1.4.ipynb)

## More on `pandas` and plotting

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
```

#### 1.4.0 `.groupby()` and index broadcasting

```python
# Load the meteo dataset using the function below
def get_meteo_dataset():
    """Get a meteo dataset from the open-meteo API for a fixed time window.
    Note how easy it is to get data from the web with pandas: as long as we pass the URL of the CSV data, pandas can read it directly.
    """
    np.random.seed(42)
    URL = "https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&hourly=temperature_2m,relativehumidity_2m,precipitation,windspeed_10m,winddirection_10m&start_date=2023-02-10&end_date=2023-05-28&format=csv"
    df = pd.read_csv(URL, skiprows=3)  # read the CSV file, skipping the first 3 rows (a metadata header)
    df.columns = [col.split(" ")[0] for col in df.columns]  # simplify column names (keep the text before the first space)
    df["time"] = pd.to_datetime(df["time"])  # convert the time column to datetime
    df["hour"], df["dayofyear"] = df["time"].dt.hour, df["time"].dt.dayofyear  # extract the hour and the day of the year

    # Here we artificially corrupt some of the data to make it more interesting:
    # 100 random draws (with replacement) among the first 1000 rows are set to NaN
    missing_idx = np.random.choice(df.index[:1000], 100)
    df.loc[missing_idx, :] = np.nan
    return df

# Besides the weather variables, this dataset has columns for the hour of the day and the day of the year.
# Check out the data and make sure no missing values are left!
ds = get_meteo_dataset()
ds = ds.drop("time", axis=1)  # drop the datetime column; hour and dayofyear keep the time information
ds = ds.interpolate()  # fill the corrupted rows by linear interpolation
```
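A quick way to run the "no missing values" check is `DataFrame.isna().sum()`, which counts NaNs per column. The minimal sketch below uses a tiny hypothetical frame (not the real meteo data) to show what `.interpolate()` does to a gap.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame with one corrupted row (not the real meteo data)
toy = pd.DataFrame({"temperature_2m": [1.0, np.nan, 3.0],
                    "hour": [0.0, np.nan, 2.0]})

# Count missing values per column before filling
print(toy.isna().sum())

# Linear interpolation (the default method) fills each gap from its neighbours
filled = toy.interpolate()
print(filled)

# After interpolation no NaNs should remain
print(filled.isna().sum().sum())  # → 0
```

This is also why `hour` and `dayofyear` appear as floats (`0.0`, `41.0`) in the output below: once a column contains NaN, pandas stores it as float.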

```python
ds
```

|  | temperature_2m | relativehumidity_2m | precipitation | windspeed_10m | winddirection_10m | hour | dayofyear |
|---|---|---|---|---|---|---|---|
| 0 | 0.5 | 69.0 | 0.0 | 8.60 | 255.0 | 0.0 | 41.0 |
| 1 | 0.5 | 73.0 | 0.0 | 8.55 | 256.5 | 1.0 | 41.0 |
| 2 | 0.5 | 77.0 | 0.0 | 8.50 | 258.0 | 2.0 | 41.0 |
| 3 | 0.5 | 79.0 | 0.0 | 10.30 | 258.0 | 3.0 | 41.0 |
| 4 | 0.6 | 84.0 | 0.0 | 10.50 | 239.0 | 4.0 | 41.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2587 | 22.5 | 39.0 | 0.0 | 5.20 | 196.0 | 19.0 | 148.0 |
| 2588 | 20.8 | 45.0 | 0.0 | 4.30 | 180.0 | 20.0 | 148.0 |
| 2589 | 19.1 | 50.0 | 0.0 | 3.80 | 163.0 | 21.0 | 148.0 |
| 2590 | 17.5 | 53.0 | 0.0 | 4.00 | 153.0 | 22.0 | 148.0 |


```python
# Use the .groupby() method to compute the mean temperature for each hour of the day:
ds.groupby("hour")["temperature_2m"].mean()
```

```
hour
0.0      6.845714
1.0      6.477451
2.0      6.113208
3.0      6.030612
4.0      5.722857
5.0      5.774286
6.0      6.239216
7.0      6.896190
8.0      7.987619
9.0      9.501980
10.0    10.588235
11.0    11.089623
12.0    11.678846
13.0    12.241748
14.0    12.252885
15.0    12.110577
16.0    11.479245
17.0    10.839252
18.0    10.263208
19.0     9.501942
20.0     8.748077
21.0     8.166667
22.0     7.773077
23.0     7.152830
Name: temperature_2m, dtype: float64
```
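Selecting the column right after `.groupby()` means only that column gets aggregated, and `.agg()` lets you ask for several statistics per group in a single call. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: two readings for each of two hours
toy = pd.DataFrame({
    "hour": [0, 0, 1, 1],
    "temperature_2m": [4.0, 6.0, 7.0, 9.0],
    "relativehumidity_2m": [80.0, 90.0, 70.0, 60.0],
})

# Select the column first, then aggregate: only temperature is averaged
hourly_mean = toy.groupby("hour")["temperature_2m"].mean()
print(hourly_mean)

# .agg() computes several statistics per group in one call
hourly_stats = toy.groupby("hour")["temperature_2m"].agg(["mean", "min", "max"])
print(hourly_stats)
```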

```python
# Using pandas index broadcasting, subtract from each temperature reading the average temperature of its day
# of the year, and plot the result to check that it makes sense.
mean_temps = ds.groupby("dayofyear")["temperature_2m"].mean()

# With dayofyear as the index, pandas aligns every reading with the mean of its day
ds.set_index("dayofyear")["temperature_2m"] - mean_temps
```

```
dayofyear
41.0    -2.016667
41.0    -2.016667
41.0    -2.016667
41.0    -2.016667
41.0    -1.916667
           ...   
148.0    2.820833
148.0    1.120833
148.0   -0.579167
148.0   -2.179167
148.0   -3.779167
Name: temperature_2m, Length: 2592, dtype: float64
```
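The subtraction works because pandas aligns the two Series on their index labels, so every row labelled `41.0` is matched with the single mean for day 41. The result, however, is indexed by day, which makes plotting awkward. A minimal sketch of an alternative, using hypothetical toy data, is `groupby(...).transform("mean")`, which returns the group mean for every original row and so keeps the original index:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical toy data: three readings on each of two days
toy = pd.DataFrame({
    "dayofyear": [41, 41, 41, 42, 42, 42],
    "temperature_2m": [1.0, 2.0, 3.0, 5.0, 7.0, 9.0],
})

# Index alignment: the per-day means are broadcast onto rows with the same label
daily_mean = toy.groupby("dayofyear")["temperature_2m"].mean()
aligned = toy.set_index("dayofyear")["temperature_2m"] - daily_mean

# transform("mean") returns one group mean per original row,
# so the result keeps the original 0..n-1 index
anomaly = toy["temperature_2m"] - toy.groupby("dayofyear")["temperature_2m"].transform("mean")

# Plot against the row position: the anomalies should oscillate around zero
fig, ax = plt.subplots()
ax.plot(anomaly.index, anomaly.values)
ax.axhline(0, color="gray", linewidth=0.8)
ax.set_xlabel("sample")
ax.set_ylabel("temperature anomaly (°C)")
```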

#### 1.4.1 `.rolling()`

```python
# Compute the rolling maximum and minimum of the temperature using a window of 24 samples.
# Plot the original curve together with the rolling curves to check what you did:
```
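A minimal sketch of `.rolling()`, assuming a synthetic hourly series with a daily cycle as a hypothetical stand-in for `ds["temperature_2m"]`. By default each window must be full, so the first 23 results are NaN; passing `center=True` would instead align each window on its middle sample rather than its last one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for ds["temperature_2m"]: 5 days of hourly values
# with a daily cycle (synthetic, not real meteo data)
hours = np.arange(5 * 24)
temperature = pd.Series(10 + 5 * np.sin(2 * np.pi * hours / 24))

# Rolling window of 24 samples (one day of hourly data)
rolling_max = temperature.rolling(24).max()
rolling_min = temperature.rolling(24).min()

fig, ax = plt.subplots()
ax.plot(temperature.index, temperature, label="original")
ax.plot(rolling_max.index, rolling_max, label="rolling max (24 samples)")
ax.plot(rolling_min.index, rolling_min, label="rolling min (24 samples)")
ax.set_xlabel("sample (hours)")
ax.set_ylabel("temperature (°C)")
ax.legend()
```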



#### 1.4.2 Plotting with `matplotlib`


```python
# Use matplotlib's object-oriented interface to create a figure and an Axes, and scatter the wind speed vs the relative humidity.
# Specify an alpha value < 1 (transparency) to get a better sense of the density of the points.

# Use methods of the Axes object to set the axis labels, and restrict the wind speed axis to the 0-30 km/h range.
```
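A minimal sketch of the object-oriented interface, assuming random values as a hypothetical stand-in for `ds` and reading "wind speed vs relative humidity" in the usual "y vs x" sense (wind speed on the y axis):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for ds: random humidity (%) and wind speed (km/h)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "relativehumidity_2m": rng.uniform(30, 100, size=500),
    "windspeed_10m": rng.gamma(shape=2.0, scale=5.0, size=500),
})

# Object-oriented interface: create the Figure and Axes explicitly
fig, ax = plt.subplots()
# alpha < 1 makes overlapping points darker, hinting at point density
ax.scatter(df["relativehumidity_2m"], df["windspeed_10m"], alpha=0.3, s=10)

# Labels and limits are set through Axes methods
ax.set_xlabel("relative humidity (%)")
ax.set_ylabel("wind speed (km/h)")
ax.set_ylim(0, 30)
```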


```python
# Create two plots, one on top of the other, using matplotlib's subplots function.

# In the top plot, show both the unsmoothed and the smoothed temperature series.

# In the bottom plot, show the relative humidity series.

# Hint: you can pass the sharex=True argument to the subplots function to make the two plots share the same x axis.
```
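A minimal sketch of stacked subplots with a shared x axis. It assumes synthetic hourly data as a hypothetical stand-in for `ds`, and it assumes "smoothed" means a centred 24-sample rolling mean; other smoothing choices would work just as well.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for ds: 5 days of synthetic hourly temperature and humidity
hours = np.arange(5 * 24)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "temperature_2m": 10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size),
    "relativehumidity_2m": 70 - 15 * np.sin(2 * np.pi * hours / 24),
})

# Assumption: "smoothed" means a centred 24-sample rolling mean
smoothed = df["temperature_2m"].rolling(24, center=True).mean()

# Two rows, one column; sharex=True links the x limits of both Axes
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)
ax_top.plot(df.index, df["temperature_2m"], label="raw")
ax_top.plot(df.index, smoothed, label="smoothed")
ax_top.set_ylabel("temperature (°C)")
ax_top.legend()

ax_bottom.plot(df.index, df["relativehumidity_2m"])
ax_bottom.set_ylabel("relative humidity (%)")
ax_bottom.set_xlabel("sample (hours)")
```

Because the x axis is shared, zooming or setting limits on one panel (for example `ax_top.set_xlim(0, 48)`) also updates the other.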



```python
# Create a 2 x 2 grid of plots using matplotlib's subplots function.

# In each plot, we will show the relationship between the temperature and another variable. Temperature will always
# be on the x axis, and the other variable on the y axis.

# Top left should show temperature vs relative humidity, top right vs wind speed,
# bottom left vs precipitation, bottom right vs hour of the day.

# Try to use a for loop to iterate over the axes of the grid, and use the .scatter() method to plot the data!
```
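A minimal sketch of looping over a grid of Axes, assuming random values with the same column names as a hypothetical stand-in for `ds`. `plt.subplots(2, 2)` returns a 2x2 NumPy array of Axes, and `.flat` walks it row by row, so zipping it with a list of column names fills top left, top right, bottom left, bottom right in that order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for ds with the same column names (synthetic values)
rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({
    "temperature_2m": rng.normal(10, 5, n),
    "relativehumidity_2m": rng.uniform(30, 100, n),
    "windspeed_10m": rng.gamma(2.0, 5.0, n),
    "precipitation": rng.exponential(0.5, n),
    "hour": rng.integers(0, 24, n),
})

# Order matches the grid read row by row: top left, top right, bottom left, bottom right
y_columns = ["relativehumidity_2m", "windspeed_10m", "precipitation", "hour"]

fig, axs = plt.subplots(2, 2, figsize=(8, 6))
for ax, column in zip(axs.flat, y_columns):
    # Temperature always on the x axis, the other variable on the y axis
    ax.scatter(df["temperature_2m"], df[column], alpha=0.3, s=8)
    ax.set_xlabel("temperature_2m")
    ax.set_ylabel(column)
fig.tight_layout()
```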
