---
title: "motivation"
execute:
  freeze: auto  # re-render only when source changes
  echo: false  # hide all of the executable source code
---

## Jerusalem, 2019

Data from the [Israel Meteorological Service](https://ims.gov.il/en/data_gov){target="_blank"}, IMS.

The graph below shows the temperature measured at a weather station in Jerusalem throughout 2019. The graph is interactive: to zoom in, drag a selection across the bottom panel.

In [30]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import altair as alt
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
import warnings
# Suppress FutureWarnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import seaborn as sns
sns.set(style="ticks", font_scale=1.5)  # white graphs, with large and legible letters
%matplotlib widget

In [16]:
filename = "../archive/data/jerusalem2019.csv"
df = pd.read_csv(filename, na_values=['-'])
df['date'] = pd.to_datetime(df['Date & Time (Winter)'], dayfirst=True)
df.rename(columns={'Temperature (°C)': 'temperature',
                   'Rainfall (mm)': 'rain'}, inplace=True)

In [23]:
alt.data_transformers.disable_max_rows()

# Altair only recognizes column data; it ignores index values.
# Index data can be plotted by first resetting the index.
# 'date' is already a regular column here, so reset_index() is not strictly needed;
# it is kept as a reminder for cases where the dates live in the index.
df_new = df.reset_index()#.replace({0.0:np.nan})
source = df_new[['date', 'temperature']]

brush = alt.selection_interval(encodings=['x'])  # drag horizontally to select a time range

base = alt.Chart(source).mark_line(color='orange').encode(
    x='date:T',
    y=alt.Y('temperature:Q', axis=alt.Axis(title='temperature (°C)'))  # custom y-axis label
).properties(
    width=600,
    height=200
)

upper = base.encode(
    alt.X('date:T', scale=alt.Scale(domain=brush)),
    alt.Y('temperature:Q', scale=alt.Scale(domain=(0,40)), axis=alt.Axis(title='temperature (°C)'))
).properties(
    title='Jerusalem, Givat Ram station'
)

lower = base.properties(
    height=60
).add_selection(brush)

alt.vconcat(upper, lower)

<div class="alert alert-info">##### {{< iconify cil chat-bubble >}} discussion {.unnumbered}

The temperature fluctuates on various time scales, from daily to yearly. Let's think together about a few questions we'd like to ask about the data above.</div>

Now let's see precipitation data:

In [20]:
alt.data_transformers.disable_max_rows()

# Altair only recognizes column data; it ignores index values.
# 'date' is already a regular column here, so reset_index() is not strictly needed;
# it is kept as a reminder for cases where the dates live in the index.
df_new = df.reset_index()#.replace({0.0:np.nan})
source = df_new[['date', 'rain']]

brush = alt.selection_interval(encodings=['x'])  # drag horizontally to select a time range

base = alt.Chart(source).mark_bar().encode(
    x='date:T',
    y=alt.Y('rain:Q', axis=alt.Axis(title='rain (mm)'))  # custom y-axis label
).properties(
    width=600,
    height=200
)

upper = base.encode(
    alt.X('date:T', scale=alt.Scale(domain=brush)),
    alt.Y('rain:Q', scale=alt.Scale(domain=(0,3.5)), axis=alt.Axis(title='rain (mm)'))
).properties(
    title='Jerusalem, Givat Ram station'
)

lower = base.properties(
    height=60
).add_selection(brush)

alt.vconcat(upper, lower)

<div class="alert alert-info">##### {{< iconify cil chat-bubble >}} discussion {.unnumbered}

What would be interesting to know about precipitation?</div>

We have not yet discussed what kind of data we are working with. The CSV file provided by the IMS looks like this:

In [69]:
df = pd.read_csv(filename, na_values=['-'])
df

Unnamed: 0,Station,Date & Time (Winter),Diffused radiation (W/m^2),Global radiation (W/m^2),Direct radiation (W/m^2),Relative humidity (%),Temperature (°C),Maximum temperature (°C),Minimum temperature (°C),Wind direction (°),Gust wind direction (°),Wind speed (m/s),Maximum 1 minute wind speed (m/s),Maximum 10 minutes wind speed (m/s),Time ending maximum 10 minutes wind speed (hhmm),Gust wind speed (m/s),Standard deviation wind direction (°),Rainfall (mm)
0,Jerusalem Givat Ram,01/01/2019 00:00,0.0,0.0,0.0,80.0,8.7,8.8,8.6,75.0,84.0,3.3,4.3,3.5,23:58,6.0,15.6,0.0
1,Jerusalem Givat Ram,01/01/2019 00:10,0.0,0.0,0.0,79.0,8.7,8.8,8.7,74.0,82.0,3.3,4.1,3.3,00:01,4.9,14.3,0.0
2,Jerusalem Givat Ram,01/01/2019 00:20,0.0,0.0,0.0,79.0,8.7,8.8,8.7,76.0,82.0,3.2,4.1,3.3,00:19,4.9,9.9,0.0
3,Jerusalem Givat Ram,01/01/2019 00:30,0.0,0.0,0.0,79.0,8.7,8.7,8.6,78.0,73.0,3.6,4.2,3.6,00:30,5.2,11.7,0.0
4,Jerusalem Givat Ram,01/01/2019 00:40,0.0,0.0,0.0,79.0,8.6,8.7,8.5,80.0,74.0,3.6,4.4,3.8,00:35,5.4,10.5,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
52549,Jerusalem Givat Ram,31/12/2019 22:20,0.0,0.0,1.0,81.0,7.4,7.6,7.3,222.0,255.0,0.5,0.9,1.0,22:11,1.0,47.9,0.0
52550,Jerusalem Givat Ram,31/12/2019 22:30,0.0,0.0,1.0,83.0,7.3,7.4,7.3,266.0,259.0,0.6,0.8,0.6,22:28,1.1,22.8,0.0
52551,Jerusalem Givat Ram,31/12/2019 22:40,0.0,0.0,1.0,83.0,7.5,7.6,7.3,331.0,317.0,0.5,0.8,0.6,22:35,1.0,31.6,0.0
52552,Jerusalem Givat Ram,31/12/2019 22:50,0.0,0.0,1.0,83.0,7.5,7.6,7.4,312.0,285.0,0.6,1.0,0.6,22:50,1.4,31.3,0.0


We see that the data points are evenly spaced, 10 minutes apart.
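
One way to check this spacing is to look at the differences between consecutive timestamps. The sketch below is illustrative only: it uses a short synthetic series (`dates`, a hypothetical stand-in for the parsed `date` column), not the IMS file.

```python
import pandas as pd

# Hypothetical stand-in for the parsed 'date' column: three hours of 10-minute timestamps
dates = pd.Series(pd.date_range("2019-01-01 00:00", periods=18, freq="10min"))

# Time difference between consecutive rows; the first row has no predecessor
steps = dates.diff().dropna()

# Count how often each step size occurs; a single entry means even spacing
step_counts = steps.value_counts()
print(step_counts)
```

On real data, any additional entries in `step_counts` would point to gaps in the record.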

## Challenges

Let's try to answer the following questions:

::: {.callout-note collapse="true" icon=false}
## {{< iconify fa question-circle-o >}} What is the mean temperature for each month?

First we have to divide temperature data by month, and then take the average for each month.

<details>
<summary>a possible solution</summary>
```{.python}
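# assumes df is indexed by the parsed 'date' column: df = df.set_index('date')
df_month = df['temperature'].resample('MS').mean()  # 'MS' labels each month by its first day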
```
</details>
:::

::: {.callout-note collapse="true" icon=false}
## {{< iconify fa question-circle-o >}} For each month, what is the mean of the daily maximum temperature? What about the minimum?

This is a bit trickier.

1. We need to find the maximum/minimum temperature for each day.
1. Only then do we split the daily data by month and take the average.

<details>
<summary>a possible solution</summary>
```{.python}
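# assumes df is indexed by the parsed 'date' column: df = df.set_index('date')
df_day = df['temperature'].resample('D').max().to_frame('max temp')  # daily maxima
df_month = df_day.resample('MS').mean()  # monthly mean of the daily maxima
# for the minimum, replace .max() with .min()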
```
</details>
:::

::: {.callout-note collapse="true" icon=false}
## {{< iconify fa question-circle-o >}} What is the average night temperature for every season? What about the day temperature?

1. We need to filter our data to contain only night times.
1. We need to divide the temperature data by season (3 months), and then take the mean for each season.
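
A minimal sketch of these two steps, under stated assumptions: the data is synthetic (not the IMS file), night is taken to be 18:00 to 06:00 (the window used in the bonus question at the end of this page), and seasons are three-month blocks starting in December, March, June, and September. The names `is_day` and `season_of_month` are made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the station data (not IMS measurements):
# one year of 10-minute samples, 20 °C from 06:00 to 18:00 and 10 °C otherwise
idx = pd.date_range("2019-01-01", "2019-12-31 23:50", freq="10min")
is_day = (idx.hour >= 6) & (idx.hour < 18)
temperature = pd.Series(np.where(is_day, 20.0, 10.0), index=idx, name="temperature")

# Step 1: split the samples into night (assumed 18:00-06:00) and day
night = temperature[~is_day]
day = temperature[is_day]

# Step 2: label each sample with a season, then average within each season.
# Note that "winter" here pools January, February, and December of the same year.
season_of_month = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}
night_mean = night.groupby(night.index.month.map(season_of_month)).mean()
day_mean = day.groupby(day.index.month.map(season_of_month)).mean()

print(night_mean)
print(day_mean)
```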

<details>
<summary>a possible solution</summary>
```{.python}
solution
```
</details>
:::

::: {.callout-note collapse="true" icon=false}
## {{< iconify fa question-circle-o >}} What is the daily precipitation?

First we have to divide rain data by day, and then take the sum for each day.
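
A minimal sketch of this step, using a small synthetic rain series rather than the IMS file (the values are made up for illustration). It assumes the series has a `DatetimeIndex`, which `resample()` requires.

```python
import pandas as pd

# Synthetic stand-in (made-up values): two days of 10-minute rain totals in mm
idx = pd.date_range("2019-01-01", "2019-01-02 23:50", freq="10min")
rain = pd.Series(0.0, index=idx, name="rain")
rain.loc["2019-01-01 08:00":"2019-01-01 08:50"] = 0.5  # six samples on day 1
rain.loc[pd.Timestamp("2019-01-02 14:00")] = 1.0       # one sample on day 2

# Group the 10-minute samples by calendar day and add them up
daily_rain = rain.resample("D").sum()
print(daily_rain)
```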
:::

::: {.callout-note collapse="true" icon=false}
## {{< iconify fa question-circle-o >}} How much rain was there every month?

We have to divide rain data by month, and then sum the totals of each month.
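
As a sketch, the same pattern works with a monthly frequency. The data below is synthetic (made-up values, not the IMS file), and the series is assumed to have a `DatetimeIndex`.

```python
import pandas as pd

# Synthetic stand-in (made-up values): January and February 2019 at 10-minute resolution
idx = pd.date_range("2019-01-01", "2019-02-28 23:50", freq="10min")
rain = pd.Series(0.0, index=idx, name="rain")
rain.loc[pd.Timestamp("2019-01-10 09:00")] = 2.0
rain.loc[pd.Timestamp("2019-01-20 21:30")] = 1.5
rain.loc[pd.Timestamp("2019-02-03 04:10")] = 4.0

# 'MS' groups by calendar month and labels each group by the month's first day
monthly_rain = rain.resample("MS").sum()
print(monthly_rain)
```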

:::

::: {.callout-note collapse="true" icon=false}
## {{< iconify fa question-circle-o >}} How many rainy days were there each month?

1. We need to sum rain by day.
1. We need to count, for each month, the days on which `rain > 0`.
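
The two steps above can be sketched as follows. The rain series is synthetic (made-up timestamps and amounts, not the IMS file), and a "rainy day" is assumed to be any day with a total above zero.

```python
import pandas as pd

# Synthetic stand-in: three rainy days in January (two showers on the same day
# count once) and one rainy day in February
idx = pd.date_range("2019-01-01", "2019-02-28 23:50", freq="10min")
rain = pd.Series(0.0, index=idx, name="rain")
for stamp in ["2019-01-05 03:00", "2019-01-05 17:20", "2019-01-12 08:40",
              "2019-01-30 23:50", "2019-02-14 11:10"]:
    rain.loc[pd.Timestamp(stamp)] = 0.7

# Step 1: total rain per day
daily_rain = rain.resample("D").sum()

# Step 2: flag rainy days, then count the flags within each month
# (summing booleans counts the True values)
rainy_days = (daily_rain > 0).resample("MS").sum()
print(rainy_days)
```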

:::

::: {.callout-note collapse="true" icon=false}
## {{< iconify fa question-circle-o >}} How many days, hours, and minutes passed between the last rain of one rainy season (Malkosh) and the first rain of the next (Yore'h)?

1. We need to divide our data into two: `rainy_season_1` and `rainy_season_2`.
1. We need to find the time of the last rain in `rainy_season_1`.
1. We need to find the time of the first rain in `rainy_season_2`.
1. We need to compute the time difference between the two dates.
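
A minimal sketch of these four steps on synthetic data (the rain events below are invented, not taken from the IMS file). The split date `2019-08-01` is a hypothetical choice; any date inside the rain-free summer would serve.

```python
import pandas as pd

# Synthetic stand-in: a year of 10-minute samples with four invented rain events
idx = pd.date_range("2019-01-01", "2019-12-31 23:50", freq="10min")
rain = pd.Series(0.0, index=idx, name="rain")
for stamp in ["2019-01-10 09:00", "2019-05-02 14:20",
              "2019-10-15 03:40", "2019-12-01 12:00"]:
    rain.loc[pd.Timestamp(stamp)] = 1.2

# Step 1: split the rainy samples at a date inside the dry summer (assumed date)
split = pd.Timestamp("2019-08-01")
wet = rain[rain > 0]
rainy_season_1 = wet[wet.index < split]
rainy_season_2 = wet[wet.index >= split]

# Steps 2 and 3: last rain before the summer, first rain after it
last_rain = rainy_season_1.index.max()
first_rain = rainy_season_2.index.min()

# Step 4: the difference of two timestamps is a Timedelta
gap = first_rain - last_rain
parts = gap.components  # named tuple with days, hours, minutes, ...
print(f"{parts.days} days, {parts.hours} hours, {parts.minutes} minutes")
```

Because each row summarizes a 10-minute interval, the result is only accurate to within about 10 minutes at each end.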

:::

::: {.callout-note collapse="true" icon=false}
## {{< iconify fa question-circle-o >}} What was the rainiest morning (6am-12pm) of the year? Bonus: what about the rainiest night (6pm-6am)?

1. We need to filter our data to contain only morning times.
1. We need to sum rain by day.
1. We need to find the day with the maximum value.
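
These steps can be sketched as follows, again on synthetic data (invented values, not the IMS file). The morning window is assumed to include 06:00 and exclude 12:00.

```python
import pandas as pd

# Synthetic stand-in: ten days of 10-minute samples with three invented rain events
idx = pd.date_range("2019-01-01", "2019-01-10 23:50", freq="10min")
rain = pd.Series(0.0, index=idx, name="rain")
rain.loc[pd.Timestamp("2019-01-05 02:00")] = 5.0       # heavy, but at night
rain.loc["2019-01-07 08:00":"2019-01-07 08:20"] = 1.0  # three morning samples
rain.loc[pd.Timestamp("2019-01-09 07:00")] = 1.0       # one morning sample

# Step 1: keep samples from 06:00 up to, but not including, 12:00
hour = rain.index.hour
morning = rain[(hour >= 6) & (hour < 12)]

# Step 2: total morning rain per day
morning_daily = morning.resample("D").sum()

# Step 3: idxmax() returns the index label (the day) of the largest total
rainiest_morning = morning_daily.idxmax()
print(rainiest_morning.date(), morning_daily.max())
```

For the night window (6pm-6am) the mask would become `(hour >= 18) | (hour < 6)`. A night spans two calendar dates, so the timestamps would likely need to be shifted (for example by 6 hours) before grouping by day.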

:::

Note: this whole webpage is a Jupyter Notebook rendered as HTML. To see how the interactive graphs were made, go to the top of the page and click on "{{< iconify teenyicons code-solid >}} Code".

Functions that can be applied to the result of `resample()` are listed [here](https://pandas.pydata.org/docs/reference/resampling.html#computations-descriptive-stats). The full list of resampling frequencies can be found [here](https://pandas.pydata.org/pandas-docs/version/0.12.0/timeseries.html#offset-aliases).