# Visualizing Local Weather

I thought it would be interesting to visualize daily changes in local weather (temperature, precipitation, etc.) over the years. 

After searching online, I found [*Climate Data Online*](https://www.ncei.noaa.gov/cdo-web/) on NOAA's National Centers for Environmental Information website. I chose the type of data I wanted and the date range (from 1950 to the present), requested the data, and then downloaded it from a link they sent me. Then I ran the following code to create a series of visualizations.

1. Import necessary packages.

In [1]:
import pandas as pd
from pathlib import Path

import matplotlib.pyplot as plt
import seaborn as sns

2. Import the data. *Note: you can also request data of your choosing from [Climate Data Online](https://www.ncei.noaa.gov/cdo-web/).*

In [2]:
# Alternative Code for working with this notebook on Colab
#! wget https://git.dartmouth.edu/lib-digital-strategies/RDS/workshops/intro-to-python/-/archive/master/intro-to-python-master.zip

#from google.colab import files
#files.upload()

In [3]:
weather_path = Path("~/shared/RR-workshop-data/weather").expanduser() 
weather_df = pd.read_csv(Path(weather_path, "Hanover_maxtemps_1950-2022.csv"))
weather_df

Unnamed: 0.1,Unnamed: 0,STATION,NAME,LATITUDE,LONGITUDE,ELEVATION,DATE,DAPR,DASF,MDPR,...,WT08,WT09,WT11,WT14,month,day,year,TMAX72,diff_from72avg,daynum
0,0,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/1/1950,,,,...,,,,,1,1,1950,28.591549,3.408451,1
1,1,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/1/1951,,,,...,,,,,1,1,1951,28.591549,1.408451,1
2,2,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/1/1952,,,,...,,,,,1,1,1952,28.591549,14.408451,1
3,3,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/1/1953,,,,...,,,,,1,1,1953,28.591549,-0.591549,1
4,4,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/1/1954,,,,...,,,,,1,1,1954,28.591549,-10.591549,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26134,26134,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,2/29/2004,,,,...,,,,,2,29,2004,35.888889,14.111111,366
26135,26135,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,2/29/2008,,,,...,,,,,2,29,2008,35.888889,-12.888889,366
26136,26136,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,2/29/2012,,,,...,,,,,2,29,2012,35.888889,-7.888889,366
26137,26137,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,2/29/2016,,,,...,,,,,2,29,2016,35.888889,16.111111,364


**Note: I created the final six columns in this dataframe. I created the "month", "day", and "year" columns by splitting the date in the "DATE" column into three separate parts. "daynum" records the number of the day in the annual calendar (so Jan. 1 is 1 and Dec. 31 is 365). "TMAX72" is the average max temperature for that calendar day across the 73 years between Jan. 1, 1950 and Dec. 31, 2022 (so it should really be "TMAX73"!), and "diff_from72avg" is the difference between a particular day's max temp and that 73-year average for the same calendar day.**
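For anyone starting from a fresh NOAA download, here is a minimal sketch of how columns like these could be derived. It is not the exact code I used: the tiny `raw` dataframe is made up for illustration, and `dt.dayofyear` follows the calendar strictly (Feb. 29 becomes day 60 in a leap year), which may not match the "daynum" values in the file above.

```python
import pandas as pd

# Hypothetical miniature of the NOAA export: two calendar days, three years.
raw = pd.DataFrame({
    "DATE": ["1/1/1950", "1/2/1950", "1/1/1951", "1/2/1951", "1/1/1952", "1/2/1952"],
    "TMAX": [32.0, 33.0, 30.0, 25.0, 43.0, 31.0],
})

# Parse the month/day/year strings, then split them into separate columns.
dates = pd.to_datetime(raw["DATE"], format="%m/%d/%Y")
raw["month"] = dates.dt.month
raw["day"] = dates.dt.day
raw["year"] = dates.dt.year

# Day of the year taken from the calendar (Jan. 1 -> 1).
raw["daynum"] = dates.dt.dayofyear

# Average TMAX for each calendar day across all years, copied back onto every row.
raw["TMAX72"] = raw.groupby(["month", "day"])["TMAX"].transform("mean")

# Positive values mean the day was warmer than its long-run average.
raw["diff_from72avg"] = raw["TMAX"] - raw["TMAX72"]

print(raw)
```

Using `transform("mean")` rather than `agg` keeps one row per day, so the average lines up with each observation without a separate merge.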

3. Let's sort this dataset in chronological order.

In [4]:
weather_df.sort_values(by = ["year", "daynum"])

Unnamed: 0.1,Unnamed: 0,STATION,NAME,LATITUDE,LONGITUDE,ELEVATION,DATE,DAPR,DASF,MDPR,...,WT08,WT09,WT11,WT14,month,day,year,TMAX72,diff_from72avg,daynum
0,0,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/1/1950,,,,...,,,,,1,1,1950,28.591549,3.408451,1
72,72,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/2/1950,,,,...,,,,1.0,1,2,1950,29.830986,3.169014,2
144,144,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/3/1950,,,,...,,,,,1,3,1950,30.263889,11.736111,3
217,217,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/4/1950,,,,...,,,,,1,4,1950,30.847222,29.152778,4
290,290,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/5/1950,,,,...,,,,,1,5,1950,30.291667,30.708333,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25839,25839,USC00273850,"HANOVER, NH US",,,,12/27/2022,,,,...,,,,,12,27,2022,29.441176,-0.441176,361
25908,25908,USC00273850,"HANOVER, NH US",,,,12/28/2022,,,,...,,,,,12,28,2022,30.573529,-1.573529,362
25978,25978,USC00273850,"HANOVER, NH US",,,,12/29/2022,,,,...,,,,,12,29,2022,31.318841,12.681159,363
26049,26049,USC00273850,"HANOVER, NH US",,,,12/30/2022,,,,...,,,,,12,30,2022,29.757143,18.242857,364


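3b. The cell above only displays a sorted copy; `weather_df` itself is unchanged. Let's assign the sorted dataframe back to `weather_df` so the new order is kept.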

In [5]:
weather_df = weather_df.sort_values(by = ["year", "daynum"])
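To see why the reassignment matters, here is a small sketch on a made-up three-row dataframe (`demo` is hypothetical, not part of the weather data). It shows that `sort_values` leaves the original untouched, and it adds an optional `reset_index` step that the notebook itself does not use.

```python
import pandas as pd

# Hypothetical three-row frame, deliberately out of chronological order.
demo = pd.DataFrame({"year": [1951, 1950, 1950], "daynum": [1, 2, 1]})

# sort_values returns a new, sorted frame; demo keeps its original row order.
sorted_demo = demo.sort_values(by=["year", "daynum"])

# Optional: renumber the rows 0..n-1 so the index follows the new order.
sorted_demo = sorted_demo.reset_index(drop=True)

print(demo["year"].tolist())         # original order is untouched
print(sorted_demo["year"].tolist())  # chronological order
```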

4. Let's calculate some summary information for our local weather station in Hanover (from 1950 to 2022).

In [6]:
weather_df['TMAX'].mean()

56.739629572937396

In [7]:
weather_df.head()

Unnamed: 0.1,Unnamed: 0,STATION,NAME,LATITUDE,LONGITUDE,ELEVATION,DATE,DAPR,DASF,MDPR,...,WT08,WT09,WT11,WT14,month,day,year,TMAX72,diff_from72avg,daynum
0,0,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/1/1950,,,,...,,,,,1,1,1950,28.591549,3.408451,1
72,72,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/2/1950,,,,...,,,,1.0,1,2,1950,29.830986,3.169014,2
144,144,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/3/1950,,,,...,,,,,1,3,1950,30.263889,11.736111,3
217,217,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/4/1950,,,,...,,,,,1,4,1950,30.847222,29.152778,4
290,290,USC00273850,"HANOVER, NH US",43.7052,-72.2855,178.0,1/5/1950,,,,...,,,,,1,5,1950,30.291667,30.708333,5
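A single overall mean hides a lot, so here is a sketch of two other summaries that could be run at this point. The `sample` dataframe is a made-up stand-in with only the "year" and "TMAX" columns; on the real data the same two calls would be applied to `weather_df`.

```python
import pandas as pd

# Hypothetical stand-in for weather_df with only the columns used here.
sample = pd.DataFrame({
    "year": [1950, 1950, 1951, 1951],
    "TMAX": [32.0, 60.0, 30.0, 70.0],
})

# Count, mean, spread, and quartiles of the daily maximum in one call.
overall = sample["TMAX"].describe()

# Mean daily maximum for each year, which makes year-to-year drift visible.
yearly_mean = sample.groupby("year")["TMAX"].mean()

print(overall)
print(yearly_mean)
```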


5. First, let's plot the max temperature for 2022 and compare it to the 73-year average. In this case we will use **Plotly Express** instead of **matplotlib** and **seaborn** so that the plots are interactive.

In [8]:
weather_df2022 = weather_df[weather_df['year'] == 2022]
weather_df2022.shape

(365, 32)

In [9]:
import plotly.express as px
fig = px.bar(weather_df2022, x='daynum', y='diff_from72avg', color='diff_from72avg')
fig.show()

6. With one line of code, we can also create a facet grid plotting all years in the 21st century against the 73-year average. *From a quick visual overview, which year(s) appear to have been the hottest since 2000?*

In [10]:
fig = px.bar(weather_df[weather_df['year']>1999], x='daynum', y='diff_from72avg', color='diff_from72avg', facet_row="year", height=6000, hover_data = ["month", "day", "TMAX"], facet_row_spacing=0.01)
fig.show()
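The visual question above can also be checked numerically. This is a rough sketch on a made-up `sample` dataframe: it averages "diff_from72avg" within each year and sorts the years from warmest to coolest. On the real data, years with many missing days would deserve a closer look, since `mean()` simply skips missing values.

```python
import pandas as pd

# Hypothetical stand-in for weather_df: two days in each of three years.
sample = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001, 2002, 2002],
    "diff_from72avg": [1.0, -3.0, 4.0, 2.0, 0.5, 0.5],
})

recent = sample[sample["year"] >= 2000]

# Average departure from the long-run daily average, warmest year first.
yearly_anomaly = (
    recent.groupby("year")["diff_from72avg"]
    .mean()
    .sort_values(ascending=False)
)
print(yearly_anomaly)
```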

7. We can also aggregate all 21st century years and then compare them to the 73-year average.


In [11]:
weather_df21C = weather_df[weather_df['year']>=2000]
weather_df21Cagg = weather_df21C.groupby(by = ['day', 'month']).agg({'TMAX':'mean', 'TMAX72':'first', 'daynum': 'first'})
weather_df21Cagg['diff_from72avg'] = weather_df21Cagg['TMAX'] - weather_df21Cagg['TMAX72']
weather_df21Cagg.head()

day,month,TMAX,TMAX72,daynum,diff_from72avg
1,1,32.217391,28.591549,1,3.625842
1,2,30.913043,30.194444,32,0.718599
1,3,36.869565,37.619718,60,-0.750153
1,4,51.565217,51.492958,87,0.07226
1,5,64.608696,65.902778,117,-1.294082


7b. Observe below how warming temperatures in this century become much clearer when we compare the average for this century with the average going back to 1950. (Note: we are contrasting the average max temp for 2000 - 2022 with that for 1950 - 2022, so the recent years are also part of the baseline; with more time, it may make more sense to contrast 2000 - 2022 with 1950 - **2000**.)

In [12]:
fig = px.bar(weather_df21Cagg, x='daynum', y='diff_from72avg', color='diff_from72avg')
fig.show()
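Here is a sketch of the non-overlapping comparison mentioned in the note. It is an assumption about how that could be done, not something I ran on the full dataset: the `sample` dataframe is made up, and I split at 1999/2000 so that no year falls in both periods.

```python
import pandas as pd

# Hypothetical stand-in for weather_df: one calendar day observed in four years.
sample = pd.DataFrame({
    "year":   [1950, 1975, 2000, 2020],
    "month":  [1, 1, 1, 1],
    "day":    [1, 1, 1, 1],
    "daynum": [1, 1, 1, 1],
    "TMAX":   [28.0, 30.0, 33.0, 35.0],
})

keys = ["month", "day"]
before_2000 = sample[sample["year"] < 2000]
since_2000 = sample[sample["year"] >= 2000]

# Per-calendar-day averages for each period.
baseline = before_2000.groupby(keys)["TMAX"].mean().rename("TMAX_baseline")
recent = since_2000.groupby(keys).agg(TMAX=("TMAX", "mean"), daynum=("daynum", "first"))

# Align the two periods on (month, day) and take the difference.
comparison = recent.join(baseline)
comparison["diff_from_baseline"] = comparison["TMAX"] - comparison["TMAX_baseline"]
print(comparison.reset_index())
```

The resulting "diff_from_baseline" column could be passed to `px.bar` in place of "diff_from72avg".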

8. The same code, but focusing on 2010 - 2022.

In [13]:
weather_df2010on = weather_df[weather_df['year']>=2010]
weather_df2010on = weather_df2010on.groupby(by = ['day', 'month']).agg({'TMAX':'mean', 'TMAX72':'first', 'daynum': 'first'})
weather_df2010on['diff_from72avg'] = weather_df2010on['TMAX'] - weather_df2010on['TMAX72']
fig = px.bar(weather_df2010on, x='daynum', y='diff_from72avg', color='diff_from72avg')
fig.show()
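Since steps 7 and 8 repeat the same three lines with a different start year, they could be folded into a small helper. The function name `average_diff_since` and the `sample` dataframe are hypothetical; the body mirrors the groupby used above and adds a sort by "daynum" so the rows run in calendar order.

```python
import pandas as pd

def average_diff_since(df: pd.DataFrame, start_year: int) -> pd.DataFrame:
    """Average TMAX per calendar day from start_year onward, minus the long-run average."""
    subset = df[df["year"] >= start_year]
    out = subset.groupby(["day", "month"]).agg(
        {"TMAX": "mean", "TMAX72": "first", "daynum": "first"}
    )
    out["diff_from72avg"] = out["TMAX"] - out["TMAX72"]
    # Sort by day of year so the rows run from January to December.
    return out.sort_values("daynum")

# Hypothetical stand-in for weather_df.
sample = pd.DataFrame({
    "year":   [2005, 2015, 2005, 2015],
    "month":  [1, 1, 1, 1],
    "day":    [1, 1, 2, 2],
    "daynum": [1, 1, 2, 2],
    "TMAX":   [30.0, 34.0, 28.0, 36.0],
    "TMAX72": [29.0, 29.0, 30.0, 30.0],
})

since_2000 = average_diff_since(sample, 2000)
since_2010 = average_diff_since(sample, 2010)
print(since_2010)
```

With the real data, `average_diff_since(weather_df, 2010)` would give the dataframe plotted in step 8.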