# Assignment 2

A NOAA dataset has been stored in the file `data/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv`. The data for this assignment comes from a subset of the National Centers for Environmental Information (NCEI) [Daily Global Historical Climatology Network](https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt) (GHCN-Daily). GHCN-Daily comprises daily climate records from thousands of land surface stations across the globe.

* **id** : station identification code
* **date** : date in YYYY-MM-DD format (e.g. 2012-01-24 = January 24, 2012)
* **element** : indicator of element type
    * TMAX : Maximum temperature (tenths of degrees C)
    * TMIN : Minimum temperature (tenths of degrees C)
* **value** : data value for element (tenths of degrees C)
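
Because `TMAX` and `TMIN` are stored in tenths of degrees C, values must be divided by 10 before they can be read as degrees. The following is a minimal sketch, assuming the CSV headers are `ID`, `Date`, `Element`, and `Data_Value` (the names used by the code later in this notebook). The rows, the station ID, and the `Temp_C` column are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Illustrative rows only (not real observations); the station ID is hypothetical
sample = pd.DataFrame({
    "ID": ["STATION_A", "STATION_A"],
    "Date": ["2012-01-24", "2012-01-24"],
    "Element": ["TMAX", "TMIN"],
    "Data_Value": [22, -105],  # tenths of degrees C
})

# Convert tenths of degrees C to degrees C
sample["Temp_C"] = sample["Data_Value"] / 10
print(sample[["Element", "Data_Value", "Temp_C"]])
```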

1. Read the documentation and familiarize yourself with the dataset, then write Python code that returns a line graph of the record high and record low temperatures by day of the year over the period 2005-2014. The area between the record high and record low temperatures for each day should be shaded.
2. Overlay a scatter of the 2015 data for any points (highs and lows) for which the ten-year (2005-2014) record high or record low was broken in 2015.
3. Watch out for leap days (i.e. February 29th); it is reasonable to remove these points from the dataset for the purpose of this visualization.
4. Make the visual nice! Leverage principles from the first module in this course when developing your solution. Consider issues such as legends, labels, and chart junk.
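
The first three steps above can be sketched on a toy example. This is a sketch under stated assumptions rather than the assignment solution: the helper name `broken_records` and the sample values are hypothetical, and the input is assumed to be a Series of daily temperatures with a `DatetimeIndex`.

```python
import pandas as pd

def broken_records(temps: pd.Series, kind: str = "max") -> pd.Series:
    """Return the 2015 values that beat the 2005-2014 record for the same calendar day.

    Hypothetical helper: `temps` is assumed to be daily values with a DatetimeIndex.
    """
    # Drop leap days so every year covers the same 365 calendar days
    is_leap_day = (temps.index.month == 2) & (temps.index.day == 29)
    temps = temps[~is_leap_day]

    # Ten-year baseline, grouped by calendar day ("MM-DD")
    year = temps.index.year
    base = temps[(year >= 2005) & (year <= 2014)]
    grouped = base.groupby(base.index.strftime("%m-%d"))
    record = grouped.max() if kind == "max" else grouped.min()

    # Look up the baseline record for each 2015 calendar day and compare
    y2015 = temps[temps.index.year == 2015]
    record_2015 = record.reindex(y2015.index.strftime("%m-%d")).to_numpy()
    beats = y2015 > record_2015 if kind == "max" else y2015 < record_2015
    return y2015[beats]

# Made-up values in tenths of degrees C
temps = pd.Series(
    [100, 120, 90, 130, 110, 95],
    index=pd.to_datetime([
        "2005-07-01", "2010-07-01", "2012-02-29",
        "2015-07-01", "2014-07-02", "2015-07-02",
    ]),
)
highs = broken_records(temps, "max")
lows = broken_records(temps, "min")
```

In this toy data, `highs` holds the 2015-07-01 value, which exceeds the July 1 baseline maximum, and `lows` holds the 2015-07-02 value, which falls below the July 2 baseline minimum. The leap-day row is ignored.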

The data you have been given is near **Ann Arbor, Michigan, United States**, and the stations the data comes from are shown on the map below.

In [2]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
#import mplleaflet

def leaflet_plot():
    ## load the dataset from the location given in the assignment text
    df = pd.read_csv('data/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv')

    df.set_index('Element', inplace=True)

    ## create two dataframes based on min and max
    df_Max = df.loc['TMAX'].rename(columns={'Data_Value': 'TMAX'})
    df_Min = df.loc['TMIN'].rename(columns={'Data_Value': 'TMIN'})
    ## the inner merge keeps only station-days that report both TMAX and TMIN
    df = pd.merge(df_Max, df_Min, on=["Date", 'ID'], how='inner')
    ## drop leap days
    df = df[~df["Date"].str.contains("-02-29")]
    ## "MM-DD" key taken from the "YYYY-MM-DD" date string
    df["Month-Day"] = df["Date"].str[5:]
    df['Date'] = pd.to_datetime(df['Date'])
    df = df.sort_values(by="Date")

    ## create a dataframe of values before 2015
    df_bef2015 = df[df['Date'] < '2015-01-01']

    ## create a dataframe of values in 2015
    df_2015 = df[(df['Date'] >= '2015-01-01') & (df['Date'] < '2016-01-01')]

    ## group by month and day
    ## note: max()/min() are applied to each column independently, so the ID and
    ## Date columns in the result do not identify where or when a record was set
    tmax = df_bef2015.groupby("Month-Day").max().drop('TMIN', axis=1)
    tmin = df_bef2015.groupby("Month-Day").min().drop('TMAX', axis=1)

    tmax_2015 = df_2015.groupby("Month-Day").max().drop('TMIN', axis=1)
    tmin_2015 = df_2015.groupby("Month-Day").min().drop('TMAX', axis=1)

    ## reset the index and then merge the dataframes
    tmin_2015 = tmin_2015.reset_index()
    df_min = tmin.reset_index().merge(tmin_2015, on="Month-Day").set_index("Month-Day")

    tmax_2015 = tmax_2015.reset_index()
    df_max = tmax.reset_index().merge(tmax_2015, on="Month-Day").set_index("Month-Day")

    ## rename the columns to reduce confusion
    df_max.rename(columns={"TMAX_x": "Tmax", "TMAX_y": "Tmax 2015"}, inplace=True)
    df_min.rename(columns={"TMIN_x": "Tmin", "TMIN_y": "Tmin 2015"}, inplace=True)

    ## keep a 2015 value only where it broke the 2005-2014 record; otherwise NaN
    df_min["Tmin 2015"] = df_min["Tmin 2015"].where(df_min["Tmin 2015"] < df_min["Tmin"])
    df_max["Tmax 2015"] = df_max["Tmax 2015"].where(df_max["Tmax 2015"] > df_max["Tmax"])

    ## return both dataframes
    return df_min, df_max

## place the returned dataframes into new variables
df_min, df_max = leaflet_plot()

## sort by month-day and reset the index so the rows can be plotted by position
df_min = df_min.sort_index().reset_index()
df_max = df_max.sort_index().reset_index()

In [24]:
##Checking how one of the dataframes looks
df_min.head()

|   | Month-Day | ID_x | Date_x | Tmin | ID_y | Date_y | Tmin 2015 |
|---|---|---|---|---|---|---|---|
| 0 | 01-01 | USC00200032 | 2005-01-01 | -160 | USC00200032 | 2015-01-01 | NaN |
| 1 | 01-02 | USC00200032 | 2005-01-02 | -267 | USC00200032 | 2015-01-02 | NaN |
| 2 | 01-03 | USC00200032 | 2005-01-03 | -267 | USC00200032 | 2015-01-03 | NaN |
| 3 | 01-04 | USC00200032 | 2005-01-04 | -261 | USC00200032 | 2015-01-04 | NaN |
| 4 | 01-05 | USC00200032 | 2005-01-05 | -150 | USC00200032 | 2015-01-05 | -155.0 |
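
One caveat when reading this output: `groupby(...).min()` takes the minimum of each column independently, so `ID_x` and `Date_x` hold the smallest station ID and the earliest date within each group, not the station and date at which the record was set. If the actual station and date are wanted, one option is `idxmin`. A minimal sketch on made-up rows (the station IDs and values are hypothetical):

```python
import pandas as pd

# Illustrative rows only; values are in tenths of degrees C
obs = pd.DataFrame({
    "ID": ["STATION_B", "STATION_A", "STATION_A"],
    "Date": ["2005-01-01", "2009-01-01", "2005-01-02"],
    "Month-Day": ["01-01", "01-01", "01-02"],
    "TMIN": [-160, -200, -267],
})

# Column-wise minima: ID and Date are minimised independently of TMIN
columnwise = obs.groupby("Month-Day").min()

# Rows holding the actual record low for each calendar day
record_rows = obs.loc[obs.groupby("Month-Day")["TMIN"].idxmin()]

print(columnwise)
print(record_rows)
```

For `01-01`, the column-wise result pairs the record value with the earliest date in the group, while `record_rows` keeps the date and station of the row where the minimum actually occurred.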


In [22]:
%matplotlib notebook

In [23]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

## values are stored in tenths of degrees C, so convert to degrees C for plotting
tmin = df_min["Tmin"] / 10
tmax = df_max["Tmax"] / 10

## plot the 2005-2014 min and max lines
ax.plot(df_min.index, tmin, label="Min", c='orange')
ax.plot(df_max.index, tmax, label="Max", c='purple')

## plot the 2015 min and max values that broke a record
ax.scatter(df_min.index, df_min["Tmin 2015"] / 10, s=10, c="blue", alpha=0.5, label="Min (2015)", zorder=10)
ax.scatter(df_max.index, df_max["Tmax 2015"] / 10, s=10, c="red", alpha=0.5, label="Max (2015)", zorder=10)

## fill the area between the min and max lines
ax.fill_between(df_min.index, tmin, tmax, facecolor='green')

## label the x axis with the first day of each month (MM-DD);
## the labels are rotated 45 degrees for readability
month_starts = df_min.index[df_min["Month-Day"].str.endswith("-01")]
ax.set_xticks(month_starts)
ax.set_xticklabels(df_min.loc[month_starts, "Month-Day"], rotation=45)

## simplify the graph by adding a legend, removing some of the tick marks,
## and removing the spines on the top and right
## the spines on the bottom and left are kept for axis visualization
## the grid is set to False since the plot is interactive
ax.legend(loc=9, bbox_to_anchor=(0.5, -0.25), ncol=4, frameon=False)
ax.tick_params(top=False, bottom=True, left=False, right=False, labelleft=True, labelbottom=True)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.grid(False)

## set the labels and title
ax.set_xlabel("Date")
ax.set_ylabel("Temperature (°C)")
ax.set_title("Daily Minimum and Maximum\nAnn Arbor, Michigan")

## adjust the bottom of the plot so the legend can be read
fig.subplots_adjust(bottom=0.35)

## show the plot
plt.show()
