# Bonus: Temperature Analysis I

In [1]:
import pandas as pd
from datetime import datetime as dt

In [2]:
# "tobs" is "temperature observations"
hawaii_df = pd.read_csv('Resources/hawaii_measurements.csv')
hawaii_df

,station,date,prcp,tobs
0,USC00519397,2010-01-01,0.08,65
1,USC00519397,2010-01-02,0.00,63
2,USC00519397,2010-01-03,0.00,74
3,USC00519397,2010-01-04,0.00,76
4,USC00519397,2010-01-06,,73
...,...,...,...,...
19545,USC00516128,2017-08-19,0.09,71
19546,USC00516128,2017-08-20,,78
19547,USC00516128,2017-08-21,0.56,76
19548,USC00516128,2017-08-22,0.50,76


In [3]:
# Convert the date column from string to datetime
# (infer_datetime_format is deprecated in pandas 2.x, so the format is given explicitly)
hawaii_df['date'] = pd.to_datetime(hawaii_df['date'], format='%Y-%m-%d')

In [4]:
# Set the date column as the DataFrame index
hawaii_df = hawaii_df.set_index(hawaii_df['date'])
hawaii_df.head()

date,station,date,prcp,tobs
2010-01-01,USC00519397,2010-01-01,0.08,65
2010-01-02,USC00519397,2010-01-02,0.0,63
2010-01-03,USC00519397,2010-01-03,0.0,74
2010-01-04,USC00519397,2010-01-04,0.0,76
2010-01-06,USC00519397,2010-01-06,,73


In [5]:
# Drop the date column
hawaii_df = hawaii_df.drop(columns='date')
hawaii_df.head()

date,station,prcp,tobs
2010-01-01,USC00519397,0.08,65
2010-01-02,USC00519397,0.0,63
2010-01-03,USC00519397,0.0,74
2010-01-04,USC00519397,0.0,76
2010-01-06,USC00519397,,73


### Compare June and December data across all years 

In [9]:
from scipy import stats

In [10]:
# Filter data for the desired months
# Create a DataFrame with only June dates
june_df = hawaii_df[hawaii_df.index.month==6]
june_df

date,station,prcp,tobs
2010-06-01,USC00519397,0.00,78
2010-06-02,USC00519397,0.01,76
2010-06-03,USC00519397,0.00,78
2010-06-04,USC00519397,0.00,76
2010-06-05,USC00519397,0.00,77
...,...,...,...
2017-06-26,USC00516128,0.02,79
2017-06-27,USC00516128,0.10,74
2017-06-28,USC00516128,0.02,74
2017-06-29,USC00516128,0.04,76


In [11]:
# Same process, but for December
december_df = hawaii_df[hawaii_df.index.month==12]
december_df

date,station,prcp,tobs
2010-12-01,USC00519397,0.04,76
2010-12-03,USC00519397,0.00,74
2010-12-04,USC00519397,0.00,74
2010-12-06,USC00519397,0.00,64
2010-12-07,USC00519397,0.00,64
...,...,...,...
2016-12-27,USC00516128,0.14,71
2016-12-28,USC00516128,0.14,71
2016-12-29,USC00516128,1.03,69
2016-12-30,USC00516128,2.37,65


In [12]:
# Identify the average temperature for June
june_mean = june_df['tobs'].mean()
june_mean

74.94411764705882

In [13]:
# Identify the average temperature for December
december_mean = december_df['tobs'].mean()
december_mean

71.04152933421226

In [14]:
# Create collections of temperature data
june_temps = june_df['tobs']
december_temps = december_df['tobs']

In [39]:
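# Run paired t-test. ttest_rel needs equal-length samples, so the trimming
# cells below (In [37] and In [36]) were executed before this one.
stats.ttest_rel(june_temps, december_temps)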


Ttest_relResult(statistic=31.752653814106903, pvalue=7.2369972334754e-170)

In [37]:
# Keep only the June observations from before 2017
june_temps = june_temps[june_temps.index.year < 2017]

In [36]:
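# Keep only the December observations from before 2017, then truncate to 1509 rows
# (presumably the length of the filtered June series) so both samples have equal length
december_temps = december_temps[december_temps.index.year < 2017][:1509]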
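
The hard-coded `1509` above could also be derived from the data. The following is a minimal sketch, assuming that number is simply the length of the shorter series; `trim_to_common_length` and the toy series are hypothetical names and values, not part of the original notebook.

```python
import pandas as pd


def trim_to_common_length(a, b):
    """Cut both series to the length of the shorter one (hypothetical helper)."""
    n = min(len(a), len(b))
    return a.iloc[:n], b.iloc[:n]


# Toy stand-ins for june_temps and december_temps (made-up values).
june_toy = pd.Series([78, 76, 78, 76, 77], name="tobs")
december_toy = pd.Series([76, 74, 74], name="tobs")

june_trimmed, december_trimmed = trim_to_common_length(june_toy, december_toy)
print(len(june_trimmed), len(december_trimmed))
```

Note that this kind of truncation pairs observations by position only, not by station or calendar day.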

### Analysis

There is a statistically significant difference between the mean temperature in June and the mean temperature in December in Hawaii. To evaluate whether the difference in means was statistically significant, a paired t-test was used, because we are comparing two "subjects" (the temperature data in this case) at different points in time, and both "subjects" come from the same sample. The resulting p-value (about 7.2e-170) is far below 0.05, so we can reject the null hypothesis, which states that there is no difference between the mean temperature in Hawaii in June and in December. Therefore, we can conclude that the difference in means between the two months is statistically significant.
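
To make the decision rule concrete, here is a minimal sketch on synthetic data. The sample values, the `alpha` threshold variable, and all names below are assumptions for illustration, not the Hawaii measurements. It also shows that a paired t-test is the same as a one-sample t-test on the pairwise differences.

```python
import numpy as np
from scipy import stats

# Synthetic, hypothetical readings: NOT the Hawaii measurements.
rng = np.random.default_rng(0)
june_sample = rng.normal(loc=75.0, scale=3.0, size=200)
december_sample = rng.normal(loc=71.0, scale=3.5, size=200)

# Paired t-test: both samples must have the same length.
paired = stats.ttest_rel(june_sample, december_sample)

# Equivalent one-sample t-test on the pairwise differences.
diff_test = stats.ttest_1samp(june_sample - december_sample, popmean=0.0)

# Reject the null hypothesis of "no difference in means" when p < alpha.
alpha = 0.05
reject_null = paired.pvalue < alpha
print(paired.statistic, paired.pvalue, reject_null)
```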
 
However, whether the difference is a "meaningful difference", as the prompt asks, is a different question. What constitutes a "meaningful difference", and whether Hawaii truly enjoys "mild weather", is subjective. In my opinion, the difference is not meaningful: the means for the two months are only about 4 degrees apart (74.9 vs. 71.0), which I consider negligible. I grew up in, and currently live in, areas where the temperatures in June and December are normally about 70 degrees apart. Furthermore, in both months the mean temperature is nearly equal to the median temperature, so the weather is very consistent and the mean is a good representation of the typical weather. Therefore, I would still fully agree with the notion that Hawaii enjoys mild weather all year round, and I think many others would as well.
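
The mean-versus-median comparison and the gap between the monthly means can be checked with a short summary. This is a sketch on a hypothetical miniature frame shaped like `hawaii_df` (a datetime index plus a `tobs` column); the values are made up.

```python
import pandas as pd

# Hypothetical miniature frame shaped like hawaii_df; values are made up.
toy = pd.DataFrame(
    {"tobs": [78, 76, 77, 75, 72, 70, 71, 69]},
    index=pd.to_datetime([
        "2010-06-01", "2010-06-02", "2011-06-01", "2011-06-02",
        "2010-12-01", "2010-12-02", "2011-12-01", "2011-12-02",
    ]),
)

# Mean, median, and standard deviation of tobs for each calendar month.
summary = toy.groupby(toy.index.month)["tobs"].agg(["mean", "median", "std"])

# Gap between the June (6) and December (12) means.
mean_gap = summary.loc[6, "mean"] - summary.loc[12, "mean"]
print(summary)
print(mean_gap)
```

Running the same `groupby` on the real `hawaii_df` would show how close the mean and median are in each month.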