# Bonus: Temperature Analysis I

In [1]:
import pandas as pd
from datetime import datetime as dt

In [7]:
# "tobs" is "temperature observations"
hawaii_df = pd.read_csv('Resources/hawaii_measurements.csv')
hawaii_df

Unnamed: 0,station,date,prcp,tobs
0,USC00519397,2010-01-01,0.08,65
1,USC00519397,2010-01-02,0.00,63
2,USC00519397,2010-01-03,0.00,74
3,USC00519397,2010-01-04,0.00,76
4,USC00519397,2010-01-06,,73
...,...,...,...,...
19545,USC00516128,2017-08-19,0.09,71
19546,USC00516128,2017-08-20,,78
19547,USC00516128,2017-08-21,0.56,76
19548,USC00516128,2017-08-22,0.50,76


In [8]:
# Convert the date column from string to datetime
# (the dates are ISO formatted, so an explicit format avoids the deprecated infer_datetime_format flag)
hawaii_df['date'] = pd.to_datetime(hawaii_df['date'], format='%Y-%m-%d')

In [9]:
# Set the date column as the DataFrame index
hawaii_df = hawaii_df.set_index(hawaii_df['date'])
hawaii_df.head()

date,station,date,prcp,tobs
2010-01-01,USC00519397,2010-01-01,0.08,65
2010-01-02,USC00519397,2010-01-02,0.0,63
2010-01-03,USC00519397,2010-01-03,0.0,74
2010-01-04,USC00519397,2010-01-04,0.0,76
2010-01-06,USC00519397,2010-01-06,,73


In [16]:
# Drop the date column
hawaii_df = hawaii_df.drop(columns='date')
hawaii_df.head()

date,station,prcp,tobs
2010-01-01,USC00519397,0.08,65
2010-01-02,USC00519397,0.0,63
2010-01-03,USC00519397,0.0,74
2010-01-04,USC00519397,0.0,76
2010-01-06,USC00519397,,73
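
The three steps above (convert, set index, drop the leftover column) can also be collapsed into the read itself. This is a minimal sketch, not what the notebook ran: it assumes the CSV holds exactly the four columns `station,date,prcp,tobs`, and it inlines the first five rows from the output above through `io.StringIO` so it runs without the file. The names `csv_text` and `sample_df` are made up for illustration.

```python
import io
import pandas as pd

# First rows of the measurements table shown above, inlined so the sketch runs standalone.
# Assumption: the real file has the same four columns and no extra index column.
csv_text = """station,date,prcp,tobs
USC00519397,2010-01-01,0.08,65
USC00519397,2010-01-02,0.00,63
USC00519397,2010-01-03,0.00,74
USC00519397,2010-01-04,0.00,76
USC00519397,2010-01-06,,73
"""

# parse_dates converts the column to datetime; index_col makes it the index,
# so no separate set_index/drop step is needed
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

print(sample_df.index.dtype)
print(sample_df.columns.tolist())
```

With the real file, the same two keyword arguments would presumably go straight into the `pd.read_csv('Resources/hawaii_measurements.csv', ...)` call.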


### Compare June and December data across all years 

In [17]:
from scipy import stats

In [19]:
# Filter data for the desired months
# Create a DataFrame with just the June dates
june_df = hawaii_df[hawaii_df.index.month==6]
june_df

date,station,prcp,tobs
2010-06-01,USC00519397,0.00,78
2010-06-02,USC00519397,0.01,76
2010-06-03,USC00519397,0.00,78
2010-06-04,USC00519397,0.00,76
2010-06-05,USC00519397,0.00,77
...,...,...,...
2017-06-26,USC00516128,0.02,79
2017-06-27,USC00516128,0.10,74
2017-06-28,USC00516128,0.02,74
2017-06-29,USC00516128,0.04,76


In [20]:
# Same process, but for December
december_df = hawaii_df[hawaii_df.index.month==12]
december_df

date,station,prcp,tobs
2010-12-01,USC00519397,0.04,76
2010-12-03,USC00519397,0.00,74
2010-12-04,USC00519397,0.00,74
2010-12-06,USC00519397,0.00,64
2010-12-07,USC00519397,0.00,64
...,...,...,...
2016-12-27,USC00516128,0.14,71
2016-12-28,USC00516128,0.14,71
2016-12-29,USC00516128,1.03,69
2016-12-30,USC00516128,2.37,65


In [21]:
# Identify the average temperature for June
june_mean = june_df['tobs'].mean()
june_mean

74.94411764705882

In [22]:
# Identify the average temperature for December
december_mean = december_df['tobs'].mean()
december_mean

71.04152933421226
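
As a side note, the per-month means can also be pulled in one pass by grouping on the index month, and the median can be reported alongside the mean. The sketch below is only an illustration: `sample` holds the ten `tobs` values visible at the top of the June and December outputs above, not the full data set, and `sample` and `summary` are made-up names.

```python
import pandas as pd

# Ten rows copied from the June and December outputs above (a tiny excerpt, not the full data)
sample = pd.DataFrame(
    {'tobs': [78, 76, 78, 76, 77, 76, 74, 74, 64, 64]},
    index=pd.to_datetime([
        '2010-06-01', '2010-06-02', '2010-06-03', '2010-06-04', '2010-06-05',
        '2010-12-01', '2010-12-03', '2010-12-04', '2010-12-06', '2010-12-07',
    ]),
)
sample.index.name = 'date'

# Group by calendar month of the DatetimeIndex, then compute mean and median per month
summary = sample.groupby(sample.index.month)['tobs'].agg(['mean', 'median'])
print(summary)
```

On the full DataFrame the same call would be `hawaii_df.groupby(hawaii_df.index.month)['tobs'].agg(['mean', 'median'])`, which returns one row per month, with June at 6 and December at 12.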

In [24]:
# Create collections of temperature data
june_temps = june_df['tobs']
december_temps = december_df['tobs']

In [26]:
# Run an independent (unpaired) t-test; ttest_ind does not pair observations
stats.ttest_ind(june_temps, december_temps)

Ttest_indResult(statistic=31.60372399000329, pvalue=3.9025129038616655e-191)
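
One caveat on the call above: by default `stats.ttest_ind` assumes both groups share the same variance. Passing `equal_var=False` runs Welch's t-test instead, which drops that assumption. A paired test such as `stats.ttest_rel` would need matched observations of equal length, which the June and December series are not. The sketch below uses synthetic numbers as stand-ins (the means and spreads are invented, not taken from the Hawaii data) just to show the two calls side by side.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the June and December 'tobs' series (NOT the Hawaii data);
# the locations, scales, and sizes are arbitrary assumptions for illustration
rng = np.random.default_rng(42)
june_like = rng.normal(loc=75, scale=3, size=200)
december_like = rng.normal(loc=71, scale=4, size=180)

# Default: Student's t-test, which assumes equal variances in both groups
student = stats.ttest_ind(june_like, december_like)

# equal_var=False switches to Welch's t-test, which allows unequal variances
welch = stats.ttest_ind(june_like, december_like, equal_var=False)

print(student.pvalue, welch.pvalue)
```

In the notebook, the Welch variant would be `stats.ttest_ind(june_temps, december_temps, equal_var=False)`. With samples this large the conclusion is unlikely to change, but it is a cheap check.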

### Analysis

From a statistical standpoint, there is a statistically significant difference between the mean temperature in June and the mean temperature in December in Hawaii. To evaluate whether the difference in means was significant, I used an independent (unpaired) t-test, since the June and December observations form two separate "populations" of temperatures rather than matched pairs. Because the resulting p-value (about 3.9e-191) is far below 0.05, we can conclude that the difference in means is statistically significant.  
However, I don't believe the difference between the average temperature in June vs. December is meaningful from a "real-life" standpoint. The means for the two months are only about 4 degrees apart (74.9 vs. 71.0), which in reality is minuscule. Furthermore, in both months, the mean temperature is nearly equal to the median temperature, so the weather is very consistent and the mean is a good representation of the typical weather. In other words, there isn't a bunch of outliers skewing the data and creating a misrepresentation. In many places, the difference between the mean temperature in June vs. December is going to be a lot larger than that. Therefore, I would still fully agree with the notion that Hawaii enjoys mild weather all year round, and I think many others would as well. And not that it really matters, but I would give almost anything to live in an environment where the average temperatures in June and December are only about 4 degrees apart. That's plenty mild enough for me!
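
Since the argument above separates statistical significance from practical significance, one way to put a number on the practical side is an effect size such as Cohen's d (the difference in means divided by the pooled standard deviation). This is a sketch only: `cohens_d` is a helper written for illustration, not a SciPy or pandas function, and the toy inputs are not the Hawaii data.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance: sample variances (ddof=1) weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Toy values only: means 4 and 3, pooled standard deviation 2
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

In the notebook it would be called as `cohens_d(june_temps, december_temps)`. Unlike the p-value, the result does not shrink or grow just because the sample is large, so it speaks more directly to how big the June vs. December gap is relative to the day-to-day spread.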