# Project 2:  Holiday weather

by Duncan Wekesa, November 26, 2018



## Getting the data

Weather Underground keeps historical weather data collected in many airports around the world. Right-click on the following URL and choose 'Open Link in New Window' (or similar, depending on your browser):

http://www.wunderground.com/history

When the new page opens, start typing 'LHR' in the 'Location' input box. When the pop-up menu offers the option 'LHR, United Kingdom', select it and then click on 'Submit'.

When the next page opens with London Heathrow data, click on the 'Custom' tab and select the time period From: 1 January 2014 to: 31 December 2014 and then click on 'Get History'. The data for that year should then be displayed further down the page. 

You can copy each month's data directly from the browser to a text editor like Notepad or TextEdit, to obtain a single file with as many months as you wish.

Weather Underground has changed the way it provides data in the past and may do so again in the future.
I have therefore collated the data for the whole of 2014 in the provided 'London_2014.csv' file.

Now load the CSV file into a dataframe, making sure that any extra spaces are skipped. This notebook looks at Cape Town rather than London, so the file loaded here is 'CapeTown_CPT_2014.csv':

In [36]:
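import warnings
warnings.simplefilter('ignore', FutureWarning)

from datetime import datetime  # used later to slice rows by date
from pandas import *

# skipinitialspace=True drops the spaces that follow each comma in the file
capetown = read_csv('CapeTown_CPT_2014.csv', skipinitialspace=True)
capetown.head()
# show only the days with a mean temperature above 25 degrees C
capetown[(capetown['Mean TemperatureC']>25)]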

Unnamed: 0,Date,Max TemperatureC,Mean TemperatureC,Min TemperatureC,Dew PointC,MeanDew PointC,Min DewpointC,Max Humidity,Mean Humidity,Min Humidity,...,Max VisibilityKm,Mean VisibilityKm,Min VisibilitykM,Max Wind SpeedKm/h,Mean Wind SpeedKm/h,Max Gust SpeedKm/h,Precipitationmm,CloudCover,Events,WindDirDegrees<br />
29,2014-1-30,31,26,21,19,17,15,73,59,40,...,31.0,13.0,10.0,37,27,,0.0,1.0,,169<br />
30,2014-1-31,35,28,20,20,18,15,73,56,28,...,10.0,10.0,10.0,24,13,,0.0,1.0,,205<br />
44,2014-2-14,33,26,19,20,18,15,83,60,36,...,31.0,14.0,10.0,35,21,,0.0,1.0,,172<br />
45,2014-2-15,33,28,22,21,19,18,78,64,36,...,26.0,16.0,10.0,34,16,,0.0,1.0,,195<br />
46,2014-2-16,36,28,20,21,19,17,83,61,38,...,31.0,22.0,10.0,29,18,,0.0,1.0,,202<br />
47,2014-2-17,29,26,22,21,19,16,88,69,39,...,31.0,14.0,9.0,34,13,,0.0,4.0,,205<br />
48,2014-2-18,31,27,22,21,20,18,94,68,43,...,31.0,12.0,10.0,42,31,,0.0,1.0,,174<br />
55,2014-2-25,33,26,19,17,16,14,73,55,30,...,31.0,14.0,10.0,40,26,,0.0,1.0,,158<br />
56,2014-2-26,34,27,20,17,16,14,69,54,26,...,31.0,16.0,10.0,24,14,,0.0,1.0,,203<br />
100,2014-4-11,36,26,15,17,14,12,82,58,17,...,26.0,18.0,10.0,21,6,,0.0,,,114<br />
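
To see what `skipinitialspace=True` changes, here is a minimal sketch that reads a tiny made-up sample instead of the real file. The sample text and the variable names `raw`, `plain` and `trimmed` are assumptions for illustration only.

```python
from io import StringIO
import pandas as pd

# Made-up two-row sample with a space after every comma
raw = "Date, Mean TemperatureC, Events\n2014-1-30, 26, Rain\n2014-1-31, 28, Fog\n"

# Without the option, the spaces stay attached to names and text values
plain = pd.read_csv(StringIO(raw))

# With the option, spaces that follow a comma are skipped
trimmed = pd.read_csv(StringIO(raw), skipinitialspace=True)

print(list(plain.columns))
print(list(trimmed.columns))
```

Without the option, a lookup such as `plain['Mean TemperatureC']` would fail, because the column is really named `' Mean TemperatureC'` with a leading space.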


## Cleaning the data
First we need to clean up the data. I'm not going to make use of `'WindDirDegrees'` in my analysis, but you might in yours, so we'll rename `'WindDirDegrees<br />'` to `'WindDirDegrees'`.

In [37]:
capetown = capetown.rename(columns={'WindDirDegrees<br />' : 'WindDirDegrees'})

Next, remove the `<br />` HTML line breaks from the values in the `'WindDirDegrees'` column:

In [38]:
capetown['WindDirDegrees'] = capetown['WindDirDegrees'].str.rstrip('<br />')

Then change the values in the `'WindDirDegrees'` column to `float64`:

In [39]:
capetown['WindDirDegrees'] = capetown['WindDirDegrees'].astype('float64')   
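
The two cleaning steps above can be tried out on a small made-up column. Note that `rstrip('<br />')` removes any trailing characters drawn from the set `<`, `b`, `r`, space, `/` and `>`, not the exact text `<br />`. That works here because the values end in digits. The sketch below also shows `str.replace` as an alternative that removes the exact substring. The series `wind` and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up values in the same shape as the raw wind direction column
wind = pd.Series(['169<br />', '205<br />', '114<br />'])

# rstrip() strips trailing characters that belong to the given set
stripped = wind.str.rstrip('<br />')

# Alternative: remove the exact substring, treated as plain text rather than a regex
replaced = wind.str.replace('<br />', '', regex=False)

# Both give digit-only strings, which can then be converted to numbers
degrees = replaced.astype('float64')
print(degrees.tolist())
```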

We definitely need to change the values in the `'Date'` column into values of the `datetime64` date type.

In [40]:
capetown['Date'] = to_datetime(capetown['Date'])

We also need to change the index from the default to the `datetime64` values in the `'Date'` column so that it is easier to pull out rows between particular dates and display more meaningful graphs:

In [41]:
capetown.index = capetown['Date']
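
As a minimal sketch of why the date index helps, the example below builds a three-row dataframe with made-up values, converts the dates and then pulls out one month with a label slice. The names `df` and `january` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows using the same date format as the weather file
df = pd.DataFrame({'Date': ['2014-1-30', '2014-1-31', '2014-2-14'],
                   'Mean TemperatureC': [26, 28, 26]})

df['Date'] = pd.to_datetime(df['Date'])  # strings become datetime64 values
df.index = df['Date']                    # use the dates as row labels

# With a datetime index, rows between two dates can be selected by label
january = df.loc['2014-01-01':'2014-01-31']
print(january['Mean TemperatureC'].tolist())
```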

## Finding a summer break

According to meteorologists, summer extends for the whole months of June, July, and August in the northern hemisphere and the whole months of December, January, and February in the southern hemisphere. Cape Town is in the southern hemisphere, so I'm going to create a dataframe that holds just December, January, and February. The file only covers 2014, so a single slice from 1 December to 28 February would run backwards in time and return no rows. Instead I select the rows by the month of the `datetime` index, like this:

In [42]:
summer = capetown[capetown.index.month.isin([12, 1, 2])]
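
A slice on a sorted date index only returns rows when the start date is not later than the end date. The sketch below uses made-up daily data for 2014 to show an empty backwards slice next to a selection by month number. The names `year`, `backwards` and `by_month` are assumptions for illustration.

```python
from datetime import datetime
import pandas as pd

# Made-up daily data covering all of 2014
days = pd.date_range('2014-01-01', '2014-12-31', freq='D')
year = pd.DataFrame({'Mean TemperatureC': [20] * len(days)}, index=days)

# The start is later than the end, so the slice contains no rows
backwards = year.loc[datetime(2014, 12, 1):datetime(2014, 2, 28)]

# Selecting by month number keeps January, February and December
by_month = year[year.index.month.isin([12, 1, 2])]

print(len(backwards), len(by_month))
```

The month selection keeps two separate stretches of the same calendar year, so a line plot of it will join the end of February to the start of December with a straight segment.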

I now look for the days with warm temperatures.

In [46]:
summer[summer['Mean TemperatureC'] >=25]



It is best to see a graph of the temperature and look for the warmest period.

So next we tell Jupyter to display any graph created inside this notebook:

In [44]:
%matplotlib inline

Now let's plot the `'Mean TemperatureC'` for the summer:

In [45]:
summer['Mean TemperatureC'].plot(grid=True, figsize=(10,5))

The table of hot days printed when the data was loaded suggests that the second half of February is the warmest stretch, with mean temperatures above 25 degrees C on several days, so let's put precipitation on the graph too:

In [47]:
summer[['Mean TemperatureC', 'Precipitationmm']].plot(grid=True, figsize=(10,5))

Let's have a closer look by plotting just the mean temperature and precipitation for February.

In [None]:
february = summer.loc['2014-02-01' : '2014-02-28']
february[['Mean TemperatureC', 'Precipitationmm']].plot(grid=True, figsize=(10,5))

Going by that table, the second half of February looks pretty good: the mean temperature was above 25 degrees C from the 14th to the 18th and again on the 25th and 26th, and no rain was recorded on any of those days.
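
Reading a graph by eye can be backed up with a number. The sketch below is not part of the original analysis: it uses a 7-day rolling mean, a technique introduced here for illustration, on made-up temperatures to find the warmest week. The names `temps`, `weekly`, `start` and `end` are assumptions.

```python
import pandas as pd

# Made-up daily mean temperatures for two weeks
days = pd.date_range('2014-02-01', periods=14, freq='D')
temps = pd.Series([22, 23, 22, 24, 23, 22, 23, 26, 27, 28, 28, 27, 26, 24],
                  index=days)

# Average of each run of 7 consecutive days, labelled by the last day of the run
weekly = temps.rolling(window=7).mean()

# The label of the highest average marks the end of the warmest week
end = weekly.idxmax()
start = end - pd.Timedelta(days=6)
print(start.date(), end.date())
```

The same two lines could be applied to a real column such as `summer['Mean TemperatureC']`, provided the rows are consecutive days.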

## Conclusions

The data have shown when Cape Town was at its warmest in 2014: almost all the days with a mean temperature above 25 degrees C fall between the end of January and the end of February. Of course this is no guarantee that the weather pattern will repeat itself in future years. To make a sensible prediction we would need to analyse the summers for many more years. By the time you have finished this course you should be able to do that.