# Time Series End-to-End Exercise
***

### Earth Surface Temperature Analysis

- You will use the earth surface temperature data offered by Berkeley Earth through Kaggle.com. 
- You will select one location (a city, a state, or a region of similar size) and analyze how its temperature changes over time.
- You will then model those patterns to forecast temperature into the future (the horizon is up to you, but it should be long enough to be meaningful).
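
As a rough sketch of the first two steps, the snippet below picks one city out of a frame shaped like the Kaggle file, indexes it by date, and splits it chronologically so a later forecast can be checked against held-out months. The toy values, the helper names `select_city` and `time_split`, and the 75% split are all assumptions made for illustration, not part of the exercise.

```python
import pandas as pd

# Toy stand-in for the Kaggle file; the values are made up for illustration.
toy = pd.DataFrame({
    "dt": ["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01",
           "2000-01-01", "2000-02-01"],
    "AverageTemperature": [1.0, 2.5, 6.0, 10.5, 25.0, 26.0],
    "City": ["Berlin", "Berlin", "Berlin", "Berlin", "Lagos", "Lagos"],
})


def select_city(frame, city):
    """Return one city's temperatures as a Series indexed by date (hypothetical helper)."""
    subset = frame.loc[frame["City"] == city].copy()
    subset["dt"] = pd.to_datetime(subset["dt"])
    return subset.set_index("dt")["AverageTemperature"].sort_index()


def time_split(series, train_fraction=0.75):
    """Split chronologically: the earliest observations train, the latest test."""
    cutoff = int(len(series) * train_fraction)
    return series.iloc[:cutoff], series.iloc[cutoff:]


berlin = select_city(toy, "Berlin")
train, test = time_split(berlin)
print(len(train), len(test))
```

Splitting by position rather than at random keeps every test observation later than every training observation, which is what a forecast evaluation needs.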

# Acquire

#### We will obtain the data from this Kaggle dataset:
https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data

In [2]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np

from datetime import datetime
from sklearn.metrics import mean_squared_error
from math import sqrt

import matplotlib.pyplot as plt
#%matplotlib inline
import seaborn as sns
from pandas.plotting import register_matplotlib_converters

import statsmodels.api as sm
from statsmodels.tsa.api import Holt


In [3]:
#Read local .csv downloaded from kaggle.com and store data in a dataframe
df = pd.read_csv('GlobalLandTemperaturesByMajorCity.csv')
df.head()

| | dt | AverageTemperature | AverageTemperatureUncertainty | City | Country | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 0 | 1849-01-01 | 26.704 | 1.435 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W |
| 1 | 1849-02-01 | 27.434 | 1.362 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W |
| 2 | 1849-03-01 | 28.101 | 1.612 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W |
| 3 | 1849-04-01 | 26.14 | 1.387 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W |
| 4 | 1849-05-01 | 25.427 | 1.2 | Abidjan | Côte D'Ivoire | 5.63N | 3.23W |


# Prepare

In [4]:
#Look at data to see what preparation and cleaning needs to be done

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 239177 entries, 0 to 239176
Data columns (total 7 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   dt                             239177 non-null  object 
 1   AverageTemperature             228175 non-null  float64
 2   AverageTemperatureUncertainty  228175 non-null  float64
 3   City                           239177 non-null  object 
 4   Country                        239177 non-null  object 
 5   Latitude                       239177 non-null  object 
 6   Longitude                      239177 non-null  object 
dtypes: float64(2), object(5)
memory usage: 12.8+ MB
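
The `info()` output shows `dt`, `Latitude`, and `Longitude` stored as `object` (string) columns. One possible way to make them usable is sketched below on the two Abidjan rows shown earlier. `parse_coordinate` is a hypothetical helper, and it assumes every coordinate ends in a single hemisphere letter (N, S, E, or W).

```python
import pandas as pd


def parse_coordinate(text):
    """Convert strings such as '5.63N' or '3.23W' to signed floats (hypothetical helper).

    Assumes the last character is one of N, S, E, W; south and west become negative.
    """
    value, hemisphere = float(text[:-1]), text[-1].upper()
    return -value if hemisphere in ("S", "W") else value


sample = pd.DataFrame({
    "dt": ["1849-01-01", "1849-02-01"],
    "Latitude": ["5.63N", "5.63N"],
    "Longitude": ["3.23W", "3.23W"],
})

sample["dt"] = pd.to_datetime(sample["dt"])  # object -> datetime64
sample["Latitude"] = sample["Latitude"].map(parse_coordinate)
sample["Longitude"] = sample["Longitude"].map(parse_coordinate)
print(sample.dtypes)
```

With `dt` as a real datetime column, it can later serve as the index for resampling and plotting.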


In [7]:
#Looking for nulls
df.isna().sum()

dt                                   0
AverageTemperature               11002
AverageTemperatureUncertainty    11002
City                                 0
Country                              0
Latitude                             0
Longitude                            0
dtype: int64
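
The raw counts are easier to judge as a share of all rows. The sketch below shows the idea on a made-up four-row frame, then applies the same arithmetic to the counts reported above (11002 missing out of 239177 rows). The toy frame is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up frame: two of the four temperature readings are missing.
toy = pd.DataFrame({
    "AverageTemperature": [26.7, np.nan, 28.1, np.nan],
    "City": ["Abidjan"] * 4,
})

# The mean of a boolean mask is the fraction of True values, i.e. the share missing.
null_share = toy.isna().mean()
print(null_share)

# Same arithmetic with the counts from the output above.
overall = 11002 / 239177
print(f"{overall:.1%}")
```

A missing share of roughly 4.6% for the temperature columns suggests the gaps are limited, although they may still be concentrated in particular cities or years.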

In [12]:
#Select the rows where AverageTemperature is missing (isna() already returns booleans)
df[df.AverageTemperature.isna()]

Series([], dtype: int64)

In [11]:
df.AverageTemperature.min(), df.AverageTemperature.max()

(-26.772, 38.283)

In [17]:
#List the cities that have at least one missing AverageTemperature value
df.loc[df.AverageTemperature.isna(), 'City'].unique()

array(['Abidjan', 'Addis Abeba', 'Ahmadabad', 'Aleppo', 'Alexandria',
       'Ankara', 'Baghdad', 'Bangalore', 'Bangkok', 'Belo Horizonte',
       'Berlin', 'Bogotá', 'Bombay', 'Brasília', 'Cairo', 'Calcutta',
       'Cali', 'Cape Town', 'Casablanca', 'Changchun', 'Chengdu',
       'Chicago', 'Chongqing', 'Dakar', 'Dalian', 'Dar Es Salaam',
       'Delhi', 'Dhaka', 'Durban', 'Faisalabad', 'Fortaleza', 'Gizeh',
       'Guangzhou', 'Harare', 'Harbin', 'Ho Chi Minh City', 'Hyderabad',
       'Ibadan', 'Istanbul', 'Izmir', 'Jaipur', 'Jakarta', 'Jiddah',
       'Jinan', 'Kabul', 'Kano', 'Kanpur', 'Karachi', 'Kiev', 'Kinshasa',
       'Lagos', 'Lahore', 'Lakhnau', 'Lima', 'London', 'Luanda', 'Madras',
       'Madrid', 'Manila', 'Mashhad', 'Melbourne', 'Mogadishu',
       'Montreal', 'Moscow', 'Nagoya', 'Nagpur', 'Nairobi', 'Nanjing',
       'New Delhi', 'New York', 'Paris', 'Peking', 'Pune', 'Rangoon',
       'Rio De Janeiro', 'Riyadh', 'Rome', 'São Paulo',
       'Saint Petersburg', 'Salvad

In [18]:
#Count the missing AverageTemperature values for each city
df.groupby('City').agg({'AverageTemperature': lambda x: x.isnull().sum()})


| City | AverageTemperature |
|---|---|
| Abidjan | 200.0 |
| Addis Abeba | 286.0 |
| Ahmadabad | 165.0 |
| Aleppo | 190.0 |
| Alexandria | 3.0 |
| ... | ... |
| Tokyo | 5.0 |
| Toronto | 98.0 |
| Umm Durman | 89.0 |
| Wuhan | 1.0 |
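
When choosing a city to model, it may help to rank cities by how many readings they are missing. A minimal sketch on made-up data follows. It groups the boolean mask directly instead of using a lambda, and the `toy` frame and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: Abidjan has two gaps, Tokyo one, Wuhan none.
toy = pd.DataFrame({
    "City": ["Abidjan", "Abidjan", "Tokyo", "Tokyo", "Wuhan"],
    "AverageTemperature": [np.nan, np.nan, 5.0, np.nan, 17.0],
})

missing_by_city = (
    toy["AverageTemperature"]
    .isna()                        # True where the reading is missing
    .groupby(toy["City"])          # group the mask by city name
    .sum()                         # count the True values per city
    .astype(int)                   # keep the counts as integers
    .sort_values(ascending=False)  # most gaps first
)
print(missing_by_city)
```

Cities at the bottom of this ranking need the least gap handling before modeling.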


In [20]:
df.groupby('City')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fd1ba02d4f0>

In [21]:
df.isna().sum()

dt                                   0
AverageTemperature               11002
AverageTemperatureUncertainty    11002
City                                 0
Country                              0
Latitude                             0
Longitude                            0
dtype: int64