# Global Warming Analysis - Geospatial Data Analysis Project
Weekly challenge: Week 10  
Date: 9/26/2022

## 1. Problem statement / Objective: 
### To understand global warming through geospatial data analysis

## 2. Getting the system ready and loading the data

### Loading the packages

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt # for data visualization
import seaborn as sns # for data visualization
import plotly.express as px # for data visualization

from plotly.offline import init_notebook_mode # for data visualization
init_notebook_mode(connected = True)

import warnings
warnings.filterwarnings('ignore')

### Data

For this project we have 5 data files. We will read these files as and when needed.

1. GlobalLandTemperaturesByCity.csv
2. GlobalLandTemperaturesByCountry.csv
3. GlobalLandTemperaturesByMajorCity.csv
4. GlobalLandTemperaturesByState.csv
5. GlobalTemperatures.csv

### Reading the data

In [2]:
file = 'C:/Users/unpat/_Projects/Challenge_AprojectAweek_data_files/Week_10/GlobalLandTemperaturesByCountry.csv'
global_temp_country = pd.read_csv(file)
global_temp_country.head(3)

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,Country
0,1743-11-01,4.384,2.294,Åland
1,1743-12-01,,,Åland
2,1744-01-01,,,Åland


## 3. Understanding and cleaning the data (Data preprocessing)

In this step, we will understand our data and clean it wherever needed. This includes, but is not limited to, the meaning of each column, the data type of each column, the number of rows and columns, the statistical summary of the numerical columns, and the handling of null and duplicate values.

In [3]:
# Checking columns of the test data
global_temp_country.columns

Index(['dt', 'AverageTemperature', 'AverageTemperatureUncertainty', 'Country'], dtype='object')

### Description of each column

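dt: date of the record (monthly, in `YYYY-MM-DD` format)  
AverageTemperature: average land temperature for that month and country  
AverageTemperatureUncertainty: uncertainty associated with the average temperature  
Country: name of the country or region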

### Data type of each column

In [4]:
global_temp_country.dtypes

dt                                object
AverageTemperature               float64
AverageTemperatureUncertainty    float64
Country                           object
dtype: object

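There are two data types:  
* object: the general-purpose pandas dtype, used here for text (`dt` and `Country`)  
* float64: 64-bit floating-point numbers (the two temperature columns)  

We will convert `dt` to a datetime type when needed.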
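As a minimal sketch of that later conversion, the snippet below parses date strings in the same `YYYY-MM-DD` format with `pd.to_datetime`. The two-row `sample` frame is made up for illustration and is not the project data.

```python
import pandas as pd

# Hypothetical two-row sample in the same format as the dt column
sample = pd.DataFrame({'dt': ['1743-11-01', '1743-12-01']})

# Parse the strings into datetime64 values
sample['dt'] = pd.to_datetime(sample['dt'], format='%Y-%m-%d')

# Year and month are then available through the .dt accessor
sample['year'] = sample['dt'].dt.year
sample['month'] = sample['dt'].dt.month
print(sample.dtypes)
```

Once the column is a datetime, the year and month can be read directly instead of splitting the string.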

### Shape of the dataset

In [5]:
shape_ = global_temp_country.shape
shape_

(577462, 4)

There are 577462 rows and 4 columns in the dataset global_temp_country.

### Checking the statistical summary of numerical columns

In [6]:
global_temp_country.describe()

Unnamed: 0,AverageTemperature,AverageTemperatureUncertainty
count,544811.0,545550.0
mean,17.193354,1.019057
std,10.953966,1.20193
min,-37.658,0.052
25%,10.025,0.323
50%,20.901,0.571
75%,25.814,1.206
max,38.842,15.003


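The two columns are skewed in opposite directions. AverageTemperature has a mean (17.19) below its median (20.90), which suggests left skew, while AverageTemperatureUncertainty has a mean (1.02) above its median (0.57), which suggests right skew.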
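Comparing the mean with the median is one quick check of skewness, and `Series.skew()` gives a signed number for it. The sketch below uses two small made-up series (not the project data) to show both signs.

```python
import pandas as pd

# Hypothetical values with a long tail toward low values (left skew)
left_tailed = pd.Series([-30.0, 5.0, 18.0, 21.0, 23.0, 25.0, 26.0])
# Hypothetical values with a long tail toward high values (right skew)
right_tailed = pd.Series([0.1, 0.2, 0.3, 0.5, 0.6, 1.2, 15.0])

for name, s in [('left_tailed', left_tailed), ('right_tailed', right_tailed)]:
    # mean < median with negative skew suggests left skew; the reverse suggests right skew
    print(name, 'mean:', s.mean(), 'median:', s.median(), 'skew:', s.skew())
```

The same two calls can be run on each numerical column of `global_temp_country` to check the direction of skew.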

### Checking for the null values

In [7]:
# Count of null values
isna_ = global_temp_country.isna().sum()
isna_

dt                                   0
AverageTemperature               32651
AverageTemperatureUncertainty    31912
Country                              0
dtype: int64

In [8]:
# % of null values in AverageTemperature (selected by label rather than by position)
percent_avg_temp = isna_['AverageTemperature'] / shape_[0] * 100
percent_avg_temp

5.654224866744478

In [9]:
# % of null values in AverageTemperatureUncertainty (selected by label rather than by position)
percent_avg_temp_uncertain = isna_['AverageTemperatureUncertainty'] / shape_[0] * 100
percent_avg_temp_uncertain

5.5262510779930105

* Each of the two variables (AverageTemperature, AverageTemperatureUncertainty) has fewer than 6% of its rows missing.  
* Deleting these rows won't affect our analysis much, so I will delete them before moving forward.  
* The two columns are missing together: dropping the rows where AverageTemperature is null also removes every null in AverageTemperatureUncertainty, so I only need to filter on one column.
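The claim that one column's nulls cover the other's can be checked before dropping anything. The sketch below does this on a small made-up frame; the values are hypothetical, and only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, not the project data
sample = pd.DataFrame({
    'AverageTemperature': [4.4, np.nan, np.nan, 1.5],
    'AverageTemperatureUncertainty': [2.3, np.nan, 1.0, 4.7],
})

temp_missing = sample['AverageTemperature'].isna()
unc_missing = sample['AverageTemperatureUncertainty'].isna()

# True when every row missing the uncertainty is also missing the temperature
uncertainty_covered = bool((unc_missing & ~temp_missing).sum() == 0)

# Dropping on AverageTemperature alone then removes every incomplete row
cleaned = sample.dropna(subset=['AverageTemperature'])
print(uncertainty_covered, cleaned.isna().sum().sum())
```

If `uncertainty_covered` were `False`, the `subset` would need to list both columns.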

In [10]:
global_temp_country.dropna(
    axis = 'index',
    how = 'any',
    subset=['AverageTemperature'],
    inplace = True
)

In [11]:
# Checking for null values after deleting rows with null values
global_temp_country.isna().sum()

dt                               0
AverageTemperature               0
AverageTemperatureUncertainty    0
Country                          0
dtype: int64

We no longer have any missing values in our data.

### Review of the country column: # of countries, duplicate values

In [12]:
# Checking for # of countries
global_temp_country['Country'].nunique()

242

There are 242 unique values in the Country column. Note that some of them, such as Africa and Asia, are continents rather than countries.

In [13]:
# Checking if there are any duplicate values.
global_temp_country['Country'].unique()

array(['Åland', 'Afghanistan', 'Africa', 'Albania', 'Algeria',
       'American Samoa', 'Andorra', 'Angola', 'Anguilla',
       'Antigua And Barbuda', 'Argentina', 'Armenia', 'Aruba', 'Asia',
       'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Baker Island', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
       'Belize', 'Benin', 'Bhutan', 'Bolivia',
       'Bonaire, Saint Eustatius And Saba', 'Bosnia And Herzegovina',
       'Botswana', 'Brazil', 'British Virgin Islands', 'Bulgaria',
       'Burkina Faso', 'Burma', 'Burundi', "Côte D'Ivoire", 'Cambodia',
       'Cameroon', 'Canada', 'Cape Verde', 'Cayman Islands',
       'Central African Republic', 'Chad', 'Chile', 'China',
       'Christmas Island', 'Colombia', 'Comoros',
       'Congo (Democratic Republic Of The)', 'Congo', 'Costa Rica',
       'Croatia', 'Cuba', 'Curaçao', 'Cyprus', 'Czech Republic',
       'Denmark (Europe)', 'Denmark', 'Djibouti', 'Dominica',
       'Dominican Republic', 'Ecuador', 'Egypt'

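From the above array, we can see that a few countries appear under more than one name (for example, 'Denmark (Europe)' and 'Denmark'), so we will have to correct these names. Note that 'Congo (Democratic Republic Of The)' and 'Congo' are two different countries, so mapping one onto the other merges their records.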

In [14]:
dict_ = {
    'Congo (Democratic Republic Of The)':'Congo',
    'Denmark (Europe)':'Denmark',
    'France (Europe)':'France',
    'Netherlands (Europe)':'Netherlands',
    'United Kingdom (Europe)':'United Kingdom'
}

In [15]:
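# Assigning the result back avoids relying on an in-place change to a column selection
global_temp_country['Country'] = global_temp_country['Country'].replace(dict_)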
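Rather than reading the array by eye, candidate duplicates can be listed programmatically. The sketch below, on a made-up subset of names, flags entries that carry a parenthetical qualifier and whose plain form also appears in the column. It only produces candidates: a name such as 'Congo (Democratic Republic Of The)' would be flagged too, so the list still needs a manual review.

```python
import pandas as pd

# Hypothetical subset of country names
names = pd.Series(['Denmark (Europe)', 'Denmark', 'France (Europe)', 'France', 'Egypt'])

# Names that carry a parenthetical qualifier, e.g. "(Europe)"
with_suffix = names[names.str.contains(r'\(.*\)', regex=True)]

# Strip the qualifier to see which plain name each one would collide with
stripped = with_suffix.str.replace(r'\s*\(.*\)$', '', regex=True)
collisions = stripped[stripped.isin(names)]
print(collisions.tolist())
```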

## 4. Exploratory Data Analysis (EDA)

### Calculate the average temperature of each country

In [16]:
avg_temp = global_temp_country.groupby(['Country'])['AverageTemperature'].mean()
avg_temp

Country
Afghanistan       14.045007
Africa            24.074203
Albania           12.610646
Algeria           22.985112
American Samoa    26.611965
                    ...    
Western Sahara    22.319818
Yemen             26.253597
Zambia            21.282956
Zimbabwe          21.117547
Åland              5.291383
Name: AverageTemperature, Length: 237, dtype: float64

In [17]:
# Converting to a dataframe and resetting the index
avg_temp = avg_temp.to_frame().reset_index()
avg_temp

Unnamed: 0,Country,AverageTemperature
0,Afghanistan,14.045007
1,Africa,24.074203
2,Albania,12.610646
3,Algeria,22.985112
4,American Samoa,26.611965
...,...,...
232,Western Sahara,22.319818
233,Yemen,26.253597
234,Zambia,21.282956
235,Zimbabwe,21.117547


In [18]:
# Plotting choropleth map
fig = px.choropleth(
    data_frame=avg_temp,
    locations='Country',
    locationmode='country names',
    color='AverageTemperature',
    title='Choropleth Map of Average Temperature',
)
# fig.update_layout(title='Choropleth Map of Average Temperature')
fig.show()

### Is there global warming?

To see the effect of global warming, we need to know how the temperature changed over a long period of time. Global warming is a slow process and its effect can only be felt over a long period of time. We have monthly temperature data of many countries in the world. We will analyze this data and visualize the effect of global warming in this section.

#### Loading the 'GlobalTemperatures.csv' data file

In [19]:
file = 'C:/Users/unpat/_Projects/Challenge_AprojectAweek_data_files/Week_10/GlobalTemperatures.csv'
global_temp = pd.read_csv(file)
global_temp.head(5)

Unnamed: 0,dt,LandAverageTemperature,LandAverageTemperatureUncertainty,LandMaxTemperature,LandMaxTemperatureUncertainty,LandMinTemperature,LandMinTemperatureUncertainty,LandAndOceanAverageTemperature,LandAndOceanAverageTemperatureUncertainty
0,1750-01-01,3.034,3.574,,,,,,
1,1750-02-01,3.083,3.702,,,,,,
2,1750-03-01,5.626,3.076,,,,,,
3,1750-04-01,8.49,2.451,,,,,,
4,1750-05-01,11.573,2.072,,,,,,


In [20]:
global_temp.tail(5)

Unnamed: 0,dt,LandAverageTemperature,LandAverageTemperatureUncertainty,LandMaxTemperature,LandMaxTemperatureUncertainty,LandMinTemperature,LandMinTemperatureUncertainty,LandAndOceanAverageTemperature,LandAndOceanAverageTemperatureUncertainty
3187,2015-08-01,14.755,0.072,20.699,0.11,9.005,0.17,17.589,0.057
3188,2015-09-01,12.999,0.079,18.845,0.088,7.199,0.229,17.049,0.058
3189,2015-10-01,10.801,0.102,16.45,0.059,5.232,0.115,16.29,0.062
3190,2015-11-01,7.433,0.119,12.892,0.093,2.157,0.106,15.252,0.063
3191,2015-12-01,5.518,0.1,10.725,0.154,0.287,0.099,14.774,0.062


Global warming is a change over time, so to see its effect we need the temperature history (how the temperature varied over time). Our data goes back a few hundred years, so we will work with annual average temperatures.

In [21]:
# Concept of separating year from the date
global_temp['dt'][0].split('-')[0]

'1750'

In [22]:
# Function that will fetch year from date
def fetch_year(date):
    return date.split('-')[0]

In [23]:
# Applying fetch_year function to the dt column of the dataframe
global_temp['years'] = global_temp['dt'].apply(fetch_year)

In [24]:
global_temp.head()

Unnamed: 0,dt,LandAverageTemperature,LandAverageTemperatureUncertainty,LandMaxTemperature,LandMaxTemperatureUncertainty,LandMinTemperature,LandMinTemperatureUncertainty,LandAndOceanAverageTemperature,LandAndOceanAverageTemperatureUncertainty,years
0,1750-01-01,3.034,3.574,,,,,,,1750
1,1750-02-01,3.083,3.702,,,,,,,1750
2,1750-03-01,5.626,3.076,,,,,,,1750
3,1750-04-01,8.49,2.451,,,,,,,1750
4,1750-05-01,11.573,2.072,,,,,,,1750


In [25]:
# Grouping the data by years and aggregating 2 columns that we want to work on
data = global_temp.groupby('years').agg({'LandAverageTemperature':'mean',
                                  'LandAverageTemperatureUncertainty':'mean'}).reset_index()
data.head()

Unnamed: 0,years,LandAverageTemperature,LandAverageTemperatureUncertainty
0,1750,8.719364,2.637818
1,1751,7.976143,2.781143
2,1752,5.779833,2.977
3,1753,8.388083,3.176
4,1754,8.469333,3.49425


In [26]:
data['Uncertainity Top'] = data['LandAverageTemperature'] + \
    data['LandAverageTemperatureUncertainty']

data['Uncertainity Bottom'] = data['LandAverageTemperature'] - \
    data['LandAverageTemperatureUncertainty']

In [27]:
data.head()

Unnamed: 0,years,LandAverageTemperature,LandAverageTemperatureUncertainty,Uncertainity Top,Uncertainity Bottom
0,1750,8.719364,2.637818,11.357182,6.081545
1,1751,7.976143,2.781143,10.757286,5.195
2,1752,5.779833,2.977,8.756833,2.802833
3,1753,8.388083,3.176,11.564083,5.212083
4,1754,8.469333,3.49425,11.963583,4.975083


In [28]:
data.columns

Index(['years', 'LandAverageTemperature', 'LandAverageTemperatureUncertainty',
       'Uncertainity Top', 'Uncertainity Bottom'],
      dtype='object')

In [29]:
px.line(
    data_frame=data,
    x='years',
#    y=['LandAverageTemperature', 'Uncertainity Top', 'Uncertainity Bottom'],
    y=['LandAverageTemperature'],    
    render_mode='auto',
    title='Average Land Temperature in the World',
    width=800,
    height=450,
)

From the above plot, we can see that LandAverageTemperature has gone up over the last 50 years or so. The upward trend appears to start around 1970 and shows a gradual increase in temperature. Our analysis indicates that global warming is taking place.
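A visual reading of the trend can be backed by a number. The sketch below is one possible approach, not part of the original analysis: it smooths a yearly series with a 10-year rolling mean and fits a least-squares line to the years from 1970 on. The series is synthetic (flat until 1970, then rising by 0.02 per year), so the printed slope says nothing about the real data.

```python
import numpy as np
import pandas as pd

# Synthetic yearly series: flat until 1970, then rising 0.02 per year
years = np.arange(1900, 2016)
temps = 8.5 + np.where(years > 1970, (years - 1970) * 0.02, 0.0)
yearly = pd.DataFrame({'years': years, 'LandAverageTemperature': temps})

# 10-year rolling mean smooths year-to-year noise
yearly['rolling_10'] = yearly['LandAverageTemperature'].rolling(window=10).mean()

# Least-squares slope (temperature units per year) for the period from 1970 on
recent = yearly[yearly['years'] >= 1970]
slope, intercept = np.polyfit(recent['years'], recent['LandAverageTemperature'], deg=1)
print(f'Trend since 1970: {slope:.3f} per year')
```

With the real `data` frame, `years` would first need converting to integers, since `fetch_year` returns strings.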

### Analysis of average temperature in each season

In [30]:
global_temp.head()

Unnamed: 0,dt,LandAverageTemperature,LandAverageTemperatureUncertainty,LandMaxTemperature,LandMaxTemperatureUncertainty,LandMinTemperature,LandMinTemperatureUncertainty,LandAndOceanAverageTemperature,LandAndOceanAverageTemperatureUncertainty,years
0,1750-01-01,3.034,3.574,,,,,,,1750
1,1750-02-01,3.083,3.702,,,,,,,1750
2,1750-03-01,5.626,3.076,,,,,,,1750
3,1750-04-01,8.49,2.451,,,,,,,1750
4,1750-05-01,11.573,2.072,,,,,,,1750


We do not have any data on season, so we will separate the month from the date and use it for the season analysis.

In [31]:
global_temp['dt'].dtype

dtype('O')

In [32]:
# Convert date to datetime format
global_temp['dt'] = pd.to_datetime(global_temp['dt'])

In [33]:
# Separating month from date
global_temp['month'] = global_temp['dt'].dt.month

In [34]:
global_temp.head()

Unnamed: 0,dt,LandAverageTemperature,LandAverageTemperatureUncertainty,LandMaxTemperature,LandMaxTemperatureUncertainty,LandMinTemperature,LandMinTemperatureUncertainty,LandAndOceanAverageTemperature,LandAndOceanAverageTemperatureUncertainty,years,month
0,1750-01-01,3.034,3.574,,,,,,,1750,1
1,1750-02-01,3.083,3.702,,,,,,,1750,2
2,1750-03-01,5.626,3.076,,,,,,,1750,3
3,1750-04-01,8.49,2.451,,,,,,,1750,4
4,1750-05-01,11.573,2.072,,,,,,,1750,5


In [35]:
# Function that defines seasons from months of a year
def get_season(month):
    if month >= 3 and month <= 5:
        return 'spring'
    elif month >= 6 and month <= 8:
        return 'summer'
    elif month >= 9 and month <= 11:
        return 'autumn'
    else:
        return 'winter'

In [36]:
# applying the get_season function to the month column
global_temp['season'] = global_temp['month'].apply(get_season)

In [37]:
global_temp.head()

Unnamed: 0,dt,LandAverageTemperature,LandAverageTemperatureUncertainty,LandMaxTemperature,LandMaxTemperatureUncertainty,LandMinTemperature,LandMinTemperatureUncertainty,LandAndOceanAverageTemperature,LandAndOceanAverageTemperatureUncertainty,years,month,season
0,1750-01-01,3.034,3.574,,,,,,,1750,1,winter
1,1750-02-01,3.083,3.702,,,,,,,1750,2,winter
2,1750-03-01,5.626,3.076,,,,,,,1750,3,spring
3,1750-04-01,8.49,2.451,,,,,,,1750,4,spring
4,1750-05-01,11.573,2.072,,,,,,,1750,5,spring


In [38]:
# Unique years in the data
years = global_temp['years'].unique()

In [39]:
# Temp lists to store season temperatures in each year
spring_temps = []
summer_temps = []
autumn_temps = []
winter_temps = []

In [40]:
# Find average temperature of each season in each year
for year in years:
    current_df = global_temp[global_temp['years'] == year]
    spring_temps.append(current_df[current_df['season'] == 'spring']['LandAverageTemperature'].mean())
    summer_temps.append(current_df[current_df['season'] == 'summer']['LandAverageTemperature'].mean())
    autumn_temps.append(current_df[current_df['season'] == 'autumn']['LandAverageTemperature'].mean())
    winter_temps.append(current_df[current_df['season'] == 'winter']['LandAverageTemperature'].mean())
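The loop above can also be expressed as a single `groupby`. The sketch below shows that alternative on a made-up five-row frame that reuses the column names of `global_temp`; it is offered as an equivalent formulation, not as the notebook's method.

```python
import pandas as pd

# Hypothetical sample with the same column names as global_temp
sample = pd.DataFrame({
    'years': ['1750', '1750', '1750', '1751', '1751'],
    'season': ['winter', 'winter', 'spring', 'spring', 'summer'],
    'LandAverageTemperature': [3.0, 5.0, 8.0, 7.0, 14.0],
})

# One row per year, one column per season, mean temperature in each cell
season_means = (
    sample.groupby(['years', 'season'])['LandAverageTemperature']
    .mean()
    .unstack('season')
)
print(season_means)
```

Seasons with no readings in a given year show up as `NaN`, just as they do in the lists built by the loop.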

In [41]:
spring_temps

[8.563,
 6.734999999999999,
 7.035499999999999,
 8.627333333333334,
 9.074333333333334,
 8.583666666666666,
 9.466,
 8.604666666666667,
 6.896666666666666,
 6.897333333333333,
 6.653666666666666,
 8.916,
 7.809333333333332,
 6.716,
 8.192,
 8.868666666666668,
 8.432333333333332,
 7.831,
 6.144000000000001,
 8.803333333333333,
 7.132000000000001,
 6.0523333333333325,
 7.148666666666666,
 8.866999999999999,
 10.607,
 9.036666666666667,
 7.522333333333333,
 7.774333333333334,
 8.957999999999998,
 10.370666666666667,
 11.737666666666664,
 7.599,
 7.390999999999998,
 8.397333333333334,
 7.3580000000000005,
 6.173666666666667,
 8.849666666666666,
 7.9576666666666656,
 8.159333333333334,
 7.783,
 6.997333333333333,
 7.9253333333333345,
 7.914666666666666,
 8.248,
 9.146333333333333,
 8.552,
 7.507666666666666,
 7.024333333333334,
 8.953333333333333,
 8.041666666666666,
 8.224666666666666,
 8.660666666666666,
 7.760333333333333,
 8.653666666666666,
 8.863,
 8.328999999999999,
 8.07533333333333

In [42]:
# Create a season dataframe for season temperature of each year
season = pd.DataFrame()

In [43]:
# Populating season dataframe with the year and season temperature info
season['year'] = years
season['spring_temps'] = spring_temps
season['summer_temps'] = summer_temps
season['autumn_temps'] = autumn_temps
season['winter_temps'] = winter_temps

In [44]:
season.head()

Unnamed: 0,year,spring_temps,summer_temps,autumn_temps,winter_temps
0,1750,8.563,14.518333,8.89,2.963
1,1751,6.735,14.116,10.673,1.729
2,1752,7.0355,,7.587,2.717
3,1753,8.627333,14.608333,9.212333,1.104333
4,1754,9.074333,14.208333,8.957333,1.637333


In [45]:
season.columns

Index(['year', 'spring_temps', 'summer_temps', 'autumn_temps', 'winter_temps'], dtype='object')

In [46]:
# Visualizing the seasonal temperature trend
fig = px.line(
    data_frame=season,
    x='year',
    y=['spring_temps', 'summer_temps', 'autumn_temps', 'winter_temps'],
    render_mode='auto',
    title='Average Temperature in Each Season',
    width=None,
    height=None,
)
fig.show()

From the above plot, we can infer that the temperature in every season has been rising since around 1970, which is consistent with global warming.

### Analysis of trend in temperatures for the top economies

In [47]:
# Creating a list of top economies
continent = ['Russia','United States','China','Japan','Australia','India']

In [48]:
global_temp_country.head()

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,Country
0,1743-11-01,4.384,2.294,Åland
5,1744-04-01,1.53,4.68,Åland
6,1744-05-01,6.702,1.789,Åland
7,1744-06-01,11.609,1.577,Åland
8,1744-07-01,15.342,1.41,Åland


In [49]:
# separating top economies from global_temp_country dataframe
# .copy() gives an independent dataframe so that new columns can be added safely
continent_df = global_temp_country[global_temp_country['Country'].isin(continent)].copy()
continent_df.head()

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,Country
34816,1852-07-01,14.116,1.53,Australia
34817,1852-08-01,15.33,1.4,Australia
34818,1852-09-01,18.74,1.446,Australia
34819,1852-10-01,21.984,1.493,Australia
34820,1852-11-01,24.073,1.466,Australia


In [50]:
# Applying fetch_year function to get the year from dt column
continent_df['years'] = continent_df['dt'].apply(fetch_year)

In [51]:
# avg_temp = continent_df.groupby(['Country','years']).agg({'AverageTemperature':'mean'}).reset_index()
avg_temp = continent_df.groupby(['years','Country']).agg({'AverageTemperature':'mean'}).reset_index()
avg_temp.head()

Unnamed: 0,years,Country,AverageTemperature
0,1768,United States,5.57275
1,1769,United States,10.4465
2,1774,United States,1.603
3,1775,United States,9.499167
4,1776,United States,8.11


In [52]:
avg_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1148 entries, 0 to 1147
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   years               1148 non-null   object 
 1   Country             1148 non-null   object 
 2   AverageTemperature  1148 non-null   float64
dtypes: float64(1), object(2)
memory usage: 27.0+ KB


Currently, years is stored as object, so it should be converted to int before we plot a graph.

In [53]:
avg_temp['years'] = avg_temp['years'].astype(int)
avg_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1148 entries, 0 to 1147
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   years               1148 non-null   int32  
 1   Country             1148 non-null   object 
 2   AverageTemperature  1148 non-null   float64
dtypes: float64(1), int32(1), object(1)
memory usage: 22.5+ KB


In [54]:
fig = px.line(avg_temp, x = 'years', y = 'AverageTemperature', color = 'Country', 
        title = 'Average Land Temperature in the Top Economies')
fig.show()

We can infer that the temperature started to rise in these top economies from around 1970.

In [55]:
# Average land temperature in the top economies for 1920-1969 and 1970-2013, for further analysis
# Filtering on the years column keeps each plot aligned with its title
early = avg_temp[(avg_temp['years'] >= 1920) & (avg_temp['years'] < 1970)]
recent = avg_temp[avg_temp['years'] >= 1970]
fig1 = px.line(early, x = 'years', y = 'AverageTemperature', color = 'Country', 
        title = 'Average Land Temperature in the Top Economies - 1920 ~ 1970')
fig2 = px.line(recent, x = 'years', y = 'AverageTemperature', color = 'Country', 
        title = 'Average Land Temperature in the Top Economies - 1970 ~ 2013')

fig1.show()
fig2.show()

Comparing the two plots, temperatures show some increase over the last 50 years, whereas there is not much increase over the preceding 50 years.
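The comparison between the two periods can also be put into numbers. The sketch below computes the mean temperature per country for each period and the difference between them. The rows are made up for illustration; only the column names match `avg_temp`.

```python
import pandas as pd

# Hypothetical yearly means for two countries
sample = pd.DataFrame({
    'years': [1925, 1960, 1980, 2010, 1925, 1960, 1980, 2010],
    'Country': ['India'] * 4 + ['Japan'] * 4,
    'AverageTemperature': [23.8, 24.0, 24.2, 24.8, 11.0, 11.2, 11.4, 12.0],
})

# Label each row by period: 1920-1969 versus 1970 onward
sample = sample[sample['years'] >= 1920].copy()
sample['period'] = sample['years'].apply(lambda y: '1920-1969' if y < 1970 else '1970+')

# Mean temperature per country and period, and the change between periods
period_means = (
    sample.groupby(['Country', 'period'])['AverageTemperature'].mean().unstack('period')
)
period_means['change'] = period_means['1970+'] - period_means['1920-1969']
print(period_means)
```

A positive `change` for a country means its later period was warmer on average than its earlier one.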

### Analysis of average temperatures of US states

In [56]:
file = 'C:/Users/unpat/_Projects/Challenge_AprojectAweek_data_files/Week_10/GlobalLandTemperaturesByState.csv'
global_temp_state = pd.read_csv(file)

In [57]:
global_temp_state.head()

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,State,Country
0,1855-05-01,25.544,1.171,Acre,Brazil
1,1855-06-01,24.228,1.103,Acre,Brazil
2,1855-07-01,24.371,1.044,Acre,Brazil
3,1855-08-01,25.427,1.073,Acre,Brazil
4,1855-09-01,25.675,1.014,Acre,Brazil


In [58]:
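filter_ = global_temp_state['Country'] == 'United States'
# .copy() gives an independent dataframe so that later changes do not touch global_temp_state
USA = global_temp_state[filter_].copy()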

In [59]:
USA.head()

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,State,Country
7458,1743-11-01,10.722,2.898,Alabama,United States
7459,1743-12-01,,,Alabama,United States
7460,1744-01-01,,,Alabama,United States
7461,1744-02-01,,,Alabama,United States
7462,1744-03-01,,,Alabama,United States


In [60]:
USA = USA.dropna()

In [61]:
USA['State'].unique()

array(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California',
       'Colorado', 'Connecticut', 'Delaware', 'District Of Columbia',
       'Florida', 'Georgia (State)', 'Hawaii', 'Idaho', 'Illinois',
       'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine',
       'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',
       'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada',
       'New Hampshire', 'New Jersey', 'New Mexico', 'New York',
       'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon',
       'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
       'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington',
       'West Virginia', 'Wisconsin', 'Wyoming'], dtype=object)

In [62]:
len(USA['State'].unique())

51

In [63]:
# Modifying some of the state names
state = {'Georgia (State)':'Georgia', 'District Of Columbia':'DC'}

In [64]:
USA['State'] = USA['State'].replace(state)

In [65]:
USA = USA[['AverageTemperature','State']]

In [66]:
USA.head()

Unnamed: 0,AverageTemperature,State
7458,10.722,Alabama
7463,19.075,Alabama
7464,21.197,Alabama
7465,25.29,Alabama
7466,26.42,Alabama


In [67]:
USA_temp = USA.groupby('State')['AverageTemperature'].mean().reset_index()
USA_temp.head()

Unnamed: 0,State,AverageTemperature
0,Alabama,17.066138
1,Alaska,-4.890738
2,Arizona,15.381526
3,Arkansas,15.573963
4,California,14.327677


In [68]:
# To plot a heatmap, we will need longitude and latitude
# Importing a library to get the long & lat info
# !pip install opencage

In [69]:
from opencage.geocoder import OpenCageGeocode

In [70]:
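# https://opencagedata.com/
# API key read from an environment variable (OPENCAGE_API_KEY is an assumed name) rather than hardcoded
import os
key = os.environ['OPENCAGE_API_KEY']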

In [71]:
geocoder_ = OpenCageGeocode(key)

In [72]:
# Getting long & lat of some random location just for checking
location = 'Bijuesca, Spain'
results = geocoder_.geocode(location)
results

[{'annotations': {'DMS': {'lat': "41° 32' 25.83312'' N",
    'lng': "1° 55' 13.28232'' W"},
   'MGRS': '30TWL9005499324',
   'Maidenhead': 'IN91am99nr',
   'Mercator': {'x': -213773.074, 'y': 5064053.763},
   'OSM': {'edit_url': 'https://www.openstreetmap.org/edit?relation=342295#map=17/41.54051/-1.92036',
    'note_url': 'https://www.openstreetmap.org/note/new#map=17/41.54051/-1.92036&layers=N',
    'url': 'https://www.openstreetmap.org/?mlat=41.54051&mlon=-1.92036#map=17/41.54051/-1.92036'},
   'UN_M49': {'regions': {'ES': '724',
     'EUROPE': '150',
     'SOUTHERN_EUROPE': '039',
     'WORLD': '001'},
    'statistical_groupings': ['MEDC']},
   'callingcode': 34,
   'currency': {'alternate_symbols': [],
    'decimal_mark': ',',
    'html_entity': '&#x20AC;',
    'iso_code': 'EUR',
    'iso_numeric': '978',
    'name': 'Euro',
    'smallest_denomination': 1,
    'subunit': 'Cent',
    'subunit_to_unit': 100,
    'symbol': '€',
    'symbol_first': 0,
    'thousands_separator': '.'},
 

In [73]:
results[0]['geometry']['lat']

41.5405092

In [74]:
results[0]['geometry']['lng']

-1.9203562

In [75]:
USA_temp.head(3)

Unnamed: 0,State,AverageTemperature
0,Alabama,17.066138
1,Alaska,-4.890738
2,Arizona,15.381526


In [76]:
# Collecting long & lat info of all the US states from USA_temp dataframe
list_lat = []
list_long = []

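# for loop to pull lat & long for each state
# The country is appended so that ambiguous names (e.g. Georgia) resolve to the US state
for state in USA_temp['State']:
    results = geocoder_.geocode(f'{state}, United States')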
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']

    list_lat.append(lat)
    list_long.append(long)
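The loop above assumes that every lookup returns at least one result. The sketch below is a more defensive variant under stated assumptions: `lookup_coordinates` and `fake_geocode` are hypothetical names, and `fake_geocode` stands in for `geocoder_.geocode` so that the example runs without network access or an API key.

```python
from typing import Callable, List, Optional, Tuple


def lookup_coordinates(
    places: List[str],
    geocode: Callable[[str], list],
) -> List[Tuple[Optional[float], Optional[float]]]:
    """Return (lat, lng) per place, or (None, None) when nothing is found."""
    coords = []
    for place in places:
        results = geocode(place)
        if results:  # the lookup may come back with no results
            geometry = results[0]['geometry']
            coords.append((geometry['lat'], geometry['lng']))
        else:
            coords.append((None, None))
    return coords


# Stand-in for geocoder_.geocode so the sketch runs offline
def fake_geocode(query: str) -> list:
    known = {'Alabama': {'lat': 33.258882, 'lng': -86.829534}}
    return [{'geometry': known[query]}] if query in known else []


print(lookup_coordinates(['Alabama', 'Nowhere'], fake_geocode))
```

Passing the geocoding function as an argument keeps the loop testable and makes a failed lookup visible as `None` instead of an `IndexError`.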
    

In [77]:
# Inserting list_lat & list_long info into USA_temp dataframe
USA_temp['lat'] = list_lat
USA_temp['lon'] = list_long

In [78]:
USA_temp.head(3)

Unnamed: 0,State,AverageTemperature,lat,lon
0,Alabama,17.066138,33.258882,-86.829534
1,Alaska,-4.890738,64.445961,-149.680909
2,Arizona,15.381526,34.395342,-111.763275


In [79]:
# Our dataframe is now ready, so we can build the heatmap
import folium

In [80]:
from folium.plugins import HeatMap

In [81]:
# Base map
basemap = folium.Map()

In [82]:
# Passing required variable info to HeatMap and adding it to the basemap
HeatMap(USA_temp[['lat','lon','AverageTemperature']]).add_to(basemap)
basemap

The above heatmap shows the average temperature of each US state, with red indicating higher temperatures than blue.
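The third column passed to `HeatMap` acts as a weight, and the state averages include a negative value (Alaska is about -4.89), which may not render meaningfully as an intensity. One option, sketched below as an assumption rather than as the notebook's method, is to rescale the temperatures to the range 0 to 1 before plotting. The three rows are taken from the `USA_temp` preview.

```python
import pandas as pd

# Values taken from the USA_temp preview
sample = pd.DataFrame({
    'State': ['Alabama', 'Alaska', 'Arizona'],
    'AverageTemperature': [17.066138, -4.890738, 15.381526],
})

# Min-max scaling: coldest state -> 0.0, warmest state -> 1.0
t = sample['AverageTemperature']
sample['weight'] = (t - t.min()) / (t.max() - t.min())
print(sample)
```

The `weight` column could then be passed in place of `AverageTemperature` alongside `lat` and `lon`.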

### Analysis of average temperatures of major Indian cities by month
**We need the following info for this analysis**  
* Average temperatures  
* Major Indian cities  
* Month

In [83]:
file = 'C:/Users/unpat/_Projects/Challenge_AprojectAweek_data_files/Week_10/GlobalLandTemperaturesByCity.csv'
cities = pd.read_csv(file)
cities.head(3)

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
0,1743-11-01,6.068,1.737,Århus,Denmark,57.05N,10.33E
1,1743-12-01,,,Århus,Denmark,57.05N,10.33E
2,1744-01-01,,,Århus,Denmark,57.05N,10.33E


In [84]:
# Separating Indian cities from the dataframe
India = cities[cities['Country'] == 'India']

In [85]:
India.shape

(1014906, 7)

In [86]:
India['City'].nunique()

391

In [87]:
# Considering the following major Indian cities for our analysis
# Note: 'Hydrabad' is misspelled (the dataset uses 'Hyderabad'), so it matches no rows
cities = ['New Delhi','Bangalore','Hydrabad','Pune','Madras','Varanasi','Gurgaon']

In [88]:
# Dataframe for the major Indian cities (.copy() so that columns can be modified safely)
cities = India[India['City'].isin(cities)].copy()

In [89]:
cities.head(3)

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
630113,1796-01-01,22.672,2.317,Bangalore,India,12.05N,77.26E
630114,1796-02-01,24.42,1.419,Bangalore,India,12.05N,77.26E
630115,1796-03-01,26.092,2.459,Bangalore,India,12.05N,77.26E


In [90]:
# Removing the 'N' and 'E' suffixes from Latitude & Longitude and converting to numbers
cities['Latitude'] = cities['Latitude'].str.strip('N').astype(float)
cities['Longitude'] = cities['Longitude'].str.strip('E').astype(float)

In [91]:
cities.head(3)

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude
630113,1796-01-01,22.672,2.317,Bangalore,India,12.05,77.26
630114,1796-02-01,24.42,1.419,Bangalore,India,12.05,77.26
630115,1796-03-01,26.092,2.459,Bangalore,India,12.05,77.26
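Stripping 'N' and 'E' is enough here because every Indian city lies in the northern and eastern hemispheres. For other regions the suffix carries the sign. The helper below is a hypothetical sketch (`parse_coordinate` is not part of the notebook) that converts any such string to signed decimal degrees.

```python
def parse_coordinate(value: str) -> float:
    """Convert strings such as '12.05N' or '1.92W' to signed decimal degrees."""
    hemisphere = value[-1].upper()
    magnitude = float(value[:-1])
    # South and West are negative by convention
    return -magnitude if hemisphere in ('S', 'W') else magnitude


print(parse_coordinate('12.05N'), parse_coordinate('77.26E'), parse_coordinate('33.90S'))
```

It can be applied to a column with `cities['Latitude'].apply(parse_coordinate)` while the values still carry their suffix.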


We need the average temperature by month, but there is no month column, so we fetch the month from the dt column.

In [92]:
# Converting the dt column to datetime
cities['dt'] = pd.to_datetime(cities['dt'])

In [93]:
# Fetching the month from dt column
cities['Month'] = cities['dt'].dt.month

In [94]:
cities.head(3)

Unnamed: 0,dt,AverageTemperature,AverageTemperatureUncertainty,City,Country,Latitude,Longitude,Month
630113,1796-01-01,22.672,2.317,Bangalore,India,12.05,77.26,1
630114,1796-02-01,24.42,1.419,Bangalore,India,12.05,77.26,2
630115,1796-03-01,26.092,2.459,Bangalore,India,12.05,77.26,3


In [95]:
# Grouping the data, first on the basis of month and after that on the basis of city. After that, access the average temp
cities_temp = cities.groupby(['Month','City'])['AverageTemperature'].mean().to_frame().reset_index()
cities_temp.head()

Unnamed: 0,Month,City,AverageTemperature
0,1,Bangalore,22.713981
1,1,Gurgaon,14.23856
2,1,Madras,24.346733
3,1,New Delhi,14.23856
4,1,Pune,20.448205


In [96]:
# Assigning new column names to the cities_temp dataframe
cities_temp.columns = ['Month','City','Mean_temp']

In [97]:
# Merging cities dataframe to cities_temp dataframe to get the Latitude & Longitude info
df = cities_temp.merge(right = cities, on = 'City')
df.head()

Unnamed: 0,Month_x,City,Mean_temp,dt,AverageTemperature,AverageTemperatureUncertainty,Country,Latitude,Longitude,Month_y
0,1,Bangalore,22.713981,1796-01-01,22.672,2.317,India,12.05,77.26,1
1,1,Bangalore,22.713981,1796-02-01,24.42,1.419,India,12.05,77.26,2
2,1,Bangalore,22.713981,1796-03-01,26.092,2.459,India,12.05,77.26,3
3,1,Bangalore,22.713981,1796-04-01,27.687,1.746,India,12.05,77.26,4
4,1,Bangalore,22.713981,1796-05-01,27.619,1.277,India,12.05,77.26,5


In [98]:
# Dropping the duplicate rows from the df dataframe
data = df.drop_duplicates(subset = ['Month_x','City'])
data.head()

Unnamed: 0,Month_x,City,Mean_temp,dt,AverageTemperature,AverageTemperatureUncertainty,Country,Latitude,Longitude,Month_y
0,1,Bangalore,22.713981,1796-01-01,22.672,2.317,India,12.05,77.26,1
2613,2,Bangalore,24.656619,1796-01-01,22.672,2.317,India,12.05,77.26,1
5226,3,Bangalore,27.062186,1796-01-01,22.672,2.317,India,12.05,77.26,1
7839,4,Bangalore,27.988517,1796-01-01,22.672,2.317,India,12.05,77.26,1
10452,5,Bangalore,27.522754,1796-01-01,22.672,2.317,India,12.05,77.26,1


In [99]:
data2 = data[['Month_x','City','Mean_temp','Country','Latitude','Longitude']]

In [100]:
data2.head()

Unnamed: 0,Month_x,City,Mean_temp,Country,Latitude,Longitude
0,1,Bangalore,22.713981,India,12.05,77.26
2613,2,Bangalore,24.656619,India,12.05,77.26
5226,3,Bangalore,27.062186,India,12.05,77.26
7839,4,Bangalore,27.988517,India,12.05,77.26
10452,5,Bangalore,27.522754,India,12.05,77.26
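Merging the monthly means with the full `cities` frame creates one row per observation and then relies on `drop_duplicates` to shrink the result again. A lighter alternative, sketched below on made-up frames (the names `monthly`, `observations`, and `coords` and all values are hypothetical), is to reduce the coordinates to one row per city before merging.

```python
import pandas as pd

# Hypothetical monthly means and per-observation rows
monthly = pd.DataFrame({
    'Month': [1, 2, 1, 2],
    'City': ['Bangalore', 'Bangalore', 'Pune', 'Pune'],
    'Mean_temp': [22.7, 24.7, 20.4, 22.3],
})
observations = pd.DataFrame({
    'City': ['Bangalore'] * 3 + ['Pune'] * 3,
    'Latitude': ['12.05'] * 3 + ['18.48'] * 3,
    'Longitude': ['77.26'] * 3 + ['74.37'] * 3,
})

# One row per city, so the merge cannot multiply rows
coords = observations[['City', 'Latitude', 'Longitude']].drop_duplicates()
merged = monthly.merge(coords, on='City', how='left')
print(merged)
```

The result keeps the original `Month` column name and has exactly one row per month and city.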


In [101]:
import plotly.graph_objs as go

In [102]:
data = go.Heatmap(x = data2['Month_x'],
           y = data2['City'],
           z = data2['Mean_temp'])

In [103]:
# Setting a layout
layout = go.Layout(title = 'Average Temperature of Major Indian Cities by Month')

In [104]:
fig = go.Figure(data = data, layout = layout)
fig.show()

### Spatial analysis on average temperature of major Indian cities

In [105]:
# Base map
basemap = folium.Map()

In [106]:
data2.head(3)

Unnamed: 0,Month_x,City,Mean_temp,Country,Latitude,Longitude
0,1,Bangalore,22.713981,India,12.05,77.26
2613,2,Bangalore,24.656619,India,12.05,77.26
5226,3,Bangalore,27.062186,India,12.05,77.26


In [107]:
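# `_` replaces `id` so the Python built-in is not shadowed; coordinates and popup are cast explicitly
for _, row in data2.iterrows():
    folium.Marker(location = [float(row['Latitude']), float(row['Longitude'])], popup = str(row['Mean_temp'])).add_to(basemap)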

basemap

# --- END ---