<a href="https://colab.research.google.com/github/tharina11/Geospatial-Projects/blob/main/3.%20Geocoding_and_Visualizing_Centuries_scored_by_Sachin_Tendulkar.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Geocoding and Visualizing Centuries scored by Sachin Tendulkar

Sachin Tendulkar is one of the best batsmen in the history of cricket. He holds the world record for the highest number of centuries scored across the international cricket formats. This exercise consists of a section that geocodes the venues where Sachin scored his centuries and a visualization of basic statistics.

The data for this exercise is imported from the Wikipedia page [List of international cricket centuries by Sachin Tendulkar](https://en.wikipedia.org/wiki/List_of_international_cricket_centuries_by_Sachin_Tendulkar).

In [None]:
# Import Libraries
import pandas as pd
from geopandas.tools import geocode
from geopy.geocoders import Nominatim
import time
from pprint import pprint
import folium
from folium import plugins
from folium.plugins import HeatMap
import requests

In [None]:
# Import data from the Wikipedia page
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_international_cricket_centuries_by_Sachin_Tendulkar')

The Wikipedia page consists of multiple tables that include different sets of information on Sachin's centuries.

In [None]:
# Number of tables in the Wikipedia page
len(data)

5

The first and second tables hold the Test and ODI (One Day International) centuries. They include information such as the venue, the opponent, and the date.

In [None]:
# Column names of the second table (ODI centuries)
data[1].columns

Index(['No.', 'Score', 'Against', 'Pos.', 'Inn.', 'S/R', 'Venue', 'H/A/N',
       'Date', 'Result', 'Ref'],
      dtype='object')

The third table summarizes the results (won, lost, tied, drawn, no result) of the matches in which Sachin scored his Test and ODI centuries.

In [None]:
# Third table in the Wikipedia page
data[2].head()

Unnamed: 0.1,Unnamed: 0,Total,Won,Win %,Lost,Lost%,Tie,Tie%,Draw,Draw%,NR,NR%
0,Test,51,20,39.22%,11,21.56%,0,0%,20,39.22%,0,0%
1,ODI,49,33,67.35%,14,28.57%,1,2.04%,0,0%,1,2.04%
2,Total,100,53,53%,25,25%,1,1%,20,20%,1,1%


Add a new column to identify the format (Test or ODI) of each century, because the tables are combined in the next step.

In [None]:
# Table with information on Test centuries
centuries_test = data[0]
centuries_test['Format'] = 'Test'
centuries_test.head(5)

Unnamed: 0,No.,Score,Against,Pos.,Inn.,Test,Venue,H/A,Date,Result,Ref,Format
0,1,119*,England,6,4,2,"Old Trafford, Manchester",Away,9 August 1990,Drawn,[11],Test
1,2,148*,Australia,6,2,3,"Sydney Cricket Ground, Sydney",Away,2 January 1992,Drawn,[12],Test
2,3,114,Australia,4,2,5,"WACA Ground, Perth",Away,1 February 1992,Lost,[13],Test
3,4,111,South Africa,4,2,2,"Wanderers Stadium, Johannesburg",Away,26 November 1992,Drawn,[14],Test
4,5,165,England,4,1,2,"M. A. Chidambaram Stadium, Chennai",Home,11 February 1993,Won,[15],Test


In [None]:
# Table with information on ODI centuries
centuries_odi = data[1]
centuries_odi['Format'] = 'ODI'
centuries_odi.head(5)

Unnamed: 0,No.,Score,Against,Pos.,Inn.,S/R,Venue,H/A/N,Date,Result,Ref,Format
0,1,110,Australia,2,1,84.61,"R. Premadasa Stadium, Colombo",Neutral,9 September 1994,Won,[63],ODI
1,2,115,New Zealand,2,2,84.55,"IPCL Sports Complex Ground, Vadodara",Home,28 October 1994,Won,[64],ODI
2,3,105,West Indies,2,1,78.35,"Sawai Mansingh Stadium, Jaipur",Home,11 November 1994,Won,[65],ODI
3,4,112*,Sri Lanka,2,2,104.67,"Sharjah Cricket Association Stadium, Sharjah",Neutral,9 April 1995,Won,[66],ODI
4,5,127*,Kenya,2,2,92.02,"Barabati Stadium, Cuttack",Home,18 February 1996,Won,[67],ODI


Let's combine the tables row-wise.

In [None]:
# Concatenate the two tables into one table; ignore_index gives every row a unique index
all_centuries = pd.concat([centuries_test, centuries_odi], ignore_index=True)

In the next step, the name of the city where Sachin scored each century is extracted from the Venue column. The city name is then used to obtain the longitude and latitude of the city. The geographic information comes from the open source geocoding service [Photon](https://photon.komoot.io/).
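The city extraction step can be tried on its own before any geocoding. This is a minimal sketch: the helper name `extract_city` is hypothetical, and it assumes every venue follows the "Ground, City" pattern seen in the tables above.

```python
def extract_city(venue):
    """Return the text after the last comma in a venue string."""
    # rsplit with maxsplit=1 cuts only at the final comma, so any commas
    # inside the ground name are left alone
    return venue.rsplit(",", 1)[-1].strip()


print(extract_city("Old Trafford, Manchester"))
print(extract_city("M. A. Chidambaram Stadium, Chennai"))
```

If a venue contains no comma at all, the helper returns the whole string, which may or may not be a usable city name.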

In [None]:
# Initialize three empty lists to hold the name and geographic coordinates of each city
city_name = []
longitude = []
latitude = []

for index, row in all_centuries.iterrows():
    ground = row['Venue']
    # The city is the last comma-separated part of the venue name
    city = ground.split(", ")[-1]
    print(city)
    city_name.append(city)
    try:
        # Obtain the geographic information of the city
        geographic_info = geocode(city, provider='photon')
        # The geometry is a shapely Point: x is the longitude, y is the latitude
        point = geographic_info['geometry'].iloc[0]
        lon, lat = point.x, point.y
    except requests.exceptions.Timeout:
        print("Timeout occurred")
        lon, lat = None, None
    except TypeError:
        print('geocoding information for ' + ground + ' is not found')
        lon, lat = None, None
    # Always append, so the lists stay the same length as the table
    longitude.append(lon)
    latitude.append(lat)

Manchester
Sydney
Perth
Johannesburg
Chennai
Colombo
Lucknow
Nagpur
Birmingham
Nottingham
Cape Town
Colombo
Colombo
Mumbai
Chennai
Bangalore
Wellington
Chennai
Colombo
Mohali
Ahmedabad
Melbourne
New Delhi
Nagpur
Chennai
Bloemfontein
Ahmedabad
Nagpur
Port of Spain
Leeds
Kolkata
Sydney
Multan
Dhaka
New Delhi
Chittagong
Mirpur
Sydney
Adelaide
Nagpur
Chennai
Hamilton
Ahmedabad
Chittagong
Mirpur
Nagpur
Kolkata
Colombo
Bangalore
Centurion
Cape Town
Colombo
Vadodara
Jaipur
Sharjah
Cuttack
New Delhi
Singapore
Sharjah
Colombo
Mumbai
Benoni
Bangalore
Kanpur
Sharjah
Sharjah
Kolkata
Colombo
Bulawayo
Dhaka
Sharjah
Sharjah
Bristol
Colombo
Hyderabad
Vadodara
Sharjah
Jodhpur
Indore
Harare
Johannesburg
Paarl
Chester-le-Street
Bristol
Pietermaritzburg
Gwalior
Hyderabad
Rawalpindi
Ahmedabad
Peshawar
Kuala Lumpur
Vadodara
Sydney
Christchurch
Colombo
Hyderabad
Gwalior
Bangalore




Nagpur
Mirpur
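Many cities appear more than once in the output above (Colombo, Sharjah, and Chennai, for example), so each distinct city only needs to be geocoded once. The following is a minimal sketch of that idea. The names `geocode_unique` and `fake_lookup` are hypothetical, and `fake_lookup` stands in for a real geocoder so the example runs offline.

```python
def geocode_unique(cities, lookup):
    """Call lookup once per distinct city and return {city: result}."""
    cache = {}
    for city in cities:
        if city not in cache:
            cache[city] = lookup(city)
    return cache


# Hypothetical stand-in for a real geocoder; the (longitude, latitude)
# pairs are sample values used only for illustration
SAMPLE_COORDINATES = {
    "Manchester": (-2.2451148, 53.4794892),
    "Sydney": (151.2082848, -33.8698439),
    "Chennai": (80.270186, 13.0836939),
}
calls = []


def fake_lookup(city):
    calls.append(city)  # record every request that would hit the service
    return SAMPLE_COORDINATES.get(city)


cities = ["Manchester", "Sydney", "Chennai", "Sydney", "Chennai"]
coordinates = geocode_unique(cities, fake_lookup)

# Map the cached results back onto every row
row_longitudes = [coordinates[city][0] for city in cities]
print(len(calls), "lookups for", len(cities), "rows")
```

With a real service, this reduces the number of requests from one per century to one per distinct city.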


In [None]:
# Add the city name and the location coordinates as new columns to the table
all_centuries["Longitude"] = longitude
all_centuries["Latitude"] = latitude
all_centuries["City"] = city_name
all_centuries.head(5)

Unnamed: 0,No.,Score,Against,Pos.,Inn.,Test,Venue,H/A,Date,Result,Ref,Format,S/R,H/A/N,Longitude,Latitude,City
0,1,119*,England,6,4,2.0,"Old Trafford, Manchester",Away,9 August 1990,Drawn,[11],Test,,,-2.2451148,53.4794892,Manchester
1,2,148*,Australia,6,2,3.0,"Sydney Cricket Ground, Sydney",Away,2 January 1992,Drawn,[12],Test,,,151.2082848,-33.8698439,Sydney
2,3,114,Australia,4,2,5.0,"WACA Ground, Perth",Away,1 February 1992,Lost,[13],Test,,,115.8605801,-31.9558964,Perth
3,4,111,South Africa,4,2,2.0,"Wanderers Stadium, Johannesburg",Away,26 November 1992,Drawn,[14],Test,,,28.049722,-26.205,Johannesburg
4,5,165,England,4,1,2.0,"M. A. Chidambaram Stadium, Chennai",Home,11 February 1993,Won,[15],Test,,,80.270186,13.0836939,Chennai


The longitude and latitude of each city are now available, so the locations can be plotted on a world map using folium. A marker is added for each city. When you click on a marker, the city name and the format in which the century was scored appear.

In [None]:
# Open a world map using OpenStreetMap tiles
m = folium.Map(tiles='OpenStreetMap', zoom_start=2)

# Add a marker to each point and set the information that appears when the marker is clicked
for index, row in all_centuries.iterrows():
    # Skip rows whose city could not be geocoded
    if pd.isna(row['Latitude']) or pd.isna(row['Longitude']):
        continue
    popup_text = 'City: ' + row['City'] + '<br><br>Format: ' + row['Format']
    folium.Marker([row['Latitude'], row['Longitude']], popup=popup_text).add_to(m)

m
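Centuries scored in the same city share the same coordinates, so their markers sit exactly on top of each other and only the top one can be clicked. One possible way around this, sketched below, is to build a single popup per city that lists the number of centuries per format. The helper name `popup_text_by_city` is hypothetical and the input rows are a small illustrative sample.

```python
from collections import Counter, defaultdict


def popup_text_by_city(rows):
    """Build one HTML popup string per city from (city, format) pairs."""
    formats_by_city = defaultdict(Counter)
    for city, match_format in rows:
        formats_by_city[city][match_format] += 1

    popups = {}
    for city, counts in formats_by_city.items():
        # One line per format, sorted by format name for a stable order
        parts = [f"{fmt}: {n}" for fmt, n in sorted(counts.items())]
        popups[city] = f"City: {city}<br><br>" + "<br>".join(parts)
    return popups


# Illustrative (city, format) pairs
sample_rows = [
    ("Sydney", "Test"),
    ("Colombo", "Test"),
    ("Sydney", "Test"),
    ("Colombo", "ODI"),
]
popups = popup_text_by_city(sample_rows)
for city, text in popups.items():
    print(text)
```

Each popup string could then be passed to a single `folium.Marker` per city. folium's `MarkerCluster` plugin is another option for dealing with overlapping markers.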

GitHub has difficulty rendering Folium maps. You can view the map above by pasting the **GitHub link of this project** into [nbviewer.org](https://nbviewer.org/).

Here is a screenshot of the geocoded map.
![picture](https://drive.google.com/uc?id=1rdeJhY5As1VFh35gTrWk56IqdgTk206D)





Instantiate a client to connect to the geocoding service.

In [None]:
# instantiate a new Nominatim client
app = Nominatim(user_agent="geocoding")

Define a function that returns the address of the location given by a latitude and longitude pair.

In [None]:
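def get_address_by_location(latitude, longitude, language="en", max_attempts=5):
    # build coordinates string to pass to the reverse function
    coordinates = f"{latitude}, {longitude}"
    last_error = None
    for attempt in range(max_attempts):
        # sleep for a second to respect the geocoding service usage policy
        time.sleep(1)
        try:
            return app.reverse(coordinates, language=language).raw
        except Exception as error:
            # remember the failure and try again, up to max_attempts times
            last_error = error
    raise RuntimeError(f"reverse geocoding failed for {coordinates}") from last_error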
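The function above sleeps for a full second before every request. A slightly more flexible pattern is to wait only as long as needed since the previous call. The sketch below shows that idea with the standard library. The `Throttle` class and `fake_reverse` function are hypothetical names, and `fake_reverse` replaces the real service so the example runs offline with a short delay.

```python
import time


class Throttle:
    """Wrap a function so calls are at least min_delay seconds apart."""

    def __init__(self, func, min_delay=1.0):
        self.func = func
        self.min_delay = min_delay
        self._last_call = None

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            waited = time.monotonic() - self._last_call
            if waited < self.min_delay:
                # Sleep only for the remaining part of the delay
                time.sleep(self.min_delay - waited)
        try:
            return self.func(*args, **kwargs)
        finally:
            self._last_call = time.monotonic()


# Hypothetical stand-in for a reverse geocoding call
def fake_reverse(coordinates):
    return {"query": coordinates}


throttled_reverse = Throttle(fake_reverse, min_delay=0.2)
points = [(53.4794892, -2.2451148), (-33.8698439, 151.2082848), (13.0836939, 80.270186)]

start = time.monotonic()
results = [throttled_reverse(f"{lat}, {lon}") for lat, lon in points]
total_time = time.monotonic() - start
print(f"{len(results)} calls in {total_time:.2f} s")
```

geopy also provides a `RateLimiter` class in `geopy.extra.rate_limiter` that serves a similar purpose and may be preferable in practice.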

In [None]:
# Initialize an empty list to hold the country names
country = []

for rowIndex, row in all_centuries.iterrows():
    latitude = row['Latitude']
    longitude = row['Longitude']
    # Rows without coordinates cannot be reverse geocoded
    if pd.isna(latitude) or pd.isna(longitude):
        country.append(None)
        continue
    # get the geographic information of each location
    address = get_address_by_location(latitude, longitude)

    # Add the name of the country corresponding to each entry
    country.append(address['address']['country'])
print(country)

['United Kingdom', 'Australia', 'Australia', 'South Africa', 'India', 'Sri Lanka', 'India', 'India', 'United Kingdom', 'United Kingdom', 'South Africa', 'Sri Lanka', 'Sri Lanka', 'India', 'India', 'India', 'New Zealand', 'India', 'Sri Lanka', 'India', 'India', 'Australia', 'India', 'India', 'India', 'South Africa', 'India', 'India', 'Trinidad and Tobago', 'United Kingdom', 'India', 'Australia', 'Pakistan', 'Bangladesh', 'India', 'Bangladesh', 'Pakistan', 'Australia', 'Australia', 'India', 'India', 'Bermuda', 'India', 'Bangladesh', 'Pakistan', 'India', 'India', 'Sri Lanka', 'India', 'South Africa', 'South Africa', 'Sri Lanka', 'India', 'India', 'United Arab Emirates', 'India', 'India', 'Singapore', 'United Arab Emirates', 'Sri Lanka', 'India', 'South Africa', 'India', 'India', 'United Arab Emirates', 'United Arab Emirates', 'India', 'Sri Lanka', 'Zimbabwe', 'Bangladesh', 'United Arab Emirates', 'United Arab Emirates', 'United Kingdom', 'Sri Lanka', 'India', 'India', 'United Arab Emirate
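The loop above reads `address['address']['country']` directly, which raises a `KeyError` if a result happens to lack a country entry. A more defensive lookup could look like the sketch below. The helper name `country_from_raw` is hypothetical, and the dictionaries are illustrative stand-ins shaped like the `.raw` result, not real responses.

```python
def country_from_raw(raw):
    """Return the country name from a reverse geocoding result, or None."""
    if not raw:
        return None
    # dict.get avoids a KeyError when a level is missing
    return raw.get("address", {}).get("country")


# Illustrative stand-ins for reverse geocoding results
found = {"address": {"city": "Manchester", "country": "United Kingdom"}}
no_country = {"address": {"city": "Somewhere"}}

print(country_from_raw(found))
print(country_from_raw(no_country))
print(country_from_raw(None))
```

Missing countries then show up as `None` in the table instead of stopping the loop.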

In [None]:
# Add the name of the country as a new column to the table
all_centuries["Country"] = country
all_centuries.head(5)

Unnamed: 0,No.,Score,Against,Pos.,Inn.,Test,Venue,H/A,Date,Result,Ref,Format,S/R,H/A/N,Longitude,Latitude,City,Country
0,1,119*,England,6,4,2.0,"Old Trafford, Manchester",Away,9 August 1990,Drawn,[11],Test,,,-2.2451148,53.4794892,Manchester,United Kingdom
1,2,148*,Australia,6,2,3.0,"Sydney Cricket Ground, Sydney",Away,2 January 1992,Drawn,[12],Test,,,151.2082848,-33.8698439,Sydney,Australia
2,3,114,Australia,4,2,5.0,"WACA Ground, Perth",Away,1 February 1992,Lost,[13],Test,,,115.8605801,-31.9558964,Perth,Australia
3,4,111,South Africa,4,2,2.0,"Wanderers Stadium, Johannesburg",Away,26 November 1992,Drawn,[14],Test,,,28.049722,-26.205,Johannesburg,South Africa
4,5,165,England,4,1,2.0,"M. A. Chidambaram Stadium, Chennai",Home,11 February 1993,Won,[15],Test,,,80.270186,13.0836939,Chennai,India


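The dataset now includes the city, coordinates, and country for each century. Geocoding by city name alone can pick the wrong place, though: in the country list above, Hamilton resolves to Bermuda rather than New Zealand and Mirpur to Pakistan rather than Bangladesh, so such entries should be corrected first. After that check, the data can be used for further analysis and visualizations.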
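As a small example of such analysis, the centuries can be counted per country and per match result. The sketch below uses only the first five rows of the final table, reduced to (country, format, result) tuples, so the counts are illustrative rather than career totals.

```python
from collections import Counter

# First five rows of the final table, reduced to (country, format, result)
sample = [
    ("United Kingdom", "Test", "Drawn"),
    ("Australia", "Test", "Drawn"),
    ("Australia", "Test", "Lost"),
    ("South Africa", "Test", "Drawn"),
    ("India", "Test", "Won"),
]

# Number of centuries per country
centuries_per_country = Counter(country for country, _, _ in sample)
# Number of centuries per (country, result) combination
results_per_country = Counter((country, result) for country, _, result in sample)

for country, count in centuries_per_country.most_common():
    print(country, count)
```

On the full table, the same counts could be obtained with pandas, for example by grouping on the Country column.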