# Capstone Project - (Week 4)
## Applied Data Science Capstone by IBM/Coursera

### Introduction: Description of the problem

In this project, I attempt to find a suitable area for a restaurant, probably of a specific type, in London. Building on the lessons from earlier weeks, I try to identify areas and locations where a particular kind of restaurant might be of interest to stakeholders. The approach is partly based on those lessons and on the Berlin example provided in the course.

London has 32 boroughs plus the City of London (33 districts in total) and about 533 neighbourhoods. Ideally, I would have surveyed all the neighbourhoods, but given the limits of the Foursquare sandbox account, I restrict the study to a radius of about 6000 metres around the centre of London (51.5073219, -0.1276474).

The individual neighbourhoods of London are not considered directly. Instead, the region within the 6000-metre radius is divided into several small areas, and Foursquare is used to survey the restaurants located in each of them. I then try to identify which cuisine is of interest in an area based on how frequently restaurants of that cuisine occur there.


### Data Sources:

The neighbourhood and borough locations have been scraped from the Wikipedia page below using Beautiful Soup.
https://en.wikipedia.org/wiki/List_of_areas_of_London

The borough boundaries have been taken from the link below. They are not used in the analysis, only for general visualisation.
https://github.com/martinjc/UK-GeoJSON/blob/master/json/administrative/eng/lad.json


### Factors that might affect the selection:

1. Number of existing restaurants in an area
2. Frequency of specific types of restaurant
3. Distance from the city centre
4. Population of the area

Points 1 to 3 are within the scope of this report. Point 4 is left for future work.
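
As a rough illustration of factors 1 and 2, the sketch below counts the restaurants in one area and computes the share of each category. The rows are the five sample venues listed in the Foursquare results later in this notebook; the helper name `area_summary` is hypothetical and not part of the notebook.

```python
from collections import Counter

# (area id, restaurant category) pairs; sample rows taken from the
# Foursquare results shown later in this notebook.
venues = [
    (0, "Spanish Restaurant"),
    (0, "North Indian Restaurant"),
    (0, "Italian Restaurant"),
    (0, "Restaurant"),
    (0, "Thai Restaurant"),
]


def area_summary(rows, area_id):
    """Return (restaurant count, {category: share}) for one area."""
    categories = [cat for area, cat in rows if area == area_id]
    total = len(categories)
    counts = Counter(categories)
    # Share of each category among the restaurants found in this area.
    shares = {cat: n / total for cat, n in counts.items()} if total else {}
    return total, shares


total, shares = area_summary(venues, 0)
print(total, shares["Italian Restaurant"])
```

With only five sample rows every category has the same share, so the numbers become meaningful only once the full result set is used.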



### The code below scrapes the boroughs and neighbourhoods of London

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
import json
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated

response_can=requests.get("https://en.wikipedia.org/wiki/List_of_areas_of_London")
soup = BeautifulSoup(response_can.text,"html.parser")
Table_Rows = soup.find_all("tr")

Neighbhorhood=[]
Borough=[]
PostTown=[]
PostCode=[]
DialCode=[]
Coords=[]


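# Rows 3 to 535 of the page's <tr> list are the 533 area rows; these indices are hardcoded
# and depend on the page layout at the time of scraping.
for i in range(3,536):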

    x=Table_Rows[i].find_all("td")
    
    neighbhorhood = x[0].find("a").text.strip()
    Neighbhorhood.append(neighbhorhood)
    
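    try:
        # Note: str.strip(chars) removes any of the link text's characters from both
        # ends of the cell text, not the exact substring.
        borough = x[1].text.strip().strip(x[1].find("a").text).strip()
    except AttributeError:
        # The cell contains no link, so find("a") returned None.
        borough = x[1].text.strip()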
    Borough.append(borough)
    
    postTown =      x[2].text.strip()
    PostTown.append(postTown)
    
    postCode =      x[3].text.strip()
    PostCode.append(postCode)
    
    dialCode =      x[4].text.strip()
    DialCode.append(dialCode)
    
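    # The coordinates sit between a fixed URL prefix and suffix in the geohack link.
    coord_strip1 = len('https://tools.wmflabs.org/geohack/en/')
    coord_strip2 = len('_region:GB_scale:25000?pagename=List_of_areas_of_London')
    try:
        coords = x[5].find("a")['href'][coord_strip1:-coord_strip2].split(';')
        lat = coords[0]
        long = coords[1]  # raises IndexError when the link holds no "lat;long" pair
    except (IndexError, TypeError, KeyError):
        # Missing cell, missing link or malformed href: mark the row so it is filtered out later.
        coords = 'NAN'
    Coords.append(coords)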
    
Df_London = pd.DataFrame({'Neighbhorhood':Neighbhorhood,'Borough':Borough,'PostTown':PostTown,'Post Code':PostCode,'Dial Code':DialCode,'Coords':Coords})

print('Number of Neighbhourhoods in London: ',  Df_London.shape[0])

Borough_list =['Croydon' ,'Bexley' ,'Redbridge' ,'City' ,'Westminster' ,'Brent', 'Bromley', 'Islington', 'Havering' ,'Barnet', 'Enfield', 'Wandsworth' ,'Southwark',
 'Barking and Dagenham' ,'Richmond upon Thames' ,'Newham' ,'Sutton' ,'Ealing', 'Lewisham' ,'Harrow' ,'Camden', 'Kingston upon Thames', 'Tower Hamlets',
 'Greenwich', 'Haringey', 'Hounslow' ,'Lambeth', 'Waltham Forest', 'Merton' ,'Hillingdon', 'Hackney', 'Kensington and Chelsea','Hammersmith and Fulham',]


DF = Df_London[ (Df_London['Coords'] !='NAN') & (~Df_London['Borough'].str.contains(',')) & (Df_London['Borough'].isin(Borough_list))].reset_index(drop=True)

print('Number of Neighbhourhoods Plotted in London Map after filtering: ', (DF.shape)[0])

DF.head()

Number of Neighbhourhoods in London:  533
Number of Neighbhourhoods Plotted in London Map after filtering:  500


| | Neighbhorhood | Borough | PostTown | Post Code | Dial Code | Coords |
|---|---|---|---|---|---|---|
| 0 | Addington | Croydon | CROYDON | CR0 | 20 | [51.362931128458, -0.026373738779412] |
| 1 | Addiscombe | Croydon | CROYDON | CR0 | 20 | [51.381621885559, -0.068682165650808] |
| 2 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | [51.434925966837, 0.12492137518833] |
| 3 | Aldborough Hatch | Redbridge | ILFORD | IG2 | 20 | [51.585577492045, 0.098742119839992] |
| 4 | Aldgate | City | LONDON | EC3 | 20 | [51.51488143102, -0.078904677469267] |
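
The scraping cell extracts coordinates by slicing off a prefix and suffix of fixed length, which fails silently if the link format changes. A minimal sketch of an alternative is shown below, using a regular expression instead. The helper `parse_geohack_href` is hypothetical, and the example href is an assumption reconstructed from the prefix and suffix strings in the cell above and the first row of the table.

```python
import re

# Matches "<lat>;<lon>" right after the geohack path segment.
# The href layout is inferred from the notebook's slicing logic, not verified.
_COORD_RE = re.compile(r"/geohack/en/(-?\d+(?:\.\d+)?);\s*(-?\d+(?:\.\d+)?)")


def parse_geohack_href(href):
    """Return (lat, lon) as floats, or None if no coordinate pair is found."""
    match = _COORD_RE.search(href)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Assumed example link, built from the strings used in the scraping cell.
href = (
    "https://tools.wmflabs.org/geohack/en/"
    "51.362931128458;-0.026373738779412"
    "_region:GB_scale:25000?pagename=List_of_areas_of_London"
)
print(parse_geohack_href(href))
```

Returning `None` for unparseable links plays the same role as the `'NAN'` marker in the notebook, while also giving numeric values directly.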


### The code below marks the borough boundaries (green), neighbourhood centres (red) and the 6000-metre region of interest (blue)

In [2]:
import folium
from geopy.geocoders import Nominatim
address = 'London'
geolocator = Nominatim(user_agent="london")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of London are {}, {}.'.format(latitude, longitude))

map_london = folium.Map(location=[latitude, longitude], zoom_start=10)
folium.Marker([latitude,longitude], popup='London_Centre').add_to(map_london)
i=0
for coord, borough, neighborhood in zip(DF['Coords'], DF['Borough'], DF['Neighbhorhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [float(coord[0].strip()), float(coord[1].strip())],
        radius=2,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.2,
        parse_html=False).add_to(map_london)
    folium.Circle([float(coord[0].strip()), float(coord[1].strip())], radius=300, color='red', fill=False).add_to(map_london)
    
with open('London.json', 'r') as myfile:
    London_Borough=myfile.read()

def boroughs_style(feature):
    return { 'color': 'green', 'fill': False }

folium.GeoJson(London_Borough, style_function=boroughs_style, name='geojson').add_to(map_london)
folium.Circle([latitude, longitude], radius=6000, color='blue', fill=False).add_to(map_london)
map_london

The geograpical coordinate of London are 51.5073219, -0.1276474.
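
To make the 6000-metre circle concrete, the sketch below checks whether two of the scraped neighbourhood centres fall inside it. It uses the haversine formula on a spherical Earth, which is an approximation and not the geodesic distance that `geopy` computes; the function name `haversine_m` is hypothetical. The coordinates are taken from the table and the printed output above.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical approximation)


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


LONDON_CENTRE = (51.5073219, -0.1276474)
aldgate = (51.51488143102, -0.078904677469267)
addington = (51.362931128458, -0.026373738779412)

for name, coord in [("Aldgate", aldgate), ("Addington", addington)]:
    d = haversine_m(LONDON_CENTRE, coord)
    # True when the neighbourhood centre lies inside the 6000 m study circle.
    print(name, round(d), d <= 6000)
```

Aldgate lies inside the circle and Addington well outside it, which matches the picture on the map.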


### The code below divides the 6000 m region into several areas and extracts the area centres

In [3]:
import math
from geopy import distance

London_Coords = [51.5073219,-0.1276474]

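Search_Radius = 6600  # metres; half-width of the square grid generated around the centre
Area_diameter = 600   # metres; spacing between neighbouring area centres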

latitudes= []
longitudes = []

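# Sign combinations for the four quadrants around the centre.
Dy=[Area_diameter,-Area_diameter,Area_diameter,-Area_diameter]
Dx=[Area_diameter,-Area_diameter,-Area_diameter,Area_diameter]

METRES_PER_DEGREE = 111.3195*1000  # approximate metres per degree of latitude

for dx,dy in zip(Dx,Dy):
    for i in range(int(Search_Radius/Area_diameter)+1):
        # Convert a north-south offset in metres to degrees of latitude.
        lat=i*dy/METRES_PER_DEGREE + London_Coords[0]
        for j in range(int(Search_Radius/Area_diameter)+1):
            # A degree of longitude shrinks by cos(latitude).
            long= London_Coords[1] - j*dx/(METRES_PER_DEGREE*math.cos(math.radians(London_Coords[0])))
            latitudes.append(lat)
            longitudes.append(long)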


Actual_centres = list(zip(latitudes,longitudes))         
lat_long_centres = list(dict.fromkeys(Actual_centres))

Final_Area_coord=[]
for coord in lat_long_centres:
    # 6300 m = the 6000 m study radius plus the 300 m area radius (Area_diameter/2).
    if distance.distance((London_Coords[0],London_Coords[1]), coord).m <= 6300:
        Final_Area_coord.append(coord)


map_london = folium.Map(location=[London_Coords[0], London_Coords[1]], zoom_start=12)
folium.Circle([London_Coords[0], London_Coords[1]], radius=6000, color='blue', fill=False).add_to(map_london)
folium.Marker([London_Coords[0],London_Coords[1]], popup='London_Centre').add_to(map_london)
for coord in Final_Area_coord:
    folium.CircleMarker([coord[0], coord[1]], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_london) 
    folium.Circle([coord[0], coord[1]], radius=Area_diameter/2, color='blue', fill=False).add_to(map_london)    

latitudes_final= [coord[0] for coord in Final_Area_coord]
longitudes_final = [coord[1] for coord in Final_Area_coord]

print('Total listed areas in the 6000m radius: ',len(Final_Area_coord))
map_london

Total listed areas in the 6000m radius:  349
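
The grid construction above can also be sketched in a more compact form. The version below is a simplified illustration, not the notebook's code: it walks a single loop over signed grid indices instead of four quadrants, and filters with a flat (planar) distance in metres instead of `geopy`'s geodesic distance. The function name `grid_centres` is hypothetical. It also checks the largest gap a square grid with 600 m spacing leaves between centres.

```python
import math

METRES_PER_DEGREE = 111319.5  # same constant as the notebook (111.3195 km per degree)


def grid_centres(centre, spacing_m, max_dist_m):
    """Square grid of (lat, lon) points within max_dist_m of centre (planar approximation)."""
    lat0, lon0 = centre
    dlat = spacing_m / METRES_PER_DEGREE
    # A degree of longitude is shorter than a degree of latitude by cos(latitude).
    dlon = spacing_m / (METRES_PER_DEGREE * math.cos(math.radians(lat0)))
    steps = int(max_dist_m // spacing_m) + 1
    centres = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            # Planar distance in metres from the centre to grid node (i, j).
            if math.hypot(i * spacing_m, j * spacing_m) <= max_dist_m:
                centres.append((lat0 + i * dlat, lon0 + j * dlon))
    return centres


centres = grid_centres((51.5073219, -0.1276474), spacing_m=600, max_dist_m=6300)
print(len(centres))  # 349

# The point farthest from every node of a square grid is the middle of a cell.
worst_gap_m = math.hypot(600 / 2, 600 / 2)
print(round(worst_gap_m, 2))  # 424.26
```

Under this planar approximation the sketch yields 349 centres, the same count as reported above. The worst-case gap of about 424 m is why a search radius slightly above that value, such as the 430 m used in the Foursquare query in this notebook, is enough to cover the space between the 300 m circles.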


### The code below extracts the total number of restaurants within the 6000 m radius

In [4]:
import os
import requests
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated

# Read the Foursquare credentials from the environment instead of hardcoding them in the notebook.
# The environment variable names below are placeholders.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


LIMIT = 100
search_radius = 430 # metres; larger than the 300 m area radius so neighbouring searches overlap and the gaps between circles are also searched

category_rest_id='4d4b7105d754a06374d81259' # All food category
#category_IT_id='4bf58dd8d48988d110941735'  # Italian restaurant category


def getNearbyRes(names, latitudes, longitudes, radius=search_radius):
    
    res_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url1 = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,category_rest_id,radius,LIMIT)
            
        # make the GET request
        results = requests.get(url1).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        res_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['categories'][0]['id']) for v in results])

    nearby_Res = pd.DataFrame([item for rl in res_list for item in rl])
    nearby_Res.columns = ['Circle No.', 
                  'Circle Latitude', 
                  'Circle Longitude', 
                  'Restaurant Name', 
                  'Restaurant Latitude', 
                  'Restaurant Longitude', 
                  'Restaurant Category','Restaurant Category ID' ]
    
    return(nearby_Res)

All_Res = getNearbyRes(names=range(len(latitudes_final)),
                                   latitudes=latitudes_final,
                                   longitudes=longitudes_final
                                  )


All_Res_unique = All_Res.drop_duplicates(subset='Restaurant Latitude', keep ='first')

print('Total restaurants in 6000m radius:' , All_Res_unique.shape[0])
All_Res_unique.head()

Total restaurants in 6000m radius: 5666


| | Circle No. | Circle Latitude | Circle Longitude | Restaurant Name | Restaurant Latitude | Restaurant Longitude | Restaurant Category | Restaurant Category ID |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 51.507322 | -0.127647 | Barrafina | 51.509427 | -0.125894 | Spanish Restaurant | 4bf58dd8d48988d150941735 |
| 1 | 0 | 51.507322 | -0.127647 | Tandoor Chop House | 51.509192 | -0.125638 | North Indian Restaurant | 54135bf5e4b08f3d2429dfdd |
| 2 | 0 | 51.507322 | -0.127647 | Bancone | 51.509529 | -0.126434 | Italian Restaurant | 4bf58dd8d48988d110941735 |
| 3 | 0 | 51.507322 | -0.127647 | Kerridge’s Bar & Grill | 51.506728 | -0.12452 | Restaurant | 4bf58dd8d48988d1c4941735 |
| 4 | 0 | 51.507322 | -0.127647 | Thai Square | 51.507656 | -0.12983 | Thai Restaurant | 4bf58dd8d48988d149941735 |
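
Two details of the cell above may be worth guarding against: a venue with an empty `categories` list would raise an `IndexError`, and removing duplicates on latitude alone could in principle merge two different venues that share a latitude. The sketch below shows one possible way to handle both in plain Python. The helpers `flatten_items` and `dedupe` are hypothetical, the item structure is inferred from the keys the notebook reads, and the test items are hand-made (one from the sample table, one invented to show the empty-category case).

```python
def flatten_items(area_id, items):
    """Turn Foursquare 'explore' items into flat rows, skipping venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # no category to report, so the venue is left out
        rows.append({
            "area": area_id,
            "name": venue["name"],
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
            "category": categories[0]["name"],
            "category_id": categories[0]["id"],
        })
    return rows


def dedupe(rows):
    """Keep the first row for each (name, lat, lng) triple."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["name"], row["lat"], row["lng"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Item in the shape the notebook reads; values from the sample table above.
barrafina = {"venue": {"name": "Barrafina",
                       "location": {"lat": 51.509427, "lng": -0.125894},
                       "categories": [{"name": "Spanish Restaurant",
                                       "id": "4bf58dd8d48988d150941735"}]}}
# Invented entry, only to exercise the empty-category branch.
no_category = {"venue": {"name": "Unnamed venue",
                         "location": {"lat": 51.5, "lng": -0.12},
                         "categories": []}}

# The same venue is returned by two overlapping search circles (areas 0 and 1).
rows = flatten_items(0, [barrafina, no_category]) + flatten_items(1, [barrafina])
print(len(rows), len(dedupe(rows)))
```

Keying on name and both coordinates is only one option, and it would not necessarily reproduce the count of 5666 reported above, which comes from removing duplicates on latitude only.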
