### Exploring venues in Lucknow, India

Table of Contents
- Introduction
- Data Collection from APIs
- Data Cleaning
- Methodology
- Analysis
- Results and Discussion
- Conclusion

### Introduction 
The aim of this project is to identify venues in Lucknow, India, based on their ratings and average prices. In this notebook, we will **identify various venues in the city of Lucknow**, India, using the **Foursquare API and Zomato API**, to help visitors select the restaurants that suit them best.

Visitors to a city usually start by looking for places to go during their stay. They primarily choose based on how a venue's rating compares with other venues and whether its average prices fit their budget.

Here, we'll identify places suited to different kinds of visitors using the information collected from the two APIs and data science techniques. Once we have plotted the venues, a company could build an application on the same data to suggest venues to its users.

### Data Collection from APIs 
To begin with, we will take a look at Lucknow on the map using the folium library.

We will also fetch the data from two different APIs.

- **Foursquare API**: We will use the Foursquare API to fetch venues in Lucknow, starting from the city center and extending up to 10 kilometers in each direction.
- **Zomato API**: The Zomato API provides information about various venues, including the complete address, user ratings, price for two people, price range and a lot more.

### Lucknow
**Lucknow** is composed of a number of sectors spread across a total area of 349 sq km. It has many venues (especially restaurants, hotels and cafes) to explore.

We could use the geopy library to look up the latitude and longitude of Lucknow, but the values it returns appear to be off, so we'll supply the coordinates directly in this case.

In [121]:
LKO_LATITUDE = '26.8467'
LKO_LONGITUDE = '80.9462'
print('The geographical coordinates of Lucknow are {}, {}.'.format(LKO_LATITUDE, LKO_LONGITUDE))

The geographical coordinates of Lucknow are 26.8467, 80.9462.


In [122]:
import folium

lucknow_map = folium.Map(location = [LKO_LATITUDE, LKO_LONGITUDE], zoom_start = 13)
folium.Marker([LKO_LATITUDE, LKO_LONGITUDE]).add_to(lucknow_map)
lucknow_map.save("../Map/lucknow Map.html")
lucknow_map

### Foursquare API

We begin by fetching all venues within 10 kilometers of the center of Lucknow using the Foursquare API. Its explore endpoint returns venue recommendations within a given radius of the supplied coordinates, and we will use it to collect all the venues we need.

In [123]:
FOURSQUARE_CLIENT_ID = '0HH2B0MRFB2FALD3CL3SQAGF5KPCVO53DS5OEOKOP4MWUCJO'
FOURSQUARE_CLIENT_SECRET = 'D5KMPZK1RAFC0RSUS3VCUOIAIIA2KVCOWHIP1RJX3D1L0UQS'
RADIUS = 10000 # Search radius in meters (10 km)
NO_OF_VENUES = 100 # Number of venues requested per call
VERSION = '20200105' # API version date (YYYYMMDD)

In [124]:
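def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        # Rows from the normalized explore response use the prefixed column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']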

In [125]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from pandas import json_normalize  # top-level location since pandas 1.0
import requests

pd.set_option('display.max_rows', None)

offset = 0
total_venues = 0
foursquare_venues = pd.DataFrame(columns = ['name', 'categories', 'lat', 'lng'])

while (True):
    url = ('https://api.foursquare.com/v2/venues/explore?client_id={}'
           '&client_secret={}&v={}&ll={},{}&radius={}&limit={}&offset={}').format(FOURSQUARE_CLIENT_ID, 
                                                                        FOURSQUARE_CLIENT_SECRET, 
                                                                        VERSION, 
                                                                        LKO_LATITUDE, 
                                                                        LKO_LONGITUDE, 
                                                                        RADIUS,
                                                                        NO_OF_VENUES,
                                                                        offset)
    result = requests.get(url).json()
    venues_fetched = len(result['response']['groups'][0]['items'])
    total_venues = total_venues + venues_fetched
    print("Total {} venues fetched within a total radius of {} Km".format(venues_fetched, RADIUS/1000))

    venues = result['response']['groups'][0]['items']
    venues = json_normalize(venues)

    # Filter the columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    venues = venues.loc[:, filtered_columns]

    # Filter the category for each row
    venues['venue.categories'] = venues.apply(get_category_type, axis = 1)

    # Clean all column names
    venues.columns = [col.split(".")[-1] for col in venues.columns]
    foursquare_venues = pd.concat([foursquare_venues, venues], axis = 0, sort = False)
    
    # A short page means there are no more results to fetch
    if (venues_fetched < NO_OF_VENUES):
        break
    else:
        offset = offset + NO_OF_VENUES

foursquare_venues = foursquare_venues.reset_index(drop = True)
print("\nTotal {} venues fetched".format(total_venues))

Total 66 venues fetched within a total radius of 10.0 Km

Total 66 venues fetched
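
The loop above pages through results by offset and stops as soon as a page comes back with fewer items than requested. The sketch below isolates that pattern so it can be tried without network access. The names `fetch_all`, `fake_fetch_page` and `FAKE_VENUES` are hypothetical and stand in for the real API call; this is an illustration of the pattern, not the Foursquare client itself.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int, int], List[Dict]], limit: int = 100) -> List[Dict]:
    """Collect items page by page until a page holds fewer than `limit` items."""
    items: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items.extend(page)
        if len(page) < limit:
            # A short (or empty) page signals that nothing is left to fetch
            break
        offset += limit
    return items


# Hypothetical stand-in for the API: 250 fake venues served in slices
FAKE_VENUES = [{"name": "venue-{}".format(i)} for i in range(250)]


def fake_fetch_page(offset: int, limit: int) -> List[Dict]:
    """Return one slice of the fake data, mimicking an offset/limit endpoint."""
    return FAKE_VENUES[offset:offset + limit]


all_venues = fetch_all(fake_fetch_page, limit=100)
print("Fetched {} venues".format(len(all_venues)))
```

In the notebook, only one request was needed because the first page already held fewer than 100 venues.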


### Zomato API

The Zomato API provides a search endpoint that looks up venues using filters such as a query string, latitude, longitude and more. It requires a Zomato user key, which is available with a developer account.

We'll use the name, lat, and lng values of the venues fetched from the Foursquare API to call the search endpoint and get more information about each venue.

- The query will be the name of the venue.
- The start defines from what offset we want to start, so we'll keep it at 0.
- The count defines the number of restaurants we want to fetch. As we have the exact location coordinates, we'll fetch only one.
- We will supply the latitude and longitude values.
- We will set the sorting criteria to real_distance so that the result closest to the supplied coordinates is returned first.
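
The parameters listed above can be assembled as a dictionary and URL-encoded, which helps because venue names may contain spaces or characters such as `|`. This is a minimal sketch using only the standard library. The helper name `build_search_url` is hypothetical, and the endpoint is the one used in this notebook.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ZOMATO_SEARCH_URL = "https://developers.zomato.com/api/v2.1/search"


def build_search_url(name, lat, lng):
    """Build a search URL asking for the single nearest match to `name`."""
    params = {
        "q": name,                # venue name taken from Foursquare
        "start": 0,               # begin at the first result
        "count": 1,               # only the closest match is needed
        "lat": lat,
        "lon": lng,
        "sort": "real_distance",  # nearest result first
    }
    return ZOMATO_SEARCH_URL + "?" + urlencode(params)


url = build_search_url("The Mughal's Dastarkhwan", 26.8442, 80.9403)
print(url)

# Decoding the query string recovers the original venue name
print(parse_qs(urlsplit(url).query)["q"][0])
```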

In [126]:
headers = {'user-key': 'b3a1c26ed20422bfaae7ada558744e1f'}
venues_information = []

for index, row in foursquare_venues.iterrows():
    print("Fetching data for venue: {}".format(index + 1))
    venue = []
    url = ('https://developers.zomato.com/api/v2.1/search?q={}' + 
          '&start=0&count=1&lat={}&lon={}&sort=real_distance').format(row['name'], row['lat'], row['lng'])
    result = requests.get(url, headers = headers).json()
    if (len(result['restaurants']) > 0):
        venue.append(result['restaurants'][0]['restaurant']['name'])
        venue.append(result['restaurants'][0]['restaurant']['location']['latitude'])
        venue.append(result['restaurants'][0]['restaurant']['location']['longitude'])
        venue.append(result['restaurants'][0]['restaurant']['average_cost_for_two'])
        venue.append(result['restaurants'][0]['restaurant']['price_range'])
        venue.append(result['restaurants'][0]['restaurant']['user_rating']['aggregate_rating'])
        venue.append(result['restaurants'][0]['restaurant']['location']['address'])
        venues_information.append(venue)
    else:
        venues_information.append(np.zeros(7))  # placeholder row: one zero per column
    
zomato_venues = pd.DataFrame(venues_information, 
                                  columns = ['venue', 'latitude', 
                                             'longitude', 'price_for_two', 
                                             'price_range', 'rating', 'address'])

Fetching data for venue: 1
Fetching data for venue: 2
Fetching data for venue: 3
Fetching data for venue: 4
Fetching data for venue: 5
Fetching data for venue: 6
Fetching data for venue: 7
Fetching data for venue: 8
Fetching data for venue: 9
Fetching data for venue: 10
Fetching data for venue: 11
Fetching data for venue: 12
Fetching data for venue: 13
Fetching data for venue: 14
Fetching data for venue: 15
Fetching data for venue: 16
Fetching data for venue: 17
Fetching data for venue: 18
Fetching data for venue: 19
Fetching data for venue: 20
Fetching data for venue: 21
Fetching data for venue: 22
Fetching data for venue: 23
Fetching data for venue: 24
Fetching data for venue: 25
Fetching data for venue: 26
Fetching data for venue: 27
Fetching data for venue: 28
Fetching data for venue: 29
Fetching data for venue: 30
Fetching data for venue: 31
Fetching data for venue: 32
Fetching data for venue: 33
Fetching data for venue: 34
Fetching data for venue: 35
Fetching data for venue: 36
...

### Data Cleaning 
Data from multiple sources might not always align. Thus, it is **important to combine the data retrieved from multiple sources properly**.

We'll first plot the two datasets on the map. We'll then combine data points whose latitude and longitude values are very close to one another. Finally, we will inspect the remaining venues so that any leftover mismatches are also removed from the final dataset before we begin any analysis.

We will first plot the Foursquare data on the map.

In [127]:
lucknow_map = folium.Map(location = [LKO_LATITUDE, LKO_LONGITUDE], zoom_start = 13)

for name, latitude, longitude in zip(foursquare_venues['name'], foursquare_venues['lat'], foursquare_venues['lng']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 5,
        popup = label,
        color = 'green',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(lucknow_map)  

lucknow_map.save("../Map/Venues by Foursquare.html")
lucknow_map

From the map, we can infer that there are clusters of venues around **Sahara Mall, Hazratganj and Lohia Park**. We can also plot the category count to see the major types of venues that exist.

We will also plot the Zomato data on the map.

In [128]:
lucknow_map = folium.Map(location = [LKO_LATITUDE, LKO_LONGITUDE], zoom_start = 13)

for venue, address, latitude, longitude in zip(zomato_venues['venue'], zomato_venues['address'], 
                                               zomato_venues['latitude'], zomato_venues['longitude']):
    label = '{}, {}'.format(venue, address)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [latitude, longitude],
        radius = 5,
        popup = label,
        color = 'red',
        fill = True,
        fill_color = '#cc3535',
        fill_opacity = 0.7,
        parse_html = False).add_to(lucknow_map)  

lucknow_map.save("../Map/Venues by Zomato.html")
lucknow_map

We can see that many venues are identified by both Foursquare and Zomato. The two overlap heavily near **Sahara Mall, Hazratganj and Lohia Park**. However, in other places the data does not match, such as the red dots in the top right of the second map (for example, the area near Hasanganj and Vijaykhand).

To combine the two datasets, I'll check that the latitude and longitude values of each corresponding venue match. To do so, I'll round both the latitude and longitude values to 4 decimal places. Then, I'll calculate the difference between the corresponding latitude and longitude values and check that its absolute value is at most 0.0004, which should ideally mean that the two locations are the same.
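
To get a feel for how strict a 0.0004 degree threshold is, the sketch below converts it to an approximate distance on the ground and applies the same comparison to one pair of coordinates from this notebook. It assumes a spherical Earth with a mean radius of 6,371 km, and the function names are hypothetical. Under that assumption the tolerance works out to a few tens of meters in each direction.

```python
import math

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in meters (spherical model)
TOLERANCE_DEG = 0.0004    # threshold used in this notebook


def lat_degrees_to_meters(delta_deg):
    """Approximate north-south distance covered by a latitude difference."""
    return math.radians(delta_deg) * EARTH_RADIUS_M


def lng_degrees_to_meters(delta_deg, latitude_deg):
    """Approximate east-west distance covered by a longitude difference."""
    return math.radians(delta_deg) * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))


def is_same_location(lat_a, lng_a, lat_b, lng_b, tol=TOLERANCE_DEG):
    """Treat two points as the same venue if both coordinates differ by at most `tol`."""
    return abs(lat_a - lat_b) <= tol and abs(lng_a - lng_b) <= tol


print("Latitude tolerance in meters:", round(lat_degrees_to_meters(TOLERANCE_DEG), 1))
print("Longitude tolerance in meters:", round(lng_degrees_to_meters(TOLERANCE_DEG, 26.8467), 1))

# Foursquare and Zomato coordinates for the same venue (Royal Sky)
print(is_same_location(26.8502, 80.9410, 26.8504, 80.9411))
```

Because longitude lines converge away from the equator, the east-west tolerance is slightly tighter than the north-south one at Lucknow's latitude.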

In [129]:
foursquare_venues['lat'] = foursquare_venues['lat'].apply(lambda lat: round(float(lat), 4))
foursquare_venues['lng'] = foursquare_venues['lng'].apply(lambda lng: round(float(lng), 4))
zomato_venues['latitude'] = zomato_venues['latitude'].apply(lambda lat: round(float(lat), 4))
zomato_venues['longitude'] = zomato_venues['longitude'].apply(lambda lng: round(float(lng), 4))

In [130]:
dataset = pd.concat([foursquare_venues, zomato_venues], axis = 1)
dataset['lat_diff'] = dataset['latitude'] - dataset['lat']
dataset['lng_diff'] = dataset['longitude'] - dataset['lng']

In [131]:
selected_venues = dataset[(abs(dataset['lat_diff']) <= 0.0004) & (abs(dataset['lng_diff']) <= 0.0004)].reset_index(drop = True)
selected_venues


| | name | categories | lat | lng | venue | latitude | longitude | price_for_two | price_range | rating | address | lat_diff | lng_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Royal Sky | Indian Restaurant | 26.8502 | 80.941 | Royal Sky | 26.8504 | 80.9411 | 1100.0 | 3.0 | 4.3 | Opposite Halwasiya Market Hazratganj, Hazratga... | 0.0002 | 0.0001 |
| 1 | The Mughal's Dastarkhwan | Indian Restaurant | 26.8442 | 80.9403 | The Mughal's Dastarkhwan | 26.8442 | 80.9403 | 800.0 | 3.0 | 4.4 | 29, BN Road, Near Royal Hotel Crossing, Lalbag... | 0.0 | 0.0 |
| 2 | Dastarkhwan | Indian Restaurant | 26.8527 | 80.9369 | Dastarkhwan | 26.8526 | 80.9368 | 600.0 | 2.0 | 4.5 | 20, Wala Qadar Road, DM Compound Colony, Kaise... | -0.0001 | -0.0001 |
| 3 | Hazratganj \| हज़रतगंज \| حضرتگںج (Hazratganj) | Neighborhood | 26.8483 | 80.9448 | Madras Restaurant | 26.8484 | 80.9448 | 200.0 | 1.0 | 3.6 | Near Gandhi Ashram, Behind Amar Fax, Hazratgan... | 0.0001 | 0.0 |
| 4 | Cappuccino Blast | Café | 26.8337 | 80.9477 | Cappuccino Blast | 26.8337 | 80.9478 | 700.0 | 2.0 | 4.3 | 12, Mall Avenue, Near, Hazratganj, Lucknow | 0.0 | 0.0001 |
| 5 | Janpath Market | Shopping Mall | 26.8468 | 80.9437 | Quality Like Five Star | 26.8469 | 80.9438 | 300.0 | 1.0 | 3.0 | A 21, Janpath Market, Off MG Marg, Lalbagh, Lu... | 0.0001 | 0.0001 |
| 6 | JJ Bakery | Bakery | 26.8727 | 80.9411 | La Reine | 26.8729 | 80.9411 | 500.0 | 2.0 | 3.3 | Shop No-UGF-1,Shree Plaza Complex,Vikas Nagar,... | 0.0002 | 0.0 |
| 7 | Moti Mahal | Indian Restaurant | 26.8474 | 80.9453 | Moti Mahal Sweets | 26.8472 | 80.9453 | 400.0 | 2.0 | 3.4 | 75, Mahatma Gandhi Marg, Next To Central Bank ... | -0.0002 | 0.0 |
| 8 | Royal Cafe | Fast Food Restaurant | 26.8483 | 80.9439 | Royal Cafe | 26.8482 | 80.9439 | 800.0 | 3.0 | 4.3 | 51, Opposite Sahu Cinema, MG Marg, Hazratganj,... | -0.0001 | 0.0 |
| 9 | Chappan bhog | Fast Food Restaurant | 26.8276 | 80.9449 | South Indian Raman Dosa | 26.8276 | 80.945 | 300.0 | 1.0 | 3.4 | Near Chhappan Bhog, Sadar Bazaar, Lucknow. | 0.0 | 0.0001 |


Comparing the venue names from the two APIs, some of them are a complete mismatch.

**Category 1**: Some venues contain specific restaurants or cafes, which is what the Zomato API returns (for example, Quality Like Five Star inside the shopping mall Janpath Market).


**Category 2**: Some have been replaced with new restaurants (JJ Bakery has now been replaced by La Reine).

The venues which belong to category 1 and category 2 are alright to keep.
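
The names here were inspected by hand. For a larger dataset, one possible aid (not used in this notebook) is to score each pair of names with `difflib.SequenceMatcher` from the standard library and review only the low-scoring pairs. The helper `name_similarity` and the 0.6 cut-off are assumptions chosen for illustration. A low score only flags a pair for manual review, since category 1 and category 2 mismatches are still valid venues.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.6  # assumed cut-off below which a pair is flagged for review


def name_similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two venue names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# (Foursquare name, Zomato name) pairs taken from the table above
pairs = [
    ("Royal Cafe", "Royal Cafe"),
    ("Moti Mahal", "Moti Mahal Sweets"),
    ("JJ Bakery", "La Reine"),
    ("Janpath Market", "Quality Like Five Star"),
]

for foursquare_name, zomato_name in pairs:
    score = name_similarity(foursquare_name, zomato_name)
    flag = "ok" if score >= THRESHOLD else "review"
    print("{:<16} {:<24} {:.2f} {}".format(foursquare_name, zomato_name, score, flag))
```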

I'll now keep the venue name from the Zomato API. I'll also compute the average price per person by dividing the price_for_two column by 2, then drop that column from the dataset along with the other columns that are no longer needed.

In [132]:
selected_venues['average_price'] = selected_venues['price_for_two']/2
selected_venues = selected_venues.drop(columns = ['name', 'lat', 'lng', 'lat_diff', 'lng_diff', 'price_for_two'])

Let's take a look at the final dataset.

In [133]:
selected_venues.head()

| | categories | venue | latitude | longitude | price_range | rating | address | average_price |
|---|---|---|---|---|---|---|---|---|
| 0 | Indian Restaurant | Royal Sky | 26.8504 | 80.9411 | 3.0 | 4.3 | Opposite Halwasiya Market Hazratganj, Hazratga... | 550.0 |
| 1 | Indian Restaurant | The Mughal's Dastarkhwan | 26.8442 | 80.9403 | 3.0 | 4.4 | 29, BN Road, Near Royal Hotel Crossing, Lalbag... | 400.0 |
| 2 | Indian Restaurant | Dastarkhwan | 26.8526 | 80.9368 | 2.0 | 4.5 | 20, Wala Qadar Road, DM Compound Colony, Kaise... | 300.0 |
| 3 | Neighborhood | Madras Restaurant | 26.8484 | 80.9448 | 1.0 | 3.6 | Near Gandhi Ashram, Behind Amar Fax, Hazratgan... | 100.0 |
| 4 | Café | Cappuccino Blast | 26.8337 | 80.9478 | 2.0 | 4.3 | 12, Mall Avenue, Near, Hazratganj, Lucknow | 350.0 |


I'll drop the venues with a rating of 0.0, since that means the venue has not been rated yet.

In [134]:
selected_venues = selected_venues[selected_venues['rating'] != 0.0]
print("Total venues available: {}".format(selected_venues.shape[0]))

Total venues available: 34
