# Data Science Capstone Project - Full Project

### Introduction

The goal of this project is to recommend the best place to stay during a holiday in London.

## Business case

London is one of the largest cities in Europe, and thousands of tourists visit it every day. The first and most obvious step in holiday planning is deciding where to stay, and that is where the real problem begins: there are hundreds of places to sleep and hundreds of places to visit. Searching the web is inefficient, as it means checking dozens of pages...

...and this is where we come in to help.

### Problems

We are here to recommend the best place to stay in London, but how are we going to do it?

A couple of questions come into consideration:

+ How do we find hotels?
+ How do we grade them?
+ Where do we obtain the data from?

### Resolution

#### First, let's describe the data source:

All the data comes from the Foursquare API, a service that provides geospatial data from all over the world. It gives us access to hotel data for London as well as nearby venues, which will support the recommendation process.

#### Second - grading method:

We are going to assume that the best hotel is the one with the highest online rating and the most venues nearby.

With those two questions answered, we have the fundamentals to resolve the 'hotel issue in London'.

The final form of the recommendation will be a list of the 5 best hotels based on the assumptions above. This should help people planning a visit :)
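
The grading rule above can be sketched as a sort on two keys. This is only an illustration: the hotel names, ratings and venue counts below are made-up placeholders, and treating the rating as the primary key with the venue count as a tie-breaker is an assumption, since the text does not say how the two criteria are weighted.

```python
# Hypothetical sample data: (name, online rating, number of nearby venues).
hotels = [
    ("Hotel A", 7.6, 40),
    ("Hotel B", 8.8, 25),
    ("Hotel C", 7.6, 55),
    ("Hotel D", 5.8, 60),
]

def rank_hotels(hotels, top_n=5):
    """Sort by rating (highest first), then by venue count (highest first)."""
    ranked = sorted(hotels, key=lambda h: (h[1], h[2]), reverse=True)
    return ranked[:top_n]

best = rank_hotels(hotels, top_n=3)
for name, rating, venues in best:
    print(name, rating, venues)
```

A weighted score would be another option if the venue count should be able to outweigh a small difference in rating.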

***

# Data

Using Python's GeoPy library, we are going to obtain the geographic location of London. Once that is done, we will use the Foursquare API to get a list of hotels within 10 km of the city's central point. Next, we will get the online rating for each hotel, which will let us choose the 5 best hotels in London. Once the hotels are chosen, we will find nearby venues within 1 km of each hotel. The best hotel will be the one with the highest online rating and the highest number of venues in its neighbourhood. All of this data will be obtained using the Foursquare API.

Bearing in mind that a venue might be within range of more than one hotel, we will use the K-Means method to cluster them.
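
Before any clustering, a quick way to see how many venues are shared between two hotels is to compare venue ids. A minimal sketch, assuming each hotel's venues are available as a list of Foursquare venue ids (the ids below are shortened placeholders, not real data, and `shared_venues` is a name introduced here for illustration):

```python
# Hypothetical venue ids found within range of two hotels.
venues_hotel_a = ["v1", "v2", "v3", "v4"]
venues_hotel_b = ["v3", "v4", "v5"]

def shared_venues(ids_a, ids_b):
    """Return the ids that appear in both lists, sorted for stable output."""
    return sorted(set(ids_a) & set(ids_b))

overlap = shared_venues(venues_hotel_a, venues_hotel_b)
print(overlap)  # ['v3', 'v4']
```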

# Capstone Project - The Battle of Neighborhoods (Week 2)

#### First, we are going to import the relevant libraries

In [2]:
import requests
import pandas as pd 
import numpy as np


!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

from pandas.io.json import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Folium installed')
print('All Libraries imported.')


In [3]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20190713'

### Using the GeoPy library, we can obtain the geospatial position of London

In [12]:
address = 'London'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

51.5073219 -0.1276474


# Given that we have the position of London, we can find hotels within a range of 10 km

In [13]:
search_query = 'hotels'
radius = 10000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius)
result_hotels = requests.get(url).json()
result_hotels

{'meta': {'code': 200, 'requestId': '5d2bb410bf7dde002c7d2e99'},
 'response': {'venues': [{'id': '5afb9f1b65211f002c89d942',
    'name': 'Global Great Hotels - Investments Real Estate',
    'location': {'address': '133 Cockfosters Rd',
     'lat': 51.50708638980144,
     'lng': -0.12790918350219727,
     'labeledLatLngs': [{'label': 'display',
       'lat': 51.50708638980144,
       'lng': -0.12790918350219727}],
     'distance': 31,
     'postalCode': 'EN4 0AA',
     'cc': 'GB',
     'city': 'Hertfordshire',
     'state': 'Hertfordshire',
     'country': 'United Kingdom',
     'formattedAddress': ['133 Cockfosters Rd',
      'Hertfordshire',
      'EN4 0AA',
      'United Kingdom']},
    'categories': [{'id': '56aa371be4b08b9a8d573517',
      'name': 'Business Center',
      'pluralName': 'Business Centers',
      'shortName': 'Business Center',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/default_',
       'suffix': '.png'},
      'primary': True}],
    '
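
The search URL above is assembled with `str.format`. An equivalent way to build it, shown here as a sketch, is to keep the query parameters in a dictionary and let the standard library encode them, which also escapes any special characters in the query. The credentials below are placeholders and `build_search_url` is a helper name introduced for illustration. With `requests`, the same dictionary can instead be passed as `params=` to `requests.get`.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, version, query, radius):
    """Build a Foursquare v2 venue-search URL from its query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials; coordinates are the ones printed by the geocoder above.
url = build_search_url("YOUR_ID", "YOUR_SECRET", 51.5073219, -0.1276474,
                       "20190713", "hotels", 10000)
print(url)
```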

### Parsing data into pandas dataframe

In [17]:
# assign relevant part of JSON to hotel
hotels = result_hotels['response']['venues']

# transform hotels into a dataframe
london_hotels = json_normalize(hotels)
london_hotels.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '56aa371be4b08b9a8d573517', 'name': 'B...",False,5afb9f1b65211f002c89d942,133 Cockfosters Rd,GB,Hertfordshire,United Kingdom,,31,"[133 Cockfosters Rd, Hertfordshire, EN4 0AA, U...","[{'label': 'display', 'lat': 51.50708638980144...",51.507086,-0.127909,EN4 0AA,Hertfordshire,Global Great Hotels - Investments Real Estate,v-1563145232,
1,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,4bc1e83c2a89ef3b78daf288,Stamford Bridge Fulham Road,GB,Chelsea,United Kingdom,Fulham Road,5165,"[Stamford Bridge Fulham Road (Fulham Road), Ch...","[{'label': 'display', 'lat': 51.4810573, 'lng'...",51.481057,-0.189093,SW6 1HS,Greater London,Millennium & Copthorne Hotels,v-1563145232,
2,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",False,50eb1154e4b0541e75c25274,,GB,,United Kingdom,,609,[United Kingdom],"[{'label': 'display', 'lat': 51.51089772999349...",51.510898,-0.12099,,,Strand Palace Hotel's Gym,v-1563145232,
3,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,544a2903498e333e2f510b01,2 Devonshire Terrace,GB,London,United Kingdom,,527,"[2 Devonshire Terrace, London, Greater London,...","[{'label': 'display', 'lat': 51.50361379162682...",51.503614,-0.12291,W2 3DN,Greater London,Best Value London Hotels,v-1563145232,
4,"[{'id': '4bf58dd8d48988d124941735', 'name': 'O...",False,53eddd5a498e6be4b226555d,45 Monmouth St,GB,London,United Kingdom,,619,"[45 Monmouth St, London, Greater London, Unite...","[{'label': 'display', 'lat': 51.51287841796875...",51.512878,-0.127164,,Greater London,Z Hotels Office,v-1563145232,


#### The table doesn't look nice and clean at first glance, so let's try to organize it

In [18]:
# keep only the name, category, location and id columns
important_columns = ['name', 'categories'] + [col for col in london_hotels.columns if col.startswith('location.')] + ['id']
# .copy() gives an independent dataframe, so later assignments do not act on a slice
london_hotels_extract = london_hotels.loc[:, important_columns].copy()

# return the name of the first category of a venue, or None if it has no categories
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
london_hotels_extract['categories'] = london_hotels_extract.apply(get_category_type, axis=1)

# clean column names by keeping only last term
london_hotels_extract.columns = [column.split('.')[-1] for column in london_hotels_extract.columns]

london_hotels_extract

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Global Great Hotels - Investments Real Estate,Business Center,133 Cockfosters Rd,GB,Hertfordshire,United Kingdom,,31,"[133 Cockfosters Rd, Hertfordshire, EN4 0AA, U...","[{'label': 'display', 'lat': 51.50708638980144...",51.507086,-0.127909,,EN4 0AA,Hertfordshire,5afb9f1b65211f002c89d942
1,Millennium & Copthorne Hotels,Hotel,Stamford Bridge Fulham Road,GB,Chelsea,United Kingdom,Fulham Road,5165,"[Stamford Bridge Fulham Road (Fulham Road), Ch...","[{'label': 'display', 'lat': 51.4810573, 'lng'...",51.481057,-0.189093,,SW6 1HS,Greater London,4bc1e83c2a89ef3b78daf288
2,Strand Palace Hotel's Gym,Gym,,GB,,United Kingdom,,609,[United Kingdom],"[{'label': 'display', 'lat': 51.51089772999349...",51.510898,-0.12099,,,,50eb1154e4b0541e75c25274
3,Best Value London Hotels,Hotel,2 Devonshire Terrace,GB,London,United Kingdom,,527,"[2 Devonshire Terrace, London, Greater London,...","[{'label': 'display', 'lat': 51.50361379162682...",51.503614,-0.12291,,W2 3DN,Greater London,544a2903498e333e2f510b01
4,Z Hotels Office,Office,45 Monmouth St,GB,London,United Kingdom,,619,"[45 Monmouth St, London, Greater London, Unite...","[{'label': 'display', 'lat': 51.51287841796875...",51.512878,-0.127164,,,Greater London,53eddd5a498e6be4b226555d
5,Preferred Hotels & Resorts,Office,1 Wilder Walk,GB,London,United Kingdom,,649,"[1 Wilder Walk, London, Greater London, W1B 5A...","[{'label': 'display', 'lat': 51.510494, 'lng':...",51.510494,-0.135523,,W1B 5AR,Greater London,5c90d1ee6f0aa2002c3de13e
6,Rocco Forte Hotels,Office,70 Jermyn St,GB,London,United Kingdom,,786,"[70 Jermyn St, London, Greater London, SW1Y 6N...","[{'label': 'display', 'lat': 51.50746481439803...",51.507465,-0.139002,,SW1Y 6NY,Greater London,4f0c30ade4b0dfc434e298fe
7,Design Hotels,Office,,GB,London,United Kingdom,,923,"[London, Greater London, United Kingdom]","[{'label': 'display', 'lat': 51.514835, 'lng':...",51.514835,-0.133289,,,Greater London,4e3924cad22dea80c52cf532
8,Marriott Hotels International,Building,86 Fetter Ln,GB,London,United Kingdom,,1686,"[86 Fetter Ln, London, Greater London, EC4A 1E...","[{'label': 'display', 'lat': 51.51716748003636...",51.517167,-0.109152,,EC4A 1EN,Greater London,4ff19e2fe4b02f36dc3fd764
9,JJW Hotels & Resorts,Office,6 Queen Street,GB,London,United Kingdom,,1346,"[6 Queen Street, London, Greater London, W1U 2...","[{'label': 'display', 'lat': 51.50729340171854...",51.507293,-0.147087,,W1U 2SJ,Greater London,4c07ca4a88ba9521fe17e88f
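
To see what the category function does, it can be tried on plain dictionaries that mimic rows of the Foursquare response. The sample rows are simplified stand-ins, not real API output, and the function follows the logic of `get_category_type` above.

```python
def get_category_type(row):
    """Return the name of the first category, or None if there are none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Simplified stand-ins for rows of the search and explore responses.
search_row = {'categories': [{'name': 'Hotel', 'primary': True}]}
explore_row = {'venue.categories': [{'name': 'Juice Bar'}]}
empty_row = {'categories': []}

print(get_category_type(search_row))   # Hotel
print(get_category_type(explore_row))  # Juice Bar
print(get_category_type(empty_row))    # None
```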


#### Foursquare sometimes returns incorrect data, as seen in the 'categories' column. Venues labelled 'Office', 'Building' or 'Spa' are treated here as hotels, so we recategorize them as 'Hotel' and drop rows with any other category

In [19]:
# keep only rows whose category is treated as a hotel; .copy() avoids SettingWithCopyWarning
hotel_like = ['Hotel', 'Office', 'Building', 'Spa']
london_hotels_extract = london_hotels_extract[london_hotels_extract['categories'].isin(hotel_like)].copy()
london_hotels_extract['categories'] = 'Hotel'
london_hotels_extract.reset_index(drop=True, inplace=True)
london_hotels_extract


Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Millennium & Copthorne Hotels,Hotel,Stamford Bridge Fulham Road,GB,Chelsea,United Kingdom,Fulham Road,5165,"[Stamford Bridge Fulham Road (Fulham Road), Ch...","[{'label': 'display', 'lat': 51.4810573, 'lng'...",51.481057,-0.189093,,SW6 1HS,Greater London,4bc1e83c2a89ef3b78daf288
1,Best Value London Hotels,Hotel,2 Devonshire Terrace,GB,London,United Kingdom,,527,"[2 Devonshire Terrace, London, Greater London,...","[{'label': 'display', 'lat': 51.50361379162682...",51.503614,-0.12291,,W2 3DN,Greater London,544a2903498e333e2f510b01
2,Z Hotels Office,Hotel,45 Monmouth St,GB,London,United Kingdom,,619,"[45 Monmouth St, London, Greater London, Unite...","[{'label': 'display', 'lat': 51.51287841796875...",51.512878,-0.127164,,,Greater London,53eddd5a498e6be4b226555d
3,Preferred Hotels & Resorts,Hotel,1 Wilder Walk,GB,London,United Kingdom,,649,"[1 Wilder Walk, London, Greater London, W1B 5A...","[{'label': 'display', 'lat': 51.510494, 'lng':...",51.510494,-0.135523,,W1B 5AR,Greater London,5c90d1ee6f0aa2002c3de13e
4,Rocco Forte Hotels,Hotel,70 Jermyn St,GB,London,United Kingdom,,786,"[70 Jermyn St, London, Greater London, SW1Y 6N...","[{'label': 'display', 'lat': 51.50746481439803...",51.507465,-0.139002,,SW1Y 6NY,Greater London,4f0c30ade4b0dfc434e298fe
5,Design Hotels,Hotel,,GB,London,United Kingdom,,923,"[London, Greater London, United Kingdom]","[{'label': 'display', 'lat': 51.514835, 'lng':...",51.514835,-0.133289,,,Greater London,4e3924cad22dea80c52cf532
6,Marriott Hotels International,Hotel,86 Fetter Ln,GB,London,United Kingdom,,1686,"[86 Fetter Ln, London, Greater London, EC4A 1E...","[{'label': 'display', 'lat': 51.51716748003636...",51.517167,-0.109152,,EC4A 1EN,Greater London,4ff19e2fe4b02f36dc3fd764
7,JJW Hotels & Resorts,Hotel,6 Queen Street,GB,London,United Kingdom,,1346,"[6 Queen Street, London, Greater London, W1U 2...","[{'label': 'display', 'lat': 51.50729340171854...",51.507293,-0.147087,,W1U 2SJ,Greater London,4c07ca4a88ba9521fe17e88f
8,Four Seasons Hotels and Resorts | London World...,Hotel,7 Old Park Ln,GB,London,United Kingdom,,1593,"[7 Old Park Ln, London, Greater London, W1K 1Q...","[{'label': 'display', 'lat': 51.50409455565319...",51.504095,-0.150051,,W1K 1QR,Greater London,4dc92accb0fbf26798c46155
9,Imperial Hotel,Hotel,61-66 Russell Sq,GB,London,United Kingdom,,1620,"[61-66 Russell Sq, London, Greater London, WC1...","[{'label': 'display', 'lat': 51.52169709806958...",51.521697,-0.123935,,WC1B 5BB,Greater London,4b839ecdf964a5206e0b31e3


#### There are still columns that are not needed for further evaluation (e.g. 'crossStreet'). These should be deleted

In [20]:
# reassigning the result (instead of inplace=True) avoids SettingWithCopyWarning
london_hotels_extract = london_hotels_extract.drop(['crossStreet', 'labeledLatLngs', 'postalCode', 'state', 'country', 'city'], axis=1)
london_hotels_extract


Unnamed: 0,name,categories,address,cc,distance,formattedAddress,lat,lng,neighborhood,id
0,Millennium & Copthorne Hotels,Hotel,Stamford Bridge Fulham Road,GB,5165,"[Stamford Bridge Fulham Road (Fulham Road), Ch...",51.481057,-0.189093,,4bc1e83c2a89ef3b78daf288
1,Best Value London Hotels,Hotel,2 Devonshire Terrace,GB,527,"[2 Devonshire Terrace, London, Greater London,...",51.503614,-0.12291,,544a2903498e333e2f510b01
2,Z Hotels Office,Hotel,45 Monmouth St,GB,619,"[45 Monmouth St, London, Greater London, Unite...",51.512878,-0.127164,,53eddd5a498e6be4b226555d
3,Preferred Hotels & Resorts,Hotel,1 Wilder Walk,GB,649,"[1 Wilder Walk, London, Greater London, W1B 5A...",51.510494,-0.135523,,5c90d1ee6f0aa2002c3de13e
4,Rocco Forte Hotels,Hotel,70 Jermyn St,GB,786,"[70 Jermyn St, London, Greater London, SW1Y 6N...",51.507465,-0.139002,,4f0c30ade4b0dfc434e298fe
5,Design Hotels,Hotel,,GB,923,"[London, Greater London, United Kingdom]",51.514835,-0.133289,,4e3924cad22dea80c52cf532
6,Marriott Hotels International,Hotel,86 Fetter Ln,GB,1686,"[86 Fetter Ln, London, Greater London, EC4A 1E...",51.517167,-0.109152,,4ff19e2fe4b02f36dc3fd764
7,JJW Hotels & Resorts,Hotel,6 Queen Street,GB,1346,"[6 Queen Street, London, Greater London, W1U 2...",51.507293,-0.147087,,4c07ca4a88ba9521fe17e88f
8,Four Seasons Hotels and Resorts | London World...,Hotel,7 Old Park Ln,GB,1593,"[7 Old Park Ln, London, Greater London, W1K 1Q...",51.504095,-0.150051,,4dc92accb0fbf26798c46155
9,Imperial Hotel,Hotel,61-66 Russell Sq,GB,1620,"[61-66 Russell Sq, London, Greater London, WC1...",51.521697,-0.123935,,4b839ecdf964a5206e0b31e3


## Now that we have a nice, clean table, we can visualize the results

In [22]:
london_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# London
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='London',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(london_map)

# Hotels
for lat, lng, label in zip(london_hotels_extract.lat, london_hotels_extract.lng, london_hotels_extract.name):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(london_map)

london_map

***

# Methodology

## The whole project is about finding the best hotel

### We have collected, cleaned and visualized the data; now it is time for the next steps
### As discussed, hotels are graded by their online rating and nearby venues, so we will now obtain the rating of every hotel from the Foursquare API by looping over the hotel IDs and requesting each hotel's details. Once completed, this will allow us to pick the 5 best hotels

### Once the best hotels are picked, we again use the Foursquare API to find the venues within 1 km of each hotel, which is the second requirement of our decision process. When all the data is in place, we are going to use the k-means clustering method for venue categorization and visualize the final data on a map of London
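
The clustering step itself does not appear in the cells of this notebook, so here is a minimal sketch of the k-means idea applied to venue coordinates. It is a plain-Python illustration rather than the author's implementation: the points are made-up coordinates, the initial centroids are simply the first `k` points, and latitude/longitude are treated as flat 2-D coordinates, which is only a rough approximation over a small area. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
def kmeans(points, k, iterations=20):
    """Cluster 2-D points with Lloyd's algorithm; returns (centroids, labels)."""
    centroids = [points[i] for i in range(k)]  # naive initialisation
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            labels[idx] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Made-up venue coordinates forming two loose groups.
venues = [
    (51.518, -0.158), (51.495, -0.146), (51.519, -0.156),
    (51.517, -0.155), (51.496, -0.147), (51.494, -0.145),
]
centroids, labels = kmeans(venues, k=2)
print(labels)
```

Each label is the index of the cluster a venue was assigned to, so venues with the same label could share a marker colour on the map.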

***

# Data Analysis

### The first step in this section is to obtain ratings using the Foursquare API. Using each hotel's ID, we loop over the hotels, request their details and collect the ratings

In [25]:
ratings = []
for hotel_id in london_hotels_extract['id']: # ID of each hotel
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(hotel_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    # store the rating if it is available, otherwise store 0
    rating = result.get('response', {}).get('venue', {}).get('rating', 0)
    ratings.append({'ID': hotel_id, 'Ratings': rating})

# one row per hotel, in the same order as london_hotels_extract
london_ratings = pd.DataFrame(ratings, columns=['ID', 'Ratings'])
london_ratings

Unnamed: 0,ID,Ratings
0,4bc1e83c2a89ef3b78daf288,6.4
1,544a2903498e333e2f510b01,0.0
2,53eddd5a498e6be4b226555d,0.0
3,5c90d1ee6f0aa2002c3de13e,0.0
4,4f0c30ade4b0dfc434e298fe,0.0
5,4e3924cad22dea80c52cf532,0.0
6,4ff19e2fe4b02f36dc3fd764,0.0
7,4c07ca4a88ba9521fe17e88f,0.0
8,4dc92accb0fbf26798c46155,0.0
9,4b839ecdf964a5206e0b31e3,4.8
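
A rating of 0 can mean two different things in the loop above: the hotel has no rating, or the request itself failed. A small helper can keep the two apart. This is a sketch under assumptions: it relies on the `meta.code` field seen in the search response earlier also being present in the venue-details response, and the sample dictionaries are hand-written, not real API output. `extract_rating` is a hypothetical helper name.

```python
def extract_rating(result):
    """Return (ok, rating): ok is False if the request failed, rating may be None."""
    if result.get('meta', {}).get('code') != 200:
        return False, None  # request did not succeed, nothing to read
    venue = result.get('response', {}).get('venue', {})
    return True, venue.get('rating')  # rating is None when the venue has none

# Hand-written sample responses (not real API output).
rated = {'meta': {'code': 200}, 'response': {'venue': {'rating': 8.8}}}
unrated = {'meta': {'code': 200}, 'response': {'venue': {}}}
failed = {'meta': {'code': 429}, 'response': {}}

print(extract_rating(rated))    # (True, 8.8)
print(extract_rating(unrated))  # (True, None)
print(extract_rating(failed))   # (False, None)
```

Separating the two cases makes it possible to retry failed requests later instead of silently dropping those hotels.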


### Adding ratings to the Hotels table

In [27]:
london_ratings.columns = ['id', 'Ratings']
london_hotels_extract = pd.merge(london_hotels_extract,
                 london_ratings,
                 on='id')
london_hotels_extract

Unnamed: 0,name,categories,address,cc,distance,formattedAddress,lat,lng,neighborhood,id,Ratings
0,Millennium & Copthorne Hotels,Hotel,Stamford Bridge Fulham Road,GB,5165,"[Stamford Bridge Fulham Road (Fulham Road), Ch...",51.481057,-0.189093,,4bc1e83c2a89ef3b78daf288,6.4
1,Best Value London Hotels,Hotel,2 Devonshire Terrace,GB,527,"[2 Devonshire Terrace, London, Greater London,...",51.503614,-0.12291,,544a2903498e333e2f510b01,0.0
2,Z Hotels Office,Hotel,45 Monmouth St,GB,619,"[45 Monmouth St, London, Greater London, Unite...",51.512878,-0.127164,,53eddd5a498e6be4b226555d,0.0
3,Preferred Hotels & Resorts,Hotel,1 Wilder Walk,GB,649,"[1 Wilder Walk, London, Greater London, W1B 5A...",51.510494,-0.135523,,5c90d1ee6f0aa2002c3de13e,0.0
4,Rocco Forte Hotels,Hotel,70 Jermyn St,GB,786,"[70 Jermyn St, London, Greater London, SW1Y 6N...",51.507465,-0.139002,,4f0c30ade4b0dfc434e298fe,0.0
5,Design Hotels,Hotel,,GB,923,"[London, Greater London, United Kingdom]",51.514835,-0.133289,,4e3924cad22dea80c52cf532,0.0
6,Marriott Hotels International,Hotel,86 Fetter Ln,GB,1686,"[86 Fetter Ln, London, Greater London, EC4A 1E...",51.517167,-0.109152,,4ff19e2fe4b02f36dc3fd764,0.0
7,JJW Hotels & Resorts,Hotel,6 Queen Street,GB,1346,"[6 Queen Street, London, Greater London, W1U 2...",51.507293,-0.147087,,4c07ca4a88ba9521fe17e88f,0.0
8,Four Seasons Hotels and Resorts | London World...,Hotel,7 Old Park Ln,GB,1593,"[7 Old Park Ln, London, Greater London, W1K 1Q...",51.504095,-0.150051,,4dc92accb0fbf26798c46155,0.0
9,Imperial Hotel,Hotel,61-66 Russell Sq,GB,1620,"[61-66 Russell Sq, London, Greater London, WC1...",51.521697,-0.123935,,4b839ecdf964a5206e0b31e3,4.8


### As we can see, many hotels don't have a rating - let's remove them

In [28]:
london_hotels_extract = london_hotels_extract.set_index("Ratings")
london_hotels_extract = london_hotels_extract.drop(0.0, axis=0)
london_hotels_extract

Unnamed: 0_level_0,name,categories,address,cc,distance,formattedAddress,lat,lng,neighborhood,id
Ratings,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
6.4,Millennium & Copthorne Hotels,Hotel,Stamford Bridge Fulham Road,GB,5165,"[Stamford Bridge Fulham Road (Fulham Road), Ch...",51.481057,-0.189093,,4bc1e83c2a89ef3b78daf288
4.8,Imperial Hotel,Hotel,61-66 Russell Sq,GB,1620,"[61-66 Russell Sq, London, Greater London, WC1...",51.521697,-0.123935,,4b839ecdf964a5206e0b31e3
8.8,The Z Hotel Gloucester Place,Hotel,51 Gloucester Pl,GB,2436,"[51 Gloucester Pl, London, Greater London, W1U...",51.518184,-0.158186,,57076c36498eaefd5ce319da
7.6,The Tower Hotel,Hotel,St Katherine's Way,GB,3772,"[St Katherine's Way, London, Greater London, E...",51.506392,-0.073223,,4b27f875f964a520098d24e3
7.4,The Z Hotel Victoria,Hotel,5 Lower Belgrave St,GB,1815,"[5 Lower Belgrave St, London, Greater London, ...",51.495789,-0.146172,,4fe64f83e4b04318c4140c67
5.4,The Rathbone Hotel,Hotel,Rathbone St.,GB,1380,"[Rathbone St., London, Greater London, W1T 1LB...",51.518707,-0.135557,,4be52dd5d4f7c9b6b8232520
5.8,The Z Hotel Soho,Hotel,17 Moor St,GB,716,"[17 Moor St, London, Greater London, W1D 5AP, ...",51.513614,-0.129795,,4eb8731f30f8d0f18da0e82e


### The table is in place, so now we can take the top 5 hotels based on rating

In [29]:
london_hotels_extract.sort_values('Ratings', ascending=False, inplace=True) 
london_hotels_extract = london_hotels_extract.head(5)
london_hotels_extract

Unnamed: 0_level_0,name,categories,address,cc,distance,formattedAddress,lat,lng,neighborhood,id
Ratings,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
8.8,The Z Hotel Gloucester Place,Hotel,51 Gloucester Pl,GB,2436,"[51 Gloucester Pl, London, Greater London, W1U...",51.518184,-0.158186,,57076c36498eaefd5ce319da
7.6,The Tower Hotel,Hotel,St Katherine's Way,GB,3772,"[St Katherine's Way, London, Greater London, E...",51.506392,-0.073223,,4b27f875f964a520098d24e3
7.4,The Z Hotel Victoria,Hotel,5 Lower Belgrave St,GB,1815,"[5 Lower Belgrave St, London, Greater London, ...",51.495789,-0.146172,,4fe64f83e4b04318c4140c67
6.4,Millennium & Copthorne Hotels,Hotel,Stamford Bridge Fulham Road,GB,5165,"[Stamford Bridge Fulham Road (Fulham Road), Ch...",51.481057,-0.189093,,4bc1e83c2a89ef3b78daf288
5.8,The Z Hotel Soho,Hotel,17 Moor St,GB,716,"[17 Moor St, London, Greater London, W1D 5AP, ...",51.513614,-0.129795,,4eb8731f30f8d0f18da0e82e


#### The 'neighborhood' column doesn't provide any useful information either, so we can delete it as well

In [30]:
# reassigning the result (instead of inplace=True) avoids SettingWithCopyWarning
london_hotels_extract = london_hotels_extract.drop(['neighborhood'], axis=1)
london_hotels_extract


Unnamed: 0_level_0,name,categories,address,cc,distance,formattedAddress,lat,lng,id
Ratings,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
8.8,The Z Hotel Gloucester Place,Hotel,51 Gloucester Pl,GB,2436,"[51 Gloucester Pl, London, Greater London, W1U...",51.518184,-0.158186,57076c36498eaefd5ce319da
7.6,The Tower Hotel,Hotel,St Katherine's Way,GB,3772,"[St Katherine's Way, London, Greater London, E...",51.506392,-0.073223,4b27f875f964a520098d24e3
7.4,The Z Hotel Victoria,Hotel,5 Lower Belgrave St,GB,1815,"[5 Lower Belgrave St, London, Greater London, ...",51.495789,-0.146172,4fe64f83e4b04318c4140c67
6.4,Millennium & Copthorne Hotels,Hotel,Stamford Bridge Fulham Road,GB,5165,"[Stamford Bridge Fulham Road (Fulham Road), Ch...",51.481057,-0.189093,4bc1e83c2a89ef3b78daf288
5.8,The Z Hotel Soho,Hotel,17 Moor St,GB,716,"[17 Moor St, London, Greater London, W1D 5AP, ...",51.513614,-0.129795,4eb8731f30f8d0f18da0e82e


# Last step in data capture - searching for venues near each hotel

### We have the top 5 hotels in London, as shown in the table above. Using their geographic position, we can get the venues located within 1 km of each hotel (using the Foursquare API)

In [46]:
LIMIT = 300    # maximum number of venues requested per hotel
radius = 1000  # search radius in metres
d = {}
for i in range(len(london_hotels_extract)): # one iteration per hotel (5 hotels)
    lat = london_hotels_extract['lat'].iloc[i]
    lng = london_hotels_extract['lng'].iloc[i]
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&near={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, radius, LIMIT)
    results = requests.get(url).json()
    items = results['response']['groups'][0]['items']
    dataframe = json_normalize(items)
      
    # keep only the name, category, location and id columns
    valid_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
    df = dataframe.loc[:, valid_columns].copy()
    
    # filter the category for each row
    df['venue.categories'] = df.apply(get_category_type, axis=1)
    
    # clean column names by keeping only the last term
    df.columns = [col.split('.')[-1] for col in df.columns]
    
    d[i] = df
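
To sanity-check that a returned venue really lies within the 1 km radius, the great-circle distance between hotel and venue can be computed with the haversine formula. A minimal sketch, using the coordinates of The Z Hotel Gloucester Place and of one venue returned for it in this notebook (DW Fitness First). The Earth radius of 6371 km is the usual mean-radius approximation, and `haversine_m` is a helper introduced here for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

hotel = (51.518184, -0.158186)   # The Z Hotel Gloucester Place
venue = (51.518798, -0.156122)   # DW Fitness First

distance = haversine_m(hotel[0], hotel[1], venue[0], venue[1])
print(round(distance), distance <= 1000)
```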

#### We should now have all the venues for these 5 hotels, so let's organize those tables

In [47]:
london_hotel1 = d[0]
london_hotel2 = d[1]
london_hotel3 = d[2]
london_hotel4 = d[3]
london_hotel5 = d[4]

#### As an example, let's look at the venues table for 'The Z Hotel Gloucester Place'.

In [48]:
london_hotel1.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,DW Fitness First,Gym / Fitness Center,55 Baker St,GB,London,United Kingdom,,"[55 Baker St, London, Greater London, W1U 8EU,...","[{'label': 'display', 'lat': 51.51879825605098...",51.518798,-0.156122,,W1U 8EU,Greater London,5290a36511d23b77d9e8d1c4
1,The Z Hotel Gloucester Place,Hotel,51 Gloucester Pl,GB,London,United Kingdom,,"[51 Gloucester Pl, London, Greater London, W1U...","[{'label': 'display', 'lat': 51.51818357322121...",51.518184,-0.158186,,W1U 8JF,Greater London,57076c36498eaefd5ce319da
2,Carousel,Restaurant,71 Blandford St,GB,London,United Kingdom,,"[71 Blandford St, London, Greater London, W1U ...","[{'label': 'display', 'lat': 51.51799163692346...",51.517992,-0.156356,,W1U 8AB,Greater London,53f62a59498ea58f17690a49
3,JOE & THE JUICE,Juice Bar,7 Baker Street,GB,London,United Kingdom,,"[7 Baker Street, London, Greater London, W1U 3...","[{'label': 'display', 'lat': 51.51703278849285...",51.517033,-0.155232,,W1U 3AH,Greater London,58c6abda7b88a758faedc68e
4,Chiltern Firehouse,Modern European Restaurant,1 Chiltern St,GB,London,United Kingdom,,"[1 Chiltern St, London, Greater London, W1U 7P...","[{'label': 'display', 'lat': 51.51861880796264...",51.518619,-0.154835,,W1U 7PA,Greater London,5305f35711d21b05c826da58


#### As we can see, we again have unwanted columns, since we are working with fresh JSON data extracted from Foursquare, so we need to remove them once more

In [49]:
# remove all the unwanted columns from each venue table
unwanted_columns = ['crossStreet', 'labeledLatLngs', 'postalCode', 'state', 'country', 'city', 'neighborhood', 'address', 'cc']
for venues in (london_hotel1, london_hotel2, london_hotel3, london_hotel4, london_hotel5):
    venues.drop(unwanted_columns, axis=1, inplace=True)

#### Let's see how it looks

In [50]:
london_hotel1.head()

Unnamed: 0,name,categories,formattedAddress,lat,lng,id
0,DW Fitness First,Gym / Fitness Center,"[55 Baker St, London, Greater London, W1U 8EU,...",51.518798,-0.156122,5290a36511d23b77d9e8d1c4
1,The Z Hotel Gloucester Place,Hotel,"[51 Gloucester Pl, London, Greater London, W1U...",51.518184,-0.158186,57076c36498eaefd5ce319da
2,Carousel,Restaurant,"[71 Blandford St, London, Greater London, W1U ...",51.517992,-0.156356,53f62a59498ea58f17690a49
3,JOE & THE JUICE,Juice Bar,"[7 Baker Street, London, Greater London, W1U 3...",51.517033,-0.155232,58c6abda7b88a758faedc68e
4,Chiltern Firehouse,Modern European Restaurant,"[1 Chiltern St, London, Greater London, W1U 7P...",51.518619,-0.154835,5305f35711d21b05c826da58


#### In the Folium version used here, labels containing special characters such as apostrophes can prevent the map from being generated, so we remove them

In [51]:
london_hotel1 = london_hotel1.replace('\'','',regex=True)
london_hotel2 = london_hotel2.replace('\'','',regex=True)
london_hotel3 = london_hotel3.replace('\'','',regex=True)
london_hotel4 = london_hotel4.replace('\'','',regex=True)
london_hotel5 = london_hotel5.replace('\'','',regex=True)
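
Deleting apostrophes changes names such as "St Katherine's Way". An alternative worth trying is to HTML-escape the labels, so that the apostrophe is replaced by an entity rather than dropped. Whether this avoids the rendering problem in a given Folium version is an assumption that would need to be checked. The sketch only shows the escaping itself, using the standard library.

```python
from html import escape

labels = ["St Katherine's Way", "JOE & THE JUICE"]

# quote=True also converts single and double quotes to HTML entities
safe_labels = [escape(label, quote=True) for label in labels]
print(safe_labels)
```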

# Now we can visualize the London map with hotels and venues

In [52]:
london_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# one venue table and one marker colour per hotel, in ranking order:
# The Z Hotel Gloucester Place (blue), The Tower Hotel (green), The Z Hotel Victoria (orange),
# Millennium & Copthorne Hotels (purple), The Z Hotel Soho (red)
venue_tables = [london_hotel1, london_hotel2, london_hotel3, london_hotel4, london_hotel5]
marker_colors = ['blue', 'green', 'orange', 'purple', 'red']

for i, (venues, color) in enumerate(zip(venue_tables, marker_colors)):
    # add a marker to represent the hotel itself
    folium.Marker(
        [london_hotels_extract['lat'].iloc[i], london_hotels_extract['lng'].iloc[i]],
        popup=london_hotels_extract['name'].iloc[i],
    ).add_to(london_map)

    # add the nearby places as circle markers in the hotel's colour
    for lat, lng, label in zip(venues['lat'], venues['lng'], venues['name']):
        folium.features.CircleMarker(
            [lat, lng],
            radius=5,
            color=color,
            popup=label,
            fill=True,
            fill_color=color,
            fill_opacity=0.6
        ).add_to(london_map)

# display map
london_map

## Looking at the map, we can see that the 'Z' family hotels have the densest concentration of venues within range. Since 'The Z Hotel Gloucester Place' belongs to the 'Z' family and has the highest rating, it is our best choice of hotel in London!
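
The visual impression can be backed by a number. The sketch below computes the great-circle (haversine) distance from a hotel to each nearby venue and averages it; a smaller mean suggests a tighter cluster around the hotel. The function names are assumptions made for illustration, and the sample coordinates are the first hotel and two of its venues from the table earlier in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def mean_distance_km(hotel, venues):
    """Average distance from a hotel (lat, lng) to a list of venue (lat, lng) pairs."""
    if not venues:
        raise ValueError("need at least one venue")
    return sum(haversine_km(*hotel, *v) for v in venues) / len(venues)

hotel = (51.518184, -0.158186)      # The Z Hotel Gloucester Place
venues = [(51.517992, -0.156356),   # Carousel
          (51.517033, -0.155232)]   # JOE & THE JUICE
print(round(mean_distance_km(hotel, venues), 3))
```

With the notebook's DataFrames, the venue list could be built with `list(zip(london_hotel1.lat, london_hotel1.lng))` and the same function applied to each of the five hotels.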

#### For the user's information, we can create a quick handbook of the top 5 hotels and the most popular venue category within range of each

In [108]:
# venue DataFrames in the same order as the hotels in london_hotels_extract
venue_frames = [london_hotel1, london_hotel2, london_hotel3, london_hotel4, london_hotel5]

# count venues per category and sort by count, descending
# NOTE: .iloc[1, 0] takes the category in the second row of that ranking, not the first
d = {
    'Hotel': london_hotels_extract['name'],
    'Most Popular Venue in area': [
        df.groupby('categories', as_index=False).count().sort_values(by='name', ascending=False).iloc[1, 0]
        for df in venue_frames
    ],
}

Handbook = pd.DataFrame(data=d)

In [109]:
Handbook

Ratings,Hotel,Most Popular Venue in area
8.8,The Z Hotel Gloucester Place,Chinese Restaurant
7.6,The Tower Hotel,Coffee Shop
7.4,The Z Hotel Victoria,Italian Restaurant
6.4,Millennium & Copthorne Hotels,Café
5.8,The Z Hotel Soho,Ice Cream Shop
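
The category ranking behind the handbook can also be sketched without pandas. The snippet below counts venue categories with `collections.Counter` and returns the most frequent one, optionally ignoring categories such as 'Hotel' that may dominate a hotel's own neighbourhood. The function name, the `exclude` parameter, and the sample list are assumptions for illustration, not the notebook's code.

```python
from collections import Counter

def most_popular_category(categories, exclude=()):
    """Return the most frequent category, skipping any listed in `exclude`.

    Returns None when nothing is left to count.
    """
    counts = Counter(c for c in categories if c not in exclude)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Illustrative sample, not real Foursquare output
sample = ["Hotel", "Restaurant", "Hotel", "Juice Bar", "Restaurant", "Hotel"]
print(most_popular_category(sample))
print(most_popular_category(sample, exclude={"Hotel"}))
```

Making the exclusion explicit is one possible alternative to picking a fixed row position from the sorted counts, since it states which category is being skipped.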


***

# Results and Discussion

This project started by capturing data about hotels within a 10 km radius of central London. Using this data, we obtained a rating for each hotel, which allowed us to select the top 5 hotels within that range. Rating is not the only thing tourists care about; nearby venues matter as well. For each selected hotel we therefore retrieved the venues within 1 km and successfully visualized them on a map of London. Examining the map, the largest concentration of venues appears around the 'Z' family of hotels. The hotel with the highest overall rating, 'The Z Hotel Gloucester Place', also belongs to the 'Z' family. Assuming that rating and venue concentration are the two main requirements, we select 'The Z Hotel Gloucester Place' as the best place to stay in London.
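
The selection rule described above (rating first, venue concentration second) can be written as a sort key. This is a minimal sketch under that reading of the rule; the `HotelScore` type, the function name, and the sample figures are made up for illustration and are not results from the notebook.

```python
from typing import NamedTuple

class HotelScore(NamedTuple):
    name: str
    rating: float      # online rating, higher is better
    venue_count: int   # venues found within the search radius

def rank_hotels(hotels):
    """Sort hotels by rating, then by nearby venue count, both descending."""
    return sorted(hotels, key=lambda h: (h.rating, h.venue_count), reverse=True)

# Hypothetical figures, for illustration only
candidates = [
    HotelScore("Hotel A", 7.6, 40),
    HotelScore("Hotel B", 8.8, 55),
    HotelScore("Hotel C", 7.6, 62),
]
best = rank_hotels(candidates)[0]
print(best.name)
```

Here the venue count only breaks ties between equally rated hotels; a weighted combination of the two would be another reasonable design.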

# Future improvements

+ We can add the average room price and the customer's preferred venue types to the analysis, to produce a more deeply personalized recommendation
+ We can extend this project to analyze every big city in the world

# Conclusion

The purpose of this project was to help tourists find the best place to stay, which is one of the most important factors when planning a trip. Preferences differ greatly between individuals, but there is one common question: "What do others think about this place?" That question laid the foundation for this project, and we went one step further and tried to answer a second one: "What can I do there once I have checked in to the hotel?" Combining these two characteristics lets us identify the best hotel in the city: the one with the highest rating and a large concentration of venues nearby. We have chosen the best hotel according to the dataset, but we know it still might not be the best for a particular person planning a trip. To serve as many people as possible, we have created a small handbook of the most popular venues around the top 5 places to stay - I'm sure this will be helpful to anyone going to London this year :)

# Thank you for reading to the end! Have a great day