In [2]:
conda install -c conda-forge folium

In [1]:
import requests
import pandas as pd
import folium

# Introduction

People have always moved from place to place for one reason or another. When they move to a new city, they need to choose where to live or where to buy a house, and that choice depends on several factors, such as nearby coffee shops, restaurants and other amenities.  
To find a suitable place, they have to visit different neighborhoods in person or search extensively online, which can be cumbersome. It is also difficult to compare all the neighborhoods of a big city. The aim of this project is to cluster neighborhoods based on the 10 most common venue categories, making it easier for people to decide in which neighborhood to buy a house.


# Data 

This solution targets the city of Toronto, that is, people who are moving to Toronto. The neighborhood data, along with the postal codes, was collected from the Wikipedia page: 'https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969'

Scrape the neighborhood data from the Wikipedia page

In [2]:
url='https://en.wikipedia.org/w/index.php?title=List_of_postal_codes_of_Canada:_M&oldid=1011037969'
urlcontent=requests.get(url).content
df_list=pd.read_html(urlcontent)
df=df_list[0]
df.head()


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [6]:
df.shape

(180, 3)

Rows with no borough assigned need to be removed from the data

In [3]:

df=df[df.Borough!='Not assigned']
df.head(10)

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


Get the latitude and longitude of each neighborhood, which the Foursquare API needs in order to return nearby venues

In [4]:
path='https://cocl.us/Geospatial_data'
Geo_df=pd.read_csv(path)
Geo_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the latitude and longitude with the neighborhood data

In [5]:
# merge the two dataframes on postal code
TorontoN=pd.merge(left=df,right=Geo_df,left_on='Postal Code',right_on='Postal Code')
TorontoN.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [10]:
TorontoN.shape

(103, 5)
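
`pd.merge` performs an inner join by default, so a postal code missing from either table would be dropped without any message. The sketch below uses small made-up frames (the code `M9Z` and the name `Hypothetical` are invented for illustration) to show how a left join with `indicator=True` can surface such rows before they disappear.

```python
import pandas as pd

# Toy stand-ins for df and Geo_df; 'M9Z' is a made-up postal code
neigh = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village', 'Hypothetical'],
})
geo = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# A left join keeps every neighbourhood row; the _merge column flags
# rows that found no coordinates in the right-hand table
merged = pd.merge(neigh, geo, on='Postal Code', how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

If the unmatched list is empty, the inner join used in the notebook loses nothing.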

In [6]:
import json # library to handle JSON files

from pandas import json_normalize # transform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim


In [8]:
address = 'Toronto'
geolocator = Nominatim(user_agent="t_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Visualize the neighborhoods on a map

In [182]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(TorontoN['Latitude'], TorontoN['Longitude'], TorontoN['Neighbourhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

# Foursquare credentials

In [9]:
import os
# Credentials come from environment variables (the names are placeholders) rather than being hardcoded
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
ACCESS_TOKEN = os.environ['FOURSQUARE_ACCESS_TOKEN'] # Foursquare access token
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


Get the venues of every neighborhood in each borough of Toronto

In [10]:

radius = 1000  # search radius in metres around each neighborhood's coordinates
venues_list = []
for lat, lng, neighborhood in zip(TorontoN['Latitude'], TorontoN['Longitude'], TorontoN['Neighbourhood']):
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius,
        LIMIT)
    results = requests.get(url).json()['response']['groups'][0]['items']

    # keep only the relevant information for each nearby venue
    venues_list.append([(
            neighborhood,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

# build the dataframe once, after all neighborhoods have been queried
nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']
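
The comprehension in the loop indexes `categories[0]` directly, which would raise an `IndexError` for any venue returned with an empty category list. Whether that happens in practice is an assumption here. The sketch below factors the flattening into a hypothetical helper, `extract_venues`, that falls back to `None` in that case. The sample payload is hand-written to mirror the fields accessed above and is not real API output.

```python
def extract_venues(neighborhood, lat, lng, items):
    """Flatten Foursquare 'explore' items into row tuples.

    Assumes each item has the shape accessed in the notebook loop:
    {'venue': {'name', 'location': {'lat', 'lng'}, 'categories': [{'name'}]}}
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None instead of failing when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# Hand-written sample payload for illustration only
sample_items = [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.332140},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorised Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]
rows = extract_venues('Parkwoods', 43.753259, -79.329656, sample_items)
print(rows)
```

Rows with a `None` category can then be dropped or kept deliberately before the one-hot encoding step.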

In [11]:
nearby_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.759840,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
2,Parkwoods,43.753259,-79.329656,Tim Hortons,43.760668,-79.326368,Café
3,Parkwoods,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food Restaurant
4,Parkwoods,43.753259,-79.329656,Bruno's valu-mart,43.746143,-79.324630,Grocery Store
...,...,...,...,...,...,...,...
4878,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Mr.Sub,43.636174,-79.520655,Restaurant
4879,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Queensway Fish & Chips,43.621720,-79.524588,Fish & Chips Shop
4880,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Global Pet Foods,43.621304,-79.526146,Pet Store
4881,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Sleep Country,43.621340,-79.526708,Mattress Store


In [12]:
#how many unique venue types we have in Toronto neighborhoods
len(nearby_venues['Venue Category'].unique())

330

In [13]:
#display unique venues
nearby_venues['Venue Category'].unique()

array(['Caribbean Restaurant', 'Park', 'Café', 'Fast Food Restaurant',
       'Grocery Store', 'Fish & Chips Shop', 'Pharmacy', 'Supermarket',
       'Pizza Place', 'Food & Drink Shop', 'Intersection', 'Bus Stop',
       'Train Station', 'Discount Store', 'Laundry Service', 'ATM',
       'Chinese Restaurant', 'Coffee Shop', 'Convenience Store',
       'Shopping Mall', 'Skating Rink', 'Tennis Court', 'Cosmetics Shop',
       'Shop & Service', 'Hockey Arena', 'Portuguese Restaurant',
       'Golf Course', 'French Restaurant', 'Playground',
       'Sporting Goods Shop', "Men's Store", 'Lounge',
       'Gym / Fitness Center', 'Bakery', 'Distribution Center',
       'Restaurant', 'Historic Site', 'Farmers Market', 'Chocolate Shop',
       'Mediterranean Restaurant', 'Pub', 'Italian Restaurant',
       'Dessert Shop', 'Performing Arts Venue', 'Spa', 'Breakfast Spot',
       'Liquor Store', 'Tech Startup', 'Thai Restaurant',
       'Greek Restaurant', 'Mexican Restaurant', 'Pool', 'Yoga Studi

In [15]:
#getting top 20 venues with highest value count 
top20=nearby_venues['Venue Category'].value_counts()[:20].index.tolist()
top20

['Coffee Shop',
 'Café',
 'Park',
 'Pizza Place',
 'Restaurant',
 'Italian Restaurant',
 'Bakery',
 'Grocery Store',
 'Sandwich Place',
 'Japanese Restaurant',
 'Sushi Restaurant',
 'Fast Food Restaurant',
 'Bank',
 'Gym',
 'Hotel',
 'Pharmacy',
 'Thai Restaurant',
 'Pub',
 'Bar',
 'Indian Restaurant']

We can use the top 10 venue categories to define similarity among neighborhoods.
This could be improved with an app that asks users to choose, from the list of venue categories, the ones they want to have in their neighborhood.

In [40]:
#make a list of top 10 venues
top10=nearby_venues['Venue Category'].value_counts()[:10].index.tolist()
top10

['Coffee Shop',
 'Café',
 'Park',
 'Pizza Place',
 'Restaurant',
 'Italian Restaurant',
 'Bakery',
 'Grocery Store',
 'Sandwich Place',
 'Japanese Restaurant']

In [41]:
# turn the venue categories into features
# one-hot encoding
venues_onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="")
# add column back to dataframe
venues_onehot['Neighborhood'] = nearby_venues['Neighborhood']
venues_onehot['Longitude'] = nearby_venues['Neighborhood Longitude'] 
venues_onehot['Latitude'] = nearby_venues['Neighborhood Latitude'] 


cols = list(venues_onehot)
# move the column to head of list using index, pop and insert
cols.insert(0, cols.pop(cols.index('Neighborhood')))
venues_onehot = venues_onehot[cols]

venues_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,...,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Longitude,Latitude
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,-79.329656,43.753259
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,-79.329656,43.753259
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,-79.329656,43.753259
3,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,-79.329656,43.753259
4,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,-79.329656,43.753259


In [19]:
# grouping venues based on neighborhood
venues_grouped = venues_onehot.groupby('Neighborhood').mean().reset_index()
venues_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,...,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Longitude,Latitude
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,-79.262029,43.794200
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,-79.543484,43.602414
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,-79.442259,43.754328
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,-79.385975,43.786947
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,-79.419750,43.733283
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
93,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,-79.442259,43.782736
94,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,-79.216917,43.770992
95,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,-79.318389,43.695344
96,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,0.0,-79.400049,43.752758


In [20]:
venues_grouped.shape

(98, 332)
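
Because each row of the one-hot table is a single venue, the group mean of a category column is the share of a neighborhood's venues that fall in that category. A minimal sketch with made-up neighborhoods `A` and `B` (not taken from the Toronto data):

```python
import pandas as pd

# Six made-up venues across two made-up neighborhoods
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Café', 'Café', 'Bakery', 'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column within a group is that category's frequency
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with very different venue counts are still comparable on the same scale.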

In [70]:
# selecting top 10 venues with neighborhood data  
top10_venues=top10.copy()
top10_venues.insert(0,'Neighborhood')
top10_venues.append("Longitude")
top10_venues.append("Latitude")
top_venues=venues_grouped[top10_venues]
top_venues.head()

Unnamed: 0,Neighborhood,Coffee Shop,Café,Park,Pizza Place,Restaurant,Italian Restaurant,Bakery,Grocery Store,Sandwich Place,Japanese Restaurant,Longitude,Latitude
0,Agincourt,0.02381,0.0,0.0,0.02381,0.02381,0.0,0.047619,0.02381,0.047619,0.0,-79.262029,43.7942
1,"Alderwood, Long Branch",0.041667,0.0,0.083333,0.083333,0.0,0.0,0.0,0.041667,0.041667,0.0,-79.543484,43.602414
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0625,0.0,0.0625,0.0625,0.03125,0.0,0.0,0.0,0.03125,0.0,-79.442259,43.754328
3,Bayview Village,0.0,0.066667,0.066667,0.0,0.066667,0.0,0.0,0.133333,0.0,0.133333,-79.385975,43.786947
4,"Bedford Park, Lawrence Manor East",0.073171,0.02439,0.02439,0.02439,0.02439,0.073171,0.02439,0.02439,0.04878,0.0,-79.41975,43.733283


In [65]:
top_venues.describe()

Unnamed: 0,Coffee Shop,Café,Park,Pizza Place,Restaurant,Italian Restaurant,Bakery,Grocery Store,Sandwich Place,Japanese Restaurant,Longitude,Latitude,Clus_Db
count,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0,98.0
mean,0.074706,0.032823,0.054821,0.040442,0.027067,0.021256,0.023179,0.024924,0.019478,0.014473,-79.396574,43.701868,-1.0
std,0.047264,0.035933,0.087594,0.046958,0.032005,0.031845,0.025775,0.033571,0.021244,0.022787,0.095902,0.051503,0.0
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-79.615819,43.602414,-1.0
25%,0.05,0.0,0.01,0.0025,0.0,0.0,0.0,0.0,0.0,0.0,-79.455622,43.658346,-1.0
50%,0.070714,0.021981,0.03,0.0241,0.020842,0.01,0.02,0.01,0.01,0.0,-79.38879,43.694563,-1.0
75%,0.1,0.065957,0.068391,0.0625,0.04,0.03,0.037448,0.035406,0.032006,0.03,-79.342066,43.743405,-1.0
max,0.25,0.133333,0.75,0.272727,0.222222,0.2,0.111111,0.176471,0.086957,0.133333,-79.160497,43.815252,-1.0


# Modeling 

Since we do not know how similar or dissimilar the neighborhoods are, or how many clusters should be formed, we use DBSCAN for clustering, which does not require the number of clusters to be specified in advance.
The value of eps was selected after testing values ranging from 0.003 to 10 at various intervals.
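
One way such a search over eps could be organized is sketched below. It runs on synthetic data (two tight blobs plus a few scattered points), not on the neighborhood features. The helper name `sweep_eps` and the candidate values are illustrative assumptions, not the procedure actually used for this notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=3.0, scale=0.1, size=(20, 2)),
    rng.uniform(low=-5, high=8, size=(5, 2)),
])
X = StandardScaler().fit_transform(X)


def sweep_eps(X, eps_values, min_samples=3):
    """Return (eps, n_clusters, n_noise) for each candidate eps."""
    summary = []
    for eps in eps_values:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int(np.sum(labels == -1))
        summary.append((eps, n_clusters, n_noise))
    return summary


results = sweep_eps(X, [0.003, 0.1, 0.5, 1.5, 10])
for eps, n_clusters, n_noise in results:
    print(f'eps={eps}: {n_clusters} clusters, {n_noise} noise points')
```

A very small eps tends to mark almost everything as noise, while a very large one merges all points into a single cluster. The useful range lies between those extremes.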

In [43]:
data=top_venues[top10]

In [44]:
from sklearn.preprocessing import StandardScaler
scaler= StandardScaler()
S_data=scaler.fit_transform(data)

In [175]:
from sklearn.cluster import DBSCAN
import numpy as np


# Compute DBSCAN on the standardized features
db = DBSCAN(eps=1.5, min_samples=3).fit(S_data)
# boolean mask marking the core samples
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# number of clusters, excluding the noise label (-1)
realClusterNum=len(set(labels)) - (1 if -1 in labels else 0)
# number of distinct labels, including noise
clusterNum = len(set(labels)) 

# assign returns a new dataframe, which avoids pandas' SettingWithCopyWarning
top_venues = top_venues.assign(Clus_Db=labels)


In [176]:
set(labels)

{-1, 0, 1, 2}
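
The set shows which labels exist but not how many neighborhoods carry each one. A small sketch of counting them, using a made-up label list rather than the notebook's actual output (`summarize_labels` is a hypothetical helper):

```python
from collections import Counter


def summarize_labels(labels):
    """Return ({cluster: size}, n_noise), treating label -1 as noise."""
    counts = Counter(int(label) for label in labels)
    n_noise = counts.pop(-1, 0)
    return dict(sorted(counts.items())), n_noise


# Made-up labels for illustration only
example_labels = [-1, 0, 0, 1, -1, 2, 0, 1, -1]
cluster_sizes, n_noise = summarize_labels(example_labels)
print(cluster_sizes, n_noise)
```

Calling `summarize_labels(labels)` on the DBSCAN result would show directly how many neighborhoods ended up as noise.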

# Visualize final clusters formed based on the similarity given by DBSCAN

In [166]:
# cluster map
import numpy as np
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme: one color per label, including the noise label -1
kclusters=len(set(labels))
print(kclusters)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# rainbow[cluster-1] relies on negative indexing: cluster 0 maps to rainbow[-1], noise (-1) to rainbow[-2]
for lat, lon, poi, cluster in zip(top_venues['Latitude'], top_venues['Longitude'], top_venues['Neighborhood'], top_venues['Clus_Db']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

4
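
An alternative to indexing a color list with `cluster-1` is an explicit label-to-color dictionary that reserves a neutral color for noise. The sketch below is one possible way to do this. The helper `build_color_map` and the hex palette are illustrative choices, not part of the notebook.

```python
def build_color_map(labels, palette, noise_color='#808080'):
    """Map each cluster label to a palette color and -1 to noise_color."""
    clusters = sorted(set(int(label) for label in labels) - {-1})
    if len(clusters) > len(palette):
        raise ValueError('palette has fewer colors than clusters')
    color_map = {cluster: palette[i] for i, cluster in enumerate(clusters)}
    color_map[-1] = noise_color
    return color_map


# Arbitrary illustrative palette
palette = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3']
color_map = build_color_map([-1, 0, 1, 2, 0, -1], palette)
print(color_map)
```

In the marker loop, `color_map[cluster]` would then replace `rainbow[cluster-1]` for both `color` and `fill_color`.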


# Examine Clusters


## Cluster 1

In [167]:
top_venues.loc[top_venues['Clus_Db'] == 0,top10]

Unnamed: 0,Coffee Shop,Café,Park,Pizza Place,Restaurant,Italian Restaurant,Bakery,Grocery Store,Sandwich Place,Japanese Restaurant
5,0.12,0.07,0.04,0.01,0.03,0.01,0.02,0.01,0.01,0.04
7,0.05,0.07,0.02,0.01,0.05,0.02,0.04,0.0,0.02,0.01
13,0.09,0.04,0.03,0.02,0.01,0.01,0.01,0.01,0.01,0.03
15,0.1,0.02,0.03,0.02,0.02,0.04,0.0,0.02,0.01,0.04
18,0.11,0.07,0.02,0.0,0.04,0.02,0.02,0.0,0.01,0.03
29,0.1,0.06,0.02,0.0,0.04,0.02,0.01,0.0,0.01,0.03
31,0.09,0.03,0.01,0.02,0.02,0.03,0.02,0.01,0.01,0.04
35,0.09,0.07,0.04,0.01,0.03,0.01,0.01,0.0,0.0,0.04
49,0.03,0.08,0.02,0.03,0.06,0.04,0.04,0.0,0.01,0.02
63,0.08,0.04,0.04,0.02,0.02,0.03,0.0,0.01,0.01,0.05


The neighborhoods in this cluster, shown in red on the map, have relatively high shares of coffee shops, cafés and restaurants. Every one of them also has Japanese restaurants (1% to 5% of venues), generally more than the neighborhoods in the other clusters. This may indicate a sizeable Japanese community, so these neighborhoods could be a good fit for Japanese newcomers.

In [177]:
top_venues.loc[top_venues['Clus_Db'] == 1, top10]

Unnamed: 0,Coffee Shop,Café,Park,Pizza Place,Restaurant,Italian Restaurant,Bakery,Grocery Store,Sandwich Place,Japanese Restaurant
8,0.0625,0.0,0.083333,0.0625,0.0,0.041667,0.041667,0.020833,0.0,0.0
19,0.09,0.04,0.01,0.04,0.03,0.09,0.01,0.0,0.02,0.01
20,0.12,0.04,0.01,0.03,0.04,0.06,0.01,0.01,0.02,0.01
23,0.071429,0.0,0.02381,0.02381,0.047619,0.02381,0.02381,0.0,0.02381,0.0
26,0.064516,0.075269,0.043011,0.032258,0.010753,0.010753,0.021505,0.010753,0.021505,0.0
36,0.05,0.08,0.03,0.02,0.01,0.03,0.02,0.02,0.02,0.0
41,0.051282,0.025641,0.038462,0.012821,0.038462,0.012821,0.025641,0.025641,0.025641,0.012821
44,0.04,0.1,0.03,0.01,0.01,0.0,0.01,0.01,0.02,0.02
48,0.068966,0.0,0.0,0.0,0.034483,0.017241,0.017241,0.051724,0.034483,0.017241
56,0.085106,0.042553,0.042553,0.021277,0.042553,0.06383,0.021277,0.0,0.0,0.0


The neighborhoods in this cluster have a fairly high share of coffee shops, along with some parks, pizza places and restaurants, but comparatively few venues in the remaining categories, Japanese restaurants in particular.

In [178]:
top_venues.loc[top_venues['Clus_Db'] == 2, top10]

Unnamed: 0,Coffee Shop,Café,Park,Pizza Place,Restaurant,Italian Restaurant,Bakery,Grocery Store,Sandwich Place,Japanese Restaurant
4,0.073171,0.02439,0.02439,0.02439,0.02439,0.073171,0.02439,0.02439,0.04878,0.0
53,0.078125,0.03125,0.046875,0.03125,0.03125,0.078125,0.015625,0.0625,0.03125,0.015625
78,0.092105,0.039474,0.039474,0.026316,0.026316,0.052632,0.013158,0.039474,0.039474,0.013158


This cluster contains only three neighborhoods. They have many coffee shops and Italian restaurants, and almost every other category (cafés, parks, pizza places, grocery stores and so on) is represented in each of them. This cluster could therefore be a good choice, mainly for lovers of Italian food.

In [180]:
top_venues.loc[top_venues['Clus_Db'] == -1, top10]

Unnamed: 0,Coffee Shop,Café,Park,Pizza Place,Restaurant,Italian Restaurant,Bakery,Grocery Store,Sandwich Place,Japanese Restaurant
0,0.023810,0.000000,0.000000,0.023810,0.023810,0.0,0.047619,0.023810,0.047619,0.000000
1,0.041667,0.000000,0.083333,0.083333,0.000000,0.0,0.000000,0.041667,0.041667,0.000000
2,0.062500,0.000000,0.062500,0.062500,0.031250,0.0,0.000000,0.000000,0.031250,0.000000
3,0.000000,0.066667,0.066667,0.000000,0.066667,0.0,0.000000,0.133333,0.000000,0.133333
6,0.000000,0.071429,0.142857,0.000000,0.071429,0.0,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...
93,0.100000,0.000000,0.100000,0.100000,0.000000,0.0,0.100000,0.000000,0.000000,0.000000
94,0.250000,0.000000,0.250000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000
95,0.103448,0.068966,0.103448,0.068966,0.000000,0.0,0.000000,0.000000,0.068966,0.000000
96,0.055556,0.000000,0.166667,0.000000,0.222222,0.0,0.000000,0.055556,0.000000,0.000000


DBSCAN labels the neighborhoods above as noise (label -1) rather than assigning them to a cluster. Looking at the data, most of these neighborhoods have zero values in several categories, which likely sets them apart from the denser groups.
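
The cluster descriptions above were read off the tables by eye. A per-cluster mean profile gives a more compact comparison. The sketch below runs on a made-up four-row table, not on `top_venues`, and keeps only two of the ten category columns for brevity.

```python
import pandas as pd

# Made-up frequencies and labels for illustration only
toy = pd.DataFrame({
    'Coffee Shop': [0.10, 0.12, 0.05, 0.00],
    'Italian Restaurant': [0.01, 0.02, 0.08, 0.00],
    'Clus_Db': [0, 0, 1, -1],
})

# Mean frequency of each category within each label (noise is label -1)
profile = toy.groupby('Clus_Db').mean()
print(profile)
```

On the real data, the analogous call would be `top_venues.groupby('Clus_Db')[top10].mean()`.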