## Introduction

In this project, I use the Foursquare API to explore municipalities in Hudson County, NJ. I use the `explore` endpoint to find the most common venue categories in each municipality, then use these category frequencies as features to group the municipalities into 3 clusters with k-means.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Municipalities in Hudson County, NJ</a>

3. <a href="#item3">Analyze Each Municipality</a>

4. <a href="#item4">Cluster Municipalities</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

#### Set up the environment and import packages.

In [1]:
from bs4 import BeautifulSoup
import requests

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



## 1. Download and Explore Dataset

New Jersey has 21 counties and 565 municipalities. To segment and explore the municipalities of Hudson County, we need a dataset that lists each municipality with its county, along with the latitude and longitude of each municipality.

The list of municipalities comes from the Wikipedia page [List of municipalities in New Jersey](https://en.wikipedia.org/wiki/List_of_municipalities_in_New_Jersey); the coordinates are then looked up with geopy.

#### Load and explore the data

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_municipalities_in_New_Jersey'
html_file = requests.get(url).text

In [3]:
soup = BeautifulSoup(html_file, 'lxml')
print(soup.prettify())


In [4]:
# the municipality list is the first table on the page
table = soup.find('table')
print(table)

<table border="0" cellpadding="2" cellspacing="3" class="wikitable sortable">
<tbody><tr>
<th>2010 Rank
</th>
<th>Municipality
</th>
<th>County
</th>
<th>Population in 2010
</th>
<th>Population<br/>in 2017<sup class="reference" id="cite_ref-3"><a href="#cite_note-3">[3]</a></sup>
</th>
<th>Municipal<br/>type
</th>
<th>Form of<br/>government
</th>
<th>Community<br/>established
</th>
<th>Incorporated<sup class="reference" id="cite_ref-4"><a href="#cite_note-4">[4]</a></sup>
</th></tr>
<tr>
<td>1
</td>
<td><a href="/wiki/Newark,_New_Jersey" title="Newark, New Jersey">Newark</a></td>
<td><a href="/wiki/Essex_County,_New_Jersey" title="Essex County, New Jersey">Essex</a>
</td>
<td>277,140</td>
<td>285,154</td>
<td>City</td>
<td><a href="/wiki/Faulkner_Act_(mayor%E2%80%93council)" title="Faulkner Act (mayor–council)">Faulkner Act (mayor–council)</a></td>
<td>1666</td>
<td>1693<sup class="reference" id="cite_ref-5"><a href="#cite_note-5">[note 1]</a></sup>
</td></tr>
<tr>
<td>2
</td>
<td><a h

#### Transform the data into a pandas dataframe

In [5]:
columns = []
for head in table.find_all('th'):
    col = head.text.strip('\n')
    columns.append(col)
print(columns)

['2010 Rank', 'Municipality', 'County', 'Population in 2010', 'Populationin 2017[3]', 'Municipaltype', 'Form ofgovernment', 'Communityestablished', 'Incorporated[4]']
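The concatenated names above (`'Populationin 2017[3]'`, `'Municipaltype'`) appear because `head.text` joins the text on either side of each `<br/>` without a space and keeps the `<sup>` footnote markers. Below is a minimal sketch of one way to get cleaner names. It uses the standard-library `html.parser` rather than BeautifulSoup so it runs without extra packages; the class name `HeaderParser` is hypothetical, and the sample markup is trimmed from the `<th>` cells printed above.

```python
import re
from html.parser import HTMLParser


class HeaderParser(HTMLParser):
    """Collect the text of each <th> cell, turning <br> into a space
    and skipping footnote markers inside <sup> tags."""

    def __init__(self):
        super().__init__()
        self.headers = []        # cleaned header names, in table order
        self._in_th = False      # currently inside a <th> cell?
        self._sup_depth = 0      # nesting depth of <sup> tags
        self._cell_text = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == 'th':
            self._in_th = True
            self._cell_text = []
        elif tag == 'br' and self._in_th:
            self._cell_text.append(' ')   # <br/> separates words
        elif tag == 'sup':
            self._sup_depth += 1

    def handle_endtag(self, tag):
        if tag == 'sup' and self._sup_depth:
            self._sup_depth -= 1
        elif tag == 'th' and self._in_th:
            # collapse runs of whitespace (including newlines) into single spaces
            text = re.sub(r'\s+', ' ', ''.join(self._cell_text)).strip()
            self.headers.append(text)
            self._in_th = False

    def handle_data(self, data):
        if self._in_th and not self._sup_depth:
            self._cell_text.append(data)


sample = '''<tr><th>2010 Rank
</th><th>Population<br/>in 2017<sup class="reference" id="cite_ref-3"><a href="#cite_note-3">[3]</a></sup>
</th><th>Municipal<br/>type
</th><th>Incorporated<sup class="reference" id="cite_ref-4"><a href="#cite_note-4">[4]</a></sup>
</th></tr>'''

parser = HeaderParser()
parser.feed(sample)
print(parser.headers)  # ['2010 Rank', 'Population in 2017', 'Municipal type', 'Incorporated']
```

With BeautifulSoup, `head.get_text(' ', strip=True)` followed by a regex such as `re.sub(r'\s*\[[^\]]*\]', '', text)` should give similar names.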


In [6]:
values = []
for tr in table.find_all('tr'):
    if tr.td is not None:
        value = []
        for td in tr.find_all('td'):
            value.append(td.text.strip('\n'))
        values.append(value) 
values
        

[['1',
  'Newark',
  'Essex',
  '277,140',
  '285,154',
  'City',
  'Faulkner Act (mayor–council)',
  '1666',
  '1693[note 1]'],
 ['2',
  'Jersey City',
  'Hudson',
  '247,597',
  '270,753',
  'City',
  'Faulkner Act (mayor–council)',
  '1630',
  '1838'],
 ['3',
  'Paterson',
  'Passaic',
  '146,199',
  '148,678',
  'City',
  'Faulkner Act (mayor–council)',
  '1791',
  '1831[note 2]'],
 ['4',
  'Elizabeth',
  'Union',
  '124,969',
  '130,215',
  'City',
  'Faulkner Act (mayor–council)',
  '1664',
  '1855'],
 ['5',
  'Edison',
  'Middlesex',
  '99,967',
  '102,450',
  'Township',
  'Faulkner Act (mayor–council)',
  '1666[5]',
  '1870[note 3]'],
 ['6',
  'Woodbridge Township',
  'Middlesex',
  '99,585',
  '101,965',
  'Township',
  'Faulkner Act (mayor–council)',
  '1664',
  '1798'],
 ['7',
  'Lakewood Township',
  'Ocean',
  '92,843',
  '102,682',
  'Township',
  'Township (New Jersey)',
  '1750',
  '1892'],
 ['8',
  'Toms River',
  'Ocean',
  '91,239',
  '93,017',
  'Township',
  'Faul

In [7]:
nj_data = pd.DataFrame(values, columns = columns)
nj_data.head()

|   | 2010 Rank | Municipality | County | Population in 2010 | Populationin 2017[3] | Municipaltype | Form ofgovernment | Communityestablished | Incorporated[4] |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Newark | Essex | 277140 | 285154 | City | Faulkner Act (mayor–council) | 1666 | 1693[note 1] |
| 1 | 2 | Jersey City | Hudson | 247597 | 270753 | City | Faulkner Act (mayor–council) | 1630 | 1838 |
| 2 | 3 | Paterson | Passaic | 146199 | 148678 | City | Faulkner Act (mayor–council) | 1791 | 1831[note 2] |
| 3 | 4 | Elizabeth | Union | 124969 | 130215 | City | Faulkner Act (mayor–council) | 1664 | 1855 |
| 4 | 5 | Edison | Middlesex | 99967 | 102450 | Township | Faulkner Act (mayor–council) | 1666[5] | 1870[note 3] |


#### Extract the data of Hudson County

In [8]:
hud_data = nj_data.loc[nj_data['County']=='Hudson', ['Municipality', 'County']].reset_index(drop = True)
print('There are {} municipalities in Hudson County, NJ'.format(hud_data.shape[0]))

There are 12 municipalities in Hudson County, NJ


In [9]:
# create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="nj_explorer")
Latitude = []
Longitude = []
for city in hud_data.Municipality:
    address = city + ', NJ'
    location = geolocator.geocode(address)  # None if Nominatim finds no match
    Latitude.append(location.latitude)
    Longitude.append(location.longitude)
hud_data['Latitude'] = Latitude
hud_data['Longitude'] = Longitude
hud_data

|   | Municipality | County | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Jersey City | Hudson | 40.728158 | -74.077642 |
| 1 | Union City | Hudson | 40.779545 | -74.023751 |
| 2 | Bayonne | Hudson | 40.668714 | -74.114309 |
| 3 | North Bergen | Hudson | 40.804267 | -74.012084 |
| 4 | Hoboken | Hudson | 40.743307 | -74.032375 |
| 5 | West New York | Hudson | 40.785529 | -74.0083 |
| 6 | Kearny | Hudson | 40.768434 | -74.145421 |
| 7 | Secaucus | Hudson | 40.789929 | -74.056674 |
| 8 | Harrison | Hudson | 40.74649 | -74.156255 |
| 9 | Weehawken | Hudson | 40.769546 | -74.020418 |
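The loop above sends its geocoding requests back to back, and `geocode()` returns `None` when it finds no match, which would make `location.latitude` fail. Nominatim's public usage policy allows at most one request per second. Below is a minimal standard-library sketch of a throttled lookup loop that records misses as `None`. The helper `geocode_all` and the offline stub `fake_geocode` are hypothetical; geopy itself also ships `geopy.extra.rate_limiter.RateLimiter` for the throttling part.

```python
import time
from collections import namedtuple


def geocode_all(names, geocode, suffix=', NJ', min_delay=1.0):
    """Look up each name (with `suffix` appended), waiting at least `min_delay`
    seconds between requests. Returns {name: (lat, lng)}, or None for no match."""
    coords = {}
    last_call = None
    for name in names:
        if last_call is not None:
            wait = min_delay - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        location = geocode(name + suffix)
        coords[name] = None if location is None else (location.latitude, location.longitude)
    return coords


# Hypothetical offline stand-in for geolocator.geocode: it knows one address only.
Location = namedtuple('Location', ['latitude', 'longitude'])
KNOWN = {'Hoboken, NJ': Location(40.743307, -74.032375)}


def fake_geocode(address):
    return KNOWN.get(address)


print(geocode_all(['Hoboken', 'Nowhere'], fake_geocode, min_delay=0.01))
# {'Hoboken': (40.743307, -74.032375), 'Nowhere': None}
```

In the notebook, `geocode_all(hud_data.Municipality, geolocator.geocode)` would return the same kind of mapping, and any `None` entries can be inspected before building the map.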


#### Use the geopy library to get the latitude and longitude of Hudson County.

In [10]:
address = 'Hudson County, NJ'    
geolocator = Nominatim(user_agent="nj_explorer")
location = geolocator.geocode(address)
hud_lat = location.latitude
hud_long = location.longitude
print('The geographical coordinates of Hudson County are {}, {}.'.format(hud_lat, hud_long))

The geographical coordinates of Hudson County are 40.7381635, -74.0550731.


#### Create a map of Hudson County with its municipalities superimposed on top.

In [11]:
hud_map = folium.Map(location = [hud_lat, hud_long], zoom_start = 12)
for city, lat, lng in zip(hud_data['Municipality'], hud_data['Latitude'], hud_data['Longitude']):
    label = '{}, {}'.format(city, 'Hudson')
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(hud_map) 
hud_map

## 2. Explore Municipalities in Hudson County, NJ

For each municipality, query the Foursquare `explore` endpoint for up to 100 venues within 1,000 meters of its coordinates, and collect each venue's name, category, and location.

In [26]:
# Foursquare credentials: replace the placeholders with your own keys
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20191212'  # API version date
radius = 1000         # search radius in meters
limit = 100           # maximum number of venues returned per request

venue_list = []
for city, city_lat, city_lng in zip(hud_data['Municipality'], hud_data['Latitude'], hud_data['Longitude']):
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'll': '{},{}'.format(city_lat, city_lng),
        'v': VERSION,
        'radius': radius,
        'limit': limit,
    }
    results = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()

    venues = results['response']['groups'][0]['items']
    nearby_venues = pd.json_normalize(venues)  # flatten JSON (pandas >= 1.0)

    filtered_col = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues[filtered_col].copy()
    # keep only the name of each venue's first listed category
    nearby_venues['venue.categories'] = nearby_venues['venue.categories'].apply(
        lambda cats: cats[0]['name'] if cats else None)

    # iterate over rows directly, so venues that share a name keep their own data
    for name, category, lat, lng in nearby_venues.itertuples(index=False):
        venue_list.append([city, city_lat, city_lng, name, category, lat, lng])

hud_venues = pd.DataFrame(venue_list, columns = ['Municipality',
                                                    'Municipality Latitude',
                                                    'Municipality Longitude',
                                                    'Venue',
                                                    'Venue Category',
                                                    'Venue Latitude',
                                                    'Venue Longitude'])
hud_venues.head()

|   | Municipality | Municipality Latitude | Municipality Longitude | Venue | Venue Category | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|---|
| 0 | Jersey City | 40.728158 | -74.077642 | Fiesta Grill | Filipino Restaurant | 40.727928 | -74.075945 |
| 1 | Jersey City | 40.728158 | -74.077642 | Lincoln Park | Park | 40.724137 | -74.083686 |
| 2 | Jersey City | 40.728158 | -74.077642 | Gusto Latino | Latin American Restaurant | 40.725869 | -74.07721 |
| 3 | Jersey City | 40.728158 | -74.077642 | 15 Fox Place | Italian Restaurant | 40.733937 | -74.072321 |
| 4 | Jersey City | 40.728158 | -74.077642 | Wonder Bagels | Bagel Shop | 40.734344 | -74.080727 |
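The venue-collection cell relies on the nesting of the `explore` response: venues sit in `results['response']['groups'][0]['items']`, and each item's `venue` holds `name`, `categories`, and `location` with `lat`/`lng`. A small pure-Python sketch of that extraction, run on a hand-made response; the sample dict only mirrors the keys the notebook reads (it is not a real API reply, though the two venues are taken from the table above), and the helper name `parse_explore` is hypothetical. It also returns `None` for a venue with an empty category list instead of raising.

```python
def parse_explore(results, city, city_lat, city_lng):
    """Turn one explore response into rows of
    [municipality, muni lat, muni lng, venue, category, venue lat, venue lng]."""
    rows = []
    for item in results['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        # as in the notebook, the first listed category is used
        category = categories[0]['name'] if categories else None
        rows.append([city, city_lat, city_lng,
                     venue['name'], category,
                     venue['location']['lat'], venue['location']['lng']])
    return rows


# Hand-made response containing only the keys used above.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Fiesta Grill',
               'categories': [{'name': 'Filipino Restaurant'}],
               'location': {'lat': 40.727928, 'lng': -74.075945}}},
    {'venue': {'name': 'Lincoln Park',
               'categories': [{'name': 'Park'}],
               'location': {'lat': 40.724137, 'lng': -74.083686}}},
]}]}}

for row in parse_explore(sample, 'Jersey City', 40.728158, -74.077642):
    print(row)
```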


Let's check how many venues were returned for each municipality.

In [27]:
hud_venues.groupby('Municipality').count()

| Municipality | Municipality Latitude | Municipality Longitude | Venue | Venue Category | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| Bayonne | 97 | 97 | 97 | 97 | 97 | 97 |
| East Newark | 39 | 39 | 39 | 39 | 39 | 39 |
| Guttenberg | 100 | 100 | 100 | 100 | 100 | 100 |
| Harrison | 61 | 61 | 61 | 61 | 61 | 61 |
| Hoboken | 100 | 100 | 100 | 100 | 100 | 100 |
| Jersey City | 58 | 58 | 58 | 58 | 58 | 58 |
| Kearny | 30 | 30 | 30 | 30 | 30 | 30 |
| North Bergen | 84 | 84 | 84 | 84 | 84 | 84 |
| Secaucus | 76 | 76 | 76 | 76 | 76 | 76 |
| Union City | 100 | 100 | 100 | 100 | 100 | 100 |


In [25]:
print('There are {} unique categories.'.format(hud_venues['Venue Category'].nunique()))

There are 170 unique categories.


## 3. Analyze Each Municipality

In [21]:
# one hot encoding
hud_onehot = pd.get_dummies(hud_venues['Venue Category'], prefix = '', prefix_sep = '')
hud_onehot['Municipality'] = hud_venues['Municipality']
ordered_col = [hud_onehot.columns[-1]] + list(hud_onehot.columns[:-1])
hud_onehot = hud_onehot[ordered_col]
hud_onehot.head()

,Municipality,American Restaurant,Arcade,Argentinian Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Big Box Store,Bike Shop,Board Shop,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Colombian Restaurant,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Flower Shop,Food,Food Court,Food Truck,Football Stadium,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Gas Station,General Entertainment,General Travel,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Heliport,Hotel,Housing Development,Ice Cream Shop,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundromat,Laundry Service,Light Rail Station,Liquor Store,Lounge,Market,Martial Arts Dojo,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Motorcycle Shop,Moving Target,Multiplex,Music Store,Music Venue,Office,Optical Shop,Other Great Outdoors,Paper / Office Supplies Store,Park,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Post Office,Pub,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toll Plaza,Track,Trail,Train,Train Station,Video Game Store,Video Store,Warehouse Store,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Jersey City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Jersey City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Jersey City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Jersey City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Jersey City,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [23]:
hud_onehot.shape

(890, 171)

Next, let's group the rows by municipality and take the mean of each one-hot column, which gives the frequency of each venue category in that municipality.

In [53]:
hud_grouped = hud_onehot.groupby('Municipality').mean().reset_index()
hud_grouped.head()

,Municipality,American Restaurant,Arcade,Argentinian Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Big Box Store,Bike Shop,Board Shop,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Colombian Restaurant,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Dance Studio,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Electronics Store,Empanada Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Flower Shop,Food,Food Court,Food Truck,Football Stadium,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Gas Station,General Entertainment,General Travel,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Heliport,Hotel,Housing Development,Ice Cream Shop,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundromat,Laundry Service,Light Rail Station,Liquor Store,Lounge,Market,Martial Arts Dojo,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Motorcycle Shop,Moving Target,Multiplex,Music Store,Music Venue,Office,Optical Shop,Other Great Outdoors,Paper / Office Supplies Store,Park,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Post Office,Pub,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,South American Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Toll Plaza,Track,Trail,Train,Train Station,Video Game Store,Video Store,Warehouse Store,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Bayonne,0.030928,0.0,0.0,0.0,0.010309,0.0,0.010309,0.030928,0.010309,0.030928,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020619,0.0,0.010309,0.0,0.0,0.0,0.020619,0.020619,0.0,0.0,0.0,0.030928,0.020619,0.0,0.010309,0.010309,0.0,0.0,0.020619,0.0,0.0,0.0,0.0,0.030928,0.0,0.0,0.0,0.0,0.010309,0.0,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.010309,0.010309,0.010309,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.020619,0.0,0.0,0.010309,0.030928,0.020619,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.0,0.020619,0.0,0.0,0.0,0.010309,0.0,0.0,0.0,0.010309,0.0,0.030928,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020619,0.051546,0.020619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010309,0.010309,0.030928,0.0,0.0,0.0,0.010309,0.0,0.010309,0.0,0.010309,0.0,0.0,0.0,0.020619,0.020619,0.0,0.0,0.010309,0.010309,0.020619,0.0,0.010309,0.010309,0.0,0.0,0.010309,0.0,0.0,0.0,0.0,0.010309,0.0,0.0,0.020619,0.020619,0.010309,0.0,0.010309,0.010309,0.0
1,East Newark,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.051282,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.025641,0.0,0.051282,0.0,0.0,0.0,0.0,0.025641,0.0,0.051282,0.0,0.0,0.0,0.0,0.025641,0.076923,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Guttenberg,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.05,0.04,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.07,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.11,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.01,0.0,0.02,0.1,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Harrison,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.016393,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.065574,0.0,0.0,0.016393,0.032787,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.032787,0.0,0.0,0.0,0.04918,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.016393,0.0,0.0,0.0,0.032787,0.0,0.032787,0.0,0.0,0.0,0.032787,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.04918,0.065574,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Hoboken,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.07,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.03,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.03,0.02,0.0,0.0,0.02,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.04,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.03,0.0,0.05,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02
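Averaging one-hot rows per municipality gives, for each category, the share of that municipality's venues in it. Because every venue row contains exactly one 1, each row of `hud_grouped` sums to 1, and Bayonne's 97 venues produce multiples of 1/97 ≈ 0.010309. Below is a pure-Python sketch of the same computation on toy data. `onehot_mean` is a hypothetical helper, not a pandas function.

```python
def onehot_mean(rows):
    """rows: list of (municipality, category) pairs, one per venue.
    Returns (vocabulary, {municipality: [share of each category]})."""
    vocab = sorted({cat for _, cat in rows})            # one column per category
    sums, counts = {}, {}
    for city, cat in rows:
        onehot = [1 if cat == v else 0 for v in vocab]  # exactly one 1 per venue
        acc = sums.get(city, [0] * len(vocab))
        sums[city] = [a + b for a, b in zip(acc, onehot)]
        counts[city] = counts.get(city, 0) + 1
    # the mean of the one-hot rows is the relative frequency of each category
    means = {city: [s / counts[city] for s in sums[city]] for city in sums}
    return vocab, means


toy_venues = [('A', 'Bar'), ('A', 'Bar'), ('A', 'Park'), ('A', 'Pizza Place'),
              ('B', 'Park'), ('B', 'Park')]
vocab, means = onehot_mean(toy_venues)
print(vocab)       # ['Bar', 'Park', 'Pizza Place']
print(means['A'])  # [0.5, 0.25, 0.25]
print(means['B'])  # [0.0, 1.0, 0.0]
```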


Let's print each municipality along with its five most common venue categories.

In [67]:
num_top_venues = 5

for city in hud_grouped['Municipality']:
    print('-----'+city+'-----')
    temp = hud_grouped[hud_grouped['Municipality'] == city].T.reset_index()
    temp = temp.iloc[1:]
    temp.columns = ['Venue', 'Frequency']
    temp['Frequency'] = temp['Frequency'].astype(float)
    temp = temp.round({'Frequency': 2})
    temp = temp.sort_values('Frequency', ascending = False).reset_index(drop = True)
    print(temp.head(num_top_venues))
    print('\n')

-----Bayonne-----
                  Venue  Frequency
0              Pharmacy       0.05
1            Bagel Shop       0.03
2        Sandwich Place       0.03
3    Italian Restaurant       0.03
4  Fast Food Restaurant       0.03


-----East Newark-----
            Venue  Frequency
0             Bar       0.08
1  Sandwich Place       0.08
2             Pub       0.05
3     Pizza Place       0.05
4  Discount Store       0.05


-----Guttenberg-----
                       Venue  Frequency
0  Latin American Restaurant       0.11
1                Pizza Place       0.10
2                       Park       0.07
3           Cuban Restaurant       0.07
4                     Bakery       0.05


-----Harrison-----
                Venue  Frequency
0  Chinese Restaurant       0.07
1         Pizza Place       0.07
2          Donut Shop       0.05
3            Pharmacy       0.05
4               Diner       0.03


-----Hoboken-----
                 Venue  Frequency
0               Bakery       0.07
1   

Now let's put the 10 most common venue categories for each municipality into a pandas DataFrame.

In [90]:
num_top_venues = 10
col_list = ['Municipality']
indicators = ['st', 'nd', 'rd']
for i in range(num_top_venues):
    try:
        col_list.append('{}{} Most Common Venue'.format(i+1, indicators[i]))
    except IndexError:
        # positions after the third (4th to 10th) all take the "th" suffix
        col_list.append('{}th Most Common Venue'.format(i+1))
        
# create a new dataframe
hud_venues_sorted = pd.DataFrame(columns=col_list)
hud_venues_sorted['Municipality'] = hud_grouped['Municipality']
hud_venues_sorted

|   | Municipality | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bayonne |  |  |  |  |  |  |  |  |  |  |
| 1 | East Newark |  |  |  |  |  |  |  |  |  |  |
| 2 | Guttenberg |  |  |  |  |  |  |  |  |  |  |
| 3 | Harrison |  |  |  |  |  |  |  |  |  |  |
| 4 | Hoboken |  |  |  |  |  |  |  |  |  |  |
| 5 | Jersey City |  |  |  |  |  |  |  |  |  |  |
| 6 | Kearny |  |  |  |  |  |  |  |  |  |  |
| 7 | North Bergen |  |  |  |  |  |  |  |  |  |  |
| 8 | Secaucus |  |  |  |  |  |  |  |  |  |  |
| 9 | Union City |  |  |  |  |  |  |  |  |  |  |
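The `indicators` list covers only 1st, 2nd, and 3rd and falls back to "th". This is correct up to 20, but it would produce "21th" or "22th" if `num_top_venues` were raised past 20. Below is a small sketch of a general ordinal helper that uses the usual English rule that 11, 12, and 13 take "th". The name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


col_list = ['Municipality'] + ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(10)]
print(col_list[1], '|', col_list[10])            # 1st Most Common Venue | 10th Most Common Venue
print([ordinal(n) for n in (11, 21, 102, 113)])  # ['11th', '21st', '102nd', '113th']
```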


In [103]:
for i in range(hud_grouped.shape[0]):
    venues = hud_grouped.iloc[i, 1:].sort_values(ascending = False).index.values[:num_top_venues]
    hud_venues_sorted.iloc[i, 1:] = venues
hud_venues_sorted

|   | Municipality | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bayonne | Pharmacy | American Restaurant | Deli / Bodega | Sandwich Place | Mobile Phone Shop | Italian Restaurant | Fast Food Restaurant | Bank | Bagel Shop | Cosmetics Shop |
| 1 | East Newark | Bar | Sandwich Place | Italian Restaurant | Pizza Place | Discount Store | Train Station | Pub | Chinese Restaurant | Lounge | South American Restaurant |
| 2 | Guttenberg | Latin American Restaurant | Pizza Place | Park | Cuban Restaurant | Bakery | Bank | Scenic Lookout | Brazilian Restaurant | Gym | Convenience Store |
| 3 | Harrison | Chinese Restaurant | Pizza Place | Donut Shop | Pharmacy | Fast Food Restaurant | Park | Bar | Bakery | Coffee Shop | Diner |
| 4 | Hoboken | Bakery | Park | Pizza Place | American Restaurant | Deli / Bodega | Italian Restaurant | Cuban Restaurant | Pet Store | Gym | Ice Cream Shop |
| 5 | Jersey City | Sandwich Place | Italian Restaurant | Fast Food Restaurant | Bar | Bagel Shop | Café | Chinese Restaurant | Filipino Restaurant | American Restaurant | Pizza Place |
| 6 | Kearny | Pharmacy | Deli / Bodega | Park | Bakery | Burger Joint | Chinese Restaurant | Bubble Tea Shop | Spa | Liquor Store | Shipping Store |
| 7 | North Bergen | Bakery | Fast Food Restaurant | Latin American Restaurant | Pizza Place | Convenience Store | Mexican Restaurant | Chinese Restaurant | Hardware Store | Donut Shop | Italian Restaurant |
| 8 | Secaucus | Deli / Bodega | Sandwich Place | Pizza Place | Department Store | Italian Restaurant | Bank | Park | Shopping Mall | Clothing Store | Bagel Shop |
| 9 | Union City | Pizza Place | Cuban Restaurant | Bank | Donut Shop | Bakery | Latin American Restaurant | Mexican Restaurant | Japanese Restaurant | Colombian Restaurant | Convenience Store |
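Bayonne's ranking here differs from the earlier top-5 printout (Bagel Shop is 2nd there and 9th here). Several Bayonne categories share the same frequency, 3/97 ≈ 0.030928, and the two sorts break those ties differently; pandas' default `sort_values` algorithm (quicksort) is not guaranteed to be stable. Below is a pure-Python sketch of a reproducible top-N that breaks ties alphabetically. The helper `top_n` is hypothetical, and alphabetical tie-breaking is an assumption, not what the notebook does. The sample frequencies are a few Bayonne values from `hud_grouped` above.

```python
def top_n(freqs, n):
    """freqs: {category: frequency}. Return the n most frequent categories,
    breaking ties alphabetically so the result is reproducible."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in ranked[:n]]


bayonne_sample = {'Pharmacy': 5 / 97, 'Sandwich Place': 3 / 97, 'Bagel Shop': 3 / 97,
                  'Bank': 3 / 97, 'Bakery': 1 / 97}
print(top_n(bayonne_sample, 3))  # ['Pharmacy', 'Bagel Shop', 'Bank']
```

In the notebook, a row of `hud_grouped` could be passed as `top_n(hud_grouped.iloc[i, 1:].to_dict(), num_top_venues)`.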


## 4. Cluster Municipalities

Run k-means to group the municipalities into 3 clusters.

In [108]:
# use the category frequencies (every column except the name) as k-means features
hud_cluster = hud_grouped.drop('Municipality', axis = 1)
kclusters = 3
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(hud_cluster)
kmeans.labels_

array([0, 2, 1, 2, 0, 2, 0, 1, 0, 1, 1, 1], dtype=int32)
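`KMeans` alternates two steps until the assignments stop changing. First, it assigns each municipality's frequency vector to the nearest centroid by squared Euclidean distance. Then it moves each centroid to the mean of its members. Below is a minimal pure-Python sketch of that loop (Lloyd's algorithm) with fixed starting centroids. scikit-learn additionally uses k-means++ initialization and can run several restarts (`n_init`), which this sketch omits. The function name `kmeans` and the 2-D toy vectors are illustrative; in the notebook, each point would be a 170-dimensional row of `hud_cluster`.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm. points and centroids are lists of equal-length
    tuples; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]  # work on a copy
    labels = None
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid (squared Euclidean distance)
        new_labels = [min(range(len(centroids)),
                          key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
                      for pt in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


points = [(0.9, 0.1), (0.8, 0.2), (0.6, 0.4), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(points, [(1.0, 0.0), (0.0, 1.0)])
print(labels)  # [0, 0, 0, 1, 1]
```

The cluster numbers are arbitrary labels. Cluster 0 here has no connection to cluster 0 in the notebook, and a different initialization could permute them.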

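In [110]:
# insert() raises ValueError if the column already exists (e.g. when this cell is re-run),
# so drop any earlier labels before inserting the new ones
hud_venues_sorted = hud_venues_sorted.drop(columns='Cluster Labels', errors='ignore')
hud_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)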

In [113]:
hud_merged = hud_data.join(hud_venues_sorted.set_index('Municipality'), on = 'Municipality')
hud_merged

|   | Municipality | County | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jersey City | Hudson | 40.728158 | -74.077642 | 2 | Sandwich Place | Italian Restaurant | Fast Food Restaurant | Bar | Bagel Shop | Café | Chinese Restaurant | Filipino Restaurant | American Restaurant | Pizza Place |
| 1 | Union City | Hudson | 40.779545 | -74.023751 | 1 | Pizza Place | Cuban Restaurant | Bank | Donut Shop | Bakery | Latin American Restaurant | Mexican Restaurant | Japanese Restaurant | Colombian Restaurant | Convenience Store |
| 2 | Bayonne | Hudson | 40.668714 | -74.114309 | 0 | Pharmacy | American Restaurant | Deli / Bodega | Sandwich Place | Mobile Phone Shop | Italian Restaurant | Fast Food Restaurant | Bank | Bagel Shop | Cosmetics Shop |
| 3 | North Bergen | Hudson | 40.804267 | -74.012084 | 1 | Bakery | Fast Food Restaurant | Latin American Restaurant | Pizza Place | Convenience Store | Mexican Restaurant | Chinese Restaurant | Hardware Store | Donut Shop | Italian Restaurant |
| 4 | Hoboken | Hudson | 40.743307 | -74.032375 | 0 | Bakery | Park | Pizza Place | American Restaurant | Deli / Bodega | Italian Restaurant | Cuban Restaurant | Pet Store | Gym | Ice Cream Shop |
| 5 | West New York | Hudson | 40.785529 | -74.0083 | 1 | Cuban Restaurant | Latin American Restaurant | Pizza Place | Ice Cream Shop | Pharmacy | Park | Japanese Restaurant | Sandwich Place | Mobile Phone Shop | Italian Restaurant |
| 6 | Kearny | Hudson | 40.768434 | -74.145421 | 0 | Pharmacy | Deli / Bodega | Park | Bakery | Burger Joint | Chinese Restaurant | Bubble Tea Shop | Spa | Liquor Store | Shipping Store |
| 7 | Secaucus | Hudson | 40.789929 | -74.056674 | 0 | Deli / Bodega | Sandwich Place | Pizza Place | Department Store | Italian Restaurant | Bank | Park | Shopping Mall | Clothing Store | Bagel Shop |
| 8 | Harrison | Hudson | 40.74649 | -74.156255 | 2 | Chinese Restaurant | Pizza Place | Donut Shop | Pharmacy | Fast Food Restaurant | Park | Bar | Bakery | Coffee Shop | Diner |
| 9 | Weehawken | Hudson | 40.769546 | -74.020418 | 1 | Pizza Place | Park | Latin American Restaurant | Plaza | Cuban Restaurant | Scenic Lookout | Convenience Store | Brewery | Creperie | Snack Place |


In [124]:
# create map
cluster_map = folium.Map(location=[hud_lat, hud_long], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array] 
rainbow

['#8000ff', '#80ffb4', '#ff0000']

In [129]:
# add markers to the map
for lat, lng, city, cluster in zip(hud_merged['Latitude'], hud_merged['Longitude'], hud_merged['Municipality'], hud_merged['Cluster Labels']):
    label = folium.Popup(city + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(cluster_map)
       
cluster_map

## 5. Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.
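One way to find the discriminating categories is to compare each cluster's average category frequency with the average over all municipalities, then keep the categories with the largest positive difference. Below is a pure-Python sketch under that assumption. The helper name `distinguishing_categories` and the toy frequencies are hypothetical, and other measures, such as a ratio instead of a difference, would be equally reasonable. In the notebook, `freqs` could be built from the rows of `hud_grouped` and `labels` from `kmeans.labels_`.

```python
def distinguishing_categories(freqs, labels, cluster, n=3):
    """freqs: {municipality: {category: frequency}}, labels: {municipality: cluster}.
    Return the n categories whose mean frequency in `cluster` most exceeds the
    mean over all municipalities. Assumes the cluster has at least one member."""
    categories = sorted({c for f in freqs.values() for c in f})
    members = [m for m in freqs if labels[m] == cluster]

    def mean_freq(towns, cat):
        return sum(freqs[t].get(cat, 0.0) for t in towns) / len(towns)

    lift = {cat: mean_freq(members, cat) - mean_freq(list(freqs), cat) for cat in categories}
    # sort by descending lift, then alphabetically so ties are reproducible
    ranked = sorted(lift.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in ranked[:n]]


# Toy frequencies, not values from the notebook.
freqs = {
    'A': {'Cuban Restaurant': 0.10, 'Pizza Place': 0.10, 'Pharmacy': 0.00},
    'B': {'Cuban Restaurant': 0.08, 'Pizza Place': 0.05, 'Pharmacy': 0.01},
    'C': {'Cuban Restaurant': 0.00, 'Pizza Place': 0.06, 'Pharmacy': 0.05},
}
labels = {'A': 1, 'B': 1, 'C': 0}
print(distinguishing_categories(freqs, labels, 1, n=2))  # ['Cuban Restaurant', 'Pizza Place']
print(distinguishing_categories(freqs, labels, 0, n=1))  # ['Pharmacy']
```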

#### Cluster 0

In [142]:
cluster_col = hud_merged.columns[[0]+list(range(5, hud_merged.shape[1]))]
cluster = hud_merged.loc[hud_merged['Cluster Labels'] == 0, cluster_col]
cluster

|   | Municipality | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Bayonne | Pharmacy | American Restaurant | Deli / Bodega | Sandwich Place | Mobile Phone Shop | Italian Restaurant | Fast Food Restaurant | Bank | Bagel Shop | Cosmetics Shop |
| 4 | Hoboken | Bakery | Park | Pizza Place | American Restaurant | Deli / Bodega | Italian Restaurant | Cuban Restaurant | Pet Store | Gym | Ice Cream Shop |
| 6 | Kearny | Pharmacy | Deli / Bodega | Park | Bakery | Burger Joint | Chinese Restaurant | Bubble Tea Shop | Spa | Liquor Store | Shipping Store |
| 7 | Secaucus | Deli / Bodega | Sandwich Place | Pizza Place | Department Store | Italian Restaurant | Bank | Park | Shopping Mall | Clothing Store | Bagel Shop |


#### Cluster 1

In [143]:
cluster_col = hud_merged.columns[[0]+list(range(5, hud_merged.shape[1]))]
cluster = hud_merged.loc[hud_merged['Cluster Labels'] == 1, cluster_col]
cluster

|   | Municipality | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Union City | Pizza Place | Cuban Restaurant | Bank | Donut Shop | Bakery | Latin American Restaurant | Mexican Restaurant | Japanese Restaurant | Colombian Restaurant | Convenience Store |
| 3 | North Bergen | Bakery | Fast Food Restaurant | Latin American Restaurant | Pizza Place | Convenience Store | Mexican Restaurant | Chinese Restaurant | Hardware Store | Donut Shop | Italian Restaurant |
| 5 | West New York | Cuban Restaurant | Latin American Restaurant | Pizza Place | Ice Cream Shop | Pharmacy | Park | Japanese Restaurant | Sandwich Place | Mobile Phone Shop | Italian Restaurant |
| 9 | Weehawken | Pizza Place | Park | Latin American Restaurant | Plaza | Cuban Restaurant | Scenic Lookout | Convenience Store | Brewery | Creperie | Snack Place |
| 10 | Guttenberg | Latin American Restaurant | Pizza Place | Park | Cuban Restaurant | Bakery | Bank | Scenic Lookout | Brazilian Restaurant | Gym | Convenience Store |


#### Cluster 2

In [144]:
cluster_col = hud_merged.columns[[0]+list(range(5, hud_merged.shape[1]))]
cluster = hud_merged.loc[hud_merged['Cluster Labels'] == 2, cluster_col]
cluster

|   | Municipality | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jersey City | Sandwich Place | Italian Restaurant | Fast Food Restaurant | Bar | Bagel Shop | Café | Chinese Restaurant | Filipino Restaurant | American Restaurant | Pizza Place |
| 8 | Harrison | Chinese Restaurant | Pizza Place | Donut Shop | Pharmacy | Fast Food Restaurant | Park | Bar | Bakery | Coffee Shop | Diner |
| 11 | East Newark | Bar | Sandwich Place | Italian Restaurant | Pizza Place | Discount Store | Train Station | Pub | Chinese Restaurant | Lounge | South American Restaurant |
