# Outline
Part 1 Geographic and Demographic Data
1. Build dependencies
2. Load Chicago geographic and demographic data
3. Data cleaning

Part 2 Explore Neighborhoods in Chicago
1. Visualize Chicago Neighborhoods
2. Explore first neighborhood's restaurants
3. Repeat the process to explore all neighborhoods in Chicago
4. Feature Engineering
5. Cluster Neighborhoods with restaurant types¶

Part 3 Analysis and Summary

# Part 1  Geographic and Demographic Data

### 1. Build Dependencies


In [49]:
#!pip install geocoder
#!pip install simpledbf

In [1]:
import datetime
import geocoder
import numpy as np
import pandas as pd

import json # library to handle JSON files

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# library to read dbf files and convert them into pandas dataframe
from simpledbf import Dbf5

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### 2. Load Chicago  geographic and demographic data
The data source is file 'hoods3155lite.dbf' downloaded from 'Stanford Digital Repository'(https://purl.stanford.edu/xq082nw3443).
It contains major US cities neighborhoods, latitudes, longitudes, average and medium household income and so on.

In [4]:
dbf = Dbf5('hoods3155lite.dbf')
df = dbf.to_dataframe()
df.head()

Unnamed: 0,STATE,CITY,NAME,REGIONID,SHAPE_LENG,SHAPE_AREA,X,Y,REGION_ID,LA_CITY,...,MEDHINC_CY,MEDFINC_CY,AVGFINC_CY,MEDVAL_CY,DG_STD,DP_STD,CENTROID,NEW_GEOM,PAVED,CITY_PARKS
0,CA,Long Beach,Airport Area,272732.0,17308.184793,8359173.0,-118.154496,33.8167,272732,0,...,73346,72176,85057,413713,200.28,106.79,0101000020E61000002081ED32B6895DC03F66F88377E8...,,0.161726,0.129
1,CA,Long Beach,Alamitos Heights,272737.0,4385.328031,862308.2,-118.125871,33.7738,272737,0,...,83046,98472,128898,606807,93.71,54.88,0101000020E610000057FA0EF819885DC0E8600C60FCE2...,,0.102917,0.129
2,CA,Long Beach,Belmont Heights,272933.0,7952.049738,2498741.0,-118.151191,33.7639,272933,0,...,66667,87687,103958,516184,254.38,156.95,0101000020E6100000C4867E9CB4895DC08BAC4825D6E1...,,0.296049,0.129
3,CA,Long Beach,Belmont Shore,113713.0,5727.509526,1621976.0,-118.137396,33.7589,113713,0,...,83425,99794,123601,734805,212.84,176.14,0101000020E61000002780304CBE885DC0CA50820926E1...,,0.352219,0.129
4,CA,Long Beach,Bixby Area,272968.0,11518.915038,4495448.0,-118.176421,33.8405,272968,0,...,62332,67868,80716,357036,134.62,111.01,0101000020E610000056B56193288B5DC0B671DB4A8BEB...,,0.210809,0.129


In [33]:
df.columns

Index(['STATE', 'CITY', 'NAME', 'REGIONID', 'SHAPE_LENG', 'SHAPE_AREA', 'X',
       'Y', 'REGION_ID', 'LA_CITY', 'REGIONID_1', 'DG_N', 'DG_NINV', 'DP_N',
       'DP_NINV', 'PCTPARK_N', 'MEANEQ_N', 'DG_MEAN', 'DP_MEAN', 'PCT_PARK',
       'MEAN_EQ', 'YOUNGFOLKS', 'POPDENSITY', 'DIVERSITY', 'PC_INCOME',
       'AVG_HINC', 'AVG_HVAL', 'PCT_OWN', 'PCT_RENT', 'PCT_WHITE',
       'PCT_HISPAN', 'PCT_BLACK', 'MEDAGE_CY', 'UNEMPRT_CY', 'MEDHINC_CY',
       'MEDFINC_CY', 'AVGFINC_CY', 'MEDVAL_CY', 'DG_STD', 'DP_STD', 'CENTROID',
       'NEW_GEOM', 'PAVED', 'CITY_PARKS'],
      dtype='object')

### 3. Data cleaning
For the purpose of this project, we will focus on the city of Chicago.
We will keep and rename relevant features and discard the others.
The features to keep are:
* 'NAME': name of neighborhood
* 'X': longitude
* 'Y': latitude
* 'POPDENSITY': population density
* 'DIVERSITY': diversity index
* 'PC_INCOME': per capita income
* 'MEDAGE_CY': median age
* 'UNEMPRT_CY': unemployment pct 2010
* 'MEDVAL_CY': median home value

In [50]:
# keep chicago and desired features
df_chicago = df[df['CITY']=='Chicago']
df_chicago = df_chicago[['NAME', 'X', 'Y', 'POPDENSITY', 'DIVERSITY', 'PC_INCOME','MEDAGE_CY', 'UNEMPRT_CY', 'MEDVAL_CY']]

# change column names to more familiar ones
df_chicago.columns = ['Neighborhood','Longitude','Latitude','POPDENSITY', 'DIVERSITY', 'PC_INCOME','MEDAGE_CY', 'UNEMPRT_CY', 'MEDVAL_CY']
df_chicago.reset_index(drop=True,inplace=True)
print(df_chicago.shape)
df_chicago.head()

(77, 9)


Unnamed: 0,Neighborhood,Longitude,Latitude,POPDENSITY,DIVERSITY,PC_INCOME,MEDAGE_CY,UNEMPRT_CY,MEDVAL_CY
0,Chatham,-87.616624,41.7385,12681.364151,7.839623,23599,40.60566,19.645283,126001
1,North Center,-87.684523,41.9473,17814.894231,65.521154,39612,35.211538,10.348077,405797
2,O'hare,-87.847436,41.9633,6591.975,34.5,28952,44.06,9.175,247836
3,Washington Park,-87.61758,41.7916,13106.814286,11.428571,14888,29.214286,35.517857,194914
4,Garfield Ridge,-87.766976,41.7997,9271.435,51.655,22925,40.62,12.911667,168838


# Part 2 Explore Neighborhoods in Chicago

### 1. Visualize Chicago Neighborhoods

In [51]:
# find the geograpical coordinate of Chicago
longitude = -87.6298
latitude = 41.8781

In [52]:
# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(df_chicago['Latitude'], df_chicago['Longitude'], df_chicago['Neighborhood']): 
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

## 2. Explore first neighborhood's restaurants

In [53]:
# define Foursquare credentials and version
CLIENT_ID = '1FZ5D3IFSLKIRBG42HEPKV10T5IOGKIO3NGTUZWBMYLKT2CX' # your Foursquare ID
CLIENT_SECRET = 'TBGVXN3CCJHJDVV2TJQWIVMOO1AVZLBDYZC3W3JAI2UR1JN4' # your Foursquare Secret
VERSION = (datetime.date.today()-datetime.timedelta(1)).strftime("%Y%m%d") # Foursquare API version: yesterday

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)
print('VERSION: ' + VERSION)

Your credentails:
CLIENT_ID: 1FZ5D3IFSLKIRBG42HEPKV10T5IOGKIO3NGTUZWBMYLKT2CX
CLIENT_SECRET: TBGVXN3CCJHJDVV2TJQWIVMOO1AVZLBDYZC3W3JAI2UR1JN4
VERSION: 20191224


In [55]:
# Let's explore the first neighborhood in our dataframe
neighborhood_latitude = df_chicago.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_chicago.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_chicago.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Chatham are 41.7385, -87.6166239583.


In [21]:
# Now, let's get the top 100 food places that are in Chatham within a radius of 1000 meters.
# First, let's create the GET request URL.
radius = 1000
LIMIT = 100
# the id for food category, this was found in FourSquare website, when a request contains the category, it only fetches venues within
# this specific category
FOODID = '4d4b7105d754a06374d81259' 
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&categoryId={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, FOODID,LIMIT)
# send the get request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e02cfbc9da7ee001ccceadf'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Chatham',
  'headerFullLocation': 'Chatham, Chicago',
  'headerLocationGranularity': 'neighborhood',
  'query': 'food',
  'totalResults': 29,
  'suggestedBounds': {'ne': {'lat': 41.747500009000014,
    'lng': -87.60458521108869},
   'sw': {'lat': 41.72949999099999, 'lng': -87.6286627055113}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bfd96d74cf820a16eb0ecf4',
       'name': "Dunkin'",
       'location': {'address': '448 E 87th St',
        'crossStreet': 'at Cottage Grove',
        'lat': 41.73674056104464,
        'lng': -87.61256230679408,
        'labeledLatLng

In [56]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [57]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Dunkin',Donut Shop,41.736741,-87.612562
1,Garrett Popcorn Shops,Snack Place,41.736535,-87.605829
2,Mather's More than a Café,Café,41.743548,-87.623089
3,Kam's Chop Suey,Chinese Restaurant,41.74333,-87.623933
4,Chipotle Mexican Grill,Fast Food Restaurant,41.735792,-87.625955


### 3. Repeat the process to explore all neighborhoods in Chicago
Let's create a function to repeat the same process to all the neighborhoods in Chicago

In [24]:
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry



def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    i = 0
    
    session = requests.Session()
    retry = Retry(connect=3, backoff_factor=0.5)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(str(i)+" "+name)
        i += 1
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            FOODID,
            LIMIT)
            
        # make the GET request
        results = session.get(url).json()["response"]['groups'][0]['items']#requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [30]:
chicago_venues = getNearbyVenues(names=df_chicago['Neighborhood'],
                                   latitudes=df_chicago['Latitude'],
                                   longitudes=df_chicago['Longitude'],
                                   radius=1000)
print(chicago_venues.shape)
chicago_venues.head()

0 Chatham
1 North Center
2 O'hare
3 Washington Park
4 Garfield Ridge
5 Beverly
6 Ashburn
7 Forest Glen
8 Edison Park
9 Riverdale
10 Woodlawn
11 Lincoln Park
12 Englewood
13 Jefferson Park
14 Hermosa
15 Loop
16 Morgan Park
17 Near South Side
18 South Chicago
19 North Lawndale
20 Mount Greenwood
21 Fuller Park
22 Uptown
23 North Park
24 Dunning
25 East Side
26 Pottage Park
27 Edgewater
28 West Garfield Park
29 Roseland
30 Irving Park
31 West Englewood
32 Auburn Gresham
33 Chicago Lawn
34 West Town
35 Bridgeport
36 Hegewisch
37 Lower West Side
38 Brighton Park
39 Avalon Park
40 Burnside
41 South Shore
42 Belmont Cragin
43 New City
44 Oakland
45 Washington Heights
46 Albany Park
47 West Ridge
48 Douglas
49 West Pullman
50 Rogers Park
51 Pullman
52 Austin
53 Logan Square
54 Archer Heights
55 Near North Side
56 Lake View
57 West Lawn
58 West Elsdon
59 Avondale
60 Armour Square
61 Kenwood
62 Clearing
63 South Deering
64 Near West Side
65 Calumet Heights
66 Grand Boulevard
67 Norwood Park
68 H

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Chatham,41.7385,-87.616624,Dunkin',41.736741,-87.612562,Donut Shop
1,Chatham,41.7385,-87.616624,Garrett Popcorn Shops,41.736535,-87.605829,Snack Place
2,Chatham,41.7385,-87.616624,Mather's More than a Café,41.743548,-87.623089,Café
3,Chatham,41.7385,-87.616624,Kam's Chop Suey,41.74333,-87.623933,Chinese Restaurant
4,Chatham,41.7385,-87.616624,Chipotle Mexican Grill,41.735792,-87.625955,Fast Food Restaurant


In [59]:
# save data to chicago_venues.pkl, when we continue to conduct analysis, we don't need to repeat calls to FourSquare API again
chicago_venues.to_pickle("chicago_venues.pkl")

In [101]:
chicago_venues = pd.read_pickle("chicago_venues.pkl")
chicago_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Chatham,41.7385,-87.616624,Dunkin',41.736741,-87.612562,Donut Shop
1,Chatham,41.7385,-87.616624,Garrett Popcorn Shops,41.736535,-87.605829,Snack Place
2,Chatham,41.7385,-87.616624,Mather's More than a Café,41.743548,-87.623089,Café
3,Chatham,41.7385,-87.616624,Kam's Chop Suey,41.74333,-87.623933,Chinese Restaurant
4,Chatham,41.7385,-87.616624,Chipotle Mexican Grill,41.735792,-87.625955,Fast Food Restaurant


In [102]:
# check how many venues returned for each neighborhood
print(chicago_venues.groupby('Neighborhood')['Venue'].count())
# how many unique categories were found
print(len(chicago_venues["Venue Category"].unique()))

Neighborhood
Albany Park            64
Archer Heights         24
Armour Square          62
Ashburn                14
Auburn Gresham         21
Austin                 32
Avalon Park            28
Avondale               81
Belmont Cragin         28
Beverly                32
Bridgeport             63
Brighton Park          27
Burnside               14
Calumet Heights        13
Chatham                29
Chicago Lawn           24
Clearing               13
Douglas                30
Dunning                33
East Garfield Park     17
East Side              20
Edgewater             100
Edison Park            29
Englewood              15
Forest Glen            15
Fuller Park            15
Gage Park              22
Garfield Ridge         22
Grand Boulevard        29
Grand Crossing         25
                     ... 
Mount Greenwood        10
Near North Side       100
Near South Side        70
Near West Side        100
New City               34
North Center           62
North Lawndale         18

We can see from above that Chicago has 77 neighborhoods, while we only found restaurants from FourSquare for 76 of them.
For the purpose of our project, we skip the neighborhood with no restaurants and delete it from df_chicago.

In [108]:
mask = df_chicago['Neighborhood'].isin(chicago_venues['Neighborhood'].unique())
print(df_chicago[~mask])
df_chicago = df_chicago[mask]
print(df_chicago.shape)

     Neighborhood  Longitude  Latitude  POPDENSITY  DIVERSITY  PC_INCOME  \
63  South Deering -87.571007   41.6911      6769.5  55.694595      17307   

    MEDAGE_CY  UNEMPRT_CY  MEDVAL_CY  
63  34.197297   19.316216     106193  
(76, 9)


In [62]:
chicago_venues["Venue Category"].unique()

array(['Donut Shop', 'Snack Place', 'Café', 'Chinese Restaurant',
       'Fast Food Restaurant', 'Wings Joint', 'Seafood Restaurant',
       'Sandwich Place', 'Food', 'Fried Chicken Joint', 'Gastropub',
       'Steakhouse', 'Bakery', 'Breakfast Spot', 'Pizza Place',
       'Mediterranean Restaurant', 'Sushi Restaurant',
       'French Restaurant', 'German Restaurant', 'Cuban Restaurant',
       'Thai Restaurant', 'Burger Joint', 'Italian Restaurant',
       'Middle Eastern Restaurant', 'Burrito Place', 'Mexican Restaurant',
       'American Restaurant', 'Greek Restaurant',
       'Latin American Restaurant', 'Vegetarian / Vegan Restaurant',
       'Diner', 'Bistro', 'Asian Restaurant', 'Irish Pub',
       'Vietnamese Restaurant', 'Restaurant', 'Noodle House',
       'Bagel Shop', 'Dumpling Restaurant', 'Food Truck', 'Deli / Bodega',
       'Hot Dog Joint', 'Eastern European Restaurant', 'BBQ Joint',
       'Caribbean Restaurant', 'Indian Restaurant', 'Soup Place',
       'Japanese Rest

### 4. Feature Engineering

convert the dataframe with one hot encoding and group venues by neighborhood

In [63]:
# one hot encoding
chicago_onehot = pd.get_dummies(chicago_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
chicago_onehot['Neighborhood'] = chicago_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [chicago_onehot.columns[-1]] + list(chicago_onehot.columns[:-1])
chicago_onehot = chicago_onehot[fixed_columns]

print(chicago_onehot.shape)
chicago_onehot.head()

(2870, 97)


Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Chatham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Chatham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Chatham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Chatham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Chatham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [64]:
# group venues by neighborhood, take the mean of each category
chicago_grouped = chicago_onehot.groupby(["Neighborhood"]).mean().reset_index()
print(chicago_grouped.shape)
chicago_grouped.head()

(76, 97)


Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Albany Park,0.0,0.0,0.03125,0.0,0.0,0.03125,0.03125,0.0,0.03125,...,0.046875,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125
1,Archer Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,...,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667
2,Armour Square,0.0,0.0,0.064516,0.0,0.0,0.048387,0.0,0.016129,0.048387,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.016129
3,Ashburn,0.0,0.0,0.142857,0.0,0.0,0.0,0.071429,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429
4,Auburn Gresham,0.0,0.0,0.095238,0.0,0.0,0.0,0.047619,0.0,0.047619,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


select top 10 venus for each neighborhood and put them into a new dataframe

In [109]:
# a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = chicago_grouped['Neighborhood']

for ind in np.arange(chicago_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chicago_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,Mexican Restaurant,Pizza Place,Chinese Restaurant,Korean Restaurant,Sandwich Place,Donut Shop,Fast Food Restaurant,Taco Place,Fried Chicken Joint,Wings Joint
1,Archer Heights,Mexican Restaurant,Pizza Place,Fast Food Restaurant,Food,Seafood Restaurant,Bakery,Chinese Restaurant,Restaurant,Sandwich Place,Donut Shop
2,Armour Square,Chinese Restaurant,Pizza Place,American Restaurant,Asian Restaurant,Bakery,Szechuan Restaurant,Sandwich Place,Mexican Restaurant,Seafood Restaurant,Italian Restaurant
3,Ashburn,American Restaurant,Fast Food Restaurant,Fried Chicken Joint,Wings Joint,Seafood Restaurant,BBQ Joint,Chinese Restaurant,Italian Restaurant,Mexican Restaurant,Pizza Place
4,Auburn Gresham,Fast Food Restaurant,Food,American Restaurant,Bakery,Greek Restaurant,Seafood Restaurant,Mexican Restaurant,Southern / Soul Food Restaurant,Chinese Restaurant,Dim Sum Restaurant


In [110]:
neighborhoods_venues_sorted.shape

(76, 11)

###  5. Cluster Neighborhoods with restaurant types

run k means to cluster the neighborhoods into 5 clusters

In [111]:
# set number of clusters
kclusters = 5

chicago_grouped_clustering = chicago_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chicago_grouped_clustering)

# check cluster labels generated
print(kmeans.labels_)

[2 2 1 0 4 0 3 1 2 1 1 2 4 3 3 2 1 3 1 4 2 1 1 3 1 3 2 1 0 0 2 2 0 1 1 1 1
 1 1 1 1 1 2 1 1 0 1 1 1 1 2 1 4 1 1 1 1 1 4 4 1 3 2 2 1 1 0 3 2 3 3 2 1 1
 1 3]


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood

In [163]:
# add clustering labels to venues df
#neighborhoods_venues_sorted.drop('Cluster Labels 1',axis=1,inplace=True)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels 1', kmeans.labels_)

#merge chicago_grouped with df_chicago to add latitude/longitude for each neighborhood
chicago_merged = df_chicago.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
chicago_merged.head() # check the last columns!

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Unnamed: 0,Neighborhood,Longitude,Latitude,POPDENSITY,DIVERSITY,PC_INCOME,MEDAGE_CY,UNEMPRT_CY,MEDVAL_CY,Cluster Labels 1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Chatham,-87.616624,41.7385,12681.364151,7.839623,23599,40.60566,19.645283,126001,3,Fast Food Restaurant,Chinese Restaurant,Sandwich Place,Fried Chicken Joint,Donut Shop,Wings Joint,Breakfast Spot,Food,Pizza Place,Café
1,North Center,-87.684523,41.9473,17814.894231,65.521154,39612,35.211538,10.348077,405797,1,Pizza Place,Mexican Restaurant,Sandwich Place,American Restaurant,Fast Food Restaurant,Donut Shop,Sushi Restaurant,Bakery,Mediterranean Restaurant,Latin American Restaurant
2,O'hare,-87.847436,41.9633,6591.975,34.5,28952,44.06,9.175,247836,1,Pizza Place,Chinese Restaurant,Café,Mediterranean Restaurant,Bakery,Dumpling Restaurant,Seafood Restaurant,Italian Restaurant,Sushi Restaurant,Fast Food Restaurant
3,Washington Park,-87.61758,41.7916,13106.814286,11.428571,14888,29.214286,35.517857,194914,3,Fast Food Restaurant,Pizza Place,Fried Chicken Joint,Breakfast Spot,Food Truck,Donut Shop,Deli / Bodega,Food Court,Empanada Restaurant,Dim Sum Restaurant
4,Garfield Ridge,-87.766976,41.7997,9271.435,51.655,22925,40.62,12.911667,168838,1,Pizza Place,Hot Dog Joint,American Restaurant,Café,Fast Food Restaurant,Sandwich Place,Breakfast Spot,Chinese Restaurant,Donut Shop,Eastern European Restaurant


Finally, let's visualize the resulting clusters

In [164]:
# create map
map_clusters1 = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chicago_merged['Latitude'], chicago_merged['Longitude'], chicago_merged['Neighborhood'], chicago_merged['Cluster Labels 1']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters1)
       
map_clusters1

# Part 3 Analysis and Summary

To analyze the tastes, we will 
1. calculate the average population density, diversity, per capita income, medium age, unemployment rate and medium home value
2. display the 10 hottest restaurant types for each neighborhood.

In [169]:
chicago_merged.groupby('Cluster Labels 1').mean().drop(['Longitude','Latitude'],axis=1)

Unnamed: 0_level_0,POPDENSITY,DIVERSITY,PC_INCOME,MEDAGE_CY,UNEMPRT_CY,MEDVAL_CY
Cluster Labels 1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,13635.509258,27.751672,19533.857143,34.905602,23.570329,151878.857143
1,17327.022482,53.640663,30281.918919,36.606176,13.15922,256659.810811
2,17298.096562,73.360604,17349.4,30.85133,18.338592,154517.0
3,12611.215708,14.798024,17675.363636,34.328857,27.660768,135440.0
4,10574.164968,20.430978,16521.0,32.565829,27.068369,119362.666667


In [166]:
# the column number of ten most popular food types
displayCols = [0] + list(range(chicago_merged.shape[1]-10, chicago_merged.shape[1]))
displayCols

[0, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Cluster 0

These neighborhoods have medium demographic statistic numbers.

As for their taste, their favorite is fast food such as fried chicken, burger and donut.

In [167]:
chicago_merged.loc[chicago_merged['Cluster Labels 1'] == 0, chicago_merged.columns[displayCols]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Ashburn,American Restaurant,Fast Food Restaurant,Fried Chicken Joint,Wings Joint,Seafood Restaurant,BBQ Joint,Chinese Restaurant,Italian Restaurant,Mexican Restaurant,Pizza Place
16,Morgan Park,BBQ Joint,Fast Food Restaurant,Burger Joint,Pizza Place,American Restaurant,Mexican Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
45,Washington Heights,Sandwich Place,Fried Chicken Joint,American Restaurant,BBQ Joint,Donut Shop,Food,Fast Food Restaurant,Chinese Restaurant,Ethiopian Restaurant,Dim Sum Restaurant
52,Austin,Food,Donut Shop,Sandwich Place,Seafood Restaurant,American Restaurant,BBQ Joint,Breakfast Spot,Chinese Restaurant,Fast Food Restaurant,Fried Chicken Joint
66,Grand Boulevard,Fast Food Restaurant,BBQ Joint,American Restaurant,Fried Chicken Joint,Deli / Bodega,Seafood Restaurant,Wings Joint,Burger Joint,Food,Mexican Restaurant
68,Humboldt Park,Fast Food Restaurant,Food,Wings Joint,Donut Shop,Latin American Restaurant,Seafood Restaurant,American Restaurant,BBQ Joint,Chinese Restaurant,Fried Chicken Joint
76,Grand Crossing,American Restaurant,Fast Food Restaurant,Seafood Restaurant,Fried Chicken Joint,BBQ Joint,Pizza Place,Vegetarian / Vegan Restaurant,Chinese Restaurant,Bakery,Food


Cluster 1

These neighborhoods have the highest population density, highest per capita income, highest medium age, lowest unemployment rate and highest medium home value. They are the richest.

As for their taste, they are very diversified with pizza places, Chinese, American and Japanese cuisines among the most popular. Fast food seems not in their consideration at all.

In [170]:
chicago_merged.loc[chicago_merged['Cluster Labels 1'] == 1, chicago_merged.columns[displayCols]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North Center,Pizza Place,Mexican Restaurant,Sandwich Place,American Restaurant,Fast Food Restaurant,Donut Shop,Sushi Restaurant,Bakery,Mediterranean Restaurant,Latin American Restaurant
2,O'hare,Pizza Place,Chinese Restaurant,Café,Mediterranean Restaurant,Bakery,Dumpling Restaurant,Seafood Restaurant,Italian Restaurant,Sushi Restaurant,Fast Food Restaurant
4,Garfield Ridge,Pizza Place,Hot Dog Joint,American Restaurant,Café,Fast Food Restaurant,Sandwich Place,Breakfast Spot,Chinese Restaurant,Donut Shop,Eastern European Restaurant
5,Beverly,Pizza Place,Sandwich Place,Bakery,Burger Joint,Fried Chicken Joint,Chinese Restaurant,Italian Restaurant,Hot Dog Joint,Donut Shop,Caribbean Restaurant
7,Forest Glen,Fast Food Restaurant,Restaurant,Sandwich Place,Indian Restaurant,Diner,Café,Italian Restaurant,Pizza Place,Asian Restaurant,Thai Restaurant
8,Edison Park,Italian Restaurant,Pizza Place,Mexican Restaurant,Chinese Restaurant,American Restaurant,Bakery,Soup Place,Greek Restaurant,French Restaurant,Food
11,Lincoln Park,American Restaurant,Mexican Restaurant,Pizza Place,Sushi Restaurant,Italian Restaurant,Sandwich Place,Café,Donut Shop,Fried Chicken Joint,Hot Dog Joint
13,Jefferson Park,Pizza Place,Chinese Restaurant,Mexican Restaurant,Greek Restaurant,American Restaurant,Deli / Bodega,Fast Food Restaurant,Seafood Restaurant,Restaurant,Burger Joint
15,Loop,Sandwich Place,Italian Restaurant,Pizza Place,Donut Shop,American Restaurant,Café,Mediterranean Restaurant,Snack Place,Poke Place,Food Truck
17,Near South Side,American Restaurant,Food Court,Fast Food Restaurant,Italian Restaurant,Pizza Place,Café,Deli / Bodega,Sandwich Place,Burger Joint,Restaurant


Cluster 2

These neighborhoods have the highest population diversity, youngest residents, second to highest per capita income, second to lowest unemployment rate and second to highest medium home value.

As for their taste, Mexican food seems really popular(ranked 1st for all but one). Besides, pizza, Chinese and taco places are among the most popular. Their passion to fast food seems moderate(ranked about 5th place). I would guess these are neighborhoods where Mexican community mainly resides.

In [171]:
chicago_merged.loc[chicago_merged['Cluster Labels 1'] == 2, chicago_merged.columns[displayCols]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Hermosa,Mexican Restaurant,Fast Food Restaurant,Pizza Place,Taco Place,Food,Diner,Cuban Restaurant,Chinese Restaurant,Burrito Place,Greek Restaurant
18,South Chicago,Mexican Restaurant,Bakery,Pizza Place,American Restaurant,Food,Italian Restaurant,Southern / Soul Food Restaurant,Burger Joint,Ethiopian Restaurant,Dim Sum Restaurant
25,East Side,Mexican Restaurant,Pizza Place,Chinese Restaurant,Sandwich Place,Fast Food Restaurant,Italian Restaurant,Bakery,Taco Place,BBQ Joint,Food
33,Chicago Lawn,Fast Food Restaurant,Mexican Restaurant,Pizza Place,American Restaurant,Sandwich Place,Fish & Chips Shop,Donut Shop,Cafeteria,Breakfast Spot,Latin American Restaurant
36,Hegewisch,Mexican Restaurant,Food Court,Snack Place,Wings Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
37,Lower West Side,Mexican Restaurant,Food,Pizza Place,Food Truck,Bakery,Sandwich Place,Gastropub,Breakfast Spot,Donut Shop,Chinese Restaurant
38,Brighton Park,Mexican Restaurant,Seafood Restaurant,Pizza Place,Donut Shop,Sandwich Place,Taco Place,Fast Food Restaurant,Café,Burger Joint,Breakfast Spot
42,Belmont Cragin,Mexican Restaurant,Donut Shop,Sandwich Place,Chinese Restaurant,Fast Food Restaurant,Burger Joint,Food,Diner,Cuban Restaurant,Restaurant
43,New City,Mexican Restaurant,Pizza Place,Chinese Restaurant,Fast Food Restaurant,Food,Sandwich Place,Bakery,Food Truck,Fried Chicken Joint,American Restaurant
46,Albany Park,Mexican Restaurant,Pizza Place,Chinese Restaurant,Korean Restaurant,Sandwich Place,Donut Shop,Fast Food Restaurant,Taco Place,Fried Chicken Joint,Wings Joint


Cluster 3

These neighborhoods have the lowest population diversity, and highest unemployment rate.

As for their taste, fast food is definitely the No.1 choice. Besides, cafe, sandwich, donut and fried chicken(these are relatively cheaper among all type of food) all seem very popular.

In [172]:
chicago_merged.loc[chicago_merged['Cluster Labels 1'] == 3, chicago_merged.columns[displayCols]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Chatham,Fast Food Restaurant,Chinese Restaurant,Sandwich Place,Fried Chicken Joint,Donut Shop,Wings Joint,Breakfast Spot,Food,Pizza Place,Café
3,Washington Park,Fast Food Restaurant,Pizza Place,Fried Chicken Joint,Breakfast Spot,Food Truck,Donut Shop,Deli / Bodega,Food Court,Empanada Restaurant,Dim Sum Restaurant
10,Woodlawn,Fast Food Restaurant,Pizza Place,Chinese Restaurant,Sandwich Place,BBQ Joint,Food,Café,English Restaurant,Dim Sum Restaurant,Diner
12,Englewood,Fast Food Restaurant,Café,Wings Joint,Food,Mexican Restaurant,Donut Shop,Restaurant,Sandwich Place,Seafood Restaurant,Chinese Restaurant
21,Fuller Park,Fast Food Restaurant,Food,Restaurant,Sandwich Place,Pizza Place,Steakhouse,Bakery,Chinese Restaurant,Fried Chicken Joint,American Restaurant
28,West Garfield Park,Fast Food Restaurant,Food,Fried Chicken Joint,Sandwich Place,Taco Place,Café,Middle Eastern Restaurant,Caribbean Restaurant,Pizza Place,Cafeteria
29,Roseland,Fast Food Restaurant,Fried Chicken Joint,Food,Donut Shop,Chinese Restaurant,Sandwich Place,Fish & Chips Shop,Wings Joint,Empanada Restaurant,Deli / Bodega
31,West Englewood,Fast Food Restaurant,American Restaurant,Sandwich Place,Fried Chicken Joint,Food,Pizza Place,Wings Joint,Food Truck,Food Court,German Restaurant
39,Avalon Park,Fast Food Restaurant,Food,Chinese Restaurant,Burger Joint,Fried Chicken Joint,Diner,Fish & Chips Shop,Pizza Place,Restaurant,Caribbean Restaurant
48,Douglas,Fast Food Restaurant,Sandwich Place,Pizza Place,Wings Joint,Snack Place,Fried Chicken Joint,Café,Restaurant,Donut Shop,Southern / Soul Food Restaurant


Cluster 4

These neighborhoods have the lowest population density, lowest per capita income, and lowest home value.

As for their taste, they don't seem to spare the time differentiate between different types of food, so "food" and fast food are among the most popular. Other fast and cheap options are also under their considerations like hot dog, deli, donut and cafe.

In [173]:
chicago_merged.loc[chicago_merged['Cluster Labels 1'] == 4, chicago_merged.columns[displayCols]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Riverdale,Food,Fast Food Restaurant,Falafel Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
19,North Lawndale,Food,Fast Food Restaurant,Fried Chicken Joint,Hot Dog Joint,Café,Food Truck,Pizza Place,Seafood Restaurant,Restaurant,Bakery
32,Auburn Gresham,Fast Food Restaurant,Food,American Restaurant,Bakery,Greek Restaurant,Seafood Restaurant,Mexican Restaurant,Southern / Soul Food Restaurant,Chinese Restaurant,Dim Sum Restaurant
40,Burnside,Food,Fast Food Restaurant,Seafood Restaurant,Wings Joint,Southern / Soul Food Restaurant,Deli / Bodega,Caribbean Restaurant,Dim Sum Restaurant,Diner,Donut Shop
51,Pullman,Food,American Restaurant,Fried Chicken Joint,Food Court,Wings Joint,Falafel Restaurant,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant
71,East Garfield Park,Food,American Restaurant,Diner,Hot Dog Joint,Bakery,Café,Seafood Restaurant,Pizza Place,Southern / Soul Food Restaurant,Burger Joint
