# Coursera Data Science Capstone - Final Project Report
## Worcester Neighborhoods

### Introduction:

**A description of the background.**

Our family lives in Worcester County, MA. My husband recently got a job in Boylston, and his new office is almost 15 miles from home, so his daily travel time is close to 1.5 hours. We are considering moving closer to his office to save time. We currently live in the city of Worcester, in the west of Worcester County, while his office is in Boylston, in the east of the county. We don’t have to live in Boylston itself, where rents are relatively high, but we would like to move to a neighborhood that is closer to his workplace and similar to our current neighborhood in Worcester.

**A discussion of the problems.**
1.	Which neighborhoods in Worcester County are good choices for our family to move to?
2.	These candidate neighborhoods should be both close to my husband’s new office and similar to our current living environment.

### Data and Methodology:

**To address this problem, we need:**

1. Geographical Data of Worcester County neighborhoods - latitude and longitude details of the cities in Worcester County 
2. Neighborhoods Environment Data - data describing interests and venues within a given radius

**Sources of Data:**
For the latitude and longitude coordinates of neighborhoods, our primary source of information is the following Wikipedia page: https://en.wikipedia.org/wiki/Worcester_County,_Massachusetts

- This wiki page lists links to the pages of 60 cities and towns in the county. Each linked page contains the latitude and longitude of that location, so we can compile a table of neighborhood coordinates from these pages with a web-scraping tool such as *BeautifulSoup*. The prepared dataset loaded below contains 57 neighborhoods.

- For neighborhood environment details, we use the Foursquare database. Since we are interested in the character and composition of each neighborhood, we use the ‘explore’ endpoint to learn about the popular venues in each neighborhood. 

- We will run an unsupervised k-means clustering model to find locations similar to our current neighborhood.
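
The coordinate scraping step could look roughly like the sketch below. It assumes that each town's Wikipedia page embeds its coordinates in a `span` element with class `geo`, formatted as `latitude; longitude`. That markup detail and the helper name `parse_geo` are assumptions rather than something stated in this report, so the selector may need adjusting. The sample HTML is a hand-written stand-in for a downloaded page.

```python
from bs4 import BeautifulSoup


def parse_geo(html):
    """Return (latitude, longitude) from a page's geo markup, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed markup: <span class="geo">lat; lon</span>
    tag = soup.find("span", class_="geo")
    if tag is None:
        return None
    lat_text, lon_text = tag.get_text().split(";")
    return float(lat_text), float(lon_text)


# Hand-written sample standing in for the HTML of one town page.
sample_html = '<p><span class="geo">42.271389; -71.798889</span></p>'
coords = parse_geo(sample_html)
print(coords)
```

In a real run, the HTML of each town page would be downloaded first (for example with `requests.get(page_url).text`) and the parsed pairs collected into the `City`, `Latitude`, `Longitude` columns used below.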

**Data Preparation:**

In [4]:
import urllib3.request
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import folium
import os
import requests
import json
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors

In [119]:
wor_df = pd.read_csv('worcester_county.csv')
wor_df.head()

| | City | Latitude | Longitude |
|---|---|---|---|
| 0 | Southborough | 42.305556 | -71.525 |
| 1 | Bolton | 42.433333 | -71.608333 |
| 2 | Harvard | 42.5 | -71.583333 |
| 3 | Northborough | 42.319444 | -71.641667 |
| 4 | Westborough | 42.269444 | -71.616667 |


In [88]:
wor_df.shape

(57, 3)

## Results
### Neighborhoods Clustering

**Build Worcester County map including markers for cities.**

In [143]:
wor_map = folium.Map(location=[42.35, -71.91], zoom_start=10)


# add markers to map
for lat, lng, city in zip(wor_df['Latitude'], wor_df['Longitude'], wor_df['City']):
    label = folium.Popup(city, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6).add_to(wor_map)  
    
wor_map

In [61]:
# Foursquare ID and secret are read from environment variables for security.
# The variable names below are assumed; adjust them to match your own setup.
CLIENT_ID = os.environ["FOURSQUARE_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"]
VERSION = '20180605' # Foursquare API version


In [92]:
neighborhood_latitude = wor_df.loc[53,'Latitude'] # neighborhood latitude value
neighborhood_longitude = wor_df.loc[53,'Longitude'] # neighborhood longitude value

neighborhood_name = wor_df.loc[53,'City'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Worcester are 42.271389, -71.798889.


In [93]:
LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude,
    neighborhood_longitude,
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=42.271389,-71.798889&radius=500&limit=100'
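
As an alternative to formatting the query string by hand, the request URL can be assembled from a dictionary of parameters. The sketch below uses only the standard library. The helper name `build_explore_url` and the placeholder credentials are made up for illustration, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build an 'explore' request URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, not real ones.
example_url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 42.271389, -71.798889)

# Decode the query string again to check what was encoded.
query = parse_qs(urlsplit(example_url).query)
print(query["ll"], query["radius"], query["limit"])
```

Passing the same dictionary as `params=` to `requests.get` lets the library do this encoding itself.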

In [66]:
import requests
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c9bbd3a6a607153484f83ec'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Central Business District',
  'headerFullLocation': 'Central Business District, Worcester',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 13,
  'suggestedBounds': {'ne': {'lat': 42.2758890045, 'lng': -71.79281899685492},
   'sw': {'lat': 42.2668889955, 'lng': -71.80495900314509}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '49c931f4f964a5203e581fe3',
       'name': 'Armsby Abbey',
       'location': {'address': '144 Main St',
        'crossStreet': 'at School Street',
        'lat': 42.268748836172506,
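
The nested response above can be flattened into one record per venue. The following is a minimal sketch that tolerates missing keys and empty category lists. The function name `extract_venues` is hypothetical, and the sample payload is hand-written: only the venue name and latitude come from the output above, while the longitude and category are placeholders.

```python
def extract_venues(payload):
    """Flatten an 'explore' response into a list of venue dictionaries."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        rows.append({
            "name": venue.get("name"),
            "lat": location.get("lat"),
            "lng": location.get("lng"),
            # A venue may have no category at all.
            "category": categories[0]["name"] if categories else None,
        })
    return rows


sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Armsby Abbey",
                           "location": {"lat": 42.268748836172506, "lng": -71.8},  # lng is a placeholder
                           "categories": [{"name": "Gastropub"}]}},  # category is a placeholder
                {"venue": {"name": "Uncategorized venue", "location": {}, "categories": []}},
            ]
        }]
    }
}

venues = extract_venues(sample_payload)
for venue in venues:
    print(venue)
```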
      

- **Get venues for every neighborhood.**

In [94]:
# function that extracts the category of the venue
def get_category_type(row):
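    # raw venue records use 'categories'; flattened (json_normalize) rows use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']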
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [95]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
        
        LIMIT = 100 # limit of number of venues returned by Foursquare API
        # the search radius comes from the function argument (default 500 meters)
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
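        # make the GET request; skip this neighborhood if the request or parsing fails,
        # so that results from a previous iteration are never reused
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except (requests.RequestException, KeyError, IndexError, ValueError):
            print("ERROR: ", url)
            continue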
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [96]:
wor_venues = getNearbyVenues(names=wor_df['City'],
                                   latitudes=wor_df['Latitude'],
                                   longitudes=wor_df['Longitude']
                                  )

print(wor_venues.shape)
wor_venues.head()

(461, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Southborough | 42.305556 | -71.525 | Mauro's | 42.30584 | -71.524282 | Breakfast Spot |
| 1 | Southborough | 42.305556 | -71.525 | Southborough House Of Pizza | 42.305947 | -71.524667 | Pizza Place |
| 2 | Southborough | 42.305556 | -71.525 | Walgreens Pharmacy at Southboro Medical Group | 42.31002 | -71.525548 | Pharmacy |
| 3 | Southborough | 42.305556 | -71.525 | Phaidra Nail Salon | 42.305849 | -71.524878 | Spa |
| 4 | Southborough | 42.305556 | -71.525 | Santander Bank | 42.306135 | -71.526181 | Bank |


In [97]:
print('There are {} unique categories.'.format(len(wor_venues['Venue Category'].unique())))

There are 119 unique categories.


- **Build venue categories dataframe.**
- **Group by neighborhood and calculate mean value for each.**

In [98]:
# one-hot encode the venue categories: one row per venue, one column per category
wor_onehot = pd.get_dummies(wor_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe and move column to the first column
wor_onehot['Neighborhood'] = wor_venues['Neighborhood'] 
col_index = wor_onehot.columns.tolist().index('Neighborhood')
col_order = [wor_onehot.columns[col_index]] \
                + list(wor_onehot.columns[0:col_index]) \
                + list(wor_onehot.columns[col_index+1:])
wor_onehot = wor_onehot[col_order]
print("categories dataset shape {}".format(wor_onehot.shape))
wor_onehot.head()

categories dataset shape (461, 120)


Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,...,Theater,Thrift / Vintage Store,Toy / Game Store,Train Station,Tree,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Women's Store
0,Southborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Southborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Southborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Southborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Southborough,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [99]:
wor_grouped = wor_onehot.groupby('Neighborhood').mean().reset_index()
print("categories grouped by neighborhood shape {}".format(wor_grouped.shape))
wor_grouped.head()

categories grouped by neighborhood shape (57, 120)


Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,...,Theater,Thrift / Vintage Store,Toy / Game Store,Train Station,Tree,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Women's Store
0,Ashburnham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Athol,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Auburn,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.057143
3,Barre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Berlin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
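
To see why the mean of the one-hot rows is a useful neighborhood profile, the toy example below repeats the two steps on five made-up venues. The neighborhood names `A` and `B` and their venues are invented for illustration. Each value in the result is the share of a neighborhood's venues that fall into a category, so every row sums to 1.

```python
import pandas as pd

# Invented toy data: three venues in neighborhood A, two in neighborhood B.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Bank", "Park", "Bank"],
})

# One row per venue, one 0/1 column per category.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Averaging the 0/1 columns per neighborhood yields category frequencies.
toy_profile = toy_onehot.groupby("Neighborhood").mean()
print(toy_profile)
```

Because the profile holds shares rather than counts, a town with 5 venues and a town with 50 venues can still look alike to k-means if their venue mix is similar.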


In [100]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

- **Build ten top venues dataset.**

In [101]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = wor_grouped['Neighborhood']

for ind in np.arange(wor_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(wor_grouped.iloc[ind, :], num_top_venues)

print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(57, 11)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ashburnham,Pizza Place,Grocery Store,Convenience Store,Coffee Shop,Baseball Field,Liquor Store,Gourmet Shop,Fast Food Restaurant,Discount Store,Dive Bar
1,Athol,Pizza Place,Baseball Field,Breakfast Spot,Chinese Restaurant,Laundromat,Gastropub,Gym,American Restaurant,Doctor's Office,Donut Shop
2,Auburn,Clothing Store,Shoe Store,Kids Store,Women's Store,Jewelry Store,Lingerie Store,Snack Place,Cosmetics Shop,Department Store,Doctor's Office
3,Barre,Pizza Place,Massage Studio,Flower Shop,Bistro,Restaurant,Gas Station,Bakery,Italian Restaurant,Breakfast Spot,Dumpling Restaurant
4,Berlin,Deli / Bodega,Performing Arts Venue,Design Studio,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant,Electronics Store


- **Calculate clustering using k-means algorithm.**

In [102]:
# set number of clusters
kclusters = 5
wor_grouped_clustering = wor_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(wor_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 3, 0, 1, 3, 3, 3, 0, 0, 3, 4, 3, 0, 3, 3, 3, 3, 0, 3, 3, 0,
       0, 3, 0, 0, 4, 3, 0, 0, 4, 1, 0, 0, 3, 1, 0, 2, 1, 0, 0, 0, 0, 0,
       3, 3, 3, 3, 0, 4, 0, 0, 3, 3, 0, 0, 0])
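
The report fixes `kclusters = 5` without showing how that value was chosen. One common way to compare candidate values is the silhouette score. The sketch below runs on synthetic data with three well-separated groups rather than on `wor_grouped_clustering`, so it only illustrates the procedure and says nothing about the best k for the Worcester data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([center + rng.normal(scale=0.5, size=(20, 2)) for center in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Applied to the real feature matrix, the same loop would take `wor_grouped_clustering` in place of `X`.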

- **Build cluster dataset and plot the map.**

In [115]:
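wor_merged = wor_df.copy()  # copy so that wor_df keeps its original 'City' column
wor_merged.rename(columns={'City':'Neighborhood'}, inplace=True)

In [118]:
# add clustering labels
# kmeans.labels_ follows the row order of wor_grouped (sorted by neighborhood name),
# not the row order of wor_df, so the labels are matched by neighborhood name
cluster_labels = pd.Series(kmeans.labels_, index=wor_grouped['Neighborhood'])
wor_merged['Cluster Labels'] = wor_merged['Neighborhood'].map(cluster_labels)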
# merge wor_grouped with wor_data to add latitude/longitude for each neighborhood
wor_merged = wor_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

print(wor_merged.shape)
wor_merged.head() # check the last columns!

(57, 14)


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Southborough,42.305556,-71.525,0,Pizza Place,Bank,Pharmacy,Print Shop,Breakfast Spot,Baseball Field,Spa,Gastropub,Gas Station,Gym / Fitness Center
1,Bolton,42.433333,-71.608333,0,Park,Playground,Performing Arts Venue,Light Rail Station,Baseball Field,Dumpling Restaurant,Farmers Market,Farm,Electronics Store,Donut Shop
2,Harvard,42.5,-71.583333,3,Farmers Market,Hobby Shop,Gourmet Shop,Moving Target,Food,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop
3,Northborough,42.319444,-71.641667,0,Pharmacy,Train Station,Pizza Place,Donut Shop,Bowling Alley,Tea Room,Bistro,Park,American Restaurant,Gym
4,Westborough,42.269444,-71.616667,1,Diner,Sushi Restaurant,Coffee Shop,Mediterranean Restaurant,Sandwich Place,Bed & Breakfast,Korean Restaurant,Thai Restaurant,Gas Station,Convenience Store


In [None]:
# create map
map_clusters = folium.Map(location=[42.35, -71.91], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(wor_merged['Latitude'], wor_merged['Longitude'], \
                                  wor_merged['Neighborhood'], wor_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster 1:

In [138]:
wor_merged.loc[wor_merged['Cluster Labels'] == 0, wor_merged.columns[[0] + list(range(4, wor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Southborough,Pizza Place,Bank,Pharmacy,Print Shop,Breakfast Spot,Baseball Field,Spa,Gastropub,Gas Station,Gym / Fitness Center
1,Bolton,Park,Playground,Performing Arts Venue,Light Rail Station,Baseball Field,Dumpling Restaurant,Farmers Market,Farm,Electronics Store,Donut Shop
3,Northborough,Pharmacy,Train Station,Pizza Place,Donut Shop,Bowling Alley,Tea Room,Bistro,Park,American Restaurant,Gym
8,Upton,Deli / Bodega,Food,Restaurant,Business Service,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant
9,Berlin,Deli / Bodega,Performing Arts Venue,Design Studio,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant,Electronics Store
13,Holden,Coffee Shop,Donut Shop,Seafood Restaurant,Video Store,Pharmacy,Convenience Store,Sandwich Place,Business Service,Electronics Store,Dumpling Restaurant
18,Lunenburg,Pizza Place,American Restaurant,Convenience Store,Café,Food & Drink Shop,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant
21,Barre,Pizza Place,Massage Studio,Flower Shop,Bistro,Restaurant,Gas Station,Bakery,Italian Restaurant,Breakfast Spot,Dumpling Restaurant
22,Millbury,Pizza Place,Seafood Restaurant,Pharmacy,Thai Restaurant,Ice Cream Shop,Baseball Field,Credit Union,Donut Shop,Dumpling Restaurant,Café
24,Charlton,Pizza Place,Convenience Store,Café,Baseball Field,Food & Drink Shop,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant


### Cluster 2:

In [139]:
wor_merged.loc[wor_merged['Cluster Labels'] == 1, wor_merged.columns[[0] + list(range(4, wor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Westborough,Diner,Sushi Restaurant,Coffee Shop,Mediterranean Restaurant,Sandwich Place,Bed & Breakfast,Korean Restaurant,Thai Restaurant,Gas Station,Convenience Store
31,Oakham,Motorcycle Shop,Women's Store,Home Service,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant,Electronics Store
35,Leicester,Convenience Store,Donut Shop,Liquor Store,American Restaurant,Sandwich Place,Food & Drink Shop,Discount Store,Dive Bar,Doctor's Office,Dumpling Restaurant
38,West Brookfield,Park,Design Studio,Convenience Store,Garden Center,Bar,Women's Store,Farm,Fast Food Restaurant,Farmers Market,Dumpling Restaurant


### Cluster 3:

In [140]:
wor_merged.loc[wor_merged['Cluster Labels'] == 2, wor_merged.columns[[0] + list(range(4, wor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,North Brookfield,Pizza Place,Supermarket,Convenience Store,Sandwich Place,Pharmacy,Gourmet Shop,Farm,Department Store,Design Studio,Gym / Fitness Center


### Cluster 4:

In [141]:
wor_merged.loc[wor_merged['Cluster Labels'] == 3, wor_merged.columns[[0] + list(range(4, wor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Harvard,Farmers Market,Hobby Shop,Gourmet Shop,Moving Target,Food,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop
5,Sterling,Pizza Place,Grocery Store,Mexican Restaurant,Flower Shop,Antique Shop,American Restaurant,Diner,Discount Store,Dive Bar,Food
6,Princeton,Pizza Place,Mountain,Bakery,Food,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant
7,Sutton,Park,Convenience Store,Gas Station,Women's Store,Food,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant
10,Mendon,Thrift / Vintage Store,Convenience Store,Gift Shop,Coffee Shop,Flower Shop,American Restaurant,Doctor's Office,Dive Bar,Donut Shop,Food
12,Grafton,Park,Hotel,Business Service,Plaza,Convenience Store,Sandwich Place,Baseball Field,Women's Store,Electronics Store,Farm
14,Shrewsbury,Pizza Place,Donut Shop,Pharmacy,Steakhouse,Plaza,Convenience Store,Coffee Shop,Liquor Store,Flower Shop,Italian Restaurant
15,Sturbridge,Park,Hotel,Arts & Crafts Store,Bar,Gift Shop,Bakery,Doctor's Office,Donut Shop,Dumpling Restaurant,Food & Drink Shop
16,Paxton,Pizza Place,Grocery Store,Breakfast Spot,Food,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant
17,Douglas,Construction & Landscaping,Business Service,Women's Store,Design Studio,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant,Electronics Store


### Cluster 5:

In [142]:
wor_merged.loc[wor_merged['Cluster Labels'] == 4, wor_merged.columns[[0] + list(range(4, wor_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Hopedale,Pizza Place,Gym,Café,Monument / Landmark,Flower Shop,Diner,Discount Store,Dive Bar,Doctor's Office,Donut Shop
26,Ashburnham,Pizza Place,Grocery Store,Convenience Store,Coffee Shop,Baseball Field,Liquor Store,Gourmet Shop,Fast Food Restaurant,Discount Store,Dive Bar
30,Westminster,Sandwich Place,Pharmacy,Pizza Place,Liquor Store,Food,Chinese Restaurant,Baseball Field,Donut Shop,Convenience Store,Gift Shop
49,Dudley,Bar,Hot Dog Joint,American Restaurant,Historic Site,Food & Drink Shop,Discount Store,Dive Bar,Doctor's Office,Donut Shop,Dumpling Restaurant


## Clusters Analysis-Discussion

Using k-means to cluster Worcester County’s neighborhoods gives us some insight into the character of each neighborhood in terms of our lifestyle profile, as reflected in its most popular venues.

The neighborhoods of Gardner, Petersham, and Warren come closest to our current neighborhood in terms of nearby venues, but they are still far from my husband's workplace. Since our first consideration is to reduce his travel time, we might want to assess a few more neighborhoods in Cluster 4.
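
One way to carry out that assessment is to rank candidate neighborhoods by straight-line distance to the office and optionally keep only one cluster. The sketch below is only an illustration under several assumptions: the office coordinates are an assumed location for the center of Boylston, not a value from this report; the five candidates and their cluster labels are copied from the `wor_merged.head()` output above; and the great-circle distance ignores roads and traffic, so it is only a rough proxy for travel time.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points on Earth."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# Assumed office location (approximate center of Boylston), not taken from the report.
OFFICE = (42.391667, -71.704167)

# (name, latitude, longitude, cluster label) as displayed in wor_merged.head().
candidates = [
    ("Southborough", 42.305556, -71.525, 0),
    ("Bolton", 42.433333, -71.608333, 0),
    ("Harvard", 42.5, -71.583333, 3),
    ("Northborough", 42.319444, -71.641667, 0),
    ("Westborough", 42.269444, -71.616667, 1),
]


def rank_by_distance(rows, office, cluster=None):
    """Sort rows by distance to the office, optionally keeping one cluster label."""
    kept = [row for row in rows if cluster is None or row[3] == cluster]
    return sorted(kept, key=lambda row: haversine_km(row[1], row[2], office[0], office[1]))


ranked = rank_by_distance(candidates, OFFICE)
for name, lat, lon, label in ranked:
    print(name, label, round(haversine_km(lat, lon, OFFICE[0], OFFICE[1]), 1))
```

Calling `rank_by_distance(candidates, OFFICE, cluster=3)` restricts the ranking to rows labeled 3, which is the group reported as Cluster 4 above.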