# Finding a Location to Build a New Pet Store in Seattle, WA

IBM Applied Data Science Capstone Project
## Week 5 Final Report

- Build a data frame of Seattle neighborhoods by web scraping a Wikipedia page
- Get the geographic coordinates of the neighborhoods
- Obtain venue data for the neighborhoods using the Foursquare API
- Cluster the neighborhoods and visualize the results
- Suggest potential locations for opening a new pet store


## 1. Import libraries

In [9]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files
import re # regular expressions, used to clean the scraped neighborhood names

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


## 2. Scrape data from a Wikipedia page into a DataFrame


In this section, we scraped the names of Seattle neighborhoods from a Wikipedia category page and cleaned the raw text with a custom function. Each cleaned name has the format "neighborhood_name, Seattle".
We found 27 neighborhoods in Seattle.
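The cleaning step can be illustrated on a single raw label from the category page. The sketch below is a stand-alone variant, not the notebook's `clean_expression`, and the name `clean_label` is hypothetical. It removes the parenthesised page counts, the invisible left-to-right mark (U+200E) that follows each name in the scraped text, and the leading "►" bullet.

```python
import re


def clean_label(raw: str) -> str:
    """Turn a raw category label into the form 'Name, Seattle'."""
    # drop the subcategory/page counts in parentheses, e.g. "(1 C, 16 P)"
    text = re.sub(r"\([^)]*\)", "", raw)
    # remove the invisible left-to-right mark (U+200E) that follows each name
    text = text.replace("\u200e", "")
    # drop the leading "►" bullet and any surrounding whitespace
    return text.lstrip("► ").strip()


print(clean_label("► Ballard, Seattle\u200e (1 C, 16 P)"))
```

Writing the mark as the literal `"\u200e"` avoids the regular-expression pitfall where `'\\200e'` is read as an octal escape followed by the letter `e`.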

In [25]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighborhoods_in_Seattle").text

In [26]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [27]:
# create a list to store neighborhood data
neighborhoodList = []

In [28]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [123]:
# create a new DataFrame from the list
raw_df = pd.DataFrame({"Neighborhood": neighborhoodList})
raw_df.head()

| | Neighborhood |
|---|---|
| 0 | ► Ballard, Seattle (1 C, 16 P) |
| 1 | ► Beacon Hill, Seattle (10 P) |
| 2 | ► Belltown, Seattle (20 P) |
| 3 | ► Broadview, Seattle (3 P) |
| 4 | ► Capitol Hill, Seattle (46 P) |


In [124]:
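# define a function to clean the neighborhood labels
def clean_expression(neighborhood):
    # drop the page/subcategory counts in parentheses, e.g. "(1 C, 16 P)"
    cl_exp = re.sub(r'\([^)]*\)', '', neighborhood)
    # drop the invisible left-to-right mark (U+200E) that follows each name
    cl_exp = cl_exp.replace('\u200e', '')
    # drop the leading "►" character and surrounding whitespace
    cl_exp = cl_exp[1:].strip()
    return cl_exp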

In [161]:
sea_df=pd.DataFrame({"Neighborhood":
                     [clean_expression(neighborhood) for neighborhood in raw_df["Neighborhood"].tolist()]})
sea_df.head()

| | Neighborhood |
|---|---|
| 0 | Ballard, Seattle |
| 1 | Beacon Hill, Seattle |
| 2 | Belltown, Seattle |
| 3 | Broadview, Seattle |
| 4 | Capitol Hill, Seattle |


## 3. Get the geographic coordinates

In this section, we used the geocoder module (ArcGIS provider) to obtain the latitude and longitude of each neighborhood name. The Foursquare API needs these coordinates to retrieve venue information around a location.

Note that for each neighborhood, the function returns exactly one coordinate pair. No documentation explains how that point is chosen, and checking it against Google Maps did not suggest that it is the geographic center of the neighborhood. We therefore use a larger radius when exploring neighborhoods with the Foursquare API, so that each search covers a larger share of the neighborhood.
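A larger radius only matters if the geocoded point can sit far from where the neighborhood actually is. One way to quantify that offset is the haversine (great-circle) distance. The sketch below is our own helper: `haversine_m` is a hypothetical name, and it assumes a spherical Earth with a mean radius of 6,371 km. It compares the point returned for "Central District, Seattle" with the Powell Barnett Park coordinates that Section 4 uses as a correction.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6_371_000  # assumed mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# geocoded "Central District, Seattle" vs. Powell Barnett Park (coordinates from this report)
d = haversine_m(47.60553, -122.33432, 47.60527, -122.29676)
print(round(d))
```

For these two points the distance comes out at roughly 2.8 km. That is the same order of magnitude as the 3000 m search radius chosen in Section 5.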

In [162]:
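# define a function to get coordinates from the ArcGIS geocoder
def get_latlng(neighborhood, max_tries=10):
    # retry a bounded number of times instead of looping forever,
    # because the geocoder occasionally returns no result
    for _ in range(max_tries):
        g = geocoder.arcgis(neighborhood)
        if g.latlng is not None:
            return g.latlng  # [latitude, longitude]
    raise ValueError("Could not geocode {!r}".format(neighborhood))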

In [163]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in sea_df["Neighborhood"].tolist() ]

In [168]:
coords

[[47.66867000000008, -122.38452999999998],
 [47.57686000000007, -122.31270999999998],
 [47.61576000000008, -122.34463999999997],
 [47.722380000000044, -122.36497999999995],
 [47.62396000000007, -122.31881999999996],
 [47.74274682097066, -122.36532072117258],
 [47.605530000000044, -122.33431999999999],
 [47.667037500000006, -122.38046775],
 [47.59582000000006, -122.32468999999998],
 [47.59680000000003, -122.33423000779023],
 [47.6070000038952, -122.33373992281736],
 [47.647080000000074, -122.32476999999994],
 [47.60878000000008, -122.32642999999996],
 [47.66137000000003, -122.35607999999996],
 [47.547510000000045, -122.32148999999998],
 [47.633480999999996, -122.38702840990703],
 [47.64085000000006, -122.30206999999996],
 [47.59830000000005, -122.33428999999995],
 [47.63749000000007, -122.36503999999996],
 [47.512350000000026, -122.26276999999999],
 [47.55123000000003, -122.28674999999998],
 [47.59028483065355, -122.327037],
 [47.623410000000035, -122.33434999999997],
 [47.6612700000000

In [169]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [170]:
# merge the coordinates into the original dataframe
sea_df['Latitude'] = df_coords['Latitude']
sea_df['Longitude'] = df_coords['Longitude']

In [171]:
# check the neighborhoods and the coordinates
print(sea_df.shape)
sea_df

(27, 3)


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Ballard, Seattle | 47.66867 | -122.38453 |
| 1 | Beacon Hill, Seattle | 47.57686 | -122.31271 |
| 2 | Belltown, Seattle | 47.61576 | -122.34464 |
| 3 | Broadview, Seattle | 47.72238 | -122.36498 |
| 4 | Capitol Hill, Seattle | 47.62396 | -122.31882 |
| 5 | Cascade, Seattle | 47.742747 | -122.365321 |
| 6 | Central District, Seattle | 47.60553 | -122.33432 |
| 7 | Central Waterfront, Seattle | 47.667038 | -122.380468 |
| 8 | Chinatown-International District, Seattle | 47.59582 | -122.32469 |
| 9 | Denny Triangle, Seattle | 47.5968 | -122.33423 |


In [172]:
# save the DataFrame as CSV file
sea_df.to_csv("sea_df.csv", index=False)

## 4. Create a map of Seattle and visualize the neighborhoods

In this section, we plotted the neighborhoods on a map of Seattle. We checked the map manually against Google Maps and against our experience living in Seattle. We found that at least one neighborhood, "Central District", was not located correctly: no marker appeared in the Central District itself. On closer review, the ArcGIS geocoder used in Section 3 did not recognize a neighborhood called "Central District". Instead, it returned the coordinates of a different area, the "Central Business District" (downtown).

From our experience living in Seattle, we know that the Central District is a residential area and cannot be ignored for our business problem. We therefore manually chose Powell Barnett Park, which appears to be near the geographic center of the Central District, and replaced the erroneous coordinates with the park's coordinates.
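Correcting coordinates by row position (`sea_df.loc[6, ...]`) depends on the row order staying fixed. The sketch below applies the same correction keyed on the neighborhood name instead. `OVERRIDES` and `apply_overrides` are hypothetical names. Exact-name matching also assumes the names have already been stripped of the invisible U+200E mark.

```python
import pandas as pd

# hypothetical manual corrections: neighborhood name -> (latitude, longitude)
OVERRIDES = {
    "Central District, Seattle": (47.60527, -122.29676),  # Powell Barnett Park
}


def apply_overrides(df: pd.DataFrame, overrides: dict) -> pd.DataFrame:
    """Return a copy of df with coordinates replaced for the named neighborhoods."""
    out = df.copy()
    for name, (lat, lng) in overrides.items():
        mask = out["Neighborhood"] == name
        # scalar assignment is safe even when no row matches the name
        out.loc[mask, "Latitude"] = lat
        out.loc[mask, "Longitude"] = lng
    return out


demo = pd.DataFrame({
    "Neighborhood": ["Capitol Hill, Seattle", "Central District, Seattle"],
    "Latitude": [47.62396, 47.60553],
    "Longitude": [-122.31882, -122.33432],
})
fixed = apply_overrides(demo, OVERRIDES)
print(fixed)
```

Keeping the corrections in one dictionary also documents every manual change in a single place if further geocoding errors turn up.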

In [314]:
# get the coordinates of Seattle
address = 'Seattle'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Seattle are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Seattle are 47.6038321, -122.3300624.


In [318]:
# correct lon and lat for "Central District" entry
new_coord=get_latlng("Powell Barnett Park, Seattle,WA")
new_coord

[47.605270000000075, -122.29675999999995]

In [324]:
sea_df.loc[6,"Latitude"]=new_coord[0]
sea_df.loc[6,"Longitude"]=new_coord[1]
sea_df

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Ballard, Seattle | 47.66867 | -122.38453 |
| 1 | Beacon Hill, Seattle | 47.57686 | -122.31271 |
| 2 | Belltown, Seattle | 47.61576 | -122.34464 |
| 3 | Broadview, Seattle | 47.72238 | -122.36498 |
| 4 | Capitol Hill, Seattle | 47.62396 | -122.31882 |
| 5 | Cascade, Seattle | 47.742747 | -122.365321 |
| 6 | Central District, Seattle | 47.60527 | -122.29676 |
| 7 | Central Waterfront, Seattle | 47.667038 | -122.380468 |
| 8 | Chinatown-International District, Seattle | 47.59582 | -122.32469 |
| 9 | Denny Triangle, Seattle | 47.5968 | -122.33423 |


In [325]:
# create map of Seattle using latitude and longitude values
map_sea = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(sea_df['Latitude'], sea_df['Longitude'], sea_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sea)  
    
map_sea

In [326]:
# save the map as HTML file
map_sea.save('map_sea.html')

## 5. Use Foursquare API to explore the neighborhood

We used the Foursquare API to explore each neighborhood. We set the search radius to 3000 meters so that it covers most of each neighborhood. Each request is capped at 100 venues (`LIMIT = 100`). With this radius, the requests return up to that cap: 2,671 venues in total for the 27 neighborhoods, out of a possible 2,700.
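The request described above can be assembled with the standard library instead of string formatting, so that every parameter is URL-escaped. This is a sketch, not the notebook's code. `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the parameter names mirror the legacy v2 `venues/explore` URL used in this notebook. The sketch only builds the URL and makes no network call.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=3000, limit=100, version="20180605"):
    """Build a Foursquare v2 venues/explore URL with the same query parameters as the notebook."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # "latitude,longitude"
        "radius": radius,                # search radius in metres
        "limit": limit,                  # maximum number of venues per request
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 47.66867, -122.38453)
print(url)
```

`urlencode` percent-encodes the comma in `ll` as `%2C`, which standard URL decoding turns back into a comma on the server side.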

In [327]:
import os

# define Foursquare credentials and API version
# (read from environment variables so the secret is never stored in or printed by the notebook;
#  the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own choice)
CLIENT_ID = os.environ["FOURSQUARE_CLIENT_ID"]  # your Foursquare ID
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"]  # your Foursquare secret
VERSION = '20180605'  # Foursquare API version


In [328]:
radius = 3000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(sea_df['Latitude'], sea_df['Longitude'], sea_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
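The loop above assumes that every venue has at least one category. If a `categories` list came back empty, `venue['venue']['categories'][0]` would raise an `IndexError`. The sketch below is a defensive variant of the parsing step under that assumption. `parse_explore_items` is a hypothetical helper, and the `sample` items are hand-written dictionaries that contain only the fields this notebook reads. They are not real API output.

```python
def parse_explore_items(items, neighborhood, lat, lng):
    """Flatten explore 'items' into tuples with the same fields the notebook collects."""
    rows = []
    for item in items:
        venue = item["venue"]
        cats = venue.get("categories") or []
        # keep venues without a category, recording None instead of raising IndexError
        category = cats[0]["name"] if cats else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# hand-written sample shaped like the fields read above (the second venue is hypothetical)
sample = [
    {"venue": {"name": "Salt & Straw", "location": {"lat": 47.66821, "lng": -122.385534},
               "categories": [{"name": "Ice Cream Shop"}]}},
    {"venue": {"name": "Unlabelled spot", "location": {"lat": 47.0, "lng": -122.0},
               "categories": []}},
]
rows = parse_explore_items(sample, "Ballard, Seattle", 47.66867, -122.38453)
print(rows)
```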

In [329]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2671, 7)


| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Ballard, Seattle | 47.66867 | -122.38453 | Ballard Farmer's Market | 47.667466 | -122.384316 | Farmers Market |
| 1 | Ballard, Seattle | 47.66867 | -122.38453 | Salt & Straw | 47.66821 | -122.385534 | Ice Cream Shop |
| 2 | Ballard, Seattle | 47.66867 | -122.38453 | Mr. Gyros | 47.669304 | -122.382018 | Mediterranean Restaurant |
| 3 | Ballard, Seattle | 47.66867 | -122.38453 | La Carta De Oaxaca | 47.668169 | -122.385767 | Mexican Restaurant |
| 4 | Ballard, Seattle | 47.66867 | -122.38453 | DIGS | 47.668859 | -122.382379 | Furniture / Home Store |


In [330]:
# examine the number of returned venues for each neighborhood
venues_df.groupby(["Neighborhood"]).count()

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Ballard, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Beacon Hill, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Belltown, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Broadview, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Capitol Hill, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Cascade, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Central District, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Central Waterfront, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Chinatown-International District, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
| Denny Triangle, Seattle | 100 | 100 | 100 | 100 | 100 | 100 |
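Because `LIMIT = 100` caps each request, a count of exactly 100 means the results may have been truncated, while a lower count means the radius held fewer venues than the cap. The 2,671 venues returned for 27 neighborhoods, out of a possible 2,700, imply that at least one neighborhood came in under the cap. The sketch below separates the two cases. The toy counts are made up for illustration. In the notebook, the counts would come from `venues_df.groupby("Neighborhood").size()`.

```python
import pandas as pd

LIMIT = 100
# toy per-neighborhood venue counts (illustrative values, not results from this report)
counts = pd.Series({"A": 100, "B": 100, "C": 71})

capped = counts[counts == LIMIT].index.tolist()  # may be truncated by the request limit
short = counts[counts < LIMIT].index.tolist()    # fewer venues than the limit within the radius
print("capped:", capped)
print("short:", short)
```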


In [331]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 241 unique categories.


In [332]:
# show the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Farmers Market', 'Ice Cream Shop', 'Mediterranean Restaurant',
       'Mexican Restaurant', 'Furniture / Home Store', 'Burger Joint',
       'Miscellaneous Shop', 'New American Restaurant', 'Record Shop',
       'Post Office', 'Dessert Shop', 'Bakery', 'Donut Shop', 'Gym',
       'Rock Club', 'Tea Room', 'Toy / Game Store', 'Beer Bar', 'Bar',
       'Sandwich Place', 'Gaming Cafe', 'Coffee Shop',
       'Seafood Restaurant', 'French Restaurant', 'Marijuana Dispensary',
       'Brewery', 'Vegetarian / Vegan Restaurant', 'Museum',
       'Supermarket', 'Canal Lock', 'Grocery Store', 'Sushi Restaurant',
       'Italian Restaurant', 'Pizza Place', 'Yoga Studio',
       'Caribbean Restaurant', 'Cocktail Bar', 'Baseball Field', 'Park',
       'Smoothie Shop', 'Fish Market', 'Breakfast Spot',
       'Japanese Restaurant', 'Deli / Bodega', 'High School',
       'Warehouse Store', 'Gastropub', 'Café', 'Liquor Store',
       'Boat or Ferry'], dtype=object)

In [333]:
# check if the results contain "Pet Store"
"Pet Store" in venues_df['VenueCategory'].unique()

True

## 6. Analyze each neighborhood

In this section, we built a table showing the composition of venues in each neighborhood. Each row sums to 1, and each entry gives the share of the corresponding venue category among that neighborhood's venues. Finally, we created a new DataFrame containing only the Pet Store data.
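The one-hot encoding and frequency table can be reproduced on a toy venue list. The sketch below follows the same idea as the cells that follow (`get_dummies`, then a per-neighborhood mean). The data and the names `venues`, `onehot`, `freq`, and `pet` are illustrative only.

```python
import pandas as pd

# toy venue list (illustrative only)
venues = pd.DataFrame({
    "Neighborhood": ["N1", "N1", "N1", "N1", "N2", "N2"],
    "VenueCategory": ["Pet Store", "Coffee Shop", "Coffee Shop", "Park", "Park", "Coffee Shop"],
})

# one indicator column per category: 1 where the venue belongs to that category
onehot = pd.get_dummies(venues["VenueCategory"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# mean of the indicator columns = share of each category among a neighborhood's venues
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)

# keep only the Pet Store share, as in the Pet Store analysis
pet = freq[["Neighborhood", "Pet Store"]]
print(pet)
```

In the notebook, the label column is called `Neighborhoods` rather than `Neighborhood`, because Foursquare also has a venue category named `Neighborhood`, which would otherwise collide with it after one-hot encoding.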

In [334]:
# one hot encoding
sea_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sea_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sea_onehot.columns[-1]] + list(sea_onehot.columns[:-1])
sea_onehot = sea_onehot[fixed_columns]

print(sea_onehot.shape)
sea_onehot.head()

(2671, 242)


,Neighborhoods,ATM,Accessories Store,African Restaurant,Airport,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bike Trail,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Burger Joint,Bus Line,Café,Cajun / Creole Restaurant,Camera Store,Canal,Canal Lock,Caribbean Restaurant,Casino,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Science Building,College Theater,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Disc Golf,Discount Store,Dive Bar,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Exhibit,Eye Doctor,Fabric Shop,Fair,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Heliport,Herbs & Spices Store,High School,Historic Site,History Museum,Hobby Shop,Hot Dog Joint,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Marijuana Dispensary,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Park,Performing Arts Venue,Pet Store,Pharmacy,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Post Office,Pub,Radio Station,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Plaza,Smoothie Shop,Snack Place,Soccer Field,Social Club,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Storage Facility,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Travel & Transport,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Ballard, Seattle‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Ballard, Seattle‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Ballard, Seattle‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Ballard, Seattle‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Ballard, Seattle‎",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [335]:
# frequency table for venue categories in each neighborhood
sea_grouped = sea_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(sea_grouped.shape)
sea_grouped

(27, 242)


,Neighborhoods,ATM,Accessories Store,African Restaurant,Airport,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bike Trail,Board Shop,Boat or Ferry,Bookstore,Botanical Garden,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Burger Joint,Bus Line,Café,Cajun / Creole Restaurant,Camera Store,Canal,Canal Lock,Caribbean Restaurant,Casino,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Science Building,College Theater,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Disc Golf,Discount Store,Dive Bar,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Exhibit,Eye Doctor,Fabric Shop,Fair,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Heliport,Herbs & Spices Store,High School,Historic Site,History Museum,Hobby Shop,Hot Dog Joint,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Marijuana Dispensary,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Park,Performing Arts Venue,Pet Store,Pharmacy,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Post Office,Pub,Radio Station,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Plaza,Smoothie Shop,Snack Place,Soccer Field,Social Club,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Storage Facility,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Travel & Transport,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,"Ballard, Seattle‎",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.09,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0
1,"Beacon Hill, Seattle‎",0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.0,0.04,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.05,0.01,0.01,0.01,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
2,"Belltown, Seattle‎",0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.03,0.08,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.02,0.01,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0
3,"Broadview, Seattle‎",0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0
4,"Capitol Hill, Seattle‎",0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0
5,"Cascade, Seattle‎",0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.04,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Central District, Seattle‎",0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.04,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.07,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0
7,"Central Waterfront, Seattle‎",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.1,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01
8,"Chinatown-International District, Seattle‎",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.03,0.12,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0
9,"Denny Triangle, Seattle‎",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.02,0.09,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0


In [336]:
# 18 of the 27 neighborhoods have at least one Pet Store among their returned venues
len(sea_grouped[sea_grouped["Pet Store"] > 0])

18
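The `Pet Store` values appear to be per-neighborhood frequencies rather than counts. Georgetown's 0.010101, for instance, is 1/99, which is what one-hot encoding the venue categories and taking a group-wise mean produces. The toy sketch below uses hypothetical neighborhoods and venues, and plain Python instead of pandas. It shows why a frequency above zero means at least one Pet Store was among the returned venues.

```python
from collections import Counter

# Hypothetical venue categories for two toy neighborhoods (not real Foursquare data)
venues = {
    "Neighborhood A": ["Coffee Shop", "Pet Store", "Park", "Coffee Shop"],
    "Neighborhood B": ["Park", "Bakery", "Coffee Shop"],
}

def category_frequency(categories, category):
    """Share of a neighborhood's venues in `category` (the mean of a one-hot column)."""
    counts = Counter(categories)  # missing categories count as 0
    return counts[category] / len(categories)

pet_freq = {name: category_frequency(cats, "Pet Store") for name, cats in venues.items()}
print(pet_freq)  # {'Neighborhood A': 0.25, 'Neighborhood B': 0.0}

# A frequency above zero means at least one Pet Store was returned for that neighborhood
with_pet_store = [name for name, freq in pet_freq.items() if freq > 0]
print(len(with_pet_store))  # 1
```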

In [337]:
# new DataFrame for Pet Store data only
sea_petStore = sea_grouped[["Neighborhoods","Pet Store"]]
sea_petStore.head()

,Neighborhoods,Pet Store
0,"Ballard, Seattle‎",0.0
1,"Beacon Hill, Seattle‎",0.01
2,"Belltown, Seattle‎",0.01
3,"Broadview, Seattle‎",0.0
4,"Capitol Hill, Seattle‎",0.01


## 7. Cluster neighborhoods in Seattle

In this section, we run k-means clustering on the Pet Store frequency and separate the Seattle neighborhoods into 3 clusters.
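Because only the `Pet Store` column is clustered, k-means here works on a single number per neighborhood. The following pure-Python sketch of Lloyd's algorithm illustrates the two alternating steps, assignment and centroid update. It is a teaching sketch, not scikit-learn's implementation, which by default uses k-means++ seeding and can run several initializations (`n_init`). The deterministic seeding and the toy frequencies are assumptions chosen for illustration.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on a single numeric feature (teaching sketch, k >= 1)."""
    ordered = sorted(values)
    # Deterministic seeding (an assumption): k values spread evenly across the sorted data
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centroid
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty)
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return labels, centroids

# Toy frequencies, not the report's data: three "none", three "moderate", three "high" values
freqs = [0.0, 0.0, 0.0, 0.01, 0.01, 0.01, 0.025, 0.03, 0.03]
labels, centroids = kmeans_1d(freqs, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

In one dimension, every converged cluster is a contiguous range of the sorted values. This is why the report's clusters can be read as "no", "moderate", and "more" Pet Stores.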

In [338]:
# set number of clusters
kclusters = 3

# cluster on the Pet Store frequency only (drop the non-numeric name column)
sea_clustering = sea_petStore.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sea_clustering)

# check the cluster labels generated for the first 10 rows
kmeans.labels_[0:10]

array([1, 0, 0, 1, 0, 1, 0, 1, 0, 2], dtype=int32)
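k-means numbers its clusters arbitrarily, which is why cluster 0 ended up as the "moderate" group and cluster 1 as the "no Pet Store" group. A minimal sketch for making the labels ordinal (0 = lowest centroid) follows. The labels are copied from the output above; the centroid values are hypothetical. With the fitted model, the centroids would come from `kmeans.cluster_centers_[:, 0]`.

```python
def relabel_by_centroid(labels, centroids):
    """Renumber clusters so label 0 has the smallest centroid, label 1 the next, and so on."""
    # Old cluster ids sorted by centroid value
    order = sorted(range(len(centroids)), key=lambda j: centroids[j])
    # Map each old id to its rank
    mapping = {old: new for new, old in enumerate(order)}
    return [mapping[label] for label in labels]

# Labels from the output above; centroids are hypothetical single-feature values
report_labels = [1, 0, 0, 1, 0, 1, 0, 1, 0, 2]
example_centroids = [0.01, 0.0, 0.022]
print(relabel_by_centroid(report_labels, example_centroids))  # [0, 1, 1, 0, 1, 0, 1, 0, 1, 2]
```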

In [339]:
# copy the Pet Store data so the cluster labels can be added without modifying sea_petStore
sea_merged = sea_petStore.copy()

# add clustering labels
sea_merged["Cluster Labels"] = kmeans.labels_

In [340]:
sea_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
sea_merged.head()

,Neighborhood,Pet Store,Cluster Labels
0,"Ballard, Seattle‎",0.0,1
1,"Beacon Hill, Seattle‎",0.01,0
2,"Belltown, Seattle‎",0.01,0
3,"Broadview, Seattle‎",0.0,1
4,"Capitol Hill, Seattle‎",0.01,0


In [341]:
# join with sea_df on Neighborhood to add the latitude/longitude of each neighborhood
sea_merged = sea_merged.join(sea_df.set_index("Neighborhood"), on="Neighborhood")

print(sea_merged.shape)
sea_merged.head() 

(27, 5)


,Neighborhood,Pet Store,Cluster Labels,Latitude,Longitude
0,"Ballard, Seattle‎",0.0,1,47.66867,-122.38453
1,"Beacon Hill, Seattle‎",0.01,0,47.57686,-122.31271
2,"Belltown, Seattle‎",0.01,0,47.61576,-122.34464
3,"Broadview, Seattle‎",0.0,1,47.72238,-122.36498
4,"Capitol Hill, Seattle‎",0.01,0,47.62396,-122.31882


In [342]:
# sort the results by Cluster Labels
print(sea_merged.shape)
sea_merged.sort_values(["Cluster Labels"], inplace=True)
sea_merged

(27, 5)


,Neighborhood,Pet Store,Cluster Labels,Latitude,Longitude
13,"Fremont, Seattle‎",0.01,0,47.66137,-122.35608
17,"Pioneer Square, Seattle‎",0.01,0,47.5983,-122.33429
14,"Georgetown, Seattle‎",0.010101,0,47.54751,-122.32149
25,"Wedgwood, Seattle‎",0.01,0,47.68701,-122.29494
12,"First Hill, Seattle‎",0.01,0,47.60878,-122.32643
11,"Eastlake, Seattle‎",0.01,0,47.64708,-122.32477
10,Downtown Seattle‎,0.01,0,47.607,-122.33374
8,"Chinatown-International District, Seattle‎",0.01,0,47.59582,-122.32469
26,"West Seattle, Seattle‎",0.01,0,47.60762,-122.33359
1,"Beacon Hill, Seattle‎",0.01,0,47.57686,-122.31271


Finally, let's visualize the clusters on a map.

In [348]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters (list index = cluster label): 0 = blue, 1 = yellow, 2 = green
cluster_colors = ['blue', 'yellow', 'green']

# add markers to the map
for lat, lon, poi, cluster in zip(sea_merged['Latitude'], sea_merged['Longitude'], sea_merged['Neighborhood'], sea_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=cluster_colors[cluster],
        fill=True,
        fill_color=cluster_colors[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
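The map code hard-codes one color per cluster, which only works while `kclusters` is 3. A minimal sketch, assuming only Python's standard `colorsys` module, generates k evenly spaced hues as hex strings. Leaflet, and therefore folium's `CircleMarker`, accepts hex strings as CSS colors. The helper name `distinct_hex_colors` and the saturation/value choices are hypothetical.

```python
import colorsys

def distinct_hex_colors(k, saturation=0.85, value=0.9):
    """Return k hex colors with hues evenly spaced around the HSV color wheel."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, saturation, value)  # floats in [0, 1]
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

# One color per cluster; index the list with the cluster label, as in the map cell
palette = distinct_hex_colors(3)
print(palette)
```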

In [344]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

## 8. Examine clustering result

### Cluster 0 (blue dots on the map): a moderate number of Pet Stores

In [352]:
sea_merged.loc[sea_merged['Cluster Labels'] == 0]

,Neighborhood,Pet Store,Cluster Labels,Latitude,Longitude
13,"Fremont, Seattle‎",0.01,0,47.66137,-122.35608
17,"Pioneer Square, Seattle‎",0.01,0,47.5983,-122.33429
14,"Georgetown, Seattle‎",0.010101,0,47.54751,-122.32149
25,"Wedgwood, Seattle‎",0.01,0,47.68701,-122.29494
12,"First Hill, Seattle‎",0.01,0,47.60878,-122.32643
11,"Eastlake, Seattle‎",0.01,0,47.64708,-122.32477
10,Downtown Seattle‎,0.01,0,47.607,-122.33374
8,"Chinatown-International District, Seattle‎",0.01,0,47.59582,-122.32469
26,"West Seattle, Seattle‎",0.01,0,47.60762,-122.33359
1,"Beacon Hill, Seattle‎",0.01,0,47.57686,-122.31271


### Cluster 1 (yellow dots): no Pet Stores among the returned venues

In [353]:
sea_merged.loc[sea_merged['Cluster Labels'] == 1]

,Neighborhood,Pet Store,Cluster Labels,Latitude,Longitude
21,"SoDo, Seattle‎",0.0,1,47.590285,-122.327037
19,"Rainier Beach, Seattle‎",0.0,1,47.51235,-122.26277
18,"Queen Anne, Seattle‎",0.0,1,47.63749,-122.36504
7,"Central Waterfront, Seattle‎",0.0,1,47.667038,-122.380468
0,"Ballard, Seattle‎",0.0,1,47.66867,-122.38453
15,"Magnolia, Seattle‎",0.0,1,47.633481,-122.387028
3,"Broadview, Seattle‎",0.0,1,47.72238,-122.36498
5,"Cascade, Seattle‎",0.0,1,47.742747,-122.365321
16,"Montlake, Seattle‎",0.0,1,47.64085,-122.30207


### Cluster 2 (green dots): the most Pet Stores

In [354]:
sea_merged.loc[sea_merged['Cluster Labels'] == 2]

,Neighborhood,Pet Store,Cluster Labels,Latitude,Longitude
20,"Rainier Valley, Seattle‎",0.02,2,47.55123,-122.28675
22,"South Lake Union, Seattle‎",0.02,2,47.62341,-122.33435
23,"University District, Seattle‎",0.03,2,47.66127,-122.31307
24,"Wallingford, Seattle‎",0.02,2,47.65555,-122.3265
9,"Denny Triangle, Seattle‎",0.02,2,47.5968,-122.33423
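The cluster headings describe each group qualitatively. As a quick check, the pairs below are copied from the three tables displayed above and summarized per cluster in plain Python. The Cluster 0 table shows only 10 rows, so its summary covers those rows only. In the notebook itself, `sea_merged.groupby("Cluster Labels")["Pet Store"].agg(["count", "min", "max", "mean"])` would compute the same statistics over all 27 rows.

```python
from collections import defaultdict

# (cluster label, Pet Store frequency) pairs copied from the tables above
rows = (
    [(0, 0.01)] * 2 + [(0, 0.010101)] + [(0, 0.01)] * 7          # Cluster 0 (displayed rows only)
    + [(1, 0.0)] * 9                                               # Cluster 1
    + [(2, 0.02), (2, 0.02), (2, 0.03), (2, 0.02), (2, 0.02)]      # Cluster 2
)

freqs_by_cluster = defaultdict(list)
for label, freq in rows:
    freqs_by_cluster[label].append(freq)

summary = {}
for label in sorted(freqs_by_cluster):
    freqs = freqs_by_cluster[label]
    summary[label] = (len(freqs), min(freqs), max(freqs), sum(freqs) / len(freqs))
    count, low, high, mean = summary[label]
    print(f"Cluster {label}: n={count}, min={low}, max={high}, mean={mean:.4f}")
```

The ranges do not overlap: Cluster 1 sits at 0, Cluster 0 at about 0.01, and Cluster 2 at 0.02 to 0.03. This matches the "none / moderate / more" reading of the clusters.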


## 9. Suggestion of location for opening a Pet Store