# Introduction/Business Problem

Houston is the fourth-largest city in the U.S. and has grown steadily in population and business over the past few decades, making it one of the fastest-growing cities in the country. A growing city brings a growing population and more commercial activity, and coffee drinking is part of that. Coffee shops have become places where people relax, hold informal business meetings or meet socially. How coffee shops interact with their surrounding neighborhoods is therefore an interesting question. In this study, I will explore Houston's neighborhoods, discuss the interaction between different venues and coffee shops, and then conclude where it would be reasonable to run a coffee shop in Houston.

##### Business problem
Where should a coffee shop be opened in Houston? I will discuss Houston's neighborhoods and, based on the available data, show where it would be ideal to run a coffee shop.
##### Target Audience
People interested in opening a coffee shop in one of Houston's neighborhoods are the target audience for this study.

# Data

## Data used in this study

In this study, Foursquare spatial data will be used to explore the venues in Houston neighborhoods, and the demographic data of each neighborhood will be analyzed. Foursquare (https://foursquare.com/) offers one of the most comprehensive and accurate location datasets available, and the data can be acquired through its API. Many popular services, such as Apple Maps, Uber, Snapchat and Twitter, use its location data. The data is crowd-sourced: users add or complete venue information, so the dataset keeps growing as more people use the service. To acquire the data, two parts of a query need to be specified: what to find and where to find it. The Houston neighborhood data was acquired from Wikipedia (https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods), which lists nine locations and 88 neighborhoods. The 2017 demographic data of Houston neighborhoods was acquired from the City of Houston website (https://www.houstontx.gov/planning/Demographics/) and includes total population, persons per square mile, median household income (USD) and median housing value (USD). The GeoJSON data of Houston neighborhoods was acquired from a GitHub repository (https://github.com/blackmad/neighborhoods).
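The "what to find" and "where to find" parts of a query map naturally onto request parameters. The sketch below builds an explore URL with the standard library; the endpoint and the `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` parameters are the ones used later in this notebook, while the `query` parameter for "what to find" is an assumption about the Foursquare v2 API, and the credentials are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Foursquare v2 "explore" endpoint, as used later in this notebook.
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng,
                      query=None, radius=2000, limit=100):
    """Return an explore URL: 'what to find' (query) near 'where to find' (lat, lng)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',   # where to find: a "latitude,longitude" pair
        'radius': radius,       # search radius in meters
        'limit': limit,         # maximum number of venues returned
    }
    if query is not None:
        params['query'] = query  # what to find (assumed parameter, e.g. 'coffee')
    return EXPLORE_URL + '?' + urlencode(params)

# Placeholder credentials; the coordinates are those of Acres Home in the demographic table.
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20190812',
                        29.870715, -95.436558, query='coffee')
print(parse_qs(urlparse(url).query)['ll'])
```

`urlencode` takes care of escaping (for example the comma in `ll`), which the string-formatting approach in `getNearbyVenues` leaves to `requests`.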

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # only needed if geopy is not installed yet
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # only needed if folium is not installed yet
import folium # map rendering library

print('Libraries imported.')


## 1. Download and Explore Dataset
### Acquire data from Wikipedia page
Neighborhood Data: Wikipedia https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods

In [5]:
from bs4 import BeautifulSoup

In [7]:
html_doc='List of Houston neighborhoods - Wikipedia.htm' # locally saved copy of the Wikipedia page
with open(html_doc, encoding='utf-8') as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')
#print(soup.prettify())

In [8]:
data=[]
for cell in soup.find_all('td'): # collect the text of every table cell
    data.append(cell.text)
#print(data)

In [9]:
print(data.index('Lawndale / Wayside\n'))
print(len(data))

349
419


In [10]:
#### Keep only the neighborhood table (88 rows x 4 cells); drop everything after the 'Lawndale / Wayside' row
del data[352:]
print(len(data))

352
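The cutoff 352 is the end of the table row that contains 'Lawndale / Wayside' (index 349). A sketch of deriving it from that index instead of hard-coding it, assuming each neighborhood row yields 4 `<td>` cells (as the `i%4` logic below implies); the cell list here is a hypothetical miniature, not the real scrape.

```python
CELLS_PER_ROW = 4  # each neighborhood row of the Wikipedia table yields 4 <td> cells

def table_end(last_name_index, cells_per_row=CELLS_PER_ROW):
    """Index just past the table row that contains the last neighborhood name."""
    row_start = last_name_index - last_name_index % cells_per_row
    return row_start + cells_per_row

# Hypothetical cell list: two table rows followed by unrelated cells.
cells = ['1\n', 'Name A\n', 'Region A\n', 'x\n',
         '2\n', 'Name B\n', 'Region B\n', 'x\n',
         'footer\n', 'footer\n']
del cells[table_end(cells.index('Name B\n')):]
print(len(cells))      # 8
print(table_end(349))  # 352, the cutoff used above
```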


In [11]:
##### Create lists for Location and Neighborhood
# each table row has 4 cells: the neighborhood is the 2nd cell and the location the 3rd
Location=[]
Neighborhood=[]
for i in range(len(data)):
    if i%4==1:
        Neighborhood.append(data[i].strip('\n'))
    elif i%4==2:
        Location.append(data[i].strip('\n'))

Neighborhood[0:10]

['Willowbrook',
 'Greater Greenspoint',
 'Carverdale',
 'Fairbanks / Northwest Crossing',
 'Greater Inwood',
 'Acres Home',
 'Hidden Valley',
 'Westbranch',
 'Addicks / Park Ten',
 'Spring Branch West']
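The modulo loop above is equivalent to stride slicing: `data[1::4]` takes every 4th cell starting at offset 1. A minimal sketch on a hypothetical miniature of the cell list:

```python
# Hypothetical miniature of the trimmed cell list (4 cells per table row:
# row number, neighborhood, location, and one more cell that is not used).
cells = ['1\n', 'Name A\n', 'Region A\n', 'x\n',
         '2\n', 'Name B\n', 'Region B\n', 'x\n']

CELLS_PER_ROW = 4
# offset 1 -> neighborhood (same as i % 4 == 1); offset 2 -> location (i % 4 == 2)
neighborhoods = [c.strip('\n') for c in cells[1::CELLS_PER_ROW]]
locations = [c.strip('\n') for c in cells[2::CELLS_PER_ROW]]
print(list(zip(locations, neighborhoods)))
```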

In [12]:
df_Houston = pd.DataFrame(
    {'Location': Location, 'Neighborhood': Neighborhood})
df_Houston.sort_values(['Neighborhood'], ascending=True, inplace=True)
df_Houston.head()

|    | Location  | Neighborhood            |
|----|-----------|-------------------------|
| 5  | Northwest | Acres Home              |
| 8  | West      | Addicks / Park Ten      |
| 22 | West      | Afton Oaks / River Oaks |
| 24 | Southwest | Alief                   |
| 33 | South     | Astrodome Area          |


In [13]:
##### Write the neighborhood data to Houston Neighborhoods.csv, then manually copy and paste the demographic data from the PDF files.
df_Houston.to_csv('Houston Neighborhoods.csv')

In [14]:
Demographic=pd.read_csv('Houston_neighborhood_demographic.csv')
Demographic.drop(['Unnamed: 8','Unnamed: 9','Unnamed: 10'], inplace=True, axis=1)
# astype returns a new Series, so the result must be assigned back
Demographic['Persons per Square Mile'] = Demographic['Persons per Square Mile'].astype("float64")
Demographic.head()

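|   | Location  | Neighborhood            | Latitude  | Longitude  | Total Population | Persons per Square Mile | Median Household Income (USD) | Median Housing Value (USD) |
|---|-----------|-------------------------|-----------|------------|------------------|-------------------------|-------------------------------|----------------------------|
| 0 | Northwest | Acres Home              | 29.870715 | -95.436558 | 25846.0          | 2885.0                  | 41358.0                       | 134810.0                   |
| 1 | West      | Addicks / Park Ten      | 29.813269 | -95.645476 | 20694.0          | 883.0                   | 82869.0                       | 221710.0                   |
| 2 | West      | Afton Oaks / River Oaks | 29.747474 | -95.433124 | 14875.0          | 4119.0                  | 152092.0                      | 1203824.0                  |
| 3 | Southwest | Alief                   | 29.682815 | -95.593005 | 119598.0         | 8106.0                  | 42928.0                       | 100590.0                   |
| 4 | South     | Astrodome Area          | 29.685404 | -95.403244 | 17944.0          | 5646.0                  | 51510.0                       | 126921.0                   |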
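Because the demographic values were copied by hand from PDF files, a more tolerant conversion than `astype` may help. The sketch below assumes, hypothetically, that pasted cells can contain thousands separators or be blank; `pd.to_numeric(..., errors='coerce')` turns anything unparseable into NaN instead of raising. Note that, like `astype`, it returns a new Series that has to be assigned back.

```python
import pandas as pd

# Hypothetical rows mimicking values pasted by hand from the PDF files.
demo = pd.DataFrame({
    'Neighborhood': ['Name A', 'Name B', 'Name C'],
    'Persons per Square Mile': ['2,885', '883', ''],
})

# Strip thousands separators, then convert; unparseable cells (here the blank one) become NaN.
demo['Persons per Square Mile'] = pd.to_numeric(
    demo['Persons per Square Mile'].str.replace(',', '', regex=False),
    errors='coerce')
print(demo.dtypes)
```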


In [15]:
#### Check the geojson data
with open('houston_geo.json') as json_file:
    houston_geo=json.load(json_file)

# collect and sort the neighborhood names used in the geojson file
name = sorted(feature['properties']['name'] for feature in houston_geo['features'])
#name

In [16]:
#### Match the Neighborhood names in Demographic with houston_geo.json
# assumes both lists are sorted the same way and contain the same 88 neighborhoods
for i in range(len(name)):
    if name[i] != Demographic['Neighborhood'][i]:
        Demographic.loc[i, 'Neighborhood'] = name[i] # .loc avoids chained assignment
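Matching by position relies on both sorted lists lining up exactly. A swapped-in alternative, not the approach used in this notebook, is fuzzy string matching with the standard-library `difflib`; the sketch below uses name spellings that appear in this notebook (Wikipedia versions versus the geojson versions printed later), and the 0.6 cutoff is `difflib`'s default rather than a tuned value.

```python
import difflib

def match_names(source_names, target_names, cutoff=0.6):
    """Map each source name to its closest target name (None if nothing is close enough)."""
    mapping = {}
    for src in source_names:
        best = difflib.get_close_matches(src, target_names, n=1, cutoff=cutoff)
        mapping[src] = best[0] if best else None
    return mapping

wiki_names = ['Acres Home', 'Addicks / Park Ten', 'Afton Oaks / River Oaks']
geo_names = ['Acres Home', 'Addicks Park Ten', 'Afton Oaks / River Oaks Area']
print(match_names(wiki_names, geo_names))
```

Any name mapped to `None`, or two names mapped to the same target, would be worth checking by hand.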


In [17]:
print('The dataframe has {} locations and {} neighborhoods.'.format(
        len(Demographic['Location'].unique()),
        Demographic.shape[0]
    )
)

The dataframe has 9 locations and 88 neighborhoods.


## Get up to 100 top venues in each Houston neighborhood within a radius of 2000 meters

#### Define Foursquare Credentials and Version

In [18]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (never publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20190812'
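Rather than typing credentials into the notebook, they can be read from environment variables so they never appear in the saved file or its outputs. A minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical and would need to be set in the shell before starting Jupyter.

```python
import os

def load_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping (the process environment by default)."""
    # Hypothetical variable names; empty strings mean "not set".
    return env.get('FOURSQUARE_CLIENT_ID', ''), env.get('FOURSQUARE_CLIENT_SECRET', '')

CLIENT_ID, CLIENT_SECRET = load_credentials()
VERSION = '20190812'

# Report only whether the credentials are present, never their values.
print('Credentials found:', bool(CLIENT_ID and CLIENT_SECRET))
```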


In [19]:
#####Create a function to download venue data from Foursquare and extract the category of each venue, repeated for every neighborhood in Houston
def getNearbyVenues(names, latitudes, longitudes, radius=2000, LIMIT = 100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
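The flattening step inside `getNearbyVenues` can be tested offline by separating it from the HTTP request. The sketch below reads the same fields the function reads, adds a guard for venues with an empty `categories` list (the original `[0]` lookup would raise `IndexError` there), and runs on a hand-made payload that only mimics the response structure; it is not a real API reply.

```python
import pandas as pd

def venues_from_response(payload, name, lat, lng):
    """Flatten one explore response into rows like those built in getNearbyVenues."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'] if categories else None,  # no category -> None
        ))
    return rows

# Minimal hand-made payload containing only the fields read above.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'location': {'lat': 29.87, 'lng': -95.42},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 29.86, 'lng': -95.44},
               'categories': []}},
]}]}}

columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
df = pd.DataFrame(venues_from_response(sample, 'Acres Home', 29.870715, -95.436558),
                  columns=columns)
print(df)
```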

In [20]:
#### Create a new dataframe
Houston_venues = getNearbyVenues(names=Demographic['Neighborhood'],
                                   latitudes=Demographic['Latitude'],
                                   longitudes=Demographic['Longitude']
                                  )

Acres Home
Addicks Park Ten
Afton Oaks / River Oaks Area
Alief
Astrodome Area
Braeburn
Braeswood Place
Brays Oaks
Briarforest Area
Carverdale
Central Northwest
Central Southwest
Clear Lake
Clinton Park Tri-Community
Denver Harbor / Port Houston
Downtown
East Houston
East Little York / Homestead
Eastex - Jensen Area
Edgebrook Area
El Dorado / Oates Prairie
Eldridge / West Oaks
Fairbanks / Northwest Crossing
Fondren Gardens
Fort Bend / Houston
Fourth Ward
Golfcrest / Bellfort / Reveille
Greater Eastwood
Greater Fifth Ward
Greater Greenspoint
Greater Heights
Greater Hobby Area
Greater Inwood
Greater Third Ward
Greater Uptown
Greenway / Upper Kirby Area
Gulfgate Riverview / Pine Valley
Gulfton
Harrisburg / Manchester
Hidden Valley
Hunterwood
Iah / Airport Area
Independence Heights
Kashmere Gardens
Kingwood Area
Lake Houston
Langwood
Lawndale / Wayside
Lazy Brook / Timbergrove
Macgregor
Magnolia Park
Meadowbrook / Allendale
Medical Center Area
Memorial
Meyerland Area
Mid West
Midtown
Minnet

In [21]:
print(Houston_venues.shape)
Houston_venues.head()

(5107, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|--------------|-----------------------|------------------------|-------|----------------|-----------------|----------------|
| 0 | Acres Home | 29.870715 | -95.436558 | Burns Orginal BBQ Est. 1973 | 29.87695 | -95.420984 | BBQ Joint |
| 1 | Acres Home | 29.870715 | -95.436558 | Family Dollar | 29.872253 | -95.437729 | Discount Store |
| 2 | Acres Home | 29.870715 | -95.436558 | Sylvester Turner Park | 29.865159 | -95.452752 | Park |
| 3 | Acres Home | 29.870715 | -95.436558 | Vera Steel | 29.86184 | -95.44203 | Construction & Landscaping |
| 4 | Acres Home | 29.870715 | -95.436558 | Houston Astros MLB Urban Youth Academy | 29.869973 | -95.452752 | Baseball Field |


In [22]:
#### Find unique categories
print('There are {} unique categories.'.format(len(Houston_venues['Venue Category'].unique())))

There are 333 unique categories.
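Since the study centers on coffee shops, a natural follow-up count is the number of coffee shops found in each neighborhood next to its total number of venues. A sketch on a hypothetical miniature of `Houston_venues`; note that Foursquare keeps "Café" and "Coffee Shop" as separate categories (both appear in the one-hot columns below), and this sketch counts only "Coffee Shop".

```python
import pandas as pd

# Hypothetical miniature of Houston_venues (only the two columns used here).
venues = pd.DataFrame({
    'Neighborhood': ['Hood A', 'Hood A', 'Hood A', 'Hood B', 'Hood B'],
    'Venue Category': ['Coffee Shop', 'Bar', 'Coffee Shop', 'Park', 'Café'],
})

# Total venues returned per neighborhood (capped by LIMIT in the API call).
venue_counts = venues.groupby('Neighborhood').size()

# Coffee shops per neighborhood; neighborhoods without any get 0.
coffee_counts = (venues['Venue Category'].eq('Coffee Shop')
                 .groupby(venues['Neighborhood']).sum())
print(pd.DataFrame({'venues': venue_counts, 'coffee_shops': coffee_counts}))
```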


# Methodology 

## Explore Neighborhoods in Houston

In [23]:
# one hot encoding
Houston_onehot = pd.get_dummies(Houston_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Houston_onehot['Neighborhood Name'] = Houston_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Houston_onehot.columns[-1]] + list(Houston_onehot.columns[:-1])
Houston_onehot = Houston_onehot[fixed_columns]

Houston_onehot.head()

,Neighborhood Name,ATM,Accessories Store,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bay,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Casino,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Circus,Clothing Store,Cocktail Bar,Coffee Shop,College Baseball Diamond,College Football Field,College Gym,College Rec Center,College Residence Hall,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Fabric Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gun Range,Gun Shop,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Himalayan Restaurant,History Museum,Hobby Shop,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Lawyer,Lebanese Restaurant,Light Rail Station,Lighting Store,Lingerie Store,Liquor Store,Locksmith,Lounge,Mac & Cheese Joint,Marine Terminal,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mongolian Restaurant,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,Opera House,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pawn Shop,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Picnic Area,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Port,Portuguese Restaurant,Post Office,Print Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,Road,Rugby Pitch,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Storage Facility,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tennis Stadium,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Trail,Travel & Transport,Truck Stop,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waste Facility,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Acres Home,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Acres Home,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Acres Home,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Acres Home,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Acres Home,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [24]:
### Dataframe size
Houston_onehot.shape

(5107, 334)

#### Group rows by neighborhood, taking the mean frequency of occurrence of each category

In [25]:
Houston_grouped = Houston_onehot.groupby('Neighborhood Name').mean().reset_index()
Houston_grouped.head()

,Neighborhood Name,ATM,Accessories Store,Advertising Agency,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bay,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Casino,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Circus,Clothing Store,Cocktail Bar,Coffee Shop,College Baseball Diamond,College Football Field,College Gym,College Rec Center,College Residence Hall,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Fabric Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gun Range,Gun Shop,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Himalayan Restaurant,History Museum,Hobby Shop,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Lawyer,Lebanese Restaurant,Light Rail Station,Lighting Store,Lingerie Store,Liquor Store,Locksmith,Lounge,Mac & Cheese Joint,Marine Terminal,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mongolian Restaurant,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,Opera House,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Paintball Field,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pawn Shop,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Picnic Area,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Port,Portuguese Restaurant,Post Office,Print Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,Road,Rugby Pitch,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stadium,Steakhouse,Storage Facility,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tennis Stadium,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Trail,Travel & Transport,Truck Stop,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Waste Facility,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Acres Home,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Addicks Park Ten,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Afton Oaks / River Oaks Area,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0
3,Alief,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.068182,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.113636,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.068182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.068182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Astrodome Area,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0


In [26]:
### Dataframe size
Houston_grouped.shape

(88, 334)

#### Print each neighborhood along with the top 5 most common venues

In [27]:
num_top_venues = 5

for hood in Houston_grouped['Neighborhood Name']:
    print("----"+hood+"----")
    temp = Houston_grouped[Houston_grouped['Neighborhood Name'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Acres Home----
                        venue  freq
0  Construction & Landscaping   0.2
1                        Park   0.2
2              Baseball Field   0.2
3              Discount Store   0.2
4                   BBQ Joint   0.2


----Addicks Park Ten----
                   venue  freq
0      Indian Restaurant  0.25
1    Government Building  0.25
2                Dog Run  0.25
3  Outdoors & Recreation  0.25
4   Other Great Outdoors  0.00


----Afton Oaks / River Oaks Area----
                     venue  freq
0  New American Restaurant  0.06
1   Furniture / Home Store  0.06
2                     Café  0.04
3       Mexican Restaurant  0.04
4         Sushi Restaurant  0.04


----Alief----
                  venue  freq
0  Fast Food Restaurant  0.11
1           Pizza Place  0.07
2         Grocery Store  0.07
3        Discount Store  0.07
4     Mobile Phone Shop  0.05


----Astrodome Area----
                venue  freq
0      Sandwich Place  0.05
1   Mobile Phone Shop  0.05
2         

            venue  freq
0     Zoo Exhibit  0.09
1             Bar  0.08
2     Coffee Shop  0.06
3  Science Museum  0.04
4           Trail  0.04


----Northshore----
                venue  freq
0         Coffee Shop  0.07
1                 Bar  0.04
2  Mexican Restaurant  0.04
3  Italian Restaurant  0.04
4                Café  0.03


----Northside Village----
                  venue  freq
0    Mexican Restaurant  0.11
1  Fast Food Restaurant  0.08
2        Discount Store  0.06
3     Convenience Store  0.05
4        Sandwich Place  0.05


----Northside/Northline----
                    venue  freq
0    Fast Food Restaurant  0.07
1  Furniture / Home Store  0.05
2          Sandwich Place  0.05
3          Discount Store  0.05
4     Fried Chicken Joint  0.05


----Ost / South Union----
                  venue  freq
0    Mexican Restaurant  0.20
1        Discount Store  0.05
2           Video Store  0.04
3        Sandwich Place  0.04
4  Fast Food Restaurant  0.04


----Park Place----
        

#### Put into a *pandas* dataframe

In [28]:
#### First, write a function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
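`return_most_common_venues` relies on `sort_values(ascending=False)`, so the order of equally frequent categories (such as the five venues tied at 0.2 in Acres Home) is decided by the sort, not by any meaningful rule. Below is a minimal stdlib sketch of the same ranking with an explicit tie-break by category name. It assumes the row is given as a plain `{category: frequency}` dict, and `top_venues` is a hypothetical helper name:

```python
def top_venues(freqs, n):
    """Return the n most frequent venue categories from a {category: frequency} dict.

    Ties are broken by category name in plain string order (uppercase sorts
    before lowercase), so the result is reproducible from run to run.
    """
    # Sort by descending frequency first, then by name ascending.
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


# Rounded frequencies from the Acres Home output above, plus one zero-frequency category.
acres_home = {
    "Construction & Landscaping": 0.2,
    "Park": 0.2,
    "Baseball Field": 0.2,
    "Discount Store": 0.2,
    "BBQ Joint": 0.2,
    "ATM": 0.0,
}
print(top_venues(acres_home, 3))
# ['BBQ Joint', 'Baseball Field', 'Construction & Landscaping']
```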

In [80]:
#### Create a new dataframe and display the top 8 venues for each neighborhood.
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood Name'] = Houston_grouped['Neighborhood Name']

for ind in np.arange(Houston_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Houston_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Index,Neighborhood Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Acres Home,Construction & Landscaping,Park,Baseball Field,Discount Store,BBQ Joint,ATM,Outdoor Sculpture,Paper / Office Supplies Store
1,Addicks Park Ten,Indian Restaurant,Government Building,Dog Run,Outdoors & Recreation,Other Great Outdoors,Pakistani Restaurant,Paintball Field,Outdoor Supply Store
2,Afton Oaks / River Oaks Area,New American Restaurant,Furniture / Home Store,Café,Mexican Restaurant,Sushi Restaurant,Gym,Burger Joint,Italian Restaurant
3,Alief,Fast Food Restaurant,Pizza Place,Grocery Store,Discount Store,Mobile Phone Shop,Video Store,Cosmetics Shop,Golf Course
4,Astrodome Area,Sandwich Place,Mobile Phone Shop,Pizza Place,Mexican Restaurant,Coffee Shop,Grocery Store,Hotel,Food Truck
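The `indicators` list covers only the first three positions, and every later position falls through to "th". That is correct for the 8 and 10 columns used in this notebook, but it would produce labels such as "21th" if `num_top_venues` ever exceeded 20. A small stdlib sketch of a general ordinal helper (the name `ordinal` is hypothetical):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # 11, 12 and 13 (and the rest of the teens) always take 'th'.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood Name"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], "/", columns[-1])  # 1st Most Common Venue / 10th Most Common Venue
```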


In [81]:
neighborhoods_venues_sorted.shape

(88, 9)

## Cluster Neighborhoods

In [104]:
#### Run k-means to cluster the neighborhoods into 8 clusters.

# set number of clusters
kclusters = 8

Houston_grouped_clustering = Houston_grouped.drop(columns='Neighborhood Name')  # keep only the numeric venue frequencies

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Houston_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 0, 2, 3, 2, 5, 5, 3, 2, 5, 5, 3, 2, 5, 5, 2, 3, 3, 3, 5, 5, 2,
       2, 3, 1, 2, 5, 2, 3, 5, 2, 5, 3, 3, 2, 2, 2, 3, 2, 6, 3, 5, 5, 2,
       3, 2, 4, 5, 5, 2, 3, 5, 3, 2, 2, 2, 2, 2, 7, 2, 2, 5, 3, 5, 5, 5,
       5, 5, 3, 5, 7, 5, 5, 3, 5, 5, 5, 2, 3, 3, 2, 2, 5, 3, 2, 5, 5, 2],
      dtype=int32)
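`KMeans` groups neighborhoods whose venue-frequency vectors are close in Euclidean distance. For readers who want to see what `fit` does, here is a minimal pure-Python sketch of Lloyd's algorithm: assign every row to its nearest centroid, move each centroid to the mean of its rows, and repeat until no centroid moves more than `tol` or `max_iter` is reached. It is not scikit-learn's implementation, which uses k-means++ initialization by default, can run several initializations (`n_init`), and treats `tol` as a relative tolerance. Here the first k rows are used as initial centroids purely to keep the sketch deterministic, and the data are hypothetical:

```python
import math


def kmeans(rows, k, max_iter=300, tol=1e-4):
    """Cluster equal-length numeric vectors into k groups with Lloyd's algorithm.

    Returns (labels, centroids). The first k rows serve as initial centroids,
    a deliberate simplification that keeps the sketch deterministic.
    """
    centroids = [list(row) for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(max_iter):
        # Assignment step: each row joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        # Update step: each centroid moves to the mean of its member rows.
        new_centroids = []
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        # Convergence check: stop once no centroid moved farther than tol.
        shift = max(math.dist(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids


# Hypothetical venue-frequency vectors over three categories
# (Coffee Shop, Fast Food Restaurant, Park) for four neighborhoods.
rows = [
    [0.5, 0.1, 0.0],
    [0.0, 0.6, 0.1],
    [0.4, 0.0, 0.1],
    [0.1, 0.5, 0.0],
]
labels, centroids = kmeans(rows, k=2)
print(labels)  # [0, 1, 0, 1]
```

The coffee-heavy rows end up together and the fast-food-heavy rows together, which is the same kind of grouping the notebook relies on with 334 venue categories instead of three.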

In [90]:
#### Create a new dataframe that includes the cluster as well as the top 8 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Houston_merged = Demographic[['Location', 'Neighborhood', 'Latitude', 'Longitude']]

# join the cluster labels and top venues onto the demographic table (which holds latitude/longitude) by neighborhood name
Houston_merged = Houston_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood Name'), on='Neighborhood')

Houston_merged.head() # check the last columns!

Index,Location,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Northwest,Acres Home,29.870715,-95.436558,1,Construction & Landscaping,Park,Baseball Field,Discount Store,BBQ Joint,ATM,Outdoor Sculpture,Paper / Office Supplies Store
1,West,Addicks Park Ten,29.813269,-95.645476,0,Indian Restaurant,Government Building,Dog Run,Outdoors & Recreation,Other Great Outdoors,Pakistani Restaurant,Paintball Field,Outdoor Supply Store
2,West,Afton Oaks / River Oaks Area,29.747474,-95.433124,2,New American Restaurant,Furniture / Home Store,Café,Mexican Restaurant,Sushi Restaurant,Gym,Burger Joint,Italian Restaurant
3,Southwest,Alief,29.682815,-95.593005,3,Fast Food Restaurant,Pizza Place,Grocery Store,Discount Store,Mobile Phone Shop,Video Store,Cosmetics Shop,Golf Course
4,South,Astrodome Area,29.685404,-95.403244,2,Sandwich Place,Mobile Phone Shop,Pizza Place,Mexican Restaurant,Coffee Shop,Grocery Store,Hotel,Food Truck


In [91]:
# create map
map_clusters = folium.Map(location=[29.8, -95.3676974], zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add a circle marker for each neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(Houston_merged['Latitude'], Houston_merged['Longitude'], Houston_merged['Neighborhood'], Houston_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
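The map cells sample `kclusters` evenly spaced colors from matplotlib's `rainbow` colormap and look up each marker's color by its cluster label. Below is a dependency-free sketch of the same idea, assuming any set of well-separated hues is acceptable. It spaces hues with the standard library's `colorsys` rather than reproducing matplotlib's rainbow palette, and `cluster_colors` is a hypothetical name:

```python
import colorsys


def cluster_colors(k):
    """Return k distinct hex colors with evenly spaced hues, one per cluster."""
    hex_colors = []
    for i in range(k):
        # Spread hues over [0, 0.8] so the last color does not wrap back to red
        # (hue 0.0 and hue 1.0 are the same point on the color wheel).
        hue = 0.8 * i / max(k - 1, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        hex_colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return hex_colors


palette = cluster_colors(8)
first_labels = [1, 0, 2, 3, 2]  # first few labels from the 8-cluster run above
marker_colors = [palette[label] for label in first_labels]
print(marker_colors)
```

Indexing the palette directly with the label (cluster 0 gets the first color) keeps the legend-to-color mapping easy to read.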

# Results
#### Examine each cluster and determine the venue categories that distinguish it. Based on these defining categories, assign a name to each cluster.
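Reading the top-venue tables by eye is one way to find the discriminating categories. A more systematic variant averages the venue-frequency rows within each cluster and lists the categories with the highest mean. This sketch assumes each neighborhood is represented by its frequency row (as in `Houston_grouped`) and its k-means label. In pandas it would roughly correspond to `Houston_grouped_clustering.groupby(kmeans.labels_).mean()`. The stdlib version below uses hypothetical toy data, and `cluster_profiles` is a hypothetical name:

```python
from collections import defaultdict


def cluster_profiles(rows, labels, categories, top_n=3):
    """Return {cluster: [(category, mean frequency), ...]} with the top_n categories.

    rows: one venue-frequency vector per neighborhood, aligned with `categories`;
    labels: the cluster label assigned to each row.
    """
    members = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)
    profiles = {}
    for label, group in sorted(members.items()):
        # Column-wise mean frequency over the neighborhoods in this cluster.
        means = [sum(col) / len(group) for col in zip(*group)]
        ranked = sorted(zip(categories, means), key=lambda pair: (-pair[1], pair[0]))
        profiles[label] = ranked[:top_n]
    return profiles


# Hypothetical frequencies for four neighborhoods and their cluster labels.
categories = ["Coffee Shop", "Fast Food Restaurant", "Park"]
rows = [
    [0.06, 0.00, 0.02],
    [0.00, 0.11, 0.01],
    [0.07, 0.01, 0.00],
    [0.01, 0.08, 0.03],
]
labels = [0, 1, 0, 1]
for label, top in cluster_profiles(rows, labels, categories, top_n=2).items():
    print(label, top)
```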

## Cluster 0

In [92]:
Houston_merged.loc[Houston_merged['Cluster Labels'] == 0, Houston_merged.columns[[1] + list(range(5, Houston_merged.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
1,Addicks Park Ten,Indian Restaurant,Government Building,Dog Run,Outdoors & Recreation,Other Great Outdoors,Pakistani Restaurant,Paintball Field,Outdoor Supply Store


## Cluster 1

In [93]:
Houston_merged.loc[Houston_merged['Cluster Labels'] == 1, Houston_merged.columns[[1] + list(range(5, Houston_merged.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Acres Home,Construction & Landscaping,Park,Baseball Field,Discount Store,BBQ Joint,ATM,Outdoor Sculpture,Paper / Office Supplies Store
24,Fort Bend / Houston,Convenience Store,Hotel,Food Truck,Seafood Restaurant,Discount Store,Health & Beauty Service,Park,Construction & Landscaping


## Cluster 2

In [94]:
Houston_merged.loc[Houston_merged['Cluster Labels'] == 2, Houston_merged.columns[[1] + list(range(5, Houston_merged.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
2,Afton Oaks / River Oaks Area,New American Restaurant,Furniture / Home Store,Café,Mexican Restaurant,Sushi Restaurant,Gym,Burger Joint,Italian Restaurant
4,Astrodome Area,Sandwich Place,Mobile Phone Shop,Pizza Place,Mexican Restaurant,Coffee Shop,Grocery Store,Hotel,Food Truck
8,Briarforest Area,Sandwich Place,Chinese Restaurant,Bakery,Mediterranean Restaurant,Grocery Store,Coffee Shop,Pizza Place,Mexican Restaurant
12,Clear Lake,Pizza Place,Intersection,Coffee Shop,Italian Restaurant,Japanese Restaurant,Salon / Barbershop,Gas Station,Park
15,Downtown,Hotel,Bar,Italian Restaurant,Burger Joint,Mexican Restaurant,Baseball Stadium,Gym,New American Restaurant
21,Eldridge / West Oaks,Sandwich Place,Burger Joint,Vietnamese Restaurant,Mexican Restaurant,Coffee Shop,Gas Station,Cajun / Creole Restaurant,Bank
22,Fairbanks / Northwest Crossing,Mobile Phone Shop,Mexican Restaurant,Deli / Bodega,Burger Joint,Ice Cream Shop,Seafood Restaurant,Gym / Fitness Center,Cosmetics Shop
25,Fourth Ward,Bar,Coffee Shop,Park,Mexican Restaurant,Burger Joint,Fried Chicken Joint,Gym,Pizza Place
27,Greater Eastwood,Bar,Fast Food Restaurant,Pizza Place,Coffee Shop,Mexican Restaurant,Dive Bar,Fried Chicken Joint,Taco Place
30,Greater Heights,Mexican Restaurant,Restaurant,Coffee Shop,American Restaurant,Taco Place,Park,Ice Cream Shop,Bakery


## Cluster 3

In [95]:
Houston_merged.loc[Houston_merged['Cluster Labels'] == 3, Houston_merged.columns[[1] + list(range(5, Houston_merged.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
3,Alief,Fast Food Restaurant,Pizza Place,Grocery Store,Discount Store,Mobile Phone Shop,Video Store,Cosmetics Shop,Golf Course
7,Brays Oaks,Fast Food Restaurant,Grocery Store,Discount Store,Pizza Place,Bank,Fried Chicken Joint,Pharmacy,Sandwich Place
11,Central Southwest,Fried Chicken Joint,Gas Station,Fast Food Restaurant,Discount Store,Chinese Restaurant,Seafood Restaurant,Bus Station,Video Store
16,East Houston,Fast Food Restaurant,Discount Store,Fried Chicken Joint,Cosmetics Shop,Storage Facility,Mobile Phone Shop,Grocery Store,Seafood Restaurant
17,East Little York / Homestead,Discount Store,Food,Convenience Store,Gas Station,Clothing Store,Park,Spa,Pizza Place
18,Eastex - Jensen Area,Fast Food Restaurant,Discount Store,Gas Station,Pizza Place,Fried Chicken Joint,Bank,Event Service,Outdoor Supply Store
23,Fondren Gardens,Fast Food Restaurant,Discount Store,Fried Chicken Joint,Mobile Phone Shop,Sandwich Place,Intersection,Park,Burger Joint
28,Greater Fifth Ward,Fast Food Restaurant,Mexican Restaurant,Park,Intersection,Pharmacy,Gas Station,American Restaurant,Pizza Place
32,Greater Inwood,Fast Food Restaurant,Pharmacy,Pizza Place,Discount Store,Paper / Office Supplies Store,Automotive Shop,Sandwich Place,Spa
33,Greater Third Ward,Fast Food Restaurant,Fried Chicken Joint,Gas Station,Pizza Place,Burger Joint,Discount Store,Market,Pharmacy


## Cluster 4

In [96]:
Houston_merged.loc[Houston_merged['Cluster Labels'] == 4, Houston_merged.columns[[1] + list(range(5, Houston_merged.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
46,Langwood,Park,Lake,Auto Garage,Vacation Rental,Beach,Scenic Lookout,Paper / Office Supplies Store,Pakistani Restaurant


## Cluster 5

In [97]:
Houston_merged.loc[Houston_merged['Cluster Labels'] == 5, Houston_merged.columns[[1] + list(range(5, Houston_merged.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
5,Braeburn,Discount Store,Pharmacy,Clothing Store,Chinese Restaurant,Cosmetics Shop,Rental Car Location,Gym / Fitness Center,Pizza Place
6,Braeswood Place,Mexican Restaurant,Sandwich Place,Donut Shop,Burger Joint,Gas Station,Fried Chicken Joint,Pharmacy,Chinese Restaurant
9,Carverdale,Gas Station,BBQ Joint,Hotel,Breakfast Spot,Bowling Alley,Sandwich Place,Mexican Restaurant,Taco Place
10,Central Northwest,Park,Gas Station,Discount Store,Pizza Place,Mexican Restaurant,Trail,Salon / Barbershop,Cosmetics Shop
13,Clinton Park Tri-Community,Post Office,Lawyer,Boat or Ferry,Gas Station,Fast Food Restaurant,Outdoor Sculpture,Baseball Field,Park
14,Denver Harbor / Port Houston,Mexican Restaurant,Gas Station,Fast Food Restaurant,Sandwich Place,Fried Chicken Joint,Bank,Pizza Place,Discount Store
19,Edgebrook Area,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Sandwich Place,Bank,Discount Store,Cosmetics Shop,Chinese Restaurant
20,El Dorado / Oates Prairie,Ice Cream Shop,Mexican Restaurant,Intersection,Rental Car Location,Rental Service,Paper / Office Supplies Store,Pakistani Restaurant,Paintball Field
26,Golfcrest / Bellfort / Reveille,Pizza Place,Mexican Restaurant,Breakfast Spot,Fast Food Restaurant,Diner,Vietnamese Restaurant,Burger Joint,Mobile Phone Shop
29,Greater Greenspoint,Fast Food Restaurant,Mexican Restaurant,Sandwich Place,Pizza Place,Fried Chicken Joint,Discount Store,Hotel,Shoe Store


## Cluster 6

In [98]:
Houston_merged.loc[Houston_merged['Cluster Labels'] == 6, Houston_merged.columns[[1] + list(range(5, Houston_merged.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
39,Hidden Valley,Boat or Ferry,Park,Marine Terminal,Fast Food Restaurant,Mexican Restaurant,Other Great Outdoors,Pakistani Restaurant,Paintball Field


## Cluster 7

In [99]:
Houston_merged.loc[Houston_merged['Cluster Labels'] == 7, Houston_merged.columns[[1] + list(range(5, Houston_merged.shape[1]))]]

Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
58,Museum Park,IT Services,Discount Store,Gas Station,Clothing Store,Grocery Store,Other Repair Shop,Paper / Office Supplies Store,Pakistani Restaurant
70,South Acres / Crestmont Park,Grocery Store,Discount Store,Convenience Store,Park,Seafood Restaurant,Pet Store,Gas Station,Business Service


# Cluster Neighborhoods
#### Drop clusters with only one neighborhood (e.g. Cluster 6)

In [25]:
Houston_grouped_01 = Houston_grouped.drop([0, 1, 20, 46, 70])  # drop the outlier neighborhoods by index label
Houston_grouped_01.iloc[1]

Neighborhood Name                               Alief
ATM                                                 0
Accessories Store                                   0
Advertising Agency                                  0
Afghan Restaurant                                   0
African Restaurant                                  0
Airport                                             0
Airport Lounge                                      0
Airport Service                                     0
Airport Terminal                                    0
American Restaurant                                 0
Aquarium                                            0
Arcade                                              0
Argentinian Restaurant                              0
Art Gallery                                         0
Art Museum                                          0
Arts & Crafts Store                                 0
Asian Restaurant                            0.0217391
Athletics & Sports          
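The indices passed to `drop` were picked by reading the cluster tables. An alternative is to derive outliers from the labels themselves: count the members of each cluster and flag every row whose cluster is smaller than a chosen minimum size. The threshold and the name `small_cluster_positions` are assumptions. Applied to the 8-cluster labels printed earlier, whose positions coincide with the `Houston_grouped` index shown above, the sketch flags the single-member clusters. Its result need not match the hand-picked list:

```python
from collections import Counter


def small_cluster_positions(labels, min_size=2):
    """Return the row positions whose cluster has fewer than min_size members."""
    sizes = Counter(labels)
    return [pos for pos, label in enumerate(labels) if sizes[label] < min_size]


# k-means labels printed by the 8-cluster run above (88 neighborhoods).
labels_8 = [
    1, 0, 2, 3, 2, 5, 5, 3, 2, 5, 5, 3, 2, 5, 5, 2, 3, 3, 3, 5, 5, 2,
    2, 3, 1, 2, 5, 2, 3, 5, 2, 5, 3, 3, 2, 2, 2, 3, 2, 6, 3, 5, 5, 2,
    3, 2, 4, 5, 5, 2, 3, 5, 3, 2, 2, 2, 2, 2, 7, 2, 2, 5, 3, 5, 5, 5,
    5, 5, 3, 5, 7, 5, 5, 3, 5, 5, 5, 2, 3, 3, 2, 2, 5, 3, 2, 5, 5, 2,
]
print(small_cluster_positions(labels_8))  # [1, 39, 46]
```

Raising `min_size` to 3 would also flag the two-member clusters (1 and 7). Because positions and index labels coincide here, the returned list could be passed straight to `Houston_grouped.drop(...)`.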

In [26]:
#### Run k-means to cluster the neighborhoods into 7 clusters.

# set number of clusters
kclusters = 7

Houston_grouped_01_clustering = Houston_grouped_01.drop(columns='Neighborhood Name')  # keep only the numeric venue frequencies

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0, max_iter=1000, tol=0.01).fit(Houston_grouped_01_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([2, 5, 2, 5, 5, 5, 2, 5, 5, 6, 5, 0, 5, 2, 0, 3, 0, 5, 2, 2, 5, 3,
       2, 5, 5, 5, 5, 2, 1, 5, 0, 5, 2, 2, 5, 2, 3, 5, 0, 5, 2, 6, 5, 5,
       5, 2, 5, 1, 3, 2, 2, 2, 2, 2, 4, 2, 2, 5, 5, 1, 5, 5, 1, 1, 6, 5,
       5, 5, 6, 1, 5, 5, 2, 6, 6, 2, 2, 5, 5, 2, 5, 5, 2], dtype=int32)

In [27]:
#### Create a new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood Name'] = Houston_grouped_01['Neighborhood Name']

for ind in np.arange(Houston_grouped_01.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Houston_grouped_01.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.shape

(83, 11)

In [28]:
#### Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Houston_merged_01 = Demographic[['Location', 'Neighborhood', 'Latitude', 'Longitude']].drop([0, 1, 20, 46, 70])

# join the cluster labels and top venues onto the demographic table (which holds latitude/longitude) by neighborhood name
Houston_merged_01 = Houston_merged_01.join(neighborhoods_venues_sorted.set_index('Neighborhood Name'), on='Neighborhood')

Houston_merged_01.shape  # expect 83 rows x 15 columns: 4 location columns, the cluster label and 10 venues

(83, 15)

In [29]:
# create map
map_clusters_01 = folium.Map(location=[29.8, -95.3676974], zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add a circle marker for each neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(Houston_merged_01['Latitude'], Houston_merged_01['Longitude'], Houston_merged_01['Neighborhood'], Houston_merged_01['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters_01)

map_clusters_01

## Analyze each cluster

### Cluster 0
This cluster contains various types of restaurants along with coffee shops: Coffee Shop appears among the top 10 venues in nine of the ten neighborhoods listed below.

In [265]:
Houston_merged_01.loc[Houston_merged_01['Cluster Labels'] == 0, Houston_merged_01.columns[[1] + list(range(5, Houston_merged_01.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Afton Oaks / River Oaks Area,Furniture / Home Store,New American Restaurant,Mexican Restaurant,Seafood Restaurant,Sushi Restaurant,Café,Burger Joint,Cosmetics Shop,Coffee Shop,Italian Restaurant
15,Downtown,Hotel,Bar,Italian Restaurant,Burger Joint,Coffee Shop,Lounge,Cocktail Bar,Music Venue,New American Restaurant,Baseball Stadium
25,Fourth Ward,Burger Joint,Coffee Shop,Pizza Place,Park,Theater,Mexican Restaurant,American Restaurant,Southern / Soul Food Restaurant,Bar,Vietnamese Restaurant
30,Greater Heights,Mexican Restaurant,Juice Bar,Gift Shop,Italian Restaurant,Café,New American Restaurant,Park,Taco Place,Coffee Shop,Ice Cream Shop
35,Greenway / Upper Kirby Area,Hotel,Burger Joint,Clothing Store,Department Store,American Restaurant,Cosmetics Shop,Mexican Restaurant,Steakhouse,Men's Store,Bakery
36,Gulfgate Riverview / Pine Valley,Bakery,Seafood Restaurant,Mexican Restaurant,Burger Joint,Italian Restaurant,Coffee Shop,Chinese Restaurant,Sushi Restaurant,Pizza Place,New American Restaurant
43,Kashmere Gardens,Coffee Shop,Airport Terminal,Sandwich Place,Airport Lounge,Fast Food Restaurant,Airport,Electronics Store,Pizza Place,American Restaurant,Japanese Restaurant
53,Memorial,Zoo Exhibit,Burger Joint,Coffee Shop,Fast Food Restaurant,Restaurant,American Restaurant,Trail,Lingerie Store,Breakfast Spot,Smoothie Shop
54,Meyerland Area,Mexican Restaurant,American Restaurant,Clothing Store,Breakfast Spot,Coffee Shop,Shopping Mall,Steakhouse,Seafood Restaurant,Burger Joint,Sandwich Place
57,Minnetex,Vietnamese Restaurant,Bar,Coffee Shop,Mexican Restaurant,Breakfast Spot,Pizza Place,Wine Bar,Sandwich Place,American Restaurant,Beer Garden
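Since the study asks where a coffee shop fits, one simple way to compare clusters is to check whether, and at what rank, "Coffee Shop" appears in each neighborhood's top-venue list. A stdlib sketch using three rows copied from the Cluster 0 table above; `coffee_rank` is a hypothetical helper, and in the notebook the same check could be applied row by row over the venue columns of `Houston_merged_01`:

```python
def coffee_rank(top_venues, target="Coffee Shop"):
    """Return the 1-based rank of target in a top-venue list, or None if absent."""
    if target in top_venues:
        return top_venues.index(target) + 1
    return None


# Three rows copied from the Cluster 0 table above.
cluster0_rows = {
    "Fourth Ward": ["Burger Joint", "Coffee Shop", "Pizza Place", "Park", "Theater",
                    "Mexican Restaurant", "American Restaurant",
                    "Southern / Soul Food Restaurant", "Bar", "Vietnamese Restaurant"],
    "Kashmere Gardens": ["Coffee Shop", "Airport Terminal", "Sandwich Place", "Airport Lounge",
                         "Fast Food Restaurant", "Airport", "Electronics Store", "Pizza Place",
                         "American Restaurant", "Japanese Restaurant"],
    "Greenway / Upper Kirby Area": ["Hotel", "Burger Joint", "Clothing Store", "Department Store",
                                    "American Restaurant", "Cosmetics Shop", "Mexican Restaurant",
                                    "Steakhouse", "Men's Store", "Bakery"],
}
ranks = {name: coffee_rank(venues) for name, venues in cluster0_rows.items()}
share = sum(rank is not None for rank in ranks.values()) / len(ranks)
print(ranks)
print(f"{share:.0%} of these neighborhoods list a coffee shop in their top 10")
```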


### Cluster 1
Although this cluster consists mainly of restaurants and food services, its venue categories are quite diverse. Cluster 1 contains many neighborhoods and only part of it is shown here, so further analysis is needed: Cluster 1 was extracted and re-clustered using k-means with 7 clusters.

In [266]:
Houston_merged_01.loc[Houston_merged_01['Cluster Labels'] == 1, Houston_merged_01.columns[[1] + list(range(5, Houston_merged_01.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Astrodome Area,Mobile Phone Shop,Sandwich Place,Mexican Restaurant,Pizza Place,Sports Bar,BBQ Joint,Coffee Shop,Football Stadium,Bar,Pharmacy
5,Braeburn,Discount Store,Pharmacy,Cosmetics Shop,Sandwich Place,Bank,Bar,Rental Car Location,Pizza Place,Chinese Restaurant,Clothing Store
6,Braeswood Place,Fast Food Restaurant,Mexican Restaurant,Chinese Restaurant,Pizza Place,Bakery,Fried Chicken Joint,Donut Shop,Pharmacy,Coffee Shop,Sandwich Place
8,Briarforest Area,Sandwich Place,Grocery Store,Chinese Restaurant,Mexican Restaurant,Pharmacy,Pizza Place,Coffee Shop,Bakery,Hotel,Burger Joint
9,Carverdale,Gas Station,BBQ Joint,Hotel,Sandwich Place,Burger Joint,Convenience Store,Taco Place,Bank,Café,Furniture / Home Store
10,Central Northwest,Park,Gas Station,Mexican Restaurant,Pizza Place,Spa,Trail,Business Service,Bar,Convenience Store,Bank
12,Clear Lake,Pizza Place,Bank,Coffee Shop,Italian Restaurant,Salon / Barbershop,Sandwich Place,Intersection,Spa,Gas Station,Liquor Store
21,Eldridge / West Oaks,Sandwich Place,Coffee Shop,Mexican Restaurant,Burger Joint,Fast Food Restaurant,French Restaurant,Vietnamese Restaurant,Bakery,Bank,Thai Restaurant
22,Fairbanks / Northwest Crossing,Seafood Restaurant,Mexican Restaurant,Fast Food Restaurant,Bakery,Miscellaneous Shop,BBQ Joint,Mobile Phone Shop,Vietnamese Restaurant,Burger Joint,Cosmetics Shop
26,Golfcrest / Bellfort / Reveille,Mexican Restaurant,Pizza Place,Diner,Fast Food Restaurant,Burger Joint,Mobile Phone Shop,Breakfast Spot,Sandwich Place,Pharmacy,Bakery


### Cluster 2
This cluster has a distinctive mix of venue categories, including a park, a zoo exhibit and several stores.

In [267]:
Houston_merged_01.loc[Houston_merged_01['Cluster Labels'] == 2, Houston_merged_01.columns[[1] + list(range(5, Houston_merged_01.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
58,Museum Park,Clothing Store,Park,Discount Store,Gas Station,Zoo Exhibit,Duty-free Shop,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Service


### Cluster 3
This cluster consists mainly of stores (discount stores in particular) and fast food restaurants.

In [268]:
Houston_merged_01.loc[Houston_merged_01['Cluster Labels'] == 3, Houston_merged_01.columns[[1] + list(range(5, Houston_merged_01.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,East Houston,Fast Food Restaurant,Seafood Restaurant,Discount Store,Fried Chicken Joint,Grocery Store,Mobile Phone Shop,Waste Facility,Sandwich Place,Donut Shop,Video Game Store
17,East Little York / Homestead,Discount Store,Food,Clothing Store,Brewery,Fried Chicken Joint,Spa,Gas Station,Park,Convenience Store,Pizza Place
18,Eastex - Jensen Area,Discount Store,Fast Food Restaurant,Pizza Place,Gas Station,Food Truck,Arts & Crafts Store,Brewery,Construction & Landscaping,Supermarket,Fried Chicken Joint
44,Kingwood Area,Discount Store,Fast Food Restaurant,Fried Chicken Joint,Motel,Department Store,General Travel,Auto Workshop,Chinese Restaurant,Grocery Store,Sandwich Place
78,Sunnyside,Discount Store,Fried Chicken Joint,Pizza Place,Fast Food Restaurant,Convenience Store,Chinese Restaurant,Southern / Soul Food Restaurant,Electronics Store,Shoe Store,Bank


### Cluster 4
This cluster consists mainly of fast food restaurants, fried chicken joints and donut shops.

In [269]:
Houston_merged_01.loc[Houston_merged_01['Cluster Labels'] == 4, Houston_merged_01.columns[[1] + list(range(5, Houston_merged_01.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Central Southwest,Fried Chicken Joint,Fast Food Restaurant,Gas Station,Nightclub,Home Service,Convenience Store,Discount Store,Intersection,Shop & Service,Video Store
13,Clinton Park Tri-Community,Boat or Ferry,Fast Food Restaurant,Donut Shop,Discount Store,Sandwich Place,Lawyer,Gas Station,BBQ Joint,Filipino Restaurant,Financial or Legal Service
33,Greater Third Ward,Fried Chicken Joint,Fast Food Restaurant,Gas Station,Pizza Place,Grocery Store,Burger Joint,Market,Donut Shop,Sandwich Place,BBQ Joint
41,Iah / Airport Area,Fast Food Restaurant,Donut Shop,Gym / Fitness Center,Home Service,Rental Car Location,Fried Chicken Joint,Intersection,Gas Station,Sandwich Place,Dry Cleaner
68,Settegast,Bus Stop,Fast Food Restaurant,Discount Store,Fried Chicken Joint,Convenience Store,Burger Joint,Park,Gym / Fitness Center,Sandwich Place,Video Store
73,South Park,Gas Station,Burger Joint,Fast Food Restaurant,Fried Chicken Joint,Discount Store,Salon / Barbershop,Cosmetics Shop,Baseball Field,Pizza Place,Pharmacy
79,Trinity / Houston Gardens,Discount Store,Convenience Store,Gas Station,Fried Chicken Joint,BBQ Joint,Fast Food Restaurant,Pharmacy,Seafood Restaurant,Burger Joint,Sandwich Place


### Cluster 5
This cluster has a distinctive mix of venue categories, including a food truck, a BBQ joint, a park and several stores.

In [270]:
Houston_merged_01.loc[Houston_merged_01['Cluster Labels'] == 5, Houston_merged_01.columns[[1] + list(range(5, Houston_merged_01.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
24,Fort Bend / Houston,Food Truck,BBQ Joint,Convenience Store,Discount Store,Park,Moving Target,Gastropub,Hotel,Himalayan Restaurant,Event Service


### Cluster 6
This cluster consists mainly of fast food restaurants, Mexican restaurants and pizza places, along with a few stores and parks.

In [271]:
Houston_merged_01.loc[Houston_merged_01['Cluster Labels'] == 6, Houston_merged_01.columns[[1] + list(range(5, Houston_merged_01.shape[1]))]]


Index,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Alief,Fast Food Restaurant,Pizza Place,Discount Store,Vietnamese Restaurant,Grocery Store,Gas Station,Taco Place,Video Store,Sandwich Place,Donut Shop
7,Brays Oaks,Discount Store,Grocery Store,Fast Food Restaurant,Pizza Place,Bank,Sandwich Place,Pharmacy,Fried Chicken Joint,Women's Store,Gas Station
14,Denver Harbor / Port Houston,Gas Station,Mexican Restaurant,Fast Food Restaurant,Pizza Place,Sandwich Place,Bank,Pharmacy,Discount Store,Fried Chicken Joint,Park
19,Edgebrook Area,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Sandwich Place,Seafood Restaurant,Bank,Discount Store,Grocery Store,Rental Car Location,Video Store
23,Fondren Gardens,Fast Food Restaurant,Gas Station,Discount Store,Park,Sandwich Place,Fried Chicken Joint,Thrift / Vintage Store,Bakery,Tex-Mex Restaurant,Bar
28,Greater Fifth Ward,Mexican Restaurant,Fast Food Restaurant,Park,Intersection,Discount Store,Gas Station,Pharmacy,Pizza Place,Pub,Sandwich Place
31,Greater Hobby Area,Mexican Restaurant,Fast Food Restaurant,Food Truck,Burger Joint,Gas Station,Discount Store,Airport Terminal,Video Store,Business Service,Grocery Store
32,Greater Inwood,Fast Food Restaurant,Pizza Place,Pharmacy,Video Store,Discount Store,Trail,Grocery Store,Mexican Restaurant,Taco Place,Seafood Restaurant
37,Gulfton,Fast Food Restaurant,Convenience Store,Mexican Restaurant,Discount Store,Furniture / Home Store,Taco Place,Hardware Store,Bakery,Seafood Restaurant,Motorcycle Shop
39,Hidden Valley,Park,Harbor / Marina,Boat or Ferry,Home Service,Discount Store,Pizza Place,Mexican Restaurant,Fast Food Restaurant,Gas Station,Event Space


### Re-cluster Cluster 1 because it contains too many neighborhoods

In [30]:
cluster1 = Houston_merged_01.loc[Houston_merged_01['Cluster Labels'] == 1]
len(cluster1)
cluster1.index


Int64Index([31, 51, 63, 66, 67, 74], dtype='int64')

In [31]:
# rows of Houston_grouped (by position) that fell into Cluster 1
Houston_grouped_02 = Houston_grouped.iloc[[4, 5, 6, 8, 9, 10, 12, 21, 22, 26, 27, 29, 34, 38, 42, 45, 47, 49, 55, 56, 63, 64, 65, 66, 67, 69, 71, 72, 74, 75, 76, 82, 84, 85, 86], :]
Houston_grouped_02.iloc[1]


Neighborhood Name                           Braeburn
ATM                                                0
Accessories Store                                  0
Advertising Agency                                 0
Afghan Restaurant                                  0
African Restaurant                                 0
Airport                                            0
Airport Lounge                                     0
Airport Service                                    0
Airport Terminal                                   0
American Restaurant                           0.0125
Aquarium                                           0
Arcade                                             0
Argentinian Restaurant                             0
Art Gallery                                        0
Art Museum                                         0
Arts & Crafts Store                                0
Asian Restaurant                              0.0125
Athletics & Sports                            

In [32]:
Houston_grouped_02.shape

(35, 335)
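The positional index list above has to be kept in sync with the first-round cluster labels by hand, and it is repeated verbatim when `Houston_merged_02` is built. A minimal sketch of selecting the same kind of subset by label instead, on toy data: the frame and column names mirror the notebook's, but the values are made up.

```python
import pandas as pd

# Hypothetical stand-ins: first-round labels and a venue-frequency table
labels = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster Labels": [1, 0, 1, 2],
})
grouped = pd.DataFrame({
    "Neighborhood Name": ["A", "B", "C", "D"],
    "Coffee Shop": [0.10, 0.00, 0.05, 0.20],
    "Gas Station": [0.00, 0.30, 0.10, 0.00],
})

# Names of the neighborhoods assigned to cluster 1 in the first round
cluster1_names = labels.loc[labels["Cluster Labels"] == 1, "Neighborhood"]

# Keep only those rows of the frequency table, matched by name
# rather than by hard-coded positional indices
subset = grouped[grouped["Neighborhood Name"].isin(cluster1_names)]
print(subset["Neighborhood Name"].tolist())  # ['A', 'C']
```

Selecting by name also keeps working if the row order of the grouped table changes between runs.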

In [34]:
#### Run k-means to cluster the neighborhoods into 7 clusters.

# set number of clusters
kclusters = 7

# drop the name column; only the numeric venue frequencies are clustered
Houston_grouped_02_clustering = Houston_grouped_02.drop(columns='Neighborhood Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Houston_grouped_02_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([2, 0, 5, 2, 3, 0, 6, 2, 2, 5, 2, 5, 2, 2, 5, 5, 3, 1, 2, 2, 1, 0,
       5, 4, 1, 0, 5, 5, 1, 5, 5, 3, 2, 5, 5], dtype=int32)
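For readers new to this step: each neighborhood is a vector of venue-category frequencies, and k-means groups neighborhoods whose vectors are close. A small sketch on a made-up table (three obvious venue mixes, so three clusters) shows the same drop-then-fit pattern; the data and `n_clusters=3` are illustrative assumptions, not the Houston values.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy venue-frequency table (hypothetical values): one row per neighborhood,
# one column per category share
freq = pd.DataFrame({
    "Neighborhood Name": ["A", "B", "C", "D", "E", "F"],
    "Coffee Shop": [0.30, 0.28, 0.00, 0.02, 0.00, 0.01],
    "Gas Station": [0.00, 0.02, 0.40, 0.35, 0.05, 0.00],
    "Pizza Place": [0.05, 0.04, 0.05, 0.06, 0.50, 0.45],
})

# Only the numeric frequencies are clustered
features = freq.drop(columns="Neighborhood Name")

# random_state fixes the initialization so labels are reproducible;
# n_init is set explicitly because its default changed across scikit-learn versions
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(features)

# Neighborhoods with similar venue mixes share a label
print(dict(zip(freq["Neighborhood Name"], kmeans.labels_)))
```

The label numbers themselves are arbitrary; only which rows share a label is meaningful.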

In [35]:
#### Create a new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... fall back to the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood Name'] = Houston_grouped_02['Neighborhood Name']

for ind in np.arange(Houston_grouped_02.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Houston_grouped_02.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.shape

(35, 11)
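`return_most_common_venues` is defined in an earlier cell, not in this section. The sketch below is a hypothetical re-implementation of what it most likely does, assuming the first entry of each row is the neighborhood name: sort the remaining frequencies in descending order and return the top category names. It also shows an equivalent way to build the ordinal column headers without `try`/`except`.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Hypothetical re-implementation: skip the name label, sort the
    # frequencies from highest to lowest, return the top category names
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

def ordinal(n):
    # 1st, 2nd, 3rd, then "th" (correct for n <= 20, enough for a top 10)
    return "{}{}".format(n, {1: "st", 2: "nd", 3: "rd"}.get(n, "th"))

row = pd.Series({"Neighborhood Name": "A", "Bar": 0.1,
                 "Coffee Shop": 0.5, "Park": 0.2, "Bank": 0.0})
print(list(return_most_common_venues(row, 3)))  # ['Coffee Shop', 'Park', 'Bar']

columns = ["Neighborhood Name"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], columns[10])  # 1st Most Common Venue 10th Most Common Venue
```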

In [36]:
#### Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Houston_merged_02 = Demographic[['Location', 'Neighborhood', 'Latitude', 'Longitude']].iloc[[4, 5, 6, 8, 9, 10, 12, 21, 22, 26, 27, 29, 34, 38, 42, 45, 47,49, 55, 56, 63, 64, 65, 66, 67, 69, 71, 72, 74, 75, 76, 82, 84, 85,86], :]

# join the sorted venues onto the coordinate table by neighborhood name (latitude/longitude come from Demographic)
Houston_merged_02 = Houston_merged_02.join(neighborhoods_venues_sorted.set_index('Neighborhood Name'), on='Neighborhood')

Houston_merged_02.shape # check the last columns!

(35, 15)
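`DataFrame.join(..., on='Neighborhood')` is a left join: every row of the coordinate table is kept, and a neighborhood whose name has no exact match in the venue table gets `NaN` in the added columns. A toy sketch of that behavior and of the clean-up needed before the label can index a color list (hypothetical data):

```python
import pandas as pd

# Hypothetical toy frames mirroring the notebook's shapes
coords = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Latitude": [29.70, 29.75, 29.80],
    "Longitude": [-95.40, -95.35, -95.30],
})
venues_sorted = pd.DataFrame({
    "Cluster Labels": [2, 0],
    "Neighborhood Name": ["A", "B"],
    "1st Most Common Venue": ["Coffee Shop", "Gas Station"],
})

# Left join keyed on the neighborhood name
merged = coords.join(venues_sorted.set_index("Neighborhood Name"), on="Neighborhood")
print(merged)  # row "C" has NaN in the joined columns

# Drop unmatched rows and restore integer labels (NaN had made them float)
merged = merged.dropna(subset=["Cluster Labels"]).copy()
merged["Cluster Labels"] = merged["Cluster Labels"].astype(int)
print(merged["Cluster Labels"].tolist())  # [2, 0]
```

Because matching is by exact string, a spelling difference between the two sources shows up as a `NaN` row, which is an easy way to spot it.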

In [41]:
# create map
map_clusters_02 = folium.Map(location=[29.8, -95.3676974], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Houston_merged_02['Latitude'], Houston_merged_02['Longitude'], Houston_merged_02['Neighborhood'], Houston_merged_02['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_02)
       
map_clusters_02

### Cluster 1-0
This cluster mainly has gas stations and Mexican restaurants.

In [38]:
Houston_merged_02.loc[Houston_merged_02['Cluster Labels'] == 0, Houston_merged_02.columns[[1] + list(range(5, Houston_merged_02.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Braeburn,Discount Store,Pharmacy,Cosmetics Shop,Sandwich Place,Bank,Bar,Rental Car Location,Pizza Place,Chinese Restaurant,Clothing Store
10,Central Northwest,Park,Gas Station,Mexican Restaurant,Pizza Place,Spa,Trail,Business Service,Bar,Convenience Store,Bank
64,Park Place,Gas Station,Discount Store,Mexican Restaurant,Golf Course,Home Service,Moving Target,Baseball Field,Pizza Place,Mobile Phone Shop,Breakfast Spot
69,Sharpstown,Fried Chicken Joint,Pizza Place,Sandwich Place,Pharmacy,Fast Food Restaurant,Restaurant,Convenience Store,Gym / Fitness Center,College Gym,Korean Restaurant


### Cluster 1-1
This cluster mainly has Mexican restaurants, gas stations and hotels.

In [349]:
Houston_merged_02.loc[Houston_merged_02['Cluster Labels'] == 1, Houston_merged_02.columns[[1] + list(range(5, Houston_merged_02.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
49,Macgregor,Mexican Restaurant,Gas Station,Sandwich Place,Hotel,Italian Restaurant,Park,Wine Bar,Video Store,Burger Joint,Baseball Field
63,Ost / South Union,Mexican Restaurant,Discount Store,Hotel,Fried Chicken Joint,Park,Grocery Store,Gas Station,Video Store,Sandwich Place,Food Truck
67,Second Ward,Mexican Restaurant,Fast Food Restaurant,Park,Gas Station,Grocery Store,Snack Place,Bar,Video Store,Food Truck,BBQ Joint
74,Spring Branch Central,Mexican Restaurant,Pharmacy,Grocery Store,Sandwich Place,Burger Joint,Gym / Fitness Center,Discount Store,Taco Place,Video Store,Vietnamese Restaurant


### Cluster 1-2
This cluster mainly has fast food restaurants, sandwich places and coffee shops.

In [350]:
Houston_merged_02.loc[Houston_merged_02['Cluster Labels'] == 2, Houston_merged_02.columns[[1] + list(range(5, Houston_merged_02.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Astrodome Area,Mobile Phone Shop,Sandwich Place,Mexican Restaurant,Pizza Place,Sports Bar,BBQ Joint,Coffee Shop,Football Stadium,Bar,Pharmacy
8,Briarforest Area,Sandwich Place,Grocery Store,Chinese Restaurant,Mexican Restaurant,Pharmacy,Pizza Place,Coffee Shop,Bakery,Hotel,Burger Joint
21,Eldridge / West Oaks,Sandwich Place,Coffee Shop,Mexican Restaurant,Burger Joint,Fast Food Restaurant,French Restaurant,Vietnamese Restaurant,Bakery,Bank,Thai Restaurant
22,Fairbanks / Northwest Crossing,Seafood Restaurant,Mexican Restaurant,Fast Food Restaurant,Bakery,Miscellaneous Shop,BBQ Joint,Mobile Phone Shop,Vietnamese Restaurant,Burger Joint,Cosmetics Shop
27,Greater Eastwood,Fast Food Restaurant,Coffee Shop,Bar,Food Truck,Pizza Place,Sandwich Place,Taco Place,Gas Station,Mexican Restaurant,Discount Store
34,Greater Uptown,Fast Food Restaurant,Coffee Shop,Sandwich Place,Pizza Place,Bar,Bagel Shop,Bakery,Fried Chicken Joint,Food Truck,Smoothie Shop
38,Harrisburg / Manchester,Fast Food Restaurant,Indian Restaurant,Coffee Shop,Mexican Restaurant,Burger Joint,Fried Chicken Joint,Pizza Place,Taco Place,Grocery Store,Restaurant
55,Mid West,Department Store,Pizza Place,Furniture / Home Store,Fast Food Restaurant,Clothing Store,Coffee Shop,Mexican Restaurant,Burger Joint,Grocery Store,Gas Station
56,Midtown,Middle Eastern Restaurant,Mexican Restaurant,Food Truck,BBQ Joint,Sushi Restaurant,Japanese Restaurant,Thai Restaurant,Cosmetics Shop,Donut Shop,Sporting Goods Shop
84,Westchase,Sandwich Place,Coffee Shop,Grocery Store,Hotel,Pharmacy,Bakery,Breakfast Spot,Seafood Restaurant,Burger Joint,Mexican Restaurant


### Cluster 1-3
This cluster mainly has gas stations, hotels and BBQ joints.

In [351]:
Houston_merged_02.loc[Houston_merged_02['Cluster Labels'] == 3, Houston_merged_02.columns[[1] + list(range(5, Houston_merged_02.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Carverdale,Gas Station,BBQ Joint,Hotel,Sandwich Place,Burger Joint,Convenience Store,Taco Place,Bank,Café,Furniture / Home Store
47,Lawndale / Wayside,Hotel,Fast Food Restaurant,Pizza Place,Sandwich Place,Mexican Restaurant,Gas Station,Bank,Rental Car Location,Donut Shop,Grocery Store
82,Westbranch,Gas Station,Hotel,Café,BBQ Joint,Burger Joint,Fast Food Restaurant,Taco Place,Video Store,Fried Chicken Joint,Donut Shop


### Cluster 1-4
This single-neighborhood cluster has a distinctive venue distribution, led by Mexican restaurants, sandwich places and gas stations.

In [352]:
Houston_merged_02.loc[Houston_merged_02['Cluster Labels'] == 4, Houston_merged_02.columns[[1] + list(range(5, Houston_merged_02.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
66,Pleasantville Area,Mexican Restaurant,Sandwich Place,Gas Station,Rental Service,Athletics & Sports,Electronics Store,Burger Joint,Bank,Fried Chicken Joint,Home Service


### Cluster 1-5
This cluster mainly has fast food restaurants, Mexican restaurants and pizza places.

In [353]:
Houston_merged_02.loc[Houston_merged_02['Cluster Labels'] == 5, Houston_merged_02.columns[[1] + list(range(5, Houston_merged_02.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Braeswood Place,Fast Food Restaurant,Mexican Restaurant,Chinese Restaurant,Pizza Place,Bakery,Fried Chicken Joint,Donut Shop,Pharmacy,Coffee Shop,Sandwich Place
26,Golfcrest / Bellfort / Reveille,Mexican Restaurant,Pizza Place,Diner,Fast Food Restaurant,Burger Joint,Mobile Phone Shop,Breakfast Spot,Sandwich Place,Pharmacy,Bakery
29,Greater Greenspoint,Fast Food Restaurant,Pizza Place,Mexican Restaurant,Discount Store,Sandwich Place,Hotel,Shoe Store,Bank,Burger Joint,Seafood Restaurant
42,Independence Heights,Mobile Phone Shop,Discount Store,Gas Station,Fried Chicken Joint,Mexican Restaurant,Pizza Place,Coffee Shop,Bank,BBQ Joint,Fast Food Restaurant
45,Lake Houston,Fast Food Restaurant,Pizza Place,Park,Coffee Shop,American Restaurant,Bank,Gym,Shipping Store,Chinese Restaurant,Grocery Store
65,Pecan Park,Mexican Restaurant,Pizza Place,Discount Store,Fried Chicken Joint,Gas Station,Mobile Phone Shop,Diner,Shoe Store,Motorcycle Shop,Sandwich Place
71,South Belt / Ellington,Fast Food Restaurant,Vietnamese Restaurant,Pizza Place,Mexican Restaurant,Bank,Grocery Store,Pharmacy,Video Store,Burger Joint,Cajun / Creole Restaurant
72,South Main,Mexican Restaurant,Fast Food Restaurant,Fried Chicken Joint,Sports Bar,Sandwich Place,Bar,BBQ Joint,Football Stadium,Southern / Soul Food Restaurant,Restaurant
75,Spring Branch East,Fast Food Restaurant,Mexican Restaurant,Bank,Pizza Place,Sandwich Place,Gas Station,Rental Car Location,Chinese Restaurant,Breakfast Spot,Vietnamese Restaurant
76,Spring Branch North,Mexican Restaurant,Pizza Place,Fast Food Restaurant,Park,Video Store,Baseball Field,Café,Sandwich Place,Vietnamese Restaurant,Fried Chicken Joint


### Cluster 1-6
This single-neighborhood cluster has a distinctive venue distribution, led by pizza places, banks and coffee shops.

In [354]:
Houston_merged_02.loc[Houston_merged_02['Cluster Labels'] == 6, Houston_merged_02.columns[[1] + list(range(5, Houston_merged_02.shape[1]))]]


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Clear Lake,Pizza Place,Bank,Coffee Shop,Italian Restaurant,Salon / Barbershop,Sandwich Place,Intersection,Spa,Gas Station,Liquor Store
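The one-line cluster descriptions above are read off the tables by eye. A sketch of a quick tally that could back them up: count how often each category appears among the top three venues of the neighborhoods in each cluster. The toy table is hypothetical, and the top-3 cutoff is an arbitrary choice.

```python
import pandas as pd

# Hypothetical toy version of Houston_merged_02
merged = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 0, 1, 1],
    "1st Most Common Venue": ["Gas Station", "Mexican Restaurant", "Pizza Place", "Pizza Place"],
    "2nd Most Common Venue": ["Mexican Restaurant", "Gas Station", "Bank", "Coffee Shop"],
    "3rd Most Common Venue": ["Park", "Discount Store", "Coffee Shop", "Bank"],
})

top_cols = ["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue"]

# For each cluster, flatten the top-3 columns into one column and count categories
summary = {
    label: group[top_cols].melt()["value"].value_counts()
    for label, group in merged.groupby("Cluster Labels")
}
for label, counts in summary.items():
    print(label, counts.head(3).to_dict())
```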


# Discussion 

The ideal location for a coffee shop would be a neighborhood with few coffee shops or close substitutes, a large population and a high household income.

Based on the neighborhood clustering analysis:

- Cluster 0 already has many restaurants as well as coffee shops and cafés, so it is not considered further at this point.
- Cluster 2 has a distinctive, service-oriented venue distribution. Although it has few coffee shops, people probably visit it for specific destinations rather than to spend time over coffee.
- Clusters 3 and 1-0 have restaurants and stores, which suggests that people spend their daily lives in these neighborhoods, yet coffee shops are not common there. These clusters are therefore potential candidates.
- Cluster 4 has mainly fast food, fried chicken and donuts. These are quick meals, which may suggest that people in these neighborhoods prefer something fast over slow-brewed coffee, so Cluster 4 may not be an ideal place.
- Clusters 5 and 1-4 each have a distinctive venue distribution with no coffee shop among the common venues, so they are possible candidates.
- Clusters 6 and 1-5 have mainly fast food restaurants and pizza places, which again suggests a preference for quick meals, so they are not ideal places for a coffee shop.
- Clusters 1-1 and 1-3 have many hotels, which may indicate that many people there are travelers who do not linger in the neighborhood, so they are not ideal places for a coffee shop.
- Cluster 1-2 already has a number of coffee shops, which makes it a less likely place to open another one.
- Cluster 1-6 has a distinctive venue distribution in which coffee shops are already common, so it is also a less likely place to open one.
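The competition criterion above can be made explicit by measuring, for each cluster, the share of neighborhoods that list a coffee shop or a close substitute among their top venues. A minimal sketch on toy data; which categories count as "coffee-like" is an assumption.

```python
import pandas as pd

# Hypothetical toy table: cluster label plus top-venue columns
merged = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 1],
    "1st Most Common Venue": ["Coffee Shop", "Bank", "Gas Station", "Park", "Pizza Place"],
    "2nd Most Common Venue": ["Bar", "Café", "Hotel", "Gas Station", "Bank"],
})

# Categories treated as a coffee shop or close substitute (an assumption)
COFFEE_LIKE = ["Coffee Shop", "Café"]

venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]

# True for each neighborhood listing a coffee-like venue among its top venues
has_coffee = merged[venue_cols].isin(COFFEE_LIKE).any(axis=1)

# Share of neighborhoods per cluster with such a venue;
# lower values point to less direct competition
coffee_share = has_coffee.groupby(merged["Cluster Labels"]).mean()
print(coffee_share)
```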

### Generate map

In [42]:
# create map of Houston City using latitude and longitude values
latitude=29.83
longitude = -95.2676974
map_houston_01 = folium.Map(location=[latitude, longitude], zoom_start=10)

# generate choropleth map 
houston_geo = r'houston_geo.json'
folium.Choropleth(
    geo_data=houston_geo,
    data=Demographic,
    columns=['Neighborhood', 'Persons per Square Mile'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Persons per Square Mile'
).add_to(map_houston_01)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Houston_merged_01['Latitude'], Houston_merged_01['Longitude'], Houston_merged_01['Neighborhood'], Houston_merged_01['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_houston_01)
    

# display map
map_houston_01
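The choropleth colors an area only when `feature.properties.name` in the GeoJSON exactly matches a value in the `Neighborhood` column of `Demographic`; unmatched polygons get folium's missing-data fill instead of a value color. A quick consistency check, sketched on a hypothetical in-memory fragment (with the real file, the `geojson` dict would come from `json.load` on `houston_geo.json`, assuming it is a FeatureCollection with a `name` property, as the `key_on` value implies):

```python
import pandas as pd

# Hypothetical GeoJSON fragment with the structure key_on points at
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Midtown"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Second Ward"}, "geometry": None},
    ],
}
demographic = pd.DataFrame({
    "Neighborhood": ["Midtown", "Second ward", "Gulfton"],
    "Persons per Square Mile": [9000, 7000, 20000],  # made-up values
})

geo_names = {f["properties"]["name"] for f in geojson["features"]}
table_names = set(demographic["Neighborhood"])

# Table rows with no polygon are not drawn at all
missing_polygon = sorted(table_names - geo_names)
# Polygons with no table row get the missing-data fill
missing_data = sorted(geo_names - table_names)
print("no polygon:", missing_polygon)  # no polygon: ['Gulfton', 'Second ward']
print("no data:", missing_data)        # no data: ['Second Ward']
```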

In [43]:
# create map of Houston City using latitude and longitude values
latitude=29.83
longitude = -95.2676974
map_houston_01 = folium.Map(location=[latitude, longitude], zoom_start=10)

# generate choropleth map 
houston_geo = r'houston_geo.json'
folium.Choropleth(
    geo_data=houston_geo,
    data=Demographic,
    columns=['Neighborhood', 'Median Household Income (USD)'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Median Household Income (USD)'
).add_to(map_houston_01)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Houston_merged_01['Latitude'], Houston_merged_01['Longitude'], Houston_merged_01['Neighborhood'], Houston_merged_01['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_houston_01)
    

# display map
map_houston_01

In [44]:
# create map of Houston City using latitude and longitude values
latitude=29.83
longitude = -95.2676974
map_houston_02 = folium.Map(location=[latitude, longitude], zoom_start=10)

# generate choropleth map 
houston_geo = r'houston_geo.json'
folium.Choropleth(
    geo_data=houston_geo,
    data=Demographic,
    columns=['Neighborhood', 'Persons per Square Mile'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Persons per Square Mile'
).add_to(map_houston_02)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Houston_merged_02['Latitude'], Houston_merged_02['Longitude'], Houston_merged_02['Neighborhood'], Houston_merged_02['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_houston_02)
    

# display map
map_houston_02

In [45]:
# create map of Houston City using latitude and longitude values
latitude=29.83
longitude = -95.2676974
map_houston_02 = folium.Map(location=[latitude, longitude], zoom_start=10)

# generate choropleth map 
houston_geo = r'houston_geo.json'
folium.Choropleth(
    geo_data=houston_geo,
    data=Demographic,
    columns=['Neighborhood', 'Median Household Income (USD)'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Median Household Income (USD)'
).add_to(map_houston_02)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Houston_merged_02['Latitude'], Houston_merged_02['Longitude'], Houston_merged_02['Neighborhood'], Houston_merged_02['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_houston_02)
    

# display map
map_houston_02

Based on the discussion above, the candidate neighborhoods for a coffee shop are in Clusters 3, 5, 1-0 and 1-4. Of these, only Northside Village in Cluster 5 has relatively high population density and household income, so this neighborhood is recommended for running a coffee shop.
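The final shortlisting step, reading the maps for candidates that score relatively high on both population density and household income, can be written as a simple filter. A sketch on hypothetical numbers (not the 2017 demographic values); using the candidates' median as the cutoff is an assumption, and a city-wide median would be an equally reasonable choice.

```python
import pandas as pd

# Hypothetical candidate table; values are illustrative only
candidates = pd.DataFrame({
    "Neighborhood": ["N1", "N2", "N3", "N4"],
    "Cluster": ["3", "5", "1-0", "1-4"],
    "Persons per Square Mile": [3000, 9000, 4000, 8000],
    "Median Household Income (USD)": [40000, 65000, 70000, 30000],
})

dens = candidates["Persons per Square Mile"]
inc = candidates["Median Household Income (USD)"]

# Keep candidates above the median on BOTH measures
mask = (dens > dens.median()) & (inc > inc.median())
shortlist = candidates.loc[mask, "Neighborhood"].tolist()
print(shortlist)  # ['N2']
```

Requiring both conditions mirrors the text's "relatively high population density and household income"; ranking by a weighted score would be an alternative if no neighborhood cleared both cutoffs.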

# Conclusion

Based on the available data, the venue categories in each neighborhood were analyzed and clustered. Comparing the clusters with population density and household income adds a more insightful spatial dimension to the business problem. This study used the question of where to open a coffee shop in Houston as an example of this kind of analysis. If more data (e.g., crime rates) becomes available, a more detailed analysis could be done.