# Capstone Project – The Battle of Neighborhoods 

## Finding a Better Place in Louisville, Kentucky to Open a Restaurant

## 1. Introduction:
The purpose of this project is to help new entrepreneurs choose a better location for opening a business in the city of Louisville, in the state of Kentucky, USA.

Before starting a business, it is necessary to analyze the features and facilities that the different neighborhoods of a city offer. This project is for people who are looking for safe, well-suited neighborhoods in which to open a commercial establishment. Specifically, this report is targeted at stakeholders interested in opening a food and beverage business in Louisville, USA.


### 1.1 Background

Louisville, in Jefferson County, is the largest city in the Commonwealth of Kentucky and the 29th most-populous city in the United States. It is one of two cities in Kentucky designated as first-class. It is a significant manufacturing center, with two major Ford Motor Company plants and the headquarters and major home appliance factory of GE Appliances. It is also a major center of the American whiskey industry, with about one third of all bourbon whiskey coming from Louisville. The city prides itself on its large assortment of small, independent businesses and restaurants, some of which have become known for their ingenuity and creativity. It offers many opportunities to invest, since the purchasing power of its citizens is favorable.


### 1.2 Business Problem

The main objective of this project is to apply machine learning tools to analyze the neighborhoods of the selected city (Louisville, Kentucky) and suggest a better place to invest in a Tex-Mex restaurant.

To solve this business problem, we are going to cluster Louisville neighborhoods in order to recommend places with favorable characteristics for investment, such as easy access, security, and commercial activity.

### 1.3 Target Audience
<ul>
    <li>Business entrepreneurs who want to open a new restaurant in Louisville, Kentucky.</li>
    <li>Anyone curious about data who wants an idea of how beneficial it is to open a restaurant and what the pros and cons are.</li>
    <li>Business analysts or data scientists who wish to analyze the neighborhoods of Louisville using machine learning techniques.</li>
</ul>    

## 2. Data Section

Based on the definition of the problem, the factors and tools that may be relevant to our decision are:
<ul>
    <li>Demographic information, e.g. population, density, education, age, income</li>
    <li>Louisville data containing neighborhoods and boroughs, latitudes, and longitudes, extracted from <a href="https://en.wikipedia.org/wiki/Neighborhoods_in_Louisville,_Kentucky">Wikipedia</a> and <a href="https://geodatos.net/">Geodatos</a>. This data needs to be cleaned and organized before use.</li>
    <li>Restaurants, shopping malls in every neighborhood</li>
    <li>Coordinates of Louisville, obtained from https://geodatos.net</li>
    <li><strong>Geopy</strong>, to get the geographic location from an address name</li>
    <li><strong>Foursquare API</strong>, to get the venue records and the most common venues of each borough of Louisville</li>
    <li><strong>Folium:</strong> this library is used to show the Louisville boundary on a map</li>
    <li>Other Python libraries may be needed during the project to prepare the data for better results.</li>
 </ul>   

## 3. Methodology Section
The purpose of this section is to develop the main component of the report, where we discuss and describe the exploratory data analysis and the machine learning techniques that were used.
<ul style="list-style-type:none;">
   <li>3.1 Download and Explore Dataset</li> 
   <li>3.2 Explore Neighborhoods in Louisville Ky</li>
   <li>3.3 Analyze Each Neighborhood</li>
   <li>3.4 Cluster Neighborhoods</li>
</ul>
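
Step 3.4 relies on k-means clustering. As a rough illustration of what that algorithm does, here is a plain-Python sketch of Lloyd's algorithm. It is not the scikit-learn `KMeans` class the notebook imports; the function name `kmeans_sketch`, the fixed initialization from the first `k` points, and the toy feature vectors are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(points, k, iterations=10):
    """Toy Lloyd's algorithm; the first k points seed the centers (assumption)."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical venue-frequency vectors: (share of restaurants, share of parks).
toy_neighborhoods = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels, centers = kmeans_sketch(toy_neighborhoods, k=2)
print(labels)
```

Neighborhoods with similar venue mixes end up with the same label, which is the property the clustering step uses to group areas of the city.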

Before getting the data and starting to explore it, it is necessary to install and import all the required packages.

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
get_ipython().system("pip install geopy")
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# install geojsonio package
get_ipython().system('pip install geojsonio')
import geojsonio

#install geopandas
get_ipython().system('pip install geopandas')
import geopandas as gpd
# Matplotlib and associated plotting modules
# use the inline backend to generate the plots within the browser
%matplotlib inline 
import matplotlib.cm as cm
import matplotlib.colors as colors


# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#!pip install folium
import folium # map rendering library


print('Libraries imported.')



### 3.1 Download and Explore Dataset 
Fortunately, the data can be collected from LOJIC, the Louisville open geospatial data project: <a href='https://data.lojic.org/datasets/louisville-ky-urban-neighborhoods?geometry=-86.627%2C37.979%2C-84.834%2C38.357'>Louisville (LOJIC) Open GeoSpatial Data</a>


In [25]:
louisville_df = gpd.read_file('https://opendata.arcgis.com/datasets/6e3dea8bd9cf49e6a764f7baa9141a95_30.geojson')
louisville_df.head()


Unnamed: 0,OBJECTID,NH_CODE,NH_NAME,SHAPEAREA,SHAPELEN,geometry
0,1,99,REMAINDER OF CITY,25583330.0,23718.32369,"POLYGON ((-85.70282 38.27046, -85.70268 38.270..."
1,2,53,PORTLAND,70098300.0,43851.334658,"POLYGON ((-85.82022 38.27686, -85.81992 38.276..."
2,3,62,SHAWNEE,59953850.0,34088.057055,"POLYGON ((-85.82022 38.27686, -85.82028 38.276..."
3,4,12,BROWNSBORO ZORN,21977320.0,28060.141437,"POLYGON ((-85.70282 38.27046, -85.70278 38.270..."
4,5,23,CLIFTON HEIGHTS,17858640.0,19464.933453,"POLYGON ((-85.71501 38.26404, -85.71394 38.263..."


#### Data Preparation and Preprocessing

In [27]:
louisville_N = louisville_df[['NH_CODE', 'NH_NAME', 'geometry']].copy()
louisville_N.columns = ('Id', 'Neighborhood', 'Geometry')
louisville_N['Neighborhood'] = louisville_N['Neighborhood'].str.title()
louisville_N.tail()

Unnamed: 0,Id,Neighborhood,Geometry
87,40,Iroquois,"POLYGON ((-85.78123 38.16980, -85.78121 38.169..."
88,66,Southland Park,"POLYGON ((-85.75183 38.17000, -85.75189 38.169..."
89,43,Kenwood Hill,"POLYGON ((-85.77918 38.16163, -85.77915 38.161..."
90,2,Auburndale,"POLYGON ((-85.79126 38.15293, -85.79130 38.152..."
91,99,Remainder Of City,"POLYGON ((-85.77631 38.15065, -85.77630 38.150..."


In [29]:
# Extract the latitude and longitude of each neighborhood centroid
from shapely.wkt import loads as load_wkt

id_list = []

for polygon in louisville_df["geometry"]:
    # Round-trip through WKT to obtain a shapely geometry
    p1 = load_wkt(str(polygon))
    point = p1.centroid
    # Store as (latitude, longitude): y is latitude, x is longitude
    id_list.append((point.y, point.x))

lat_centr, lon_centr = zip(*id_list)

louisville_N['Latitude'] = lat_centr
louisville_N['Longitude'] = lon_centr
louisville_N.head()

Unnamed: 0,Id,Neighborhood,Geometry,Latitude,Longitude
0,99,Remainder Of City,"POLYGON ((-85.70282 38.27046, -85.70268 38.270...",38.272222,-85.708712
1,53,Portland,"POLYGON ((-85.82022 38.27686, -85.81992 38.276...",38.268303,-85.792998
2,62,Shawnee,"POLYGON ((-85.82022 38.27686, -85.82028 38.276...",38.261047,-85.818672
3,12,Brownsboro Zorn,"POLYGON ((-85.70282 38.27046, -85.70278 38.270...",38.26663,-85.688639
4,23,Clifton Heights,"POLYGON ((-85.71501 38.26404, -85.71394 38.263...",38.263401,-85.704026
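
For a polygon, shapely's `centroid` is the area-weighted center of the shape, not the average of its vertices. The sketch below shows that calculation for a simple ring with no holes, using the shoelace formula. The function name `polygon_centroid` and the square are made up for illustration, and the calculation treats longitude and latitude as planar coordinates, which is an approximation.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = accumulated sum / (6 * area), where area = twice_area / 2.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical 2 x 2 square; x plays the role of longitude, y of latitude.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
lon, lat = polygon_centroid(square)
print(lat, lon)
```

As in the notebook cell, the `y` coordinate is read as latitude and the `x` coordinate as longitude.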


### 3.2 Explore Neighborhoods in Louisville KY
To explore the data, I will use the Folium Python library to create a map of Louisville.

In [31]:
address = 'Louisville, KY'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Louisville are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Louisville are 38.2542376, -85.759407.


It is important to create a visualization to better understand the area 

In [34]:
louisville_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# add a marker for each neighborhood centroid to the map
for lat, lng, nh_id, neighborhood in zip(louisville_N['Latitude'], 
                                         louisville_N['Longitude'], 
                                         louisville_N['Id'], 
                                         louisville_N['Neighborhood']):
    label = '{}, {}'.format(neighborhood, nh_id)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(louisville_map)  
    
louisville_map

Another map, this time built from the GeoJSON data:

In [36]:
lmap = folium.Map([latitude, longitude], zoom_start=12)

folium.GeoJson(louisville_df,
    style_function=lambda x: {
        'color' : 'red',
        'opacity': 0.6,
        'fillColor' : 'green',
        }).add_to(lmap)

lmap

Next, I am going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [47]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (keep it private)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues to request per call


#### Now I explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [48]:
louisville_N.loc[0, 'Neighborhood']

'Remainder Of City'

Get the neighborhood's latitude and longitude values.

In [49]:
neighborhood_latitude = louisville_N.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = louisville_N.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = louisville_N.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Remainder Of City are 38.27222193306387, -85.7087118431899.


#### Now, let's get the venues in each neighborhood using a function.

In [50]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
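
The list comprehension above assumes every venue has at least one category; `v['venue']['categories'][0]` raises `IndexError` otherwise. Below is a minimal defensive sketch, run on a hand-written payload that mimics the nesting the function indexes into (`response`, then `groups`, then `items`). The helper name `extract_venue_rows`, the `'Unknown'` fallback, and the second venue are assumptions for illustration, not Foursquare output.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style payload into tuples, tolerating missing categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder when the venue has no category at all.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Hand-written sample: the first venue reuses values from this notebook's
# output, the second is hypothetical and has an empty category list.
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Tumbleweed Southwest Grill",
                           "location": {"lat": 38.27273, "lng": -85.708468},
                           "categories": [{"name": "American Restaurant"}]}},
                {"venue": {"name": "Uncategorized Spot",
                           "location": {"lat": 38.2700, "lng": -85.7100},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venue_rows("Remainder Of City", 38.272222, -85.708712, sample_payload)
for row in rows:
    print(row)
```

An empty or unexpected payload yields an empty list instead of raising, so one bad response does not stop the loop over neighborhoods.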

#### Now I run the above function on each neighborhood and create a new dataframe called *louisville_venues*.

In [53]:

louisville_venues = getNearbyVenues(names=louisville_N['Neighborhood'],
                                   latitudes=louisville_N['Latitude'],
                                   longitudes=louisville_N['Longitude']
                                  )

Remainder Of City
Portland
Shawnee
Brownsboro Zorn
Clifton Heights
Butchertown
Crescent Hill
Central Business District
Russell
Clifton
Irish Hill
Remainder Of City
Phoenix Hill
Rockcreek Lexington Road
Chickasaw
Parkland
California
Cherokee Triangle
Remainder Of City
Cherokee Gardens
Smoketown Jackson
Old Louisville
Cherokee Seneca
Paristown Pointe
Highlands
Limerick
Bowman
Remainder Of City
Germantown
Park Hill
Tyler Park
Shelby Park
Bonnycastle
Park Duvalle
Remainder Of City
Hikes Point
Highlands Douglass
Deer Park
Algonquin
 
Schnitzelburg
Merriwether
Remainder Of City
 
University
Hallmark
Poplar Level
Avondale Melbourne Heights
 
Belknap
 
Hawthorne
Saint Joseph
Taylor Berry
 
Remainder Of City
Bon Air
Audubon
 
Gardiner Lane
Hayfield Dundee
South Louisville
 
Klondike
 
Bashford Manor
Remainder Of City
Fairgrounds
Camp Taylor
Remainder Of City
Wilder Park
Prestonia
Wyandotte/Oakdale
Jacobs
Remainder Of City
Highland Park
Remainder Of City
Remainder Of City
Beechmont
Prestonia
Haz

In [54]:
print(louisville_venues.shape)
louisville_venues.head()

(655, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Remainder Of City,38.272222,-85.708712,Munson Business Interiors,38.273943,-85.708248,Home Service
1,Remainder Of City,38.272222,-85.708712,Tumbleweed Southwest Grill,38.27273,-85.708468,American Restaurant
2,Remainder Of City,38.272222,-85.708712,Champion Soccer Park,38.269385,-85.708292,Soccer Field
3,Remainder Of City,38.272222,-85.708712,Heuser Clinic,38.270264,-85.713112,Gym
4,Remainder Of City,38.272222,-85.708712,ProFormance Fitness & Training,38.270307,-85.713204,Gym
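
The `radius=500` argument is a distance in meters around each neighborhood centroid. As a sanity check, the great-circle distance between a centroid and a returned venue can be estimated with the haversine formula. The sketch below uses the first neighborhood's centroid and the Tumbleweed Southwest Grill coordinates from the table above. The function name `haversine_m` and the mean Earth radius of 6,371,000 m are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Centroid of the first neighborhood vs. one venue returned for it.
distance = haversine_m(38.272222, -85.708712, 38.27273, -85.708468)
print(round(distance), distance <= 500)
```

The venue falls well inside the 500 m search radius, as expected for a result returned for that centroid.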


Now I check how many venues were returned for each neighborhood.

In [55]:
louisville_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
,46,46,46,46,46,46
Algonquin,4,4,4,4,4,4
Auburndale,3,3,3,3,3,3
Audubon,1,1,1,1,1,1
Avondale Melbourne Heights,2,2,2,2,2,2
Beechmont,6,6,6,6,6,6
Belknap,4,4,4,4,4,4
Bon Air,2,2,2,2,2,2
Bonnycastle,4,4,4,4,4,4
Bowman,5,5,5,5,5,5


#### Let's find out how many unique categories can be curated from all the returned venues

In [56]:
print('There are {} unique categories.'.format(len(louisville_venues['Venue Category'].unique())))

There are 169 unique categories.


### 3.3 Analyze Each Neighborhood in Louisville, Kentucky

In [57]:
# one hot encoding
louisville_onehot = pd.get_dummies(louisville_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
louisville_onehot['Neighborhood'] = louisville_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [louisville_onehot.columns[-1]] + list(louisville_onehot.columns[:-1])
louisville_onehot = louisville_onehot[fixed_columns]

louisville_onehot.head()

Unnamed: 0,Neighborhood,ATM,Advertising Agency,African Restaurant,Airport,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Garden,Big Box Store,Bike Rental / Bike Share,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Cemetery,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Construction & Landscaping,Convenience Store,Cosmetics Shop,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Event Service,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Historic Site,History Museum,Home Service,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Insurance Office,Italian Restaurant,Japanese Restaurant,Jewelry Store,Karaoke Bar,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,New American Restaurant,Nightclub,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Piano Bar,Pizza Place,Playground,Plaza,Pool,Print Shop,Pub,Racetrack,Record Shop,Rental Car Location,Restaurant,Road,Rock Climbing Spot,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Club,Steakhouse,Storage Facility,Supermarket,Sushi Restaurant,Tennis Court,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Water Park,Whisky Bar,Women's Store,Yoga Studio
0,Remainder Of City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Remainder Of City,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Remainder Of City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Remainder Of City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Remainder Of City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [58]:
louisville_onehot.shape

(655, 170)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [59]:
louisville_grouped = louisville_onehot.groupby('Neighborhood').mean().reset_index()
louisville_grouped

Unnamed: 0,Neighborhood,ATM,Advertising Agency,African Restaurant,Airport,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Garden,Big Box Store,Bike Rental / Bike Share,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Cemetery,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Construction & Landscaping,Convenience Store,Cosmetics Shop,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dry Cleaner,Event Service,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Historic Site,History Museum,Home Service,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Insurance Office,Italian Restaurant,Japanese Restaurant,Jewelry Store,Karaoke Bar,Lake,Latin American Restaurant,Lingerie Store,Liquor Store,Lounge,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Multiplex,Museum,Music Store,Music Venue,Nature Preserve,New American Restaurant,Nightclub,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Piano Bar,Pizza Place,Playground,Plaza,Pool,Print Shop,Pub,Racetrack,Record Shop,Rental Car Location,Restaurant,Road,Rock Climbing Spot,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Club,Steakhouse,Storage Facility,Supermarket,Sushi Restaurant,Tennis Court,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Water Park,Whisky Bar,Women's Store,Yoga Studio
0,,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.021739,0.0,0.021739,0.021739,0.0,0.0,0.0,0.021739,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.021739,0.021739,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065217,0.0,0.108696,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Algonquin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Auburndale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Audubon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Avondale Melbourne Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Beechmont,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0
6,Belknap,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bon Air,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bonnycastle,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bowman,0.0,0.0,0.0,0.4,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
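
Each value in this table is the share of a neighborhood's venues that fall in a given category, because the mean of a 0/1 column equals the number of ones divided by the number of rows. Here is a small standard-library check using Algonquin, whose four returned venues split into two fast food restaurants, one convenience store, and one auto garage according to the frequencies above (the variable names are illustrative):

```python
from collections import Counter

# Categories of the four venues returned for Algonquin, reconstructed
# from the grouped frequencies (0.5, 0.25, 0.25) and the venue count (4).
algonquin_categories = [
    "Fast Food Restaurant",
    "Fast Food Restaurant",
    "Convenience Store",
    "Auto Garage",
]

counts = Counter(algonquin_categories)
total = len(algonquin_categories)

# Share of venues per category; matches the mean of the one-hot columns.
frequencies = {category: n / total for category, n in counts.items()}
print(frequencies)
```

Neighborhoods with very few venues therefore produce coarse frequencies such as 0.5 or 1.0, which is worth keeping in mind when comparing them with better-covered areas.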


#### Let's confirm the new size

In [60]:
louisville_grouped.shape

(65, 170)

Now, I print each neighborhood along with the top 5 most common venues

In [61]:
num_top_venues = 5

for hood in louisville_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = louisville_grouped[louisville_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- ----
               venue  freq
0        Pizza Place  0.11
1           Pharmacy  0.07
2               Bank  0.04
3     Ice Cream Shop  0.04
4  Convenience Store  0.04


----Algonquin----
                  venue  freq
0  Fast Food Restaurant  0.50
1     Convenience Store  0.25
2           Auto Garage  0.25
3            Restaurant  0.00
4           Pizza Place  0.00


----Auburndale----
                           venue  freq
0                        Daycare  0.33
1     Construction & Landscaping  0.33
2                           Café  0.33
3                           Pool  0.00
4  Paper / Office Supplies Store  0.00


----Audubon----
                     venue  freq
0                     Park   1.0
1                      ATM   0.0
2               Playground   0.0
3  New American Restaurant   0.0
4                Nightclub   0.0


----Avondale Melbourne Heights----
               venue  freq
0         Donut Shop   0.5
1  Convenience Store   0.5
2                ATM   0.0
3         Pl

#### I put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [62]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
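
As a quick check of how this helper behaves, here is a minimal sketch on a toy table. The neighborhoods `A` and `B` and their frequencies are invented for illustration and are not part of the Louisville data. It shows that categories with a frequency of 0.0 still fill the remaining slots, which is likely why a neighborhood such as Audubon (Park 1.0, everything else 0.0) lists unrelated categories after its first venue.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the rest
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical frequencies, invented for illustration only
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bar': [0.5, 0.0],
    'Park': [0.2, 1.0],
    'Pizza Place': [0.3, 0.0],
})

top_a = return_most_common_venues(toy.iloc[0, :], 2)
top_b = return_most_common_venues(toy.iloc[1, :], 2)

print(list(top_a))  # ranked by frequency
print(top_b[0])     # the second slot for B is a zero-frequency tie
```

For neighborhood `B`, only the first entry is meaningful; the order of the tied zero-frequency categories should not be interpreted.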

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [63]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every ordinal uses the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = louisville_grouped['Neighborhood']

for ind in np.arange(louisville_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(louisville_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,,Pizza Place,Pharmacy,Sandwich Place,Ice Cream Shop,Bank,Deli / Bodega,Convenience Store,Bridal Shop,Business Service,Café
1,Algonquin,Fast Food Restaurant,Auto Garage,Convenience Store,Daycare,Deli / Bodega,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner
2,Auburndale,Daycare,Café,Construction & Landscaping,Yoga Studio,Doctor's Office,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner
3,Audubon,Park,Yoga Studio,Doctor's Office,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run
4,Avondale Melbourne Heights,Convenience Store,Donut Shop,Yoga Studio,Dive Bar,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Dog Run


### 3.4 Cluster Neighborhoods

To analyze which Louisville neighborhoods are suitable for opening a new restaurant, I use k-means clustering, an unsupervised learning method for unlabeled data (i.e., data without predefined categories or groups). The algorithm looks for groups in the data, with the number of groups given by the variable K. It works iteratively: each data point is assigned to the nearest of K cluster centers based on the provided features, and the centers are then recomputed from their assigned points until the assignments stabilize. Data points are thus clustered by feature similarity.
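
The iterative assign-and-update idea can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions (random initial centers, a fixed number of iterations, made-up 2-D points). It is not the scikit-learn implementation used in this notebook, which by default uses a different initialization and a convergence test. The function name `kmeans_sketch` is hypothetical.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=10, seed=0):
    """Plain assign/update loop; a teaching sketch, not a production k-means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Made-up points forming two well-separated groups
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = kmeans_sketch(X, k=2)
print(labels)
```

The numeric label given to each group is arbitrary; only the grouping itself carries meaning, which also applies to the cluster numbers reported below.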

In [74]:
# set number of clusters
kclusters = 5

louisville_grouped_clustering = louisville_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(louisville_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 3, 0, 3, 3, 3, 2, 3, 3], dtype=int32)

Next I create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [84]:
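# add clustering labels to the sorted-venues table (skipped if already present)
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

louisville_merged = louisville_N

# join the sorted venues (with cluster labels) onto louisville_N, which holds latitude/longitude for each neighborhood
louisville_merged = louisville_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')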

louisville_merged.head() # check the last columns!

Unnamed: 0,Id,Neighborhood,Geometry,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,99,Remainder Of City,"POLYGON ((-85.70282 38.27046, -85.70268 38.270...",38.272222,-85.708712,3.0,Pizza Place,Sandwich Place,Fast Food Restaurant,Bank,Pharmacy,Bar,Furniture / Home Store,Gas Station,Automotive Shop,Chinese Restaurant
1,53,Portland,"POLYGON ((-85.82022 38.27686, -85.81992 38.276...",38.268303,-85.792998,1.0,Pizza Place,Clothing Store,Bank,Grocery Store,Dive Bar,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop
2,62,Shawnee,"POLYGON ((-85.82022 38.27686, -85.82028 38.276...",38.261047,-85.818672,3.0,Sports Club,Sandwich Place,Food,Restaurant,Cosmetics Shop,Doctor's Office,Farm,Falafel Restaurant,Event Service,Dry Cleaner
3,12,Brownsboro Zorn,"POLYGON ((-85.70282 38.27046, -85.70278 38.270...",38.26663,-85.688639,3.0,Gym / Fitness Center,Yoga Studio,Discount Store,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run,Doctor's Office
4,23,Clifton Heights,"POLYGON ((-85.71501 38.26404, -85.71394 38.263...",38.263401,-85.704026,3.0,Convenience Store,Bar,Video Store,Art Gallery,Doctor's Office,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner
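
The cluster labels above are shown as floats (3.0, 1.0). A likely explanation is that `join` keeps every row of the left table, so neighborhoods with no match in the venues table receive `NaN`, which forces the column to a float type. The sketch below reproduces this on hypothetical stand-in tables (`all_hoods` and `clustered` are invented names and data) and shows one possible way to restore integer labels.

```python
import pandas as pd

# Hypothetical stand-ins for louisville_N and neighborhoods_venues_sorted
all_hoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
clustered = pd.DataFrame({
    'Neighborhood': ['A', 'C'],
    'Cluster Labels': [3, 1],
})

# Left join: 'B' has no match, so its label becomes NaN
merged = all_hoods.join(clustered.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].tolist())

# One option: keep only the clustered neighborhoods and cast back to int
cleaned = merged.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```

Whether to drop the unmatched neighborhoods or keep them as "not clustered" is a design choice; dropping them simplifies any later step that uses the label as an integer index.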


Finally, let's visualize the resulting clusters

In [91]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(louisville_merged['Latitude'], louisville_merged['Longitude'], louisville_merged['Neighborhood'], louisville_merged['Cluster Labels']):
    if pd.isna(cluster):
        continue  # neighborhood has no cluster label, so it is not drawn
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],       # one color per cluster
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 4. Examine Clusters

Now I examine each cluster and determine the venue categories that distinguish it from the others. Based on these defining categories, each cluster can then be characterized.
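
One way to make the "defining categories" concrete is to average the venue frequencies within each cluster and look at the largest values. The sketch below does this on a hypothetical frequency table (the neighborhoods, labels, and numbers are invented); applying it to the real data would assume that the cluster labels have been attached to `louisville_grouped`, which this notebook does not do explicitly.

```python
import pandas as pd

# Hypothetical frequency table with cluster labels already attached
freq = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [0, 0, 1, 1],
    'Park': [1.0, 0.8, 0.0, 0.1],
    'Bar': [0.0, 0.1, 0.7, 0.5],
    'Pizza Place': [0.0, 0.1, 0.3, 0.4],
})

# Mean frequency of each category per cluster (equivalent to the k-means centroid)
cluster_means = freq.drop(columns='Neighborhood').groupby('Cluster Labels').mean()

# Category with the highest mean frequency in each cluster
top_per_cluster = cluster_means.idxmax(axis=1)
print(top_per_cluster)
```

Unlike the per-neighborhood rankings, these averages are not padded with zero-frequency categories, so they can give a cleaner picture of what sets a cluster apart.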

#### Cluster 1

In [92]:
louisville_merged.loc[louisville_merged['Cluster Labels'] == 0, louisville_merged.columns[[1] + list(range(5, louisville_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,Audubon,0.0,Park,Yoga Studio,Doctor's Office,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run
63,Klondike,0.0,Park,Construction & Landscaping,Yoga Studio,Doctor's Office,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop
89,Kenwood Hill,0.0,Construction & Landscaping,Historic Site,Discount Store,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run,Doctor's Office


#### Cluster 2

In [93]:
louisville_merged.loc[louisville_merged['Cluster Labels'] == 1, louisville_merged.columns[[1] + list(range(5, louisville_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Portland,1.0,Pizza Place,Clothing Store,Bank,Grocery Store,Dive Bar,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop
8,Russell,1.0,Pizza Place,Liquor Store,Dive Bar,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run,Doctor's Office
35,Hikes Point,1.0,Pizza Place,Ice Cream Shop,Bank,Discount Store,Dive Bar,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop
41,Merriwether,1.0,Pizza Place,Gas Station,Convenience Store,Karaoke Bar,Toy / Game Store,Gastropub,Ice Cream Shop,Department Store,Dive Bar,Falafel Restaurant
51,Hawthorne,1.0,Sushi Restaurant,Pizza Place,Yoga Studio,Dive Bar,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run
85,Edgewood,1.0,Ice Cream Shop,Yoga Studio,Dive Bar,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run


#### Cluster 3

In [94]:
louisville_merged.loc[louisville_merged['Cluster Labels'] == 2, louisville_merged.columns[[1] + list(range(5, louisville_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,Germantown,2.0,Bar,Coffee Shop,Thrift / Vintage Store,Record Shop,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run
56,Bon Air,2.0,Bar,Discount Store,Yoga Studio,Doctor's Office,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop
68,Camp Taylor,2.0,Bar,Bank,Yoga Studio,Fish & Chips Shop,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop


#### Cluster 4

In [95]:
louisville_merged.loc[louisville_merged['Cluster Labels'] == 3, louisville_merged.columns[[1] + list(range(5, louisville_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Remainder Of City,3.0,Pizza Place,Sandwich Place,Fast Food Restaurant,Bank,Pharmacy,Bar,Furniture / Home Store,Gas Station,Automotive Shop,Chinese Restaurant
2,Shawnee,3.0,Sports Club,Sandwich Place,Food,Restaurant,Cosmetics Shop,Doctor's Office,Farm,Falafel Restaurant,Event Service,Dry Cleaner
3,Brownsboro Zorn,3.0,Gym / Fitness Center,Yoga Studio,Discount Store,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run,Doctor's Office
4,Clifton Heights,3.0,Convenience Store,Bar,Video Store,Art Gallery,Doctor's Office,Farmers Market,Farm,Falafel Restaurant,Event Service,Dry Cleaner
5,Butchertown,3.0,Park,Gay Bar,New American Restaurant,Food Truck,Shopping Mall,Soccer Stadium,Speakeasy,Athletics & Sports,Dessert Shop,Pizza Place
6,Crescent Hill,3.0,Boutique,Bakery,Lake,Music Store,Yoga Studio,Dog Run,Farmers Market,Farm,Falafel Restaurant,Event Service
7,Central Business District,3.0,Hotel,American Restaurant,Coffee Shop,Speakeasy,Restaurant,Sandwich Place,Mexican Restaurant,Pizza Place,Fast Food Restaurant,Whisky Bar
9,Clifton,3.0,Arts & Crafts Store,Asian Restaurant,Bakery,Sushi Restaurant,Sporting Goods Shop,New American Restaurant,Mediterranean Restaurant,Italian Restaurant,Gastropub,Sandwich Place
10,Irish Hill,3.0,Coffee Shop,American Restaurant,Park,Bagel Shop,Gastropub,Gas Station,Spa,Department Store,Farm,Falafel Restaurant
11,Remainder Of City,3.0,Pizza Place,Sandwich Place,Fast Food Restaurant,Bank,Pharmacy,Bar,Furniture / Home Store,Gas Station,Automotive Shop,Chinese Restaurant


#### Cluster 5

In [96]:
louisville_merged.loc[louisville_merged['Cluster Labels'] == 4, louisville_merged.columns[[1] + list(range(5, louisville_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
60,Hayfield Dundee,4.0,Skating Rink,Yoga Studio,Farm,Falafel Restaurant,Event Service,Dry Cleaner,Donut Shop,Dog Run,Doctor's Office,Dive Bar


## 5. Conclusion

Because this analysis was performed on a small dataset, better results could be achieved with richer neighborhood information. Even so, Louisville is a city with many different types of restaurant businesses. In this project we went through the process of identifying the business problem, specifying the data required, cleaning the datasets, applying a machine learning algorithm (k-means clustering), and providing some useful guidance to our stakeholders.