## Access to retail and home-goods in Detroit, MI
#### A final project using skills learned in the IBM Data Science Professional Certificate program on Coursera.

<b>Defining the issue and stakeholders:</b>  For 18 years I worked in the non-profit sector in Detroit, MI while living on the near eastside in the city proper.

One of the issues that some non-profit groups in Detroit address is the presence of food deserts.  The Wikipedia page for ‘food desert’ (https://en.wikipedia.org/wiki/Food_desert) provides an excellent and detailed description of this concept, and its opening paragraph offers a succinct definition:  <i>“A food desert is an area that has limited access to affordable and nutritious food, in contrast with an area with higher access to supermarkets or vegetable shops with fresh foods, which is called a food oasis.”</i>

Food deserts have already received a large amount of attention, research, and data mapping, so I will not examine them here.  Instead, I use the food desert topic as a bridge to a similar issue that has received less attention:  the lack of access to other retail.

In my case, I owned a 100-year-old home that required quite a few supplies for maintenance and upkeep.  I am also a gardener, and this hobby required supplies as well.  In both cases local options were minimal, and a typical supply trip involved driving at least 16 miles round trip and at least 45 minutes, depending on traffic conditions.

With this experience as the motivator, the purpose of this project is to map location-based access to hardware, home goods, and big-box retail shops that provide needed supplies for home maintenance and daily living.  After the initial mapping, I use the k-means method taught in this course to group similar neighborhoods.

The intended outcome is an initial assessment of where non-profits may need to explore ways to provide these supplies to under-served communities.

<b> Description of the Desired Data and Data Collection</b>:  Following the basic model developed in this course, I use the Foursquare API to retrieve neighborhood-specific data for the desired venue categories as defined in the Foursquare Venue Category Hierarchy (https://developer.foursquare.com/docs/build-with-foursquare/categories/).  The API request uses the 'explore?' endpoint, as in the request function later in this notebook, and includes the 'categoryId=' option.
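As a minimal sketch of how such a request URL can be assembled, the snippet below builds the query string with the standard library instead of string formatting. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the parameter names are the ones used in the request URL later in this notebook.

```python
from urllib.parse import urlencode

# Endpoint used by the request function later in this notebook.
BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, category_ids, lat, lng,
                      radius=500, limit=100, version="20200424"):
    """Hypothetical helper: return an explore URL restricted to the given category ids."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "categoryId": ",".join(category_ids),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma separators readable instead of encoding them as %2C
    return BASE_URL + "?" + urlencode(params, safe=",")


example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",          # placeholders, not real credentials
    ["4bf58dd8d48988d112951735", "4eb1c0253b7b52c0e1adc2e9"],  # Hardware Store, Garden Center
    42.351567, -83.061879,                            # Midtown coordinates from the neighborhood csv
)
print(example_url)
```

Building the parameters as a dictionary makes it harder to misplace an argument than a long positional `format` call, though either approach produces an equivalent request.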

<b>Collecting geo-coded information:</b>
1. The city of Detroit has identified neighborhoods and provides an online map. https://detroitmi.gov/webapp/interactive-district-map
2. Using this map, I cross-referenced two other websites: https://www.zipmap.net/Michigan/Wayne_County/ and https://www.latlong.net/ . Together, these two sites provide a zip code map of Detroit and the latitude and longitude of any location where a pin is dropped.
3. The result of step 2 is a csv file of zipcode, neighborhood name, latitude and longitude. https://docs.google.com/spreadsheets/d/e/2PACX-1vQnWbnmwrC1bM1MLRu3SgFc_9UJZzBuo0lZV34YvkeZGuPhdrDb_3AP-BDp3d-6-lSKAu0m5rJaSTH6/pub?gid=590577091&single=true&output=csv
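The steps above produce a four-column csv. A minimal sketch of loading and sanity-checking such a file is shown here, using three rows copied from the dataset preview in this notebook; the in-memory `sample_csv` stands in for the published file, and the checks themselves are an assumption about what is worth verifying before any API calls.

```python
import io

import pandas as pd

# Sample rows copied from the neighborhood dataset preview in this notebook.
sample_csv = io.StringIO(
    "zipcode,neighborhood,latitude,longitude\n"
    "48201,Brewster-Douglas,42.347507,-83.048334\n"
    "48201,Brush Park,42.343733,-83.052593\n"
    "48201,Jeffries,42.345604,-83.070811\n"
)

# Read zip codes as text so that leading zeros (not an issue in Detroit) would survive.
neighborhoods = pd.read_csv(sample_csv, dtype={"zipcode": str})

# Basic sanity checks: all expected columns present, coordinates in a plausible range.
expected_columns = ["zipcode", "neighborhood", "latitude", "longitude"]
missing = [c for c in expected_columns if c not in neighborhoods.columns]
in_range = (neighborhoods["latitude"].between(-90, 90)
            & neighborhoods["longitude"].between(-180, 180))

print(missing)
print(in_range.all())
```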

<b>Venue Categories:</b><p>
Big Box Store, 52f2ab2ebcbc57f1066b8b42 (to include Home Depot, Lowe's, and similar)\
Department Store, 4bf58dd8d48988d1f6941735 (to include Target and similar)\
Discount Store, 52dea92d3cf9994f4e043dbb (to include Dollar stores)\
Furniture / Home Store, 4bf58dd8d48988d1f8941735\
Garden Center, 4eb1c0253b7b52c0e1adc2e9\
Hardware Store, 4bf58dd8d48988d112951735\
Kitchen Supply Store, 58daa1558bbb0b01f18ec1b4\
Pharmacy, 4bf58dd8d48988d10f951735\
Supermarket, 52f2ab2ebcbc57f1066b8b46 (to include Meijer and Walmart)\
Warehouse Store, 52e816a6bcbc57f1066b7a54 (to include Costco and Sam's Club)<p>
    
<b>Outcome:</b> Produce k-means analysis and a map that clusters neighborhoods by access to the above listed categories.

In [1]:
import numpy as np
import pandas as pd
import json
import requests
import pickle
from pandas.io.json import json_normalize
from geopy.geocoders import Nominatim

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
!pip install folium
import folium # map rendering library

print('Libraries imported.')

Collecting folium
  Downloading https://files.pythonhosted.org/packages/fd/a0/ccb3094026649cda4acd55bf2c3822bb8c277eb11446d13d384e5be35257/folium-0.10.1-py2.py3-none-any.whl (91kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/81/6d/31c83485189a2521a75b4130f1fee5364f772a0375f81afff619004e5237/branca-0.4.0-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.0 folium-0.10.1
Libraries imported.


In [2]:
#Full data set
detroit='https://docs.google.com/spreadsheets/d/e/2PACX-1vTYHE9bz6bQ7t-0AHC4Fr4Gus53JvoSYsH3coJGoX5F1akJSlw8Yp6eaDlElZioS2J16aUuMA0tdqx5/pub?gid=61988180&single=true&output=csv'

### An initial look at the Detroit neighborhoods dataset.

In [3]:
detroit_data=pd.read_csv(detroit)
print(detroit_data.shape)
detroit_data.head()

(200, 4)


Unnamed: 0,zipcode,neighborhood,latitude,longitude
0,48201,Brewster-Douglas,42.347507,-83.048334
1,48201,Brush Park,42.343733,-83.052593
2,48201,Jeffries,42.345604,-83.070811
3,48201,Midtown,42.351567,-83.061879
4,48201,Medical Center,42.354643,-83.057265


### Retrieve venue data from Foursquare

#### Read the category id numbers to 'categories'.

In [5]:
categories='4bf58dd8d48988d1f6941735,4bf58dd8d48988d1f8941735,4eb1c0253b7b52c0e1adc2e9,4bf58dd8d48988d112951735,58daa1558bbb0b01f18ec1b4,52e816a6bcbc57f1066b7a54,52dea92d3cf9994f4e043dbb,52f2ab2ebcbc57f1066b8b46,52f2ab2ebcbc57f1066b8b42,4bf58dd8d48988d10f951735'
categories

'4bf58dd8d48988d1f6941735,4bf58dd8d48988d1f8941735,4eb1c0253b7b52c0e1adc2e9,4bf58dd8d48988d112951735,58daa1558bbb0b01f18ec1b4,52e816a6bcbc57f1066b7a54,52dea92d3cf9994f4e043dbb,52f2ab2ebcbc57f1066b8b46,52f2ab2ebcbc57f1066b8b42,4bf58dd8d48988d10f951735'

#### Create a function that extracts the category of the venue.

In [6]:
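def get_category_type(row):
    # the categories may sit under 'categories' or 'venue.categories',
    # depending on how the response was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category assigned
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']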

#### Set variables for the api request.

In [8]:
CLIENT_ID = 'XKKGOA14HPO11BCZG4RNS21QTYT5RTKAANXIMQU23XOTGNT2' # Foursquare ID
CLIENT_SECRET = 'XBBACRJQBXOP12GESKOLCC3HLFKNUIFINTO4D0ACO0DNPDDH' # Foursquare Secret
VERSION = '20200424' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API

#### Create a function that searches all the neighborhoods.

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&categoryId={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            categories,
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
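The function above indexes straight into `["response"]['groups'][0]['items']`, which raises an error if a response comes back without groups (for example, when a request is rejected). A hedged sketch of a more defensive parsing step follows; `extract_venue_rows` is a hypothetical helper, and the payloads are hand-built to imitate the nesting the function relies on rather than captured from the API.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Hypothetical helper: turn one explore-style payload into rows, or [] if nothing usable."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories", [])
        # fall back to None when a venue carries no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-built payloads (assumptions for illustration, not real API output).
ok_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Third Ave Hardware",
               "location": {"lat": 42.34564, "lng": -83.066366},
               "categories": [{"name": "Hardware Store"}]}}]}]}}
empty_payload = {"meta": {"code": 429}, "response": {}}

rows_ok = extract_venue_rows(ok_payload, "Jeffries", 42.345604, -83.070811)
rows_empty = extract_venue_rows(empty_payload, "Jeffries", 42.345604, -83.070811)
print(rows_ok)
print(rows_empty)
```

With a helper like this, a failed request for one neighborhood would contribute no rows instead of stopping the whole loop.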

#### Perform the search and assign the results to 'detroit_list'.

In [10]:
detroit_list = getNearbyVenues(names=detroit_data['neighborhood'],
                                   latitudes=detroit_data['latitude'],
                                   longitudes=detroit_data['longitude']
                                  )

Brewster-Douglas
Brush Park
Jeffries
Midtown
Medical Center
Arden Park
Cultural Center
Gateway
Milwaukee Junction
New Center
Piety Hill
Tech Town
Wayne State
Greenfield Park
Grixdale Farms
Hawthorne Park
Nolan
Palmer Park
Palmer Woods
Penrose
State Fair
Aviation
Barton-McFarland
Midwest
Nardin Park
Petoskey-Otsego
Russell Woods
Conner Creek
Franklin
Gratiot-Findlay
LaSalle-College Park
Maple Ridge
Mohican-Regent
Regent Park
Pulaski
Von Stueben
Atkinson
Boston-Edison
Dexter-Linwood
Henry Ford
Herman Kiefer
Jamison
LaSalle Gardens
North LaSalle
Virginia Park
West Virginia Park
Wildemere Park
Eastern Market
Elmwood Park
Forest Park
Lafayette Park
McDougall-Hunt
Rivertown
Core City
Elijah McCoy
NW Goldberg
Woodbridge
Carbon Works
Delray
Hubbard Farms
Springwells
Southwest
West Side Industrial
Chadsey-Condon
Claytown
Michigan-Martin
Southwest
Northend
Poletown East
Russell Industrial
Banglatown
Buffalo Charles
Cadillac Heights
Davison
North Campau
Chandler Park
Chalmers
Eden Gardens
Gratiot

### Work with the data.

Take an initial view of the data to confirm structure.

In [11]:
print(detroit_list.shape)
detroit_list.head()

(204, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Brush Park,42.343733,-83.052593,PharMor Pharmacy,42.345122,-83.057125,Pharmacy
1,Jeffries,42.345604,-83.070811,Family Dollar,42.341389,-83.072208,Discount Store
2,Jeffries,42.345604,-83.070811,Third Ave Hardware,42.34564,-83.066366,Hardware Store
3,Midtown,42.351567,-83.061879,Hugh,42.351198,-83.063894,Furniture / Home Store
4,Midtown,42.351567,-83.061879,Nest,42.351439,-83.065846,Furniture / Home Store


Examine the full set.

In [12]:
detroit_list

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Brush Park,42.343733,-83.052593,PharMor Pharmacy,42.345122,-83.057125,Pharmacy
1,Jeffries,42.345604,-83.070811,Family Dollar,42.341389,-83.072208,Discount Store
2,Jeffries,42.345604,-83.070811,Third Ave Hardware,42.345640,-83.066366,Hardware Store
3,Midtown,42.351567,-83.061879,Hugh,42.351198,-83.063894,Furniture / Home Store
4,Midtown,42.351567,-83.061879,Nest,42.351439,-83.065846,Furniture / Home Store
5,Midtown,42.351567,-83.061879,Walgreens,42.353604,-83.062587,Pharmacy
6,Midtown,42.351567,-83.061879,Rite Aid,42.354151,-83.062021,Pharmacy
7,Midtown,42.351567,-83.061879,Community Pharmacy,42.353604,-83.062587,Pharmacy
8,Midtown,42.351567,-83.061879,The Stove,42.354027,-83.062291,Furniture / Home Store
9,Midtown,42.351567,-83.061879,Avanced Plumbing,42.347915,-83.061558,Hardware Store


#### There are duplicates so clean the data set.

In [13]:
detroit_cleanlist=detroit_list.drop_duplicates(subset='Venue Latitude')
print(detroit_cleanlist.shape)

(185, 7)
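Dropping duplicates on 'Venue Latitude' alone also removes distinct venues that happen to share a latitude. In the listing above, Walgreens and Community Pharmacy in Midtown have identical coordinates, so only one of them survives. The sketch below compares that approach with de-duplicating on venue name plus both coordinates; the fourth row (the same Rite Aid returned for a second neighborhood) is a hypothetical example of a true duplicate.

```python
import pandas as pd

venues = pd.DataFrame(
    [
        ("Midtown", "Walgreens", 42.353604, -83.062587, "Pharmacy"),
        ("Midtown", "Community Pharmacy", 42.353604, -83.062587, "Pharmacy"),
        ("Midtown", "Rite Aid", 42.354151, -83.062021, "Pharmacy"),
        # hypothetical: the same venue returned again for a neighboring search point
        ("Medical Center", "Rite Aid", 42.354151, -83.062021, "Pharmacy"),
    ],
    columns=["Neighborhood", "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"],
)

# Approach used in the notebook: one row per distinct latitude.
by_latitude = venues.drop_duplicates(subset="Venue Latitude")

# Alternative: treat a venue as a duplicate only if name and both coordinates match.
by_identity = venues.drop_duplicates(subset=["Venue", "Venue Latitude", "Venue Longitude"])

print(len(by_latitude), len(by_identity))
```

Which rule is preferable depends on whether two listings at the same coordinates are really separate stores, which is part of what the manual review that follows has to decide.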


#### There are many inaccurate venues.

1. Displayed sections of the dataframe using detroit_cleanlist[n:n1], where n and n1 are the first and last rows of the section.\
Example:  detroit_cleanlist[0:50]
2. Copied and pasted the entire dataframe into a spreadsheet.
3. Reviewed the full set, removed the inaccurate venues, and created a new csv.
4. Uploaded the csv to Google Drive.
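As an alternative to copying and pasting, the dataframe could be written out with `to_csv` and the reviewed file read back in. The sketch below assumes a hypothetical `keep` column added during the spreadsheet review (1 to keep, 0 to drop); the flag values are made up for illustration, and the in-memory buffers stand in for real files.

```python
import io

import pandas as pd

venues = pd.DataFrame({
    "Venue": ["Third Ave Hardware", "Hugh"],
    "Venue Category": ["Hardware Store", "Furniture / Home Store"],
})

# Export for review; a file path could be used in place of the buffer.
export_buffer = io.StringIO()
venues.to_csv(export_buffer, index=False)
exported_text = export_buffer.getvalue()

# Stand-in for the reviewed spreadsheet, with a hypothetical 'keep' flag per row.
reviewed_csv = io.StringIO(
    "Venue,Venue Category,keep\n"
    "Third Ave Hardware,Hardware Store,1\n"
    "Hugh,Furniture / Home Store,0\n"
)
reviewed = pd.read_csv(reviewed_csv)

# Keep only the rows flagged during review, then drop the helper column.
cleaned = reviewed[reviewed["keep"] == 1].drop(columns="keep")
print(cleaned)
```

Recording the decision in a column, instead of deleting rows by hand, would leave a trace of which venues were judged inaccurate.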

In [37]:
detroit_cleanlist='https://docs.google.com/spreadsheets/d/e/2PACX-1vQU0PNVD-L74PpeGY0dbwceezY_08XPN9mFTIFX4CbPdvA1nZrh64gf2VFYScK8mJdJZOeAkNBUe_HV/pub?gid=1717123301&single=true&output=csv'
detroit_cleanlist=pd.read_csv(detroit_cleanlist)

In [61]:
print(detroit_cleanlist.shape)

(80, 7)


### Create a map with all Detroit neighborhoods and venues.

In [39]:
address = 'Detroit, MI'
geolocator = Nominatim(user_agent="detroit_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Detroit using latitude and longitude values
map_detroit_final = folium.Map(location=[latitude, longitude], zoom_start=11)

# add neighborhood markers to map
for neighborhood, latitude, longitude in zip(detroit_data['neighborhood'],detroit_data['latitude'], detroit_data['longitude']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_detroit_final)   

# add venue markers to map
for venue, latitude, longitude in zip(detroit_cleanlist['Venue'],detroit_cleanlist['Venue Latitude'], detroit_cleanlist['Venue Longitude']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='pink',
        fill_opacity=0.9,
        parse_html=False).add_to(map_detroit_final) 
map_detroit_final

### There are large areas with few venues so perform another run using 1km radius.

In [40]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&categoryId={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            categories,
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [42]:
detroit1km_list = getNearbyVenues(names=detroit_data['neighborhood'],
                                   latitudes=detroit_data['latitude'],
                                   longitudes=detroit_data['longitude']
                                  )

Brewster-Douglas
Brush Park
Jeffries
Midtown
Medical Center
Arden Park
Cultural Center
Gateway
Milwaukee Junction
New Center
Piety Hill
Tech Town
Wayne State
Greenfield Park
Grixdale Farms
Hawthorne Park
Nolan
Palmer Park
Palmer Woods
Penrose
State Fair
Aviation
Barton-McFarland
Midwest
Nardin Park
Petoskey-Otsego
Russell Woods
Conner Creek
Franklin
Gratiot-Findlay
LaSalle-College Park
Maple Ridge
Mohican-Regent
Regent Park
Pulaski
Von Stueben
Atkinson
Boston-Edison
Dexter-Linwood
Henry Ford
Herman Kiefer
Jamison
LaSalle Gardens
North LaSalle
Virginia Park
West Virginia Park
Wildemere Park
Eastern Market
Elmwood Park
Forest Park
Lafayette Park
McDougall-Hunt
Rivertown
Core City
Elijah McCoy
NW Goldberg
Woodbridge
Carbon Works
Delray
Hubbard Farms
Springwells
Southwest
West Side Industrial
Chadsey-Condon
Claytown
Michigan-Martin
Southwest
Northend
Poletown East
Russell Industrial
Banglatown
Buffalo Charles
Cadillac Heights
Davison
North Campau
Chandler Park
Chalmers
Eden Gardens
Gratiot

In [43]:
detroit1km_list.shape

(797, 7)

An increase of 593 venues. (797-204=593)

<h4>Remove duplicates.</h4>

In [56]:
detroit1km_cleanlist=detroit1km_list.drop_duplicates(subset='Venue Latitude')
print(detroit1km_cleanlist.shape)

(413, 7)


An increase of 228 venues. (413-185=228)

In [60]:
detroit1km_cleanlist='https://docs.google.com/spreadsheets/d/e/2PACX-1vStA22FoAy05_JCSW_BR7ffIk9TStds91axK5-UWcGFgEYFcywX6O6JyFWl9uMn6ExsOL0DgLStEqbA/pub?gid=1535023124&single=true&output=csv'
detroit1km_cleanlist=pd.read_csv(detroit1km_cleanlist)
detroit1km_cleanlist.shape

(195, 7)

After further cleaning in the spreadsheet, the new set contains 195 venues.  The original set contained 80.

### Map with the new dataset.

In [62]:
detroit_cleanlist=detroit1km_cleanlist

In [63]:
address = 'Detroit, MI'
geolocator = Nominatim(user_agent="detroit_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Detroit using latitude and longitude values
map_detroit_final = folium.Map(location=[latitude, longitude], zoom_start=11)

# add neighborhood markers to map
for neighborhood, latitude, longitude in zip(detroit_data['neighborhood'],detroit_data['latitude'], detroit_data['longitude']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_detroit_final)   

# add venue markers to map
for venue, latitude, longitude in zip(detroit_cleanlist['Venue'],detroit_cleanlist['Venue Latitude'], detroit_cleanlist['Venue Longitude']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='pink',
        fill_opacity=0.9,
        parse_html=False).add_to(map_detroit_final) 
map_detroit_final

### This map looks much better, with the empty neighborhood areas now filled with venues. Try one more round using 1.5km radius.

In [64]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&categoryId={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            categories,
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [65]:
detroit1500m_list = getNearbyVenues(names=detroit_data['neighborhood'],
                                   latitudes=detroit_data['latitude'],
                                   longitudes=detroit_data['longitude']
                                  )

Brewster-Douglas
Brush Park
Jeffries
Midtown
Medical Center
Arden Park
Cultural Center
Gateway
Milwaukee Junction
New Center
Piety Hill
Tech Town
Wayne State
Greenfield Park
Grixdale Farms
Hawthorne Park
Nolan
Palmer Park
Palmer Woods
Penrose
State Fair
Aviation
Barton-McFarland
Midwest
Nardin Park
Petoskey-Otsego
Russell Woods
Conner Creek
Franklin
Gratiot-Findlay
LaSalle-College Park
Maple Ridge
Mohican-Regent
Regent Park
Pulaski
Von Stueben
Atkinson
Boston-Edison
Dexter-Linwood
Henry Ford
Herman Kiefer
Jamison
LaSalle Gardens
North LaSalle
Virginia Park
West Virginia Park
Wildemere Park
Eastern Market
Elmwood Park
Forest Park
Lafayette Park
McDougall-Hunt
Rivertown
Core City
Elijah McCoy
NW Goldberg
Woodbridge
Carbon Works
Delray
Hubbard Farms
Springwells
Southwest
West Side Industrial
Chadsey-Condon
Claytown
Michigan-Martin
Southwest
Northend
Poletown East
Russell Industrial
Banglatown
Buffalo Charles
Cadillac Heights
Davison
North Campau
Chandler Park
Chalmers
Eden Gardens
Gratiot

In [68]:
detroit1500m_cleanlist=detroit1500m_list.drop_duplicates(subset='Venue Latitude')
print(detroit1500m_cleanlist.shape)

(513, 7)


Increasing the radius added 100 venues. (513-413=100)

In [418]:
detroit1500m_cleanlist='https://docs.google.com/spreadsheets/d/e/2PACX-1vS96jFe7KGw5_54niJQFWq8VjaO8CCyUP3qGnkZV5jIXUBkUVZIHgZibc6Lmo4H6B0jelFEr1CMLqtt/pub?gid=387629276&single=true&output=csv'
detroit1500m_cleanlist=pd.read_csv(detroit1500m_cleanlist)
detroit1500m_cleanlist.shape

(238, 7)

After cleaning, the set contains 238 venues, 43 more than the 1km set. (238-195=43)
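To keep the three runs side by side, the counts reported in the outputs above can be collected in one small table. This is only a bookkeeping sketch; the column names are my own, and the numbers are the shapes printed earlier in this notebook.

```python
import pandas as pd

# Row counts reported in the notebook outputs for each search radius.
counts = pd.DataFrame(
    {
        "radius_m": [500, 1000, 1500],
        "after_drop_duplicates": [185, 413, 513],
        "after_manual_review": [80, 195, 238],
    }
).set_index("radius_m")

# Venues gained over the previous radius, and the share that survived manual review.
counts["added_by_radius"] = counts["after_manual_review"].diff()
counts["share_kept"] = (counts["after_manual_review"]
                        / counts["after_drop_duplicates"]).round(2)
print(counts)
```

In each run a little under half of the de-duplicated rows survive the manual review, and the gain from widening the radius falls from 115 venues to 43.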

### Map the new dataset.

In [419]:
detroit_cleanlist=detroit1500m_cleanlist

In [415]:
address = 'Detroit, MI'
geolocator = Nominatim(user_agent="detroit_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Detroit using latitude and longitude values
map_detroit_final = folium.Map(location=[latitude, longitude], zoom_start=11)

# add neighborhood markers to map
for neighborhood, latitude, longitude in zip(detroit_data['neighborhood'],detroit_data['latitude'], detroit_data['longitude']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_detroit_final)   

# add venue markers to map
for venue, latitude, longitude in zip(detroit_cleanlist['venue'],detroit_cleanlist['venue latitude'], detroit_cleanlist['venue longitude']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='pink',
        fill_opacity=0.9,
        parse_html=False).add_to(map_detroit_final) 
map_detroit_final

This looks like a decent dataset.  I'm ready to move to clustering.

### Prepare data for k-means clustering.

In [421]:
# one hot encoding
detroit_onehot = pd.get_dummies(detroit_cleanlist[['venue category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
detroit_onehot.insert(0,'neighborhood',detroit_cleanlist['neighborhood'])
detroit_onehot.shape

(238, 7)

In [422]:
mean_grouped = detroit_onehot.groupby('neighborhood').mean().reset_index()
mean_grouped.shape

(92, 7)

#### The dataset has 92 discrete neighborhoods with at least one venue within 1.5km.
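To make the preparation step concrete, here is a toy version of the one-hot encoding and grouping used above. The two neighborhoods "A" and "B" are invented; `dtype=float` is passed so that the mean is taken over numeric columns regardless of pandas version.

```python
import pandas as pd

# Invented data: neighborhood A has three venues, neighborhood B has one.
toy = pd.DataFrame({
    "neighborhood": ["A", "A", "A", "B"],
    "venue category": ["Pharmacy", "Pharmacy", "Hardware Store", "Discount Store"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy[["venue category"]], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "neighborhood", toy["neighborhood"])

# The mean per neighborhood is the share of its venues in each category.
profile = onehot.groupby("neighborhood").mean().reset_index()
print(profile)
```

Because the grouped values are shares, a neighborhood with one discount store and a neighborhood with ten discount stores (and nothing else) get the same profile. The clustering therefore groups neighborhoods by the mix of nearby retail, not by the number of venues.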

### Functions to create a dataframe with neighborhoods and top 5 venues.

In [424]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
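A small usage sketch of the function above, applied to an invented row. The function is repeated so the snippet runs on its own; the row values are assumptions chosen to avoid ties.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the neighborhood name in position 0
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Invented row: neighborhood name followed by category shares.
toy_row = pd.Series({
    "neighborhood": "Example",
    "Discount Store": 0.5,
    "Hardware Store": 0.2,
    "Pharmacy": 0.3,
})

top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)
```

Categories with a share of zero are still ranked, so a neighborhood's "4th" or "5th most common venue" may be a category it does not have at all, and the order among tied values should not be read as meaningful.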

In [462]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
mean_neighborhood_venues_sorted = pd.DataFrame(columns=columns)
mean_neighborhood_venues_sorted['neighborhood'] = mean_grouped['neighborhood']

for ind in np.arange(mean_grouped.shape[0]):
    mean_neighborhood_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mean_grouped.iloc[ind, :], num_top_venues)

print(mean_neighborhood_venues_sorted.shape)
mean_neighborhood_venues_sorted.head()

(92, 6)


Unnamed: 0,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Arden Park,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
1,Aviation,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
2,Bagley,Discount Store,Pharmacy,Hardware Store,Big Box Store,Garden Center
3,Banglatown,Pharmacy,Discount Store,Hardware Store,Garden Center,Department Store
4,Belmont,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


### Perform k-means clustering with k=5

In [500]:
mean_grouped = detroit_onehot.groupby('neighborhood').mean().reset_index()
mean_grouped.shape

# set number of clusters
kclusters = 5

detroit_clusters_mean = mean_grouped.drop('neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(detroit_clusters_mean)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_)
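# drop labels left over from a previous run, if any, so the insert below does not fail
mean_neighborhood_venues_sorted=mean_neighborhood_venues_sorted.drop(['Cluster Labels'], axis=1, errors='ignore')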
mean_neighborhood_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
mean_neighborhood_venues_sorted.head()

[4 2 2 2 4 2 2 4 1 1 1 1 0 3 2 4 4 2 1 4 4 2 0 2 0 4 0 2 2 1 2 4 1 4 2 4 2
 2 4 4 0 4 1 4 4 0 4 4 2 4 2 1 1 2 4 1 0 3 1 0 1 2 4 1 2 1 2 2 1 3 4 2 4 1
 4 4 1 0 4 2 2 2 0 2 2 2 3 2 4 4 0 3]


Unnamed: 0,Cluster Labels,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,4,Arden Park,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
1,2,Aviation,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
2,2,Bagley,Discount Store,Pharmacy,Hardware Store,Big Box Store,Garden Center
3,2,Banglatown,Pharmacy,Discount Store,Hardware Store,Garden Center,Department Store
4,4,Belmont,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
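The value of k here is chosen by hand. One common way to compare candidate values is to look at the k-means inertia (the within-cluster sum of squares) for a range of k and look for the point where further clusters stop helping much. The sketch below does this on synthetic points with three obvious groups, not on the Detroit data, so it illustrates the procedure only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

# Fit k-means for several values of k and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On data like this the inertia drops sharply up to k=3 and only slightly afterwards. Real neighborhood profiles rarely show such a clean bend, so the comparison is a guide, not a rule.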


In [502]:
mean_merged = detroit_data
mean_merged = mean_merged.join(mean_neighborhood_venues_sorted.set_index('neighborhood'), on='neighborhood')
mean_merged=mean_merged.dropna()
mean_merged=mean_merged.astype({'Cluster Labels':'int32'})
print(mean_merged.shape)
mean_merged.head() # check the last columns!

(92, 10)


Unnamed: 0,zipcode,neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,48201,Brewster-Douglas,42.347507,-83.048334,1,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
1,48201,Brush Park,42.343733,-83.052593,1,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
2,48201,Jeffries,42.345604,-83.070811,2,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
3,48201,Midtown,42.351567,-83.061879,1,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
5,48202,Arden Park,42.38811,-83.07971,4,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


### Analyze the clusters.

In [499]:
detroit_meancluster01=mean_merged.loc[mean_merged['Cluster Labels'] == 0, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_01=detroit_cleanlist.merge(detroit_meancluster01,on='neighborhood')
cluster_01=cluster_01[['neighborhood','venue','venue category']]
cluster01_count=cluster_01.groupby('venue category').count()
cluster01_count=cluster01_count[['venue']]
cluster01_count

Unnamed: 0_level_0,venue
venue category,Unnamed: 1_level_1
Department Store,1
Discount Store,7
Hardware Store,11


In [430]:
detroit_meancluster02=mean_merged.loc[mean_merged['Cluster Labels'] == 1, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_02=detroit_cleanlist.merge(detroit_meancluster02,on='neighborhood')
cluster_02=cluster_02[['neighborhood','venue','venue category']]
cluster02_count=cluster_02.groupby('venue category').count()
cluster02_count=cluster02_count[['venue']]
cluster02_count

Unnamed: 0_level_0,venue
venue category,Unnamed: 1_level_1
Discount Store,1
Hardware Store,2
Pharmacy,25


In [431]:
detroit_meancluster03=mean_merged.loc[mean_merged['Cluster Labels'] == 2, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_03=detroit_cleanlist.merge(detroit_meancluster03,on='neighborhood')
cluster_03=cluster_03[['neighborhood','venue','venue category']]
cluster03_count=cluster_03.groupby('venue category').count()
cluster03_count=cluster03_count[['venue']]
cluster03_count

Unnamed: 0_level_0,venue
venue category,Unnamed: 1_level_1
Big Box Store,2
Department Store,1
Discount Store,53
Garden Center,3
Hardware Store,14
Pharmacy,50


In [432]:
detroit_meancluster04=mean_merged.loc[mean_merged['Cluster Labels'] == 3, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_04=detroit_cleanlist.merge(detroit_meancluster04,on='neighborhood')
cluster_04=cluster_04[['neighborhood','venue','venue category']]
cluster04_count=cluster_04.groupby('venue category').count()
cluster04_count=cluster04_count[['venue']]
cluster04_count

Unnamed: 0_level_0,venue
venue category,Unnamed: 1_level_1
Big Box Store,6
Department Store,1
Discount Store,2


In [433]:
detroit_meancluster05=mean_merged.loc[mean_merged['Cluster Labels'] == 4, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_05=detroit_cleanlist.merge(detroit_meancluster05,on='neighborhood')
cluster_05=cluster_05[['neighborhood','venue','venue category']]
cluster05_count=cluster_05.groupby('venue category').count()
cluster05_count=cluster05_count[['venue']]
cluster05_count

Unnamed: 0_level_0,venue
venue category,Unnamed: 1_level_1
Department Store,1
Discount Store,50
Hardware Store,1
Pharmacy,3
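The five cells above repeat the same merge and count once per cluster. A compact alternative is a single cross-tabulation of cluster label against venue category. The sketch uses invented neighborhoods and labels; in the notebook the inputs would be the cleaned venue list and the neighborhood-to-cluster assignment.

```python
import pandas as pd

# Invented venue list and cluster assignment.
venues = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "C", "C"],
    "venue category": ["Pharmacy", "Pharmacy", "Discount Store",
                       "Hardware Store", "Discount Store"],
})
labels = pd.DataFrame({
    "neighborhood": ["A", "B", "C"],
    "Cluster Labels": [1, 0, 0],
})

# Attach each venue's cluster, then count venues per cluster and category.
merged = venues.merge(labels, on="neighborhood")
cluster_counts = pd.crosstab(merged["Cluster Labels"], merged["venue category"])
print(cluster_counts)
```

A single table also makes the clusters easier to compare, since missing categories show up as explicit zeros.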


In [503]:
mean_merged05=mean_merged

### Perform clustering with k=6.

In [512]:
# set number of clusters
kclusters = 6

detroit_clusters_mean = mean_grouped.drop('neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(detroit_clusters_mean)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([2, 1, 1, 1, 2, 1, 1, 2, 4, 3, 3, 3, 0, 5, 1, 2, 2, 4, 3, 2, 2, 1,
       0, 4, 4, 2, 0, 1, 1, 3, 1, 2, 3, 2, 1, 2, 1, 1, 2, 2, 0, 2, 3, 2,
       2, 0, 2, 2, 1, 2, 1, 3, 3, 1, 2, 3, 0, 5, 3, 0, 3, 4, 2, 3, 1, 3,
       1, 1, 3, 5, 2, 1, 2, 3, 2, 2, 3, 4, 2, 1, 1, 1, 0, 1, 1, 1, 5, 4,
       2, 2, 0, 5], dtype=int32)

In [513]:
mean_neighborhood_venues_sorted=mean_neighborhood_venues_sorted.drop(['Cluster Labels'], axis=1)
mean_neighborhood_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
mean_neighborhood_venues_sorted.head()

Unnamed: 0,Cluster Labels,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,2,Arden Park,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
1,1,Aviation,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
2,1,Bagley,Discount Store,Pharmacy,Hardware Store,Big Box Store,Garden Center
3,1,Banglatown,Pharmacy,Discount Store,Hardware Store,Garden Center,Department Store
4,2,Belmont,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


In [514]:
mean_merged = detroit_data

mean_merged = mean_merged.join(mean_neighborhood_venues_sorted.set_index('neighborhood'), on='neighborhood')
mean_merged=mean_merged.dropna()
mean_merged=mean_merged.astype({'Cluster Labels':'int32'})
print(mean_merged.shape)
mean_merged.head()

(92, 10)


Unnamed: 0,zipcode,neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,48201,Brewster-Douglas,42.347507,-83.048334,4,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
1,48201,Brush Park,42.343733,-83.052593,3,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
2,48201,Jeffries,42.345604,-83.070811,1,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
3,48201,Midtown,42.351567,-83.061879,3,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
5,48202,Arden Park,42.38811,-83.07971,2,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


In [437]:
detroit_meancluster01=mean_merged.loc[mean_merged['Cluster Labels'] == 0, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_01=detroit_cleanlist.merge(detroit_meancluster01,on='neighborhood')
cluster_01=cluster_01[['neighborhood','venue','venue category']]
cluster01_count=cluster_01.groupby('venue category').count()
cluster01_count=cluster01_count[['venue']]
cluster01_count

venue category,venue
Department Store,1
Discount Store,7
Hardware Store,11


In [438]:
detroit_meancluster02=mean_merged.loc[mean_merged['Cluster Labels'] == 1, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_02=detroit_cleanlist.merge(detroit_meancluster02,on='neighborhood')
cluster_02=cluster_02[['neighborhood','venue','venue category']]
cluster02_count=cluster_02.groupby('venue category').count()
cluster02_count=cluster02_count[['venue']]
cluster02_count

venue category,venue
Big Box Store,2
Department Store,1
Discount Store,48
Garden Center,3
Hardware Store,9
Pharmacy,38


In [439]:
detroit_meancluster03=mean_merged.loc[mean_merged['Cluster Labels'] == 2, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_03=detroit_cleanlist.merge(detroit_meancluster03,on='neighborhood')
cluster_03=cluster_03[['neighborhood','venue','venue category']]
cluster03_count=cluster_03.groupby('venue category').count()
cluster03_count=cluster03_count[['venue']]
cluster03_count

venue category,venue
Department Store,1
Discount Store,50
Hardware Store,1
Pharmacy,3


In [440]:
detroit_meancluster04=mean_merged.loc[mean_merged['Cluster Labels'] == 3, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_04=detroit_cleanlist.merge(detroit_meancluster04,on='neighborhood')
cluster_04=cluster_04[['neighborhood','venue','venue category']]
cluster04_count=cluster_04.groupby('venue category').count()
cluster04_count=cluster04_count[['venue']]
cluster04_count

venue category,venue
Discount Store,1
Pharmacy,22


In [442]:
detroit_meancluster05=mean_merged.loc[mean_merged['Cluster Labels'] == 4, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_05=detroit_cleanlist.merge(detroit_meancluster05,on='neighborhood')
cluster_05=cluster_05[['neighborhood','venue','venue category']]
cluster05_count=cluster_05.groupby('venue category').count()
cluster05_count=cluster05_count[['venue']]
cluster05_count

venue category,venue
Discount Store,5
Hardware Store,9
Pharmacy,17


In [443]:
detroit_meancluster06=mean_merged.loc[mean_merged['Cluster Labels'] == 5, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_06=detroit_cleanlist.merge(detroit_meancluster06,on='neighborhood')
cluster_06=cluster_06[['neighborhood','venue','venue category']]
cluster06_count=cluster_06.groupby('venue category').count()
cluster06_count=cluster06_count[['venue']]
cluster06_count

venue category,venue
Big Box Store,6
Department Store,1
Discount Store,2

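The per-cluster cells above repeat the same five lines with a different label each time. A minimal sketch of how that could be folded into one helper, assuming the column names used above (`neighborhood`, `venue`, `venue category`, `Cluster Labels`). The function name `count_categories_by_cluster` and the toy frames are hypothetical stand-ins for `mean_merged` and `detroit_cleanlist`, not part of the original notebook.

```python
import pandas as pd


def count_categories_by_cluster(merged, venues):
    """Count venues per category for every cluster label at once."""
    # attach each venue to the cluster label of its neighborhood
    labeled = venues.merge(
        merged[['neighborhood', 'Cluster Labels']], on='neighborhood'
    )
    # rows are cluster labels, columns are venue categories, cells are counts
    return (
        labeled.groupby(['Cluster Labels', 'venue category'])['venue']
        .count()
        .unstack(fill_value=0)
    )


# hypothetical toy data standing in for mean_merged and detroit_cleanlist
toy_merged = pd.DataFrame({
    'neighborhood': ['A', 'B', 'C'],
    'Cluster Labels': [0, 0, 1],
})
toy_venues = pd.DataFrame({
    'neighborhood': ['A', 'A', 'B', 'C', 'C'],
    'venue': ['v1', 'v2', 'v3', 'v4', 'v5'],
    'venue category': ['Hardware Store', 'Discount Store', 'Hardware Store',
                       'Pharmacy', 'Pharmacy'],
})

counts = count_categories_by_cluster(toy_merged, toy_venues)
print(counts)
```

One table with a row per cluster label makes it easier to compare clusters side by side than scrolling through separate outputs.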

In [491]:
detroit_meancluster06

Unnamed: 0,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Jeffries,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
24,Nardin Park,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
26,Russell Woods,Discount Store,Hardware Store,Pharmacy,Garden Center,Department Store
32,Mohican-Regent,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
47,Eastern Market,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
59,Hubbard Farms,Discount Store,Pharmacy,Garden Center,Hardware Store,Department Store
69,Russell Industrial,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
120,Pembroke,Garden Center,Discount Store,Pharmacy,Hardware Store,Department Store
140,Belmont,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
141,Bethune,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


In [515]:
meanmerged_06=mean_merged

### Clustering with k=7

In [508]:
# set number of clusters
kclusters = 7

detroit_clusters_mean = mean_grouped.drop('neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(detroit_clusters_mean)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_)
mean_neighborhood_venues_sorted=mean_neighborhood_venues_sorted.drop(['Cluster Labels'], axis=1)
mean_neighborhood_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
mean_neighborhood_venues_sorted.head()

[1 0 0 0 5 0 5 1 6 4 4 4 2 3 0 1 1 0 4 1 1 0 2 0 6 1 2 0 5 0 0 1 4 1 0 5 0
 0 1 1 2 1 4 1 1 2 1 1 5 1 5 4 4 0 1 4 2 3 4 2 4 0 5 4 5 4 0 5 4 3 1 5 1 4
 1 1 4 6 1 0 5 5 2 5 0 0 3 0 1 1 2 3]


Unnamed: 0,Cluster Labels,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,Arden Park,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
1,0,Aviation,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
2,0,Bagley,Discount Store,Pharmacy,Hardware Store,Big Box Store,Garden Center
3,0,Banglatown,Pharmacy,Discount Store,Hardware Store,Garden Center,Department Store
4,5,Belmont,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


In [509]:
mean_merged = detroit_data
mean_merged = mean_merged.join(mean_neighborhood_venues_sorted.set_index('neighborhood'), on='neighborhood')
mean_merged=mean_merged.dropna()
mean_merged=mean_merged.astype({'Cluster Labels':'int32'})
print(mean_merged.shape)
mean_merged.head()

(92, 10)


Unnamed: 0,zipcode,neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,48201,Brewster-Douglas,42.347507,-83.048334,6,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
1,48201,Brush Park,42.343733,-83.052593,4,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
2,48201,Jeffries,42.345604,-83.070811,5,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
3,48201,Midtown,42.351567,-83.061879,4,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
5,48202,Arden Park,42.38811,-83.07971,1,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


In [446]:
detroit_meancluster01=mean_merged.loc[mean_merged['Cluster Labels'] == 0, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_01=detroit_cleanlist.merge(detroit_meancluster01,on='neighborhood')
cluster_01=cluster_01[['neighborhood','venue','venue category']]
cluster01_count=cluster_01.groupby('venue category').count()
cluster01_count=cluster01_count[['venue']]
cluster01_count

venue category,venue
Big Box Store,2
Discount Store,34
Garden Center,1
Hardware Store,10
Pharmacy,44


In [447]:
detroit_meancluster02=mean_merged.loc[mean_merged['Cluster Labels'] == 1, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_02=detroit_cleanlist.merge(detroit_meancluster02,on='neighborhood')
cluster_02=cluster_02[['neighborhood','venue','venue category']]
cluster02_count=cluster_02.groupby('venue category').count()
cluster02_count=cluster02_count[['venue']]
cluster02_count

venue category,venue
Department Store,1
Discount Store,40
Hardware Store,1


In [448]:
detroit_meancluster03=mean_merged.loc[mean_merged['Cluster Labels'] == 2, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_03=detroit_cleanlist.merge(detroit_meancluster03,on='neighborhood')
cluster_03=cluster_03[['neighborhood','venue','venue category']]
cluster03_count=cluster_03.groupby('venue category').count()
cluster03_count=cluster03_count[['venue']]
cluster03_count

venue category,venue
Department Store,1
Discount Store,7
Hardware Store,11


In [449]:
detroit_meancluster04=mean_merged.loc[mean_merged['Cluster Labels'] == 3, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_04=detroit_cleanlist.merge(detroit_meancluster04,on='neighborhood')
cluster_04=cluster_04[['neighborhood','venue','venue category']]
cluster04_count=cluster_04.groupby('venue category').count()
cluster04_count=cluster04_count[['venue']]
cluster04_count

venue category,venue
Big Box Store,6
Department Store,1
Discount Store,2


In [510]:
detroit_meancluster05=mean_merged.loc[mean_merged['Cluster Labels'] == 4, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_05=detroit_cleanlist.merge(detroit_meancluster05,on='neighborhood')
cluster_05=cluster_05[['neighborhood','venue','venue category']]
cluster05_count=cluster_05.groupby('venue category').count()
cluster05_count=cluster05_count[['venue']]
cluster05_count

venue category,venue
Pharmacy,19


In [451]:
detroit_meancluster06=mean_merged.loc[mean_merged['Cluster Labels'] == 5, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_06=detroit_cleanlist.merge(detroit_meancluster06,on='neighborhood')
cluster_06=cluster_06[['neighborhood','venue','venue category']]
cluster06_count=cluster_06.groupby('venue category').count()
cluster06_count=cluster06_count[['venue']]
cluster06_count

venue category,venue
Department Store,1
Discount Store,30
Garden Center,2
Hardware Store,4
Pharmacy,12


In [452]:
detroit_meancluster07=mean_merged.loc[mean_merged['Cluster Labels'] == 6, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_07=detroit_cleanlist.merge(detroit_meancluster07,on='neighborhood')
cluster_07=cluster_07[['neighborhood','venue','venue category']]
cluster07_count=cluster_07.groupby('venue category').count()
cluster07_count=cluster07_count[['venue']]
cluster07_count

venue category,venue
Hardware Store,4
Pharmacy,5


In [511]:
meanmerged_07=mean_merged

### Clustering with k=4

In [488]:
# set number of clusters
kclusters = 4

detroit_clusters_mean = mean_grouped.drop('neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(detroit_clusters_mean)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_)

mean_neighborhood_venues_sorted=mean_neighborhood_venues_sorted.drop(['Cluster Labels'], axis=1)
mean_neighborhood_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
mean_neighborhood_venues_sorted.head()

[1 0 0 0 1 0 0 1 0 2 2 2 3 0 0 1 1 0 2 1 1 0 3 0 0 1 3 0 0 2 0 1 2 1 0 1 0
 0 1 1 3 1 2 1 1 3 1 1 0 1 0 2 2 0 1 2 3 0 2 3 2 0 1 2 0 2 0 0 2 0 1 1 1 2
 1 1 2 0 1 0 0 0 3 1 0 0 0 0 1 1 3 0]


Unnamed: 0,Cluster Labels,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,Arden Park,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
1,0,Aviation,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
2,0,Bagley,Discount Store,Pharmacy,Hardware Store,Big Box Store,Garden Center
3,0,Banglatown,Pharmacy,Discount Store,Hardware Store,Garden Center,Department Store
4,1,Belmont,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


In [489]:
mean_merged = detroit_data
mean_merged = mean_merged.join(mean_neighborhood_venues_sorted.set_index('neighborhood'), on='neighborhood')
mean_merged=mean_merged.dropna()
mean_merged=mean_merged.astype({'Cluster Labels':'int32'})
print(mean_merged.shape)
mean_merged.head()

(92, 10)


Unnamed: 0,zipcode,neighborhood,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,48201,Brewster-Douglas,42.347507,-83.048334,0,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
1,48201,Brush Park,42.343733,-83.052593,2,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
2,48201,Jeffries,42.345604,-83.070811,0,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
3,48201,Midtown,42.351567,-83.061879,2,Pharmacy,Hardware Store,Garden Center,Discount Store,Department Store
5,48202,Arden Park,42.38811,-83.07971,1,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


In [485]:
detroit_meancluster01=mean_merged.loc[mean_merged['Cluster Labels'] == 0, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_01=detroit_cleanlist.merge(detroit_meancluster01,on='neighborhood')
cluster_01=cluster_01[['neighborhood','venue','venue category']]
cluster01_count=cluster_01.groupby('venue category').count()
cluster01_count=cluster01_count[['venue']]
cluster01_count

venue category,venue
Big Box Store,8
Department Store,3
Discount Store,60
Garden Center,3
Hardware Store,27
Pharmacy,48


In [473]:
detroit_meancluster02=mean_merged.loc[mean_merged['Cluster Labels'] == 1, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_02=detroit_cleanlist.merge(detroit_meancluster02,on='neighborhood')
cluster_02=cluster_02[['neighborhood','venue','venue category']]
cluster02_count=cluster_02.groupby('venue category').count()
cluster02_count=cluster02_count[['venue']]
cluster02_count

venue category,venue
Department Store,2
Discount Store,52
Garden Center,1
Hardware Store,1
Pharmacy,3


In [474]:
detroit_meancluster03=mean_merged.loc[mean_merged['Cluster Labels'] == 2, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_03=detroit_cleanlist.merge(detroit_meancluster03,on='neighborhood')
cluster_03=cluster_03[['neighborhood','venue','venue category']]
cluster03_count=cluster_03.groupby('venue category').count()
cluster03_count=cluster03_count[['venue']]
cluster03_count

venue category,venue
Discount Store,1
Pharmacy,22


In [475]:
detroit_meancluster04=mean_merged.loc[mean_merged['Cluster Labels'] == 3, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_04=detroit_cleanlist.merge(detroit_meancluster04,on='neighborhood')
cluster_04=cluster_04[['neighborhood','venue','venue category']]
cluster04_count=cluster_04.groupby('venue category').count()
cluster04_count=cluster04_count[['venue']]
cluster04_count

venue category,venue
Department Store,1
Discount Store,7
Hardware Store,11


### Clustering with k=3

In [477]:
# set number of clusters
kclusters = 3

detroit_clusters_mean = mean_grouped.drop('neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(detroit_clusters_mean)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_)

mean_neighborhood_venues_sorted=mean_neighborhood_venues_sorted.drop(['Cluster Labels'], axis=1)
mean_neighborhood_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
mean_neighborhood_venues_sorted.head()

[2 0 0 0 2 0 0 2 1 1 1 1 0 0 1 2 2 0 1 2 2 0 0 0 0 2 0 0 0 1 0 2 1 2 0 2 0
 0 2 2 0 2 1 2 2 0 2 2 0 2 0 1 1 0 2 1 0 0 1 0 1 0 2 1 0 1 1 0 1 0 2 0 2 1
 2 2 1 0 2 0 0 0 0 0 0 0 0 0 2 2 0 0]


Unnamed: 0,Cluster Labels,neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,2,Arden Park,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
1,0,Aviation,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store
2,0,Bagley,Discount Store,Pharmacy,Hardware Store,Big Box Store,Garden Center
3,0,Banglatown,Pharmacy,Discount Store,Hardware Store,Garden Center,Department Store
4,2,Belmont,Discount Store,Pharmacy,Hardware Store,Garden Center,Department Store


In [478]:
mean_merged = detroit_data
mean_merged = mean_merged.join(mean_neighborhood_venues_sorted.set_index('neighborhood'), on='neighborhood')
mean_merged=mean_merged.dropna()
mean_merged=mean_merged.astype({'Cluster Labels':'int32'})
print(mean_merged.shape)

(92, 10)


In [479]:
detroit_meancluster01=mean_merged.loc[mean_merged['Cluster Labels'] == 0, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_01=detroit_cleanlist.merge(detroit_meancluster01,on='neighborhood')
cluster_01=cluster_01[['neighborhood','venue','venue category']]
cluster01_count=cluster_01.groupby('venue category').count()
cluster01_count=cluster01_count[['venue']]
cluster01_count

venue category,venue
Big Box Store,8
Department Store,3
Discount Store,60
Garden Center,3
Hardware Store,27
Pharmacy,48


In [480]:
detroit_meancluster02=mean_merged.loc[mean_merged['Cluster Labels'] == 1, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_02=detroit_cleanlist.merge(detroit_meancluster02,on='neighborhood')
cluster_02=cluster_02[['neighborhood','venue','venue category']]
cluster02_count=cluster_02.groupby('venue category').count()
cluster02_count=cluster02_count[['venue']]
cluster02_count

venue category,venue
Discount Store,3
Hardware Store,2
Pharmacy,29


In [481]:
detroit_meancluster03=mean_merged.loc[mean_merged['Cluster Labels'] == 2, mean_merged.columns[[1] + list(range(5, mean_merged.shape[1]))]]
cluster_03=detroit_cleanlist.merge(detroit_meancluster03,on='neighborhood')
cluster_03=cluster_03[['neighborhood','venue','venue category']]
cluster03_count=cluster_03.groupby('venue category').count()
cluster03_count=cluster03_count[['venue']]
cluster03_count

venue category,venue
Department Store,1
Discount Store,50
Hardware Store,1
Pharmacy,3


### The best k-means clustering is with k=6

The 6 clusters are:
<ul>
    <li><b>Cluster 1: Hardware and Dollar Stores-</b> 18 of 19 total shops are hardware or dollar stores.</li>
    <li><b>Cluster 2: Dollar Stores and Pharmacies-</b> 86 of 101 total shops are dollar stores or pharmacies.</li>
    <li><b>Cluster 3: Dollar Stores Galore!-</b> 50 of 55 total shops are dollar stores.</li>
    <li><b>Cluster 4: Pharmacies Are All We See-</b> 22 of 23 total shops are pharmacies.</li>
    <li><b>Cluster 5: The Big Three-</b> A mix of 5 dollar stores, 9 hardware stores, and 17 pharmacies.</li>
    <li><b>Cluster 6: Big Box-</b> Neighborhoods with access to multiple Big Box retailers; 6 of 9 total shops are Big Box stores.</li>
</ul>

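The choice of k=6 rests on reading the category counts for each value of k. One possible complement, not used in the original analysis, is to compare inertia and silhouette scores across the same range of k. The sketch below assumes scikit-learn's `silhouette_score`; the random `features` array is a hypothetical stand-in for `detroit_clusters_mean`, so the printed numbers say nothing about Detroit.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# hypothetical stand-in for detroit_clusters_mean:
# 92 neighborhoods x 6 venue-category frequencies that sum to 1 per row
rng = np.random.default_rng(0)
features = rng.dirichlet(np.ones(6), size=92)

scores = {}
for k in range(3, 8):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(features)
    scores[k] = {
        # within-cluster sum of squared distances (lower is tighter)
        'inertia': model.inertia_,
        # mean silhouette coefficient in [-1, 1] (higher is better separated)
        'silhouette': silhouette_score(features, model.labels_),
    }

for k, s in scores.items():
    print(k, round(s['inertia'], 3), round(s['silhouette'], 3))
```

Swapping `features` for the real feature matrix would show whether the scores agree with the reading of the category counts.
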
### Map of the clusters.

In [516]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)                             
                             
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(meanmerged_06['latitude'], meanmerged_06['longitude'], meanmerged_06['neighborhood'], meanmerged_06['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.vector_layers.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color='green',
        fill=True,
        fill_color=rainbow[cluster-1],  # label 0 wraps around to the last color (red)
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<b>Map legend:</b><br>
Red=Hardware and Dollar Stores<br>
Purple=Dollar Store and Pharmacy<br>
Blue=Dollar Stores Galore!<br>
Turquoise=Pharmacies Are All We See.<br>
Light Green=The Big Three<br>
Orange=Big Box

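The legend follows from the `rainbow[cluster-1]` index in the map cell: with six colors, label 0 wraps around to the last entry of the palette. A small sketch of that mapping, using color names instead of hex codes; the order of `palette` is an assumption read off the legend (purple through red), not output from `cm.rainbow`.

```python
# color names in palette order, assumed from the legend (purple -> red)
palette = ['Purple', 'Blue', 'Turquoise', 'Light Green', 'Orange', 'Red']

# same indexing as the map cell: rainbow[cluster - 1]
color_for_label = {label: palette[label - 1] for label in range(len(palette))}

for label, color in color_for_label.items():
    # "Cluster N" in the write-up corresponds to k-means label N - 1
    print(f'label {label} (Cluster {label + 1}): {color}')
```

Indexing with `rainbow[cluster]` instead would avoid the wraparound, but it would also shift every color, so the legend would need to change with it.
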
### Do it all again without Dollar Stores, Pharmacies, or Hardware Stores

Most retail access in Detroit is limited to the big three: dollar stores, pharmacies, and independently operated hardware franchises. These present three challenges: limited product selection, higher prices, and lower product quality. Pharmacies and hardware stores have limited selection and higher prices, while dollar stores have limited selection and lower product quality. So I will perform the mapping and clustering again while excluding the big three.

In [522]:
categories='52f2ab2ebcbc57f1066b8b42,4bf58dd8d48988d1f6941735,4bf58dd8d48988d1f8941735,4eb1c0253b7b52c0e1adc2e9,58daa1558bbb0b01f18ec1b4,52e816a6bcbc57f1066b7a54,52f2ab2ebcbc57f1066b8b46'
categories

'52f2ab2ebcbc57f1066b8b42,4bf58dd8d48988d1f6941735,4bf58dd8d48988d1f8941735,4eb1c0253b7b52c0e1adc2e9,58daa1558bbb0b01f18ec1b4,52e816a6bcbc57f1066b7a54,52f2ab2ebcbc57f1066b8b46'

In [524]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [520]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&categoryId={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            categories,
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

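The list comprehension in `getNearbyVenues` assumes every response contains at least one group and every venue has at least one category; a response that breaks either assumption raises an error partway through the loop. A minimal sketch of a more defensive extraction step follows. The helper name `extract_venue_rows` and the sample payload are hypothetical, and only the keys that the function above already reads are used.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore response (already parsed from JSON) into row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # keep the venue even when no category is attached
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


# hypothetical payload shaped like the fields read by getNearbyVenues
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Garden Supply',
               'location': {'lat': 42.35, 'lng': -83.06},
               'categories': [{'name': 'Garden Center'}]}},
    {'venue': {'name': 'Uncategorized Shop',
               'location': {'lat': 42.36, 'lng': -83.05},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Midtown', 42.351567, -83.061879, sample)
empty = extract_venue_rows('Midtown', 42.351567, -83.061879, {'response': {}})
print(rows)
print(empty)
```
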
In [535]:
nobigthree_list = getNearbyVenues(names=detroit_data['neighborhood'],
                                   latitudes=detroit_data['latitude'],
                                   longitudes=detroit_data['longitude']
                                  )

Brewster-Douglas
Brush Park
Jeffries
Midtown
Medical Center
Arden Park
Cultural Center
Gateway
Milwaukee Junction
New Center
Piety Hill
Tech Town
Wayne State
Greenfield Park
Grixdale Farms
Hawthorne Park
Nolan
Palmer Park
Palmer Woods
Penrose
State Fair
Aviation
Barton-McFarland
Midwest
Nardin Park
Petoskey-Otsego
Russell Woods
Conner Creek
Franklin
Gratiot-Findlay
LaSalle-College Park
Maple Ridge
Mohican-Regent
Regent Park
Pulaski
Von Stueben
Atkinson
Boston-Edison
Dexter-Linwood
Henry Ford
Herman Kiefer
Jamison
LaSalle Gardens
North LaSalle
Virginia Park
West Virginia Park
Wildemere Park
Eastern Market
Elmwood Park
Forest Park
Lafayette Park
McDougall-Hunt
Rivertown
Core City
Elijah McCoy
NW Goldberg
Woodbridge
Carbon Works
Delray
Hubbard Farms
Springwells
Southwest
West Side Industrial
Chadsey-Condon
Claytown
Michigan-Martin
Southwest
Northend
Poletown East
Russell Industrial
Banglatown
Buffalo Charles
Cadillac Heights
Davison
North Campau
Chandler Park
Chalmers
Eden Gardens
Gratiot

In [536]:
print(nobigthree_list.shape)
nobigthree_list.head()

(523, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Brewster-Douglas,42.347507,-83.048334,Hugh,42.351198,-83.063894,Furniture / Home Store
1,Brewster-Douglas,42.347507,-83.048334,The Pottery Guy @ Shed 1,42.347593,-83.04177,Flower Shop
2,Brewster-Douglas,42.347507,-83.048334,Barbie's Garden Supply,42.348224,-83.058428,Garden Center
3,Brewster-Douglas,42.347507,-83.048334,The Stove,42.354027,-83.062291,Furniture / Home Store
4,Brewster-Douglas,42.347507,-83.048334,Gardella Furniture,42.351921,-83.032043,Furniture / Home Store


In [537]:
nobigthree_list = nobigthree_list.drop_duplicates(subset='Venue Latitude')
print(nobigthree_list.shape)

(179, 7)

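Dropping duplicates on `Venue Latitude` alone removes the repeated hits for a venue returned for several neighborhoods, but it also merges any two different venues that happen to share a latitude. A small sketch comparing that with de-duplicating on the venue name plus both coordinates; the toy frame is hypothetical and only mirrors the column names above.

```python
import pandas as pd

# 'Shop 1' is returned for two neighborhoods; 'Shop 2' shares its latitude
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'A', 'C'],
    'Venue': ['Shop 1', 'Shop 1', 'Shop 2', 'Shop 3'],
    'Venue Latitude': [42.35, 42.35, 42.35, 42.40],
    'Venue Longitude': [-83.06, -83.06, -83.01, -83.02],
})

# latitude only: 'Shop 2' is dropped along with the repeated 'Shop 1'
by_lat = toy.drop_duplicates(subset='Venue Latitude')

# name + both coordinates: only the repeated 'Shop 1' is dropped
by_venue = toy.drop_duplicates(
    subset=['Venue', 'Venue Latitude', 'Venue Longitude']
)

print(by_lat.shape)    # (2, 4)
print(by_venue.shape)  # (3, 4)
```

Whether the stricter key changes the count of 179 for the real data is not something this toy example can show.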

In [544]:
nobigthree='https://docs.google.com/spreadsheets/d/e/2PACX-1vSgwy1IzK2HLcwiC6_f4fw6BXzHVJu-lXUGgvZemFKZCY3YQpcUU9LLqHlyklG0wKVuRDdSmvGNr0HB/pub?gid=1700194101&single=true&output=csv'
nobigthree=pd.read_csv(nobigthree)
nobigthree.shape

(25, 7)

### Only 25 venues remain when excluding the big three: dollar stores, pharmacies, and hardware stores.

In [545]:
address = 'Detroit, MI'
geolocator = Nominatim(user_agent="detroit_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Detroit using latitude and longitude values
map_detroit_final = folium.Map(location=[latitude, longitude], zoom_start=11)

# add neighborhood markers to map
# lat/lng keep the loop from overwriting the map-center latitude/longitude
for neighborhood, lat, lng in zip(detroit_data['neighborhood'], detroit_data['latitude'], detroit_data['longitude']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_detroit_final)   

# add venue markers to map
for venue, lat, lng in zip(nobigthree['Venue'], nobigthree['Venue Latitude'], nobigthree['Venue Longitude']):
    label = '{}'.format(venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='pink',
        fill_opacity=0.9,
        parse_html=False).add_to(map_detroit_final) 
map_detroit_final