# This is the Southern California Neighborhood Comparison Project

### Introduction
Given the large geographic size of the Southern California housing market it is hard to figure out what neighborhoods would be comparable and of equal interest to potential home buyers.

We want to develop a tool to help home buyers figure out what neighborhoods have a comparable living feel, in an easy to see visual way that would help buyers narrow their home buying search without having to physically travel to all the different neighborhoods in this sprawling metropolitan area.

We started with Los Angeles and San Diego as these are two cities that while they have similiar climates and are geographically proximate, have dramatically different looks and feels to them.

### Data
We are using data scraped from Wikipedia to determine the relevant neighborhood determinants

This list is submitted to a Geo Encoder to retrieve it's longitudinal and latitudinal coordinates.

That data is fed to the Foursquare App to get relevant venues and types of activities in the neighborhood to get the most prevalent establishments and activities in the area.

### Methodology 
We develop a dataframe of the relevant neighborhoods, their geographical coordinates, and the most prevalent types of establishments based on a categorical count of the venue type.

We normalize this data, strip out the venue data, and cluster the dataset based on these findings.

We are hoping to find distinctive neighborhood clusters, based on relevant neighborhood features, to give our home buyers a good reference guide of where they would like to live based on their neighborhood preferences.

We then display this information in a visual map to make the information easier to understand and process.

### Core Code

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
!conda install -c anaconda xlrd –yes

import json # library to handle JSON files
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library


Collecting package metadata: done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - –yes

Current channels:

  - https://conda.anaconda.org/anaconda/osx-64
  - https://conda.anaconda.org/anaconda/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/osx-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.


Collecting package metadata: done
Solving environment: | 
  - anaconda::ca-certificates-2018.03.07-0, anaconda::certifi-2018.11.29-py37_0, anaconda::openssl-1.1.1a-h1de35cc_0
  - anaconda::certifi-2018.11.29-py37_0, anaconda::openssl-1.1.1a-h1de35cc_0, defaults::ca-certifi

In [159]:
# Read in the Neighborhood Data
so_cal_path="/Users/thaddeus/Desktop/Desktop/Data Scientist/Los_Angeles_Neighborhoods.csv"
Los_Angeles_Neighborhoods_df=pd.DataFrame()
Los_Angeles_Neighborhoods_csv3={}
Los_Angeles_Neighborhoods_csv3=pd.read_csv(so_cal_path,header=0)
Los_Angeles_Neighborhoods_df=pd.DataFrame.from_dict(Los_Angeles_Neighborhoods_csv3)
Los_Angeles_Neighborhoods_df.columns=["Neighbourhood"]
Los_Angeles_Neighborhoods_df['Latitude']=0.0
Los_Angeles_Neighborhoods_df['Longitude']=0.0
Los_Angeles_Neighborhoods_df.head(5)

Unnamed: 0,Neighbourhood,Latitude,Longitude
0,Angelino Heights,0.0,0.0
1,Arleta,0.0,0.0
2,Arlington Heights,0.0,0.0
3,Arts District,0.0,0.0
4,Atwater Village,0.0,0.0


In [155]:
CLIENT_ID ='ULLRFE02RCBTO15S4HM0ABROE1YDT0Y2EJJEBZJWUCW0Q3VO'
CLIENT_SECRET ='21FEU3P3XFYFPWL4JQVEPPSFX0XSYJMH2YYSEPXYSWRZZYXQ'
VERSION = '20180605' # Foursquare API version

In [161]:
#Add Longitude and Latitude to the data

a=len(Los_Angeles_Neighborhoods_df)

for b in range(a):
    geolocator = Nominatim(user_agent="so_cal_explorer")
    address = str(Los_Angeles_Neighborhoods_df.iloc[b,0]) + ', California'
    print (address)
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print('The geograpical coordinate of {} are {}, {}.'.format(address, latitude, longitude))
    Los_Angeles_Neighborhoods_df.iloc[b,1]=latitude
    Los_Angeles_Neighborhoods_df.iloc[b,2]=longitude
print(Los_Angeles_Neighborhoods_df)


Angelino Heights, California
The geograpical coordinate of Angelino Heights, California are 34.0702889, -118.2547965.
Arleta, California
The geograpical coordinate of Arleta, California are 34.2413266, -118.4322047.
Arlington Heights, California
The geograpical coordinate of Arlington Heights, California are 40.055725, -120.8902318.
Arts District, California
The geograpical coordinate of Arts District, California are 34.0412389, -118.2344503.
Atwater Village, California
The geograpical coordinate of Atwater Village, California are 34.1163979, -118.2564637.
Baldwin Hills, California
The geograpical coordinate of Baldwin Hills, California are 34.0075684, -118.3505956.
Crenshaw, California
The geograpical coordinate of Crenshaw, California are 33.9252122, -118.3265295.
Bel Air, California
The geograpical coordinate of Bel Air, California are 34.0827278, -118.4479802.
Benedict Canyon, California
The geograpical coordinate of Benedict Canyon, California are 34.0494418, -118.4004298.
Beverly

The geograpical coordinate of Koreatown, California are 34.0580134, -118.3008095.
Ladera, California
The geograpical coordinate of Ladera, California are 37.4003689, -122.2021098.
Lafayette Square, California
The geograpical coordinate of Lafayette Square, California are 33.74971395, -117.811551180636.
Lake Balboa, California
The geograpical coordinate of Lake Balboa, California are 34.1811656, -118.495236001433.
Lake View Terrace, California
The geograpical coordinate of Lake View Terrace, California are 34.2763908, -118.3611912.
Larchmont, California
The geograpical coordinate of Larchmont, California are 38.56815905, -121.358899109521.
Laurel Canyon, California
The geograpical coordinate of Laurel Canyon, California are 33.590091, -117.7677432.
Leimert Park, California
The geograpical coordinate of Leimert Park, California are 34.007702, -118.3320627.
Lincoln Heights, California
The geograpical coordinate of Lincoln Heights, California are 34.0705664, -118.2050727.
Little Armenia, C

The geograpical coordinate of Van Nuys, California are 34.1866581, -118.448729.
Venice, California
The geograpical coordinate of Venice, California are 33.995044, -118.4668875.
Vermont-Slauson, California
The geograpical coordinate of Vermont-Slauson, California are 33.98841885, -118.290338644074.
Vermont Square, California
The geograpical coordinate of Vermont Square, California are 34.00013975, -118.295890410078.
Vermont Vista, California
The geograpical coordinate of Vermont Vista, California are 34.1209957, -117.5478762.
Victor Heights, California
The geograpical coordinate of Victor Heights, California are 34.0653811, -118.2506213.
Victoria Park, California
The geograpical coordinate of Victoria Park, California are 33.85342525, -116.539841436846.
Village Green, California
The geograpical coordinate of Village Green, California are 34.0198711, -118.3606643.
Warehouse District, California
The geograpical coordinate of Warehouse District, California are -0.1388171, -78.5064391.
Warn

In [204]:
print (Los_Angeles_Neighborhoods_df)

           Neighbourhood   Latitude   Longitude
0       Angelino Heights  34.070289 -118.254796
1                 Arleta  34.241327 -118.432205
2      Arlington Heights  40.055725 -120.890232
3          Arts District  34.041239 -118.234450
4        Atwater Village  34.116398 -118.256464
5          Baldwin Hills  34.007568 -118.350596
6               Crenshaw  33.925212 -118.326530
7                Bel Air  34.082728 -118.447980
8        Benedict Canyon  34.049442 -118.400430
9          Beverly Crest  32.716728 -117.077118
10          Beverly Glen  34.107785 -118.445636
11         Beverly Grove  37.736483 -121.120079
12        Beverly Hills   34.069650 -118.396306
13          Beverly Park  34.063769 -118.264690
14           Beverlywood  34.046633 -118.395038
15         Boyle Heights  34.033166 -118.204865
16             Brentwood  37.931777 -121.696027
17      Brentwood Circle  33.741226 -116.986747
18             Brookside  34.265108 -117.187349
19           Bunker Hill  38.426577 -120

In [164]:
# create map of New York using latitude and longitude values
map_los_angeles = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighbourhood in zip(Los_Angeles_Neighborhoods_df['Latitude'], Los_Angeles_Neighborhoods_df['Longitude'],Los_Angeles_Neighborhoods_df['Neighbourhood']):
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_los_angeles)  
    
map_los_angeles

In [165]:
los_angeles_downtown_data = Los_Angeles_Neighborhoods_df[Los_Angeles_Neighborhoods_df['Neighbourhood'] == 'Downtown'].reset_index(drop=True)
los_angeles_downtown_data.head()

Unnamed: 0,Neighbourhood,Latitude,Longitude
0,Downtown,37.803629,-122.271524


In [166]:
los_angeles_downtown_data.loc[0, 'Neighbourhood']
#Get the neighborhood's latitude and longitude values.
neighbourhood_latitude = los_angeles_downtown_data.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = los_angeles_downtown_data.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = los_angeles_downtown_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

Latitude and longitude values of Downtown are 37.8036295, -122.2715244.


In [167]:
#Get the Foursquare Venue Data

LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)
url # display URL

#'https://api.foursquare.com/v2/venues/explore?&client_id=ULLRFE02RCBTO15S4HM0ABROE1YDT0Y2EJJEBZJWUCW0Q3VO&client_secret=21FEU3P3XFYFPWL4JQVEPPSFX0XSYJMH2YYSEPXYSWRZZYXQ&v=20180605&ll=40.87655077879964,-73.91065965862981&radius=500&limit=100'

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c6c48c6f594df76b6d8ebf2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Downtown Oakland',
  'headerFullLocation': 'Downtown Oakland, Oakland',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 98,
  'suggestedBounds': {'ne': {'lat': 37.8081295045, 'lng': -122.26583966329775},
   'sw': {'lat': 37.7991294955, 'lng': -122.27720913670225}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4dfb9c2c1f6eeef806ab898c',
       'name': 'Oaklandish',
       'location': {'address': '1444 Broadway',
        'crossStreet': 'at 14th St',
        'lat': 37.80507510123643,
        'lng': -122.27072640549

In [168]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

98 venues were returned by Foursquare.


In [169]:
# Function that gets the Venue Data

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [172]:
# Get the Venue Data

los_angeles_venues = getNearbyVenues(names=Los_Angeles_Neighborhoods_df['Neighbourhood'],
                                   latitudes=Los_Angeles_Neighborhoods_df['Latitude'],
                                   longitudes=Los_Angeles_Neighborhoods_df['Longitude']
                                  )

Angelino Heights
Arleta
Arlington Heights
Arts District
Atwater Village
Baldwin Hills
Crenshaw
Bel Air
Benedict Canyon
Beverly Crest
Beverly Glen
Beverly Grove
Beverly Hills 
Beverly Park
Beverlywood
Boyle Heights
Brentwood
Brentwood Circle
Brookside
Bunker Hill
Cahuenga Pass
Canoga Park
Canterbury Knolls
Carthay
Central City
Century City
Chatsworth
Chesterfield Square
Cheviot Hills
Chinatown
Civic Center
Crenshaw
Crestwood Hills
Cypress Park
Del Rey
Downtown
Eagle Rock
East Hollywood
Echo Park
Edendale
El Sereno
Elysian Heights
Elysian Park
Elysian Valley
Encino
Exposition Park
Fairfax
Fashion District
Financial District
Florence
Flower District
Franklin Hills
Gallery Row
Garvanza
Glassell Park
Gramercy Park
Granada Hills
Green Meadows
Griffith Park
Hancock Park
Harbor City
Harbor Gateway
Harvard Heights
Harvard Park
Hermon
Highland Park
Historic Core
Hollywood
Hollywood Hills
Hollywood Hills West
Hyde Park
Jefferson Park
Jewelry District
Kinney Heights
Koreatown
Ladera
Lafayette Squa

In [173]:
print(los_angeles_venues.shape)
los_angeles_venues.head()

(3601, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Angelino Heights,34.070289,-118.254796,Halliwell Manor,34.069329,-118.254165,Performing Arts Venue
1,Angelino Heights,34.070289,-118.254796,Guisados,34.070262,-118.250437,Taco Place
2,Angelino Heights,34.070289,-118.254796,Eightfold Coffee,34.071245,-118.250698,Coffee Shop
3,Angelino Heights,34.070289,-118.254796,"Michael Jackson's ""Thriller"" House (and Tree)",34.069557,-118.254599,Historic Site
4,Angelino Heights,34.070289,-118.254796,The Park's Finest BBQ,34.066519,-118.254291,BBQ Joint


In [174]:
los_angeles_venues.groupby('Neighbourhood').count()
print('There are {} uniques categories.'.format(len(los_angeles_venues['Venue Category'].unique())))


There are 321 uniques categories.


In [175]:
# one hot encoding
los_angeles_onehot = pd.get_dummies(los_angeles_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
los_angeles_onehot['Neighbourhood'] = los_angeles_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [los_angeles_onehot.columns[-1]] + list(los_angeles_onehot.columns[:-1])
los_angeles_onehot = los_angeles_onehot[fixed_columns]

los_angeles_onehot.head()

Unnamed: 0,Neighbourhood,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Airport Lounge,Airport Terminal,Alternative Healer,American Restaurant,Amphitheater,...,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,Angelino Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Angelino Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Angelino Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Angelino Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Angelino Heights,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [176]:
los_angeles_onehot.shape

(3601, 322)

In [177]:
los_angeles_grouped = los_angeles_onehot.groupby('Neighbourhood').mean().reset_index()
los_angeles_grouped

Unnamed: 0,Neighbourhood,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Airport Lounge,Airport Terminal,Alternative Healer,American Restaurant,Amphitheater,...,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,Angelino Heights,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
1,Arleta,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
2,Arts District,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
3,Atwater Village,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
4,Baldwin Hills,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
5,Bel Air,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
6,Benedict Canyon,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
7,Beverly Crest,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
8,Beverly Grove,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.100000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
9,Beverly Hills,0.013514,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.040541,0.0,...,0.0,0.000000,0.000000,0.013514,0.013514,0.000000,0.0,0.000000,0.000000,0.0


In [178]:
los_angeles_grouped.shape

(153, 322)

In [179]:
# Get the Top Venue Data

num_top_venues = 5

for hood in los_angeles_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = los_angeles_grouped[los_angeles_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Angelino Heights----
        venue  freq
0  Taco Place  0.11
1       Trail  0.07
2       Hotel  0.07
3  Laundromat  0.04
4      Market  0.04


----Arleta----
                venue  freq
0       Historic Site  0.25
1       Movie Theater  0.25
2  Mexican Restaurant  0.25
3             Dog Run  0.25
4                 ATM  0.00


----Arts District----
                venue  freq
0         Coffee Shop  0.11
1         Art Gallery  0.11
2  Italian Restaurant  0.05
3    Asian Restaurant  0.05
4           Bookstore  0.05


----Atwater Village----
                    venue  freq
0             Coffee Shop  0.08
1                     Gym  0.05
2                 Theater  0.05
3  Thrift / Vintage Store  0.05
4              Restaurant  0.05


----Baldwin Hills----
          venue  freq
0          Park  0.33
1         Trail  0.33
2       Dog Run  0.33
3           ATM  0.00
4  Optical Shop  0.00


----Bel Air----
        venue  freq
0  Restaurant  0.14
1         Spa  0.14
2  Hotel Pool  0.14
3   Ho

                  venue  freq
0  Other Great Outdoors  0.07
1              Pharmacy  0.07
2                Bakery  0.07
3        Shipping Store  0.07
4        Sandwich Place  0.07


----Gramercy Park----
                     venue  freq
0                   Market   0.4
1              Pizza Place   0.4
2               Food Truck   0.2
3              Music Venue   0.0
4  North Indian Restaurant   0.0


----Green Meadows----
        venue  freq
0        Park   0.5
1  Playground   0.5
2         ATM   0.0
3  Nail Salon   0.0
4      Office   0.0


----Griffith Park----
             venue  freq
0   Scenic Lookout   0.2
1  Nature Preserve   0.2
2            Trail   0.2
3             Park   0.2
4         Tea Room   0.2


----Hancock Park----
              venue  freq
0        Art Museum  0.19
1       Art Gallery  0.10
2            Museum  0.08
3        Food Truck  0.06
4  Sculpture Garden  0.04


----Harbor City----
             venue  freq
0  Thai Restaurant  0.33
1      Wings Joint  0.17
2   

                 venue  freq
0          Yoga Studio  0.14
1           Taco Place  0.14
2   Italian Restaurant  0.14
3          Video Store  0.14
4  Japanese Restaurant  0.14


----Panorama City----
                  venue  freq
0     Mobile Phone Shop  0.13
1            Shoe Store  0.13
2   Filipino Restaurant  0.10
3              Pharmacy  0.07
4  Fast Food Restaurant  0.07


----Park La Brea----
                                      venue  freq
0                                Art Museum  0.12
1  Residential Building (Apartment / Condo)  0.12
2                                     Hotel  0.06
3                               Video Store  0.06
4                            Clothing Store  0.06


----Platinum Triangle----
                        venue  freq
0                         Gym  0.12
1                        Pool  0.08
2        Fast Food Restaurant  0.08
3         Rental Car Location  0.04
4  Construction & Landscaping  0.04


----Playa Vista----
        venue  freq
0  Food Truck

                       venue  freq
0  Middle Eastern Restaurant  0.10
1        Japanese Restaurant  0.06
2        Indie Movie Theater  0.06
3               Cocktail Bar  0.03
4         Chinese Restaurant  0.03


----Westchester----
                    venue  freq
0  Furniture / Home Store  0.09
1      Frozen Yogurt Shop  0.05
2        Asian Restaurant  0.05
3             Supermarket  0.05
4      Mexican Restaurant  0.05


----Westdale----
            venue  freq
0            Park  0.08
1            Bank  0.04
2        Pharmacy  0.04
3             Spa  0.04
4  Breakfast Spot  0.04


----Western Heights----
                        venue  freq
0                        Farm  0.50
1  Construction & Landscaping  0.25
2        Other Great Outdoors  0.25
3                         ATM  0.00
4                Neighborhood  0.00


----Westlake----
                venue  freq
0       Grocery Store  0.20
1  Mexican Restaurant  0.13
2        Liquor Store  0.07
3            Pet Café  0.07
4           

In [180]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


In [181]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = los_angeles_grouped['Neighbourhood']

for ind in np.arange(los_angeles_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(los_angeles_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Angelino Heights,Taco Place,Hotel,Trail,Bakery,Record Shop,Breakfast Spot,Boxing Gym,Motel,Market,Laundromat
1,Arleta,Mexican Restaurant,Movie Theater,Dog Run,Historic Site,Yoshoku Restaurant,Fast Food Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
2,Arts District,Coffee Shop,Art Gallery,Bookstore,Italian Restaurant,Asian Restaurant,Museum,Restaurant,Café,Gym / Fitness Center,Cocktail Bar
3,Atwater Village,Coffee Shop,Pet Store,Theater,Thrift / Vintage Store,Gym,Restaurant,Farmers Market,Latin American Restaurant,Sporting Goods Shop,Liquor Store
4,Baldwin Hills,Dog Run,Trail,Park,Yoshoku Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop


In [182]:
los_angeles_grouped

Unnamed: 0,Neighbourhood,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Airport Lounge,Airport Terminal,Alternative Healer,American Restaurant,Amphitheater,...,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,Angelino Heights,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
1,Arleta,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
2,Arts District,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
3,Atwater Village,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
4,Baldwin Hills,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
5,Bel Air,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
6,Benedict Canyon,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
7,Beverly Crest,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
8,Beverly Grove,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.100000,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
9,Beverly Hills,0.013514,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.040541,0.0,...,0.0,0.000000,0.000000,0.013514,0.013514,0.000000,0.0,0.000000,0.000000,0.0


In [183]:
# set number of clusters
kclusters = 9
kmeans=''
los_angeles_grouped_clustering = los_angeles_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(los_angeles_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 6, 0, 0, 2, 0, 2, 0, 6, 0], dtype=int32)

In [185]:
los_angeles_grouped_clustering

Unnamed: 0,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,Airport Lounge,Airport Terminal,Alternative Healer,American Restaurant,Amphitheater,Antique Shop,...,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant
0,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
1,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
2,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
3,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
4,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
5,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
6,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
7,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
8,0.000000,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.100000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0
9,0.013514,0.000000,0.0,0.000000,0.000000,0.00,0.0,0.040541,0.0,0.000000,...,0.0,0.000000,0.000000,0.013514,0.013514,0.000000,0.0,0.000000,0.000000,0.0


In [188]:
# add clustering labels

del neighbourhoods_venues_sorted['Cluster Labels']
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

los_angeles_merged = Los_Angeles_Neighborhoods_df

# merge los_angeles_grouped with los_angeles_data to add latitude/longitude for each neighborhood
los_angeles_merged = los_angeles_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

los_angeles_merged

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Angelino Heights,34.070289,-118.254796,0.0,Taco Place,Hotel,Trail,Bakery,Record Shop,Breakfast Spot,Boxing Gym,Motel,Market,Laundromat
1,Arleta,34.241327,-118.432205,6.0,Mexican Restaurant,Movie Theater,Dog Run,Historic Site,Yoshoku Restaurant,Fast Food Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
2,Arlington Heights,40.055725,-120.890232,,,,,,,,,,,
3,Arts District,34.041239,-118.234450,0.0,Coffee Shop,Art Gallery,Bookstore,Italian Restaurant,Asian Restaurant,Museum,Restaurant,Café,Gym / Fitness Center,Cocktail Bar
4,Atwater Village,34.116398,-118.256464,0.0,Coffee Shop,Pet Store,Theater,Thrift / Vintage Store,Gym,Restaurant,Farmers Market,Latin American Restaurant,Sporting Goods Shop,Liquor Store
5,Baldwin Hills,34.007568,-118.350596,2.0,Dog Run,Trail,Park,Yoshoku Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
6,Crenshaw,33.925212,-118.326530,0.0,Pet Store,Mexican Restaurant,Food Truck,Tailor Shop,Salon / Barbershop,Frozen Yogurt Shop,Restaurant,Shipping Store,Diner,Gym / Fitness Center
7,Bel Air,34.082728,-118.447980,0.0,Hotel Bar,Restaurant,Café,Golf Course,Spa,Hotel Pool,Hotel,Food,Flower Shop,Flea Market
8,Benedict Canyon,34.049442,-118.400430,2.0,Food Truck,Gym Pool,Park,Farm,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
9,Beverly Crest,32.716728,-117.077118,0.0,Recording Studio,Yoshoku Restaurant,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm


In [189]:
#Clean the Data

los_angeles_merged.dropna(inplace=True)
los_angeles_merged['Cluster Labels'] = los_angeles_merged['Cluster Labels'].astype(int)
los_angeles_merged

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Angelino Heights,34.070289,-118.254796,0,Taco Place,Hotel,Trail,Bakery,Record Shop,Breakfast Spot,Boxing Gym,Motel,Market,Laundromat
1,Arleta,34.241327,-118.432205,6,Mexican Restaurant,Movie Theater,Dog Run,Historic Site,Yoshoku Restaurant,Fast Food Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
3,Arts District,34.041239,-118.234450,0,Coffee Shop,Art Gallery,Bookstore,Italian Restaurant,Asian Restaurant,Museum,Restaurant,Café,Gym / Fitness Center,Cocktail Bar
4,Atwater Village,34.116398,-118.256464,0,Coffee Shop,Pet Store,Theater,Thrift / Vintage Store,Gym,Restaurant,Farmers Market,Latin American Restaurant,Sporting Goods Shop,Liquor Store
5,Baldwin Hills,34.007568,-118.350596,2,Dog Run,Trail,Park,Yoshoku Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
6,Crenshaw,33.925212,-118.326530,0,Pet Store,Mexican Restaurant,Food Truck,Tailor Shop,Salon / Barbershop,Frozen Yogurt Shop,Restaurant,Shipping Store,Diner,Gym / Fitness Center
7,Bel Air,34.082728,-118.447980,0,Hotel Bar,Restaurant,Café,Golf Course,Spa,Hotel Pool,Hotel,Food,Flower Shop,Flea Market
8,Benedict Canyon,34.049442,-118.400430,2,Food Truck,Gym Pool,Park,Farm,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
9,Beverly Crest,32.716728,-117.077118,0,Recording Studio,Yoshoku Restaurant,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
11,Beverly Grove,37.736483,-121.120079,6,Mexican Restaurant,Coffee Shop,American Restaurant,Bar,Park,Bakery,Sporting Goods Shop,Pizza Place,Food,Farmers Market


In [190]:
los_angeles_merged.groupby('Cluster Labels').count()

Unnamed: 0_level_0,Neighbourhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
0,102,102,102,102,102,102,102,102,102,102,102,102,102
1,3,3,3,3,3,3,3,3,3,3,3,3,3
2,12,12,12,12,12,12,12,12,12,12,12,12,12
3,1,1,1,1,1,1,1,1,1,1,1,1,1
4,4,4,4,4,4,4,4,4,4,4,4,4,4
5,1,1,1,1,1,1,1,1,1,1,1,1,1
6,25,25,25,25,25,25,25,25,25,25,25,25,25
7,1,1,1,1,1,1,1,1,1,1,1,1,1
8,5,5,5,5,5,5,5,5,5,5,5,5,5


In [191]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(los_angeles_merged['Latitude'], los_angeles_merged['Longitude'], los_angeles_merged['Neighbourhood'], los_angeles_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [194]:
#for aa in range (kclusters):
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 0, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,34.070289,Hotel,Trail,Bakery,Record Shop,Breakfast Spot,Boxing Gym,Motel,Market,Laundromat
3,34.041239,Art Gallery,Bookstore,Italian Restaurant,Asian Restaurant,Museum,Restaurant,Café,Gym / Fitness Center,Cocktail Bar
4,34.116398,Pet Store,Theater,Thrift / Vintage Store,Gym,Restaurant,Farmers Market,Latin American Restaurant,Sporting Goods Shop,Liquor Store
6,33.925212,Mexican Restaurant,Food Truck,Tailor Shop,Salon / Barbershop,Frozen Yogurt Shop,Restaurant,Shipping Store,Diner,Gym / Fitness Center
7,34.082728,Restaurant,Café,Golf Course,Spa,Hotel Pool,Hotel,Food,Flower Shop,Flea Market
9,32.716728,Yoshoku Restaurant,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
12,34.069650,Park,Hotel,Sushi Restaurant,Spa,New American Restaurant,American Restaurant,Coffee Shop,Boutique,Restaurant
13,34.063769,Park,Massage Studio,Breakfast Spot,Bubble Tea Shop,Filipino Restaurant,Liquor Store,Fast Food Restaurant,Supermarket,Caribbean Restaurant
18,34.265108,Antique Shop,Baseball Field,Resort,Beach,Scenic Lookout,Fish & Chips Shop,Filipino Restaurant,Fast Food Restaurant,Farmers Market
20,34.126866,Restaurant,Trail,Paper / Office Supplies Store,Japanese Restaurant,Pizza Place,Bar,Coffee Shop,Vegetarian / Vegan Restaurant,Movie Theater


In [195]:
#for aa in range (kclusters):
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 1, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,34.131179,Yoshoku Restaurant,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
69,34.131179,Yoshoku Restaurant,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
99,34.106485,Scenic Lookout,Yoshoku Restaurant,Farm,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop


In [196]:
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 2, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,34.007568,Trail,Park,Yoshoku Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
8,34.049442,Gym Pool,Park,Farm,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
14,34.046633,Paper / Office Supplies Store,Park,Yoshoku Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
32,33.613498,Home Service,Yoshoku Restaurant,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
57,33.537326,Park,Yoshoku Restaurant,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
77,34.181166,Golf Course,Playground,Boat Rental,Baseball Field,American Restaurant,Farmers Market,Ethiopian Restaurant,Event Space,Exhibit
79,38.568159,Auto Garage,Home Service,Yoshoku Restaurant,Farm,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop
95,34.097129,Donut Shop,Sandwich Place,Mexican Restaurant,Grocery Store,History Museum,Train Station,Fabric Shop,Empanada Restaurant,English Restaurant
115,34.277253,Home Service,American Restaurant,Park,Locksmith,Yoshoku Restaurant,Falafel Restaurant,English Restaurant,Ethiopian Restaurant,Event Space
146,38.433771,Locksmith,Yoshoku Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm


In [198]:
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 3, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
140,34.146788,Yoshoku Restaurant,Food Truck,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm


In [199]:
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 4, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,36.211273,Sandwich Place,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
104,-0.138817,Electronics Store,Cuban Restaurant,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
156,-0.138817,Electronics Store,Cuban Restaurant,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
167,36.400507,Creperie,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market


In [200]:
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 5, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,34.147226,Yoshoku Restaurant,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market


In [201]:
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 6, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,34.241327,Movie Theater,Dog Run,Historic Site,Yoshoku Restaurant,Fast Food Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
11,37.736483,Coffee Shop,American Restaurant,Bar,Park,Bakery,Sporting Goods Shop,Pizza Place,Food,Farmers Market
15,34.033166,Mobile Phone Shop,Mexican Restaurant,Theater,Grocery Store,Gym,Yoshoku Restaurant,Falafel Restaurant,Ethiopian Restaurant,Event Space
16,37.931777,American Restaurant,Pizza Place,Breakfast Spot,Bus Stop,Taco Place,Bar,Laundromat,Café,Gas Station
26,34.259571,Hotel,Gym / Fitness Center,Creperie,Rock Club,Donut Shop,Dry Cleaner,Cajun / Creole Restaurant,Fast Food Restaurant,Pizza Place
33,34.092232,Discount Store,Bakery,Park,Latin American Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
40,34.081121,Mexican Restaurant,Thrift / Vintage Store,Seafood Restaurant,Liquor Store,Yoshoku Restaurant,Farm,Ethiopian Restaurant,Event Space,Exhibit
49,33.975217,Restaurant,Grocery Store,Mexican Restaurant,Sandwich Place,Fast Food Restaurant,Fabric Shop,English Restaurant,Ethiopian Restaurant,Event Space
71,34.027234,Park,Skate Park,Burger Joint,Taco Place,Bakery,Sandwich Place,Design Studio,Dessert Shop,Exhibit
73,38.632404,Stadium,Discount Store,Video Store,Gas Station,Flea Market,Falafel Restaurant,English Restaurant,Ethiopian Restaurant,Event Space


In [202]:
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 7, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
131,37.505309,Yoshoku Restaurant,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm


In [203]:
los_angeles_merged.loc[los_angeles_merged['Cluster Labels'] == 8, los_angeles_merged.columns[[1] + list(range(5, los_angeles_merged.shape[1]))]]



Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
41,34.082898,Park,Baseball Field,Record Shop,Basketball Court,Garden,Disc Golf,Ethiopian Restaurant,Event Space,Exhibit
42,34.082898,Park,Baseball Field,Record Shop,Basketball Court,Garden,Disc Golf,Ethiopian Restaurant,Event Space,Exhibit
78,34.276391,Garden,Yoshoku Restaurant,Fast Food Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
100,34.239054,Yoshoku Restaurant,Football Stadium,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market
127,34.081887,Garden,Baseball Field,Basketball Court,Park,Disc Golf,Fish & Chips Shop,Fish Market,Filipino Restaurant,Fast Food Restaurant


Switching to San Diego

In [86]:
so_cal_path="/Users/thaddeus/Desktop/Desktop/Data Scientist/test.csv"
san_diego_neighborhoods_df=pd.DataFrame()
san_diego_neighborhoods_csv={}
san_diego_neighborhoods_csv=pd.read_csv(so_cal_path,header=0)
san_diego_neighborhoods_df=pd.DataFrame.from_dict(san_diego_neighborhoods_csv)
print(san_diego_neighborhoods_df)
san_diego_neighborhoods_df.columns=["Neighbourhood"]
san_diego_neighborhoods_df['Latitude']=0.0
san_diego_neighborhoods_df['Longitude']=0.0
san_diego_neighborhoods_df

                     Keys
0             Balboa Park
1            Bankers Hill
2            Barrio Logan
3                  Bay Ho
4                Bay Park
5                Birdland
6    Black Mountain Ranch
7                  Border
8              Burlingame
9   Carmel Mountain Ranch
10          Carmel Valley
11           City Heights
12             Clairemont
13           College Area
14        Del Mar Heights
15           Del Mar Mesa
16     Downtown San Diego
17           East Elliott
18             El Cerrito
19                Gateway
20            Golden Hill
21             Grant Hill
22          Harbor Island
23             Harborview
24              Hillcrest
25            Kearny Mesa
26             Kensington
27               La Jolla
28            Lake Murray
29            Linda Vista
..                    ...
68     University Heights
69                 Uptown
70                Webster
71             San Ysidro
72          Scripps Ranch
73    Miramar Ranch North
74  Scripps 

Unnamed: 0,Neighbourhood,Latitude,Longitude
0,Balboa Park,0.0,0.0
1,Bankers Hill,0.0,0.0
2,Barrio Logan,0.0,0.0
3,Bay Ho,0.0,0.0
4,Bay Park,0.0,0.0
5,Birdland,0.0,0.0
6,Black Mountain Ranch,0.0,0.0
7,Border,0.0,0.0
8,Burlingame,0.0,0.0
9,Carmel Mountain Ranch,0.0,0.0


In [87]:
a=len(san_diego_neighborhoods_df)

for b in range(a):
    geolocator = Nominatim(user_agent="so_cal_explorer")
    address = str(san_diego_neighborhoods_df.iloc[b,0]) + ', California'
    print (address)
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print('The geograpical coordinate of {} are {}, {}.'.format(address, latitude, longitude))
    san_diego_neighborhoods_df.iloc[b,1]=latitude
    san_diego_neighborhoods_df.iloc[b,2]=longitude
print(san_diego_neighborhoods_df)

Balboa Park, California
The geograpical coordinate of Balboa Park, California are 37.72494875, -122.444804524152.
Bankers Hill, California
The geograpical coordinate of Bankers Hill, California are 32.7260727, -117.1612254.
Barrio Logan, California
The geograpical coordinate of Barrio Logan, California are 32.697552, -117.1419765.
Bay Ho, California
The geograpical coordinate of Bay Ho, California are 36.7014631, -118.7559974.
Bay Park, California
The geograpical coordinate of Bay Park, California are 32.781716, -117.2064242.
Birdland, California
The geograpical coordinate of Birdland, California are 32.7882923, -117.1562231.
Black Mountain Ranch, California
The geograpical coordinate of Black Mountain Ranch, California are 32.9841169, -117.1319318.
Border, California
The geograpical coordinate of Border, California are 37.5346899, -122.508296.
Burlingame, California
The geograpical coordinate of Burlingame, California are 37.5841026, -122.3660825.
Carmel Mountain Ranch, California
The

The geograpical coordinate of Serra Mesa, California are 32.802827, -117.1383662.
Shelter Island, California
The geograpical coordinate of Shelter Island, California are 32.7113133, -117.2300319.
Sherman Heights, California
The geograpical coordinate of Sherman Heights, California are 32.7106073, -117.1422545.
Sorrento Mesa, California
The geograpical coordinate of Sorrento Mesa, California are 32.7678288, -117.0230839.
Sorrento Valley, California
The geograpical coordinate of Sorrento Valley, California are 32.9022964, -117.2247321.
South Park, California
The geograpical coordinate of South Park, California are 32.7201829, -117.1292638.
Southeast San Diego, California
The geograpical coordinate of Southeast San Diego, California are 32.7174209, -117.1627714.
Alta Vista, California
The geograpical coordinate of Alta Vista, California are 37.4127086, -118.5426178.
Bay Terraces, California
The geograpical coordinate of Bay Terraces, California are 32.6919137, -117.0366305.
Broadway Heigh

In [89]:
# create map of New York using latitude and longitude values
map_san_diego = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighbourhood in zip(san_diego_neighborhoods_df['Latitude'], san_diego_neighborhoods_df['Longitude'], san_diego_neighborhoods_df['Neighbourhood']):
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_san_diego)  
    
map_san_diego


In [91]:
san_diego_downtown_data = san_diego_neighborhoods_df[san_diego_neighborhoods_df['Neighbourhood'] == 'Downtown San Diego'].reset_index(drop=True)
san_diego_downtown_data.head()


Unnamed: 0,Neighbourhood,Latitude,Longitude
0,Downtown San Diego,33.125353,-117.075213


In [22]:
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentails:
CLIENT_ID: ULLRFE02RCBTO15S4HM0ABROE1YDT0Y2EJJEBZJWUCW0Q3VO
CLIENT_SECRET:21FEU3P3XFYFPWL4JQVEPPSFX0XSYJMH2YYSEPXYSWRZZYXQ


In [92]:
san_diego_downtown_data.loc[0, 'Neighbourhood']
#Get the neighborhood's latitude and longitude values.
neighbourhood_latitude = san_diego_downtown_data.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = san_diego_downtown_data.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = san_diego_downtown_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))


Latitude and longitude values of Downtown San Diego are 33.1253528, -117.075213451358.


In [96]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)
url # display URL

#'https://api.foursquare.com/v2/venues/explore?&client_id=ULLRFE02RCBTO15S4HM0ABROE1YDT0Y2EJJEBZJWUCW0Q3VO&client_secret=21FEU3P3XFYFPWL4JQVEPPSFX0XSYJMH2YYSEPXYSWRZZYXQ&v=20180605&ll=40.87655077879964,-73.91065965862981&radius=500&limit=100'

results = requests.get(url).json()

In [97]:
results

{'meta': {'code': 200, 'requestId': '5c6b2b80dd57977bd49c17d7'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Central Escondido',
  'headerFullLocation': 'Central Escondido, Escondido',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 10,
  'suggestedBounds': {'ne': {'lat': 33.129852804500004,
    'lng': -117.06985019657004},
   'sw': {'lat': 33.1208527955, 'lng': -117.08057670614595}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bc8b75592b376b08807523a',
       'name': 'The Sculpture Salon',
       'location': {'address': '401 E Grand Ave',
        'crossStreet': 'South Ivy Street',
        'lat': 33.123736747154794,
        'lng': -117.07724557044189,
    

In [98]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))


10 venues were returned by Foursquare.


In [100]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [101]:
san_diego_venues = getNearbyVenues(names=san_diego_neighborhoods_df['Neighbourhood'],
                                   latitudes=san_diego_neighborhoods_df['Latitude'],
                                   longitudes=san_diego_neighborhoods_df['Longitude']
                                  )

Balboa Park
Bankers Hill
Barrio Logan
Bay Ho
Bay Park
Birdland
Black Mountain Ranch
Border
Burlingame
Carmel Mountain Ranch
Carmel Valley
City Heights
Clairemont
College Area
Del Mar Heights
Del Mar Mesa
Downtown San Diego
East Elliott
El Cerrito
Gateway
Golden Hill
Grant Hill
Harbor Island
Harborview
Hillcrest
Kearny Mesa
Kensington
La Jolla
Lake Murray
Linda Vista
Logan Heights
Marston Hills
Memorial
Midtown
Mira Mesa
Miramar
Mission Beach
Mission Hills
Mission Valley
Civita
Morena
Navajo
Nestor
Normal Heights
North City
North Park
North Clairemont
Oak Park
Ocean Beach
Ocean Crest
Ocean View Hills
Old Town
Otay Mesa
Otay Mesa West
Pacific Beach
Pacific Highlands Ranch
Palm City
Point Loma
Rancho Bernardo
Rancho Encantada
Rancho Peñasquitos
Redwood Village
Rolando
Rolando Park
Sabre Springs
San Pasqual Valley
Torrey Hills
University City
University Heights
Uptown
Webster
San Ysidro
Scripps Ranch
Miramar Ranch North
Scripps Miramar Ranch
Serra Mesa
Shelter Island
Sherman Heights
Sorren

In [102]:
print(san_diego_venues.shape)
san_diego_venues.head()

(1715, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Balboa Park,37.724949,-122.444805,Roxie Food Center,37.726867,-122.441398,Sandwich Place
1,Balboa Park,37.724949,-122.444805,Pineapples,37.723219,-122.443128,Dessert Shop
2,Balboa Park,37.724949,-122.444805,Balboa Park,37.725014,-122.443879,Park
3,Balboa Park,37.724949,-122.444805,AJ's Barbeque Cafe,37.720757,-122.44628,BBQ Joint
4,Balboa Park,37.724949,-122.444805,City College: Community Health & Wellness Center,37.723996,-122.449445,College Gym


In [103]:
san_diego_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alta Vista,2,2,2,2,2,2
Balboa Park,15,15,15,15,15,15
Bankers Hill,25,25,25,25,25,25
Barrio Logan,32,32,32,32,32,32
Bay Park,17,17,17,17,17,17
Birdland,3,3,3,3,3,3
Black Mountain Ranch,2,2,2,2,2,2
Broadway Heights,4,4,4,4,4,4
Burlingame,34,34,34,34,34,34
Carmel Mountain Ranch,64,64,64,64,64,64


In [104]:
print('There are {} uniques categories.'.format(len(san_diego_venues['Venue Category'].unique())))


There are 271 uniques categories.


Analyze Each Neighborhood

In [105]:
# one hot encoding
san_diego_onehot = pd.get_dummies(san_diego_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
san_diego_onehot['Neighbourhood'] = san_diego_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [san_diego_onehot.columns[-1]] + list(san_diego_onehot.columns[:-1])
san_diego_onehot = san_diego_onehot[fixed_columns]

san_diego_onehot.head()

Unnamed: 0,Neighbourhood,ATM,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,Balboa Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Balboa Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Balboa Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Balboa Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Balboa Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [106]:
san_diego_onehot.shape

(1715, 272)

In [107]:
san_diego_grouped = san_diego_onehot.groupby('Neighbourhood').mean().reset_index()
san_diego_grouped

Unnamed: 0,Neighbourhood,ATM,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,Alta Vista,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
1,Balboa Park,0.000000,0.000000,0.00000,0.066667,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
2,Bankers Hill,0.000000,0.000000,0.00000,0.080000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.040000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
3,Barrio Logan,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.062500,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
4,Bay Park,0.000000,0.000000,0.00000,0.058824,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
5,Birdland,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
6,Black Mountain Ranch,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
7,Broadway Heights,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
8,Burlingame,0.000000,0.000000,0.00000,0.029412,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.029412,0.000000,0.000000,0.0,0.000000,0.0
9,Carmel Mountain Ranch,0.015625,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.015625,...,0.015625,0.000000,0.0,0.000000,0.000000,0.000000,0.015625,0.0,0.000000,0.0


In [108]:
san_diego_grouped.shape

(91, 272)

In [109]:
num_top_venues = 5

for hood in san_diego_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = san_diego_grouped[san_diego_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alta Vista----
               venue  freq
0       Home Service   0.5
1              River   0.5
2  Other Repair Shop   0.0
3       Optical Shop   0.0
4       Noodle House   0.0


----Balboa Park----
                venue  freq
0                Pool  0.07
1  Light Rail Station  0.07
2        Tennis Court  0.07
3      Sandwich Place  0.07
4         College Gym  0.07


----Bankers Hill----
                  venue  freq
0                   Spa  0.08
1   American Restaurant  0.08
2               Gay Bar  0.04
3       Bed & Breakfast  0.04
4  Gym / Fitness Center  0.04


----Barrio Logan----
                venue  freq
0  Mexican Restaurant  0.12
1         Coffee Shop  0.09
2             Brewery  0.09
3         Art Gallery  0.06
4       Grocery Store  0.06


----Bay Park----
                     venue  freq
0                      Spa  0.12
1                    Beach  0.12
2  New American Restaurant  0.06
3                      Bay  0.06
4                      Gym  0.06


----Birdland----

                venue  freq
0      Breakfast Spot  0.14
1               Beach  0.09
2   Recreation Center  0.05
3  Falafel Restaurant  0.05
4          Steakhouse  0.05


----Mission Hills----
                    venue  freq
0                     Spa  0.25
1  Furniture / Home Store  0.25
2            Liquor Store  0.25
3               Gift Shop  0.25
4                     ATM  0.00


----Mission Valley----
                    venue  freq
0             Coffee Shop  0.07
1  Furniture / Home Store  0.04
2          Cosmetics Shop  0.04
3      Seafood Restaurant  0.03
4      Mexican Restaurant  0.03


----Morena----
         venue  freq
0        Beach  0.07
1          Spa  0.07
2  Pizza Place  0.07
3   Restaurant  0.07
4   Sports Bar  0.04


----Mountain View----
              venue  freq
0       Coffee Shop  0.07
1  Sushi Restaurant  0.05
2            Bakery  0.05
3              Park  0.05
4      Optical Shop  0.04


----Mt. Hope----
                venue  freq
0        Home Service  0.25
1

In [110]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [111]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = san_diego_grouped['Neighbourhood']

for ind in np.arange(san_diego_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(san_diego_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alta Vista,River,Home Service,Zoo Exhibit,Fast Food Restaurant,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Farm,Farmers Market,Field
1,Balboa Park,Light Rail Station,Sandwich Place,College Gym,Gym Pool,BBQ Joint,Bus Line,Metro Station,Asian Restaurant,Tennis Court,Pool
2,Bankers Hill,American Restaurant,Spa,Mexican Restaurant,Dive Bar,Sushi Restaurant,Taco Place,Marijuana Dispensary,Bed & Breakfast,Motel,Middle Eastern Restaurant
3,Barrio Logan,Mexican Restaurant,Coffee Shop,Brewery,Art Gallery,Grocery Store,Mobile Phone Shop,Theater,Taco Place,Flea Market,Hot Dog Joint
4,Bay Park,Spa,Beach,Mexican Restaurant,Café,Bay,Restaurant,Butcher,New American Restaurant,Gym,Bike Trail


In [132]:
san_diego_grouped

Unnamed: 0,Neighbourhood,ATM,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,Alta Vista,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
1,Balboa Park,0.000000,0.000000,0.00000,0.066667,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
2,Bankers Hill,0.000000,0.000000,0.00000,0.080000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.040000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
3,Barrio Logan,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.062500,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
4,Bay Park,0.000000,0.000000,0.00000,0.058824,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
5,Birdland,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
6,Black Mountain Ranch,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
7,Broadway Heights,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
8,Burlingame,0.000000,0.000000,0.00000,0.029412,0.0,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.029412,0.000000,0.000000,0.0,0.000000,0.0
9,Carmel Mountain Ranch,0.015625,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.015625,...,0.015625,0.000000,0.0,0.000000,0.000000,0.000000,0.015625,0.0,0.000000,0.0


In [128]:
# set number of clusters
kclusters = 9
kmeans=''
san_diego_grouped_clustering = san_diego_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(san_diego_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 0, 1, 0, 6, 1, 1, 1], dtype=int32)

In [133]:
san_diego_grouped_clustering

Unnamed: 0,ATM,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,...,Video Store,Vietnamese Restaurant,Water Park,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
1,0.000000,0.000000,0.00000,0.066667,0.0,0.000000,0.000000,0.000000,0.000000,0.066667,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
2,0.000000,0.000000,0.00000,0.080000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.040000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
3,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.062500,0.000000,0.031250,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
4,0.000000,0.000000,0.00000,0.058824,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
5,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
6,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
7,0.000000,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0
8,0.000000,0.000000,0.00000,0.029412,0.0,0.000000,0.000000,0.000000,0.000000,0.029412,...,0.000000,0.000000,0.0,0.000000,0.029412,0.000000,0.000000,0.0,0.000000,0.0
9,0.015625,0.000000,0.00000,0.000000,0.0,0.000000,0.000000,0.000000,0.015625,0.000000,...,0.015625,0.000000,0.0,0.000000,0.000000,0.000000,0.015625,0.0,0.000000,0.0


In [137]:
# add clustering labels

#del neighbourhoods_venues_sorted['Cluster Labels']
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

san_diego_merged = san_diego_neighborhoods_df

# merge san_diego_grouped with san_diego_data to add latitude/longitude for each neighborhood
san_diego_merged = san_diego_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

san_diego_merged

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Balboa Park,37.724949,-122.444805,1.0,Light Rail Station,Sandwich Place,College Gym,Gym Pool,BBQ Joint,Bus Line,Metro Station,Asian Restaurant,Tennis Court,Pool
1,Bankers Hill,32.726073,-117.161225,1.0,American Restaurant,Spa,Mexican Restaurant,Dive Bar,Sushi Restaurant,Taco Place,Marijuana Dispensary,Bed & Breakfast,Motel,Middle Eastern Restaurant
2,Barrio Logan,32.697552,-117.141976,0.0,Mexican Restaurant,Coffee Shop,Brewery,Art Gallery,Grocery Store,Mobile Phone Shop,Theater,Taco Place,Flea Market,Hot Dog Joint
3,Bay Ho,36.701463,-118.755997,,,,,,,,,,,
4,Bay Park,32.781716,-117.206424,1.0,Spa,Beach,Mexican Restaurant,Café,Bay,Restaurant,Butcher,New American Restaurant,Gym,Bike Trail
5,Birdland,32.788292,-117.156223,0.0,Intersection,Massage Studio,Doctor's Office,Zoo Exhibit,Field,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
6,Black Mountain Ranch,32.984117,-117.131932,6.0,Park,Zoo Exhibit,Field,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Electronics Store
7,Border,37.534690,-122.508296,,,,,,,,,,,
8,Burlingame,37.584103,-122.366083,1.0,Japanese Restaurant,Breakfast Spot,Sandwich Place,Italian Restaurant,Coffee Shop,Korean Restaurant,Thai Restaurant,Bank,Grocery Store,Greek Restaurant
9,Carmel Mountain Ranch,32.980393,-117.078364,1.0,Fast Food Restaurant,Mexican Restaurant,Coffee Shop,Juice Bar,Chinese Restaurant,Pizza Place,BBQ Joint,Clothing Store,Cosmetics Shop,Burger Joint


In [138]:
san_diego_merged.dropna(inplace=True)
san_diego_merged


Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Balboa Park,37.724949,-122.444805,1.0,Light Rail Station,Sandwich Place,College Gym,Gym Pool,BBQ Joint,Bus Line,Metro Station,Asian Restaurant,Tennis Court,Pool
1,Bankers Hill,32.726073,-117.161225,1.0,American Restaurant,Spa,Mexican Restaurant,Dive Bar,Sushi Restaurant,Taco Place,Marijuana Dispensary,Bed & Breakfast,Motel,Middle Eastern Restaurant
2,Barrio Logan,32.697552,-117.141976,0.0,Mexican Restaurant,Coffee Shop,Brewery,Art Gallery,Grocery Store,Mobile Phone Shop,Theater,Taco Place,Flea Market,Hot Dog Joint
4,Bay Park,32.781716,-117.206424,1.0,Spa,Beach,Mexican Restaurant,Café,Bay,Restaurant,Butcher,New American Restaurant,Gym,Bike Trail
5,Birdland,32.788292,-117.156223,0.0,Intersection,Massage Studio,Doctor's Office,Zoo Exhibit,Field,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
6,Black Mountain Ranch,32.984117,-117.131932,6.0,Park,Zoo Exhibit,Field,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Electronics Store
8,Burlingame,37.584103,-122.366083,1.0,Japanese Restaurant,Breakfast Spot,Sandwich Place,Italian Restaurant,Coffee Shop,Korean Restaurant,Thai Restaurant,Bank,Grocery Store,Greek Restaurant
9,Carmel Mountain Ranch,32.980393,-117.078364,1.0,Fast Food Restaurant,Mexican Restaurant,Coffee Shop,Juice Bar,Chinese Restaurant,Pizza Place,BBQ Joint,Clothing Store,Cosmetics Shop,Burger Joint
10,Carmel Valley,36.479684,-121.732448,1.0,Wine Bar,Mexican Restaurant,French Restaurant,Winery,American Restaurant,Hotel,Steakhouse,Café,Thai Restaurant,Gourmet Shop
11,City Heights,32.749728,-117.101029,1.0,Pizza Place,Chinese Restaurant,Mexican Restaurant,Vietnamese Restaurant,Grocery Store,Sandwich Place,ATM,Farmers Market,Taco Place,Liquor Store


In [142]:
san_diego_merged['Cluster Labels'] = san_diego_merged['Cluster Labels'].astype(int)
san_diego_merged

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Balboa Park,37.724949,-122.444805,1,Light Rail Station,Sandwich Place,College Gym,Gym Pool,BBQ Joint,Bus Line,Metro Station,Asian Restaurant,Tennis Court,Pool
1,Bankers Hill,32.726073,-117.161225,1,American Restaurant,Spa,Mexican Restaurant,Dive Bar,Sushi Restaurant,Taco Place,Marijuana Dispensary,Bed & Breakfast,Motel,Middle Eastern Restaurant
2,Barrio Logan,32.697552,-117.141976,0,Mexican Restaurant,Coffee Shop,Brewery,Art Gallery,Grocery Store,Mobile Phone Shop,Theater,Taco Place,Flea Market,Hot Dog Joint
4,Bay Park,32.781716,-117.206424,1,Spa,Beach,Mexican Restaurant,Café,Bay,Restaurant,Butcher,New American Restaurant,Gym,Bike Trail
5,Birdland,32.788292,-117.156223,0,Intersection,Massage Studio,Doctor's Office,Zoo Exhibit,Field,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
6,Black Mountain Ranch,32.984117,-117.131932,6,Park,Zoo Exhibit,Field,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Electronics Store
8,Burlingame,37.584103,-122.366083,1,Japanese Restaurant,Breakfast Spot,Sandwich Place,Italian Restaurant,Coffee Shop,Korean Restaurant,Thai Restaurant,Bank,Grocery Store,Greek Restaurant
9,Carmel Mountain Ranch,32.980393,-117.078364,1,Fast Food Restaurant,Mexican Restaurant,Coffee Shop,Juice Bar,Chinese Restaurant,Pizza Place,BBQ Joint,Clothing Store,Cosmetics Shop,Burger Joint
10,Carmel Valley,36.479684,-121.732448,1,Wine Bar,Mexican Restaurant,French Restaurant,Winery,American Restaurant,Hotel,Steakhouse,Café,Thai Restaurant,Gourmet Shop
11,City Heights,32.749728,-117.101029,1,Pizza Place,Chinese Restaurant,Mexican Restaurant,Vietnamese Restaurant,Grocery Store,Sandwich Place,ATM,Farmers Market,Taco Place,Liquor Store


In [143]:
san_diego_merged.groupby('Cluster Labels').count()

Unnamed: 0_level_0,Neighbourhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
0,17,17,17,17,17,17,17,17,17,17,17,17,17
1,63,63,63,63,63,63,63,63,63,63,63,63,63
2,3,3,3,3,3,3,3,3,3,3,3,3,3
3,1,1,1,1,1,1,1,1,1,1,1,1,1
4,1,1,1,1,1,1,1,1,1,1,1,1,1
5,1,1,1,1,1,1,1,1,1,1,1,1,1
6,3,3,3,3,3,3,3,3,3,3,3,3,3
7,1,1,1,1,1,1,1,1,1,1,1,1,1
8,1,1,1,1,1,1,1,1,1,1,1,1,1


In [144]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(san_diego_merged['Latitude'], san_diego_merged['Longitude'], san_diego_merged['Neighbourhood'], san_diego_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [145]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 0, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,32.697552,Coffee Shop,Brewery,Art Gallery,Grocery Store,Mobile Phone Shop,Theater,Taco Place,Flea Market,Hot Dog Joint
5,32.788292,Massage Studio,Doctor's Office,Zoo Exhibit,Field,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
21,32.709496,Discount Store,Indie Theater,BBQ Joint,Intersection,Historic Site,Fast Food Restaurant,Park,Restaurant,Bus Station
30,32.699219,Fast Food Restaurant,Baseball Field,Coffee Shop,Skate Park,Frozen Yogurt Shop,Sandwich Place,Flower Shop,Chinese Restaurant,Grocery Store
42,32.575921,Motel,Convenience Store,Gas Station,Diner,Fast Food Restaurant,Laundromat,Auto Workshop,Flea Market,Pizza Place
52,32.560058,Coffee Shop,Sandwich Place,Hotel,Marijuana Dispensary,Mexican Restaurant,Fruit & Vegetable Store,Falafel Restaurant,Farm,Farmers Market
53,32.573391,Convenience Store,Mexican Restaurant,Fried Chicken Joint,Video Store,Market,Park,Zoo Exhibit,Fast Food Restaurant,Exhibit
55,32.969319,Zoo Exhibit,Fish & Chips Shop,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fishing Spot
71,32.552001,Motel,Insurance Office,Video Store,Market,Marijuana Dispensary,Gas Station,Liquor Store,Diner,Food
77,32.710607,Hotel,Bar,Park,Discount Store,Coffee Shop,Restaurant,Seafood Restaurant,Fast Food Restaurant,Spa


In [146]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 1, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]



Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,37.724949,Sandwich Place,College Gym,Gym Pool,BBQ Joint,Bus Line,Metro Station,Asian Restaurant,Tennis Court,Pool
1,32.726073,Spa,Mexican Restaurant,Dive Bar,Sushi Restaurant,Taco Place,Marijuana Dispensary,Bed & Breakfast,Motel,Middle Eastern Restaurant
4,32.781716,Beach,Mexican Restaurant,Café,Bay,Restaurant,Butcher,New American Restaurant,Gym,Bike Trail
8,37.584103,Breakfast Spot,Sandwich Place,Italian Restaurant,Coffee Shop,Korean Restaurant,Thai Restaurant,Bank,Grocery Store,Greek Restaurant
9,32.980393,Mexican Restaurant,Coffee Shop,Juice Bar,Chinese Restaurant,Pizza Place,BBQ Joint,Clothing Store,Cosmetics Shop,Burger Joint
10,36.479684,Mexican Restaurant,French Restaurant,Winery,American Restaurant,Hotel,Steakhouse,Café,Thai Restaurant,Gourmet Shop
11,32.749728,Chinese Restaurant,Mexican Restaurant,Vietnamese Restaurant,Grocery Store,Sandwich Place,ATM,Farmers Market,Taco Place,Liquor Store
12,32.797271,Convenience Store,BBQ Joint,Pizza Place,Dive Bar,Coffee Shop,Chinese Restaurant,Café,Shipping Store,Fast Food Restaurant
13,10.481162,Italian Restaurant,Sushi Restaurant,Nightclub,Mediterranean Restaurant,Spanish Restaurant,Ice Cream Shop,Asian Restaurant,Café,Burger Joint
14,32.948378,Yoga Studio,Hotel,American Restaurant,Beach,Park,Surf Spot,Fish & Chips Shop,Falafel Restaurant,Farm


In [147]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 2, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]



Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,32.941434,Zoo Exhibit,Field,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Electronics Store
26,37.91048,Park,Playground,Trail,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
64,32.953379,Trail,Park,Electronics Store,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field


In [148]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 3, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
67,32.882615,Field,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Zoo Exhibit,Indian Restaurant


In [149]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 4, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
70,38.562128,Electronics Store,Food & Drink Shop,Food,Fondue Restaurant,Flower Shop,Flea Market,Fishing Spot,Fish & Chips Shop,Field


In [150]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 5, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]



Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,33.61061,Zoo Exhibit,Field,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Electronics Store


In [151]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 6, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]



Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,32.984117,Zoo Exhibit,Field,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Electronics Store
59,32.924815,Zoo Exhibit,Field,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Electronics Store
61,33.457136,Park,Field,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Zoo Exhibit,Ethiopian Restaurant


In [152]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 7, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,32.8335,Zoo Exhibit,Fish & Chips Shop,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fishing Spot


In [154]:
san_diego_merged.loc[san_diego_merged['Cluster Labels'] == 8, san_diego_merged.columns[[1] + list(range(5, san_diego_merged.shape[1]))]]


Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
41,32.800908,Field,Ethiopian Restaurant,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Indian Restaurant


### Results
Unfortunately the results are less than optimal.  When using the clustering program we get a suboptimal result that groups both the Los Angeles and San Diego in two or three major clusters but then the rest of the clusters are much smaller or single outliers.

When reviewing the major groupings we get three major groups.

One cluster representing a large majority of neighborhoods that correspond to a urbanized city cluster

One Cluster representing a middle or lower middle income level neighborhood, and

Once Cluster that represents a more suburban neighborhood with more greenspace.

Other Clusters seem to be outliers without a large enough sample size to generalized a particular character to the neighborhood.

This is not an acceptable result as it does not provide anything but a common knowledge level of understanding to our prospective home buyer.

### Next Steps
The next steps to this project would to be evaluate and eliminate the outliers that are causing the low information clusters.  Once we resolve these low information data points that are skewing our results we can see if our we can get a more nuanced analysis of the neighborhoods that would be of more use to our prospective home buyers.

### Conclusions
A further investigation of the data is required before a working model coud be developed or if this methodology can provide the answers and tools we are looking for.