### Where is the recommended place to open a restaurant or a coffee shop, or to start a new business?

## Introduction

In this project, I provide a general guide on where to open a restaurant or a coffee shop, or to set up an office for a new business, based on the venues found in each neighborhood. The **Foursquare API** is used to explore the neighborhoods of a particular city, and its **explore** endpoint is used to get the most common venue categories in each neighborhood. After this project, you will have a general idea of how to choose the location for your business.

## Table of Contents

#### 1. Import Dataset

#### 2. Analysis of the cities in Los Angeles, CA

#### 3. Analyze venue category in each city (Neighborhood) 

#### 4. Cluster Neighborhoods

#### 5. Examine Each of the Five Clusters

#### 6. Conclusions & Recommendation    

Import libraries that are used in this project

In [3]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated since pandas 1.0)

from sklearn.cluster import KMeans  # k-means for the clustering stage

import matplotlib.cm as cm   # Matplotlib and associated plotting modules
import matplotlib.colors as colors
from matplotlib import pyplot as plt

import folium # map rendering library

## 1. Import Dataset: Get the names of all cities and towns in California from Wikipedia

In [4]:
url_cal = 'https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_California'
# the table on the page above was copied to the clipboard before running this cell
California_raw = pd.read_clipboard()

In [5]:
California_raw.head()

Unnamed: 0,Adelanto,City,San Bernardino,"31,765",56.01,145.1,"December 22, 1970"
0,Agoura Hills,City,Los Angeles,20330,7.79,20.2,"December 8, 1982"
1,Alameda,City,Alameda,73812,10.61,27.5,"April 19, 1854"
2,Albany,City,Alameda,18539,1.79,4.6,"September 22, 1908"
3,Alhambra,City,Los Angeles,83089,7.63,19.8,"July 11, 1903"
4,Aliso Viejo,City,Orange,47823,7.47,19.3,"July 1, 2001"


In [7]:
len(California_raw)

481

In [8]:
# the first city was read as the header row; move it back into the data
California_update = np.vstack([California_raw.columns, California_raw])
California_update = pd.DataFrame(California_update)
California_update.head()

Unnamed: 0,0,1,2,3,4,5,6
0,Adelanto,City,San Bernardino,31765,56.01,145.1,"December 22, 1970"
1,Agoura Hills,City,Los Angeles,20330,7.79,20.2,"December 8, 1982"
2,Alameda,City,Alameda,73812,10.61,27.5,"April 19, 1854"
3,Albany,City,Alameda,18539,1.79,4.6,"September 22, 1908"
4,Alhambra,City,Los Angeles,83089,7.63,19.8,"July 11, 1903"


In [9]:
California_update.sort_values([2, 0]).head()
California_update.columns = ['Name', 
                             'Type', 
                             'County', 
                             'Population(2010)', 
                             'Land area (sq mi)', 
                             'Land area (km^2)', 
                             'Incorporated']

In [10]:
California_update.head()

Unnamed: 0,Name,Type,County,Population(2010),Land area (sq mi),Land area (km^2),Incorporated
0,Adelanto,City,San Bernardino,31765,56.01,145.1,"December 22, 1970"
1,Agoura Hills,City,Los Angeles,20330,7.79,20.2,"December 8, 1982"
2,Alameda,City,Alameda,73812,10.61,27.5,"April 19, 1854"
3,Albany,City,Alameda,18539,1.79,4.6,"September 22, 1908"
4,Alhambra,City,Los Angeles,83089,7.63,19.8,"July 11, 1903"
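
The header fix above can also be avoided at read time. The sketch below assumes a tab-separated copy of the table (the two sample rows are shortened stand-ins, not the full data) and uses `header=None` with `names=` so that the first city stays in the data. `pd.read_clipboard` forwards its keyword arguments to `pd.read_csv`, so the same two arguments should work there as well.

```python
import io
import pandas as pd

# Shortened, hypothetical stand-in for the copied Wikipedia table (tab-separated, no header row)
sample = (
    "Adelanto\tCity\tSan Bernardino\n"
    "Agoura Hills\tCity\tLos Angeles\n"
)

# header=None keeps the first line as data; names= supplies the column labels
cities = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=["Name", "Type", "County"],
)

print(cities)
```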


Get the total number of counties and the corresponding number of cities (or towns) within each.

In [11]:
total_counties = California_update['County'].value_counts()
len(total_counties)

55

There are a total of 55 counties and 482 cities and towns in California (the 481 rows read above plus the first city, which had been read as the header). Instead of covering all of them, I will focus only on Los Angeles County, which is famous for its diversity. I am interested in how many venue categories there are, what these categories are, and how they are distributed.

Get the cities in Los Angeles County.

In [190]:
# Notes: create a dataframe for the top 5 counties, where the cities are not index

#df1 = pd.DataFrame(data = top_5_counties.index, columns = ['County'])
#df2 = pd.DataFrame(data = top_5_counties.values, columns = ['Number of Cities'])
#top_5_counties = pd.concat([df1, df2], axis = 1)

# Notes: get the cities (or towns) when county is in ['Los Angeles', 'Orange']
# California_update[California_update['County'].isin(['Los Angeles', 'Orange'])]

# California_df = California_update[California_update['County'].isin(top_5_counties.index)]
# California_df.head()

In [12]:
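# get the cities (or towns) when county = 'Los Angeles'
# .copy() makes an independent dataframe, so adding columns later does not trigger a SettingWithCopyWarning
Los_Angeles_df = California_update[California_update['County'] == 'Los Angeles'].copy()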

In [13]:
Los_Angeles_df.head()

Unnamed: 0,Name,Type,County,Population(2010),Land area (sq mi),Land area (km^2),Incorporated
1,Agoura Hills,City,Los Angeles,20330,7.79,20.2,"December 8, 1982"
4,Alhambra,City,Los Angeles,83089,7.63,19.8,"July 11, 1903"
14,Arcadia,City,Los Angeles,56364,10.93,28.3,"August 5, 1903"
17,Artesia,City,Los Angeles,16522,1.62,4.2,"May 29, 1959"
23,Avalon,City,Los Angeles,3728,2.94,7.6,"June 26, 1913"


In [15]:
len(Los_Angeles_df)

88

There are 88 cities in Los Angeles.

## 2. Analysis of the cities in Los Angeles

#### Define Foursquare Credentials and Version

In [36]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret

VERSION = '20190105' # Foursquare API version

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

My credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
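
Credentials are easier to keep out of a shared notebook when they are read from the environment instead of being typed into a cell. The sketch below is one possible pattern using only the standard library; the helper `read_credential` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions for illustration.

```python
import os

def read_credential(env_name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(env_name)
    if not value:
        raise RuntimeError('Set the {} environment variable first.'.format(env_name))
    return value

# Hypothetical variable names; set them in the shell before starting the notebook
# CLIENT_ID = read_credential('FOURSQUARE_CLIENT_ID')
# CLIENT_SECRET = read_credential('FOURSQUARE_CLIENT_SECRET')
```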


#### Use the geopy library to get the latitude and longitude values for all the cities (towns) in Los Angeles.

In [37]:
Latitude = []
Longitude = []
geolocator = Nominatim(user_agent = 'specify_your_app_name_here')  # create the geocoder once
for address in Los_Angeles_df['Name']:
    try:   
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
    except Exception:
        # assign NaN to lat, lng if the lookup fails or returns no match
        latitude = np.nan
        longitude = np.nan
    Latitude.append(latitude)
    Longitude.append(longitude)
#print(Latitude)
#print(Longitude)

In [38]:
Los_Angeles_df.loc[:, 'Latitude'] = Latitude
Los_Angeles_df.loc[:, 'Longitude'] = Longitude

In [283]:
# latitude = [x for x in Latitude if x != np.NaN]
# latitude = Latitude_copy.remove('np.nan')

In [39]:
Los_Angeles_df.head()

Unnamed: 0,Name,Type,County,Population(2010),Land area (sq mi),Land area (km^2),Incorporated,Latitude,Longitude
0,Agoura Hills,City,Los Angeles,20330,7.79,20.2,"December 8, 1982",34.136395,-118.774535
1,Alhambra,City,Los Angeles,83089,7.63,19.8,"July 11, 1903",37.176036,-3.587974
2,Arcadia,City,Los Angeles,56364,10.93,28.3,"August 5, 1903",34.136207,-118.04015
3,Artesia,City,Los Angeles,16522,1.62,4.2,"May 29, 1959",33.86902,-118.07962
4,Avalon,City,Los Angeles,3728,2.94,7.6,"June 26, 1913",47.488537,3.907066
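
Two rows in the table above stand out: Alhambra (37.18, -3.59) and Avalon (47.49, 3.91) have coordinates far from Southern California, which suggests the geocoder matched places of the same name elsewhere. A simple sanity check is to compare each result against a rough bounding box. The sketch below uses the five rows shown above; the box limits are approximate assumptions, not official county boundaries.

```python
# Rough bounding box around Los Angeles County (assumed limits, in degrees)
LAT_MIN, LAT_MAX = 32.5, 35.0
LNG_MIN, LNG_MAX = -119.5, -117.0

# (name, latitude, longitude) copied from the table above
geocoded = [
    ('Agoura Hills', 34.136395, -118.774535),
    ('Alhambra', 37.176036, -3.587974),
    ('Arcadia', 34.136207, -118.04015),
    ('Artesia', 33.86902, -118.07962),
    ('Avalon', 47.488537, 3.907066),
]

def inside_box(lat, lng):
    """True when the point falls inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX

# Names whose coordinates fall outside the box and deserve a second look
suspicious = [name for name, lat, lng in geocoded if not inside_box(lat, lng)]
print(suspicious)
```

Rows flagged this way could be geocoded again with a more specific query, for example the city name followed by `, California, USA`.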


Check missing values

In [40]:
Los_Angeles_df.isnull().sum()

Name                 0
Type                 0
County               0
Population(2010)     0
Land area (sq mi)    0
Land area (km^2)     0
Incorporated         0
Latitude             0
Longitude            0
dtype: int64

Remove the rows with missing values (cities where the longitude and latitude are not available)

In [41]:
Los_Angeles_df = Los_Angeles_df.dropna().reset_index(drop = True)  # dropna() drops all rows that have any NaN values
Los_Angeles_df.head()

Unnamed: 0,Name,Type,County,Population(2010),Land area (sq mi),Land area (km^2),Incorporated,Latitude,Longitude
0,Agoura Hills,City,Los Angeles,20330,7.79,20.2,"December 8, 1982",34.136395,-118.774535
1,Alhambra,City,Los Angeles,83089,7.63,19.8,"July 11, 1903",37.176036,-3.587974
2,Arcadia,City,Los Angeles,56364,10.93,28.3,"August 5, 1903",34.136207,-118.04015
3,Artesia,City,Los Angeles,16522,1.62,4.2,"May 29, 1959",33.86902,-118.07962
4,Avalon,City,Los Angeles,3728,2.94,7.6,"June 26, 1913",47.488537,3.907066


In [42]:
len(Los_Angeles_df)

86

In [43]:
Los_Angeles_df.isnull().sum()

Name                 0
Type                 0
County               0
Population(2010)     0
Land area (sq mi)    0
Land area (km^2)     0
Incorporated         0
Latitude             0
Longitude            0
dtype: int64

### Part 1: Create a map to visualize the cities in Los Angeles

In [44]:
# create map of all cities in Los Angeles
# note: latitude and longitude still hold the values of the last city geocoded above
map_Los_Angeles = folium.Map(location = [latitude, longitude])

# add markers to map
for lat, lng, County, City in zip(Los_Angeles_df['Latitude'], 
                                  Los_Angeles_df['Longitude'], 
                                  Los_Angeles_df['County'], 
                                  Los_Angeles_df['Name']):
    label = '{}, {}'.format(City, County)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
    [lat, lng], 
    radius = 5, 
    popup = label,
    fill = True, 
    fill_color = '#3186cc',
    fill_opacity = 0.7, 
    parse_html = False).add_to(map_Los_Angeles)
    
map_Los_Angeles

**Folium** is a great visualization library. You can zoom into the map above and click each circle marker to reveal the name of the city and its county.

However, this **folium** map of **all cities in Los Angeles** may not render on **GitHub**.

### Part 2: Explore the venues from the first city in Los Angeles step by step

Get the first neighborhood's name

In [45]:
Los_Angeles_df.loc[0, 'Name'] # get the information of the first row, column = 'Name'

'Agoura Hills'

Get the neighborhood's latitude and longitude

In [46]:
neighborhood_latitude = Los_Angeles_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Los_Angeles_df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = Los_Angeles_df.loc[0, 'Name'] # neighborhood name

print('The latitude and longitude of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

The latitude and longitude of Agoura Hills are 34.1363945, -118.7745348.


#### Extract up to 500 venues that are in this neighborhood within a radius of 1000 meters.

##### **Step 1**: Get the URL for the Foursquare API

In [51]:
LIMIT = 500 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius in meters
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&v=20190105&ll=34.1363945,-118.7745348&radius=1000&limit=500'
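
The query string can also be assembled from a dictionary, which avoids counting `{}` placeholders. This is a minimal sketch with the standard library's `urllib.parse.urlencode`; the helper name `build_explore_url` and the credential values are placeholders, and `safe=','` keeps the comma in `ll` unescaped.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes special characters and joins the pairs with '&'
    return BASE_URL + '?' + urlencode(params, safe=',')

example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20190105',
                                34.1363945, -118.7745348, 1000, 500)
print(example_url)
```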

##### **Step 2**: Send the GET request and extract the information step by step

**a.** get the overall results

In [52]:
result = requests.get(url).json()
result.keys()

dict_keys(['meta', 'response'])

In [53]:
result

{'meta': {'code': 200, 'requestId': '5c3279d79fb6b7151562ee0f'},
 'response': {'headerLocation': 'Agoura Hills',
  'headerFullLocation': 'Agoura Hills',
  'headerLocationGranularity': 'city',
  'totalResults': 7,
  'suggestedBounds': {'ne': {'lat': 34.14539450900001,
    'lng': -118.76368163940467},
   'sw': {'lat': 34.12739449099999, 'lng': -118.78538796059533}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b1ae736f964a52055f423e3',
       'name': 'Sheraton Agoura Hills Hotel',
       'location': {'address': '30100 Agoura Rd',
        'crossStreet': 'Reyes Adobe Rd',
        'lat': 34.1443363,
        'lng': -118.7794874,
        'labeledLatLngs': [{'label': 'display',
          'lat': 34.1443363,
          'lng': -118.7794874}],
        'distance': 994,
        'po

**b.** get the venues from the **key: response**

In [54]:
# display the information for the first venue
venues = result['response']['groups'][0]['items']
venues[0:1]

[{'reasons': {'count': 0,
   'items': [{'summary': 'This spot is popular',
     'type': 'general',
     'reasonName': 'globalInteractionReason'}]},
  'venue': {'id': '4b1ae736f964a52055f423e3',
   'name': 'Sheraton Agoura Hills Hotel',
   'location': {'address': '30100 Agoura Rd',
    'crossStreet': 'Reyes Adobe Rd',
    'lat': 34.1443363,
    'lng': -118.7794874,
    'labeledLatLngs': [{'label': 'display',
      'lat': 34.1443363,
      'lng': -118.7794874}],
    'distance': 994,
    'postalCode': '91301',
    'cc': 'US',
    'city': 'Agoura Hills',
    'state': 'CA',
    'country': 'United States',
    'formattedAddress': ['30100 Agoura Rd (Reyes Adobe Rd)',
     'Agoura Hills, CA 91301',
     'United States']},
   'categories': [{'id': '4bf58dd8d48988d1fa931735',
     'name': 'Hotel',
     'pluralName': 'Hotels',
     'shortName': 'Hotel',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hotel_',
      'suffix': '.png'},
     'primary': True}],
   'photos': {'c
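
Indexing straight into `result['response']['groups'][0]['items']` raises an error when a request fails or returns no groups. The sketch below is one defensive way to unpack the response; the helper name `extract_items` is hypothetical, and the sample dictionary only mimics the nesting printed above.

```python
def extract_items(result):
    """Return the list of venue items, or an empty list if the response is unusable."""
    if result.get('meta', {}).get('code') != 200:
        return []
    groups = result.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])

# Trimmed sample with the same nesting as the real response
sample_result = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [{'venue': {'name': 'Sheraton Agoura Hills Hotel'}}]}]},
}

print([item['venue']['name'] for item in extract_items(sample_result)])
print(extract_items({'meta': {'code': 429}}))
```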

**c.** flatten JSON and structure it into a **pandas** dataframe.

In [55]:
nearby_venues = json_normalize(venues)
nearby_venues.head()

Unnamed: 0,reasons.count,reasons.items,referralId,venue.categories,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,venue.location.crossStreet,venue.location.distance,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups
0,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4b1ae736f964a52055f423e3-0,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4b1ae736f964a52055f423e3,30100 Agoura Rd,US,Agoura Hills,United States,Reyes Adobe Rd,994,"[30100 Agoura Rd (Reyes Adobe Rd), Agoura Hill...","[{'label': 'display', 'lat': 34.1443363, 'lng'...",34.144336,-118.779487,91301.0,CA,Sheraton Agoura Hills Hotel,0,[]
1,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4bd4c4c629eb9c74325792e1-1,"[{'id': '52e81612bcbc57f1066b7a10', 'name': 'S...",4bd4c4c629eb9c74325792e1,,US,Agoura Hills,United States,,871,"[Agoura Hills, CA 91301, United States]","[{'label': 'display', 'lat': 34.14365209612649...",34.143652,-118.770992,91301.0,CA,Camp Kinneret,0,[]
2,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-51847d44498ee416278bf743-2,"[{'id': '4bf58dd8d48988d1d5941735', 'name': 'H...",51847d44498ee416278bf743,,US,Agoura Hills,United States,,961,"[Agoura Hills, CA, United States]","[{'label': 'display', 'lat': 34.14409893383497...",34.144099,-118.779251,,CA,Liquid Lounge At The Sheraton,0,[]
3,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4dd979f82271c5d36d6d727b-3,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4dd979f82271c5d36d6d727b,vasa park,US,Agoura Hills,United States,,985,"[vasa park, Agoura Hills, CA 91301, United Sta...","[{'label': 'display', 'lat': 34.127699, 'lng':...",34.127699,-118.776552,91301.0,CA,Picnic De Chefs,0,[]
4,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-5099cb60e4b00d5fa66aa021-4,"[{'id': '4bf58dd8d48988d1d5941735', 'name': 'H...",5099cb60e4b00d5fa66aa021,,US,Agoura Hills,United States,,991,"[Agoura Hills, CA 91301, United States]","[{'label': 'display', 'lat': 34.14446640014648...",34.144466,-118.779076,91301.0,CA,The Sheraton Hotel Bar,0,[]


**d.** filter columns: keep only the columns (features) of interest, **'venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng'**

In [56]:
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

In [57]:
nearby_venues.head()

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,Sheraton Agoura Hills Hotel,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",34.144336,-118.779487
1,Camp Kinneret,"[{'id': '52e81612bcbc57f1066b7a10', 'name': 'S...",34.143652,-118.770992
2,Liquid Lounge At The Sheraton,"[{'id': '4bf58dd8d48988d1d5941735', 'name': 'H...",34.144099,-118.779251
3,Picnic De Chefs,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",34.127699,-118.776552
4,The Sheraton Hotel Bar,"[{'id': '4bf58dd8d48988d1d5941735', 'name': 'H...",34.144466,-118.779076


**e.** clean **venue.categories**: extract the **name** from the categories

#### Take the first venue as an example and extract the **name** from its categories

In [58]:
first_venue_cat = nearby_venues.loc[0, 'venue.categories']
first_venue_cat

[{'id': '4bf58dd8d48988d1fa931735',
  'name': 'Hotel',
  'pluralName': 'Hotels',
  'shortName': 'Hotel',
  'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hotel_',
   'suffix': '.png'},
  'primary': True}]

The **venue.categories** value is a list with one dict.

In [59]:
first_venue_cat[0]['name']

'Hotel'

#### Extract the **name** from the categories for all venues

Create a new feature with only the **category.name** values for all venues

In [60]:
nearby_venues['category'] = [v[0]['name'] for v in nearby_venues['venue.categories']]
nearby_venues.head()

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng,category
0,Sheraton Agoura Hills Hotel,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",34.144336,-118.779487,Hotel
1,Camp Kinneret,"[{'id': '52e81612bcbc57f1066b7a10', 'name': 'S...",34.143652,-118.770992,Summer Camp
2,Liquid Lounge At The Sheraton,"[{'id': '4bf58dd8d48988d1d5941735', 'name': 'H...",34.144099,-118.779251,Hotel Bar
3,Picnic De Chefs,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",34.127699,-118.776552,Park
4,The Sheraton Hotel Bar,"[{'id': '4bf58dd8d48988d1d5941735', 'name': 'H...",34.144466,-118.779076,Hotel Bar
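
The list comprehension above assumes every venue has at least one category. A hedged variant that tolerates an empty list could look like this; `first_category_name` is an illustrative helper, not part of the original notebook.

```python
def first_category_name(categories):
    """Return the name of the first category, or None when the list is empty."""
    if not categories:
        return None
    return categories[0].get('name')

print(first_category_name([{'id': '4bf58dd8d48988d1fa931735', 'name': 'Hotel'}]))
print(first_category_name([]))
```

With pandas it could be applied as `nearby_venues['venue.categories'].apply(first_category_name)`.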


**f.** clean the column names and remove the unnecessary column **venue.categories**

In [61]:
nearby_venues.rename(columns = {'venue.name':'name', 'venue.location.lat':'lat', 'venue.location.lng':'lng'}, inplace = True)

In [62]:
nearby_venues.drop(['venue.categories'], axis = 1, inplace = True)

In [104]:
# change the order of the columns
column_order = ['name', 'category', 'lat', 'lng']
nearby_venues[column_order]

Unnamed: 0,name,category,lat,lng
0,Sheraton Agoura Hills Hotel,Hotel,34.144336,-118.779487
1,Camp Kinneret,Summer Camp,34.143652,-118.770992
2,Liquid Lounge At The Sheraton,Hotel Bar,34.144099,-118.779251
3,Picnic De Chefs,Park,34.127699,-118.776552
4,The Sheraton Hotel Bar,Hotel Bar,34.144466,-118.779076
5,Sheraton Gym,Gym,34.144531,-118.778972
6,H2O Sea Grill,American Restaurant,34.144484,-118.779251


Seven venues are returned for the city **Agoura Hills** in Los Angeles. Their latitudes and longitudes are very close to each other but not identical, which makes sense because all 7 venues lie within 1000 meters of the coordinates of **Agoura Hills**.
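
The 1000-meter claim above can be checked with a great-circle distance. The sketch below applies the haversine formula (with an assumed mean Earth radius of 6,371 km) to the neighborhood coordinates and the first venue; the result should be close to the `'distance': 994` that Foursquare reports for that venue.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2, radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

center = (34.1363945, -118.7745348)   # Agoura Hills, from the geocoder
hotel = (34.1443363, -118.7794874)    # Sheraton Agoura Hills Hotel, from the API response

distance = haversine_m(center[0], center[1], hotel[0], hotel[1])
print(round(distance))
```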

In [64]:
# create map of the venues from the first city (neighborhood) using latitude and longitude values
map_first_Los_Angeles = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=10)

# add markers to map
for lat, lng, category, Name in zip(nearby_venues['lat'], nearby_venues['lng'], nearby_venues['category'], nearby_venues['name']):
    label = '{}, {}'.format(Name, category)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_first_Los_Angeles)  
    
map_first_Los_Angeles

This **folium** map of all venues in the city **Agoura Hills** in Los Angeles may not render on **GitHub**.

### Part 3: Explore venues from all cities in Los Angeles

#### Create a function to get the venues of all cities in Los Angeles

In [109]:
LIMIT = 500 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius in meters

def getNearbyVenues(names, latitudes, longitudes, radius = 1000):
    venues_list = []
    
    # loop over the cities to get each name, lat, lng
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # print(url)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # acquire the relevant information for each nearby venue
        venues_list.append([(name, lat, lng,
          v['venue']['name'], 
          v['venue']['location']['lat'],
          v['venue']['location']['lng'], 
          v['venue']['categories'][0]['name']) for v in results])
       
    # after the loop: flatten the per-city lists into one dataframe with named columns
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns = ['Neighborhood', 
                                            'Neighborhood Latitude', 
                                            'Neighborhood Longitude', 
                                            'Venue', 
                                            'Venue Latitude', 
                                            'Venue Longitude', 
                                            'Venue Category'])
    
    return nearby_venues            

In [110]:
Los_Angeles_venues = getNearbyVenues(Los_Angeles_df['Name'], Los_Angeles_df['Latitude'], Los_Angeles_df['Longitude'])

In [111]:
Los_Angeles_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agoura Hills,34.136395,-118.774535,Sheraton Agoura Hills Hotel,34.144336,-118.779487,Hotel
1,Agoura Hills,34.136395,-118.774535,Camp Kinneret,34.143652,-118.770992,Summer Camp
2,Agoura Hills,34.136395,-118.774535,Liquid Lounge At The Sheraton,34.144099,-118.779251,Hotel Bar
3,Agoura Hills,34.136395,-118.774535,Picnic De Chefs,34.127699,-118.776552,Park
4,Agoura Hills,34.136395,-118.774535,The Sheraton Hotel Bar,34.144466,-118.779076,Hotel Bar
5,Agoura Hills,34.136395,-118.774535,Sheraton Gym,34.144531,-118.778972,Gym
6,Agoura Hills,34.136395,-118.774535,H2O Sea Grill,34.144484,-118.779251,American Restaurant
7,Alhambra,37.176036,-3.587974,La Alhambra y el Generalife,37.17562,-3.586435,Historic Site
8,Alhambra,37.176036,-3.587974,Patio de los Leones,37.177078,-3.58927,Historic Site
9,Alhambra,37.176036,-3.587974,Palacios Nazaríes,37.177343,-3.589747,Historic Site


In [302]:
print('There are a total of', len(Los_Angeles_venues), 'venues in Los Angeles')

There are a total of 3779 venues in Los Angeles


#### Get the count of venues in each city (neighborhood)

In [303]:
venue_count = Los_Angeles_venues['Neighborhood'].value_counts()
venue_count.head()

Beverly Hills    100
Redondo Beach    100
Alhambra         100
Santa Monica     100
Pasadena         100
Name: Neighborhood, dtype: int64

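There are a total of 86 cities in Los Angeles. The venue count per city tops out at 100 (the five cities shown above all reach it), while smaller cities return far fewer; Agoura Hills, for example, has only 7. Note that the venues listed for Alhambra (such as La Alhambra y el Generalife) come from its geocoded location in Spain, not from the city in Los Angeles County.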

#### Find out how many unique categories can be curated from all the returned venues

In [115]:
print('There are {} unique categories.'.format(len(Los_Angeles_venues['Venue Category'].unique())))
print('The categories are: ', Los_Angeles_venues['Venue Category'].unique()[:10], ', etc.')

There are 319 unique categories.
The categories are:  ['Hotel' 'Summer Camp' 'Hotel Bar' 'Park' 'Gym' 'American Restaurant'
 'Historic Site' 'Monument / Landmark' 'Garden' 'Palace'] , etc.


In [304]:
Los_Angeles_venues['Venue Category'].value_counts()[:20]

Mexican Restaurant      180
Fast Food Restaurant    139
Coffee Shop             127
Pizza Place             116
American Restaurant      99
Sandwich Place           98
Park                     79
Burger Joint             78
Pharmacy                 74
Grocery Store            67
Chinese Restaurant       65
Convenience Store        65
Hotel                    61
Clothing Store           55
Café                     50
Bakery                   50
Seafood Restaurant       50
Italian Restaurant       50
Bar                      47
Cosmetics Shop           47
Name: Venue Category, dtype: int64

Even though there are many venues, a large share of them fall into a few categories. Because it is not clear how Foursquare assigns categories, I also want to look at the venues themselves to see how varied they are.

#### Find out how many unique venues can be curated from all the returned venues

In [117]:
print('There are {} unique venues.'.format(len(Los_Angeles_venues['Venue'].unique())))
print('The venues are: ', Los_Angeles_venues['Venue'].unique(), ', etc.')

There are 2897 unique venues.
The venues are:  ['Sheraton Agoura Hills Hotel' 'Camp Kinneret'
 'Liquid Lounge At The Sheraton' ... 'Taco Truck'
 'Law Office Richard S. Sailer & Associates' 'John Greenleaf Park'] , etc.


In [305]:
Los_Angeles_venues['Venue'].value_counts()[:20]

Starbucks                     53
7-Eleven                      42
SUBWAY                        40
Chase Bank                    29
McDonald's                    29
CVS pharmacy                  29
Redbox                        24
Domino's Pizza                19
Rite Aid                      19
Jack in the Box               18
Subway                        16
Little Caesars Pizza          15
El Pollo Loco                 15
Walgreens                     15
The UPS Store                 13
The Coffee Bean & Tea Leaf    13
Chipotle Mexican Grill        12
ampm                          12
AT&T                          12
Dollar Tree                   12
Name: Venue, dtype: int64

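The most frequent venue names are large chains such as Starbucks, 7-Eleven and SUBWAY, but even the most frequent one accounts for only 53 of the 3779 venues, so the venues are spread over many different names. Note that `SUBWAY` and `Subway` are counted separately because the names differ in capitalization.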

## 3. Analyze venue category in each city (Neighborhood) 

In [191]:
# one-hot encode the venue categories with get_dummies
Los_Angeles_onehot = pd.get_dummies(Los_Angeles_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhoods column back to dataframe
Los_Angeles_onehot['Neighborhoods'] = Los_Angeles_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Los_Angeles_onehot.columns[-1]] + list(Los_Angeles_onehot.columns[:-1])
Los_Angeles_onehot = Los_Angeles_onehot[fixed_columns]

Los_Angeles_onehot.head()

Unnamed: 0,Neighborhoods,ATM,Accessories Store,Advertising Agency,African Restaurant,Airport Service,American Restaurant,Animal Shelter,Antique Shop,Arcade,...,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Agoura Hills,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Agoura Hills,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Agoura Hills,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Agoura Hills,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Agoura Hills,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [192]:
Los_Angeles_onehot.shape

(3779, 320)

#### Group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [193]:
Los_Angeles_grouped = Los_Angeles_onehot.groupby('Neighborhoods').mean().reset_index()
Los_Angeles_grouped.head()

Unnamed: 0,Neighborhoods,ATM,Accessories Store,Advertising Agency,African Restaurant,Airport Service,American Restaurant,Animal Shelter,Antique Shop,Arcade,...,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Agoura Hills,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Alhambra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
2,Arcadia,0.0,0.0,0.0,0.0,0.0,0.069767,0.0,0.0,0.023256,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Artesia,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Avalon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


The dataset **Los_Angeles_grouped** contains the frequency of each of the 319 venue categories for each city (neighborhood) in Los Angeles. Next, I will explore the most frequent venue categories for each city (neighborhood).
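
To see what these frequencies mean, note that the mean of a one-hot column is simply the share of a neighborhood's venues that fall into that category. The plain-Python sketch below reproduces the Agoura Hills numbers from the seven venue categories listed earlier.

```python
from collections import Counter

# Categories of the 7 venues returned for Agoura Hills
categories = ['Hotel', 'Summer Camp', 'Hotel Bar', 'Park',
              'Hotel Bar', 'Gym', 'American Restaurant']

counts = Counter(categories)

# Share of venues per category, i.e. the mean of each one-hot column
frequencies = {name: count / len(categories) for name, count in counts.items()}

for name, freq in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print('{:<20} {:.2f}'.format(name, freq))
```

Hotel Bar appears twice among the seven venues, which gives 2/7 ≈ 0.29, and every other category gives 1/7 ≈ 0.14, matching the values in the grouped table.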

#### Print each city (neighborhood) along with the top 5 most common venues (5 cities to save space)

In [194]:
num_top_venues = 5
for hood in Los_Angeles_grouped['Neighborhoods'][:5]:
    print("-------"+hood+"-------")
    temp = Los_Angeles_grouped[Los_Angeles_grouped['Neighborhoods'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.loc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values(by = ['freq'], ascending=False).reset_index(drop=True).head(num_top_venues))
#    print("")
    print('\n')

-------Agoura Hills-------
                 venue  freq
0            Hotel Bar  0.29
1                  Gym  0.14
2  American Restaurant  0.14
3                Hotel  0.14
4          Summer Camp  0.14


-------Alhambra-------
                venue  freq
0       Historic Site  0.13
1  Spanish Restaurant  0.12
2               Hotel  0.10
3               Plaza  0.06
4    Tapas Restaurant  0.05


-------Arcadia-------
                 venue  freq
0            Racetrack  0.07
1  American Restaurant  0.07
2           Food Truck  0.07
3       Sandwich Place  0.05
4   Mexican Restaurant  0.05


-------Artesia-------
                  venue  freq
0     Indian Restaurant  0.15
1     Korean Restaurant  0.07
2    Chinese Restaurant  0.06
3         Grocery Store  0.04
4  Fast Food Restaurant  0.04


-------Avalon-------
           venue  freq
0          Hotel   0.2
1    Supermarket   0.1
2    Flea Market   0.1
3       Tea Room   0.1
4  Train Station   0.1




#### Put the neighborhood along with the top 5 most common venues into a *pandas* dataframe

First, write a function to sort the venues in descending order.

In [195]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [196]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no suffix stored beyond 'rd', fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhoods'] = Los_Angeles_grouped['Neighborhoods']

for ind in np.arange(Los_Angeles_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Los_Angeles_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agoura Hills,Hotel Bar,American Restaurant,Park,Summer Camp,Gym,Hotel,Home Service,Dumpling Restaurant,Electronics Store,Empanada Restaurant
1,Alhambra,Historic Site,Spanish Restaurant,Hotel,Plaza,Tapas Restaurant,Bar,Restaurant,Scenic Lookout,Garden,Museum
2,Arcadia,Racetrack,American Restaurant,Food Truck,Mexican Restaurant,Sandwich Place,Lingerie Store,Tea Room,Bakery,Bar,Baseball Field
3,Artesia,Indian Restaurant,Korean Restaurant,Chinese Restaurant,Sandwich Place,Fast Food Restaurant,Pizza Place,Grocery Store,Filipino Restaurant,Coffee Shop,Thai Restaurant
4,Avalon,Hotel,Flea Market,Supermarket,Tea Room,Other Repair Shop,Gastropub,Train Station,Diner,Café,Field
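The row-by-row loop above could also be written in a vectorized way. The sketch below is one possible alternative, not the method used in this notebook. It runs on a made-up table, and the `Top 1`/`Top 2` column names are assumptions for illustration. Ties may be ordered differently than with `sort_values`.

```python
import numpy as np
import pandas as pd

# hypothetical grouped table: one row per neighborhood, one column per category
toy_grouped = pd.DataFrame({
    'Neighborhoods': ['A', 'B'],
    'Hotel': [0.2, 0.0],
    'Park': [0.5, 0.1],
    'Gym': [0.3, 0.9],
})

num_top = 2
freqs = toy_grouped.drop('Neighborhoods', axis=1)

# column positions sorted by descending frequency, computed for every row at once
order = np.argsort(-freqs.to_numpy(), axis=1, kind='stable')[:, :num_top]

# translate the positions back into category names
top = pd.DataFrame(freqs.columns.to_numpy()[order],
                   columns=['Top {}'.format(i + 1) for i in range(num_top)])
top.insert(0, 'Neighborhoods', toy_grouped['Neighborhoods'])
print(top)
```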


Access to the Foursquare API is limited each day. The data are very evenly distributed, so I need to check the outputs again tomorrow, but the process itself is correct.

## 4. Cluster Neighborhoods

Run *k*-means to cluster the cities (neighborhoods) in Los Angeles into 5 clusters.

In [197]:
# set number of clusters
kclusters = 5

Los_Angeles_grouped_clustering = Los_Angeles_grouped.drop('Neighborhoods', axis = 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Los_Angeles_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 1, 1, 3, 3, 3, 3, 1], dtype=int32)
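Besides looking at the first few labels, it can be useful to check how many rows fall into each cluster, since very small clusters are hard to interpret. The sketch below does this on made-up frequency vectors. The data and the choice of two clusters are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical frequency vectors: two clearly separated groups of three rows each
toy = np.array([[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
                [0.10, 0.90], [0.20, 0.80], [0.15, 0.85]])

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy)

# number of rows assigned to each cluster label
sizes = np.bincount(toy_kmeans.labels_, minlength=2)
print(sizes)
```

The same `np.bincount` call on `kmeans.labels_` would give the cluster sizes for the real data.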

Now the neighborhoods are clustered into five groups based on their characteristics, which here are the frequencies of the venue categories.

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [198]:
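# copy so that the original Los_Angeles_df is left unchanged
Los_Angeles_merged = Los_Angeles_df.copy()

# add clustering labels to the per-neighborhood table, whose rows are in the
# same order as Los_Angeles_grouped (the data that k-means was fitted on)
venues_with_labels = neighborhoods_venues_sorted.set_index('Neighborhoods')
venues_with_labels.insert(0, 'Cluster Labels', kmeans.labels_)

# join on the city name to add the cluster label and top venues to each city
Los_Angeles_merged = Los_Angeles_merged.join(venues_with_labels, on='Name')

Los_Angeles_merged.head() # check the last columns!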

Unnamed: 0,Name,Type,County,Population(2010),Land area (sq mi),Land area (km^2),Incorporated,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agoura Hills,City,Los Angeles,20330,7.79,20.2,"December 8, 1982",34.136395,-118.774535,0.0,Hotel Bar,American Restaurant,Park,Summer Camp,Gym,Hotel,Home Service,Dumpling Restaurant,Electronics Store,Empanada Restaurant
1,Alhambra,City,Los Angeles,83089,7.63,19.8,"July 11, 1903",37.176036,-3.587974,1.0,Historic Site,Spanish Restaurant,Hotel,Plaza,Tapas Restaurant,Bar,Restaurant,Scenic Lookout,Garden,Museum
2,Arcadia,City,Los Angeles,56364,10.93,28.3,"August 5, 1903",34.136207,-118.04015,1.0,Racetrack,American Restaurant,Food Truck,Mexican Restaurant,Sandwich Place,Lingerie Store,Tea Room,Bakery,Bar,Baseball Field
3,Artesia,City,Los Angeles,16522,1.62,4.2,"May 29, 1959",33.86902,-118.07962,1.0,Indian Restaurant,Korean Restaurant,Chinese Restaurant,Sandwich Place,Fast Food Restaurant,Pizza Place,Grocery Store,Filipino Restaurant,Coffee Shop,Thai Restaurant
4,Avalon,City,Los Angeles,3728,2.94,7.6,"June 26, 1913",47.488537,3.907066,1.0,Hotel,Flea Market,Supermarket,Tea Room,Other Repair Shop,Gastropub,Train Station,Diner,Café,Field
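One thing worth checking in a merge like this is that each cluster label follows the city name rather than the row position, because cities without venue data have no row in the grouped table. The sketch below illustrates this with made-up cities. All names and values are hypothetical.

```python
import pandas as pd

# hypothetical city table; 'Nowhere' has no venue data and therefore no cluster label
cities = pd.DataFrame({'Name': ['Alpha', 'Nowhere', 'Beta']})

# hypothetical per-neighborhood results, in the order k-means saw them
venues = pd.DataFrame({'Neighborhoods': ['Alpha', 'Beta'],
                       'Cluster Labels': [0, 1],
                       '1st Most Common Venue': ['Park', 'Hotel']})

# joining on the name keeps Beta's label with Beta, even though it is row 2 in cities
merged = cities.join(venues.set_index('Neighborhoods'), on='Name')
print(merged)
```

The unmatched city ends up with missing values, which is why a `dropna()` step and an integer conversion of the labels are needed afterwards.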


In [199]:
# drop the missing values
Los_Angeles_merged = Los_Angeles_merged.dropna()

In [200]:
# convert the labels into integer
Los_Angeles_merged['Cluster Labels'] = Los_Angeles_merged['Cluster Labels'].astype(int)

Visualize the resulting clusters

In [201]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Los_Angeles_merged['Latitude'], Los_Angeles_merged['Longitude'], Los_Angeles_merged['Name'], Los_Angeles_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

This **folium** map contains **all cities in Los Angeles**; cities with the same color belong to the same cluster. The map may not be rendered on **GitHub**.

## 5. Examine Each of the Five Clusters

Examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, assign a name to each cluster.

#### Cluster 1

In [284]:
grp_1 = Los_Angeles_merged.loc[Los_Angeles_merged['Cluster Labels'] == 0, 
                               Los_Angeles_merged.columns[[0, 1] + list(range(10,Los_Angeles_merged.shape[1]))]].reset_index(drop = True)
print('Number of cities:', len(grp_1))
print('')
print('Cities in the first cluster are: ', grp_1['Name'].values)
print('')
grp_1.head(12)

Number of cities: 4

Cities in the first cluster are:  ['Agoura Hills' 'La Cañada Flintridge' 'Redondo Beach' 'San Dimas']



Unnamed: 0,Name,Type,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agoura Hills,City,Hotel Bar,American Restaurant,Park,Summer Camp,Gym,Hotel,Home Service,Dumpling Restaurant,Electronics Store,Empanada Restaurant
1,La Cañada Flintridge,City,Trail,Mexican Restaurant,Garden,Pizza Place,Sandwich Place,Department Store,Frozen Yogurt Shop,American Restaurant,Japanese Restaurant,Bakery
2,Redondo Beach,City,Seafood Restaurant,American Restaurant,Mexican Restaurant,Coffee Shop,Hotel,Bar,Café,Juice Bar,Burger Joint,Pharmacy
3,San Dimas,City,Sandwich Place,Thai Restaurant,Mexican Restaurant,Italian Restaurant,Hotel,Sushi Restaurant,Ice Cream Shop,Hobby Shop,Smoke Shop,Restaurant
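The same selection is repeated for every cluster, so it could be wrapped in a small helper. The function `cluster_view` below is a hypothetical name introduced for illustration and is not part of the original notebook. The sketch runs on a made-up miniature of the merged table.

```python
import pandas as pd

def cluster_view(merged, label, first_venue_col=10):
    """Return Name, Type and the venue columns for one cluster (hypothetical helper)."""
    keep = [0, 1] + list(range(first_venue_col, merged.shape[1]))
    rows = merged['Cluster Labels'] == label
    return merged.loc[rows, merged.columns[keep]].reset_index(drop=True)

# made-up miniature of the merged table; here the venue columns start at position 3
toy = pd.DataFrame({'Name': ['A', 'B', 'C'],
                    'Type': ['City'] * 3,
                    'Cluster Labels': [0, 1, 0],
                    '1st Most Common Venue': ['Park', 'Hotel', 'Gym']})
grp = cluster_view(toy, 0, first_venue_col=3)
print(grp)
```

With the real table, `cluster_view(Los_Angeles_merged, 0)` would be expected to reproduce `grp_1`, assuming the venue columns start at position 10 as in the cells here.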


#### Cluster 2

In [272]:
grp_2 = Los_Angeles_merged.loc[Los_Angeles_merged['Cluster Labels'] == 1, 
                               Los_Angeles_merged.columns[[0, 1] + list(range(10,Los_Angeles_merged.shape[1]))]].reset_index(drop = True)
print('Number of cities:', len(grp_2))
print('')
print('Cities in the second cluster are: ', grp_2['Name'].values)
print('')
grp_2.head()

Number of cities: 36

Cities in the second cluster are:  ['Alhambra' 'Arcadia' 'Artesia' 'Avalon' 'Bellflower' 'Beverly Hills'
 'Burbank' 'Calabasas' 'Carson' 'Claremont' 'Covina' 'Cudahy' 'El Monte'
 'El Segundo' 'Hawthorne' 'Huntington Park' 'Irwindale' 'La Habra Heights'
 'Lancaster' 'Long Beach' 'Lynwood' 'Monrovia' 'Montebello'
 'Monterey Park' 'Palos Verdes Estates' 'Pomona' 'Rolling Hills Estates'
 'Rosemead' 'San Marino' 'Santa Clarita' 'Sierra Madre' 'South El Monte'
 'South Gate' 'South Pasadena' 'Temple City' 'Torrance']



Unnamed: 0,Name,Type,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alhambra,City,Historic Site,Spanish Restaurant,Hotel,Plaza,Tapas Restaurant,Bar,Restaurant,Scenic Lookout,Garden,Museum
1,Arcadia,City,Racetrack,American Restaurant,Food Truck,Mexican Restaurant,Sandwich Place,Lingerie Store,Tea Room,Bakery,Bar,Baseball Field
2,Artesia,City,Indian Restaurant,Korean Restaurant,Chinese Restaurant,Sandwich Place,Fast Food Restaurant,Pizza Place,Grocery Store,Filipino Restaurant,Coffee Shop,Thai Restaurant
3,Avalon,City,Hotel,Flea Market,Supermarket,Tea Room,Other Repair Shop,Gastropub,Train Station,Diner,Café,Field
4,Bellflower,City,Pharmacy,Bank,BBQ Joint,Thai Restaurant,Mexican Restaurant,Fast Food Restaurant,Grocery Store,Business Service,Sushi Restaurant,Lounge


#### Cluster 3

In [275]:
grp_3 = Los_Angeles_merged.loc[Los_Angeles_merged['Cluster Labels'] == 2, 
                               Los_Angeles_merged.columns[[0, 1] + list(range(10,Los_Angeles_merged.shape[1]))]].reset_index(drop = True)
print('Number of cities:', len(grp_3))
print('')
print('Cities in the third cluster are: ', grp_3['Name'].values)
print('')
grp_3.head()

Number of cities: 1

Cities in the third cluster are:  ['Manhattan Beach']



Unnamed: 0,Name,Type,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan Beach,City,Restaurant,Coffee Shop,Hotel,American Restaurant,Bakery,Cosmetics Shop,Sports Bar,Pizza Place,Trail,Pharmacy


#### Cluster 4

In [276]:
grp_4 = Los_Angeles_merged.loc[Los_Angeles_merged['Cluster Labels'] == 3, 
                               Los_Angeles_merged.columns[[0, 1] + list(range(10,Los_Angeles_merged.shape[1]))]].reset_index(drop = True)
print('Number of cities:', len(grp_4))
print('')
print('Cities in the fourth cluster are: ', grp_4['Name'].values)
print('')
grp_4.head()

Number of cities: 32

Cities in the fourth cluster are:  ['Azusa' 'Baldwin Park' 'Bell' 'Bell Gardens' 'Bradbury' 'Commerce'
 'Compton' 'Culver City' 'Diamond Bar' 'Downey' 'Duarte' 'Glendale'
 'Glendora' 'Hawaiian Gardens' 'Hidden Hills' 'Industry' 'Inglewood'
 'La Mirada' 'La Puente' 'Lakewood' 'Lawndale' 'Lomita' 'Malibu' 'Maywood'
 'Palmdale' 'Paramount' 'Pasadena' 'Rolling Hills' 'Santa Fe Springs'
 'Santa Monica' 'Signal Hill' 'Walnut']



Unnamed: 0,Name,Type,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Azusa,City,Mexican Restaurant,Coffee Shop,Pizza Place,Café,Sandwich Place,Liquor Store,Convenience Store,Big Box Store,Sushi Restaurant,Burger Joint
1,Baldwin Park,City,Mexican Restaurant,Pizza Place,Discount Store,Ice Cream Shop,Burger Joint,Fast Food Restaurant,Convenience Store,Pharmacy,Park,Bank
2,Bell,City,Fast Food Restaurant,Boutique,American Restaurant,Bank,Department Store,Coffee Shop,Sandwich Place,Wine Shop,Event Service,Event Space
3,Bell Gardens,City,Mexican Restaurant,Liquor Store,Park,ATM,Convenience Store,Burger Joint,Fried Chicken Joint,Coffee Shop,Seafood Restaurant,Field
4,Bradbury,City,Mexican Restaurant,Pizza Place,Vietnamese Restaurant,Grocery Store,Mediterranean Restaurant,Fast Food Restaurant,Convenience Store,Donut Shop,Chinese Restaurant,Sushi Restaurant


#### Cluster 5

In [277]:
grp_5 = Los_Angeles_merged.loc[Los_Angeles_merged['Cluster Labels'] == 4, 
                               Los_Angeles_merged.columns[[0, 1] + list(range(10,Los_Angeles_merged.shape[1]))]].reset_index(drop = True)
print('Number of cities:', len(grp_5))
print('')
print('Cities in the fifth cluster are: ', grp_5['Name'].values)
print('')
grp_5.head()

Number of cities: 5

Cities in the fifth cluster are:  ['Gardena' 'Hermosa Beach' 'Norwalk' 'Pico Rivera' 'Rancho Palos Verdes']



Unnamed: 0,Name,Type,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Gardena,City,Japanese Restaurant,Korean Restaurant,Sushi Restaurant,Ramen Restaurant,Dessert Shop,Vietnamese Restaurant,Ice Cream Shop,Asian Restaurant,Noodle House,Bakery
1,Hermosa Beach,City,Sushi Restaurant,Beach,Mexican Restaurant,Pizza Place,Coffee Shop,American Restaurant,Italian Restaurant,Hotel,Board Shop,Park
2,Norwalk,City,BBQ Joint,Pizza Place,Mobile Phone Shop,Fast Food Restaurant,Pharmacy,Park,Italian Restaurant,Tapas Restaurant,Latin American Restaurant,Hotel
3,Pico Rivera,City,Fast Food Restaurant,Mexican Restaurant,Mobile Phone Shop,Cosmetics Shop,Convenience Store,Pharmacy,Steakhouse,Discount Store,Sandwich Place,Shoe Store
4,Rancho Palos Verdes,City,Trail,Other Great Outdoors,Beach,Gift Shop,Home Service,Nature Preserve,Scenic Lookout,Electronics Store,Empanada Restaurant,Event Service


#### Find the frequency of the most common venue category for each of the five cluster groups

In [294]:
grp_1_most_10 = pd.concat([grp_1['1st Most Common Venue'], 
                          grp_1['2nd Most Common Venue'], 
                          grp_1['3rd Most Common Venue'],
                          grp_1['4th Most Common Venue'],
                          grp_1['5th Most Common Venue'],
                          grp_1['6th Most Common Venue'],
                          grp_1['7th Most Common Venue'],
                          grp_1['8th Most Common Venue'],
                          grp_1['9th Most Common Venue'],
                          grp_1['10th Most Common Venue']], 
                         join = 'outer', axis = 0)
print('The frequency of each venue category for Group 1: ')
grp_1_most_10.value_counts()[:4]

The frequency of each venue category for Group 1: 


Hotel                  3
American Restaurant    3
Mexican Restaurant     3
Sandwich Place         2
dtype: int64
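The counting step can be written more compactly by flattening all "Most Common Venue" columns at once. The helper `top_category_counts` below is a hypothetical name, and the group it runs on is made up. It is a sketch of the same idea, not the code used to produce the outputs in this notebook.

```python
import pandas as pd

def top_category_counts(group, n=5):
    """Count how often each category appears across the 'Most Common Venue' columns."""
    venue_cols = [c for c in group.columns if c.endswith('Most Common Venue')]
    # flatten the selected columns into one long Series before counting
    flat = pd.Series(group[venue_cols].to_numpy().ravel())
    return flat.value_counts().head(n)

# made-up group with two cities and two venue columns
toy_group = pd.DataFrame({'Name': ['A', 'B'],
                          '1st Most Common Venue': ['Hotel', 'Park'],
                          '2nd Most Common Venue': ['Park', 'Gym']})
counts = top_category_counts(toy_group)
print(counts)
```

Note that this counts how many cities list a category among their top venues, not how many venues of that category exist.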

In [243]:
grp_2_most_10 = pd.concat([grp_2['1st Most Common Venue'], 
                          grp_2['2nd Most Common Venue'], 
                          grp_2['3rd Most Common Venue'],
                          grp_2['4th Most Common Venue'],
                          grp_2['5th Most Common Venue'],
                          grp_2['6th Most Common Venue'],
                          grp_2['7th Most Common Venue'],
                          grp_2['8th Most Common Venue'],
                          grp_2['9th Most Common Venue'],
                          grp_2['10th Most Common Venue']], 
                         join = 'outer', axis = 0)
print('The frequency of each venue category for Group 2: ')
grp_2_most_10.value_counts()[:4]

The frequency of each venue category for Group 2: 


Fast Food Restaurant    17
Mexican Restaurant      17
Pizza Place             16
Park                    13
dtype: int64

In [244]:
grp_3_most_10 = pd.concat([grp_3['1st Most Common Venue'], 
                          grp_3['2nd Most Common Venue'], 
                          grp_3['3rd Most Common Venue'],
                          grp_3['4th Most Common Venue'],
                          grp_3['5th Most Common Venue'],
                          grp_3['6th Most Common Venue'],
                          grp_3['7th Most Common Venue'],
                          grp_3['8th Most Common Venue'],
                          grp_3['9th Most Common Venue'],
                          grp_3['10th Most Common Venue']], 
                         join = 'outer', axis = 0)
print('The frequency of each venue category for Group 3: ')
grp_3_most_10.value_counts()[:]

The frequency of each venue category for Group 3: 


Bakery                 1
Cosmetics Shop         1
Sports Bar             1
Hotel                  1
Pharmacy               1
Restaurant             1
Coffee Shop            1
Pizza Place            1
Trail                  1
American Restaurant    1
dtype: int64

In [245]:
grp_4_most_10 = pd.concat([grp_4['1st Most Common Venue'], 
                          grp_4['2nd Most Common Venue'], 
                          grp_4['3rd Most Common Venue'],
                          grp_4['4th Most Common Venue'],
                          grp_4['5th Most Common Venue'],
                          grp_4['6th Most Common Venue'],
                          grp_4['7th Most Common Venue'],
                          grp_4['8th Most Common Venue'],
                          grp_4['9th Most Common Venue'],
                          grp_4['10th Most Common Venue']], 
                         join = 'outer', axis = 0)
print('The frequency of each venue category for Group 4: ')
grp_4_most_10.value_counts()[:5]

The frequency of each venue category for Group 4: 


Fast Food Restaurant    22
Mexican Restaurant      17
Coffee Shop             16
Pizza Place             12
Sandwich Place          10
dtype: int64

In [246]:
grp_5_most_10 = pd.concat([grp_5['1st Most Common Venue'], 
                          grp_5['2nd Most Common Venue'], 
                          grp_5['3rd Most Common Venue'],
                          grp_5['4th Most Common Venue'],
                          grp_5['5th Most Common Venue'],
                          grp_5['6th Most Common Venue'],
                          grp_5['7th Most Common Venue'],
                          grp_5['8th Most Common Venue'],
                          grp_5['9th Most Common Venue'],
                          grp_5['10th Most Common Venue']], 
                         join = 'outer', axis = 0)
print('The frequency of each venue category for Group 5: ')
grp_5_most_10.value_counts()[:5]

The frequency of each venue category for Group 5: 


Hotel                2
Beach                2
Sushi Restaurant     2
Park                 2
Mobile Phone Shop    2
dtype: int64

In [296]:
# grp_3 contains a single city, so all ten of its categories appear exactly once
grp_most_common = pd.DataFrame({'Groups': ['grp_1', 'grp_2', 'grp_3', 'grp_4', 'grp_5'],
                               'Most Common Venue': ['Hotel/Mexican Restaurant/American Restaurant', 
                                                     'Fast Food Restaurant/Mexican Restaurant', 
                                                     'No single category (10-way tie)', 
                                                     'Fast Food Restaurant', 
                                                     'Hotel/Beach/Sushi/Park/Mobile Phone Shop'],
                               'Total Number of Cities': [4, 36, 1, 32, 5],
                               'Frequency': [3, 17, 1, 22, 2]})
grp_most_common

Unnamed: 0,Groups,Most Common Venue,Total Number of Cities,Frequency
0,grp_1,Hotel/Mexican Restaurant/American Restaurant,4,3
1,grp_2,Fast Food Restaurant/Mexican Restaurant,36,17
2,grp_3,No single category (10-way tie),1,1
3,grp_4,Fast Food Restaurant,32,22
4,grp_5,Hotel/Beach/Sushi/Park/Mobile Phone Shop,5,2
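A summary table like this one could also be derived from the counts instead of being typed by hand, which avoids copy errors. The helper `summarize_groups` below is a hypothetical name, and the counts are made up. It is only a sketch of one way to do it.

```python
import pandas as pd

def summarize_groups(group_counts):
    """Build one summary row per group (hypothetical helper).

    group_counts maps a group name to (number of cities, category counts).
    """
    rows = []
    for name, (n_cities, counts) in group_counts.items():
        top = counts.max()
        # every category that shares the highest count, joined with '/'
        tied = sorted(counts[counts == top].index)
        rows.append({'Groups': name,
                     'Most Common Venue': '/'.join(tied),
                     'Total Number of Cities': n_cities,
                     'Frequency': int(top)})
    return pd.DataFrame(rows)

# made-up counts standing in for the value_counts() results of two groups
toy_counts = {
    'grp_a': (4, pd.Series({'Hotel': 3, 'Park': 3, 'Gym': 1})),
    'grp_b': (2, pd.Series({'Beach': 2, 'Trail': 1})),
}
summary = summarize_groups(toy_counts)
print(summary)
```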


## 6. Conclusions & Recommendation

### Conclusion 1:

From the output above, the 88 cities in Los Angeles were clustered into five groups based on the venue categories found in each city. After cities with missing values were dropped, the groups contain **4**, **36**, **1**, **32**, and **5** cities, respectively (78 in total).

For each of the 88 cities, venues within 1000 meters were explored with a limit of 300 venues per city, and a total of 3779 venues were returned. Among these 3779 venues, there are 2897 unique venues belonging to 319 unique venue categories. The venue categories include **'Hotel'**, **'Summer Camp'**, **'Hotel Bar'**, **'Park'**, **'Gym'**, **'American Restaurant'**, **'Historic Site'**, **'Monument / Landmark'**, **'Garden'**, and **'Palace'**, among others.
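Counts like these can be obtained with `len` and `nunique`. The sketch below shows the idea on a made-up venue table. The column names `Venue` and `Venue Category` are assumptions for illustration, and the numbers are not the ones reported above.

```python
import pandas as pd

# made-up venue rows; the same venue can be returned for more than one city
toy_venues = pd.DataFrame({
    'Venue': ['Cafe X', 'Cafe X', 'Hotel Y', 'Park Z'],
    'Venue Category': ['Café', 'Café', 'Hotel', 'Park'],
})

total_venues = len(toy_venues)                               # all returned rows
unique_venues = toy_venues['Venue'].nunique()                # distinct venue names
unique_categories = toy_venues['Venue Category'].nunique()   # distinct categories
print(total_venues, unique_venues, unique_categories)
```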

### Conclusion 2:

From the discussion above, the most common venue category varies considerably across the five groups, which may reflect the living habits of people in those neighborhoods. For example, the most common venue categories for group 2 are Fast Food Restaurant and Mexican Restaurant, suggesting that people there lead a fast-paced life and that Mexican food is popular, possibly because of a large Mexican community. Thus, if you want to start a business aimed at the Mexican community, or one that improves the quality of life of people with a fast-paced lifestyle, you may consider the cities in groups 1, 2, or 4.

Moreover, if you want to start a business related to leisure and entertainment, such as fishing or diving, the cities in group 5 may be a better choice.

In contrast, if you want to open a Chinese restaurant, these cities may not be a good choice. At the very least, you would need to do more research on the Chinese population there.

### Recommendation:

This project provides only a rough guide to where to start a business. Further analysis of other key performance indicators (KPIs) is needed before you start your business.

However, once you decide to start a business, it is recommended to keep the start-up lean at first and to draw a line in the sand for your goals. That way, you will not drift too far into the bubble you have created, only to come out when you hit the wall. At the same time, lean analytics will help you quantify your innovation, bring you closer to a continuous reality check, and help you build your own business model.

In [292]:
url_Nash ='https://en.wikipedia.org/wiki/Nash_equilibrium'