# Capstone Project - Moving to NYC

***
***
## 1. Introduction

## A. Business Problem
 *You're planning a move to a new city (let's say NYC or Toronto since we already have the data for them) and you'd like to find the neighborhoods in the new city that are most similar to your current neighborhood.*
 
 *I'm thinking you could enter your zip code, and the tool would use FourSquare data and cluster analysis to determine the NYC neighborhoods most similar to your current neighborhood. (A city, or a neighborhood and city, will also work instead of a zip code.)
 E.g., 60640, Andersonville, Chicago, IL, or Chicago, IL all work. To match at the neighborhood level, I'd recommend using 60640 or Andersonville, Chicago, IL.*
 
## B. Stakeholders
*I see 2 sets of stakeholders. They are essentially looking at the same problem from different perspectives.*

*The end user - This is someone thinking of moving to NYC who would like guidance on which neighborhoods to investigate. They may want to target neighborhoods similar to their current neighborhood, or to neighborhoods they've visited or lived in before.*

*The sponsor - I'm thinking of a chamber of commerce, local realty board or some other agency looking to encourage people to move to NYC. The tool would give end users guidance on which neighborhoods to target and would make the prospect of a move a little less daunting.*

## C. Background
*I thought of 2 problems I'd like the answer to:*

*My daughter was giving up her apartment in a Chicago neighborhood and looking for another within Chicago. Which neighborhoods would be most similar to the one she currently lived in and enjoyed? This is too much like the NYC & Toronto clustering problems we've already addressed. But what if she were moving to another city?*

*My wife and I are planning to retire in a few years. It would be valuable to have a tool that would let us enter the criteria important to us and suggest places to investigate. I decided not to do this because of the scale involved: I would need to analyze neighborhoods across the country, and FourSquare data seems to be more local than the questions I'd like answered (e.g., is the neighborhood within an hour's drive of the mountains?).*

***
***
## 2. Data

## A. Data required
*NYC neighborhood data - json file used in the NYC neighborhood lab*

*FourSquare data for the existing zip code (or neighborhood) and the target NYC neighborhoods*
## B. Tools
*Use the geopy library to convert addresses to latitude & longitude*

*folium.Map for generating maps*

***
***
## 3. Methodology

## A. Exploratory data analysis

*Primary data is:*
- NYC borough & neighborhood latitude, longitude & boundaries from the .json file, used for mapping and for matching to FourSquare data
- latitude & longitude for the "Moving From" neighborhood
- FourSquare venue data for each neighborhood

## B. Inferential statistical testing

*K-means analysis to determine clusters of similar neighborhoods based on the venue data from FourSquare*
*This is used to find the neighborhoods most similar to the "Moving From" neighborhood*
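
As a rough illustration of that last step: once every neighborhood has a cluster label, the candidates are the NYC neighborhoods that share a label with the "Moving From" entry. The sketch below assumes the labels are already available as a plain dictionary. The `same_cluster` helper and the label values are hypothetical and are not results from this analysis.

```python
def same_cluster(labels, origin):
    """Return the other neighborhoods that share a cluster label with `origin`."""
    target = labels[origin]
    return sorted(name for name, label in labels.items()
                  if label == target and name != origin)


# Made-up labels for illustration only (not output from the notebook)
example_labels = {"60640": 2, "Annadale": 2, "Noho": 0, "Arrochar": 2, "Tudor City": 1}
print(same_cluster(example_labels, "60640"))
```

With scikit-learn, a dictionary like this could be built by zipping the neighborhood names with the fitted model's `labels_` attribute.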

## C. Machine learning

***
***
## 4. Results

*I had trouble getting FourSquare to return the venue data for all 307 neighborhoods (306 for NYC plus the "Moving From" neighborhood).*
*Since this is just a demo program, I limited the list to the last 100 neighborhoods (using `tail`). I thought that would provide enough info for the demo.*

*Using 100 neighborhoods and creating 10 K-means clusters - the cluster containing the "Moving From" neighborhood had 17 target neighborhoods*

*Since the "Moving From" neighborhood I used in the test has a suburban feel, I wasn't surprised that most of the target neighborhoods are in outlying boroughs, especially Staten Island. But 3 of the target neighborhoods are in Manhattan.*

***
***
## 5. Discussion

## A. Observations
*K-means analysis on FourSquare venue data does not produce groupings that feel intuitive to me.*

*It's not like a K-means analysis on a 2-dimension graph of data points, where the groupings make sense visually.*

## B. Recommendations
*If this were a real commercial tool there would need to be some changes.*

*1. The NYC data could be pre-loaded. Maybe a .CSV file that's read in at start-up.*

*2. The "Moving From" neighborhood would be entered as a field on a website.*

*3. The FourSquare venue data would be needed for all 307 neighborhoods. I was using a Sandbox version of FourSquare, and this may have been a factor.*
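
A minimal sketch of the first recommendation, assuming the neighborhood data has already been saved with the same four columns used in this notebook. The `load_neighborhoods` helper is hypothetical, and an in-memory string stands in for the real file.

```python
import csv
import io


def load_neighborhoods(csv_file):
    """Read Borough, Neighborhood, Latitude, Longitude rows from an open CSV file."""
    rows = []
    for record in csv.DictReader(csv_file):
        rows.append({
            "Borough": record["Borough"],
            "Neighborhood": record["Neighborhood"],
            "Latitude": float(record["Latitude"]),
            "Longitude": float(record["Longitude"]),
        })
    return rows


# In-memory stand-in for a pre-built file; values copied from the neighborhood table in this notebook
sample = io.StringIO(
    "Borough,Neighborhood,Latitude,Longitude\n"
    "Bronx,Wakefield,40.894705,-73.847201\n"
    "Bronx,Co-op City,40.874294,-73.829939\n"
)
neighborhoods_cached = load_neighborhoods(sample)
print(len(neighborhoods_cached), neighborhoods_cached[0]["Neighborhood"])
```

Reading a local file at start-up would avoid downloading and re-parsing the .json file on every run.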

***
***
## 6. Conclusion

*For me, this was a meaningful and useful extension of the labs I completed for this course.*

*It allowed me to answer a question I posed (admittedly, in demo form) and gave me a feel for what I could potentially do.*

*It also showed me some of the limitations of my skills and the demo/sandbox implementation.*

***
*Install and import dependencies*

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2p             |       h470a237_1         3.1 MB  conda-forge
    certifi-2018.10.15         |        py36_1000         138 KB  conda-forge
    geopy-1.17.0               |             py_0          49 KB  conda-forge
    ca-certificates-2018.10.15 |       ha4d7672_0         135 KB  conda-forge
    conda-4.5.11               |        py36_1000         651 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.1 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0            conda-forge
    geopy:           

***
*Get New York City .json dataset*

*The dataset has a total of 5 boroughs and 306 neighborhoods. For the analysis, we need a dataset that contains the 5 boroughs and the neighborhoods in each borough, as well as the latitude and longitude coordinates of each neighborhood.*

In [2]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


***
*Load the .json data*

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

***
*Grab the features out of the .json data - for the neighborhoods info*

In [4]:
neighborhoods_data = newyork_data['features']

***
*Create a pandas dataframe to contain the NYC neighborhood data*

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

***
*Fill the dataframe one row at a time*

In [6]:
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
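
Appending row by row works with the pandas version used here, but `DataFrame.append` was removed in pandas 2.0. One alternative, sketched below under that assumption, is to collect plain dictionaries first and build the dataframe once at the end. The `feature_to_row` helper and the single sample feature are illustrative stand-ins, not part of the original notebook.

```python
def feature_to_row(feature):
    """Convert one GeoJSON feature into a flat row; coordinates are [longitude, latitude]."""
    lon, lat = feature['geometry']['coordinates'][:2]
    return {'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon}


# Stand-in for newyork_data['features'], shaped like the entries read above
sample_features = [
    {'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
     'geometry': {'coordinates': [-73.847201, 40.894705]}},
]
sample_rows = [feature_to_row(feature) for feature in sample_features]
print(sample_rows)
# With pandas available, pd.DataFrame(sample_rows) would build the frame in one call
```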

*Take a quick look at the data*

In [7]:
neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


*Make sure the dataframe has 5 boroughs & 306 neighborhoods*

In [8]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)


The dataframe has 5 boroughs and 306 neighborhoods.


***
*Get latitude and longitude for the comparison or "moving from" neighborhood*

In [9]:
# The address can be US postal code OR neighborhood & city OR city
# I recommend the first 2 options if the comparison (moving from) neighborhood is in a city
# For a small or mid-sized town, the 3rd option should work fine

# For a real implementation, this would be a field on an interactive website for the user to enter the data
# For this example, I'll use a postal code OR a neighborhood & city
address = '60640'  # US postal code
#address = "Andersonville, Chicago, IL" # Uncomment this line (and comment out the one above) to use a neighborhood & city instead
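geolocator = Nominatim(user_agent="moving_to_nyc_demo")  # newer geopy releases require a user_agent; this name is an arbitrary placeholder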
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates  of ', address, ' are {}, {}.'.format(latitude, longitude))



The geographical coordinates  of  60640  are 41.9701760652135, -87.6651511888727.
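
geopy's `geocode` returns `None` when it cannot resolve an address, in which case reading `.latitude` fails with an unhelpful `AttributeError`. The sketch below shows one way to surface a clearer message. The `coordinates_or_error` helper is hypothetical, and a small stand-in function replaces the real geocoder so the example runs without network access.

```python
from types import SimpleNamespace


def coordinates_or_error(geocode, address):
    """Call geocode(address) and return (latitude, longitude), or raise if nothing was found."""
    location = geocode(address)
    if location is None:
        raise ValueError('Could not geocode address: {!r}'.format(address))
    return location.latitude, location.longitude


def fake_geocode(address):
    """Stand-in for geolocator.geocode: knows a single address, returns None otherwise."""
    known = {'60640': SimpleNamespace(latitude=41.970176, longitude=-87.665151)}
    return known.get(address)


print(coordinates_or_error(fake_geocode, '60640'))
```

In the notebook itself the call would be `coordinates_or_error(geolocator.geocode, address)`.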


***
*Add this address as a neighborhood to the NYC data for the K-means analysis*
*Will use "Moving From" for the Borough name*

In [10]:
borough = "Moving From"
neighborhood_name = address
        
neighborhood_lat = latitude
neighborhood_lon = longitude
    
neighborhoods = neighborhoods.append({'Borough': borough,
                                        'Neighborhood': neighborhood_name,
                                        'Latitude': neighborhood_lat,
                                        'Longitude': neighborhood_lon}, ignore_index=True)

*Tail the dataframe to see the new entry*

In [11]:
neighborhoods.tail()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 302 | Queens | Hammels | 40.587338 | -73.80553 |
| 303 | Queens | Bayswater | 40.611322 | -73.765968 |
| 304 | Queens | Queensbridge | 40.756091 | -73.945631 |
| 305 | Staten Island | Fox Hills | 40.617311 | -74.08174 |
| 306 | Moving From | 60640 | 41.970176 | -87.665151 |


*Verify the new borough & neighborhood count*

In [12]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 6 boroughs and 307 neighborhoods.


***
*FourSquare credentials*

In [13]:
CLIENT_ID = 'T25ZXI3SJ0PZXENRJA2H0MX3QCQ4JV4XYV010I2COSDTRIDZ' # your Foursquare ID
CLIENT_SECRET = 'QSCG0KYEFNG5LNL1SVJHUGLKTCH1N30TMCZGCPJG2ZKKL2ZE' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: T25ZXI3SJ0PZXENRJA2H0MX3QCQ4JV4XYV010I2COSDTRIDZ
CLIENT_SECRET:QSCG0KYEFNG5LNL1SVJHUGLKTCH1N30TMCZGCPJG2ZKKL2ZE
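
For anything beyond a demo, the credentials would normally be kept out of the notebook. A minimal sketch, assuming they are stored in environment variables: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the `load_foursquare_credentials` helper are hypothetical choices, not something FourSquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from the given mapping, or raise if one is missing."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Demonstration with a plain dict standing in for the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID loaded:', bool(demo_id))
```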


***
*Define function to get FourSquare venue info*

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(len(venues_list), name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        #print(results)
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
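
The list comprehension above assumes every venue has at least one category. If a venue came back with an empty `categories` list, `categories[0]` would raise an `IndexError`. The sketch below shows one defensive way to flatten the items. The `venue_rows` helper and the `'Uncategorized'` label are assumptions, and the second sample item is invented to exercise the fallback.

```python
def venue_rows(name, lat, lng, items):
    """Flatten FourSquare 'explore' items into tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


# First item mirrors a row from this notebook; the second is a made-up venue with no category
sample_items = [
    {'venue': {'name': 'Jonesys Tavern',
               'location': {'lat': 40.63964, 'lng': -74.171252},
               'categories': [{'name': 'Bar'}]}},
    {'venue': {'name': 'Example Venue',
               'location': {'lat': 40.64, 'lng': -74.17},
               'categories': []}},
]
for row in venue_rows('Port Ivory', 40.639683, -74.174645, sample_items):
    print(row)
```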

In [15]:
radius=500
LIMIT=10   #Limit of 10 venues per neighborhood

***
*I was having issues getting FourSquare to return data for all 307 neighborhoods. The calls seemed to work but would hang after a random number of neighborhoods, so I decided to tail the list of neighborhoods. 100 neighborhoods should be enough for a reasonable test.*
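
One possible contributor to the hangs is that `requests.get` is called without a `timeout`, so a stalled connection can wait indefinitely. This is a guess rather than a confirmed diagnosis. The sketch below wraps a fetch function with a timeout and a few retries. The `get_with_retry` helper is hypothetical, and a simulated flaky function stands in for the network call.

```python
import time


def get_with_retry(fetch, url, retries=3, timeout=10, pause=1.0):
    """Call fetch(url, timeout=...) up to `retries` times, pausing between failed attempts."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url, timeout=timeout)
        except Exception as error:  # e.g. a timeout or connection error
            last_error = error
            if attempt < retries - 1:
                time.sleep(pause)
    raise RuntimeError('Giving up on {} after {} attempts'.format(url, retries)) from last_error


# Simulated fetch that fails once and then succeeds
calls = []

def flaky_fetch(url, timeout):
    calls.append(url)
    if len(calls) < 2:
        raise TimeoutError('simulated hang')
    return 'ok'


print(get_with_retry(flaky_fetch, 'https://example.invalid', pause=0))
```

With the requests library, the call inside `getNearbyVenues` could then look like `get_with_retry(requests.get, url).json()`.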

In [16]:
ntail=neighborhoods.tail(100)
ntail

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 207 | Staten Island | Port Ivory | 40.639683 | -74.174645 |
| 208 | Staten Island | Castleton Corners | 40.613336 | -74.119181 |
| 209 | Staten Island | New Springville | 40.594252 | -74.16496 |
| 210 | Staten Island | Travis | 40.586314 | -74.190737 |
| 211 | Staten Island | New Dorp | 40.572572 | -74.116479 |
| 212 | Staten Island | Oakwood | 40.558462 | -74.121566 |
| 213 | Staten Island | Great Kills | 40.54948 | -74.149324 |
| 214 | Staten Island | Eltingville | 40.542231 | -74.164331 |
| 215 | Staten Island | Annadale | 40.538114 | -74.178549 |
| 216 | Staten Island | Woodrow | 40.541968 | -74.205246 |


In [17]:
nyc_venues = getNearbyVenues(names=ntail['Neighborhood'],
                                   latitudes=ntail['Latitude'],
                                   longitudes=ntail['Longitude']
                                  )

0 Port Ivory
1 Castleton Corners
2 New Springville
3 Travis
4 New Dorp
5 Oakwood
6 Great Kills
7 Eltingville
8 Annadale
9 Woodrow
10 Tottenville
11 Tompkinsville
12 Silver Lake
13 Sunnyside
14 Ditmas Park
15 Wingate
16 Rugby
17 Park Hill
18 Westerleigh
19 Graniteville
20 Arlington
21 Arrochar
22 Grasmere
23 Old Town
24 Dongan Hills
25 Midland Beach
26 Grant City
27 New Dorp Beach
28 Bay Terrace
29 Huguenot
30 Pleasant Plains
31 Butler Manor
32 Charleston
33 Rossville
34 Arden Heights
35 Greenridge
36 Heartland Village
37 Chelsea
38 Bloomfield
39 Bulls Head
40 Carnegie Hill
41 Noho
42 Civic Center
43 Midtown South
44 Richmond Town
45 Shore Acres
46 Clifton
47 Concord
48 Emerson Hill
49 Randall Manor
50 Howland Hook
51 Elm Park
52 Remsen Village
53 New Lots
54 Paerdegat Basin
55 Mill Basin
56 Jamaica Hills
57 Utopia
58 Pomonok
59 Astoria Heights
60 Claremont Village
61 Concourse Village
62 Mount Eden
63 Mount Hope
64 Sutton Place
65 Hunters Point
66 Turtle Bay
67 Tudor City
68 Stuyvesant

***
*Dataframe of venues returned from FourSquare - one row per venue*

In [18]:
print(nyc_venues.shape)
nyc_venues.head()

(846, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Port Ivory | 40.639683 | -74.174645 | Jonesys Tavern | 40.63964 | -74.171252 | Bar |
| 1 | Castleton Corners | 40.613336 | -74.119181 | Joe & Pat Pizzeria and Restaurant | 40.613046 | -74.122128 | Pizza Place |
| 2 | Castleton Corners | 40.613336 | -74.119181 | Goodfella's Pizza & Restaurant | 40.613114 | -74.123999 | Pizza Place |
| 3 | Castleton Corners | 40.613336 | -74.119181 | Ron & Dave's Tattooing | 40.612686 | -74.12225 | Tattoo Parlor |
| 4 | Castleton Corners | 40.613336 | -74.119181 | SUBWAY | 40.613178 | -74.121314 | Sandwich Place |


***
*Number of venues returned for each neighborhood*

In [19]:
nyc_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| 60640 | 10 | 10 | 10 | 10 | 10 | 10 |
| Allerton | 10 | 10 | 10 | 10 | 10 | 10 |
| Annadale | 9 | 9 | 9 | 9 | 9 | 9 |
| Arden Heights | 4 | 4 | 4 | 4 | 4 | 4 |
| Arlington | 4 | 4 | 4 | 4 | 4 | 4 |
| Arrochar | 10 | 10 | 10 | 10 | 10 | 10 |
| Astoria Heights | 10 | 10 | 10 | 10 | 10 | 10 |
| Bay Terrace | 10 | 10 | 10 | 10 | 10 | 10 |
| Bayswater | 2 | 2 | 2 | 2 | 2 | 2 |
| Blissville | 10 | 10 | 10 | 10 | 10 | 10 |


In [20]:
print('There are {} unique categories.'.format(len(nyc_venues['Venue Category'].unique())))

There are 193 unique categories.


***
*Convert venue data to one column per venue category*

In [21]:
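# one hot encoding
nyc_onehot = pd.get_dummies(nyc_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# note: FourSquare has a venue category that is itself named "Neighborhood", so this
# assignment overwrites that category's column in place instead of adding a new last column
nyc_onehot['Neighborhood'] = nyc_venues['Neighborhood'] 

# intended to move the neighborhood column to the first column; because of the overwrite above,
# the last column is a venue category (Yoga Studio), and that is what actually moves to the front
fixed_columns = [nyc_onehot.columns[-1]] + list(nyc_onehot.columns[:-1])
nyc_onehot = nyc_onehot[fixed_columns]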

nyc_onehot.head()

| | Yoga Studio | Accessories Store | … | Bar | … | Neighborhood | … | Pizza Place | … | Sandwich Place | … | Tattoo Parlor | … | Wings Joint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | … | 1 | … | Port Ivory | … | 0 | … | 0 | … | 0 | … | 0 |
| 1 | 0 | 0 | … | 0 | … | Castleton Corners | … | 1 | … | 0 | … | 0 | … | 0 |
| 2 | 0 | 0 | … | 0 | … | Castleton Corners | … | 1 | … | 0 | … | 0 | … | 0 |
| 3 | 0 | 0 | … | 0 | … | Castleton Corners | … | 0 | … | 0 | … | 1 | … | 0 |
| 4 | 0 | 0 | … | 0 | … | Castleton Corners | … | 0 | … | 1 | … | 0 | … | 0 |

5 rows × 193 columns (columns that are all zero in these rows are abbreviated as …)


In [22]:
nyc_onehot.shape

(846, 193)

***
*Convert venue data to one row per Neighborhood*

*Use mean rather than count as the value for each venue category*
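
The number of venues returned varies by neighborhood (from 2 to 10 in the counts above), so raw counts would not be comparable. Taking the mean of the one-hot rows turns each category into the share of that neighborhood's venues. A small sketch of the arithmetic, using a hypothetical `category_frequencies` helper and an illustrative list of four venues:

```python
from collections import Counter


def category_frequencies(categories):
    """Share of a neighborhood's venues in each category (the mean of its one-hot rows)."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# Illustrative list of four venues for one neighborhood (not the full FourSquare result)
example = ['Pizza Place', 'Pizza Place', 'Tattoo Parlor', 'Sandwich Place']
print(category_frequencies(example))
# {'Pizza Place': 0.5, 'Tattoo Parlor': 0.25, 'Sandwich Place': 0.25}
```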

In [23]:
nyc_grouped = nyc_onehot.groupby('Neighborhood').mean().reset_index()
nyc_grouped

| | Neighborhood | Yoga Studio | Accessories Store | Afghan Restaurant | American Restaurant | … | Wings Joint |
|---|---|---|---|---|---|---|---|
| 0 | 60640 | 0.0 | 0.0 | 0.0 | 0.1 | … | 0.0 |
| 1 | Allerton | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 |
| 2 | Annadale | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 |
| 3 | Arden Heights | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 |
| 4 | Arlington | 0.0 | 0.0 | 0.0 | 0.25 | … | 0.0 |
| 5 | Arrochar | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 |
| 6 | Astoria Heights | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 |
| 7 | Bay Terrace | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 |
| 8 | Bayswater | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 |
| 9 | Blissville | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 |

First 10 rows shown; the intermediate venue-category columns are abbreviated as …


In [24]:
nyc_grouped.shape

(100, 193)

In [26]:
#Top 5 venues per neighborhood - Good for debugging, but got tedious for 100 neighborhoods - so commented it out

#num_top_venues = 5

#for hood in nyc_grouped['Neighborhood']:
    #print("----"+hood+"----")
    #temp = nyc_grouped[nyc_grouped['Neighborhood'] == hood].T.reset_index()
    #temp.columns = ['venue','freq']
    #temp = temp.iloc[1:]
    #temp['freq'] = temp['freq'].astype(float)
    #temp = temp.round({'freq': 2})
    #print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    #print('\n')

***
*Top ten venue categories per neighborhood*

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
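
*To show what `return_most_common_venues` returns, here is a minimal sketch on a made-up frequency row. The neighborhood name, venue categories and values are illustrative and are not taken from the Foursquare data.*

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name), then rank the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical row: a neighborhood name followed by venue-category frequencies.
sample_row = pd.Series(
    {"Neighborhood": "Sampleville", "Bakery": 0.1, "Park": 0.5, "Pizza Place": 0.3, "Hotel": 0.0}
)

top_three = list(return_most_common_venues(sample_row, 3))
print(top_three)
```

*Categories with a frequency of 0.0 are still part of the ranking, so a neighborhood with fewer distinct categories than `num_top_venues` gets its remaining slots filled with zero-frequency categories.*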

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = nyc_grouped['Neighborhood']

for ind in np.arange(nyc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nyc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,60640,Breakfast Spot,Jazz Club,Ethiopian Restaurant,Gay Bar,Hot Dog Joint,Massage Studio,Hotel,Arcade,American Restaurant,Food
1,Allerton,Breakfast Spot,Dessert Shop,Supermarket,Discount Store,Martial Arts Dojo,Donut Shop,Pizza Place,Fast Food Restaurant,Pharmacy,Chinese Restaurant
2,Annadale,Cosmetics Shop,Pizza Place,Food,Park,Liquor Store,Train Station,Sports Bar,Diner,Restaurant,Discount Store
3,Arden Heights,Pharmacy,Bus Stop,Coffee Shop,Pizza Place,Factory,Food & Drink Shop,Food,Fish Market,Fish & Chips Shop,Field
4,Arlington,Bus Stop,Deli / Bodega,American Restaurant,Falafel Restaurant,French Restaurant,Food Truck,Food & Drink Shop,Food,Fish Market,Fish & Chips Shop
5,Arrochar,Italian Restaurant,Middle Eastern Restaurant,Bagel Shop,Pizza Place,Sandwich Place,Supermarket,Mediterranean Restaurant,Deli / Bodega,Hotel,History Museum
6,Astoria Heights,Bowling Alley,Hostel,Supermarket,Food,Gourmet Shop,Bakery,Pizza Place,Burger Joint,Italian Restaurant,Plaza
7,Bay Terrace,Italian Restaurant,Supermarket,Salon / Barbershop,Insurance Office,Sushi Restaurant,Train Station,Donut Shop,Shipping Store,Event Space,Fish Market
8,Bayswater,Playground,Park,Wings Joint,Falafel Restaurant,Food Truck,Food & Drink Shop,Food,Fish Market,Fish & Chips Shop,Field
9,Blissville,Hotel,Donut Shop,Rental Service,Restaurant,Skating Rink,Clothing Store,Mattress Store,Bar,Wings Joint,Falafel Restaurant
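
*The column labels above come from a three-entry suffix list with 'th' as the fallback, which is correct for ten columns but would give "21th" if `num_top_venues` were raised past 20. A small sketch of a general suffix helper; the name `ordinal` is my own and is not part of the notebook.*

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 or 13 take 'th' even though they end in 1, 2, 3.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```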


***
## K-means analysis

*Specifying 10 clusters for 100 neighborhoods (the last 100 rows, `ntail`, rather than all 307) - this averages ~10 neighborhoods per cluster*

*Using the full 307 neighborhoods, I'd try using 15-20 clusters. That should yield about 15-20 neighborhoods per cluster*


In [29]:
# set number of clusters
kclusters = 10

nyc_grouped_clustering = nyc_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(nyc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([8, 3, 3, 3, 4, 1, 1, 1, 0, 2], dtype=int32)
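
*K-means does not balance cluster sizes, so ~10 neighborhoods per cluster is an average rather than a guarantee. A minimal sketch of counting the rows in each cluster with NumPy; the `labels` array reuses the ten labels printed above as a stand-in for the full `kmeans.labels_`.*

```python
import numpy as np

kclusters = 10
# The first ten labels printed above; in the notebook, pass kmeans.labels_ instead.
labels = np.array([8, 3, 3, 3, 4, 1, 1, 1, 0, 2])

# Count how many rows fall in each cluster (position in the result = cluster id).
cluster_sizes = np.bincount(labels, minlength=kclusters)
print(cluster_sizes)
print("average size:", len(labels) / kclusters)
```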

In [30]:
nyc_merged = ntail.copy()  # copy so the new column is not written to a slice of another dataframe

# add clustering labels
nyc_merged['Cluster Labels'] = kmeans.labels_

# join neighborhoods_venues_sorted to add the top venue categories for each neighborhood
nyc_merged = nyc_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

nyc_merged.tail() # check the last columns!


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
302,Queens,Hammels,40.587338,-73.80553,0,Beach,Food Truck,Diner,Gym / Fitness Center,Wings Joint,Farmers Market,French Restaurant,Food & Drink Shop,Food,Fish Market
303,Queens,Bayswater,40.611322,-73.765968,5,Playground,Park,Wings Joint,Falafel Restaurant,Food Truck,Food & Drink Shop,Food,Fish Market,Fish & Chips Shop,Field
304,Queens,Queensbridge,40.756091,-73.945631,4,Hotel,Sandwich Place,Ramen Restaurant,Spanish Restaurant,Park,Scenic Lookout,Hotel Bar,Department Store,Dessert Shop,Fish & Chips Shop
305,Staten Island,Fox Hills,40.617311,-74.08174,2,Bus Stop,Intersection,Sandwich Place,Factory,Food & Drink Shop,Food,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant
306,Moving From,60640,41.970176,-87.665151,8,Breakfast Spot,Jazz Club,Ethiopian Restaurant,Gay Bar,Hot Dog Joint,Massage Studio,Hotel,Arcade,American Restaurant,Food
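
*The cell above attaches `kmeans.labels_` to `ntail` by row position. The labels follow the row order of `nyc_grouped`, which the earlier output shows sorted by neighborhood name, while `ntail` keeps its original order (Bayswater is row 8 in one table and row 303 in the other). Positional assignment can therefore pair a label with the wrong neighborhood. A minimal sketch of attaching labels by name instead, using small stand-in dataframes; the rows and labels are illustrative.*

```python
import pandas as pd

# Stand-in for nyc_grouped: one row per neighborhood, sorted by name.
grouped = pd.DataFrame({"Neighborhood": ["60640", "Allerton", "Bayswater"]})
# Stand-in for kmeans.labels_: one label per row of `grouped`, in that order.
labels = [8, 3, 0]

# Stand-in for ntail: the same neighborhoods in a different row order.
merged = pd.DataFrame(
    {
        "Borough": ["Bronx", "Queens", "Moving From"],
        "Neighborhood": ["Allerton", "Bayswater", "60640"],
    }
)

# Index the labels by neighborhood name, then look each name up.
labels_by_name = pd.Series(labels, index=grouped["Neighborhood"])
merged["Cluster Labels"] = merged["Neighborhood"].map(labels_by_name)
print(merged)
```

*Here each neighborhood keeps the label computed for its own row of `grouped`, whatever order `merged` is in.*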


***
*Create a map showing the clusters*

In [31]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="nyc_explorer")  # recent geopy releases require a user_agent; the name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

nyc_ll = [latitude, longitude]

The geographical coordinates of New York City are 40.7308619, -73.9871558.


In [32]:
# create map
map_clusters = folium.Map(location=nyc_ll, zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(nyc_merged['Latitude'], nyc_merged['Longitude'], nyc_merged['Neighborhood'], nyc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

***
*Create dataframe containing only the neighborhoods in the same cluster as the "Moving From" neighborhood*

In [33]:
# the "Moving From" row is the last row, so iloc[-1] picks up its cluster label
cluster_id = nyc_merged.iloc[-1]['Cluster Labels']
nyc_merged_cluster = nyc_merged.loc[nyc_merged['Cluster Labels'] == cluster_id]
nyc_merged_cluster.tail()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
292,Staten Island,Lighthouse Hill,40.576506,-74.137927,8,Spa,Moving Target,Italian Restaurant,Café,Art Museum,Trail,Massage Studio,Wings Joint,Falafel Restaurant,Food
293,Staten Island,Richmond Valley,40.519541,-74.229571,8,Bank,Deli / Bodega,Coffee Shop,Mexican Restaurant,Smoothie Shop,Fast Food Restaurant,Train Station,Sandwich Place,Convenience Store,Dog Run
294,Queens,Malba,40.790602,-73.826678,8,Scenic Lookout,Rest Area,Rock Club,Tennis Court,Wings Joint,Event Space,Food,Fish Market,Fish & Chips Shop,Field
301,Manhattan,Hudson Yards,40.756658,-74.000111,8,American Restaurant,Food Truck,Theater,Hotel,Residential Building (Apartment / Condo),Supermarket,Music School,Gym / Fitness Center,Park,Fish Market
306,Moving From,60640,41.970176,-87.665151,8,Breakfast Spot,Jazz Club,Ethiopian Restaurant,Gay Bar,Hot Dog Joint,Massage Studio,Hotel,Arcade,American Restaurant,Food
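
*Picking the origin with `iloc[-1]` relies on the "Moving From" row staying last. A minimal sketch that looks the row up by its `Borough` marker and also leaves it out of the candidate list; the three-row dataframe is a small stand-in for `nyc_merged`.*

```python
import pandas as pd

# Miniature stand-in for nyc_merged, using three rows from the tables above.
nyc_merged = pd.DataFrame(
    {
        "Borough": ["Queens", "Moving From", "Queens"],
        "Neighborhood": ["Malba", "60640", "Queensbridge"],
        "Cluster Labels": [8, 8, 4],
    }
)

# Find the origin row by its marker rather than by its position.
origin = nyc_merged.loc[nyc_merged["Borough"] == "Moving From"].iloc[0]
cluster_id = origin["Cluster Labels"]

# Keep neighborhoods in the same cluster, excluding the origin itself.
candidates = nyc_merged[
    (nyc_merged["Cluster Labels"] == cluster_id)
    & (nyc_merged["Borough"] != "Moving From")
]
print(candidates["Neighborhood"].tolist())
```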


***
*Map the neighborhoods in the same cluster as the "Moving From" neighborhood*

*These are the NYC neighborhoods to investigate further as possible places to move*

In [34]:
# create map
map_clusters = folium.Map(location=nyc_ll, zoom_start=11)

# add markers to the map
markers_colors = []
for lat, lon, neighborhood, borough in zip(nyc_merged_cluster['Latitude'], nyc_merged_cluster['Longitude'], nyc_merged_cluster['Neighborhood'], nyc_merged_cluster['Borough']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

***
*List of target neighborhoods in the same cluster as the "Moving From" neighborhood - which is listed last*

In [35]:
#cluster_id= nyc_merged.iloc[-1]['Cluster Labels']
#nyc_merged.loc[nyc_merged['Cluster Labels'] == cluster_id, nyc_merged.columns[[1] + list(range(5, nyc_merged.shape[1]))]]
nyc_merged_cluster

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
207,Staten Island,Port Ivory,40.639683,-74.174645,8,Bar,Wings Joint,Frozen Yogurt Shop,French Restaurant,Food Truck,Food & Drink Shop,Food,Fish Market,Fish & Chips Shop,Field
222,Brooklyn,Wingate,40.660947,-73.937187,8,Fast Food Restaurant,BBQ Joint,Gym / Fitness Center,Discount Store,Juice Bar,Donut Shop,Fish & Chips Shop,Field,Pharmacy,Food Truck
226,Staten Island,Graniteville,40.620172,-74.153152,8,Food Truck,Bus Stop,Grocery Store,Wings Joint,Fried Chicken Joint,Food & Drink Shop,Food,Fish Market,Fish & Chips Shop,Field
230,Staten Island,Old Town,40.596329,-74.087511,8,Italian Restaurant,Pharmacy,Bakery,American Restaurant,Gas Station,Optical Shop,Pizza Place,Food & Drink Shop,Food,Fish Market
233,Staten Island,Grant City,40.576216,-74.105856,8,Wings Joint,Pizza Place,Food & Drink Shop,Tanning Salon,Fast Food Restaurant,Grocery Store,Health & Beauty Service,Arts & Crafts Store,Event Space,Dessert Shop
239,Staten Island,Charleston,40.530531,-74.232158,8,Big Box Store,Furniture / Home Store,Grocery Store,Cosmetics Shop,Restaurant,Arts & Crafts Store,Pizza Place,Gift Shop,Bakery,Food & Drink Shop
250,Manhattan,Midtown South,40.74851,-73.988713,8,Lingerie Store,Street Food Gathering,Korean Restaurant,Food Truck,Grocery Store,Clothing Store,Cosmetics Shop,Dessert Shop,Italian Restaurant,Discount Store
252,Staten Island,Shore Acres,40.609719,-74.066678,8,Italian Restaurant,Bar,Chinese Restaurant,Food,Restaurant,Gastropub,Pizza Place,Bagel Shop,Falafel Restaurant,Food & Drink Shop
254,Staten Island,Concord,40.604473,-74.084024,8,Deli / Bodega,Coffee Shop,Supermarket,Park,Bagel Shop,Peruvian Restaurant,Train Station,Gym / Fitness Center,Donut Shop,Falafel Restaurant
255,Staten Island,Emerson Hill,40.606794,-74.097762,8,Moving Target,Construction & Landscaping,Food,Wings Joint,Falafel Restaurant,Food Truck,Food & Drink Shop,Fish Market,Fish & Chips Shop,Field
