
                                            IBM Data Science Professional – Opening a restaurant in New York
                                                              Themba Mnguni
                                                             11 January 2021


# **1. Introduction/Business Problem**


Anyone who wants to open a restaurant in New York has several questions to answer first. For example: What do the restaurant market and the competition look like in this city? Which areas of New York should be considered?

This report is relevant for investors who are interested in opening a restaurant in New York, or in expanding into the New York market by opening restaurant branches there, and who want information about the competition in this city. What are the chances of success for the restaurant they intend to open?


# **2. Data**

The data used for this project is the New York neighbourhood data set, which can be found at https://geo.nyu.edu/catalog/nyu_2451_34572. It lists the boroughs of New York, each of which contains several neighbourhoods, together with the GPS coordinates of each neighbourhood. The first record is shown below.

In [None]:
import json

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [None]:
NY_neighborhoods_data = newyork_data['features']
NY_neighborhoods_data[0]

{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}

# **3. Methodology**

## **3.1. Data Preparation**

I first obtained the New York data, which comes as a JSON file. It was then transformed into a data frame with the following columns: Borough, Neighborhood, Latitude and Longitude. The code and the first five records of the data frame are shown below:

In [None]:
import pandas as pd

column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# Collect one row per neighbourhood and build the data frame once
# (DataFrame.append was removed in pandas 2.0).
rows = []
for data in NY_neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

NY_neighborhoods = pd.DataFrame(rows, columns=column_names)

In [None]:
NY_neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In total, the Data Frame consists of 5 Boroughs and 306 Neighbourhoods. 
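The borough and neighbourhood counts can be checked directly against the GeoJSON features. The sketch below is a minimal illustration on four hand-made features that mirror the `properties` structure of the record shown in Section 2; `count_boroughs_and_neighborhoods` is a hypothetical helper and the sample entries are illustrative, not taken from the original notebook.

```python
def count_boroughs_and_neighborhoods(features):
    """Return (number of boroughs, number of neighbourhoods) for a GeoJSON feature list."""
    boroughs = {f['properties']['borough'] for f in features}
    # Pair each name with its borough, because the same name can occur in two boroughs.
    neighborhoods = {(f['properties']['borough'], f['properties']['name'])
                     for f in features}
    return len(boroughs), len(neighborhoods)


# Illustrative sample only; the real input would be newyork_data['features'].
sample_features = [
    {'properties': {'borough': 'Bronx', 'name': 'Wakefield'}},
    {'properties': {'borough': 'Bronx', 'name': 'Co-op City'}},
    {'properties': {'borough': 'Queens', 'name': 'Bay Terrace'}},
    {'properties': {'borough': 'Staten Island', 'name': 'Bay Terrace'}},
]

n_boroughs, n_neighborhoods = count_boroughs_and_neighborhoods(sample_features)
print('{} boroughs, {} neighbourhoods'.format(n_boroughs, n_neighborhoods))
```

Counting (borough, name) pairs rather than names alone avoids undercounting when two boroughs share a neighbourhood name.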

Using the Python geopy library, the address 'New York City, NY' was converted into geographical coordinates (latitude and longitude), which were used to centre a Folium map of New York. The coordinates of each neighbourhood were then marked on this map, as shown in the code below.

In [None]:
from geopy.geocoders import Nominatim

address = 'New York City, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


The neighbourhood coordinates were marked on the map with Folium:

In [None]:
import folium

map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10, 
                         min_zoom=9, max_zoom=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(NY_neighborhoods['Latitude'], 
                                           NY_neighborhoods['Longitude'], 
                                           NY_neighborhoods['Borough'], 
                                           NY_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        weight=2,
        color='#333333',
        fill=True,
        fill_color='#ffb300',
        fill_opacity=0.7).add_to(map_newyork)
    
map_newyork

Nearby venues were obtained using the Foursquare API and the getNearbyVenues() function. The API credentials are set first:

In [None]:
CLIENT_ID = '123'  # placeholder: replace with your Foursquare client ID
CLIENT_SECRET = 'abc'  # placeholder: replace with your Foursquare client secret
VERSION = '20180605'
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 123
CLIENT_SECRET:abc


## **3.2. Exploratory analytics: Foursquare API and getNearbyVenues function**

After the neighbourhoods were marked on the Folium map, the next step was to retrieve the nearby venues. This was done by using the Foursquare Application Programming Interface (API) to explore the neighbourhoods of New York City. The request URL template is 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&intent=browse&radius={}&limit={}', with CLIENT_ID, CLIENT_SECRET, VERSION, the neighbourhood latitude and longitude, the chosen radius, and LIMIT as inputs.

This API was used in conjunction with the following getNearbyVenues() function.


In [None]:
import requests

def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('•', end='')

        # create the API request URL
        url = ('https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}'
               '&v={}&ll={},{}&intent=browse&radius={}&limit={}'
               .format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT))

        # make the GET request, retrying until a usable response arrives
        results = None
        while results is None:
            try:
                results = requests.get(url).json()["response"]["venues"]
            except (requests.RequestException, KeyError, ValueError):
                # network failure, invalid JSON, or missing keys: mark it and retry
                print('X', end='')
        
        # return only relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['name'], v['location']['lat'], 
                             v['location']['lng'], v['categories'][0]['name']) 
                            for v in results if len(v['categories']) > 0])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 
                             'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return nearby_venues

This function takes the name, latitude and longitude of each neighbourhood as input and returns a data frame listing each neighbourhood and its venues. However, this data frame can include offices, bus stops, bus lines, roads or buildings as venues. These are not relevant to the objective of this report and were therefore removed.

In [None]:
NY_venues = NY_venues[~NY_venues['Venue Category'].isin(['Building', 'Office', 'Bus Line', 'Bus Station', 'Bus Stop', 'Road'])]
print(NY_venues.shape)

(23614, 7)
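The flattening and filtering steps can be exercised offline, without calling the Foursquare API. The sketch below assumes a response shaped like the `venues` list that `getNearbyVenues` reads (`name`, `location`, `categories`). The helper name `flatten_venues`, the sample venues, and their coordinates are made up for illustration, and the sketch filters while flattening rather than on the finished data frame.

```python
# Categories the report treats as irrelevant for restaurant planning
EXCLUDED_CATEGORIES = {'Building', 'Office', 'Bus Line', 'Bus Station', 'Bus Stop', 'Road'}


def flatten_venues(name, lat, lng, results, excluded=EXCLUDED_CATEGORIES):
    """Turn one neighbourhood's raw venue list into flat tuples, skipping excluded categories."""
    rows = []
    for v in results:
        if not v['categories']:
            continue  # venues without any category are dropped, as in getNearbyVenues
        category = v['categories'][0]['name']
        if category in excluded:
            continue
        rows.append((name, lat, lng, v['name'],
                     v['location']['lat'], v['location']['lng'], category))
    return rows


# Hypothetical response records, for illustration only
sample_results = [
    {'name': 'Sample Pizza', 'location': {'lat': 40.895, 'lng': -73.847},
     'categories': [{'name': 'Pizza Place'}]},
    {'name': 'Sample Stop', 'location': {'lat': 40.894, 'lng': -73.848},
     'categories': [{'name': 'Bus Stop'}]},
    {'name': 'Uncategorised Venue', 'location': {'lat': 40.893, 'lng': -73.846},
     'categories': []},
]

rows = flatten_venues('Wakefield', 40.894705, -73.847201, sample_results)
print(rows)
```

Only the pizza place survives: the bus stop is in the excluded set and the last venue has no category.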


This data frame consists of 306 unique categories of venues. The number of venues retrieved for each neighbourhood is shown below (output truncated):

In [None]:
NY_venues.groupby('Neighborhood').size()

Neighborhood
Allerton                      83
Annadale                      77
Arden Heights                 67
Arlington                     71
Arrochar                      77
Arverne                       84
Astoria                       74
Astoria Heights               68
Auburndale                    62
Bath Beach                    81
Battery Park City             86
Bay Ridge                     83
Bay Terrace, Queens           82
Bay Terrace, Staten Island    75
Baychester                    82
Bayside                       84
Bayswater                     78
Bedford Park                  67
Bedford Stuyvesant            86
Beechhurst                    89
Bellaire                      76
Belle Harbor                  83
Bellerose                     80
Belmont                       62
Bensonhurst                   82
Bergen Beach                  78
Blissville                    77
Bloomfield                    73
Boerum Hill                   88
Borough Park                  

Because it is difficult to work with categorical variables that are text labels, I converted them into a binary (0/1) representation using one-hot encoding. The code and the resulting binary data frame are shown below.

In [None]:
NY_onehot = pd.get_dummies(NY_venues[['Venue Category']], prefix="", prefix_sep="")

NY_onehot['Neighborhood_'] = NY_venues['Neighborhood'] 

fixed_columns = [NY_onehot.columns[-1]] + list(NY_onehot.columns[:-1])
NY_onehot = NY_onehot[fixed_columns]

NY_onehot.head()

Unnamed: 0,Neighborhood_,ATM,Accessories Store,Acupuncturist,...,Yemeni Restaurant,Yoga Studio
0,Wakefield,0,0,0,...,0,0
1,Wakefield,0,0,0,...,0,0
3,Wakefield,0,0,0,...,0,0
4,Wakefield,0,0,0,...,0,0
5,Wakefield,0,0,0,...,0,0

(Wide output truncated. There is one 0/1 column per venue category, and each row contains a single 1 in the column of that venue's category.)
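The later steps use a per-neighbourhood table named `NY_grouped`, whose construction is not shown in this report. A common approach, assumed here rather than taken from the original notebook, is to average the one-hot rows of each neighbourhood, which gives the relative frequency of every category. The toy venues below are made up for illustration.

```python
import pandas as pd

# Toy stand-in for NY_venues (illustrative values only)
toy_venues = pd.DataFrame({
    'Neighborhood': ['Wakefield', 'Wakefield', 'Wakefield', 'Riverdale'],
    'Venue Category': ['Church', 'Church', 'Pizza Place', 'Park'],
})

# One 0/1 column per category; dtype=int keeps the indicators numeric
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']],
                            prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, 'Neighborhood_', toy_venues['Neighborhood'])

# Assumed definition of the grouped table: mean of the indicators per neighbourhood
toy_grouped = toy_onehot.groupby('Neighborhood_').mean().reset_index()
print(toy_grouped)
```

In this toy example, two of Wakefield's three venues are churches, so its `Church` value is 2/3 and its `Pizza Place` value is 1/3.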


## **3.3. Obtaining the most common venues**

I then obtained the most common categories for each neighbourhood. The function is as follows:

In [None]:
def return_most_common_venues(row, num_top_cat):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_cat]


num_top_cat = 7
indicators = ['st', 'nd', 'rd']

Columns were then created according to the number of top venues, and a new data frame of sorted categories was built.
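The column-creation step is not shown in this report. The sketch below is one way it could look, assuming the column names follow the pattern "1st Most Common Category" that appears in the merged table later on. `ordinal_column` and `top_categories` are hypothetical helpers, and the frequencies are made up.

```python
num_top_cat = 7
indicators = ['st', 'nd', 'rd']


def ordinal_column(position):
    """Return the column name for a zero-based rank, e.g. 0 -> '1st Most Common Category'."""
    rank = position + 1
    # indicators covers 1st to 3rd only; other ranks take 'th',
    # which is adequate for a small num_top_cat such as 7
    suffix = indicators[position] if position < len(indicators) else 'th'
    return '{}{} Most Common Category'.format(rank, suffix)


columns = ['Neighborhood_'] + [ordinal_column(i) for i in range(num_top_cat)]


def top_categories(frequencies, n):
    """Return the n category names with the highest frequency, most common first."""
    ranked = sorted(frequencies, key=frequencies.get, reverse=True)
    return ranked[:n]


# Made-up category frequencies for a single neighbourhood
toy_frequencies = {'Park': 0.2, 'Church': 0.5, 'Pizza Place': 0.3}
print(columns)
print(top_categories(toy_frequencies, 2))
```

Each row of the sorted data frame would then hold the neighbourhood name followed by its top `num_top_cat` categories under these column names.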

## **3.4. Clustering of Categories**

K-means clustering was used to group the neighbourhoods by the categories of their venues. The code is as follows:


In [None]:
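from sklearn.cluster import KMeans

kclusters = 5

# drop the label column so that only the numeric category columns are clustered
NY_grouped_clustering = NY_grouped.drop(columns='Neighborhood_')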

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NY_grouped_clustering)

kmeans.labels_[0:10]

array([3, 1, 1, 1, 1, 1, 2, 1, 0, 1], dtype=int32)

In [None]:
NY_neighborhoods_categories_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

NY_merged = NY_neighborhoods.rename(columns={'Neighborhood': 'Neighborhood_'}).copy()
NY_merged = NY_merged[~NY_merged['Neighborhood_'].isin(NY_excluded_neighborhoods)]

NY_merged = NY_merged.join(NY_neighborhoods_categories_sorted.set_index('Neighborhood_'), on='Neighborhood_')

NY_merged.head()

Unnamed: 0,Borough,Neighborhood_,Latitude,Longitude,Cluster Labels,1st Most Common Category,2nd Most Common Category,3rd Most Common Category,4th Most Common Category,5th Most Common Category,6th Most Common Category,7th Most Common Category
0,Bronx,Wakefield,40.894705,-73.847201,3,Salon / Barbershop,Church,Laundry Service,Food,Coworking Space,Doctor's Office,Convenience Store
1,Bronx,Co-op City,40.874294,-73.829939,1,Residential Building (Apartment / Condo),School,Parking,Other Great Outdoors,Church,Salon / Barbershop,Laundry Service
2,Bronx,Eastchester,40.887556,-73.827806,0,Automotive Shop,Deli / Bodega,Gas Station,Caribbean Restaurant,Auto Dealership,Factory,Hardware Store
3,Bronx,Fieldston,40.895437,-73.905643,1,College Academic Building,College Residence Hall,College Administrative Building,College Cafeteria,Residential Building (Apartment / Condo),College Quad,Synagogue
4,Bronx,Riverdale,40.890834,-73.912585,2,Residential Building (Apartment / Condo),Synagogue,Doctor's Office,Playground,Park,Dentist's Office,General College & University


A clustered map was then created using the following code:

In [None]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10,
                          min_zoom=8, max_zoom=11)

rainbow = pc[:5]

for lat, lon, poi, cluster in zip(NY_merged['Latitude'], NY_merged['Longitude'], 
                                  NY_merged['Neighborhood_'], NY_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        weight=1,
        popup=label,
        color='#333333',
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.8).add_to(map_clusters)
       
map_clusters
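The palette `pc` is not defined in this report. The sketch below assumes it is a list of at least five hex colour strings (the values here are placeholders) and illustrates how `rainbow[cluster-1]` assigns colours: the K-means labels run from 0 to 4, so label 0 indexes position -1, which Python resolves to the last colour in the list.

```python
# Placeholder palette; the real `pc` is not shown in the report
pc = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00', '#ffff33']
rainbow = pc[:5]

# Same indexing as in the map code above, applied to every possible label
colour_for_label = {cluster: rainbow[cluster - 1] for cluster in range(5)}

for cluster, colour in colour_for_label.items():
    print('label {} -> {}'.format(cluster, colour))
```

Every label still receives a distinct colour. Indexing with `rainbow[cluster]` would give a one-to-one mapping as well, without relying on the wrap-around.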

# **4. Results**
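The frames `c1` to `c5` used in this section are not defined in the report. One plausible construction, assumed here, filters `NY_merged` on `Cluster Labels`, with Cluster n corresponding to K-means label n-1 (this matches the merged table above, where Eastchester has label 0 and Automotive Shop as its first category). The toy rows and the helper `cluster_frame` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for NY_merged (illustrative values only)
toy_merged = pd.DataFrame({
    'Neighborhood_': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [0, 0, 1, 0],
    '1st Most Common Category': ['Automotive Shop', 'Automotive Shop',
                                 'Park', 'Gas Station'],
})


def cluster_frame(merged, cluster_number):
    """Return the rows of one cluster, assuming Cluster n maps to K-means label n-1."""
    return merged[merged['Cluster Labels'] == cluster_number - 1]


c1 = cluster_frame(toy_merged, 1)
counts = c1.groupby(['1st Most Common Category']).size().reset_index(name='Counts')
print(counts)
```

The `groupby(...).size()` call is the same one used in the cells below; only the way `c1` is obtained is an assumption.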

## **4.1. Cluster 1**

In [None]:
c1.groupby(['1st Most Common Category']).size().reset_index(name='Counts')

Unnamed: 0,1st Most Common Category,Counts
0,Automotive Shop,6


This cluster contains only Automotive Shop and no restaurant category. An investor who chooses it should be mindful that the cluster might have very little traffic, because it lacks important traffic-generating categories such as residential buildings, schools and colleges, and parks.

## **4.2. Cluster 2**

In [None]:
c2.groupby(['1st Most Common Category',]).size().reset_index(name = 'Count')

Unnamed: 0,1st Most Common Category,Count
0,Art Gallery,2
1,Automotive Shop,10
2,Bank,1
3,Bar,3
4,Baseball Field,1
5,Beach,6
6,Boat or Ferry,4
7,Boutique,1
8,Bridge,1
9,Chinese Restaurant,3


This cluster has 57 categories in total (only the first ten are shown above). These include six restaurant categories: Indian Restaurant (1), Italian Restaurant (4), Korean Restaurant (2), Food Truck (2), Pizza Place (2), and Chinese Restaurant (3). Other categories include colleges, schools, churches, gas stations, taxis, grocery stores, salons/barbershops, hospitals and medical centres, parks, and residential apartments.

Although this cluster already has some restaurants, it also appears to have many activities that generate traffic, which is good for restaurants.

## **4.3. Cluster 3**

In [None]:
c3.groupby(['1st Most Common Category',]).size().reset_index(name = 'Count')

Unnamed: 0,1st Most Common Category,Count
0,Government Building,1
1,Residential Building (Apartment / Condo),43


This cluster consists only of a government building, residential buildings and salons/barbershops, and there are no restaurants. However, the area does not appear to have much pedestrian traffic and therefore might not be attractive.

## **4.4. Cluster 4**

In [None]:
c4.groupby(['1st Most Common Category',]).size().reset_index(name = 'Count')

Unnamed: 0,1st Most Common Category,Count
0,Bar,1
1,Church,2
2,College Academic Building,1
3,Deli / Bodega,13
4,Laundry Service,1
5,Nail Salon,1
6,Residential Building (Apartment / Condo),1
7,Salon / Barbershop,43
8,Synagogue,1


This cluster does not contain restaurants apart from Deli / Bodega, which suggests that the area could be considered for opening a restaurant. However, caution should be exercised: the area does not appear to have much traffic, since the other categories include churches, salons, medical centres, and residential buildings.

## **4.5. Cluster 5**

In [None]:
c5.groupby(['1st Most Common Category',]).size().reset_index(name = 'Count')

Unnamed: 0,1st Most Common Category,Count
0,College Residence Hall,1
1,Deli / Bodega,1
2,Dentist's Office,1
3,Doctor's Office,32
4,Eye Doctor,1
5,Hospital,1
6,Residential Building (Apartment / Condo),2


Apart from a single Deli / Bodega, this cluster does not contain restaurant categories, which means little or no competition. The other categories in this cluster are: College Residence Hall, Dentist's Office, Doctor's Office, Eye Doctor, Hospital, and Residential Building. Although this cluster does not show a lot of activity based on its categories, it might be considered for a restaurant.

# **5. Discussion**

Based on the analysis above, the following was noted:
Cluster 1, Cluster 3, Cluster 4, and Cluster 5 have little or no restaurant competition, while Cluster 2 appears to have a lot of competition. However, this does not automatically mean that Clusters 1, 3, 4 and 5 are the right choices.

Based on the facts above, Clusters 4 and 5 could be the best options for opening a restaurant: the market appears to be good and they appear to have less competition.


# **6. Conclusion**

In deciding where to open a restaurant, competition is not the only deciding factor. What the market looks like is very important as well. The analysis above therefore included other categories such as schools, residential areas, parks, grocery stores, and colleges/universities, which are very important in deciding where to open a restaurant.

I acknowledge that market and competition are not the only factors that influence where one could open a restaurant. There could be other factors, such as municipal zoning, which are outside the scope of this report. However, I believe the factors put forward in this report could be very useful in helping potential investors decide whether to open a restaurant in New York. I also believe the report indicates what the competition looks like, which is very important for any potential investor.
