# Capstone Project - Battle of the Neighborhoods

## Problem and Background Discussion

For one city, it has a large area. Which area is more profitable to open a restaurant? Which area is more suitable to start your new business. You will feel worried about these questions when you are in a large city. I will use the former argument in the above query for this assignment and say that I have a client looking to open a restaurant in Detroit, Michigan and they want to know the best place for that. Let's presume this restaurant will be a restaurant that serves Greek food. Detroit has a wide variety of downtown restaurants and we'll do this analysis with the aid of Foursquare and their comprehensive location dataset about the city and its locations. Through opening this new Greek food restaurant, this will allow people to choose from a wider range while dining downtown, creating revenue for themselves, nearby locations, and the city as a whole.

## Data Used

I will use the Foursquare API to request location data which I will use similarly to how I used it in the assignments in New York and Toronto. I'll start by specifying the initial location (Campus Martius) and setting our 800 m radius, and then exploring the neighborhood for restaurants similar. I am also going to want to look for a location that has a reasonable amount of foot traffic and is decently far enough away from other sestern restaurants so that the market is not over-saturated in one area. They will take the sporting arenas, casinos, and other entertainment venues into account as it will produce the most foot traffic. Main streets are suitable for passenger cars and foot traffic, but are likely to be more costly to rent out.

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, Campus_lat, Campus_lon, radius, LIMIT)'

In [58]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.0.1               |             py_0         575 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.0 MB

The following NEW packages will be 

In [None]:
address = 'Detroit, MI'
Campus_lat = 42.3318
Campus_lon = -83.0466
radius = 800
LIMIT = 500

CLIENT_ID = 'OCSANRSKD4JUKC0SNU3OEVYHVDTTYZH0APEJ512LT2ORO5BG' # your Foursquare ID
CLIENT_SECRET = 'HTWUBMYI1HGF3QLBWECH0MU4IXDC2PE5FEC50S0FPI0D0TRL' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
  CLIENT_ID, 
  CLIENT_SECRET, 
  VERSION, 
  Campus_lat, 
  Campus_lon, 
  radius, 
  LIMIT)

print('URL created!')

## Results

In [59]:
results = requests.get(url).json()

In [60]:

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Areas from the JSON data from Foursquare

In [61]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues
    




Unnamed: 0,name,categories,lat,lng
0,Campus Martius,Park,42.331575,-83.046598
1,Avalon Cafe and Bakery,Café,42.332834,-83.047694
2,Texas de Brazil,Steakhouse,42.332293,-83.046711
3,Dime Store,American Restaurant,42.331039,-83.047734
4,Chase Bank,Bank,42.330676,-83.046648
5,Lafayette Coney Island,Hot Dog Joint,42.331683,-83.04878
6,Bon Bon Bon,Dessert Shop,42.330548,-83.047914
7,Grand Trunk Pub,Pub,42.330403,-83.045943
8,Athens Souvlaki Detroit,Diner,42.33036,-83.048183
9,Detroit Water Ice Factory,Ice Cream Shop,42.332575,-83.047436


## Narrow down to resaurants

In [7]:
restaurants = nearby_venues[nearby_venues['categories'].str.contains("Restaurant")]
restaurants

Unnamed: 0,name,categories,lat,lng
3,Dime Store,American Restaurant,42.331039,-83.047734
14,Parc,American Restaurant,42.331564,-83.0467
17,The Standby,New American Restaurant,42.334439,-83.046009
20,Orchid Thai,Thai Restaurant,42.333579,-83.04522
22,Vicente's Cuban Cuisine,Cuban Restaurant,42.334436,-83.047193
27,Maru Sushi & Grill,Japanese Restaurant,42.33036,-83.048269
42,Freshii,Restaurant,42.331569,-83.047713
45,Central Kitchen + Bar,American Restaurant,42.331518,-83.045962
46,Sweetwater Tavern,American Restaurant,42.331861,-83.041839
47,Go Sy Thai,Thai Restaurant,42.332993,-83.049187


In [8]:
restaurants.shape

(23, 4)

## Let's see how many are Greek Restaurants.

In [9]:
greek_restaurants = nearby_venues[nearby_venues['categories'].str.contains("Greek")]
greek_restaurants

Unnamed: 0,name,categories,lat,lng
98,Pegasus Taverna,Greek Restaurant,42.335227,-83.04165


## So lets see where it's at on the map of Detroit in relation to Campus Martius.

In [16]:
locations = greek_restaurants[['lat', 'lng']]
locationlist = locations.values.tolist()
len(locationlist)

1

In [62]:
Detroit = folium.Map(location=[Campus_lat, Campus_lon], zoom_start=15)

# define our radius the restaurant should be in 
folium.Circle([Campus_lat, Campus_lon],
              radius=radius,
              color='black'
             ).add_to(Detroit)

# make a large circle for Campus Martius in green
folium.CircleMarker(
        [Campus_lat, Campus_lon],
        radius=40,
        popup='Campus Martius',
        color='green',
        fill=True,
        fill_color='#228B22',
        fill_opacity=0.7,
        parse_html=False).add_to(Detroit)

# make a circle for Fishbone's Cafe in blue
folium.CircleMarker(
        [42.334376, -83.043068],
        radius=10,
        popup='Fishbone Cafe',
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Detroit) 

Detroit

In [17]:
from bs4 import BeautifulSoup

floors_url = 'https://42floors.com/us/mi/detroit/downtown-detroit?regions%5B%5D=90816&within=0106000020e610000001000000010300000001000000090000002fd8209e22c354c03cbccc48af2b45405fd7209ea9c354c0f894cd26c62b454038da20c6ccc354c05c5f329a682b45406ad62086c1c354c03cfc203bdb2a454051d82066f1c254c02001f740a72a45405bda20cec5c254c02c98955d622b4540d1d920ce1fc354c03cbccc48af2b45402fd8209e22c354c03cbccc48af2b45402fd8209e22c354c03cbccc48af2b4540'
    
r = requests.get(floors_url) 
  
soup = BeautifulSoup(r.content, 'lxml') 
#print(soup.prettify())

In [18]:
table = soup.findAll('span', attrs={'class':'title block text-overflow text-blue'})
table

[<span class="title block text-overflow text-blue">
                 1442 Brush St
                 <span class="text-light">
                     -
                         Unit 401 (Floor 4)
                 </span>
 </span>, <span class="title block text-overflow text-blue">
                 415 Clifford St
                 <span class="text-light">
                     -
                         Space
                 </span>
 </span>, <span class="title block text-overflow text-blue">
                 1452 Brush St
                 <span class="text-light">
                     -
                         3 Spaces
                 </span>
 </span>, <span class="title block text-overflow text-blue">
                 1442 Brush St
                 <span class="text-light">
                     -
                         Floor 1
                 </span>
 </span>, <span class="title block text-overflow text-blue">
                 2310 Park Ave
                 <span class="text-light"

In [19]:
addresses = []

for address in range(0, len(table)):
    addresses.append(table[address].text)
    
addresses

['\n                1442 Brush St\n                \n                    -\n                        Unit 401 (Floor 4)\n                \n',
 '\n                415 Clifford St\n                \n                    -\n                        Space\n                \n',
 '\n                1452 Brush St\n                \n                    -\n                        3 Spaces\n                \n',
 '\n                1442 Brush St\n                \n                    -\n                        Floor 1\n                \n',
 '\n                2310 Park Ave\n                \n                    -\n                        Floor 5\n                \n']

## Let's clean up each array element that contains and address and make it into a usable value.

In [20]:
for _ in range(0,len(addresses)):
    addresses[_] = " ".join(addresses[_].split())
               
addresses

['1442 Brush St - Unit 401 (Floor 4)',
 '415 Clifford St - Space',
 '1452 Brush St - 3 Spaces',
 '1442 Brush St - Floor 1',
 '2310 Park Ave - Floor 5']

In [21]:
for _ in range(0,len(addresses)):
    addresses[_] = addresses[_].split('-', 1)[0]
    addresses[_] = addresses[_][:-1]
    
addresses

['1442 Brush St',
 '415 Clifford St',
 '1452 Brush St',
 '1442 Brush St',
 '2310 Park Ave']

## Let's do analysis for the type of space each proprty could be used for now. 

In [22]:
table = soup.findAll('span', attrs={'class':'width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0'})
table

[<span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Office
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Office<br/>Retail
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Retail<br/>Restaurant<br/>Office
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Office<br/>Retail
         </span>,
 <span class="width-90-px-md col-4-sm col-center col-gutter-none-sm text-smaller text-subtle col-shrink-0">
             Office
         </span>]

In [23]:
building_types = []

for types in range(0, len(table)):
    building_types.append(table[types].text)
    building_types[types] = " ".join(building_types[types].split())
    
building_types

['Office', 'OfficeRetail', 'RetailRestaurantOffice', 'OfficeRetail', 'Office']

## I'll do a more efficient cleanup of the data

In [30]:
detroit_table = pd.DataFrame(list(zip(addresses, building_types)), 
               columns =['Address', 'Building Type'])

detroit_table

Unnamed: 0,Address,Building Type
0,1442 Brush St,Office
1,415 Clifford St,OfficeRetail
2,1452 Brush St,RetailRestaurantOffice
3,1442 Brush St,OfficeRetail
4,2310 Park Ave,Office


## So let's actually plot out points on a map so I can figure out which is closest to Campus Martius and is a good fit for a restaurant.

In [53]:
building_lats = []
building_lons = []

zipcode = '48226'
city = 'Detroit, MI'

geolocator = Nominatim(user_agent="detroit_app")

for _ in range(0,len(addresses)-1):
    place = addresses[_] + ' ' + city + ' ' + zipcode
    location = geolocator.geocode(place)
    building_lats.append(location.latitude)
    building_lons.append(location.longitude)
    print(location.latitude, location.longitude)

42.33634192741168 -83.044893667659
42.334701571428575 -83.05227371428572
42.33643104751674 -83.04495319129133
42.33634192741168 -83.044893667659


In [54]:
detroit_table['Latitude'] = building_lats
detroit_table['Longitude'] = building_lons

detroit_table

Unnamed: 0,Address,Building Type,Latitude,Longitude
0,,,42.336342,-83.044894
1,,,42.334702,-83.052274
2,,,42.336431,-83.044953
3,,,42.336342,-83.044894


## Let's plot our building locations now and append them to the map of Detroit we already have.

In [63]:
for _ in range(0,len(addresses)-1):
    folium.Marker([building_lats[_], building_lons[_]], popup=addresses[_]).add_to(Detroit)
    
Detroit

### Because 1452 Brush St, Detroit, MI, 48226 has been identified as a restaurant space and is within 800 meters of Campus Martius, this seems to be the best location for the new business of our client. This is fairly close to a high foot traffic area and will be ideal for a new seafood restaurant because there is only one rival in the neighbourhood to keep up with.

## Conclusions

As a whole, this project was a great learning experience on how to scrape data and use it efficiently to solve a business problem for business owners. In the future I would love to revisit this problem and use more advanced techniques to address it with relative ease. I would strongly recommend that any and all aspiring data scientists practice a problem like this to improve their own abilities.