# Business Market Analysis
___

## Introduction / Business Problem

This analysis is meant to help a business venture choose between locations. Specifically, we will look at Mexican restaurants in Los Angeles. A neighborhood that is not saturated with other Mexican restaurants would be a great place to start!

## Data

In [None]:
import numpy as np
import pandas as pd
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import requests
import urllib.request
from urllib.request import urlopen
import time
from bs4 import BeautifulSoup
#!pip install pgeocode
import pgeocode

<div class="alert alert-block alert-danger">
<b>Delete this cell before publishing</b>
</div>



In [None]:
# Foursquare API authentication
CLIENT_ID = '' 
CLIENT_SECRET = '' 
ACCESS_TOKEN = '' 
VERSION = '20180604'
LIMIT = 30
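
Because the credentials cell has to be scrubbed before publishing, one option is to read the keys from environment variables instead of typing them into the notebook. This is only a sketch: the helper `load_credential` and the variable names (`FOURSQUARE_CLIENT_ID` and so on) are assumptions made for illustration, not names defined by Foursquare.

```python
import os

def load_credential(name, default=''):
    """Return the environment variable `name`, or `default` if it is unset."""
    return os.environ.get(name, default)

# Hypothetical environment variable names; use whatever your setup defines.
CLIENT_ID = load_credential('FOURSQUARE_CLIENT_ID')
CLIENT_SECRET = load_credential('FOURSQUARE_CLIENT_SECRET')
ACCESS_TOKEN = load_credential('FOURSQUARE_ACCESS_TOKEN')
```

With this approach the notebook can be shared as-is, since the secrets never appear in a cell.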

### Scrape for a list of neighborhoods in LA


First we need to set some variables, including the source of our information. We will be scraping LA Almanac (laalmanac.com) to get our communities and zip codes.

In [None]:
url = "http://www.laalmanac.com/communications/cm02_communities.php" 
data = requests.get(url).text 
soup = BeautifulSoup(data,"html.parser")

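Here we find every table cell on the page, strip out the HTML tags with a regular expression, and keep the result as a single comma-separated string.

In [None]:
import re

# Grab every <td> cell on the page; the community names and zip codes live in these cells
cells = soup.find_all('td')
# str() of the result gives "[<td>...</td>, <td>...</td>, ...]"
str_cells = str(cells)
# Strip the HTML tags, leaving one comma-separated string wrapped in "[" and "]"
clean = re.compile('<.*?>')
clean2 = re.sub(clean, '', str_cells)
print(clean2)
type(clean2)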

[Acton, 93510, Agoura Hills, 91301, Agoura Hills (PO Boxes), 91376, Agua Dulce, 91390, Alhambra, 91801, 91803, Alhambra (Non-Geographic Zip Code Within 91801), 91804, Alhambra (PO Boxes), 91802, 91896, 91899, Altadena, 91001, Altadena (PO Boxes), 91003, Americana at Brand &amp; Glendale Galleria (Glendale), 91210, Arcadia, 91006, 91007, Arcadia (PO Boxes), 91066, 91077, Arleta (Los Angeles), 91331, Arlington Heights (Los Angeles), 90019, Artesia, 90701, Artesia (PO Boxes), 90702, Athens, 90044, Atwater Village (Los Angeles), 90039, Avalon (PO Boxes), 90704, Avocado Heights, 91746, Azusa, 91010, 91702, Baldwin Hills (Los Angeles), 90008, Baldwin Park, 91706, Bassett, 91746, Bel Air Estates (Los Angeles), 90049, Bel Air Estates, Beverly Glen (Los Angeles), 90077, Bell, 90201, 90270, Bell (PO Boxes), 90202, Bell (Shared Firms), 90096, Bell Canyon, 91307, Bell Gardens, 90201, Bell Gardens (PO Boxes), 90202, Bellflower, 90706, Bellflower (PO Boxes), 90707, Belmont Shore (Long Beach), 90803,

str
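
The printed string still contains HTML entities such as `&amp;`. As a minimal sketch (not part of the pipeline above), the standard library's `html.unescape` can decode them once the tags are stripped. The helper name `strip_tags` and the short sample string are assumptions for illustration.

```python
import html
import re

TAG_PATTERN = re.compile(r'<.*?>')

def strip_tags(markup):
    """Remove HTML tags, then decode entities such as &amp;."""
    without_tags = TAG_PATTERN.sub('', markup)
    return html.unescape(without_tags)

# Sample imitating str(cells) for two table cells
sample = '[<td>Americana at Brand &amp; Glendale Galleria (Glendale)</td>, <td>91210</td>]'
print(strip_tags(sample))
```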

Next we must split the data at the commas.

In [None]:
s = clean2.split(',')
s

['[Acton',
 ' 93510',
 ' Agoura Hills',
 ' 91301',
 ' Agoura Hills (PO Boxes)',
 ' 91376',
 ' Agua Dulce',
 ' 91390',
 ' Alhambra',
 ' 91801',
 ' 91803',
 ' Alhambra (Non-Geographic Zip Code Within 91801)',
 ' 91804',
 ' Alhambra (PO Boxes)',
 ' 91802',
 ' 91896',
 ' 91899',
 ' Altadena',
 ' 91001',
 ' Altadena (PO Boxes)',
 ' 91003',
 ' Americana at Brand &amp; Glendale Galleria (Glendale)',
 ' 91210',
 ' Arcadia',
 ' 91006',
 ' 91007',
 ' Arcadia (PO Boxes)',
 ' 91066',
 ' 91077',
 ' Arleta (Los Angeles)',
 ' 91331',
 ' Arlington Heights (Los Angeles)',
 ' 90019',
 ' Artesia',
 ' 90701',
 ' Artesia (PO Boxes)',
 ' 90702',
 ' Athens',
 ' 90044',
 ' Atwater Village (Los Angeles)',
 ' 90039',
 ' Avalon (PO Boxes)',
 ' 90704',
 ' Avocado Heights',
 ' 91746',
 ' Azusa',
 ' 91010',
 ' 91702',
 ' Baldwin Hills (Los Angeles)',
 ' 90008',
 ' Baldwin Park',
 ' 91706',
 ' Bassett',
 ' 91746',
 ' Bel Air Estates (Los Angeles)',
 ' 90049',
 ' Bel Air Estates',
 ' Beverly Glen (Los Angeles)',
 ' 9

This cell does the bulk of the data cleaning. The comments explain each step, but the basic idea is to take the list of strings, convert the zip codes to `int`, keep only the first zip code for each community, skip duplicate entries such as the "(PO Boxes)" variants, and fix a couple of formatting problems (the stray "[" and "]"). We also look up the latitude and longitude for each zip code along the way.

In [None]:
# Remove surrounding spaces from every item
s1 = [item.strip(' ') for item in s]
# Rows of [name, zip code, latitude, longitude]
a = []
nomi = pgeocode.Nominatim('US')
# Number of communities added to `a` so far
cityNumber = 0
# Flags recording whether the last item kept was a zip code or a name
lastWasZIP = False
lastwasletter = False
# Convert zip codes to int and pair each community with its first zip code
for x in s1:
    # x[1] (the second character) is tested so the leading "[" on the first item does not matter
    if not x[1].isdigit() and lastwasletter == False:   # current item is a name AND the last kept item was NOT a name
        if cityNumber == 0:
            # First community: drop the leading "[" and use 10000 as an easy-to-spot placeholder zip
            a.append([x[1:], 10000, 0, 0])
            cityNumber = cityNumber + 1
            lastwasletter = True
        else:
            # Skip names containing the previous community's first 5 characters (e.g. "(PO Boxes)" variants)
            if not str(a[cityNumber-1][0][0:5]) in x:
                a.append([x, 10000, 0, 0])
                cityNumber = cityNumber + 1
                lastWasZIP = False
                lastwasletter = True
    elif lastWasZIP == False and x[1].isdigit():        # current item is a zip code AND the last kept item was NOT a zip code
        # The final item carries a trailing "]", so strip it before converting
        zip_code = int(x.rstrip(']'))
        # Query once and reuse the result for both coordinates
        location = nomi.query_postal_code(str(zip_code))
        a[cityNumber-1][1] = zip_code
        a[cityNumber-1][2] = location.latitude
        a[cityNumber-1][3] = location.longitude
        lastWasZIP = True
        lastwasletter = False

In [None]:
a

[['Acton', 93510, 34.4835, -118.1959],
 ['Agoura Hills', 91301, 34.1227, -118.7573],
 ['Agua Dulce', 91390, 34.4684, -118.5261],
 ['Alhambra', 91801, 34.0914, -118.1293],
 ['Altadena', 91001, 34.1912, -118.1392],
 ['Americana at Brand &amp; Glendale Galleria (Glendale)',
  91210,
  34.1425,
  -118.2551],
 ['Arcadia', 91006, 34.1324, -118.0264],
 ['Arleta (Los Angeles)', 91331, 34.2556, -118.4208],
 ['Arlington Heights (Los Angeles)', 90019, 34.0482, -118.3343],
 ['Artesia', 90701, 33.8654, -118.0731],
 ['Athens', 90044, 33.9551, -118.2901],
 ['Atwater Village (Los Angeles)', 90039, 34.1121, -118.2594],
 ['Avalon (PO Boxes)', 90704, 33.332, -118.3437],
 ['Avocado Heights', 91746, 34.0443, -117.9862],
 ['Azusa', 91010, 34.1407, -117.9567],
 ['Baldwin Hills (Los Angeles)', 90008, 34.0116, -118.3411],
 ['Bassett', 91746, 34.0443, -117.9862],
 ['Bel Air Estates (Los Angeles)', 90049, 34.066, -118.47399999999999],
 ['Beverly Glen (Los Angeles)', 90077, 34.1112, -118.4502],
 ['Bell', 90201, 3
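
The pairing of names and zip codes can also be sketched more compactly by treating every all-digit token as a zip code that belongs to the most recent name. This is an alternative illustration, not the loop used above: the function name `group_communities` is hypothetical, and it keeps every zip code and every "(PO Boxes)" entry instead of filtering them. A name that itself contains a comma is still split into separate entries.

```python
def group_communities(tokens):
    """Group a flat token list into (name, [zip codes]) pairs."""
    groups = []
    for token in tokens:
        # Drop surrounding spaces and the stray "[" / "]" from the list repr
        token = token.strip(' []')
        if not token:
            continue
        if token.isdigit():
            # A zip code belongs to the most recently seen name
            if groups:
                groups[-1][1].append(int(token))
        else:
            groups.append((token, []))
    return groups

sample = ['[Acton', ' 93510', ' Alhambra', ' 91801', ' 91803',
          ' Alhambra (PO Boxes)', ' 91802]']
print(group_communities(sample))
```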

Let's make it a data frame and have a look!

In [None]:
df = pd.DataFrame(a, columns =['Neighborhood', 'PostalCode', 'Latitude', 'Longitude'])
df

Unnamed: 0,Neighborhood,PostalCode,Latitude,Longitude
0,Acton,93510,34.4835,-118.1959
1,Agoura Hills,91301,34.1227,-118.7573
2,Agua Dulce,91390,34.4684,-118.5261
3,Alhambra,91801,34.0914,-118.1293
4,Altadena,91001,34.1912,-118.1392
...,...,...,...,...
256,Wilshire Center (Los Angeles),90004,34.0762,-118.3029
257,Wilsona Gardens,93535,34.7131,-117.8783
258,Windsor Hills (Los Angeles),90043,33.9871,-118.3321
259,Winnetka (Los Angeles),91306,34.2092,-118.5749


## Define a function to locate nearby venues
___


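This function asks the Foursquare explore endpoint for up to `LIMIT` venues within `radius` meters of each neighborhood's coordinates, then keeps only the venues whose first category has the short name "Mexican".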

In [None]:
def getNearbyVenues(Neighborhood, PostalCode, Latitude, Longitude, radius=500):
    """Return a DataFrame of Mexican venues found within `radius` meters of each neighborhood."""
    venues_list = []
    for name, postalcode, lat, lng in zip(Neighborhood, PostalCode, Latitude, Longitude):
        # postal code 91797 is excluded from the search
        if postalcode == 91797:
            continue

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and pull out the list of venue items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only venues whose first category is "Mexican"; skip venues with no category
        for v in results:
            categories = v['venue']['categories']
            if categories and categories[0]['shortName'] == 'Mexican':
                venues_list.append((
                    name,
                    lat,
                    lng,
                    v['venue']['name'],
                    v['venue']['location']['lat'],
                    v['venue']['location']['lng'],
                    categories[0]['name']))

    # build the result in one step so an empty result still has named columns
    nearby_venues = pd.DataFrame(venues_list, columns=[
        'Neighborhood',
        'Neighborhood Lat',
        'Neighborhood Long',
        'Venue',
        'Venue Lat',
        'Venue Long',
        'Venue Category'])

    return nearby_venues
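
The category filter can be tried without calling the API. The sketch below runs it on a hand-written payload that imitates the `items` list the function reads. The helper name `mexican_venues` and the two placeholder venues are made up for illustration, and venues with an empty `categories` list are skipped rather than raising an `IndexError`.

```python
def mexican_venues(items):
    """Return (name, lat, lng, category) for items whose first category is 'Mexican'."""
    matches = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        if categories and categories[0].get('shortName') == 'Mexican':
            matches.append((
                venue['name'],
                venue['location']['lat'],
                venue['location']['lng'],
                categories[0]['name'],
            ))
    return matches

# Hand-written sample; only the first entry should pass the filter
sample_items = [
    {'venue': {'name': 'Taco Lita',
               'location': {'lat': 34.130003, 'lng': -118.027064},
               'categories': [{'name': 'Mexican Restaurant', 'shortName': 'Mexican'}]}},
    {'venue': {'name': 'Placeholder Cafe',
               'location': {'lat': 34.0, 'lng': -118.0},
               'categories': [{'name': 'Coffee Shop', 'shortName': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 34.0, 'lng': -118.0},
               'categories': []}},
]
print(mexican_venues(sample_items))
```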

In [None]:
LA_venues = getNearbyVenues(Neighborhood=df['Neighborhood'],PostalCode=df['PostalCode'], Latitude=df['Latitude'],Longitude=df['Longitude'])

In [None]:
LA_venues

Unnamed: 0,Neighborhood,Neighborhood Lat,Neighborhood Long,Venue,Venue Lat,Venue Long,Venue Category
0,Arcadia,34.1324,-118.0264,Taco Lita,34.130003,-118.027064,Mexican Restaurant
1,Arlington Heights (Los Angeles),34.0482,-118.3343,Chipotle Mexican Grill,34.048183,-118.336012,Mexican Restaurant
2,Arlington Heights (Los Angeles),34.0482,-118.3343,El Compita,34.048592,-118.332846,Mexican Restaurant
3,Azusa,34.1407,-117.9567,Tacos Ensenada,34.140143,-117.957711,Mexican Restaurant
4,Baldwin Hills (Los Angeles),34.0116,-118.3411,Chipotle Mexican Grill,34.013284,-118.336625,Mexican Restaurant
...,...,...,...,...,...,...,...
174,Willowbrook,33.9293,-118.2463,Tacos El Bronco,33.929620,-118.246058,Mexican Restaurant
175,Wilmington (Los Angeles),33.7855,-118.2645,Mariscos La Paz,33.787514,-118.263280,Mexican Restaurant
176,Wilshire Center (Los Angeles),34.0762,-118.3029,Cactus Mexican Food,34.076194,-118.304147,Mexican Restaurant
177,Wilshire Center (Los Angeles),34.0762,-118.3029,Casa Carnitas,34.076540,-118.297723,Mexican Restaurant


In [None]:
print(LA_venues.shape)
LA_venues.head()

(179, 7)

Unnamed: 0,Neighborhood,Neighborhood Lat,Neighborhood Long,Venue,Venue Lat,Venue Long,Venue Category
0,Arcadia,34.1324,-118.0264,Taco Lita,34.130003,-118.027064,Mexican Restaurant
1,Arlington Heights (Los Angeles),34.0482,-118.3343,Chipotle Mexican Grill,34.048183,-118.336012,Mexican Restaurant
2,Arlington Heights (Los Angeles),34.0482,-118.3343,El Compita,34.048592,-118.332846,Mexican Restaurant
3,Azusa,34.1407,-117.9567,Tacos Ensenada,34.140143,-117.957711,Mexican Restaurant
4,Baldwin Hills (Los Angeles),34.0116,-118.3411,Chipotle Mexican Grill,34.013284,-118.336625,Mexican Restaurant
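
Since the goal is to find neighborhoods that are not saturated, a natural next step is to count venues per neighborhood. Here is a minimal sketch using `collections.Counter` on the neighborhood names from the first five rows of the venue table; with the full data, the same tally could be taken over `LA_venues['Neighborhood']`.

```python
from collections import Counter

# Neighborhood column of the first five venue rows
neighborhoods = [
    'Arcadia',
    'Arlington Heights (Los Angeles)',
    'Arlington Heights (Los Angeles)',
    'Azusa',
    'Baldwin Hills (Los Angeles)',
]
venue_counts = Counter(neighborhoods)

# List neighborhoods from fewest to most Mexican venues
for name, count in sorted(venue_counts.items(), key=lambda pair: pair[1]):
    print(name, count)
```

Note that neighborhoods with no matching venues never appear in `LA_venues`, so they would have to be recovered from `df` rather than from this tally.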

## Map venues in relation to neighborhoods

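Here is an interactive map that shows all the venues in blue and the neighborhoods outlined in red. Because the neighborhood markers are drawn from `LA_venues`, only neighborhoods with at least one Mexican restaurant in the results appear.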

In [None]:
map_LA = folium.Map(location=[34.1324, -118.0264], zoom_start=10)

# Venues: blue markers labeled "venue, neighborhood"
for ven_lat, ven_lng, neighborhood, venue in zip(LA_venues['Venue Lat'], LA_venues['Venue Long'], LA_venues['Neighborhood'], LA_venues['Venue']):
    label = '{}, {}'.format(venue, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [ven_lat, ven_lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_LA)

# Neighborhoods: red-outlined markers, drawn once per neighborhood instead of once per venue row
neighborhood_points = LA_venues[['Neighborhood Lat', 'Neighborhood Long', 'Neighborhood']].drop_duplicates()
for n_lat, n_lng, neighborhood in zip(neighborhood_points['Neighborhood Lat'], neighborhood_points['Neighborhood Long'], neighborhood_points['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [n_lat, n_lng],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.4).add_to(map_LA)

map_LA