# Battle of the Neighborhoods: Dog Breeders!

## Introduction: Business Problem

The stakeholders in question are dog breeders looking to move to a neighborhood in San Francisco from which to breed, sell, and care for beagles. Beagles are a social breed that requires plenty of exercise as well as face-to-face time with other dogs. The client wants to move to a neighborhood with pet stores nearby, so it's easy to provide for their dogs, as well as many open-air public spaces where the dogs can socialize and get their daily exercise.

The ideal neighborhood will have an even balance of all these characteristics so the breeders can give the beagles adequate stimulation while they raise them as well as "advertise" their dogs to locals when they take them for walks. Adopting their dogs out locally will also ensure that potential owners have access to the same facilities the breeders rely on to care for the dogs.

## Data

This project's objective is to find the ideal neighborhood in which the beagle breeders can set up their business. The data required will come from multiple sources as follows:

#### OpenDataSoft + HealthySF

San Francisco is composed of a number of neighborhoods associated with a given zip code. HealthySF has a list of the names of these neighborhoods by zip, and OpenDataSoft has a data set that indexes each zip code in the US to its latitude and longitude, which can be used to map these neighborhoods.

#### Foursquare API

We will use Foursquare's venue data to determine the number of dog-friendly venues near a given neighborhood.

## Methodology

First, we'll import the Python modules needed to handle the data at a basic level.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

We use the read_html function to scrape the HealthySF website for its list of neighborhoods and zip codes.

In [2]:
healthSFtables = pd.read_html('http://www.healthysf.org/bdi/outcomes/zipmap.htm')
SF_zip = healthSFtables[3]
SF_zip

Unnamed: 0,0,1,2
0,Zip Code,Neighborhood,Population (Census 2000)
1,94102,Hayes Valley/Tenderloin/North of Market,28991
2,94103,South of Market,23016
3,94107,Potrero Hill,17368
4,94108,Chinatown,13716
5,94109,Polk/Russian Hill (Nob Hill),56322
6,94110,Inner Mission/Bernal Heights,74633
7,94112,Ingelside-Excelsior/Crocker-Amazon,73104
8,94114,Castro/Noe Valley,30574
9,94115,Western Addition/Japantown,33115


First, we need to set the first imported row as the column names. We do not need the population of each neighborhood for this exercise, nor the extraneous category "All Zips." Cursory examination of this dataframe's data types shows that 'Zip Code' was imported as an object, so we must change it to an integer.

In [3]:
SF_zip.columns = SF_zip.iloc[0]
SF_zip = SF_zip.drop(SF_zip.index[0])
SF_zip = SF_zip.drop(columns = 'Population (Census 2000)')
SF_zip = SF_zip.drop([22])
SF_zip = SF_zip.astype({'Zip Code': 'int64'})
SF_zip.head()

Unnamed: 0,Zip Code,Neighborhood
1,94102,Hayes Valley/Tenderloin/North of Market
2,94103,South of Market
3,94107,Potrero Hill
4,94108,Chinatown
5,94109,Polk/Russian Hill (Nob Hill)


This data frame carries only the information we need. Now, we get OpenDataSoft's latitude and longitude data. The full data set is large and time-consuming to import when it contains every zip code in the US, so we read in a filtered version of the CSV that is narrowed to California zip codes. We indicate that the separator is a semicolon rather than a comma, and import only the zip code, city, latitude, and longitude columns.

In [4]:
CA_zip = \
pd.read_csv('https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/download/?format=csv&refine.state=CA&timezone=America/Los_Angeles&lang=en&use_labels_for_header=true&csv_separator=%3B',\
           sep = ';',\
           usecols = [0, 1, 3, 4])
CA_zip = CA_zip.rename(columns={'Zip': 'Zip Code'})
print(CA_zip.shape)
CA_zip.head()

(2716, 4)


Unnamed: 0,Zip Code,City,Latitude,Longitude
0,91319,Newbury Park,34.032383,-119.1343
1,92503,Riverside,33.91355,-117.46052
2,94211,Sacramento,38.377411,-121.444429
3,91902,Bonita,32.663803,-117.02456
4,95901,Marysville,39.15973,-121.53735


We now merge the dataframes on the zip codes they have in common to get the primary dataframe we will be using in this analysis.

In [5]:
SF_data = pd.merge(SF_zip, CA_zip, on = 'Zip Code')
SF_data.head()

Unnamed: 0,Zip Code,Neighborhood,City,Latitude,Longitude
0,94102,Hayes Valley/Tenderloin/North of Market,San Francisco,37.779329,-122.41915
1,94103,South of Market,San Francisco,37.772329,-122.41087
2,94107,Potrero Hill,San Francisco,37.766529,-122.39577
3,94108,Chinatown,San Francisco,37.792678,-122.40793
4,94109,Polk/Russian Hill (Nob Hill),San Francisco,37.792778,-122.42188
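
Because the merge above is an inner join, any neighborhood whose zip code is missing from the coordinates table would silently drop out. The following is a minimal sketch of one way to check for that, using small made-up frames in place of SF_zip and CA_zip; the zip codes, coordinates, and variable names here are illustrative assumptions, not values from the real data.

```python
import pandas as pd

# Toy stand-ins for SF_zip and CA_zip (values are illustrative only)
left = pd.DataFrame({'Zip Code': [94102, 94103, 99999],
                     'Neighborhood': ['A', 'B', 'C']})
right = pd.DataFrame({'Zip Code': [94102, 94103, 90001],
                      'Latitude': [37.78, 37.77, 33.97],
                      'Longitude': [-122.42, -122.41, -118.25]})

# A left merge keeps every neighborhood; the _merge column records
# whether a matching zip code was found in the coordinates table
checked = pd.merge(left, right, on='Zip Code', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Zip Code'].tolist()
print(unmatched)
```

An empty list would mean every neighborhood found its coordinates, in which case the inner join loses nothing.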


We now import the tools necessary to visualize the neighborhoods in the above dataframe.

In [6]:
# IF CODE IN BELOW CELLS IS NOT GENERATING OUTPUT, UNCOMMENT THESE LINES OF CODE

# !conda install -c conda-forge geopy --yes
# !conda install -c conda-forge folium=0.5.0 --yes

In [7]:
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas 1.0+)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


We must set San Francisco as the default area on which the map will be centered.

In [8]:
address = 'San Francisco, CA, USA'
geolocator = Nominatim(user_agent="sf_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('SF Coordinates: {}, {}'.format(latitude, longitude))

SF Coordinates: 37.7790262, -122.4199061


In [9]:
# create map of SF using latitude and longitude values
SF_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(SF_data['Latitude'], SF_data['Longitude'], SF_data['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='orange',
        fill=True,
        fill_color='gold',
        fill_opacity=0.7,
        parse_html=False).add_to(SF_map)  
    
SF_map

We need to call up our Foursquare credentials to find venues nearby. We set the radius to 4000 meters: because SF has a good public transit system (BART), we treat roughly 2.5 miles as a "close-by" radius.

In [10]:
CLIENT_ID = '5G1RAVU2YF0UDWUNCMCCKEIC5IAWNQ0FN42VNAHFJLHAVOAX' # Foursquare ID
CLIENT_SECRET = 'AVKNN5YHYHCNRMEUTI04HCABA4YKAFVWEFEHK4QGDWIIZJIK' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius = 4000 # 2.5 miles is 4023m
LIMIT = 100
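
As an alternative to writing the ID and secret directly into the notebook, the credentials could be read from environment variables. This is only a sketch: the function name and the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical choices made for this example, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from a mapping of environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set')
    return client_id, client_secret
```

With those variables exported in the shell that launches the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the two hard-coded assignments.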

We create a function to call nearby venues from each neighborhood and read them into a dataframe.

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
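
The list comprehension in the function above indexes `categories[0]` directly, which would raise an IndexError for any venue that comes back with an empty category list. Whether the API ever returns such a venue is an assumption here, but a defensive version of the flattening step might look like the sketch below. The helper name `venue_rows`, the 'Uncategorized' placeholder, and the sample items are all made up for illustration.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'items' into row tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder label when a venue has no category
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hand-made sample shaped like the response fields used above
sample_items = [
    {'venue': {'name': 'Some Park', 'location': {'lat': 37.77, 'lng': -122.42},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 37.78, 'lng': -122.41},
               'categories': []}},
]
rows = venue_rows('Example Neighborhood', 37.779, -122.419, sample_items)
print([row[-1] for row in rows])
```

Keeping the parsing in its own function also makes it possible to test it on saved responses without calling the API.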

In [12]:
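# radius is not passed here, so the function's default of 500 m applies
# rather than the 4000 m set in the configuration cell
SF_venues = getNearbyVenues(names = SF_data['Neighborhood'],
                                   latitudes = SF_data['Latitude'],
                                   longitudes = SF_data['Longitude']
                                  )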

Hayes Valley/Tenderloin/North of Market
South of Market
Potrero Hill
Chinatown
Polk/Russian Hill (Nob Hill)
Inner Mission/Bernal Heights
Ingelside-Excelsior/Crocker-Amazon
Castro/Noe Valley
Western Addition/Japantown
Parkside/Forest Hill
Haight-Ashbury
Inner Richmond
Outer Richmond
Sunset
Marina
Bayview-Hunters Point
St. Francis Wood/Miraloma/West Portal
Twin Peaks-Glen Park
Lake Merced
North Beach/Chinatown
Visitacion Valley/Sunnydale


In [13]:
# print(SF_venues.shape)
# SF_venues.head()

In [14]:
# SF_venues.groupby('Neighborhood').count()

In [15]:
# one hot encoding
SF_onehot = pd.get_dummies(SF_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
SF_onehot['Neighborhood'] = SF_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [SF_onehot.columns[-1]] + list(SF_onehot.columns[:-1])
SF_onehot = SF_onehot[fixed_columns]

SF_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,African Restaurant,Alternative Healer,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bar,Baseball Field,Beer Bar,Beer Garden,Bike Shop,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Café,Camera Store,Candy Store,Cantonese Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distillery,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Electronics Store,Empanada Restaurant,Entertainment Service,Ethiopian Restaurant,Event Space,Eye Doctor,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Hardware Store,Health & Beauty Service,Hill,Historic Site,History Museum,Hobby Shop,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Music School,Music Store,Music Venue,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Park,Parking,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Ramen Restaurant,Record Shop,Restaurant,Road,Rock Club,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Sicilian Restaurant,Smoke Shop,Smoothie Shop,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Trail,Trattoria/Osteria,Turkish Restaurant,Tuscan Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Yoga Studio
0,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Hayes Valley/Tenderloin/North of Market,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Dog breeders do not need to know about all of these venues. Let's narrow it down to dog-friendly places.

In [16]:
SF_onehot = SF_onehot[['Neighborhood','Park','Plaza','Farmers Market','Food Truck','Pet Store','Pool','Monument / Landmark','Sculpture Garden','Road','Historic Site','Fish Market','Dog Run','Garden Center','Hill','Trail','Scenic Lookout','Playground','Garden','Field','Baseball Field']]
# SF_onehot.head()

In [17]:
SF_grouped = SF_onehot.groupby('Neighborhood').mean().reset_index()
SF_grouped
# print(SF_grouped.shape)
# SF_grouped

Unnamed: 0,Neighborhood,Park,Plaza,Farmers Market,Food Truck,Pet Store,Pool,Monument / Landmark,Sculpture Garden,Road,Historic Site,Fish Market,Dog Run,Garden Center,Hill,Trail,Scenic Lookout,Playground,Garden,Field,Baseball Field
0,Bayview-Hunters Point,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Castro/Noe Valley,0.0,0.025,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0125,0.0,0.0125,0.0125,0.0125,0.0125,0.0375,0.025,0.025,0.0,0.0
2,Chinatown,0.010989,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Haight-Ashbury,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0
4,Hayes Valley/Tenderloin/North of Market,0.023256,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Ingelside-Excelsior/Crocker-Amazon,0.027778,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Inner Mission/Bernal Heights,0.044444,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Inner Richmond,0.0,0.0,0.017544,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0
8,Lake Merced,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Marina,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0
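
The values in this table are means of 0/1 indicator columns, so each one is the share of a neighborhood's returned venues that fall in that category. Because only columns were dropped earlier and no rows, the denominator is still every venue returned for the neighborhood, not just the dog-friendly ones. A toy example with made-up venues shows the arithmetic:

```python
import pandas as pd

# Toy venue list: neighborhood X has 4 venues, Y has 2
venues = pd.DataFrame({
    'Neighborhood': ['X', 'X', 'X', 'X', 'Y', 'Y'],
    'Venue Category': ['Park', 'Park', 'Café', 'Pet Store', 'Park', 'Café'],
})

# dtype=int keeps the indicators as 0/1 integers
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot['Neighborhood'] = venues['Neighborhood']

# Mean of 0/1 indicators = share of the neighborhood's venues in each category
shares = onehot.groupby('Neighborhood').mean()
print(shares.loc['X', 'Park'])       # 2 of X's 4 venues are parks
print(shares.loc['X', 'Pet Store'])  # 1 of X's 4 venues is a pet store
```

Read this way, a value such as 0.05 means one venue in twenty, so small shares can still correspond to a real venue.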


We can look at the top 5 venues from each neighborhood.

In [18]:
num_top_venues = 5

for hood in SF_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = SF_grouped[SF_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bayview-Hunters Point----
        venue  freq
0        Park  0.05
1       Plaza  0.00
2       Field  0.00
3      Garden  0.00
4  Playground  0.00


----Castro/Noe Valley----
            venue  freq
0  Scenic Lookout  0.04
1          Garden  0.02
2      Playground  0.02
3           Plaza  0.02
4            Hill  0.01


----Chinatown----
                 venue  freq
0                 Park  0.01
1  Monument / Landmark  0.01
2     Sculpture Garden  0.01
3                 Road  0.01
4        Garden Center  0.00


----Haight-Ashbury----
              venue  freq
0              Park  0.05
1           Dog Run  0.02
2        Playground  0.02
3    Scenic Lookout  0.02
4  Sculpture Garden  0.02


----Hayes Valley/Tenderloin/North of Market----
            venue  freq
0            Park  0.02
1  Farmers Market  0.01
2           Plaza  0.01
3         Dog Run  0.00
4           Field  0.00


----Ingelside-Excelsior/Crocker-Amazon----
        venue  freq
0        Park  0.03
1  Food Truck  0.03
2   

About half of these neighborhoods (largely on the west side of San Francisco) have too few venues to be of interest. We will excise them from the dataframes to get a better idea of which neighborhoods in east SF would be appropriate. But first, let's create a dataframe with the most common venues of each neighborhood.

In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [20]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # past 'st', 'nd', 'rd' every ordinal takes the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = SF_grouped['Neighborhood']

for ind in np.arange(SF_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(SF_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Bayview-Hunters Point,Park,Field,Plaza,Farmers Market,Food Truck
1,Castro/Noe Valley,Scenic Lookout,Garden,Playground,Plaza,Historic Site
2,Chinatown,Park,Monument / Landmark,Sculpture Garden,Road,Field
3,Haight-Ashbury,Park,Scenic Lookout,Sculpture Garden,Dog Run,Historic Site
4,Hayes Valley/Tenderloin/North of Market,Park,Plaza,Farmers Market,Field,Food Truck
5,Ingelside-Excelsior/Crocker-Amazon,Park,Food Truck,Field,Plaza,Farmers Market
6,Inner Mission/Bernal Heights,Park,Pool,Fish Market,Road,Plaza
7,Inner Richmond,Field,Farmers Market,Pet Store,Pool,Baseball Field
8,Lake Merced,Food Truck,Baseball Field,Field,Plaza,Farmers Market
9,Marina,Park,Playground,Road,Plaza,Farmers Market


While this list *looks* good, if we look at the printout a couple of cells above we can see that about half of these neighborhoods only contain a few venues of interest. Let's excise locations from both dataframes that have fewer than 3 venues actually present.

In [21]:
SF_grouped = SF_grouped.drop([0,5,8,9,11,12,13,15,16,17])
SF_grouped = SF_grouped.reset_index()
SF_grouped = SF_grouped.drop(columns = 'index')
# SF_grouped

In [22]:
neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop([0,5,8,9,11,12,13,15,16,17])
neighborhoods_venues_sorted = neighborhoods_venues_sorted.reset_index()
neighborhoods_venues_sorted = neighborhoods_venues_sorted.drop(columns = 'index')
# neighborhoods_venues_sorted
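
The row indices dropped above were picked by hand from the printout. A sketch of how the same kind of filter could be computed is shown below. It assumes that "fewer than 3 venues actually present" means fewer than three categories with a non-zero share; the toy table and its values are illustrative only.

```python
import pandas as pd

# Toy frequency table shaped like SF_grouped (values are illustrative)
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Park':      [0.05, 0.02, 0.00],
    'Plaza':     [0.00, 0.01, 0.00],
    'Pet Store': [0.00, 0.01, 0.02],
})

min_categories = 3  # threshold taken from the text above

# Count how many categories have a non-zero share in each row
present = (grouped.drop(columns='Neighborhood') > 0).sum(axis=1)
kept = grouped[present >= min_categories].reset_index(drop=True)
print(kept['Neighborhood'].tolist())
```

Applying the same boolean mask to both dataframes would keep their rows aligned, and `reset_index(drop=True)` avoids the separate step of dropping the 'index' column.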

Now that we have this more specific dataframe, we can cluster the neighborhoods by similar characteristics to find the best ones in which to run a dog breeding business.

In [23]:
# set number of clusters
kclusters = 5

SF_grouped_clustering = SF_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(SF_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 3, 3, 4, 3, 4, 4, 2, 0], dtype=int32)

In [24]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [25]:
# merge the cluster labels and top venues with SF_data to add latitude/longitude for each neighborhood
SF_merged = pd.merge(SF_data, neighborhoods_venues_sorted, on = 'Neighborhood')

SF_merged.head()

Unnamed: 0,Zip Code,Neighborhood,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,94102,Hayes Valley/Tenderloin/North of Market,San Francisco,37.779329,-122.41915,3,Park,Plaza,Farmers Market,Field,Food Truck
1,94107,Potrero Hill,San Francisco,37.766529,-122.39577,4,Park,Pet Store,Pool,Field,Plaza
2,94108,Chinatown,San Francisco,37.792678,-122.40793,3,Park,Monument / Landmark,Sculpture Garden,Road,Field
3,94110,Inner Mission/Bernal Heights,San Francisco,37.74873,-122.41545,4,Park,Pool,Fish Market,Road,Plaza
4,94114,Castro/Noe Valley,San Francisco,37.758434,-122.43512,1,Scenic Lookout,Garden,Playground,Plaza,Historic Site


In [26]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(SF_merged['Latitude'], SF_merged['Longitude'], SF_merged['Neighborhood'], SF_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Here we can see that three neighborhoods each sit in a cluster of their own, while the remaining neighborhoods, mostly in the northeast of the city, are divided into two groups. Interestingly, the two zip codes that cover Chinatown fall into different groups. Let's have a closer look at the data sets.

In [27]:
SF_merged.loc[SF_merged['Cluster Labels'] == 0, SF_merged.columns[[1] + list(range(5, SF_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
10,Visitacion Valley/Sunnydale,0,Baseball Field,Trail,Park,Garden,Playground


In [28]:
SF_merged.loc[SF_merged['Cluster Labels'] == 1, SF_merged.columns[[1] + list(range(5, SF_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Castro/Noe Valley,1,Scenic Lookout,Garden,Playground,Plaza,Historic Site


In [29]:
SF_merged.loc[SF_merged['Cluster Labels'] == 2, SF_merged.columns[[1] + list(range(5, SF_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
8,Twin Peaks-Glen Park,2,Park,Trail,Playground,Road,Plaza


In [30]:
SF_merged.loc[SF_merged['Cluster Labels'] == 3, SF_merged.columns[[1] + list(range(5, SF_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Hayes Valley/Tenderloin/North of Market,3,Park,Plaza,Farmers Market,Field,Food Truck
2,Chinatown,3,Park,Monument / Landmark,Sculpture Garden,Road,Field
5,Western Addition/Japantown,3,Park,Pet Store,Food Truck,Historic Site,Playground
6,Haight-Ashbury,3,Park,Scenic Lookout,Sculpture Garden,Dog Run,Historic Site
7,Inner Richmond,3,Field,Farmers Market,Pet Store,Pool,Baseball Field


In [31]:
SF_merged.loc[SF_merged['Cluster Labels'] == 4, SF_merged.columns[[1] + list(range(5, SF_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Potrero Hill,4,Park,Pet Store,Pool,Field,Plaza
3,Inner Mission/Bernal Heights,4,Park,Pool,Fish Market,Road,Plaza
9,North Beach/Chinatown,4,Park,Trail,Playground,Pet Store,Pool
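
The tables above list each cluster's most common venues by rank. Another way to compare clusters is to average the category shares within each one. The sketch below does this on made-up data; the frame, the labels, and the variable names are hypothetical stand-ins for SF_grouped and kmeans.labels_.

```python
import pandas as pd

# Toy stand-ins: per-neighborhood category shares plus k-means labels
freqs = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Park':      [0.04, 0.02, 0.00, 0.00],
    'Pet Store': [0.02, 0.00, 0.00, 0.00],
    'Trail':     [0.00, 0.00, 0.03, 0.01],
})
labels = [0, 0, 1, 1]  # hypothetical cluster assignments

# Average share of each category across the neighborhoods in a cluster
profile = (freqs.drop(columns='Neighborhood')
                .assign(Cluster=labels)
                .groupby('Cluster')
                .mean())
print(profile.round(3))
```

A profile like this puts numbers behind statements such as "this cluster is park-heavy", which the ranked lists alone cannot show.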


## Results

This map makes it clear that the neighborhoods we eliminated for not having enough variety in dog-friendly venues were all on the west side of San Francisco, further from the tourist district.

We can see that the yellow and green clusters on the map mostly have parks as a prominent feature. Additionally, the green cluster appears to feature a wealth of outdoor venues that it might be possible to visit with a dog.

Sunnydale is farthest from the tourist district and appears to have more residential landmarks near it, as does Noe Valley. Twin Peaks is close to a plaza, but seems similarly sterile. Both the yellow and green clusters have ready access to a pet store, which is important for keeping the dogs fed.

## Discussion

It seems that both the yellow and green clusters are located closest to the tourist areas of San Francisco, where our dog breeder stakeholders can easily socialize their dogs and advertise them to visiting tourists and locals alike. This makes them both ideal for setting up shop. Critically, the clusters closest to Chinatown and Japantown feature parks, open-air venues, and a pet store. This makes Chinatown and Japantown the optimal neighborhoods into which to move to maximize our stakeholders' resources to care for the dogs as well as to expose the dogs to enough foot traffic to increase their likelihood of being adopted.

## Conclusion

The purpose of this project was to determine the best neighborhood in San Francisco in which to set up a dog breeding business for beagles. Given beagles' need for activity and socialization, we looked for neighborhoods near parks and open-air venues where the beagles could get exercise, socialize with people, and meet potential new owners. We also looked for nearby pet stores to make it easier to get the resources necessary to care for each dog. After creating a dataframe containing the neighborhoods' zip codes and locations, we used the Foursquare API to call nearby venues, then cleaned the dataframe to include only the dog-friendly ones. We also eliminated neighborhoods with a limited variety of venues, since the dogs and the owners benefit from variation. Finally, we clustered the remaining neighborhoods by similar characteristics and mapped them, revealing that the neighborhoods with variety were on the east side of the city, which is closer to the tourist districts. Two clusters showed proximity to many open-air venues. Chinatown spans two zip codes that fell into different clusters, and it also has a pet store nearby, which would be useful for furnishing the beagles with necessary toys, food, and bedding. We concluded that Chinatown is the best place to set up a dog-breeding business in San Francisco.