# Usha Manoharan
## Capstone Project - Commuter-Friendly Neighborhoods of San Francisco, CA, USA

This notebook will be used for the **"Battle of Neighborhoods"** capstone project of the IBM Applied Data Science Professional Certificate course. The purpose of this project is to identify commuter-friendly neighborhoods in San Francisco, USA.

I will start by downloading the data set published on [opendatasoft](https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=CA&q=san+francisco) to get the list of San Francisco zip codes along with their latitude and longitude. Next, I will use the Foursquare API to explore the neighborhoods and use the k-means clustering algorithm to group them into clusters based on the venue categories that are related to public transportation. Finally, I will use the Folium library to visualize the commuter-friendly neighborhoods of San Francisco.

## Table Of Contents
* [San Francisco Neighborhood](#get-sfdata)
* [Explore San Francisco](#explore-sfdata)
* [Analyze San Francisco](#analyze-sfdata)
* [Visualize San Francisco](#visualize-sfdata)


## San Francisco Neighborhood <a class="anchor" id="get-sfdata"></a>

In [4]:
# load all the libraries needed for the analysis
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (the pandas.io.json import path is deprecated since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

import matplotlib.pyplot as plt

print('All Libraries imported.')

All Libraries imported.


In [45]:
# download/export the zip, lat, long data for SF from the opendatasoft website
# https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/export/?refine.state=CA&q=san+francisco&dataChart=eyJxdWVyaWVzIjpbeyJjb25maWciOnsiZGF0YXNldCI6InVzLXppcC1jb2RlLWxhdGl0dWRlLWFuZC1sb25naXR1ZGUiLCJvcHRpb25zIjp7InEiOiJzYW4gZnJhbmNpc2NvIiwicmVmaW5lLnN0YXRlIjoiQ0EifX0sImNoYXJ0cyI6W3siYWxpZ25Nb250aCI6dHJ1ZSwidHlwZSI6ImNvbHVtbiIsImZ1bmMiOiJBVkciLCJ5QXhpcyI6ImxhdGl0dWRlIiwic2NpZW50aWZpY0Rpc3BsYXkiOnRydWUsImNvbG9yIjoiI0ZGNTE1QSJ9XSwieEF4aXMiOiJzdGF0ZSIsIm1heHBvaW50cyI6NTAsInNvcnQiOiIifV0sInRpbWVzY2FsZSI6IiIsImRpc3BsYXlMZWdlbmQiOnRydWUsImFsaWduTW9udGgiOnRydWV9&location=10,37.60061,-122.53131&basemap=jawg.streets

with open('/Users/umano/Downloads/us-zip-code-latitude-and-longitude.json') as json_data:
    sf_data = json.load(json_data)
sf_data

[{'datasetid': 'us-zip-code-latitude-and-longitude',
  'recordid': 'ebdf016d793fbd1f68f8e7b646dd8b4767a574a4',
  'fields': {'city': 'San Francisco',
   'zip': '94175',
   'dst': 1,
   'geopoint': [37.784827, -122.727802],
   'longitude': -122.727802,
   'state': 'CA',
   'latitude': 37.784827,
   'timezone': -8},
  'geometry': {'type': 'Point', 'coordinates': [-122.727802, 37.784827]},
  'record_timestamp': '2018-02-09T08:33:38.603-08:00'},
 {'datasetid': 'us-zip-code-latitude-and-longitude',
  'recordid': '841e8dccbf9477884281b53066c5642f554d108d',
  'fields': {'city': 'San Francisco',
   'zip': '94160',
   'dst': 1,
   'geopoint': [37.784827, -122.727802],
   'longitude': -122.727802,
   'state': 'CA',
   'latitude': 37.784827,
   'timezone': -8},
  'geometry': {'type': 'Point', 'coordinates': [-122.727802, 37.784827]},
  'record_timestamp': '2018-02-09T08:33:38.603-08:00'},
 {'datasetid': 'us-zip-code-latitude-and-longitude',
  'recordid': '85e3afa1f95184038d8665c37e4694bf555d7be8',

In [47]:
sf_data[0]

{'datasetid': 'us-zip-code-latitude-and-longitude',
 'recordid': 'ebdf016d793fbd1f68f8e7b646dd8b4767a574a4',
 'fields': {'city': 'San Francisco',
  'zip': '94175',
  'dst': 1,
  'geopoint': [37.784827, -122.727802],
  'longitude': -122.727802,
  'state': 'CA',
  'latitude': 37.784827,
  'timezone': -8},
 'geometry': {'type': 'Point', 'coordinates': [-122.727802, 37.784827]},
 'record_timestamp': '2018-02-09T08:33:38.603-08:00'}
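
Each record keeps the values of interest under the nested `fields` key. As a side note, here is a minimal sketch of how `json_normalize` could flatten such records in one call. The two sample records are trimmed copies of the output above, and the name `sf_alt` is only illustrative; it is not used elsewhere in this notebook.

```python
import pandas as pd

# Two records shaped like the entries of sf_data (trimmed copies of the output above).
records = [
    {'datasetid': 'us-zip-code-latitude-and-longitude',
     'fields': {'city': 'San Francisco', 'zip': '94175',
                'latitude': 37.784827, 'longitude': -122.727802}},
    {'datasetid': 'us-zip-code-latitude-and-longitude',
     'fields': {'city': 'San Francisco', 'zip': '94160',
                'latitude': 37.784827, 'longitude': -122.727802}},
]

# Nested keys are flattened into dotted column names such as 'fields.zip'.
flat = pd.json_normalize(records)

# Keep only the three columns of interest and give them friendlier names.
sf_alt = flat[['fields.zip', 'fields.latitude', 'fields.longitude']].rename(
    columns={'fields.zip': 'Zipcode',
             'fields.latitude': 'Latitude',
             'fields.longitude': 'Longitude'})

print(sf_alt)
```

With the full file loaded, the same two statements could be applied to `sf_data` directly, assuming every record carries the `fields` keys shown above.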

In [72]:
# define the dataframe columns
cols = ['Zipcode', 'Latitude', 'Longitude'] 

# instantiate the dataframe
sf = pd.DataFrame(columns=cols)

In [76]:
# collect one row per record, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = [{'Zipcode': data['fields']['zip'],
         'Latitude': data['fields']['latitude'],
         'Longitude': data['fields']['longitude']} for data in sf_data]
sf = pd.DataFrame(rows, columns=cols)

print('The dataframe has {} unique zipcodes'.format(sf['Zipcode'].nunique()))
sf.head(5)

The dataframe has 74 unique zipcodes


| | Zipcode | Latitude | Longitude |
|---|---|---|---|
| 0 | 94175 | 37.784827 | -122.727802 |
| 1 | 94160 | 37.784827 | -122.727802 |
| 2 | 94164 | 37.784827 | -122.727802 |
| 3 | 94131 | 37.741797 | -122.4378 |
| 4 | 94114 | 37.758434 | -122.43512 |
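
The first three rows above share exactly the same coordinates, so their markers would sit on top of each other on a map. A minimal sketch of how such zip codes could be listed, assuming a dataframe shaped like `sf`; the `sample` frame below just re-enters the five rows shown above.

```python
import pandas as pd

# The five rows displayed above, re-entered by hand for illustration.
sample = pd.DataFrame({
    'Zipcode': ['94175', '94160', '94164', '94131', '94114'],
    'Latitude': [37.784827, 37.784827, 37.784827, 37.741797, 37.758434],
    'Longitude': [-122.727802, -122.727802, -122.727802, -122.4378, -122.43512],
})

# keep=False flags every member of a duplicated group, not only the repeats.
shared = sample[sample.duplicated(subset=['Latitude', 'Longitude'], keep=False)]

print(shared['Zipcode'].tolist())  # ['94175', '94160', '94164']
```

Whether to drop, merge, or keep such rows is a judgment call that depends on the later analysis; the sketch only surfaces them.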


In [77]:
# Let's center the map on the 94114 zip code
latitude = 37.758434
longitude = -122.435120

# create map of San Francisco using latitude and longitude values
map_sf = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, zipcode in zip(sf['Latitude'], sf['Longitude'], sf['Zipcode']):
    label = folium.Popup(str(zipcode), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sf)
    
map_sf
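
The map above is centered on coordinates typed in by hand for the 94114 zip code. As an alternative, here is a minimal sketch that derives a center from the data itself by taking the median of the distinct coordinate pairs. The helper name `map_center` is hypothetical, and the `sample` frame re-enters the five rows from the `sf.head(5)` output rather than the full data set.

```python
import pandas as pd

def map_center(df):
    """Return [latitude, longitude] as the median of the distinct coordinate pairs."""
    coords = df[['Latitude', 'Longitude']].astype(float).drop_duplicates()
    return [float(coords['Latitude'].median()), float(coords['Longitude'].median())]

# The five rows from the sf.head(5) output, re-entered for illustration.
sample = pd.DataFrame({
    'Zipcode': ['94175', '94160', '94164', '94131', '94114'],
    'Latitude': [37.784827, 37.784827, 37.784827, 37.741797, 37.758434],
    'Longitude': [-122.727802, -122.727802, -122.727802, -122.4378, -122.43512],
})

center = map_center(sample)
print(center)
```

The result could then be passed as `location=center` to `folium.Map`. The median is used here instead of the mean so that a few outlying points pull the center less; that choice is an assumption of this sketch, not something the notebook prescribes.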