# Coursera Capstone Final Project - Find Similar Neighborhoods

In [5]:
# import packages
import pandas as pd

# show all columns
pd.set_option('display.max_columns', None)

## 1. Problem Description

Suppose someone currently lives in a neighborhood in **Toronto, ON, Canada**, and is moving to Santa Clara County to start a new career.  
Her new work address will be **1330 Geneva Dr, Sunnyvale, CA 94089**.  
She wants to find a similar neighborhood in Santa Clara County to live in, but she will also weigh commute time.
This project aims to help people who currently live in Toronto, are moving to Santa Clara County, and want to live in a similar neighborhood with a reasonable commute time.

## 2. Data

**Neighborhood data for Santa Clara County** is available at https://www.sccgov.org/sites/phd/hi/hd/Pages/city-profiles.aspx and lists all neighborhoods in the county.  
For example, we can obtain each neighborhood's name and the name of the city where it is located from this data set.

**Foursquare location data (Places API)** will be used to obtain data on venues near each neighborhood (for example, Venue Category) in order to find similar neighborhoods in Santa Clara County.  
**Google Maps Geocoding API** will be used to obtain the coordinates (latitude and longitude) of each neighborhood, which are then fed into the Foursquare Places API.  
**Google Maps Distance Matrix API** will be used to obtain the commute time from each neighborhood to her new work address.
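As a rough illustration of how the two Google Maps lookups could be set up, the sketch below only builds the request URLs and makes no network calls. The helper names `geocode_request_url` and `commute_request_url`, the `", CA"` suffix appended to each address, the `driving` mode, and the placeholder API key are assumptions made for this example, not part of the project code.

```python
from urllib.parse import urlencode

# Public JSON endpoints of the two Google Maps web services.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
DISTANCE_MATRIX_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"

# Work address from the problem description.
WORK_ADDRESS = "1330 Geneva Dr, Sunnyvale, CA 94089"


def geocode_request_url(neighborhood, city, api_key):
    """Build a Geocoding API URL for one neighborhood (hypothetical helper)."""
    # Assumption: appending the city and state is enough to disambiguate the name.
    address = f"{neighborhood}, {city}, CA"
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def commute_request_url(lat, lng, api_key, destination=WORK_ADDRESS, mode="driving"):
    """Build a Distance Matrix API URL from a coordinate pair to the work address."""
    params = {
        "origins": f"{lat},{lng}",
        "destinations": destination,
        "mode": mode,  # assumption: commute by car
        "key": api_key,
    }
    return DISTANCE_MATRIX_URL + "?" + urlencode(params)


# "YOUR_API_KEY" is a placeholder; the coordinates are illustrative only.
geocode_url = geocode_request_url("Alum Rock", "San Jose", "YOUR_API_KEY")
commute_url = commute_request_url(37.37, -121.83, "YOUR_API_KEY")
```

The resulting URLs can be passed to any HTTP client; the coordinates returned by the geocoding step would then serve both as the Distance Matrix origin and as the location for the Foursquare venue search.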

### An overview of the neighborhood data

In [7]:
df_santa_clara = pd.read_excel('final-small-area-neighborhood-data-for-web.xls')
df_santa_clara.head(3)

Unnamed: 0,City,Neighborhood,Population Size,Race/Ethnicity,African American,Asian/Pacific Islander,Latino,White,Foreign-born,Speaks a language other than English at home,Single parent households,Households with children,Average household size,Age Groups,0-5 years,6-11 years,12-17 years,18-24 years,25-34 years,35-44 years,45-54 years,55-64 years,Ages 65 and older,Median household income,Unemployed (ages ≥ 16 years),Poverty,Families below 185% FPL,Children (ages 0-17) below 185% FPL,Children (ages 3-5) enrolled in preschool or nursery school,Educational attainment (ages ≥25),Less than high school,High school graduate,Some college or associates degree,College graduate or higher,"Number of vehicle-pedestrian injury collisions, 10 years","Number of vehicle-bicycle injury collisions, 10 years","Number of motor vehicle collisions, 1 year",Lives within ½ mile of a regional bus/rail/ferry and within ¼ mile of bus/light rail,Percentage of residents who commute to work by mode,Drove alone,Carpooled,Public transportation,Other,Households receiving CalFresh benefits,Average distance (miles) to nearest full-service grocery store,Average distance (miles) to nearest farmers’ market,Number of fast food outlets per square mile,Households with gross rent 30% or more of household income,Overcrowded households,Lives in multi-unit housing,Average distance (miles) to nearest park or open space,Number of tobacco retail outlets per square mile,Average number of violent crimes within 1 mile,Number of alcohol retail outlets per square mile,"Maternal, child and infant health","Births per 1,000 people",Low birth weight infants,Preterm births,Overweight or obese in first trimester of pregnancy,Mothers who received early and adequate prenatal care,"Teen live births per 1,000 females, ages 15-19",Mortality,Life expectancy,"Cancer deaths per 100,000 people","Heart disease deaths per 100,000 people","Alzheimer’s disease deaths per 100,000 people","Stroke deaths per 100,000 people","Chronic lower respiratory disease deaths per 100,000 people","Unintentional injury deaths per 100,000 people","Diabetes deaths per 100,000 people","Influenza and pneumonia deaths per 100,000 people","Hypertension deaths per 100,000 people"
0,San Jose,Almaden Valley,28613,,0.01,0.21,0.15,0.58,0.28,0.36,0.07,0.43,2.952857,,0.07,0.1,0.1,0.06,0.07,0.15,0.18,0.12,0.15,127204,0.04,,0.09,0.14,0.51,,0.06,0.12,0.26,0.56,33,47,47,0.0,,0.79,0.09,0.02,0.1,0.0,2.175714,2.30788,1.6014,0.62,0.03,0.12,0.233461,3.0025,3.354286,2.6022,,7.4675,0.06,0.08,0.36,0.82,8.9903,,84.3255,119.561,125.474,27.1714,21.0522,22.9304,15.0997,21.2062,--,17.9124
1,San Jose,Alum Rock,23670,,0.02,0.41,0.47,0.08,0.54,0.78,0.17,0.48,3.56,,0.1,0.09,0.08,0.1,0.16,0.14,0.13,0.1,0.1,50172,0.08,,0.4,0.55,0.23,,0.34,0.26,0.24,0.17,135,102,109,0.0,,0.76,0.12,0.07,0.06,0.14,0.583333,1.720827,3.5337,0.55,0.24,0.53,0.166138,15.9015,32.88,10.1593,,15.702,0.07,0.11,0.5,0.69,43.049,,82.938,131.299,113.522,47.8295,28.2585,23.7045,20.8876,36.819,--,20.6205
2,San Jose,Alum Rock / East Foothills,9309,,0.03,0.19,0.33,0.42,0.24,0.38,0.07,0.33,3.0,,0.07,0.07,0.07,0.08,0.11,0.13,0.17,0.14,0.15,104155,0.05,,0.1,0.17,0.44,,0.13,0.18,0.29,0.4,11,14,11,0.0,,0.81,0.05,0.02,0.12,0.01,1.77,1.967254,0.1397,0.54,0.05,0.06,0.472141,0.1397,3.62,0.1397,,10.7423,0.05,0.1,0.46,0.69,25.2324,,83.8465,136.396,131.64,36.7919,--,--,--,--,--,--


In this project, we will use only the **City** and **Neighborhood** columns from this data set.
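A minimal sketch of that column selection follows. It runs on a small stand-in frame built from the three rows shown above rather than on the real spreadsheet, and the helper name `keep_city_and_neighborhood` and the clean-up steps (dropping empty and duplicate rows) are assumptions for illustration.

```python
import pandas as pd

# Stand-in for df_santa_clara, using values from the preview above.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "City": ["San Jose", "San Jose", "San Jose"],
    "Neighborhood": ["Almaden Valley", "Alum Rock", "Alum Rock / East Foothills"],
    "Population Size": [28613, 23670, 9309],
})


def keep_city_and_neighborhood(df):
    """Return only the City and Neighborhood columns (hypothetical helper)."""
    return (
        df[["City", "Neighborhood"]]
        .dropna()                 # assumption: rows missing either value are not useful
        .drop_duplicates()        # assumption: each (city, neighborhood) pair is needed once
        .reset_index(drop=True)
    )


df_neighborhoods = keep_city_and_neighborhood(sample)
print(df_neighborhoods)
```

The same helper could be applied to `df_santa_clara` directly, provided the spreadsheet's column names match the preview.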