## Introduction

#### Background
The Chicagoland area is one of the largest metropolitan areas in the United States. Outside the Chicago city limits lies a collar of suburbs that many people call home. While the suburbs are similar in some respects, they differ in many others, including home values and nearby business and entertainment venues. This project will explore and compare the Cook County suburbs of Chicago using median home value data from Zillow and venue location data from Foursquare.

#### Description of Project
Different suburbs in the Chicagoland area have very different home values and very different amenities. Many variables determine the price of a home, and the availability of nearby amenities is one of them. This project will use Foursquare location data to compare the suburbs of Chicago and show which neighborhoods have similar entertainment and other venues, then cross-reference those results with the home values of the same neighborhoods. The analysis should highlight neighborhoods that may be desirable because they offer the same amenities as higher-priced neighborhoods. Other variables, such as school rankings, play a large role in determining home prices, but for many people the entertainment venues available in a neighborhood may be the most important factor. By exploring this data, a buyer could look for neighborhoods where they can buy a more affordable home while still having access to a set of amenities similar to those in some of the more expensive suburbs.
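
One way to make "similar amenities" concrete is to treat each suburb's row of venue-category frequencies as a vector and compare suburbs with cosine similarity. The sketch below assumes a frequency table shaped like the grouped Foursquare data (one row per zip code, one column per venue category) indexed by zip code, plus a Series of home values indexed the same way. The function name `similar_cheaper_suburbs` and the toy numbers are hypothetical and are not project data.

```python
import numpy as np
import pandas as pd


def similar_cheaper_suburbs(freq, home_values, target, top_n=3):
    """Rank suburbs cheaper than `target` by cosine similarity of venue profiles.

    freq: DataFrame indexed by zip code, one column per venue category
          (category frequencies, e.g. the output of groupby().mean()).
    home_values: Series indexed by zip code with home value estimates.
    """
    vectors = freq.to_numpy(dtype=float)
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for suburbs with no venues
    unit = vectors / norms[:, None]

    # Cosine similarity of every suburb against the target suburb
    target_vec = unit[freq.index.get_loc(target)]
    similarity = pd.Series(unit @ target_vec, index=freq.index)

    # Keep only suburbs whose home value is below the target's
    cheaper = home_values[home_values < home_values[target]].index
    candidates = similarity.loc[similarity.index.intersection(cheaper)]
    return candidates.sort_values(ascending=False).head(top_n)


# Toy example with made-up frequencies and values
freq = pd.DataFrame(
    {"Park": [0.5, 0.4, 0.0], "Pizza Place": [0.5, 0.6, 0.0], "Gas Station": [0.0, 0.0, 1.0]},
    index=["A", "B", "C"],
)
values = pd.Series({"A": 400000, "B": 250000, "C": 150000})
print(similar_cheaper_suburbs(freq, values, target="A"))
```

With this project's data, the inputs would presumably be `suburb_grouped.set_index('Zip Code')` and `homevaluesdf.set_index('Zip Code')['Zillow Home Value Index']`. Cosine similarity compares only the mix of categories, not the number of venues, which matches the question of whether two places offer a similar set of amenities.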
    
#### Target Audience
The results should be useful to several groups: anyone involved in real estate transactions in the Chicago suburbs; people researching suburban Chicago neighborhoods before buying a home; investors looking for up-and-coming neighborhoods that may be more likely to see rapid appreciation in property values; and anyone who wants to find neighborhoods in the Chicago suburbs that are similar to neighborhoods they already know.

Neighborhoods are always changing, and business investment in an area often precedes gains in housing prices. If a suburb is becoming popular and businesses are opening new and exciting venues there, home prices may begin to rise as people move to the area to be closer to those amenities. By comparing the Foursquare location data for different suburbs with their median home values, it may be possible to find neighborhoods that have seen significant business investment but not yet a large increase in home values. Investors could use this data to target neighborhoods for housing investment, ahead of demand from new buyers who want to move there to take advantage of the available amenities.
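
As a rough illustration of this screening idea, one could join a per-zip-code venue count with home values and flag zip codes whose venue count is above the median while their home value is below the median. This is a hypothetical heuristic, not a method from this project: venue counts from the Foursquare explore endpoint are capped by the `LIMIT` used for each request, so they are at best a coarse proxy for business activity. The function name `flag_undervalued` and the toy data are assumptions.

```python
import pandas as pd


def flag_undervalued(venue_counts, home_values):
    """Return zip codes with above-median venue counts but below-median home values.

    venue_counts: Series indexed by zip code (number of venues found nearby).
    home_values: Series indexed by zip code (home value estimate).
    """
    # Align both Series on zip code and drop zip codes missing either value
    df = pd.DataFrame({"venues": venue_counts, "value": home_values}).dropna()
    busy = df["venues"] > df["venues"].median()
    affordable = df["value"] < df["value"].median()
    return df[busy & affordable].sort_values("venues", ascending=False)


# Toy numbers, not project data
counts = pd.Series({"60001": 90, "60002": 85, "60003": 20, "60004": 15})
values = pd.Series({"60001": 450000, "60002": 180000, "60003": 200000, "60004": 500000})
print(flag_undervalued(counts, values))
```

Applied here, `venue_counts` could come from `suburb_venues.groupby('Neighborhood').size()` and `home_values` from `homevaluesdf.set_index('Zip Code')['Zillow Home Value Index']`.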

## Data

For this project, data from Zillow and Foursquare will be used together with latitude and longitude data from ArcGIS.

#### Median Home Value Index
Zillow provides detailed information about real estate in the United States. It combines information from many sources, such as school rankings, previous sale prices, current listing prices, and neighborhood crime, to estimate home values throughout the country. Zillow also aggregates this data by zip code, state, or other boundaries to estimate median home values for specific neighborhoods or states.

#### Service and Entertainment Venues
Foursquare provides data on the entertainment and service venues around a requested location. It maintains a detailed database of businesses throughout the United States, including each business's location and type.

#### Importing, Collecting, and Processing the Data

#### Import Libraries

In [1]:
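!conda install -c conda-forge geocoder --yes
import geocoder  # geocoding client, used here for ArcGIS lookups
import types
import pandas as pd
from botocore.client import Config  # used to configure IBM Cloud Object Storage access
import ibm_boto3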

#### Import Zillow home value data

In [2]:
def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage.
# Replace the placeholder with your own API key, and keep real credentials out of shared copies of the notebook.
cos_client = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<YOUR_IBM_API_KEY>',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = cos_client.get_object(Bucket='capstonefinalproject-donotdelete-pr-toqvrdatdw1hqs', Key='homevalues.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType(__iter__, body)

homevaluesdf = pd.read_csv(body)
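
Outside IBM Watson Studio, the same table could be read from a local copy of `homevalues.csv`. A minimal sketch, assuming the file has the `Zip Code`, `City`, and `Zillow Home Value Index` columns shown in the geocoding output; reading `Zip Code` as a string keeps any leading zeros and makes zip codes safe to use as labels. The helper name `load_home_values` is hypothetical.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["Zip Code", "City", "Zillow Home Value Index"]


def load_home_values(source):
    """Read the home value table from a path or file-like object and check its columns."""
    df = pd.read_csv(source, dtype={"Zip Code": str})
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError("homevalues.csv is missing columns: {}".format(missing))
    return df


# An in-memory CSV standing in for homevalues.csv
sample = io.StringIO(
    "Zip Code,City,Zillow Home Value Index\n"
    "60803,Alsip,172200\n"
    "60004,Arlington Heights,320200\n"
)
sample_df = load_home_values(sample)
print(sample_df.dtypes)
```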

#### Get latitude and longitude data for suburbs using ArcGIS

In [3]:
#Make lists to store Lat and Long data
Lat = []
Lon = []

#loop to get lat and long from Zip Code in dataframe
for index, row in homevaluesdf.iterrows():
    g = geocoder.arcgis(row['Zip Code'])
    Lat.append(g.latlng[0])
    Lon.append(g.latlng[1])
    
#add lat and long data to dataframe
homevaluesdf['Latitude'] = Lat
homevaluesdf['Longitude'] = Lon
homevaluesdf

| | Zip Code | City | Zillow Home Value Index | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 60803 | Alsip | 172200 | 41.671395 | -87.733745 |
| 1 | 60004 | Arlington Heights | 320200 | 42.111150 | -87.980430 |
| 2 | 60005 | Arlington Heights | 293200 | 42.069047 | -87.992100 |
| 3 | 60010 | Barrington | 473900 | 42.110325 | -88.157915 |
| 4 | 60104 | Bellwood | 159800 | 41.881870 | -87.870935 |
| 5 | 60163 | Berkeley | 188900 | 41.886505 | -87.908110 |
| 6 | 60402 | Berwyn | 205500 | 41.836362 | -87.782985 |
| 7 | 60406 | Blue Island | 123500 | 41.664450 | -87.686075 |
| 8 | 60455 | Bridgeview | 191100 | 41.745127 | -87.806882 |
| 9 | 60155 | Broadview | 178900 | 41.858080 | -87.855250 |
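
The geocoding loop indexes `g.latlng` directly, which raises a `TypeError` if a lookup finds no match and `latlng` comes back as `None`. Below is a sketch of a more defensive loop that records missing coordinates as `None` so the lists stay aligned with the dataframe rows. The geocoding call is passed in as a function so it can be swapped or stubbed; the name `geocode_all` and the stub lookup table are hypothetical.

```python
def geocode_all(zip_codes, geocode):
    """Geocode each zip code, returning parallel latitude and longitude lists.

    geocode: a callable taking a query string and returning [lat, lng] or None
             (for example: lambda q: geocoder.arcgis(q).latlng).
    Failed lookups are stored as None so the lists stay aligned with the input.
    """
    lats, lons, failed = [], [], []
    for zip_code in zip_codes:
        latlng = geocode(str(zip_code))
        if latlng:
            lats.append(latlng[0])
            lons.append(latlng[1])
        else:
            lats.append(None)
            lons.append(None)
            failed.append(zip_code)
    if failed:
        print("No coordinates found for: {}".format(failed))
    return lats, lons


# Stub lookup standing in for geocoder.arcgis, with two known zip codes
known = {"60803": [41.671395, -87.733745], "60004": [42.111150, -87.980430]}
lats, lons = geocode_all([60803, 60004, 99999], known.get)
```

In the notebook this might be called as `homevaluesdf['Latitude'], homevaluesdf['Longitude'] = geocode_all(homevaluesdf['Zip Code'], lambda q: geocoder.arcgis(q).latlng)`, after which rows with missing coordinates can be dropped before querying Foursquare.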


#### Import Additional Libraries

In [4]:
import numpy as np

import json

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# transform JSON data into a pandas dataframe
try:
    from pandas import json_normalize  # pandas >= 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas versions

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

!conda install requests --yes
import requests


#### Map Chicago Suburbs

In [5]:
#Get Chicago Coordinates
address = 'Chicago, IL'
c = geocoder.arcgis(address)
latitude = c.latlng[0]
longitude = c.latlng[1]
print('The geographical coordinates of Chicago are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Chicago are 41.884250000000065, -87.63244999999995.


In [6]:
# create map of Chicago suburbs with markers using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=9)

# add markers to map
for lat, lng, zipcode, suburb in zip(homevaluesdf['Latitude'], homevaluesdf['Longitude'], homevaluesdf['Zip Code'], homevaluesdf['City']):
    label = '{}, {}'.format(suburb, zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_chicago)

#### Cook County Suburbs of Chicago

In [7]:
map_chicago

#### Get Foursquare Location Data For Chicago Suburbs

In [8]:
# @hidden_cell
# Replace the placeholders with your own Foursquare credentials and keep them out of shared copies.
CLIENT_ID = '<YOUR_FOURSQUARE_CLIENT_ID>' # Foursquare ID
CLIENT_SECRET = '<YOUR_FOURSQUARE_CLIENT_SECRET>' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [9]:
LIMIT = 100 # limit number of venues

In [10]:
# function that extracts the category of the venue
def get_category_type(row):
    # raw venue rows use 'categories'; rows flattened by json_normalize use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the explore request; requests URL-encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,  # search radius in meters
            'limit': LIMIT}
            
        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue;
        # venues without a category get None instead of raising IndexError
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
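
Separating the JSON parsing from the HTTP request makes it possible to test the parsing without calling the API and to handle venues whose `categories` list is empty. The sketch below works on the decoded JSON of one explore response; the layout it expects (`response`, then `groups[0]`, then `items`, each with a `venue`) is taken from the code above, while the function name `parse_explore_response` and the sample payload are hypothetical.

```python
def parse_explore_response(payload, name, lat, lng):
    """Flatten one Foursquare explore response into venue rows.

    Each row is (name, lat, lng, venue name, venue lat, venue lng, category);
    venues without a category get None instead of raising IndexError.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hypothetical payload trimmed to the fields used above
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Burrito Jalisco",
               "location": {"lat": 41.674929, "lng": -87.739512},
               "categories": [{"name": "Mexican Restaurant"}]}},
    {"venue": {"name": "Unlabeled Spot",
               "location": {"lat": 41.67, "lng": -87.73},
               "categories": []}},
]}]}}
rows = parse_explore_response(payload, 60803, 41.671395, -87.733745)
```

Inside the loop, `venues_list.append(parse_explore_response(requests.get(url, params=params).json(), name, lat, lng))` would then replace the inline list comprehension.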

In [12]:
suburb_venues = getNearbyVenues(names=homevaluesdf['Zip Code'],
                                   latitudes=homevaluesdf['Latitude'],
                                   longitudes=homevaluesdf['Longitude']
                                  )

60803
60004
60005
60010
60104
60163
60402
60406
60455
60155
60513
60459
60409
60411
60415
60804
60478
60445
60016
60018
60007
60707
60201
60202
60203
60805
60422
60130
60131
60025
60026
60426
60429
60457
60162
60169
60192
60456
60430
60458
60043
60525
60526
60438
60439
60712
60534
60443
60153
60160
60053
60056
60714
60706
60062
60164
60452
60453
60302
60304
60301
60461
60462
60467
60067
60074
60463
60465
60464
60068
60469
60070
60471
60305
60171
60546
60472
60008
60193
60194
60173
60195
60176
60076
60077
60473
60475
60165
60107
60501
60476
60477
60487
60154
60558
60090
60480
60091
60093
60482


#### One-Hot Encoding Foursquare Data

In [13]:
suburb_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 60803 | 41.671395 | -87.733745 | Burrito Jalisco | 41.674929 | -87.739512 | Mexican Restaurant |
| 1 | 60803 | 41.671395 | -87.733745 | LA Fitness | 41.673129 | -87.740504 | Gym / Fitness Center |
| 2 | 60803 | 41.671395 | -87.733745 | DoubleTree by Hilton Hotel Chicago - Alsip | 41.662869 | -87.744492 | Hotel |
| 3 | 60803 | 41.671395 | -87.733745 | Jerry's Hockey | 41.663567 | -87.743044 | Sporting Goods Shop |
| 4 | 60803 | 41.671395 | -87.733745 | Dollar Tree | 41.674474 | -87.739088 | Discount Store |


In [14]:
# one hot encoding
suburb_onehot = pd.get_dummies(suburb_venues[['Venue Category']], prefix="", prefix_sep="")

# add zip code column back to dataframe
suburb_onehot['Zip Code'] = suburb_venues['Neighborhood'] 

# move zip code column to the first column
fixed_columns = [suburb_onehot.columns[-1]] + list(suburb_onehot.columns[:-1])
suburb_onehot = suburb_onehot[fixed_columns]

suburb_onehot.head()

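| | Zip Code | ATM | Accessories Store | African Restaurant | Airport Terminal | American Restaurant | Antique Shop | Arcade | Argentinian Restaurant | Art Gallery | ... | Water Park | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 60803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 60803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 60803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 60803 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |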


In [15]:
suburb_onehot.shape

(6524, 344)

In [16]:
# group venues by zip code
suburb_grouped = suburb_onehot.groupby('Zip Code').mean().reset_index()
suburb_grouped

| | Zip Code | ATM | Accessories Store | African Restaurant | Airport Terminal | American Restaurant | Antique Shop | Arcade | Argentinian Restaurant | Art Gallery | ... | Water Park | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60004 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.012500 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.012500 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.00 |
| 1 | 60005 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.010000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.010000 | 0.000000 | 0.000000 | 0.010000 | 0.010000 | 0.00 | 0.00 |
| 2 | 60007 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.032258 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.00 |
| 3 | 60008 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.00 |
| 4 | 60010 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.00 |
| 5 | 60016 | 0.018868 | 0.000000 | 0.0 | 0.000000 | 0.037736 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.018868 | 0.000000 | 0.000000 | 0.00 | 0.00 |
| 6 | 60018 | 0.010000 | 0.000000 | 0.0 | 0.000000 | 0.060000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.00 |
| 7 | 60025 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.013889 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.00 |
| 8 | 60026 | 0.000000 | 0.012195 | 0.0 | 0.000000 | 0.012195 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.012195 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.00 |
| 9 | 60043 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.032787 | 0.016393 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00 | 0.00 |
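
To see what the grouped table represents, here is the same one-hot-then-average pipeline on a made-up input: after `groupby(...).mean()`, each value is the fraction of a zip code's returned venues that fall in that category, so each row sums to 1. The toy data is illustrative only, and `dtype=int` is added to `get_dummies` here so the indicator columns are integers regardless of pandas version.

```python
import pandas as pd

# Made-up venue rows for two zip codes
venues = pd.DataFrame({
    "Neighborhood": ["60001", "60001", "60001", "60001", "60002", "60002"],
    "Venue Category": ["Park", "Park", "Pizza Place", "Bakery", "Park", "Bakery"],
})

# One indicator column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Zip Code", venues["Neighborhood"])

# Average the indicator columns per zip code to get category frequencies
grouped = onehot.groupby("Zip Code").mean().reset_index()
print(grouped)
```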


In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [18]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Zip Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
suburb_venues_sorted = pd.DataFrame(columns=columns)
suburb_venues_sorted['Zip Code'] = suburb_grouped['Zip Code']

for ind in np.arange(suburb_grouped.shape[0]):
    suburb_venues_sorted.iloc[ind, 1:] = return_most_common_venues(suburb_grouped.iloc[ind, :], num_top_venues)

suburb_venues_sorted.head()

| | Zip Code | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60004 | Pizza Place | Park | Bakery | Grocery Store | Chinese Restaurant | Liquor Store | Gift Shop | Mobile Phone Shop | Cosmetics Shop | Sushi Restaurant |
| 1 | 60005 | Sandwich Place | Mexican Restaurant | Sushi Restaurant | Italian Restaurant | Pizza Place | Bakery | Thai Restaurant | Bar | Ice Cream Shop | Cosmetics Shop |
| 2 | 60007 | Pizza Place | Mexican Restaurant | Sandwich Place | Japanese Restaurant | Flower Shop | Liquor Store | Bowling Alley | Gas Station | Supermarket | Burger Joint |
| 3 | 60008 | Pizza Place | Sandwich Place | Racetrack | Fast Food Restaurant | Sports Bar | Bank | Coffee Shop | Mexican Restaurant | Salon / Barbershop | Skating Rink |
| 4 | 60010 | Trail | Ice Cream Shop | Nature Preserve | Hockey Arena | Filipino Restaurant | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant | Farm |
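
The `indicators` list used to build the column names produces correct labels for the first ten positions, which is all this notebook needs, but it would yield "21th" if `num_top_venues` were raised past 20. A small helper that follows the general English ordinal rules (including 11th to 13th) is sketched below; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Zip Code"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[1], columns[-1])  # 1st Most Common Venue 10th Most Common Venue
```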


## Methodology

## Results

## Discussion

## Conclusion