<h1 align=center><font size = 5>Battle of the Neighborhoods in Atlanta</font></h1>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Atlanta</a>

3. <a href="#item3">Cluster Neighborhoods</a>

4. <a href="#item4">Examine Clusters</a>
  
</font>
</div>

In [13]:
!conda install -c conda-forge folium=0.5.0 --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    altair-2.2.2               |           py35_1         462 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ca-certificates-2019.3.9   |       hecc5488_0         146 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.0 MB

The following NEW packages will

## 1. Download and Explore Dataset

In [14]:
import requests
import numpy as np
import pandas as pd
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
from sklearn import preprocessing
from sklearn.cluster import KMeans

Load the neighborhood data with the pandas `read_html` function.

In [5]:
url = 'https://en.wikipedia.org/wiki/Table_of_Atlanta_neighborhoods_by_population'

In [6]:
# read_html returns a list of DataFrames; the first matching table is the one we need
tables = pd.read_html(url, attrs={'class': 'wikitable sortable'}, header=0)
df = tables[0]
df['Neighborhood'] = df['Neighborhood'].astype(str) + ', Atlanta'
address = df['Neighborhood'].values.tolist()

In [7]:
column_names = ['Neighborhood', 'Latitude', 'Longitude'] 

df_Atlanta = pd.DataFrame(columns=column_names)

Use the geopy library to get the latitude and longitude of each neighborhood in Atlanta.

In [9]:
geolocator = Nominatim(user_agent="capstoneproject")
rows = []
for i in address:
    location = geolocator.geocode(i)

    # each address already ends with ", Atlanta", so the retry only appends the state
    if location is None:
        location = geolocator.geocode("{}, GA".format(i))
    if location is None:
        print("The location data of {} is not available!".format(i))
    else:
        rows.append({'Neighborhood': i,
                     'Latitude': location.latitude,
                     'Longitude': location.longitude})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
df_Atlanta = pd.DataFrame(rows, columns=column_names)

The location data of Almond Park, Atlanta is not available!
The location data of Arlington Estates, Atlanta is not available!
The location data of Baker Hills, Atlanta is not available!
The location data of Betmar LaVilla, Atlanta is not available!
The location data of Capitol Gateway, Atlanta is not available!
The location data of Carver Hills, Atlanta is not available!
The location data of Cascade Avenue/Road, Atlanta is not available!
The location data of Chosewood Park, Atlanta is not available!
The location data of Custer/McDonough/Guice, Atlanta is not available!
The location data of Druid Hills (City of Atlanta), Atlanta is not available!
The location data of East Chastain Park, Atlanta is not available!
The location data of Fairburn Mays, Atlanta is not available!
The location data of Fairway Acres, Atlanta is not available!
The location data of Glenrose Heights, Atlanta is not available!
The location data of Harland Terrace, Atlanta is not available!
The location data of Harri
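
The lookup-with-retry pattern used in the loop above can be tried without network access by passing the geocoding function in as an argument. This is only a sketch: `geocode_all`, `fake_geocode`, the `Location` tuple, and the `", GA"` retry suffix are hypothetical names and choices introduced for illustration. In the notebook, `geolocator.geocode` would be passed in place of `fake_geocode`.

```python
from collections import namedtuple

# Stand-in for the object a geocoder returns; only the two attributes used here.
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_all(names, geocode, fallback_suffix=", GA"):
    """Geocode each name, retrying once with a suffix when the first lookup fails.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None when nothing is found.
    """
    rows, missing = [], []
    for name in names:
        location = geocode(name)
        if location is None:
            location = geocode(name + fallback_suffix)
        if location is None:
            missing.append(name)
        else:
            rows.append({"Neighborhood": name,
                         "Latitude": location.latitude,
                         "Longitude": location.longitude})
    return rows, missing


# Hypothetical lookup table standing in for the Nominatim service.
_KNOWN = {
    "Adair Park, Atlanta": Location(33.724974, -84.411429),
    "Ardmore, Atlanta, GA": Location(33.806282, -84.400028),
}


def fake_geocode(query):
    return _KNOWN.get(query)


rows, missing = geocode_all(
    ["Adair Park, Atlanta", "Ardmore, Atlanta", "Almond Park, Atlanta"],
    fake_geocode,
)
print(len(rows), missing)
```

Returning the unresolved names as a list, rather than only printing them, makes it easy to retry them later with a different query format.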

In [10]:
df_Atlanta.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Adair Park, Atlanta | 33.724974 | -84.411429 |
| 1 | Adams Park, Atlanta | 33.712052 | -84.456873 |
| 2 | Adamsville, Atlanta | 33.759274 | -84.505209 |
| 3 | Ansley Park, Atlanta | 33.79455 | -84.376315 |
| 4 | Ardmore, Atlanta | 33.806282 | -84.400028 |


In [11]:
address = 'Atlanta, GA'
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Atlanta are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Atlanta are 33.7490987, -84.3901849.


Create a map of Atlanta with neighborhoods superimposed on top.

In [15]:
map_Atlanta = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_Atlanta['Latitude'], df_Atlanta['Longitude'], df_Atlanta['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Atlanta)  
    
map_Atlanta

In [18]:
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

## 2. Explore Neighborhoods in Atlanta

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=15000, limit=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
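
The list comprehension above reads `categories[0]` for every venue, which would raise an `IndexError` if a response ever contained a venue with an empty category list. Whether the API actually returns such venues is an assumption here. The sketch below shows one way to flatten a response defensively. `extract_venues` is a hypothetical helper, and the payload is hand-written to mimic the nesting the notebook relies on, not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows of venue attributes."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            # Skip venues without a category instead of raising IndexError.
            continue
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


# Hand-written payload; the second venue is made up to exercise the empty-category path.
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Agave",
                           "location": {"lat": 33.747968, "lng": -84.36802},
                           "categories": [{"name": "Mexican Restaurant"}]}},
                {"venue": {"name": "Example Venue",
                           "location": {"lat": 33.75, "lng": -84.39},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample, "Adair Park, Atlanta", 33.724974, -84.411429)
print(rows)
```

Separating the parsing from the HTTP call also makes it possible to test the row layout without Foursquare credentials.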

In [20]:
Atlanta_venues = getNearbyVenues(names=df_Atlanta['Neighborhood'],
                                   latitudes=df_Atlanta['Latitude'],
                                   longitudes=df_Atlanta['Longitude']
                                  )

Adair Park, Atlanta
Adams Park, Atlanta
Adamsville, Atlanta
Ansley Park, Atlanta
Ardmore, Atlanta
Argonne Forest, Atlanta
Adair Park, Atlanta
Adams Park, Atlanta
Adamsville, Atlanta
Ansley Park, Atlanta
Ardmore, Atlanta
Argonne Forest, Atlanta
Ashview Heights, Atlanta
Atlanta University Center, Atlanta
Atlantic Station, Atlanta
Audubon Forest, Atlanta
Bankhead, Atlanta
Beecher Hills, Atlanta
Ben Hill, Atlanta
Ben Hill Terrace, Atlanta
Benteen Park, Atlanta
Berkeley Park, Atlanta
Bolton, Atlanta
Boulevard Heights, Atlanta
Brandon, Atlanta
Brookhaven (City of Atlanta), Atlanta
Brookwood, Atlanta
Brookwood Hills, Atlanta
Browns Mill Park, Atlanta
Buckhead Forest, Atlanta
Buckhead Village, Atlanta
Cabbagetown, Atlanta
Campbellton Road, Atlanta
Candler Park, Atlanta
Capitol View, Atlanta
Capitol View Manor, Atlanta
Carey Park, Atlanta
Carroll Heights, Atlanta
Cascade Heights, Atlanta
Castleberry Hill, Atlanta
Castlewood, Atlanta
Center Hill, Atlanta
Chastain Park, Atlanta
Collier Heights, A
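
The printed list above shows the first six neighborhoods twice, which suggests that `df_Atlanta` may contain repeated rows. This is an inference from the output, not something the notebook states. If so, venues for those neighborhoods would be counted twice in the later steps. A minimal sketch of checking for and dropping repeats before calling `getNearbyVenues`, using a toy frame (`df_demo` is a hypothetical name):

```python
import pandas as pd

# Toy frame with one repeated row; coordinates copied from the head() output earlier.
df_demo = pd.DataFrame({
    "Neighborhood": ["Adair Park, Atlanta", "Adams Park, Atlanta", "Adair Park, Atlanta"],
    "Latitude": [33.724974, 33.712052, 33.724974],
    "Longitude": [-84.411429, -84.456873, -84.411429],
})

# duplicated() flags every occurrence after the first for each neighborhood name.
n_repeats = int(df_demo.duplicated(subset="Neighborhood").sum())

# Keep the first occurrence of each neighborhood and renumber the index.
deduped = df_demo.drop_duplicates(subset="Neighborhood").reset_index(drop=True)

print(n_repeats, len(deduped))
```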

Keep only the venues whose category contains "Restaurant", then check the size of the resulting dataframe.

In [21]:
Atlanta_venues=Atlanta_venues[Atlanta_venues['Venue Category'].str.contains("Restaurant")]
print(Atlanta_venues.shape)
Atlanta_venues.head()

(3270, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 11 | Adair Park, Atlanta | 33.724974 | -84.411429 | Nick's Food To Go | 33.747492 | -84.379749 | Greek Restaurant |
| 14 | Adair Park, Atlanta | 33.724974 | -84.411429 | Aviva by Kameel | 33.760597 | -84.386548 | Mediterranean Restaurant |
| 20 | Adair Park, Atlanta | 33.724974 | -84.411429 | Six Feet Under Pub & Fish House | 33.746555 | -84.372869 | Seafood Restaurant |
| 24 | Adair Park, Atlanta | 33.724974 | -84.411429 | Alma Cocina | 33.75907 | -84.387536 | Latin American Restaurant |
| 29 | Adair Park, Atlanta | 33.724974 | -84.411429 | Agave | 33.747968 | -84.36802 | Mexican Restaurant |


One-hot encode the restaurant categories so they can be counted by type in each neighborhood.

In [22]:
Atlanta_onehot = pd.get_dummies(Atlanta_venues[['Venue Category']], prefix="", prefix_sep="")

Atlanta_onehot['Neighborhood'] = Atlanta_venues['Neighborhood']

fixed_columns = [Atlanta_onehot.columns[-1]] + list(Atlanta_onehot.columns[:-1])
Atlanta_onehot = Atlanta_onehot[fixed_columns]

Atlanta_onehot.head()

| | Neighborhood | African Restaurant | American Restaurant | Arepa Restaurant | Asian Restaurant | Brazilian Restaurant | Cajun / Creole Restaurant | Caribbean Restaurant | Chinese Restaurant | Comfort Food Restaurant | ... | Restaurant | Seafood Restaurant | South Indian Restaurant | Southern / Soul Food Restaurant | Spanish Restaurant | Sushi Restaurant | Tapas Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | Adair Park, Atlanta | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | Adair Park, Atlanta | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | Adair Park, Atlanta | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 24 | Adair Park, Atlanta | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 29 | Adair Park, Atlanta | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
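
As a small illustration of the encoding step, the sketch below one-hot encodes a toy venue list and sums the indicator columns per neighborhood to get counts by category. The neighborhood names `A` and `B` and the rows are made up for the example and are not taken from the Foursquare results.

```python
import pandas as pd

# Toy data, made up for illustration.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Thai Restaurant", "Thai Restaurant",
                       "Greek Restaurant", "Thai Restaurant"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Summing the indicators per neighborhood gives the count of each category.
counts = onehot.groupby("Neighborhood").sum().astype(int)
print(counts)
```

Each venue contributes exactly one `1` across the category columns, so the row sums of `counts` equal the number of venues in each neighborhood.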


Count the total number of restaurants in each neighborhood.

In [23]:
df=Atlanta_venues[['Neighborhood','Venue Category']].groupby('Neighborhood').count()
df.sort_values(by='Venue Category',ascending=False).head()

| Neighborhood | Venue Category |
|---|---|
| Adamsville, Atlanta | 60 |
| Adams Park, Atlanta | 58 |
| Adair Park, Atlanta | 52 |
| Ardmore, Atlanta | 46 |
| Ansley Park, Atlanta | 44 |


Sum the one-hot columns to get the number of restaurants of each type in each neighborhood.

In [24]:
Atlanta_grouped = Atlanta_onehot.groupby('Neighborhood').sum().reset_index()
Atlanta_grouped.head()

| | Neighborhood | African Restaurant | American Restaurant | Arepa Restaurant | Asian Restaurant | Brazilian Restaurant | Cajun / Creole Restaurant | Caribbean Restaurant | Chinese Restaurant | Comfort Food Restaurant | ... | Restaurant | Seafood Restaurant | South Indian Restaurant | Southern / Soul Food Restaurant | Spanish Restaurant | Sushi Restaurant | Tapas Restaurant | Thai Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adair Park, Atlanta | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4 | 4 | 0 | 4 | 0 | 0 | 2 | 2 | 2 | 2 |
| 1 | Adams Park, Atlanta | 0 | 4 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | ... | 4 | 4 | 0 | 2 | 0 | 2 | 2 | 2 | 2 | 2 |
| 2 | Adamsville, Atlanta | 0 | 6 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 6 | 0 | 8 | 0 | 0 | 2 | 4 | 4 | 2 |
| 3 | Ansley Park, Atlanta | 0 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 4 | 0 | 6 | 0 | 0 | 2 | 6 | 0 | 0 |
| 4 | Ardmore, Atlanta | 0 | 4 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | ... | 2 | 4 | 0 | 6 | 0 | 2 | 2 | 6 | 0 | 0 |


List the five most common restaurant categories, counting each venue once.

In [25]:
df1=Atlanta_venues[['Venue','Venue Category']]
df1=df1.drop_duplicates()
df2=df1.groupby('Venue Category').count().sort_values(by='Venue',ascending=False)
df2.head()

| Venue Category | Venue |
|---|---|
| American Restaurant | 36 |
| Mexican Restaurant | 29 |
| Italian Restaurant | 18 |
| Seafood Restaurant | 14 |
| Vietnamese Restaurant | 13 |


In [26]:
df3 = Atlanta_grouped[['Neighborhood', 'American Restaurant']].copy()
# sum only the numeric category columns; the Neighborhood column holds strings
df3['Total'] = Atlanta_grouped.drop(columns='Neighborhood').sum(axis=1)
df3.head()

| | Neighborhood | American Restaurant | Total |
|---|---|---|---|
| 0 | Adair Park, Atlanta | 8 | 52 |
| 1 | Adams Park, Atlanta | 4 | 58 |
| 2 | Adamsville, Atlanta | 6 | 60 |
| 3 | Ansley Park, Atlanta | 8 | 44 |
| 4 | Ardmore, Atlanta | 4 | 46 |


## 3. Cluster Neighborhoods

In [27]:
# set number of clusters
kclusters = 8

# keep only the numeric features (American Restaurant, Total)
Atlanta_grouped_clustering = df3.drop(columns='Neighborhood')
# scale each row to unit L2 norm
normalized_Atlanta_grouped_clustering = preprocessing.normalize(Atlanta_grouped_clustering)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(normalized_Atlanta_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 5, 1, 2, 5, 6, 2, 1, 4, 5], dtype=int32)
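
With its default arguments, `preprocessing.normalize` rescales each row to unit L2 norm. With only two features, `American Restaurant` and `Total`, k-means therefore groups neighborhoods by the ratio between the two rather than by their absolute size. A small NumPy sketch of that row-wise scaling, using the first three rows of the `df3` output shown earlier (the `same_ratio` array is made up to illustrate the point):

```python
import numpy as np

# (American Restaurant, Total) for the first three neighborhoods in df3.
X = np.array([[8.0, 52.0],
              [4.0, 58.0],
              [6.0, 60.0]])

# Divide every row by its own Euclidean length, which is what L2 row normalization does.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / row_norms
print(np.round(X_unit, 4))

# Rows with the same American/Total ratio collapse to the same point after scaling.
same_ratio = np.array([[1.0, 10.0],
                       [3.0, 30.0]])
same_unit = same_ratio / np.linalg.norm(same_ratio, axis=1, keepdims=True)
print(np.allclose(same_unit[0], same_unit[1]))
```

If absolute restaurant counts should also influence the clusters, column-wise scaling would be an alternative to consider. That is a design choice the notebook does not discuss.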

In [28]:
df3['Cluster Lables']=kmeans.labels_

In [29]:
Atlanta_merged = df_Atlanta

# merge df3 into df_Atlanta to add the cluster label to each neighborhood's latitude/longitude
Atlanta_merged = Atlanta_merged.join(df3.set_index('Neighborhood'), on='Neighborhood')

Atlanta_merged.head() # check the last columns!

| | Neighborhood | Latitude | Longitude | American Restaurant | Total | Cluster Lables |
|---|---|---|---|---|---|---|
| 0 | Adair Park, Atlanta | 33.724974 | -84.411429 | 8 | 52 | 4 |
| 1 | Adams Park, Atlanta | 33.712052 | -84.456873 | 4 | 58 | 5 |
| 2 | Adamsville, Atlanta | 33.759274 | -84.505209 | 6 | 60 | 1 |
| 3 | Ansley Park, Atlanta | 33.79455 | -84.376315 | 8 | 44 | 2 |
| 4 | Ardmore, Atlanta | 33.806282 | -84.400028 | 4 | 46 | 5 |


In [30]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Atlanta_merged['Latitude'], Atlanta_merged['Longitude'], Atlanta_merged['Neighborhood'], Atlanta_merged['Cluster Lables']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 4. Examine Clusters

In [31]:
Atlanta_merged.loc[Atlanta_merged['Cluster Lables'] == 7, Atlanta_merged.columns[[0,3,4] + list(range(5, Atlanta_merged.shape[1]))]]

| | Neighborhood | American Restaurant | Total | Cluster Lables |
|---|---|---|---|---|
| 18 | Ben Hill, Atlanta | 1 | 30 | 7 |
| 19 | Ben Hill Terrace, Atlanta | 1 | 30 | 7 |
| 45 | Cross Creek, Atlanta | 1 | 22 | 7 |
| 91 | Paces, Atlanta | 1 | 24 | 7 |
| 92 | Peachtree Heights East, Atlanta | 1 | 23 | 7 |
| 93 | Peachtree Heights West, Atlanta | 1 | 23 | 7 |
| 118 | Tuxedo Park, Atlanta | 1 | 26 | 7 |
| 119 | Underwood Hills, Atlanta | 1 | 22 | 7 |
119,"Underwood Hills, Atlanta",1,22,7
