# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

* [Introduction: Business Problem](#introduction)

* [Data](#data)

* [Methodology](#methodology)

* [Analysis](#analysis)

* [Results and Discussion](#results)

* [Conclusion](#conclusion)

    </font>
    </div>

## 1. Introduction: Business Problem <a name="introduction"></a>

In this project we will compare the cities of a particular US state, i.e. how similar or dissimilar they are to one another. The report is targeted at stakeholders interested in opening a business of their choice in a particular US city.

Since there are many kinds of businesses in the US, we will try to detect the **top businesses in any particular city**. We are also particularly interested in **business types that are not yet popular in a particular city**. Provided those two conditions are met, we would also prefer locations **as close to the city center as possible**.

We will use data science tools to identify a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly laid out so that stakeholders can choose the best final location.

## 2. Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* the number of existing businesses of each type in the city;
* the distance of the neighborhood from the city center.
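The second factor needs a distance measure. Foursquare already reports each venue's distance from the search point. The minimal sketch below shows how such a distance can be computed from two latitude/longitude pairs with the great-circle (haversine) formula, assuming a spherical Earth of radius 6371 km. The two points are Birmingham's geocoded center and the Birmingham Museum Of Art, taken from this notebook's output. The result is roughly 220 m, which agrees with the `Distance` Foursquare reports for that venue.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Birmingham's geocoded center and the Birmingham Museum Of Art (from the notebook output)
center = (33.52068, -86.81176)
museum = (33.522311, -86.810409)
distance_m = haversine_km(*center, *museum) * 1000
print('Distance from center: {:.0f} m'.format(distance_m))
```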

The following data sources will be needed to extract or generate the required information:
* a list of the cities in each of the 50 US states, web-scraped from the Britannica website and saved as a dataframe (`file1.csv`);
* the number, type, and location of businesses around every city, obtained from the **Foursquare API**.

#### Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
```python
import pandas as pd  # data processing
import numpy as np  # vectorized numerical operations
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json  # handling JSON files
import folium  # creating maps
import requests  # retrieving data from URLs

import geocoder  # geocoding city names (ArcGIS backend)
from geopy.geocoders import Nominatim  # converting an address to coordinates
from pandas import json_normalize  # flattening JSON into a DataFrame (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means for the clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')
```

Libraries imported.


#### Get the dataframe of all US states and their cities (source: Britannica). The data was already web-scraped and saved as a CSV file for faster execution.

In [2]:
```python
df1_usa = pd.read_csv('file1.csv')
df1_usa.head()
```

Unnamed: 0,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,Florida,Georgia,Hawaii,Idaho,Illinois,Indiana,Iowa,Kansas,Kentucky,Louisiana,Maine,Maryland,Massachusetts,Michigan,Minnesota,Mississippi,Missouri,Montana,Nebraska,Nevada,New Hampshire,New Jersey,New Mexico,New York,North Carolina,North Dakota,Ohio,Oklahoma,Oregon,Pennsylvania,Rhode Island,South Carolina,South Dakota,Tennessee,Texas,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
0,Alexander City,Anchorage,Ajo,Arkadelphia,Alameda,Alamosa,Ansonia,Dover,Apalachicola,Albany,Hanalei,Blackfoot,Alton,Anderson,Amana Colonies,Abilene,Ashland,Abbeville,Auburn,Aberdeen,Abington,Adrian,Albert Lea,Bay Saint Louis,Boonville,Anaconda,Beatrice,Boulder City,Berlin,Asbury Park,Acoma,Albany,Asheboro,Bismarck,Akron,Ada,Albany,Abington,Barrington,Abbeville,Aberdeen,Alcoa,Abilene,Alta,Barre,Abingdon,Aberdeen,Bath,Appleton,Buffalo
1,Andalusia,Cordova,Avondale,Arkansas Post,Alhambra,Aspen,Berlin,Lewes,Bartow,Americus,Hilo,Boise,Arlington Heights,Bedford,Ames,Arkansas City,Barbourville,Alexandria,Augusta,Annapolis,Adams,Alma,Alexandria,Biloxi,Branson,Billings,Bellevue,Carson City,Claremont,Atlantic City,Alamogordo,Amsterdam,Asheville,Devils Lake,Alliance,Altus,Ashland,Aliquippa,Bristol,Aiken,Belle Fourche,Athens,Alpine,American Fork,Bellows Falls,Alexandria,Anacortes,Beckley,Ashland,Casper
2,Anniston,Fairbanks,Bisbee,Batesville,Anaheim,Aurora,Bloomfield,Milford,Belle Glade,Andersonville,Honaunau,Bonners Ferry,Arthur,Bloomington,Boone,Atchison,Bardstown,Bastrop,Bangor,Baltimore,Amesbury,Ann Arbor,Austin,Canton,Cape Girardeau,Bozeman,Boys Town,Elko,Concord,Bayonne,Albuquerque,Auburn,Bath,Dickinson,Ashtabula,Alva,Astoria,Allentown,Central Falls,Anderson,Brookings,Chattanooga,Amarillo,Bountiful,Bennington,Bristol,Auburn,Bluefield,Baraboo,Cheyenne
3,Athens,Haines,Casa Grande,Benton,Antioch,Boulder,Branford,New Castle,Boca Raton,Athens,Honolulu,Caldwell,Aurora,Columbus,Burlington,Chanute,Berea,Baton Rouge,Bar Harbor,Bethesda-Chevy Chase,Amherst,Battle Creek,Bemidji,Clarksdale,Carthage,Butte,Chadron,Ely,Derry,Bloomfield,Artesia,Babylon,Beaufort,Fargo,Athens,Anadarko,Baker City,Altoona,Cranston,Beaufort,Canton,Clarksville,Arlington,Brigham City,Brattleboro,Charlottesville,Bellevue,Buckhannon,Belmont,Cody
4,Atmore,Homer,Chandler,Blytheville,Arcadia,Breckenridge,Bridgeport,Newark,Bradenton,Atlanta,Kahului,Coeur d’Alene,Belleville,Connersville,Cedar Falls,Coffeyville,Boonesborough,Bogalusa,Bath,Bowie,Andover,Bay City,Bloomington,Columbia,Chillicothe,Dillon,Columbus,Fallon,Dover,Bordentown,Belen,Batavia,Boone,Grand Forks,Barberton,Ardmore,Beaverton,Ambridge,East Greenwich,Camden,Custer,Cleveland,Austin,Cedar City,Burlington,Chesapeake,Bellingham,Charles Town,Beloit,Douglas


#### Select a state from the 50 states in the dataframe above.

In [3]:
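```python
x = 'Alabama'  # the state to analyze; any column name of file1.csv can be used
state_df = (pd.read_csv('file1.csv', usecols=[x])
            .rename(columns={x: 'Neighborhood'})
            .dropna()                  # drop the empty cells that pad shorter state columns
            .reset_index(drop=True))   # fresh 0..n-1 index for the merges below
state_df
```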

|   | Neighborhood |
|---|---|
| 0 | Alexander City |
| 1 | Andalusia |
| 2 | Anniston |
| 3 | Athens |
| 4 | Atmore |
| 5 | Auburn |
| 6 | Bessemer |
| 7 | Birmingham |
| 8 | Chickasaw |
| 9 | Clanton |
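States have different numbers of cities, so the shorter columns of the wide CSV are padded with empty cells, which pandas reads as `NaN`. That is why `dropna()` is needed after selecting one column. Below is a plain-Python sketch of the same selection. It uses a small hypothetical excerpt shaped like `file1.csv`; the padding in it is invented for illustration, and `cities_for_state` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical excerpt shaped like file1.csv: one column per state, with
# shorter columns padded by empty cells (pandas reads these as NaN).
SAMPLE_CSV = """,Alabama,Alaska,Wyoming
0,Alexander City,Anchorage,Buffalo
1,Andalusia,Cordova,Casper
2,Anniston,,
"""

def cities_for_state(csv_text, state):
    """Return the non-empty city names listed under `state`, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[state] for row in reader if row[state]]  # skip padding cells

print(cities_for_state(SAMPLE_CSV, 'Wyoming'))
```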


#### Get the geographical coordinates of all the cities in the selected state.

In [4]:
```python
# define a function to get the coordinates of a city in the selected state
def get_latlng(neighborhood, max_tries=10):
    # x holds the selected state, e.g. 'Alabama'
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, {}, USA'.format(neighborhood, x))
        if g.latlng is not None:
            return g.latlng
    # give up after max_tries instead of looping forever; the row becomes NaN
    return [None, None]
```

In [5]:
```python
coords = [get_latlng(neighborhood) for neighborhood in state_df["Neighborhood"].tolist()]
coords
```

[[32.93884000000003, -85.95294999999999],
 [31.320140000000038, -86.49448999999998],
 [33.65712000000008, -85.81890999999996],
 [34.804500000000075, -86.97127999999998],
 [31.02526000000006, -87.49379999999996],
 [32.60829000000007, -85.48172999999997],
 [33.402030000000025, -86.95398999999998],
 [33.52068000000003, -86.81175999999994],
 [30.76461000000006, -88.07475999999997],
 [32.840850000000046, -86.63201999999995],
 [34.17437000000007, -86.84344999999996],
 [34.60740000000004, -86.97978999999998],
 [32.50475000000006, -87.83780999999999],
 [31.223250000000064, -85.39336999999995],
 [31.315150000000074, -85.85461999999995],
 [31.894980000000032, -85.14584999999994],
 [34.80060000000003, -87.67488999999995],
 [34.441310000000044, -85.71436999999997],
 [34.014770000000055, -86.00716999999997],
 [31.829440000000034, -86.63360999999998],
 [34.35143000000005, -86.29893999999996],
 [34.72929000000005, -86.58509999999995],
 [33.83128000000005, -87.28084999999999],
 [32.63467000000003, -87

In [6]:
```python
# create a temporary dataframe holding the Latitude and Longitude of each city
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
```

#### Merge the coordinates into the state dataframe and quickly examine the result.

In [7]:
```python
# merge the coordinates into the state dataframe; assign() returns a new DataFrame,
# which avoids pandas' SettingWithCopyWarning, and .values assigns by position
state_df = state_df.assign(Latitude=df_coords['Latitude'].values,
                           Longitude=df_coords['Longitude'].values)
print(state_df.shape)
state_df
```

(39, 3)


|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Alexander City | 32.93884 | -85.95295 |
| 1 | Andalusia | 31.32014 | -86.49449 |
| 2 | Anniston | 33.65712 | -85.81891 |
| 3 | Athens | 34.8045 | -86.97128 |
| 4 | Atmore | 31.02526 | -87.4938 |
| 5 | Auburn | 32.60829 | -85.48173 |
| 6 | Bessemer | 33.40203 | -86.95399 |
| 7 | Birmingham | 33.52068 | -86.81176 |
| 8 | Chickasaw | 30.76461 | -88.07476 |
| 9 | Clanton | 32.84085 | -86.63202 |


#### Use the geopy library to get the latitude and longitude of the selected state.

In [8]:
```python
# get the coordinates of the state
address = '{}, USA'.format(x)

geolocator = Nominatim(user_agent="x_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {}, USA are {}, {}.'.format(x, latitude, longitude))
```

The geographical coordinates of Alabama, USA are 33.2588817, -86.8295337.


#### Create a map of the selected state with its cities superimposed on top.

In [9]:
```python
# create a map of the selected state using its latitude and longitude
map_x = folium.Map(location=[latitude, longitude], zoom_start=5)

# add a marker for each city
for lat, lng, neighborhood in zip(state_df['Latitude'], state_df['Longitude'], state_df['Neighborhood']):
    label = folium.Popup('{}'.format(neighborhood), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='purple',
        fill=True,
        fill_color='#6600cc',
        fill_opacity=0.7).add_to(map_x)

map_x
```

#### Define your Foursquare Credentials.

In [10]:
```python
CLIENT_ID = 'YOUR_CLIENT_ID'  # replace with your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # replace with your Foursquare client secret
VERSION = '20180605'  # Foursquare API version
```

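#### Now, let's request up to 1400 venues within a radius of 15 km (about 9.3 miles) of each city.

Note that the API caps the number of results per request, so in practice far fewer venues come back per city (2,700 in total for the 39 Alabama cities below).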

In [11]:
```python
radius = 15000  # search radius in metres; change it to suit your preferences
LIMIT = 1400    # requested maximum number of venues per city
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

venues = []

for lat, lng, neighborhood in zip(state_df['Latitude'], state_df['Longitude'], state_df['Neighborhood']):

    # query parameters for the venues/explore request
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': LIMIT,
    }

    # make the GET request and keep the list of recommended items
    response = requests.get(EXPLORE_URL, params=params).json()
    results = response['response']['groups'][0]['items']

    # keep only the relevant information for each nearby venue
    for item in results:
        venue = item['venue']
        venues.append((
            neighborhood,
            lat,
            lng,
            venue['name'],
            venue['location']['distance'],
            venue['location']['lat'],
            venue['location']['lng'],
            venue['categories'][0]['name']))
```
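The request-and-parse loop above can also be tested offline. This sketch rests on several assumptions:

* `build_explore_url` and `parse_explore_items` are hypothetical helper names.
* The response excerpt is hand-written in the shape the loop indexes into (`venue` → `name`, `location`, `categories`).
* Falling back to `'Unknown'` when a venue has no category is our own choice; the notebook does not do this.

The sketch also shows the 15 km → 9.3 miles conversion.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius_m, limit):
    """Hypothetical helper: the same query as the notebook, with proper URL encoding."""
    params = {'client_id': client_id, 'client_secret': client_secret, 'v': version,
              'll': '{},{}'.format(lat, lng), 'radius': radius_m, 'limit': limit}
    return EXPLORE_URL + '?' + urlencode(params)

def parse_explore_items(items, neighborhood, lat, lng):
    """Hypothetical helper: flatten explore 'items' into the notebook's venue tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        # assumption: fall back to 'Unknown' if a venue carries no category
        category = venue['categories'][0]['name'] if venue.get('categories') else 'Unknown'
        rows.append((neighborhood, lat, lng, venue['name'], venue['location']['distance'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# hand-written excerpt mirroring one Birmingham row of the notebook output
sample_items = [{'venue': {'name': 'Birmingham Museum Of Art',
                           'location': {'distance': 220, 'lat': 33.522311, 'lng': -86.810409},
                           'categories': [{'name': 'Art Museum'}]}}]

print(build_explore_url('ID', 'SECRET', '20180605', 33.52068, -86.81176, 15000, 1400))
print(parse_explore_items(sample_items, 'Birmingham', 33.52068, -86.81176))
print('15000 m = {:.1f} miles'.format(15000 / 1609.344))
```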

In [12]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'Distance', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head(50)

(2700, 8)


|   | Neighborhood | Latitude | Longitude | VenueName | Distance | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|---|
| 0 | Alexander City | 32.93884 | -85.95295 | JR's Sports Bar & Grill | 607 | 32.944276 | -85.952404 | American Restaurant |
| 1 | Alexander City | 32.93884 | -85.95295 | La Posada Mexican Grill | 1729 | 32.926987 | -85.96491 | Mexican Restaurant |
| 2 | Alexander City | 32.93884 | -85.95295 | Ruby Tuesday | 1786 | 32.923797 | -85.959622 | American Restaurant |
| 3 | Alexander City | 32.93884 | -85.95295 | Wind Creek State Park Campground | 9625 | 32.855093 | -85.927328 | Campground |
| 4 | Alexander City | 32.93884 | -85.95295 | MAPCO Mart | 2509 | 32.916299 | -85.952386 | Gas Station |
| 5 | Alexander City | 32.93884 | -85.95295 | Anytime Fitness | 717 | 32.94523 | -85.95396 | Gym / Fitness Center |
| 6 | Alexander City | 32.93884 | -85.95295 | Dollar General | 961 | 32.931278 | -85.947983 | Discount Store |
| 7 | Alexander City | 32.93884 | -85.95295 | Wind Creek State Park | 9541 | 32.85671 | -85.923739 | State / Provincial Park |
| 8 | Alexander City | 32.93884 | -85.95295 | Jim Bob's | 1090 | 32.92991 | -85.948142 | American Restaurant |
| 9 | Alexander City | 32.93884 | -85.95295 | Subway | 2193 | 32.919853 | -85.959221 | Sandwich Place |


### Number of unique venue categories returned across all cities.

In [13]:
```python
print('There are {} unique categories.'.format(venues_df['VenueCategory'].nunique()))
```

There are 218 unique categories.


#### Sorting & assigning each venue category a unique venue number.

In [14]:
```python
sortedvenues = venues_df.sort_values('VenueCategory')
sortedvenues['VenueNumber'] = sortedvenues.groupby(['VenueCategory']).ngroup()
sortedvenues.head()
```

|   | Neighborhood | Latitude | Longitude | VenueName | Distance | VenueLatitude | VenueLongitude | VenueCategory | VenueNumber |
|---|---|---|---|---|---|---|---|---|---|
| 96 | Anniston | 33.65712 | -85.81891 | Bama Fever Tiger Pride | 6358 | 33.60793 | -85.784037 | Accessories Store | 0 |
| 1974 | Ozark | 31.45895 | -85.64071 | Hunt Stagefield | 10732 | 31.377644 | -85.579995 | Airport | 1 |
| 874 | Demopolis | 32.50475 | -87.83781 | Demopolis Municipal Airport | 11802 | 32.46205 | -87.952852 | Airport | 1 |
| 1975 | Ozark | 31.45895 | -85.64071 | Molinelli | 13814 | 31.481746 | -85.783731 | Airport | 1 |
| 1062 | Enterprise | 31.31515 | -85.85462 | Cairns Army Airfield | 13333 | 31.282587 | -85.719722 | Airport | 1 |
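`groupby(...).ngroup()` numbers the groups from 0 to n-1 in sorted key order (its default is `sort=True`). Every venue of the same category therefore shares one number. Below is a plain-Python sketch of the same numbering. `number_categories` is a hypothetical helper name, and the sample uses the category column of the rows above.

```python
def number_categories(categories):
    """Hypothetical helper: number distinct categories 0..n-1 in sorted order,
    mirroring groupby('VenueCategory').ngroup() with its default sort=True."""
    lookup = {name: i for i, name in enumerate(sorted(set(categories)))}
    return [lookup[c] for c in categories]

# the VenueCategory column of the rows shown above
cats = ['Accessories Store', 'Airport', 'Airport', 'Airport', 'Airport']
print(number_categories(cats))  # [0, 1, 1, 1, 1]
```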


### Create a map that shows all the venues in the different cities, each with its unique venue number and colour, to see how the cities of the selected state differ from one another.

In [18]:
```python
# create map
map_k = folium.Map(location=[latitude, longitude], zoom_start=7)

# one colour per venue category; VenueNumber runs from 0 to n-1
# (a new name is used here so that x, the selected state, is not overwritten)
n_categories = venues_df['VenueCategory'].nunique()
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, n_categories))]

# add markers to the map
for lat, lon, cate, nos, city in zip(sortedvenues.VenueLatitude, sortedvenues.VenueLongitude, sortedvenues.VenueCategory, sortedvenues.VenueNumber, sortedvenues.Neighborhood):
    label = folium.Popup('Category: {}, Category No.: {}, City: {}'.format(cate, nos, city), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[nos],  # index directly; nos - 1 would give category 0 the last colour
        fill=True,
        fill_opacity=1,
        fill_color=rainbow[nos]).add_to(map_k)

map_k
```
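`VenueNumber` runs from 0 to n-1, so each marker's colour should be looked up with the number itself. With `rainbow[nos - 1]`, category 0 wraps around to the last colour and shares it with category n-1. The sketch below builds an evenly spaced palette with the standard-library `colorsys` module instead of matplotlib's `rainbow` colormap. This is a swapped-in technique, chosen only so the example has no dependencies. The saturation and value settings are arbitrary, and `distinct_hex_colors` is a hypothetical helper name. The sketch checks that each of the 218 category numbers gets its own colour.

```python
import colorsys

def distinct_hex_colors(n):
    """Hypothetical helper: n hex colours spread evenly around the hue wheel."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.85, 0.95)  # hue steps of 1/n, never reaching 1
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = distinct_hex_colors(218)  # one colour per category number 0..217
print(palette[0], palette[-1])      # look up a marker colour as palette[nos]
```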

### You can also select a particular city to see the top venues around it.

In [17]:
```python
y = 'Birmingham'
x_data = venues_df[venues_df['Neighborhood'] == y].reset_index(drop=True)
x_data.head()
```

|   | Neighborhood | Latitude | Longitude | VenueName | Distance | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|---|
| 0 | Birmingham | 33.52068 | -86.81176 | Birmingham Museum Of Art | 220 | 33.522311 | -86.810409 | Art Museum |
| 1 | Birmingham | 33.52068 | -86.81176 | El Barrio | 910 | 33.516913 | -86.803053 | Mexican Restaurant |
| 2 | Birmingham | 33.52068 | -86.81176 | Birmingham Civil Rights Institute | 574 | 33.516083 | -86.814564 | History Museum |
| 3 | Birmingham | 33.52068 | -86.81176 | McWane Science Center | 668 | 33.515297 | -86.808559 | Science Museum |
| 4 | Birmingham | 33.52068 | -86.81176 | The Collins Bar | 869 | 33.516507 | -86.803835 | Bar |


### Number of unique venue categories returned for the selected city.

In [35]:
```python
print('There are {} unique categories.'.format(x_data['VenueCategory'].nunique()))
```

There are 59 unique categories.


#### Sorting & assigning each venue category a unique venue number.

In [18]:
```python
sortedx = x_data.sort_values('VenueCategory')
sortedx['VenueNumber'] = sortedx.groupby(['VenueCategory']).ngroup()
sortedx.head()
```

|   | Neighborhood | Latitude | Longitude | VenueName | Distance | VenueLatitude | VenueLongitude | VenueCategory | VenueNumber |
|---|---|---|---|---|---|---|---|---|---|
| 66 | Birmingham | 33.52068 | -86.81176 | Newk's Express Cafe | 1910 | 33.507419 | -86.798701 | American Restaurant | 0 |
| 46 | Birmingham | 33.52068 | -86.81176 | Galley & Garden | 2924 | 33.501173 | -86.790663 | American Restaurant | 0 |
| 50 | Birmingham | 33.52068 | -86.81176 | Jack Brown's Beer & Burger Joint | 2410 | 33.511413 | -86.78828 | American Restaurant | 0 |
| 22 | Birmingham | 33.52068 | -86.81176 | Pies & Pints | 1326 | 33.51117 | -86.803148 | American Restaurant | 0 |
| 0 | Birmingham | 33.52068 | -86.81176 | Birmingham Museum Of Art | 220 | 33.522311 | -86.810409 | Art Museum | 1 |


#### Let's get the geographical coordinates of your selected city.

In [26]:
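```python
# every row of x_data carries the city's coordinates, so take the first one by position
lati = x_data['Latitude'].iloc[0]
lng = x_data['Longitude'].iloc[0]
print('The geographical coordinates of {} are {}, {}.'.format(y, lati, lng))
```

The geographical coordinates of Birmingham are 33.52068000000003, -86.81175999999994.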


#### Create a map of the selected city with its venues superimposed on top, each with its unique venue number and colour.

In [36]:
```python
# create map
map_y = folium.Map(location=[lati, lng], zoom_start=11)

# one colour per venue category in this city; VenueNumber runs from 0 to n-1
n_city_categories = x_data['VenueCategory'].nunique()
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, n_city_categories))]

# draw the search radius around the city center
folium.Circle(location=[lati, lng], radius=radius, color='green', opacity=0.5, fill=True, fill_color='blue').add_to(map_y)

# add markers to the map
for lat, lon, cate, nos, city in zip(sortedx.VenueLatitude, sortedx.VenueLongitude, sortedx.VenueCategory, sortedx.VenueNumber, sortedx.Neighborhood):
    label = folium.Popup('Category: {}, Category No.: {}, City: {}'.format(cate, nos, city), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[nos],  # index directly; nos - 1 would give category 0 the last colour
        fill=True,
        fill_opacity=1,
        fill_color=rainbow[nos]).add_to(map_y)

map_y
```

## 3. Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting which cities of a particular state are alike, based on the kinds of venues found in each.

In the first step we collected the required **data**: a dataframe containing all 50 states and their cities. We also **identified the top venues** around each city (according to the Foursquare categorization).

The second step of our analysis is the exploration of the **top venues** across the different cities of the selected state; we use **folium** to create maps of all the top venues in the vicinity.

In the third and final step we will focus on the most promising areas and, within those, create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will consider locations with **no more than five similar venues within a radius of 9.3 miles (15 km)**. We will present a map of all such locations and also cluster them (using **k-means clustering**) to identify general zones, neighborhoods, or addresses that should be the starting point for the final street-level exploration and the stakeholders' search for the optimal venue location.
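The screening rule and the clustering in the third step can be sketched in plain Python under these assumptions:

* The candidate sites and venues are made up.
* `count_similar`, `passes_screen` and `kmeans` are hypothetical helpers.
* Distances use a spherical-Earth haversine formula.
* The hand-written Lloyd iteration stands in for the scikit-learn `KMeans` the notebook imports.

Clustering raw latitude/longitude in degrees is a simplification, since a degree of longitude shrinks away from the equator.

```python
import math
import random

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def count_similar(candidate, venues, category, radius_km=15.0):
    """Count venues of `category` within `radius_km` of a candidate (lat, lon)."""
    return sum(1 for lat, lon, cat in venues
               if cat == category and haversine_km(candidate[0], candidate[1], lat, lon) <= radius_km)

def passes_screen(candidate, venues, category, max_similar=5, radius_km=15.0):
    """The stakeholder rule: no more than `max_similar` similar venues within the radius."""
    return count_similar(candidate, venues, category, radius_km) <= max_similar

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on (lat, lon) pairs; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance in degrees
        labels = [min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2)
                  for p in points]
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels

# hypothetical candidate sites and venues, for illustration only
venues = [(33.52, -86.81, 'Pizza Place')] * 6 + [(32.37, -86.30, 'Pizza Place')]
candidates = [(33.52, -86.80), (32.38, -86.31), (32.36, -86.29)]
ok = [c for c in candidates if passes_screen(c, venues, 'Pizza Place')]
centroids, labels = kmeans(ok, k=1)
print(ok, centroids, labels)
```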

## 4. Analysis <a name="analysis"></a>

#### Analyze each neighboring city.

In [18]:
```python
# one-hot encode the venue categories
y_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add the city column back; it is named 'Neighborhoods' because
# 'Neighborhood' is itself one of the Foursquare venue categories
y_onehot['Neighborhoods'] = venues_df['Neighborhood']

# move the city column to the first position
fixed_columns = [y_onehot.columns[-1]] + list(y_onehot.columns[:-1])
y_onehot = y_onehot[fixed_columns]

print(y_onehot.shape)
y_onehot.head()
```

(2687, 218)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Trail,Bistro,Boat or Ferry,Bookstore,Border Crossing,Botanical Garden,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Casino,Cave,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Basketball Court,College Football Field,College Gym,College Rec Center,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Country Dance Club,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Distribution Center,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Electronics Store,Exhibit,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Festival,Fish & Chips Shop,Flea Market,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,Go Kart Track,Golf Course,Greek Restaurant,Grocery Store,Gun Range,Gun Shop,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Home Service,Hospital,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Korean Restaurant,Lake,Laser Tag,Latin American Restaurant,Lawyer,Library,Lingerie Store,Liquor Store,Lounge,Market,Medical Center,Medical Supply Store,Mediterranean Restaurant,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,Music Venue,National Park,Nature Preserve,Neighborhood,New American Restaurant,Non-Profit,Noodle House,Other Great Outdoors,Outdoor Supply Store,Outlet Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Post Office,Pub,Racetrack,Ramen Restaurant,Recording Studio,Rental Car Location,Resort,Rest Area,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Smoothie Shop,Snack Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,State / Provincial Park,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Truck Stop,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Zoo
0,Alexander City,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Alexander City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Alexander City,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Alexander City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alexander City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
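`pd.get_dummies` turns each venue row into a row of 0/1 indicators, one column per category, with the categories in sorted order as in the header above. Below is a plain-Python sketch of that encoding. The `one_hot` helper is illustrative, and the sample rows are the first four Alexander City venues from the notebook output.

```python
def one_hot(rows):
    """Hypothetical helper: encode (neighborhood, category) pairs as 0/1 indicator
    rows, one column per category in sorted order, like pd.get_dummies."""
    categories = sorted({cat for _, cat in rows})
    table = [[neighborhood] + [1 if cat == c else 0 for c in categories]
             for neighborhood, cat in rows]
    return categories, table

rows = [('Alexander City', 'American Restaurant'), ('Alexander City', 'Mexican Restaurant'),
        ('Alexander City', 'American Restaurant'), ('Alexander City', 'Campground')]
cols, table = one_hot(rows)
print(['Neighborhoods'] + cols)
for row in table:
    print(row)
```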


#### Next, let's group the rows by city and take the mean of the frequency of occurrence of each category.

In [19]:
```python
y_grouped = y_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(y_grouped.shape)
y_grouped
```

(39, 218)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport,Airport Terminal,American Restaurant,Antique Shop,Arcade,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Trail,Bistro,Boat or Ferry,Bookstore,Border Crossing,Botanical Garden,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Bus Station,Business Service,Café,Cajun / Creole Restaurant,Campground,Caribbean Restaurant,Casino,Cave,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Basketball Court,College Football Field,College Gym,College Rec Center,Concert Hall,Construction & Landscaping,Convenience Store,Convention Center,Cosmetics Shop,Country Dance Club,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Distillery,Distribution Center,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Electronics Store,Exhibit,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Festival,Fish & Chips Shop,Flea Market,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,Go Kart Track,Golf Course,Greek Restaurant,Grocery Store,Gun Range,Gun Shop,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Home Service,Hospital,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Kids Store,Korean Restaurant,Lake,Laser Tag,Latin American Restaurant,Lawyer,Library,Lingerie Store,Liquor Store,Lounge,Market,Medical Center,Medical Supply Store,Mediterranean Restaurant,Mexican Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,Music Venue,National Park,Nature Preserve,Neighborhood,New American Restaurant,Non-Profit,Noodle House,Other Great Outdoors,Outdoor Supply Store,Outlet Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Post Office,Pub,Racetrack,Ramen Restaurant,Recording Studio,Rental Car Location,Resort,Rest Area,Restaurant,Rock Club,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Smoothie Shop,Snack Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,State / Provincial Park,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tattoo Parlor,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Truck Stop,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Zoo
0,Alexander City,0.027027,0.0,0.0,0.0,0.108108,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.108108,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Andalusia,0.0,0.0,0.0,0.026316,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.131579,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078947,0.052632,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.078947,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Anniston,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.028571,0.0,0.014286,0.0,0.014286,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.042857,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.028571,0.014286,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.071429,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.042857,0.0,0.0,0.0,0.014286,0.014286,0.028571,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.042857,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.028571,0.0,0.014286,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.0
3,Athens,0.0,0.0,0.0,0.0,0.067797,0.016949,0.0,0.0,0.0,0.0,0.0,0.033898,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.186441,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.016949,0.0,0.067797,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.050847,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.016949,0.0,0.0,0.033898,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.050847,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.050847,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.033898,0.0,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Atmore,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.162162,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Auburn,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.02,0.01,0.05,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.04,0.0,0.05,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.0,0.01,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bessemer,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.08,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0
7,Birmingham,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.01,0.0,0.04,0.0,0.02,0.0,0.04,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
8,Chickasaw,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.07,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.01,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0
9,Clanton,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.020833,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.020833,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.020833,0.041667,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.020833,0.0,0.0,0.0625,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Here we can check how many cities in the selected state have a particular venue category, and what share of each city's venues that category makes up.

In [20]:
len(y_grouped[y_grouped["Seafood Restaurant"] > 0])

34

In [21]:
y_seafood = y_grouped[["Neighborhoods","Seafood Restaurant"]]
y_seafood.head()

Unnamed: 0,Neighborhoods,Seafood Restaurant
0,Alexander City,0.027027
1,Andalusia,0.026316
2,Anniston,0.014286
3,Athens,0.033898
4,Atmore,0.027027
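The two cells above answer different questions. The first counts the cities with at least one seafood restaurant (34). The second shows the seafood share of each city's returned venues as a fraction between 0 and 1. A minimal sketch that wraps both for any category, assuming a frame shaped like `y_grouped` (a `Neighborhoods` column plus one frequency column per category). The helper `category_presence` and the toy values are hypothetical.

```python
import pandas as pd

def category_presence(grouped, category):
    """Return (number of cities with the category, fraction of cities, per-city share in %)."""
    present = grouped[category] > 0
    count = int(present.sum())
    fraction = count / len(grouped)
    # Convert the per-city frequency (0..1) to a percentage for readability
    share_pct = (grouped.set_index("Neighborhoods")[category] * 100).round(2)
    return count, fraction, share_pct

# Toy data standing in for y_grouped (values are made up)
toy = pd.DataFrame({
    "Neighborhoods": ["A", "B", "C", "D"],
    "Seafood Restaurant": [0.027027, 0.0, 0.014286, 0.0],
    "Pizza Place": [0.05, 0.02, 0.0, 0.01],
})

count, fraction, share_pct = category_presence(toy, "Seafood Restaurant")
print(count, fraction)  # 2 0.5
print(share_pct.to_dict())
```

Applied to the notebook's data, `category_presence(y_grouped, "Seafood Restaurant")[0]` should match the count of 34 shown above.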


#### Let's print each neighboring city along with the top 5 most common venues.

In [23]:
num_top_venues = 5

for hood in y_grouped['Neighborhoods']:
    print("----"+hood+"----")
    # transpose the city's row into a (venue, frequency) table
    temp = y_grouped[y_grouped['Neighborhoods'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    # drop the first row (the city name); copy to avoid a SettingWithCopyWarning
    temp = temp.iloc[1:].copy()
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alexander City----
                  venue  freq
0   American Restaurant  0.11
1  Fast Food Restaurant  0.11
2           Gas Station  0.08
3         Grocery Store  0.08
4           Pizza Place  0.05


----Andalusia----
                  venue  freq
0        Discount Store  0.13
1  Fast Food Restaurant  0.11
2              Pharmacy  0.08
3        Sandwich Place  0.08
4   American Restaurant  0.05


----Anniston----
                  venue  freq
0  Fast Food Restaurant  0.07
1        Discount Store  0.06
2              Pharmacy  0.04
3          Burger Joint  0.04
4           Gas Station  0.04


----Athens----
                  venue  freq
0        Discount Store  0.19
1  Fast Food Restaurant  0.07
2   American Restaurant  0.07
3    Mexican Restaurant  0.05
4         Grocery Store  0.05


----Atmore----
                  venue  freq
0  Fast Food Restaurant  0.16
1         Grocery Store  0.08
2                 Hotel  0.05
3   Fried Chicken Joint  0.05
4          Intersection  0.05



### Let's put that into a _pandas_ dataframe.

#### First, let's write a function to sort the venues in descending order.

In [24]:
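def return_most_common_venues(row, num_top_venues):
    # skip the first field (the city name) and sort the venue frequencies, highest first
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    # return the names of the top num_top_venues categories
    return row_categories_sorted.index.values[0:num_top_venues]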
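`return_most_common_venues` skips the first entry of the row (the city name) and sorts the remaining frequencies. A minimal sketch of the same idea applied to a toy frame with `DataFrame.apply`. The helper name `top_venues` and the data are hypothetical. It casts to float first, because a row taken from a mixed string/float frame has `object` dtype. It also requests a stable sort: the default sort is not guaranteed to keep tied categories in column order, so cities with many equal frequencies could otherwise show a different (but equally valid) ranking.

```python
import pandas as pd

def top_venues(row, n):
    # Drop the leading city-name field and make sure the frequencies are floats
    freqs = row.iloc[1:].astype(float)
    # A stable sort keeps tied categories in their original column order
    return list(freqs.sort_values(ascending=False, kind="stable").index[:n])

toy = pd.DataFrame({
    "Neighborhoods": ["A", "B"],
    "Bar": [0.10, 0.30],
    "Coffee Shop": [0.50, 0.30],
    "Gym": [0.40, 0.05],
})

# apply with axis=1 passes one city row at a time and returns a Series of lists
top2 = toy.apply(lambda r: top_venues(r, 2), axis=1)
for city, venues in zip(toy["Neighborhoods"], top2):
    print(city, venues)
```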

#### Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [25]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names: 1st, 2nd, 3rd, then 4th to 10th Most Common Venue
columns = ['Neighborhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the indicators list use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe with one row per city
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhoods'] = y_grouped['Neighborhoods']

# fill each row with that city's top venue categories
for ind in np.arange(y_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(y_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alexander City,American Restaurant,Fast Food Restaurant,Gas Station,Grocery Store,Pizza Place,Discount Store,Fried Chicken Joint,ATM,Campground,Mexican Restaurant
1,Andalusia,Discount Store,Fast Food Restaurant,Pharmacy,Sandwich Place,American Restaurant,Pizza Place,Fried Chicken Joint,Japanese Restaurant,Construction & Landscaping,Post Office
2,Anniston,Fast Food Restaurant,Discount Store,Pharmacy,Burger Joint,Gas Station,Hotel,Grocery Store,Department Store,Baseball Field,Sandwich Place
3,Athens,Discount Store,Fast Food Restaurant,American Restaurant,Mexican Restaurant,Grocery Store,Pharmacy,BBQ Joint,Seafood Restaurant,Sandwich Place,Pizza Place
4,Atmore,Fast Food Restaurant,Grocery Store,Hotel,Fried Chicken Joint,Intersection,American Restaurant,Discount Store,Sandwich Place,Buffet,Breakfast Spot
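The `indicators` list with a `try`/`except` gives correct suffixes here only because `num_top_venues` is 10. With 21 or more venues, positions 21, 22 and 23 would fall through to the `'th'` branch and become "21th", "22th", "23th". A minimal sketch of a general English ordinal helper that could build the same column names. The function name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 23 -> '23rd'."""
    # 11th, 12th, 13th (and 111th, 112th, ...) are exceptions to the last-digit rule
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhoods"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns[1], columns[-1])  # 1st Most Common Venue 10th Most Common Venue
```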


## 4. Cluster Neighborhoods

#### Run _k_-means to group the neighboring cities into 5 clusters based on the similarity of their venue-category frequencies.

In [26]:
# set number of clusters
kclusters = 5

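# keep only the numeric venue-frequency columns
# (the positional axis argument of drop() is no longer accepted in pandas 2.x)
y_grouped_clustering = y_grouped.drop(columns='Neighborhoods')

# run k-means clustering; n_init is set explicitly because its default changed in newer scikit-learn releases
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(y_grouped_clustering)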

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 1, 1, 1, 2, 4, 4, 4, 4, 1])
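The notebook sets `kclusters = 5` without justifying the choice. One common way to sanity-check k is to compare silhouette scores for several values. The sketch below does this with scikit-learn's `silhouette_score` on synthetic data (three made-up, clearly separated blobs), so the expected best k is 3. This is a suggested check, not part of the original analysis. On the real venue-frequency matrix the same loop would run on `y_grouped_clustering`, and the scores would need to be weighed alongside domain judgment.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the venue-frequency matrix: three well-separated groups
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```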

#### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighboring city.

In [27]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

y_merged = state_df

# join the cluster labels and top venues (indexed by city name) onto state_df, which holds the latitude/longitude of each city
y_merged = y_merged.join(neighborhoods_venues_sorted.set_index('Neighborhoods'), on='Neighborhood')

y_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alexander City,32.93884,-85.95295,2,American Restaurant,Fast Food Restaurant,Gas Station,Grocery Store,Pizza Place,Discount Store,Fried Chicken Joint,ATM,Campground,Mexican Restaurant
1,Andalusia,31.32014,-86.49449,1,Discount Store,Fast Food Restaurant,Pharmacy,Sandwich Place,American Restaurant,Pizza Place,Fried Chicken Joint,Japanese Restaurant,Construction & Landscaping,Post Office
2,Anniston,33.65712,-85.81891,1,Fast Food Restaurant,Discount Store,Pharmacy,Burger Joint,Gas Station,Hotel,Grocery Store,Department Store,Baseball Field,Sandwich Place
3,Athens,34.8045,-86.97128,1,Discount Store,Fast Food Restaurant,American Restaurant,Mexican Restaurant,Grocery Store,Pharmacy,BBQ Joint,Seafood Restaurant,Sandwich Place,Pizza Place
4,Atmore,31.02526,-87.4938,2,Fast Food Restaurant,Grocery Store,Hotel,Fried Chicken Joint,Intersection,American Restaurant,Discount Store,Sandwich Place,Buffet,Breakfast Spot
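`DataFrame.join(..., on='Neighborhood')` performs a left join. Every city in `state_df` is kept, and a city with no Foursquare venues (and therefore no row in `neighborhoods_venues_sorted`) would get `NaN` in `Cluster Labels`, which also turns that column into floats. A minimal sketch on toy frames shows this case and one way to guard against it before the labels are used as list indices. The city names and values are made up.

```python
import pandas as pd

# Toy stand-ins for state_df and neighborhoods_venues_sorted
state = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Latitude": [32.9, 31.3, 33.6],
    "Longitude": [-85.9, -86.5, -85.8],
})
sorted_venues = pd.DataFrame({
    "Cluster Labels": [2, 1],
    "Neighborhoods": ["A", "C"],
    "1st Most Common Venue": ["Pizza Place", "Hotel"],
})

merged = state.join(sorted_venues.set_index("Neighborhoods"), on="Neighborhood")
print(merged["Cluster Labels"].tolist())  # city B has no venues, so its label is NaN

# Drop unmatched cities and restore integer labels before using them as list indices
merged = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
```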


## Results and Discussion <a name="results"></a>

#### Let's visualize the clusters.

In [28]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=7)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; labels run from 0 to kclusters-1, so they index rainbow directly
for lat, lon, poi, cluster in zip(y_merged['Latitude'], y_merged['Longitude'], y_merged['Neighborhood'], y_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
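The map code builds one hex color per cluster and picks a marker color by cluster label. Because k-means labels run from 0 to `kclusters - 1`, the label can index the palette directly, giving cluster 0 the first color, and so on. A minimal sketch of that mapping. It uses `matplotlib.colormaps` (available in Matplotlib 3.5 and later) instead of the notebook's `cm.rainbow`, and the `labels` list is just an example of what `kmeans.labels_` returns.

```python
import numpy as np
import matplotlib
from matplotlib import colors

kclusters = 5
cmap = matplotlib.colormaps["rainbow"]
# Sample kclusters evenly spaced colors from the colormap and convert them to '#rrggbb' strings
palette = [colors.rgb2hex(c) for c in cmap(np.linspace(0, 1, kclusters))]

labels = [2, 1, 1, 1, 2, 4, 4, 4, 4, 1]  # example k-means labels
marker_colors = [palette[label] for label in labels]
print(palette)
```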

## 5. Examine Clusters

#### Now we can examine each cluster and determine the venue categories that distinguish it. Based on these defining categories, we can then give each cluster a descriptive name, for example "cities whose most common venues are restaurants". Clusters are numbered from 1 below, whereas the k-means labels start at 0, so Cluster 1 corresponds to `Cluster Labels == 0`.
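For k-means, each row of `kmeans.cluster_centers_` is (at convergence) the mean venue-frequency vector of the cities assigned to that cluster, so the largest entries of each center are a reasonable starting point for naming clusters. A minimal sketch on a toy frequency table, where the categories and values are made up. On the notebook's data the equivalent would pair `kmeans.cluster_centers_` with `y_grouped_clustering.columns`.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy venue-frequency table: rows are cities, columns are venue categories
freq = pd.DataFrame(
    [[0.50, 0.10, 0.00],
     [0.45, 0.15, 0.05],
     [0.05, 0.10, 0.60],
     [0.00, 0.20, 0.55]],
    columns=["Fast Food Restaurant", "Coffee Shop", "Discount Store"],
)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq)
# One row per cluster: the mean frequency of each category among its cities
centers = pd.DataFrame(km.cluster_centers_, columns=freq.columns)

# Top-2 categories by mean frequency for each cluster
for label, row in centers.iterrows():
    print(label, list(row.nlargest(2).index))
```

Which numeric label each group receives is arbitrary, so names should be attached by inspecting the centers rather than by assuming a label order.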

#### Cluster 1

In [29]:
y_merged.loc[y_merged['Cluster Labels'] == 0, y_merged.columns[[0] + list(range(4, y_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Demopolis,Fast Food Restaurant,Hotel,Pizza Place,Mexican Restaurant,Gas Station,Sandwich Place,Fried Chicken Joint,Convenience Store,Pharmacy,BBQ Joint
31,Selma,Fried Chicken Joint,BBQ Joint,Hotel,Pizza Place,Pharmacy,Sandwich Place,Fast Food Restaurant,Gas Station,Grocery Store,Department Store


#### Cluster 2

In [30]:
y_merged.loc[y_merged['Cluster Labels'] == 1, y_merged.columns[[0] + list(range(4, y_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Andalusia,Discount Store,Fast Food Restaurant,Pharmacy,Sandwich Place,American Restaurant,Pizza Place,Fried Chicken Joint,Japanese Restaurant,Construction & Landscaping,Post Office
2,Anniston,Fast Food Restaurant,Discount Store,Pharmacy,Burger Joint,Gas Station,Hotel,Grocery Store,Department Store,Baseball Field,Sandwich Place
3,Athens,Discount Store,Fast Food Restaurant,American Restaurant,Mexican Restaurant,Grocery Store,Pharmacy,BBQ Joint,Seafood Restaurant,Sandwich Place,Pizza Place
9,Clanton,Discount Store,Fast Food Restaurant,Gas Station,Sandwich Place,Burger Joint,Hotel,Intersection,Grocery Store,Fried Chicken Joint,Pizza Place
10,Cullman,Fast Food Restaurant,Mexican Restaurant,Burger Joint,Sandwich Place,American Restaurant,Discount Store,BBQ Joint,Pizza Place,Fried Chicken Joint,Department Store
14,Enterprise,Discount Store,Sandwich Place,Pizza Place,Fast Food Restaurant,Coffee Shop,Hotel,Bar,Gym,Grocery Store,Department Store
15,Eufaula,Discount Store,Fast Food Restaurant,Convenience Store,Sandwich Place,Fried Chicken Joint,Mexican Restaurant,Gas Station,BBQ Joint,Grocery Store,Restaurant
16,Florence,Discount Store,Grocery Store,Steakhouse,Burger Joint,Mexican Restaurant,Coffee Shop,Museum,Fast Food Restaurant,Pizza Place,Fried Chicken Joint
17,Fort Payne,Fast Food Restaurant,Discount Store,Pizza Place,Gas Station,Convenience Store,Mexican Restaurant,Sandwich Place,Seafood Restaurant,Scenic Lookout,Restaurant
18,Gadsden,Fast Food Restaurant,Mexican Restaurant,Discount Store,Grocery Store,Seafood Restaurant,Breakfast Spot,American Restaurant,Fried Chicken Joint,Ice Cream Shop,Bakery


#### Cluster 3

In [31]:
y_merged.loc[y_merged['Cluster Labels'] == 2, y_merged.columns[[0] + list(range(4, y_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alexander City,American Restaurant,Fast Food Restaurant,Gas Station,Grocery Store,Pizza Place,Discount Store,Fried Chicken Joint,ATM,Campground,Mexican Restaurant
4,Atmore,Fast Food Restaurant,Grocery Store,Hotel,Fried Chicken Joint,Intersection,American Restaurant,Discount Store,Sandwich Place,Buffet,Breakfast Spot
19,Greenville,Fast Food Restaurant,Discount Store,American Restaurant,Gas Station,Sandwich Place,Rest Area,Pharmacy,Hotel,Grocery Store,Fried Chicken Joint
27,Ozark,Fast Food Restaurant,Gas Station,Fried Chicken Joint,Hotel,Airport,Pizza Place,American Restaurant,Food & Drink Shop,Big Box Store,Mexican Restaurant
38,Tuskegee,Fast Food Restaurant,Rest Area,Grocery Store,Historic Site,Park,Museum,Forest,American Restaurant,Pharmacy,Fried Chicken Joint


#### Cluster 4

In [32]:
y_merged.loc[y_merged['Cluster Labels'] == 3, y_merged.columns[[0] + list(range(4, y_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Marion,Discount Store,Hotel,Fast Food Restaurant,Sandwich Place,Bus Station,Outdoor Supply Store,Museum,Music Venue,National Park,Nature Preserve


#### Cluster 5

In [33]:
y_merged.loc[y_merged['Cluster Labels'] == 4, y_merged.columns[[0] + list(range(4, y_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Auburn,American Restaurant,Grocery Store,BBQ Joint,Sandwich Place,Golf Course,Pizza Place,Mexican Restaurant,Burger Joint,Italian Restaurant,Coffee Shop
6,Bessemer,Grocery Store,Fast Food Restaurant,Ice Cream Shop,Gym,Fried Chicken Joint,American Restaurant,Mexican Restaurant,Burger Joint,Coffee Shop,Video Game Store
7,Birmingham,Brewery,Coffee Shop,Hotel,Mediterranean Restaurant,American Restaurant,BBQ Joint,Bar,Gym,Park,Gastropub
8,Chickasaw,Grocery Store,Coffee Shop,Mexican Restaurant,Seafood Restaurant,Sandwich Place,Southern / Soul Food Restaurant,Italian Restaurant,American Restaurant,Café,Fast Food Restaurant
11,Decatur,BBQ Joint,Steakhouse,Italian Restaurant,Sandwich Place,Grocery Store,Mexican Restaurant,Discount Store,Pharmacy,Park,Breakfast Spot
13,Dothan,American Restaurant,Fast Food Restaurant,Mexican Restaurant,Grocery Store,Pizza Place,Breakfast Spot,BBQ Joint,Hotel,Department Store,Steakhouse
21,Huntsville,Coffee Shop,Mexican Restaurant,Grocery Store,BBQ Joint,Brewery,American Restaurant,Restaurant,Science Museum,Fast Food Restaurant,Steakhouse
24,Mobile,Seafood Restaurant,Coffee Shop,Southern / Soul Food Restaurant,Grocery Store,Museum,Café,Gym / Fitness Center,BBQ Joint,Hotel,Italian Restaurant
25,Montgomery,American Restaurant,Grocery Store,Pizza Place,Coffee Shop,Fast Food Restaurant,Bar,Burger Joint,Bakery,Park,Seafood Restaurant
26,Opelika,American Restaurant,Sandwich Place,Pizza Place,Pharmacy,BBQ Joint,Grocery Store,Mexican Restaurant,Italian Restaurant,Burger Joint,Restaurant


Our analysis shows that although there is a large number of restaurants in Berlin (~2000 in our initial area of interest, a 12x12 km square around Alexanderplatz), there are pockets of low restaurant density fairly close to the city center. The highest concentration of restaurants was detected north and west of Alexanderplatz, so we focused our attention on the areas to the south, south-east and east, corresponding to the boroughs of Kreuzberg and Friedrichshain and the south-east corner of the central Mitte borough. Another borough (Prenzlauer Berg, north-east of Alexanderplatz) was identified as potentially interesting, but we focused on Kreuzberg and Friedrichshain, which offer a combination of popularity among tourists, closeness to the city center, strong socio-economic dynamics *and* a number of pockets of low restaurant density.

After narrowing our attention to this smaller area of interest (approx. 5x5 km south-east of Alexanderplatz), we first created a dense grid of candidate locations spaced 100 m apart. Those locations were then filtered to remove any with more than two restaurants within a 250 m radius or with an Italian restaurant closer than 400 m.

Those candidate locations were then clustered to create zones of interest containing the greatest number of candidates. The addresses of the zone centers were also generated using reverse geocoding, to serve as markers/starting points for a more detailed local analysis based on other factors.

The result is 15 zones containing the largest number of potential new restaurant locations, based on the number of and distance to existing venues: both restaurants in general and Italian restaurants in particular. This, of course, does not imply that those zones are actually optimal locations for a new restaurant! The purpose of this analysis was only to provide information on areas close to the Berlin center but not crowded with existing restaurants (particularly Italian ones). It is entirely possible that there is a very good reason for the small number of restaurants in any of those areas, one that would make them unsuitable for a new restaurant regardless of the lack of competition. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually result in a location that not only has no nearby competition but also satisfies all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Berlin close to the center with a low number of restaurants (particularly Italian restaurants), in order to help stakeholders narrow down the search for the optimal location for a new Italian restaurant. By calculating the restaurant density distribution from Foursquare data, we first identified the general boroughs that justify further analysis (Kreuzberg and Friedrichshain), and then generated an extensive collection of locations that satisfy some basic requirements regarding existing nearby restaurants. These locations were then clustered to create major zones of interest (containing the greatest number of potential locations), and the addresses of the zone centers were generated to serve as starting points for final exploration by stakeholders.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to parks or water), noise levels / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.