# Comparison of Rome and Oslo

## Introduction
#### Background
My idea is to compare the capitals of two European countries, Italy and Norway. Since their climates differ, I wonder what kinds of venues and businesses can be found in Oslo and in Rome. Do people living in the warmer South spend their time differently from people in the colder North? What kind of business is the most profitable in each place? The comparison should show how much residents' needs differ depending on location.

#### Target Audience
This project may interest people who have capital and a business idea but are not sure where to start.

## Data
A list of Oslo's and Rome's districts will be scraped from Wikipedia with the BeautifulSoup library. After that, I will use the OpenCage geocoder to look up coordinates and the Foursquare API to get more details about the venues. In the end, the data will consist of:
- District's name
- Neighbourhood
- Business name (for example store name)
- Business category
- Longitude
- Latitude
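
The fields listed above can be pictured as one flat record per venue. The block below is a minimal sketch, assuming a hypothetical `VenueRecord` class that is not part of the notebook. The business name is a made-up placeholder, and the coordinates are Alnabru's geocoded coordinates reused as a stand-in for a venue position.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class VenueRecord:
    """One row of the final dataset (hypothetical helper, for illustration only)."""
    district: str
    neighbourhood: str
    business_name: str
    business_category: str
    longitude: float
    latitude: float


example = VenueRecord(
    district="Alna",
    neighbourhood="Alnabru",
    business_name="Example Office Supplies",  # placeholder, not a real venue
    business_category="Paper / Office Supplies Store",
    longitude=10.836498,
    latitude=59.926682,
)
print(astuple(example))
```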

In [1]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup as soup
import requests

In [2]:
link = "https://en.wikipedia.org/wiki/Administrative_subdivision_of_Rome"
page = requests.get(link)
page = soup(page.content, 'html.parser')

In [63]:
municip = []

tables = page.findAll("table")[1].findAll("tr")[1:]
for tabrow in tables:
    tabrow = tabrow.findAll("td")[0].text.replace("\n", "").split(" – ")
    municipio = tabrow[0].split(" ")[1]
    name = tabrow[1]
    municip.append([municipio, name])
    
municip_df = pd.DataFrame(municip, columns=["Municipio", "District"])
municip_df

Unnamed: 0,Municipio,District
0,I,Historical Center
1,II,Parioli/Nomentano
2,III,Monte Sacro
3,IV,Tiburtina
4,V,Prenestino/Centocelle
5,VI,Roma Delle Torri
6,VII,Appio-Latino/Tuscolano/Cinecittà
7,VIII,Appia Antica
8,IX,EUR
9,X,Ostia/Acilia


In [64]:
subdivisions = page.findAll("ul")[3].findAll("li")

italian_subdivs = []
for subdiv in subdivisions:
    subdiv = subdiv.text.split(": ")
    municipio = subdiv[0].split(" ")[1]
    divs = subdiv[1].split(", ")
    divs = [dv.replace(";", "")[3:] for dv in divs]
    for dv in divs:
        if dv[0] == " ":
            dv = dv[1:]
        if "[it]" in dv:
            dv = dv[:-8]
        italian_subdivs.append([municipio, dv])
# The column is named "Urban zone" to match the output shown below.
italy_df = pd.DataFrame(italian_subdivs, columns=["Municipio", "Urban zone"])
italy_df

Unnamed: 0,Municipio,Urban zone
0,I,Historic centre
1,I,Trastevere
2,I,Aventino
3,I,Testaccio
4,I,Esquilino
...,...,...
150,XV,Prima Porta
151,XV,Labaro
152,XV,Cesano
153,XV,Martignano


In [65]:
italy_df = pd.merge(municip_df, italy_df, on="Municipio")
italy_df.drop(["Municipio"], axis=1, inplace=True)
italy_df

Unnamed: 0,District,Urban zone
0,Historical Center,Historic centre
1,Historical Center,Trastevere
2,Historical Center,Aventino
3,Historical Center,Testaccio
4,Historical Center,Esquilino
...,...,...
150,Cassia/Flaminia,Prima Porta
151,Cassia/Flaminia,Labaro
152,Cassia/Flaminia,Cesano
153,Cassia/Flaminia,Martignano


In [70]:
link = "https://en.wikipedia.org/wiki/List_of_boroughs_of_Oslo"
page = requests.get(link)
page = soup(page.content, 'html.parser')

In [107]:
def open_link_and_get_districts(link, order):
    page = requests.get(link)
    page = soup(page.content, 'html.parser')
    districts = page.findAll("ul")[order-1].findAll("li")
    districts = [d.text.replace("\n", "") for d in districts]
    return districts

In [168]:
first = ["Alna", "Bjerke", "Gamle Oslo", "Nordstrand", "Søndre Nordstrand", "Vestre Aker"]
second = ["Frogner"]

other = {
    "Sagene":["Sagene", "Bjølsen", "Iladalen (Ila)", "Sandaker", "Åsen", "Torshov"],
     "St. Hanshaugen":["St. Hanshaugen"],
     "Stovner":["Stovner"],
     "Ullern":["Lysejordet", "Øraker", "Lilleaker", "Sollerud", "Vækerø", "Bestum", "Ullern", 
     "Bjørnsletta", "Ullernåsen", "Montebello", "Hoff", "Skøyen"],
     "Østensjø":["Bøler", "Oppsal" , "Manglerud"],
     "Grünerløkka":["Grünerløkka"],
     "Grorud":["Grorud", "Ammerud", "Grorud", "Kalbakken", "Rødtvet", "Nordtvet", "Romsås"],
    "Nordre Aker":["Gaustad", "Øvre Blindern", "Ullevål Hageby", "Sogn", "Kringsjå", "Nordberg", "Korsvoll", "Tåsen", 
    "Ullevål", "Berg", "Nydalen", "Storo", "Frysja", "Disen", "Kjelsås", "Grefsen", "Nordre Åsen"]
     }

In [169]:
wiki_link = "https://en.wikipedia.org"  # no trailing slash: the hrefs already start with "/wiki/"

oslo_subdivs = []
tab = page.find("table", {"class":"wikitable"}).findAll("tr")[1:]
for i in tab:
    borough = i.find("td").text
    neigh_link = i.find("td").a["href"]
    if borough in first:
        districts = open_link_and_get_districts(wiki_link + neigh_link, 1)
    elif borough in second:
        districts = open_link_and_get_districts(wiki_link + neigh_link, 2)
    elif borough in other:
        districts = other[borough]
    else:
        # Unknown borough: skip it rather than reuse the previous row's list.
        districts = []
    for dv in districts:
        oslo_subdivs.append([borough, dv])
oslo_df = pd.DataFrame(oslo_subdivs, columns =["Borough", "Neighborhood"])
oslo_df

Unnamed: 0,Borough,Neighborhood
0,Alna,Alnabru
1,Alna,Ellingsrud
2,Alna,Furuset
3,Alna,Haugerud
4,Alna,Hellerud
...,...,...
91,Vestre Aker,Sørkedalen
92,Vestre Aker,"Smestad, Oslo"
93,Østensjø,Bøler
94,Østensjø,Oppsal


In [170]:
# Use .loc for the assignment; chained indexing may write to a temporary copy.
oslo_df.loc[oslo_df["Neighborhood"].str.contains("Lofthus"), "Neighborhood"] = "Lofthus"
oslo_df.loc[oslo_df["Neighborhood"].str.contains("Iladalen"), "Neighborhood"] = "Iladalen"
oslo_df.loc[oslo_df["Neighborhood"].str.contains("Holmenkollen"), "Neighborhood"] = "Holmenkollen"

In [148]:
from opencage.geocoder import OpenCageGeocode
from pprint import pprint

In [149]:
with open("creds.txt", "r") as f:
    creds = f.read()
creds = creds.split("\n")
CLIENT_ID = creds[0]
CLIENT_SECRET = creds[1]
KEY2 = creds[2]

In [150]:
geocoder = OpenCageGeocode(KEY2)

In [171]:
latitudes, longitudes = [], []

for i, row in oslo_df.iterrows():
    neigh = row["Neighborhood"]  # select by name; positional row[-1] is deprecated on labelled Series
    query = f"{neigh}, Norway"
    results = geocoder.geocode(query)
    lat = results[0]['geometry']['lat']
    lng = results[0]['geometry']['lng']
    latitudes.append(lat)
    longitudes.append(lng)

In [172]:
oslo_df["Latitude"] = latitudes
oslo_df["Longitude"] = longitudes
oslo_df

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Alna,Alnabru,59.926682,10.836498
1,Alna,Ellingsrud,59.934191,10.920897
2,Alna,Furuset,59.941067,10.896399
3,Alna,Haugerud,59.922116,10.854522
4,Alna,Hellerud,59.910067,10.829839
...,...,...,...,...
91,Vestre Aker,Sørkedalen,60.016105,10.612502
92,Vestre Aker,"Smestad, Oslo",59.937212,10.684161
93,Østensjø,Bøler,59.884271,10.845545
94,Østensjø,Oppsal,66.961121,13.985395


In [360]:
to_get_rid_of = ["Ekeberg", "Sørkedalen", "Nordre Åsen", "Klemetsrud", "Sandaker", "Lofthus", "Disen",
                "Nordberg", "Åsen", "Røa", "Iladalen", "Oppsal", "Sogn"]
oslo_df = oslo_df[~oslo_df.Neighborhood.isin(to_get_rid_of)].reset_index(drop=True)
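
The list above was compiled by hand after spotting mis-geocoded points such as Oppsal, which landed at latitude 66.96. One possible alternative is to drop every point that lies too far from the city centre. This is a sketch under assumptions, not what the notebook does: `haversine_km`, `is_plausible` and the 15 km cut-off `MAX_KM` are hypothetical, and the centre point is the one the notebook uses for the Oslo map.

```python
from math import asin, cos, radians, sin, sqrt

OSLO_CENTRE = (59.91273, 10.74609)  # centre point used for the Oslo map
MAX_KM = 15.0                       # hypothetical cut-off; tune it by looking at the map


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lng) pairs."""
    lat1, lng1, lat2, lng2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km is the mean Earth radius


def is_plausible(lat, lng, centre=OSLO_CENTRE, max_km=MAX_KM):
    """True when the point is within max_km of the city centre."""
    return haversine_km((lat, lng), centre) <= max_km


points = {
    "Alnabru": (59.926682, 10.836498),
    "Oppsal": (66.961121, 13.985395),  # geocoded far north of Oslo
}
kept = [name for name, (lat, lng) in points.items() if is_plausible(lat, lng)]
print(kept)
```

A distance filter will not necessarily reproduce the hand-made list exactly, so the map is still worth checking afterwards.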

In [361]:
import folium
OSLO_COORDS = [59.91273, 10.74609]
map_oslo = folium.Map(location=OSLO_COORDS,zoom_start=11)

for i, row in oslo_df.iterrows():
    # Select by column name rather than by position so the loop survives column reordering.
    lat = float(row["Latitude"])
    lng = float(row["Longitude"])
    label = folium.Popup(row["Neighborhood"], parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_oslo)
map_oslo

In [362]:
oslo_df['Borough'].value_counts()

Nordre Aker          13
Ullern               12
Gamle Oslo           11
Alna                  8
Grorud                7
Søndre Nordstrand     6
Bjerke                6
Frogner               4
Nordstrand            4
Vestre Aker           4
Sagene                3
Østensjø              2
Stovner               1
St. Hanshaugen        1
Grünerløkka           1
Name: Borough, dtype: int64

In [158]:
latitudes, longitudes = [], []

for i, row in italy_df.iterrows():
    neigh = row.iloc[-1]  # last column holds the urban zone name; .iloc makes the positional lookup explicit
    query = f"{neigh}, Italy"
    results = geocoder.geocode(query)
    lat = results[0]['geometry']['lat']
    lng = results[0]['geometry']['lng']
    latitudes.append(lat)
    longitudes.append(lng)

In [159]:
italy_df["Latitude"] = latitudes
italy_df["Longitude"] = longitudes
italy_df

Unnamed: 0,District,Urban zone,Latitude,Longitude
0,Historical Center,Historic centre,45.671719,12.537325
1,Historical Center,Trastevere,41.883765,12.471270
2,Historical Center,Aventino,41.882825,12.486819
3,Historical Center,Testaccio,42.721313,12.730305
4,Historical Center,Esquilino,41.898044,12.498863
...,...,...,...,...
150,Cassia/Flaminia,Prima Porta,42.001975,12.485970
151,Cassia/Flaminia,Labaro,41.990223,12.489245
152,Cassia/Flaminia,Cesano,42.077635,12.341911
153,Cassia/Flaminia,Martignano,40.237425,18.255227


In [354]:
italy_df = italy_df[italy_df.Latitude < 42.721312].reset_index(drop=True)
italy_df = italy_df[italy_df.Latitude > 41.640312].reset_index(drop=True)
italy_df = italy_df[italy_df.Longitude < 13.155438].reset_index(drop=True)
italy_df = italy_df[italy_df.Longitude > 11.438524].reset_index(drop=True)
italy_df

Unnamed: 0,District,Urban zone,Latitude,Longitude
0,Historical Center,Trastevere,41.883765,12.471270
1,Historical Center,Aventino,41.882825,12.486819
2,Historical Center,Esquilino,41.898044,12.498863
3,Historical Center,XX Settembre,41.906151,12.496895
4,Parioli/Nomentano,Villaggio Olimpico,41.932966,12.474157
...,...,...,...,...
98,Cassia/Flaminia,Santa Cornelia,42.029061,12.451673
99,Cassia/Flaminia,Prima Porta,42.001975,12.485970
100,Cassia/Flaminia,Labaro,41.990223,12.489245
101,Cassia/Flaminia,Cesano,42.077635,12.341911
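
The four filters above amount to a bounding-box test around Rome. The sketch below restates that test as a single predicate in plain Python; the function name `in_rome_box` is hypothetical, while the limits and sample coordinates are copied from the cells and output above.

```python
# Limits copied from the four filters above: (min, max) for each axis.
LAT_RANGE = (41.640312, 42.721312)
LNG_RANGE = (11.438524, 13.155438)


def in_rome_box(lat, lng):
    """True when the point lies strictly inside the bounding box."""
    return LAT_RANGE[0] < lat < LAT_RANGE[1] and LNG_RANGE[0] < lng < LNG_RANGE[1]


samples = {
    "Trastevere": (41.883765, 12.471270),
    "Testaccio": (42.721313, 12.730305),   # just above the latitude limit
    "Martignano": (40.237425, 18.255227),  # geocoded in a different region
}
inside = {name: in_rome_box(lat, lng) for name, (lat, lng) in samples.items()}
print(inside)
```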


In [355]:
ROME_COORDS = [41.902782, 12.496366]
map_rome = folium.Map(location=ROME_COORDS,zoom_start=10)

for i, row in italy_df.iterrows():
    lat = float(row["Latitude"])
    lng = float(row["Longitude"])
    # The second column holds the urban zone name; .iloc makes the positional lookup explicit.
    label = folium.Popup(row.iloc[1], parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ff8282',
        fill_opacity=0.7).add_to(map_rome)
map_rome

In [356]:
italy_df['District'].value_counts()

Prenestino/Centocelle               11
Monte Sacro                         10
Cassia/Flaminia                     10
Appio-Latino/Tuscolano/Cinecittà     9
Ostia/Acilia                         8
Tiburtina                            8
Roma Delle Torri                     8
Arvalia/Portuense                    7
Parioli/Nomentano                    7
Appia Antica                         6
EUR                                  6
Historical Center                    4
Aurelia                              3
Monte Mario                          3
Monte Verde                          3
Name: District, dtype: int64

## Foursquare API

In [363]:
LIMIT = 80
radius = 500
VERSION = '20180605'

In [364]:
def get_close_venues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):

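        # Let requests build and escape the query string instead of formatting it by hand.
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': f'{lat},{lng}',
            'radius': radius,
            'limit': LIMIT,
        }

        response = requests.get(url, params=params)
        response.raise_for_status()  # fail early on HTTP errors such as a bad key or an exceeded quota
        results = response.json()["response"]['groups'][0]['items']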
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
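
To make the flattening step in `get_close_venues` easier to follow without calling the API, here is a small sketch that applies the same field lookups to a hand-written payload. The helper `flatten_items` and the mock venues are hypothetical; the guard for an empty `categories` list is an added assumption, since the function above indexes `categories[0]` directly.

```python
def flatten_items(name, lat, lng, items):
    """Turn a list of Foursquare-style items into flat tuples, one per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Assumption: a venue may come back without a category; store None in that case.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written stand-in for response["response"]["groups"][0]["items"].
mock_items = [
    {"venue": {"name": "Sample Café",
               "location": {"lat": 59.9270, "lng": 10.8370},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Sample Stop",
               "location": {"lat": 59.9265, "lng": 10.8360},
               "categories": []}},
]
rows = flatten_items("Alnabru", 59.926682, 10.836498, mock_items)
for row in rows:
    print(row)
```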

In [365]:
oslo_data_venues = get_close_venues(names=oslo_df['Neighborhood'],
                                   latitudes=oslo_df['Latitude'],
                                   longitudes=oslo_df['Longitude']
                                  )

In [366]:
oslo_data_venues.shape

(949, 7)

In [367]:
oslo_data_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alnabru,2,2,2,2,2,2
Ammerud,6,6,6,6,6,6
Bekkelaget,4,4,4,4,4,4
Berg,7,7,7,7,7,7
Bestum,4,4,4,4,4,4
...,...,...,...,...,...,...
Vækerø,4,4,4,4,4,4
Årvoll,13,13,13,13,13,13
Økern,10,10,10,10,10,10
Øraker,6,6,6,6,6,6


In [412]:
def df_to_oneshot(df_data_venues):
    df_onehot = pd.get_dummies(df_data_venues[['Venue Category']], prefix="", prefix_sep="")

    df_onehot['Neighborhood'] = df_data_venues['Neighborhood'] 

    fixed_columns = [df_onehot.columns[-1]] + list(df_onehot.columns[:-1])
    df_onehot = df_onehot[fixed_columns]

    return df_onehot

In [413]:
oslo_onehot = df_to_oneshot(oslo_data_venues)
oslo_onehot.head()

Unnamed: 0,Neighborhood,Advertising Agency,Amphitheater,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Workshop,Bakery,Bar,...,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Warehouse Store,Water Park,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Alnabru,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Alnabru,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ellingsrud,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Ellingsrud,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ellingsrud,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [414]:
oslo_onehot.shape

(949, 162)

In [427]:
def places_by_frequency(df_oneshot, num_top_venues = 5):
    df_grouped = df_oneshot.groupby('Neighborhood').mean().reset_index()
    
    for hood in df_grouped['Neighborhood']:
        print("----"+hood+"----")
        temp = df_grouped[df_grouped['Neighborhood'] == hood].T.reset_index()
        temp.columns = ['venue','freq']
        temp = temp.iloc[1:]
        temp['freq'] = temp['freq'].astype(float)
        temp = temp.round({'freq': 2})
        print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
        print('\n')
    return df_grouped
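
The `groupby('Neighborhood').mean()` call works because the mean of a 0/1 column within a group is the share of that group's venues falling in the category. The sketch below computes the same shares in plain Python on toy data. `category_frequencies` is a hypothetical helper, and "Exampleby" is a made-up neighbourhood; the two Alnabru categories are the ones shown in the output below.

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Map each neighbourhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for hood, category in pairs:
        counts[hood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }


venues = [
    ("Alnabru", "Paper / Office Supplies Store"),
    ("Alnabru", "Bus Station"),
    ("Exampleby", "Café"),    # made-up neighbourhood for illustration
    ("Exampleby", "Café"),
    ("Exampleby", "Park"),
    ("Exampleby", "Bakery"),
]
freqs = category_frequencies(venues)
print(freqs["Alnabru"])
print(freqs["Exampleby"])
```

Categories that never occur in a neighbourhood are simply absent here, whereas the one-hot table stores them as 0.0.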

In [428]:
oslo_grouped = places_by_frequency(oslo_onehot)

----Alnabru----
                           venue  freq
0  Paper / Office Supplies Store   0.5
1                    Bus Station   0.5
2           Other Great Outdoors   0.0
3                           Park   0.0
4               Pedestrian Plaza   0.0


----Ammerud----
                venue  freq
0       Metro Station  0.33
1         Supermarket  0.17
2        Soccer Field  0.17
3  Athletics & Sports  0.17
4       Grocery Store  0.17


----Bekkelaget----
                venue  freq
0  Light Rail Station  0.25
1         Gas Station  0.25
2              Bakery  0.25
3         Pizza Place  0.25
4  Advertising Agency  0.00


----Berg----
           venue  freq
0    Bus Station  0.29
1  Grocery Store  0.29
2           Café  0.14
3  Shopping Mall  0.14
4  Metro Station  0.14


----Bestum----
                venue  freq
0  Light Rail Station  0.50
1        Burger Joint  0.25
2         Bus Station  0.25
3  Advertising Agency  0.00
4               Plaza  0.00


----Bjølsen----
...

                    venue  freq
0  Furniture / Home Store  0.23
1                   Hotel  0.15
2           Metro Station  0.08
3             Bus Station  0.08
4                    Café  0.08


----Tryvann----
                venue  freq
0      Scenic Lookout   0.2
1  Athletics & Sports   0.2
2              Bakery   0.2
3         Pizza Place   0.2
4            Ski Area   0.2


----Tveita----
                  venue  freq
0  Gym / Fitness Center  0.17
1     Convenience Store  0.17
2           Supermarket  0.17
3             Bookstore  0.17
4      Video Game Store  0.17


----Tåsen----
           venue  freq
0  Grocery Store  0.29
1           Park  0.14
2    Bus Station  0.14
3           Café  0.14
4  Shopping Mall  0.14


----Tøyen----
            venue  freq
0             Bar  0.16
1          Bakery  0.12
2     Coffee Shop  0.08
3            Park  0.08
4  Science Museum  0.04


----Ullern----
                venue  freq
0  Light Rail Station  0.50
1         Flower Shop  0.25
...

In [430]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [431]:
def return_dataframe_most_common_venues(df_grouped, num_top_venues = 6):
    indicators = ['st', 'nd', 'rd']
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except IndexError:
            # Beyond 1st, 2nd and 3rd there is no special suffix, so fall back to "th".
            columns.append('{}th Most Common Venue'.format(ind+1))

    df_venues_sorted = pd.DataFrame(columns=columns)
    df_venues_sorted['Neighborhood'] = df_grouped['Neighborhood']

    for ind in np.arange(df_grouped.shape[0]):
        df_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_grouped.iloc[ind, :], num_top_venues)

    return df_venues_sorted
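
The column names above rely on a three-element suffix list plus a fallback to "th". An alternative is a small ordinal helper, sketched below; `ordinal` is a hypothetical function, not part of the notebook. It also handles 11th, 12th and 13th, which only matters if `num_top_venues` grows past 10.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 7)]
print(columns)
```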

In [432]:
oslo_venues_sorted = return_dataframe_most_common_venues(oslo_grouped)
oslo_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Alnabru,Paper / Office Supplies Store,Bus Station,Yoga Studio,Food Truck,Flower Shop,Fish Market
1,Ammerud,Metro Station,Soccer Field,Supermarket,Athletics & Sports,Grocery Store,Electronics Store
2,Bekkelaget,Pizza Place,Light Rail Station,Bakery,Gas Station,Yoga Studio,Flower Shop
3,Berg,Bus Station,Grocery Store,Shopping Mall,Café,Metro Station,Yoga Studio
4,Bestum,Light Rail Station,Burger Joint,Bus Station,Yoga Studio,Electronics Store,Food
...,...,...,...,...,...,...,...
77,Vækerø,Burger Joint,Bus Station,Park,Beach,Yoga Studio,Farm
78,Årvoll,Supermarket,Convenience Store,Grocery Store,Racetrack,Farm,Bus Station
79,Økern,IT Services,Metro Station,Pizza Place,Park,Bus Stop,Bus Station
80,Øraker,Grocery Store,Hockey Field,Athletics & Sports,Light Rail Station,Soccer Field,Metro Station


## Clustering Oslo

In [395]:
from sklearn.cluster import KMeans

kclusters = 8
oslo_grouped_clustering = oslo_grouped.drop(columns='Neighborhood')  # positional axis argument was removed in pandas 2.0
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(oslo_grouped_clustering)
oslo_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

oslo_merged = oslo_df

oslo_merged = oslo_merged.join(oslo_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

oslo_merged.head(13)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Alna,Alnabru,59.926682,10.836498,0,Paper / Office Supplies Store,Bus Station,Yoga Studio,Food Truck,Flower Shop,Fish Market
1,Alna,Ellingsrud,59.934191,10.920897,1,Warehouse Store,Gym / Fitness Center,Grocery Store,Metro Station,Bed & Breakfast,Yoga Studio
2,Alna,Furuset,59.941067,10.896399,1,Shopping Mall,Hockey Arena,Park,Supermarket,Furniture / Home Store,Metro Station
3,Alna,Haugerud,59.922116,10.854522,1,Women's Store,Soccer Field,Gym,Shopping Mall,Yoga Studio,Donut Shop
4,Alna,Hellerud,59.910067,10.829839,7,Metro Station,Brewery,Moving Target,Furniture / Home Store,Yoga Studio,Food Truck
5,Alna,Lindeberg,59.933155,10.882706,2,Grocery Store,Furniture / Home Store,Motorcycle Shop,Metro Station,Yoga Studio,Electronics Store
6,Alna,Trosterud,59.927182,10.865258,5,Furniture / Home Store,Hotel,Convenience Store,Fast Food Restaurant,Bus Station,Pedestrian Plaza
7,Alna,Tveita,59.914031,10.842241,1,Convenience Store,Video Game Store,Supermarket,Grocery Store,Gym / Fitness Center,Bookstore
8,Bjerke,Linderud,59.940963,10.83842,5,Grocery Store,Wine Shop,Fast Food Restaurant,Gym / Fitness Center,Bakery,Café
9,Bjerke,Tonsenhagen,59.947696,10.827078,5,Trail,Bus Station,Café,Grocery Store,Dumpling Restaurant,Flower Shop
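
For readers unfamiliar with what `KMeans` does to the frequency table, the sketch below runs the basic Lloyd iteration (assign each row to its nearest centroid, then move each centroid to the mean of its rows) in plain Python. It is an illustration only: the toy frequency vectors and starting centroids are made up, and scikit-learn's implementation adds smarter initialisation and other refinements on top of this loop.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [list(c) for c in initial_centroids]  # copy so the inputs stay untouched
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy venue-frequency vectors over (Grocery Store, Bus Station, Café).
points = [
    [0.5, 0.5, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[2]])
print(labels)
print(centroids)
```

Neighbourhoods with similar venue mixes end up with the same label, which is what the `Cluster Labels` column above records.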


In [396]:
num_top_venues = 6  # matches the default of return_dataframe_most_common_venues
for i in range(1, num_top_venues+1):
    if i == 1:
        label = "1st Most Common Venue"
    elif i == 2:
        label = "2nd Most Common Venue"
    elif i == 3:
        label = "3rd Most Common Venue"
    else:
        label = f"{i}th Most Common Venue"
    print(label.upper())
    print(oslo_merged[label].value_counts()[:5])
    print("\n")

1ST MOST COMMON VENUE
Grocery Store        19
Bus Station           5
Convenience Store     4
Metro Station         4
Shopping Mall         3
Name: 1st Most Common Venue, dtype: int64


2ND MOST COMMON VENUE
Grocery Store        7
Bus Station          6
Convenience Store    5
Soccer Field         5
Metro Station        5
Name: 2nd Most Common Venue, dtype: int64


3RD MOST COMMON VENUE
Bus Station      7
Café             5
Park             5
Bakery           4
Grocery Store    3
Name: 3rd Most Common Venue, dtype: int64


4TH MOST COMMON VENUE
Grocery Store    6
Yoga Studio      5
Café             4
Hotel            4
Bus Station      4
Name: 4th Most Common Venue, dtype: int64


5TH MOST COMMON VENUE
Yoga Studio             8
Dumpling Restaurant     6
Flower Shop             5
Metro Station           5
Gym / Fitness Center    4
Name: 5th Most Common Venue, dtype: int64


6TH MOST COMMON VENUE
Fish Market      8
Flower Shop      7
Food             5
Donut Shop       4
Grocery Store    

In [397]:
import matplotlib.cm as cm
import matplotlib.colors as colors

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

map_oslo = folium.Map(location=OSLO_COORDS, zoom_start=10)

for i, row in oslo_merged.iterrows():
    lat = row["Latitude"]
    poi = row["Neighborhood"]
    lon = row["Longitude"]
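    try:
        cluster = int(row["Cluster Labels"])
    except (TypeError, ValueError):
        # Neighbourhoods without venues have no cluster label (NaN) after the join.
        continue
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_oslo)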
map_oslo

In [406]:
# Rows assigned to cluster label 7: the neighbourhood name plus its most common venues.
cluster_7 = oslo_merged.loc[oslo_merged['Cluster Labels'] == 7, oslo_merged.columns[[1] + list(range(5, oslo_merged.shape[1]))]]
cluster_7

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
4,Hellerud,Metro Station,Brewery,Moving Target,Furniture / Home Store,Yoga Studio,Food Truck
38,Øvre Blindern,Breakfast Spot,Bookstore,College Cafeteria,Plaza,Gym,Bar
77,Holmenkollen,Restaurant,Metro Station,Scandinavian Restaurant,Ski Area,Hotel,Health & Beauty Service


## Clustering Rome

In [409]:
italy_df.columns = ["Borough", "Neighborhood", "Latitude", "Longitude"]
rome_data_venues = get_close_venues(names=italy_df['Neighborhood'],
                                   latitudes=italy_df['Latitude'],
                                   longitudes=italy_df['Longitude']
                                  )

In [410]:
rome_data_venues.shape

(1348, 7)

In [411]:
rome_data_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Acilia Nord,5,5,5,5,5,5
Acilia Sud,12,12,12,12,12,12
Acqua Vergine,80,80,80,80,80,80
Alessandrina,11,11,11,11,11,11
Appio-Claudio,24,24,24,24,24,24
...,...,...,...,...,...,...
Villa Ada,4,4,4,4,4,4
Villa Borghese,16,16,16,16,16,16
Villa Pamphili,5,5,5,5,5,5
Villaggio Olimpico,13,13,13,13,13,13


In [433]:
rome_onehot = df_to_oneshot(rome_data_venues)
rome_onehot.head()

Unnamed: 0,Neighborhood,Abruzzo Restaurant,Accessories Store,Airport Service,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Volleyball Court,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio,Zoo
0,Trastevere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Trastevere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Trastevere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Trastevere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Trastevere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [435]:
rome_onehot.shape

(1348, 191)

In [436]:
rome_grouped = places_by_frequency(rome_onehot)

----Acilia Nord----
                venue  freq
0        Home Service   0.2
1    Sushi Restaurant   0.2
2                Café   0.2
3  Chinese Restaurant   0.2
4              Bakery   0.2


----Acilia Sud----
            venue  freq
0           Plaza  0.17
1            Café  0.17
2  Ice Cream Shop  0.08
3     Bus Station  0.08
4  Clothing Store  0.08


----Acqua Vergine----
                venue  freq
0  Italian Restaurant  0.21
1      Ice Cream Shop  0.09
2               Plaza  0.08
3               Hotel  0.06
4              Bistro  0.05


----Alessandrina----
                 venue  freq
0                Plaza  0.18
1               Bakery  0.18
2    Indian Restaurant  0.09
3  Fried Chicken Joint  0.09
4    Trattoria/Osteria  0.09


----Appio-Claudio----
            venue  freq
0     Pizza Place  0.17
1            Café  0.17
2     Supermarket  0.08
3             Pub  0.08
4  Clothing Store  0.08


----Aventino----
                venue  freq
0  Italian Restaurant  0.17
...

                venue  freq
0        Dessert Shop  0.12
1  Italian Restaurant  0.12
2                Café  0.12
3         Supermarket  0.12
4    Sushi Restaurant  0.12


----Pietralata----
           venue  freq
0  Shopping Mall  0.17
1    Pizza Place  0.17
2    Supermarket  0.17
3            Gym  0.17
4  Metro Station  0.17


----Ponte Galeria----
                venue  freq
0   Trattoria/Osteria   0.2
1     Bed & Breakfast   0.2
2  Italian Restaurant   0.2
3           Racetrack   0.2
4       Train Station   0.2


----Portuense----
                venue  freq
0         Pizza Place  0.14
1  Italian Restaurant  0.11
2                Café  0.11
3  Seafood Restaurant  0.07
4  Light Rail Station  0.04


----Prima Porta----
                    venue  freq
0      Light Rail Station  0.25
1  Furniture / Home Store  0.25
2       Electronics Store  0.25
3    Gym / Fitness Center  0.25
4      Abruzzo Restaurant  0.00


----Primavalle----
                    venue  freq
0  Argentinian Restaurant 

In [437]:
rome_venues_sorted = return_dataframe_most_common_venues(rome_grouped)
rome_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Acilia Nord,Sushi Restaurant,Café,Home Service,Chinese Restaurant,Bakery,Zoo
1,Acilia Sud,Plaza,Café,Clothing Store,Ice Cream Shop,Steakhouse,Bus Station
2,Acqua Vergine,Italian Restaurant,Ice Cream Shop,Plaza,Hotel,Art Museum,Café
3,Alessandrina,Bakery,Plaza,Light Rail Station,Fried Chicken Joint,Fast Food Restaurant,Trattoria/Osteria
4,Appio-Claudio,Café,Pizza Place,Pub,Ice Cream Shop,Supermarket,Clothing Store
...,...,...,...,...,...,...,...
93,Villa Ada,Music Venue,Park,Lake,Beer Garden,Zoo,Filipino Restaurant
94,Villa Borghese,Movie Theater,Zoo,Historic Site,Dog Run,Plaza,Snack Place
95,Villa Pamphili,Paper / Office Supplies Store,Bookstore,Pool Hall,Hotel,Gym Pool,Zoo
96,Villaggio Olimpico,Concert Hall,Music Venue,Nightclub,Basketball Stadium,Sandwich Place,Auditorium


In [438]:
# italy_df.to_csv("csv/rome.csv")
# oslo_df.to_csv("csv/oslo.csv")