# **The Battle of Neighbourhoods - Eastern European Version**


## Introduction

Budapest, Hungary is one of the biggest start-up hubs of the CEE region, and numerous multinational companies have offices and factories here as well. As a result, the number of expats in the city is growing, and it is expected to grow further if the economy does well. As a Budapest local, I have been witnessing the globalization of my home city for quite a few years now, and this project intends to offer some insight to those who are considering moving here.

## Business Problem

When a person moves to a new place, there are a number of factors to consider before they can comfortably decide where to settle. My project will explore 22 of Budapest's 23 districts from two main perspectives: rental prices and things to do.

The overall purpose is to provide an outline of the city, so that it is easier to weigh the advantages and disadvantages of each area based on the cost of renting and the opportunities for leisure activities. It is probably most useful for those who already know where they will be working, as that location would come in as an extra factor. Nevertheless, I hope my analysis will provide some valuable guidance for anyone interested.

In a nutshell, the goal of this project is to check what the average rent prices are per district in Budapest, Hungary, and then to visualize the leisure opportunities within these areas with the help of the Foursquare API.

## Data

In order to conduct my analysis I will use three main sources of data: Wikipedia, [a Hungarian website with real estate statistics](https://koltozzbe.hu/statisztikak/budapest/?listing_type=2&property_type=1001&rooms=&metric=price), and the Foursquare API.

To get the average monthly rental price per district, I am going to scrape the statistics table from a real estate agency's website. The first thing I need to do is to read the table into a dataframe and remove the parts that I will not need. Budapest has 23 districts; however, given that there was no data available for district 23, I will not include this area in my analysis. In order to make the mapping part easier, I will also scrape Wikipedia for the relevant postal codes and add them to the dataframe.

The second part of the data is geographic: I need the coordinates of each district, and I will use the Foursquare API to explore each neighbourhood.

But let's dive into it and see. 

## Methodology

## Step one
In order to start assembling my database, I will first visit a real estate website and see if there are any useful tables I could scrape with Beautiful Soup.

In [1]:
#First things first, we will need a few libraries here. 
import requests
from bs4 import BeautifulSoup

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json
from pandas import json_normalize # top-level import; the pandas.io.json path is deprecated since pandas 1.0

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import numpy as np # library to handle data in a vectorized manner

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
#Let's assign the url to a variable and create a soup object then have a look. 
url = requests.get("https://koltozzbe.hu/statisztikak/budapest/?listing_type=2&property_type=1001&rooms=&metric=price").text
soup = BeautifulSoup(url,"lxml")
my_table = soup.find("tbody")

my_table

<tbody>
<tr class="root">
<th>Budapesti átlag</th>
<td class="ar">3 478 Ft</td>
<td class="ar">232 ezer Ft / hó</td>
<td class="ar">69,25 m²</td>
<td class="ar">2 429</td>
</tr>
<tr>
<th><a href="/statisztikak/budapest-i-kerulet/?listing_type=2&amp;property_type=1001&amp;rooms=&amp;metric=price">I. kerület</a></th>
<td class="ar">3 948 Ft</td>
<td class="ar">321 ezer Ft / hó</td>
<td class="ar">81,75 m²</td>
<td class="ar">118</td>
</tr>
<tr>
<th><a href="/statisztikak/budapest-ii-kerulet/?listing_type=2&amp;property_type=1001&amp;rooms=&amp;metric=price">II. kerület</a></th>
<td class="ar">3 624 Ft</td>
<td class="ar">353 ezer Ft / hó</td>
<td class="ar">93,25 m²</td>
<td class="ar">268</td>
</tr>
<tr>
<th><a href="/statisztikak/budapest-iii-kerulet/?listing_type=2&amp;property_type=1001&amp;rooms=&amp;metric=price">III. kerület</a></th>
<td class="ar">2 788 Ft</td>
<td class="ar">166 ezer Ft / hó</td>
<td class="ar">55,50 m²</td>
<td class="ar">73</td>
</tr>
<tr>
<th><a href="/statis

In [3]:
#the names of the districts are contained within the th tags, let's fetch these ones first. 
district_name = []
for district in my_table.find_all("th"):
    district_name.append(district.text) 

In [4]:
#The districts are represented by Roman numerals, so unfortunately they are put in alphabetical order by default. Let's fix this.
myorder = [0,1,2,3,4,6,7,8,9,5,10,11,12,13,14,16,17,18,19,15,20,21,22,23]
district_name = [district_name[i] for i in myorder]
print(district_name)

['Budapesti átlag', 'I. kerület', 'II. kerület', 'III. kerület', 'IV. kerület', 'V. kerület', 'VI. kerület', 'VII. kerület', 'VIII. kerület', 'IX. kerület', 'X. kerület', 'XI. kerület', 'XII. kerület', 'XIII. kerület', 'XIV. kerület', 'XV. kerület', 'XVI. kerület', 'XVII. kerület', 'XVIII. kerület', 'XIX. kerület', 'XX. kerület', 'XXI. kerület', 'XXII. kerület', 'XXIII. kerület']
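
The hard-coded `myorder` list works for this particular table, but it reorders the names only. The cell values scraped in the following cells arrive in the table's original (alphabetical) row order, so names and values can end up misaligned unless the same reordering is applied to both. A minimal sketch of an alternative, assuming each row is first collected as a `(name, values)` pair, is to sort the pairs by the numeric value of the Roman numeral. The helpers `roman_to_int` and `district_sort_key` and the three sample rows are illustrative and not part of the original notebook.

```python
# Hypothetical helper: convert a Roman numeral such as "XIV" to an integer.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}

def roman_to_int(numeral):
    total = 0
    # Pair every symbol with the one that follows it (a space after the last one).
    for current, following in zip(numeral, numeral[1:] + " "):
        value = ROMAN_VALUES[current]
        # A smaller symbol before a larger one is subtracted (IV = 4, IX = 9).
        if ROMAN_VALUES.get(following, 0) > value:
            total -= value
        else:
            total += value
    return total

def district_sort_key(row):
    name = row[0]
    # "IX. kerület" -> "IX"; rows without a numeral (the city average) sort first.
    numeral = name.split(".")[0]
    if all(ch in ROMAN_VALUES for ch in numeral):
        return roman_to_int(numeral)
    return 0

# Illustrative rows in alphabetical order; values are paired by their
# position in the scraped list.
rows = [
    ("IV. kerület", ["2 663 Ft", "100 ezer Ft / hó"]),
    ("IX. kerület", ["3 336 Ft", "198 ezer Ft / hó"]),
    ("V. kerület", ["4 477 Ft", "322 ezer Ft / hó"]),
]
rows_sorted = sorted(rows, key=district_sort_key)
print([name for name, _ in rows_sorted])
# ['IV. kerület', 'V. kerület', 'IX. kerület']
```

Because each name travels together with its values, the sort cannot separate a district from its figures.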


In [5]:
#In order to extract the necessary information, I need to create empty list objects that can be appended to.
rows_content = []
items_pc = []

In [6]:
#let's find the rows, and within them the td tags, this way we should have all the data scraped from the table as one big list item
for rows in my_table.find_all("tr"):
    rows_content.append(rows.text)
    for items in rows.find_all("td"):
        items_pc.append(items.text)

In [7]:
#let's check what it looks like
print(items_pc)

['3 478 Ft', '232 ezer Ft / hó', '69,25 m²', '2 429', '3 948 Ft', '321 ezer Ft / hó', '81,75 m²', '118', '3 624 Ft', '353 ezer Ft / hó', '93,25 m²', '268', '2 788 Ft', '166 ezer Ft / hó', '55,50 m²', '73', '2 663 Ft', '100 ezer Ft / hó', '36,50 m²', '33', '3 336 Ft', '198 ezer Ft / hó', '63,25 m²', '122', '4 477 Ft', '322 ezer Ft / hó', '79,50 m²', '494', '3 953 Ft', '267 ezer Ft / hó', '75,25 m²', '332', '3 357 Ft', '198 ezer Ft / hó', '62,25 m²', '161', '3 232 Ft', '160 ezer Ft / hó', '52 m²', '137', '2 429 Ft', '88 ezer Ft / hó', '35,50 m²', '36', '3 078 Ft', '200 ezer Ft / hó', '62 m²', '95', '3 568 Ft', '314 ezer Ft / hó', '86,50 m²', '101', '3 331 Ft', '186 ezer Ft / hó', '58 m²', '206', '3 106 Ft', '153 ezer Ft / hó', '52,50 m²', '96', '1 947 Ft', '90 ezer Ft / hó', '52,50 m²', '30', '2 250 Ft', '90 ezer Ft / hó', '36 m²', '56', '1 923 Ft', '70 ezer Ft / hó', '43 m²', '24', '1 957 Ft', '95 ezer Ft / hó', '52 m²', '23', '2 000 Ft', '88 ezer Ft / hó', '39 m²', '40', '2 143 Ft', '7

*In the current format, my list would make one row with way too many columns, so let's create a function to slice the list into smaller chunks. Credit goes to [Chris Albon](https://chrisalbon.com/python/data_wrangling/break_list_into_chunks_of_equal_size/) for the sample code.*

In [8]:
# Create a function called "chunks" with two arguments, l and n:
def chunks(l, n):
    # For item i in a range that is a length of l,
    for i in range(0, len(l), n):
        # Create an index range for l of n items:
        yield l[i:i+n]

In [9]:
#let's use the function and slice the list into chunks of 4
chunk = list(chunks(items_pc, 4))
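
As a quick illustration of what `chunks` produces, the sketch below runs a copy of the function on the first eight scraped values. The length check is not part of the original code; it is an assumption about what is worth guarding against, since a missing cell would leave the final chunk shorter than four items and shift the columns.

```python
def chunks(l, n):
    # Yield successive slices of l, each holding n items (the last may be shorter).
    for i in range(0, len(l), n):
        yield l[i:i + n]

sample = ['3 478 Ft', '232 ezer Ft / hó', '69,25 m²', '2 429',
          '3 948 Ft', '321 ezer Ft / hó', '81,75 m²', '118']

# Guard: every table row should contribute exactly four cells.
assert len(sample) % 4 == 0, "unexpected number of cells"

rows = list(chunks(sample, 4))
print(rows)
# [['3 478 Ft', '232 ezer Ft / hó', '69,25 m²', '2 429'],
#  ['3 948 Ft', '321 ezer Ft / hó', '81,75 m²', '118']]
```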

In [10]:
#and assign the result to a dataframe
database = pd.DataFrame(chunk)

In [11]:
#Name the columns and have a look at the current state of the df
database.columns = ['Price per m2 in HUF', 'Monthly rent in HUF', 'Size in m2', 'Listings']
database

Unnamed: 0,Price per m2 in HUF,Monthly rent in HUF,Size in m2,Listings
0,3 478 Ft,232 ezer Ft / hó,"69,25 m²",2 429
1,3 948 Ft,321 ezer Ft / hó,"81,75 m²",118
2,3 624 Ft,353 ezer Ft / hó,"93,25 m²",268
3,2 788 Ft,166 ezer Ft / hó,"55,50 m²",73
4,2 663 Ft,100 ezer Ft / hó,"36,50 m²",33
5,3 336 Ft,198 ezer Ft / hó,"63,25 m²",122
6,4 477 Ft,322 ezer Ft / hó,"79,50 m²",494
7,3 953 Ft,267 ezer Ft / hó,"75,25 m²",332
8,3 357 Ft,198 ezer Ft / hó,"62,25 m²",161
9,3 232 Ft,160 ezer Ft / hó,52 m²,137


## Step two - cleaning the table
The data in the table is not ready for analysis yet: the cells mix numbers with units and Hungarian words, and some columns are unnecessary. I will drop the last column, as I am not interested in the number of available listings, and I will also remove the Ft tag from the first column so that only the number remains.

Then, I will replace the text in the monthly rent column ("ezer" is Hungarian for "thousand") with zeros so that the price reads as a full number. I will also remove the m² marker from the end of the size data, as the unit is already indicated in the column name.

Last but not least, the district names will be added to the dataframe at index 0 so that they form the first visible column. To make it clearer, the Hungarian word for district will be removed, leaving the Roman numerals as identifiers.

In [12]:
#Dropping last column
rent_data = database.drop("Listings", axis=1)

#Replacing string with zeros to indicate price
better = rent_data.replace('ezer Ft / hó', '000', regex=True)

#Removing currency from price column
better['Price per m2 in HUF'] = better['Price per m2 in HUF'].map(lambda x: str(x)[:-2])

#Removing m2 from size column
better['Size in m2'] = better['Size in m2'].map(lambda x: str(x)[:-2])
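
After these steps the columns still hold strings (for example `'3 478'`), so they cannot be averaged or compared numerically yet. A minimal sketch of one way to turn the scraped cell formats into numbers follows. The function names are hypothetical, and the handling of non-breaking spaces is an assumption about how the site may format thousands.

```python
import re

def parse_huf(text):
    # '3 478 Ft' -> 3478: keep the ASCII digits only, dropping spaces and the currency tag.
    return int(re.sub(r"[^0-9]", "", text))

def parse_monthly_rent(text):
    # '232 ezer Ft / hó' -> 232000: 'ezer' is Hungarian for 'thousand'.
    thousands = int(re.sub(r"[^0-9]", "", text.split("ezer")[0]))
    return thousands * 1000

def parse_size(text):
    # '69,25 m²' -> 69.25: strip the unit and use a decimal point instead of a comma.
    number = text.replace("m²", "").replace("\xa0", "").strip()
    return float(number.replace(",", "."))

print(parse_huf("3 478 Ft"), parse_monthly_rent("232 ezer Ft / hó"), parse_size("69,25 m²"))
# 3478 232000 69.25
```

Functions like these could be applied to the raw columns with `Series.map`, for example `database['Price per m2 in HUF'].map(parse_huf)`.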

In [13]:
#I am adding the new column on the first index
better.insert(loc=0, column='District number', value=district_name)
better.head()

Unnamed: 0,District number,Price per m2 in HUF,Monthly rent in HUF,Size in m2
0,Budapesti átlag,3 478,232 000,6925
1,I. kerület,3 948,321 000,8175
2,II. kerület,3 624,353 000,9325
3,III. kerület,2 788,166 000,5550
4,IV. kerület,2 663,100 000,3650


In [14]:
#removing the Hungarian word for 'district' 
getting_there = better.replace(' kerület', '', regex=True)

#The first row is an average of the whole city, so we will not need that one, I am dropping it. 
getting_there = getting_there.drop(getting_there.index[0])

#Since we have no data available from district 23, I am dropping the row 
getting_there = getting_there.drop(getting_there.index[22])

## Step three - Postcodes
To be more precise when it comes to fetching coordinates, I looked up the postal codes of the districts we are analyzing. I will scrape the data from a [Wikipedia page](https://en.wikipedia.org/wiki/List_of_postal_codes_in_Hungary), and add the result to the DataFrame after a bit of cleaning.

In [15]:
web_url = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_in_Hungary").text
soup2 = BeautifulSoup(web_url,"lxml")

In [16]:
#Extracting the relevant 'li' tags
code_table = soup2.find_all("li")[12:35]
code_table

[<li>1010 - <a href="/wiki/V%C3%A1rker%C3%BClet" title="Várkerület">Várkerület</a> (I.)</li>,
 <li>1020 - <a href="/wiki/2nd_district_of_Budapest" title="2nd district of Budapest">2nd district</a> (II.)</li>,
 <li>1030 - <a href="/wiki/%C3%93buda-B%C3%A9k%C3%A1smegyer" title="Óbuda-Békásmegyer">Óbuda-Békásmegyer</a> (III.)</li>,
 <li>1040 - <a href="/wiki/%C3%9Ajpest" title="Újpest">Újpest</a> (IV.)</li>,
 <li>1050 - <a href="/wiki/Belv%C3%A1ros-Lip%C3%B3tv%C3%A1ros" title="Belváros-Lipótváros">Belváros-Lipótváros</a> (V.)</li>,
 <li>1060 - <a href="/wiki/Ter%C3%A9zv%C3%A1ros" title="Terézváros">Terézváros</a> (VI.)</li>,
 <li>1070 - <a href="/wiki/Erzs%C3%A9betv%C3%A1ros" title="Erzsébetváros">Erzsébetváros</a> (VII.)</li>,
 <li>1080 - <a href="/wiki/J%C3%B3zsefv%C3%A1ros" title="Józsefváros">Józsefváros</a> (VIII.)</li>,
 <li>1090 - <a href="/wiki/Ferencv%C3%A1ros" title="Ferencváros">Ferencváros</a> (IX.)</li>,
 <li>1100 - <a href="/wiki/K%C5%91b%C3%A1nya" title="Kőbánya">Kőbánya</a

In [17]:
#Creating a list object for postal codes and looping through to get the data
postal_codes = []
for row in code_table:
    postal_codes.append(row.text)

In [18]:
#Since district 23 is not examined, I am removing it 
codes = postal_codes[:22]

In [19]:
#Let's insert the postal codes into the dataframe on index 1 - so it comes right after the district number
getting_there.insert(loc=1, column='Postal code', value=codes)

In [20]:
getting_there.head(7)

Unnamed: 0,District number,Postal code,Price per m2 in HUF,Monthly rent in HUF,Size in m2
1,I.,1010 - Várkerület (I.),3 948,321 000,8175
2,II.,1020 - 2nd district (II.),3 624,353 000,9325
3,III.,1030 - Óbuda-Békásmegyer (III.),2 788,166 000,5550
4,IV.,1040 - Újpest (IV.),2 663,100 000,3650
5,V.,1050 - Belváros-Lipótváros (V.),3 336,198 000,6325
6,VI.,1060 - Terézváros (VI.),4 477,322 000,7950
7,VII.,1070 - Erzsébetváros (VII.),3 953,267 000,7525


In [21]:
#Let's take the pandas series from the postal code column and keep the first 4 digits only. Then I will insert it back into the dataframe.
new = getting_there['Postal code'].str.slice(stop=4)
getting_there['Postal code']=new
getting_there.head(7)

Unnamed: 0,District number,Postal code,Price per m2 in HUF,Monthly rent in HUF,Size in m2
1,I.,1010,3 948,321 000,8175
2,II.,1020,3 624,353 000,9325
3,III.,1030,2 788,166 000,5550
4,IV.,1040,2 663,100 000,3650
5,V.,1050,3 336,198 000,6325
6,VI.,1060,4 477,322 000,7950
7,VII.,1070,3 953,267 000,7525
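
Slicing the first four characters works because every entry starts with the postal code. As a hedged alternative, a regular expression can pull out both the code and the Roman numeral in parentheses, which would allow the postal codes to be matched to districts by numeral instead of by list position. The pattern and the function name below are assumptions based on the entries printed earlier in this step.

```python
import re

# Matches entries such as '1010 - Várkerület (I.)'
ENTRY_PATTERN = re.compile(r"^(\d{4})\s*-\s*.*\(([IVXLC]+)\.\)\s*$")

def parse_postal_entry(text):
    match = ENTRY_PATTERN.match(text)
    if match is None:
        return None  # entry does not follow the expected layout
    code, numeral = match.groups()
    return code, numeral + "."

entries = ["1010 - Várkerület (I.)", "1020 - 2nd district (II.)", "1060 - Terézváros (VI.)"]

code_by_district = {}
for entry in entries:
    parsed = parse_postal_entry(entry)
    if parsed is not None:
        code, district = parsed
        code_by_district[district] = code

print(code_by_district)
# {'I.': '1010', 'II.': '1020', 'VI.': '1060'}
```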


In [22]:
#Just a final move, as I don't think I will actually need the size data, I am dropping that column and creating the finalized Budapest df below. 
budapest_df = getting_there.drop('Size in m2', axis=1)

In [23]:
budapest_df.head()

Unnamed: 0,District number,Postal code,Price per m2 in HUF,Monthly rent in HUF
1,I.,1010,3 948,321 000
2,II.,1020,3 624,353 000
3,III.,1030,2 788,166 000
4,IV.,1040,2 663,100 000
5,V.,1050,3 336,198 000


## Step four - Latitude and Longitude
I could not find a table to scrape with all the latitude and longitude information I needed, so I used Google Maps to look up the districts one by one. I will now add the coordinates to the dataframe manually.

In [24]:
latitude = ['47.4968','47.5393','47.5672','47.5649','47.5002','47.5099','47.5027','47.4894','47.4649', '47.4821', '47.4593','47.4991','47.5355','47.5225', '47.5589', '47.5183', '47.4803', '47.4281', '47.4457', '47.4333', '47.4244', '47.4250']

In [25]:
longitude = ['19.0375','18.9869','19.0369','19.0913','19.0520','19.0626','19.0734','19.0707','19.0916','19.1575','19.0187','18.9905','19.0709','19.1147','19.1193','19.1919','19.2667','19.2098','19.1430','19.1193','19.0661','19.0317']

In [26]:
#index needs to be reset so the two new columns match the length of the dataframe
budapest_df = budapest_df.reset_index(drop=True)

#The two lists need to be added as new columns
budapest_df['Latitude'] = latitude
budapest_df['Longitude'] = longitude
budapest_df.head()

Unnamed: 0,District number,Postal code,Price per m2 in HUF,Monthly rent in HUF,Latitude,Longitude
0,I.,1010,3 948,321 000,47.4968,19.0375
1,II.,1020,3 624,353 000,47.5393,18.9869
2,III.,1030,2 788,166 000,47.5672,19.0369
3,IV.,1040,2 663,100 000,47.5649,19.0913
4,V.,1050,3 336,198 000,47.5002,19.052


In [27]:
#to avoid any difficulties let's convert the latitude and longitude values into numeric values
budapest_df[["Latitude", "Longitude"]] = budapest_df[["Latitude", "Longitude"]].apply(pd.to_numeric)

In [28]:
address = 'Budapest, Hungary'

geolocator = Nominatim(user_agent="bp_explorer")
location = geolocator.geocode(address)
latitude2 = location.latitude
longitude2 = location.longitude
print('The geographical coordinates of Budapest are {}, {}.'.format(latitude2, longitude2))

The geographical coordinates of Budapest are 47.4983815, 19.0404707.
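
Because the district coordinates were typed in by hand, a quick sanity check can catch a swapped or mistyped value. The sketch below is a hypothetical helper, not part of the original workflow: it uses the haversine formula to compute the great-circle distance between a district point and the city centre returned by Nominatim. The 25 km threshold is an arbitrary assumption for what still counts as inside Budapest.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points given in decimal degrees.
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

centre = (47.4983815, 19.0404707)  # Budapest, as returned by the geocoder

# The first three and the easternmost of the hand-entered district coordinates.
districts = {"I.": (47.4968, 19.0375), "II.": (47.5393, 18.9869),
             "III.": (47.5672, 19.0369), "XVII.": (47.4803, 19.2667)}

MAX_DISTANCE_KM = 25  # assumed threshold
for name, (lat, lon) in districts.items():
    distance = haversine_km(lat, lon, *centre)
    flag = "" if distance <= MAX_DISTANCE_KM else "  <-- check this value"
    print(f"{name:6s} {distance:5.1f} km{flag}")
```

A latitude and longitude entered in the wrong order would land thousands of kilometres away and be flagged immediately.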


## Step five - Utilizing the Foursquare API
I have all the preliminary data I need, so I can get going with the Foursquare analysis. I will use what we learned from the Toronto example and cluster Budapest's districts into 6 groups. I will also create a map that shows cluster labels, postal codes and average monthly rent prices for each district.

In [29]:
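# @hidden_cell
import os

# Read the credentials from environment variables instead of storing them in the notebook
# (the variable names below are an assumption; use whatever names you export)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # my Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # my Foursquare Secret
VERSION = '20180605' # Foursquare API version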

print('Your credentials:')
print('CLIENT_ID: ' + 'Secret')
print('CLIENT_SECRET:' + 'Super Secret')

Your credentials:
CLIENT_ID: Secret
CLIENT_SECRET:Super Secret


In [30]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [31]:
# function that gets the venues nearby
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url3 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url3).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal code', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
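
`getNearbyVenues` assumes that every item in the response has at least one category; a venue with an empty `categories` list would raise an `IndexError`. The sketch below shows one defensive way to flatten the items. The payload shape mirrors what the function above indexes into, while the sample items, the helper names `build_explore_url` and `extract_venue_rows`, and the `'Uncategorized'` fallback are assumptions for illustration. It also builds the query string from a dictionary with `urllib.parse.urlencode`, which avoids formatting a long URL by hand.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    # Same endpoint and parameters as in getNearbyVenues, assembled from a dict.
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

def extract_venue_rows(name, lat, lng, items):
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder instead of failing on venues without a category.
        category = categories[0]["name"] if categories else "Uncategorized"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Made-up items shaped like the 'items' list that getNearbyVenues reads.
sample_items = [
    {"venue": {"name": "Example Café", "location": {"lat": 47.497, "lng": 19.038},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unnamed Spot", "location": {"lat": 47.496, "lng": 19.037},
               "categories": []}},
]

rows = extract_venue_rows("1010", 47.4968, 19.0375, sample_items)
print(rows)
print(build_explore_url("ID", "SECRET", "20180605", 47.4968, 19.0375, 500, 100))
```

Passing `limit` as an argument, rather than reading a global `LIMIT`, also makes the function independent of the order in which notebook cells are run.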

In [32]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [33]:
budapest_venues = getNearbyVenues(names=budapest_df['Postal code'],
                                   latitudes=budapest_df['Latitude'],
                                   longitudes=budapest_df['Longitude']
                                  )

1010
1020
1030
1040
1050
1060
1070
1080
1090
1100
1110
1120
1130
1140
1150
1160
1170
1180
1190
1200
1210
1220


In [34]:
budapest_venues.groupby('Postal code').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postal code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1010,58,58,58,58,58,58
1020,4,4,4,4,4,4
1030,4,4,4,4,4,4
1040,26,26,26,26,26,26
1050,100,100,100,100,100,100
1060,98,98,98,98,98,98
1070,32,32,32,32,32,32
1080,62,62,62,62,62,62
1090,4,4,4,4,4,4
1100,13,13,13,13,13,13


In [35]:
print('There are {} unique categories.'.format(len(budapest_venues['Venue Category'].unique())))

There are 159 unique categories.


In [36]:
# one hot encoding
budapest_onehot = pd.get_dummies(budapest_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
budapest_onehot['Postal code'] = budapest_venues['Postal code'] 

# move neighborhood column to the first column
fixed_columns = [budapest_onehot.columns[-1]] + list(budapest_onehot.columns[:-1])
budapest_onehot = budapest_onehot[fixed_columns]

budapest_onehot.head()

Unnamed: 0,Postal code,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Beer Bar,Beer Garden,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Butcher,Café,Camera Store,Castle,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Health & Beauty Service,Health Food Store,Heliport,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Lebanese Restaurant,Light Rail Station,Lottery Retailer,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Miscellaneous Shop,Modern European Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Music Store,Music Venue,Office,Optical Shop,Paintball Field,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Print Shop,Pub,Radio Station,Record Shop,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shoe Repair,Shopping Mall,Skate Park,Skating Rink,Snack Place,Soccer Stadium,Soup Place,Sporting Goods Shop,Stadium,Stationery Store,Steakhouse,Student Center,Supermarket,Sushi Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Track,Train Station,Tram Station,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,1010,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1010,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1010,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1010,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,1010,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [37]:
budapest_grouped = budapest_onehot.groupby('Postal code').mean().reset_index()
budapest_grouped

Unnamed: 0,Postal code,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Beer Bar,Beer Garden,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Butcher,Café,Camera Store,Castle,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Dog Run,Donut Shop,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gymnastics Gym,Health & Beauty Service,Health Food Store,Heliport,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Lebanese Restaurant,Light Rail Station,Lottery Retailer,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Miscellaneous Shop,Modern European Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Music Store,Music Venue,Office,Optical Shop,Paintball Field,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Print Shop,Pub,Radio Station,Record Shop,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Shoe Repair,Shopping Mall,Skate Park,Skating Rink,Snack Place,Soccer Stadium,Soup Place,Sporting Goods Shop,Stadium,Stationery Store,Steakhouse,Student Center,Supermarket,Sushi Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Track,Train Station,Tram Station,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,1010,0.0,0.017241,0.0,0.0,0.017241,0.0,0.0,0.017241,0.017241,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.086207,0.0,0.017241,0.0,0.017241,0.0,0.0,0.0,0.051724,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.017241,0.0,0.034483,0.0,0.068966,0.0,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.051724,0.0,0.0,0.034483,0.0,0.0,0.0,0.051724,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.0,0.0,0.0,0.017241,0.017241,0.0,0.0
1,1020,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1030,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1040,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.038462,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1050,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.0,0.01,0.0,0.04,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.08,0.01,0.07,0.03,0.01,0.0,0.0,0.08,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.07,0.02,0.0,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0
5,1060,0.0,0.0,0.0,0.0,0.010204,0.0,0.0,0.010204,0.010204,0.0,0.0,0.010204,0.0,0.0,0.010204,0.010204,0.0,0.010204,0.010204,0.0,0.0,0.010204,0.010204,0.0,0.0,0.010204,0.0,0.0,0.040816,0.0,0.05102,0.0,0.0,0.0,0.040816,0.0,0.020408,0.0,0.020408,0.010204,0.0,0.010204,0.0,0.020408,0.010204,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.020408,0.0,0.0,0.0,0.010204,0.020408,0.020408,0.0,0.010204,0.010204,0.030612,0.010204,0.0,0.030612,0.010204,0.010204,0.0,0.0,0.0,0.0,0.010204,0.0,0.0,0.010204,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,0.010204,0.0,0.0,0.020408,0.010204,0.05102,0.0,0.010204,0.0,0.010204,0.010204,0.010204,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.010204,0.0,0.010204,0.0,0.010204,0.010204,0.0,0.030612,0.020408,0.010204,0.0,0.010204,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.0,0.0,0.030612,0.010204
6,1070,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.09375,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.09375,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0625,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,1080,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.032258,0.016129,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.032258,0.016129,0.0,0.016129,0.0,0.0,0.032258,0.0,0.129032,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.016129,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0,0.048387,0.0,0.0,0.016129,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.064516,0.0,0.016129,0.016129,0.032258,0.0,0.0,0.048387,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.016129,0.016129,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.032258,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,1090,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,1100,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [38]:
#Let's print the top 5 venues for each postal code and show their frequency
num_top_venues = 5

for hood in budapest_grouped['Postal code']:
    print("----"+hood+"----")
    temp = budapest_grouped[budapest_grouped['Postal code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1010----
                  venue  freq
0                  Café  0.09
1  Hungarian Restaurant  0.07
2                 Plaza  0.07
3         Historic Site  0.07
4        Scenic Lookout  0.05


----1020----
         venue  freq
0         Park  0.50
1        Track  0.25
2   Playground  0.25
3  Art Gallery  0.00
4       Office  0.00


----1030----
            venue  freq
0         Dog Run  0.25
1        Heliport  0.25
2  Baseball Field  0.25
3   Train Station  0.25
4             Pub  0.00


----1040----
                  venue  freq
0              Pharmacy  0.08
1                  Park  0.08
2          Tram Station  0.08
3         Grocery Store  0.08
4  Hungarian Restaurant  0.04


----1050----
                  venue  freq
0                 Hotel  0.08
1    Italian Restaurant  0.08
2  Hungarian Restaurant  0.07
3            Restaurant  0.07
4           Coffee Shop  0.05


----1060----
             venue  freq
0      Coffee Shop  0.05
1              Pub  0.05
2   Cosmetics Shop  0.04
3 

In [39]:
# For simplicity, let's create a function that lists the most common venues in a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [40]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postal code'] = budapest_grouped['Postal code']

for ind in np.arange(budapest_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(budapest_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Postal code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1010,Café,Hungarian Restaurant,Plaza,Historic Site,Coffee Shop,Scenic Lookout,Pub,Bistro,Hotel,Park
1,1020,Park,Track,Playground,Food,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Fountain,Food Truck
2,1030,Dog Run,Heliport,Train Station,Baseball Field,Electronics Store,Eastern European Restaurant,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint
3,1040,Grocery Store,Pharmacy,Tram Station,Park,Hungarian Restaurant,Fast Food Restaurant,Stationery Store,Convenience Store,Performing Arts Venue,Farmers Market
4,1050,Italian Restaurant,Hotel,Hungarian Restaurant,Restaurant,Coffee Shop,Bistro,Wine Bar,Dessert Shop,Plaza,Sandwich Place


In [41]:
# set number of clusters
kclusters = 6

budapest_grouped_clustering = budapest_grouped.drop(columns='Postal code')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(budapest_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 5, 1, 1, 1, 1, 1, 4, 2], dtype=int32)
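
Before merging the labels back, it can help to check how evenly the postal codes are spread across the clusters. The sketch below is only an illustration: it uses the ten labels printed above rather than the full `kmeans.labels_` array, and `cluster_sizes` is a name introduced here, not part of the original notebook.

```python
import numpy as np

# First ten labels, copied from the kmeans.labels_[0:10] output above
labels = np.array([1, 3, 5, 1, 1, 1, 1, 1, 4, 2])
kclusters = 6

# Count how many postal codes fall into each cluster id (0 .. kclusters-1)
cluster_sizes = np.bincount(labels, minlength=kclusters)

for cluster_id, size in enumerate(cluster_sizes):
    print(f"Cluster {cluster_id}: {size} postal code(s)")
```

In this sample most postal codes land in cluster 1, while several clusters hold a single postal code, which is worth keeping in mind when reading the per-cluster tables.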

In [42]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

budapest_merged = budapest_df

# join the sorted venue table onto budapest_df so each postal code keeps its rent and latitude/longitude
budapest_merged = budapest_merged.join(neighborhoods_venues_sorted.set_index('Postal code'), on='Postal code')

budapest_merged.head() # check the last columns!

Unnamed: 0,District number,Postal code,Price per m2 in HUF,Monthly rent in HUF,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,I.,1010,3 948,321 000,47.4968,19.0375,1,Café,Hungarian Restaurant,Plaza,Historic Site,Coffee Shop,Scenic Lookout,Pub,Bistro,Hotel,Park
1,II.,1020,3 624,353 000,47.5393,18.9869,3,Park,Track,Playground,Food,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Fountain,Food Truck
2,III.,1030,2 788,166 000,47.5672,19.0369,5,Dog Run,Heliport,Train Station,Baseball Field,Electronics Store,Eastern European Restaurant,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint
3,IV.,1040,2 663,100 000,47.5649,19.0913,1,Grocery Store,Pharmacy,Tram Station,Park,Hungarian Restaurant,Fast Food Restaurant,Stationery Store,Convenience Store,Performing Arts Venue,Farmers Market
4,V.,1050,3 336,198 000,47.5002,19.052,1,Italian Restaurant,Hotel,Hungarian Restaurant,Restaurant,Coffee Shop,Bistro,Wine Bar,Dessert Shop,Plaza,Sandwich Place


## Step six - Time to visualize it all

In [43]:
# create map
map_clusters = folium.Map(location=[latitude2, longitude2], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, price in zip(budapest_merged['Latitude'], budapest_merged['Longitude'], budapest_merged['Postal code'], budapest_merged['Cluster Labels'], budapest_merged['Monthly rent in HUF']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) + ", " + str(price) + " HUF/month", parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster 0

In [44]:
budapest_merged.loc[budapest_merged['Cluster Labels'] == 0, budapest_merged.columns[[1] + list(range(5, budapest_merged.shape[1]))]]

Unnamed: 0,Postal code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,1190,19.143,0,Bus Stop,Salon / Barbershop,Yoga Studio,Food & Drink Shop,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Fountain


### Cluster 1

In [45]:
budapest_merged.loc[budapest_merged['Cluster Labels'] == 1, budapest_merged.columns[[1] + list(range(5, budapest_merged.shape[1]))]]

Unnamed: 0,Postal code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1010,19.0375,1,Café,Hungarian Restaurant,Plaza,Historic Site,Coffee Shop,Scenic Lookout,Pub,Bistro,Hotel,Park
3,1040,19.0913,1,Grocery Store,Pharmacy,Tram Station,Park,Hungarian Restaurant,Fast Food Restaurant,Stationery Store,Convenience Store,Performing Arts Venue,Farmers Market
4,1050,19.052,1,Italian Restaurant,Hotel,Hungarian Restaurant,Restaurant,Coffee Shop,Bistro,Wine Bar,Dessert Shop,Plaza,Sandwich Place
5,1060,19.0626,1,Pub,Coffee Shop,Cosmetics Shop,Clothing Store,Women's Store,Italian Restaurant,Indian Restaurant,Thai Restaurant,Theater,Gym
6,1070,19.0734,1,Hotel,Coffee Shop,Bar,Café,Restaurant,Music Store,Burger Joint,Supermarket,Garden,Cocktail Bar
7,1080,19.0707,1,Coffee Shop,Hotel,Gastropub,Italian Restaurant,Plaza,Clothing Store,Café,Electronics Store,Hostel,Bakery
10,1110,19.0187,1,Food,Plaza,Playground,Pizza Place,Gym,Pharmacy,Fast Food Restaurant,Bistro,Yoga Studio,Fried Chicken Joint
11,1120,18.9905,1,Park,Platform,Hungarian Restaurant,Grocery Store,Playground,Dessert Shop,Pizza Place,Pharmacy,Bus Stop,Eastern European Restaurant
12,1130,19.0709,1,Grocery Store,Café,Coffee Shop,Restaurant,Diner,Sporting Goods Shop,Indian Restaurant,Fried Chicken Joint,Soup Place,Chinese Restaurant
13,1140,19.1147,1,Bus Stop,Eastern European Restaurant,Thrift / Vintage Store,Pizza Place,Burger Joint,Lottery Retailer,Music Venue,Tennis Court,Farmers Market,Cosmetics Shop


### Cluster 2

In [46]:
budapest_merged.loc[budapest_merged['Cluster Labels'] == 2, budapest_merged.columns[[1] + list(range(5, budapest_merged.shape[1]))]]

Unnamed: 0,Postal code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,1100,19.1575,2,Tram Station,Bus Stop,Brewery,Grocery Store,Fruit & Vegetable Store,Arts & Entertainment,Deli / Bodega,Athletics & Sports,Paper / Office Supplies Store,Yoga Studio
14,1150,19.1193,2,Bus Stop,Grocery Store,Supermarket,Restaurant,Pizza Place,Gymnastics Gym,Concert Hall,Food & Drink Shop,Garden Center,Soccer Stadium
17,1180,19.2098,2,Bus Stop,Playground,Tram Station,Food & Drink Shop,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Fountain
19,1200,19.1193,2,Bus Station,Tram Station,Grocery Store,Bus Stop,Fountain,Garden,Gaming Cafe,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop


### Cluster 3

In [47]:
budapest_merged.loc[budapest_merged['Cluster Labels'] == 3, budapest_merged.columns[[1] + list(range(5, budapest_merged.shape[1]))]]

Unnamed: 0,Postal code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,1020,18.9869,3,Park,Track,Playground,Food,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Fountain,Food Truck


### Cluster 4


In [48]:
budapest_merged.loc[budapest_merged['Cluster Labels'] == 4, budapest_merged.columns[[1] + list(range(5, budapest_merged.shape[1]))]]

Unnamed: 0,Postal code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,1090,19.0916,4,Train Station,Electronics Store,Office,Yoga Studio,Food & Drink Shop,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Fountain


### Cluster 5

In [49]:
budapest_merged.loc[budapest_merged['Cluster Labels'] == 5, budapest_merged.columns[[1] + list(range(5, budapest_merged.shape[1]))]]

Unnamed: 0,Postal code,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,1030,19.0369,5,Dog Run,Heliport,Train Station,Baseball Field,Electronics Store,Eastern European Restaurant,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint


## Results

The budapest_merged DataFrame shows that almost every postal code has everyday conveniences nearby - mostly grocery shops, parks and public transport stops - so in principle the choice of district can be based on rent alone. For example, in the outskirts and bordering districts monthly rent can be as much as 50% lower than in the city center, so for anyone on a tight budget, or working near the edge of the city, one of these districts would be a great option.
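
The rent gap can be checked against the five rows shown in the `budapest_merged.head()` output. The sketch below is a minimal illustration on those rows only. It assumes that postal code 1050 stands in for "the city center", and the column names `rent_huf` and `pct_vs_center` are introduced here for the example.

```python
import pandas as pd

# Five rows copied from the budapest_merged.head() output
rents = pd.DataFrame({
    "Postal code": ["1010", "1020", "1030", "1040", "1050"],
    "Monthly rent in HUF": ["321 000", "353 000", "166 000", "100 000", "198 000"],
})

# The scraped values are strings with whitespace as a thousands separator
rents["rent_huf"] = (
    rents["Monthly rent in HUF"].str.replace(r"\s", "", regex=True).astype(int)
)

# Assumption: postal code 1050 is used as the city-center reference
center_rent = rents.loc[rents["Postal code"] == "1050", "rent_huf"].iloc[0]
rents["pct_vs_center"] = ((rents["rent_huf"] / center_rent - 1) * 100).round(1)

print(rents[["Postal code", "rent_huf", "pct_vs_center"]])
```

In this small sample, postal code 1040 comes out roughly 49.5% cheaper than 1050, which is in line with the "as much as 50%" figure.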

If we also consider classic leisure opportunities, cluster 1, which includes the city center, is the place to look. Bars, restaurants, cinemas, etc. are concentrated in the city center and the surrounding area; however, rent prices here can be very high.
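
One rough way to put a number on "leisure opportunities" is to add up the venue frequencies of the categories that count as going out. The sketch below uses the top-5 frequencies printed earlier for postal codes 1010, 1040 and 1050. The keyword list is a hand-picked assumption, not something the Foursquare API provides, and because only the top five venues per postal code are included, the shares are lower bounds.

```python
import pandas as pd

# Top-5 venue frequencies copied from the printed output for three postal codes
top_venues = pd.DataFrame(
    [
        ("1010", "Café", 0.09),
        ("1010", "Hungarian Restaurant", 0.07),
        ("1010", "Plaza", 0.07),
        ("1010", "Historic Site", 0.07),
        ("1010", "Scenic Lookout", 0.05),
        ("1040", "Pharmacy", 0.08),
        ("1040", "Park", 0.08),
        ("1040", "Tram Station", 0.08),
        ("1040", "Grocery Store", 0.08),
        ("1040", "Hungarian Restaurant", 0.04),
        ("1050", "Hotel", 0.08),
        ("1050", "Italian Restaurant", 0.08),
        ("1050", "Hungarian Restaurant", 0.07),
        ("1050", "Restaurant", 0.07),
        ("1050", "Coffee Shop", 0.05),
    ],
    columns=["Postal code", "venue", "freq"],
)

# Assumption: a hand-picked keyword list defining "going out" venues
leisure_keywords = ["Restaurant", "Café", "Coffee Shop", "Pub", "Cinema", "Theater"]
is_leisure = top_venues["venue"].str.contains("|".join(leisure_keywords), regex=True)

# Sum the frequencies of leisure venues per postal code
leisure_share = top_venues[is_leisure].groupby("Postal code")["freq"].sum().round(2)
print(leisure_share.sort_values(ascending=False))
```

On these three postal codes the central 1050 scores highest and the outer 1040 lowest, matching the pattern described above.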

Based on the results of this analysis, the areas that best balance price against quality of life are the 8th, 9th and 13th districts: rents are relatively lower, but leisure opportunities are still close by and easy to reach. Those who have a car should also consider the 3rd, 19th and 14th districts, as these are conveniently reachable by car and offer lower prices for bigger apartments.


## Discussion

Since one of the 23 districts had insufficient data on average rent prices, the nearby venues for the 1230 postcode have also been excluded. Hence, further research would be required to provide a truly accurate picture of all the potential housing options. The analysis would also be more useful if it considered extra factors, such as the person's future office location, their means of transport (car, public transport, bicycle, etc.) and their leisure preferences.
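
As an illustration of how office location could enter the analysis, the sketch below ranks districts by straight-line distance to a workplace using the haversine formula. The district coordinates are the five rows shown in the `budapest_merged.head()` output; the `office` coordinates and the helper `haversine_km` are hypothetical and introduced only for this example. Straight-line distance is a simplification, since real travel time depends on the means of transport.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# District coordinates copied from the budapest_merged.head() output
districts = {
    "1010": (47.4968, 19.0375),
    "1020": (47.5393, 18.9869),
    "1030": (47.5672, 19.0369),
    "1040": (47.5649, 19.0913),
    "1050": (47.5002, 19.0520),
}

# Hypothetical office location, chosen only for illustration
office = (47.5000, 19.0500)

distances = {code: haversine_km(*coords, *office) for code, coords in districts.items()}

# List districts from nearest to farthest
for code, km in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{code}: {km:.1f} km from the office")
```

The resulting distance could then be added as an extra column next to rent and venue data when comparing districts.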

The rent database should be updated continuously so that it reflects current prices. The main idea could be developed into an application that provides tailor-made advice for expats seeking information about the city.

## Conclusion

The project intended to offer some information to those considering a move to Budapest, Hungary. For the analysis, data was obtained from multiple sources, combined in a dataframe and analysed by clustering. The basic results show that, depending on one's preferences, there are a number of suitable districts to choose from in terms of price; however, leisure opportunities become more limited as we leave the core of the city.

The report raised a few questions. With further data collection and development, the basic idea could be fine-tuned and made into a phone application or a website through which people can tailor their search to their preferences.