# Day 22: Airbnb in Berlin 3/5, the ring zone
![Title](2201.png)

Public transport tickets in Berlin are divided into three concentric fare zones: A, B, and C. Longer-term tickets are sold only for the combinations A+B, B+C, and A+B+C, and most visitors stay within A+B. I couldn't find any data that specifies the exact border of the A+B zone, so I use the [Low-emission Zone Area](https://www.berlin.de/senuvk/umwelt/luftqualitaet/umweltzone/en/gebiet.shtml) as a proxy instead. The low-emission zone covers roughly the centre of Berlin inside the S-Bahn ring, where listings are easier to reach by public transport, so we will analyse only the listings in this area. For now, let's just call it Berlin's "egg yolk" zone.
![Title](2202.JPG)
![Title](2203.JPG)

## Import the packages and read in the data

First, we import the packages we need and read in the data we are about to analyse.

In [1]:
# Import the packages we need
import pandas as pd
import numpy as np

In [2]:
ringzipcode = pd.read_csv('airbnb/ring_zipcode.csv')  # read in the ring_zipcode.csv file
ringzipcode.columns = ['zipcode', 'info']  # rename the columns
ringzipcode.head(10)  # show the first 10 rows

| | zipcode | info |
|---|---|---|
| 0 | NaN | NaN |
| 1 | Zuordnung der Postleitzahlen zur Umweltzone Be... | NaN |
| 2 | auf Basis der im Regionalen Bezugssystem Berli... | NaN |
| 3 | NaN | NaN |
| 4 | Postleitzahl | Umweltzone Berlin |
| 5 | 10115 | innerhalb |
| 6 | 10117 | innerhalb |
| 7 | 10119 | innerhalb |
| 8 | 10178 | innerhalb |
| 9 | 10179 | innerhalb |
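
The first few rows of the file are title lines rather than data. As a minimal sketch, assuming a layout like the one shown above (the sample text below is hand-made, not the real file), `skiprows` can drop the title lines at read time and `dtype=str` keeps the postcodes as strings:

```python
import io

import pandas as pd

# Hand-made text that imitates the assumed layout of ring_zipcode.csv.
text = (
    "Zuordnung der Postleitzahlen zur Umweltzone Berlin,\n"
    ",\n"
    "Postleitzahl,Umweltzone Berlin\n"
    "10115,innerhalb\n"
    "10117,innerhalb\n"
)

# Skip the two title lines so the third line becomes the header.
# dtype=str keeps postcodes as strings instead of integers.
zones = pd.read_csv(io.StringIO(text), skiprows=2, dtype=str)
print(zones)
```

The number of lines to skip depends on the actual file, so it is worth checking with `head()` first.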


In [3]:
ringzipcode = np.array(ringzipcode)  # convert the DataFrame to a NumPy array
ringzipcode_list = ringzipcode.tolist()  # then to a list of [zipcode, info] pairs
print(ringzipcode_list[:10])  # print the first 10 rows

[[nan, nan], ['Zuordnung der Postleitzahlen zur Umweltzone Berlin', nan], ['auf Basis der im Regionalen Bezugssystem Berlin (RBS) verorteten Adressen ', nan], [nan, nan], ['Postleitzahl', 'Umweltzone Berlin'], ['10115', 'innerhalb'], ['10117', 'innerhalb'], ['10119', 'innerhalb'], ['10178', 'innerhalb'], ['10179', 'innerhalb']]


In [4]:
# Keep only the postcodes whose whole area lies inside the low-emission zone ('innerhalb')
rz = []
for row in ringzipcode_list:
    if row[1] == 'innerhalb':
        rz.append(row[0])
print(rz)

['10115', '10117', '10119', '10178', '10179', '10243', '10249', '10435', '10551', '10555', '10557', '10559', '10585', '10587', '10623', '10625', '10627', '10629', '10707', '10709', '10717', '10719', '10777', '10779', '10781', '10783', '10785', '10787', '10789', '10823', '10825', '10961', '10963', '10965', '10967', '10969', '10997', '10999', '12043', '12045', '12047', '12049', '12053', '12101', '13355']
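
The same selection can also be written as a boolean mask on the DataFrame, without going through NumPy and a Python loop. This is a minimal sketch on a small hand-made frame; the rows, including the non-matching label, are assumptions for illustration and not taken from the real file:

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for the ring_zipcode frame (assumed rows).
sample = pd.DataFrame(
    {
        "zipcode": [np.nan, "Postleitzahl", "10115", "10117", "99999"],
        "info": [np.nan, "Umweltzone Berlin", "innerhalb", "innerhalb", "some other label"],
    }
)

# Rows whose info is exactly 'innerhalb'; NaN and header rows compare as False.
inside = sample.loc[sample["info"] == "innerhalb", "zipcode"].tolist()
print(inside)
```

On the real data this would have to run before `ringzipcode` is converted to an array in cell 3.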


In [5]:
import warnings  # suppress warning messages
warnings.filterwarnings("ignore")
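
`filterwarnings("ignore")` silences every warning for the rest of the session. If only one noisy step needs silencing, a scoped alternative is `warnings.catch_warnings()`. A minimal sketch, where `noisy_step` is a hypothetical function standing in for whatever call emits the warning:

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a call that emits a warning."""
    warnings.warn("something looks off", UserWarning)
    return 42


# Warnings are ignored only inside this block; the previous filters
# are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

print(result)
```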

In [6]:
# Read in the listings file to analyse
listing = pd.read_csv('airbnb/listings.csv')
print('There are', listing.id.nunique(), 'listings in the listing data.')
listing.info()  # summary of the data
listing.head(3)  # show the first three rows

There are 24395 listings in the listing data.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24395 entries, 0 to 24394
Columns: 106 entries, id to reviews_per_month
dtypes: float64(23), int64(21), object(62)
memory usage: 19.7+ MB


Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,1944,https://www.airbnb.com/rooms/1944,20190711004031,2019-07-11,cafeheaven Pberg/Mitte/Wed for the summer 2019,"Private, bright and friendly room. You'd be sh...","The room is very large, private, cozy, bright,...","Private, bright and friendly room. You'd be sh...",none,near all the trendy cafés and flea markets and...,...,f,f,moderate,f,f,1,0,1,0,0.25
1,2015,https://www.airbnb.com/rooms/2015,20190711004031,2019-07-11,Berlin-Mitte Value! Quiet courtyard/very central,Great location! 30 of 75 sq meters. This wood...,A+++ location! This „Einliegerwohnung“ is an e...,Great location! 30 of 75 sq meters. This wood...,none,It is located in the former East Berlin area o...,...,f,f,moderate,f,f,4,4,0,0,3.18
2,3176,https://www.airbnb.com/rooms/3176,20190711004031,2019-07-11,Fabulous Flat in great Location,This beautiful first floor apartment is situa...,1st floor (68m2) apartment on Kollwitzplatz/ P...,This beautiful first floor apartment is situa...,none,The neighbourhood is famous for its variety of...,...,f,f,strict_14_with_grace_period,f,f,1,1,0,0,1.18


In [7]:
abzone = listing["zipcode"].isin(rz)  # True where the listing's postcode is in the ring-zone list
ablisting = listing[abzone]
ablisting.info()
print('Fewer than half of the listings remain after filtering by postcodes inside the S-Bahn ring.')

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11609 entries, 0 to 24394
Columns: 106 entries, id to reviews_per_month
dtypes: float64(23), int64(21), object(62)
memory usage: 9.5+ MB
Fewer than half of the listings remain after filtering by postcodes inside the S-Bahn ring.
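
`isin` matches exact values, so a postcode stored with stray spaces or as a number would be silently dropped. Whether listings.csv actually contains such values is not checked here; the sketch below only shows one way to guard against it. The helper `first_five_digits` and the sample rows are hypothetical:

```python
import re

import pandas as pd


def first_five_digits(value):
    """Hypothetical helper: return the first run of five digits in value, else None."""
    if pd.isna(value):
        return None
    match = re.search(r"\d{5}", str(value))
    return match.group(0) if match else None


# Made-up rows standing in for listings.csv (assumed values, not real data).
sample = pd.DataFrame(
    {"id": [1, 2, 3, 4], "zipcode": ["10115", " 10117 ", 10119.0, None]}
)
ring = ["10115", "10117", "10119"]

raw_mask = sample["zipcode"].isin(ring)  # exact matching only
clean_mask = sample["zipcode"].map(first_five_digits).isin(ring)  # after normalising

print(raw_mask.tolist())
print(clean_mask.tolist())
```

Comparing `raw_mask.sum()` with `clean_mask.sum()` on the real data would show whether any listings are lost to formatting.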


In [8]:
# The top 10 neighbourhoods by listing count are different after filtering;
# surprisingly, some outlying areas had a lot of listings before.
grouped_ab_df = ablisting.groupby('neighbourhood_cleansed').count()[['id']].sort_values('id', ascending=False).head(10) 
grouped_ab_df

| neighbourhood_cleansed | id |
|---|---|
| Tempelhofer Vorstadt | 1371 |
| Alexanderplatz | 1195 |
| Reuterstraße | 1004 |
| Brunnenstr. Süd | 835 |
| südliche Luisenstadt | 663 |
| Schillerpromenade | 481 |
| Neuköllner Mitte/Zentrum | 473 |
| nördliche Luisenstadt | 472 |
| Schöneberg-Nord | 456 |
| Südliche Friedrichstadt | 430 |
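
A shorter way to get the same kind of ranking is `value_counts()`, which counts the rows per neighbourhood and sorts them in descending order. A minimal sketch on made-up rows (the sample frame is an assumption, not the real listings):

```python
import pandas as pd

# Made-up rows standing in for the filtered listings.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6],
        "neighbourhood_cleansed": [
            "Alexanderplatz",
            "Reuterstraße",
            "Alexanderplatz",
            "Alexanderplatz",
            "Reuterstraße",
            "Schillerpromenade",
        ],
    }
)

# Count rows per neighbourhood, largest first, and keep the top 2.
top2 = sample["neighbourhood_cleansed"].value_counts().head(2)
top2_names = top2.index.tolist()
print(top2)
print(top2_names)
```

Note the small difference: `groupby(...).count()[['id']]` counts non-null `id` values, while `value_counts()` counts rows with a non-null neighbourhood. The two agree when neither column has missing values.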


In [9]:
grouped_ab_df.index

Index(['Tempelhofer Vorstadt', 'Alexanderplatz', 'Reuterstraße',
       'Brunnenstr. Süd', 'südliche Luisenstadt', 'Schillerpromenade',
       'Neuköllner Mitte/Zentrum', 'nördliche Luisenstadt', 'Schöneberg-Nord',
       'Südliche Friedrichstadt'],
      dtype='object', name='neighbourhood_cleansed')

In [10]:
# Names of the 10 neighbourhoods with the most listings
top10 = grouped_ab_df.index.tolist()
print(top10)

['Tempelhofer Vorstadt', 'Alexanderplatz', 'Reuterstraße', 'Brunnenstr. Süd', 'südliche Luisenstadt', 'Schillerpromenade', 'Neuköllner Mitte/Zentrum', 'nördliche Luisenstadt', 'Schöneberg-Nord', 'Südliche Friedrichstadt']


In [11]:
ablisting_iftop10 = ablisting["neighbourhood_cleansed"].isin(top10)
ab_top10_listing = ablisting[ablisting_iftop10]
ab_top10_listing.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7380 entries, 1 to 24394
Columns: 106 entries, id to reviews_per_month
dtypes: float64(23), int64(21), object(62)
memory usage: 6.0+ MB


In [12]:
ab_top10_listing.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
1,2015,https://www.airbnb.com/rooms/2015,20190711004031,2019-07-11,Berlin-Mitte Value! Quiet courtyard/very central,Great location! 30 of 75 sq meters. This wood...,A+++ location! This „Einliegerwohnung“ is an e...,Great location! 30 of 75 sq meters. This wood...,none,It is located in the former East Berlin area o...,...,f,f,moderate,f,f,4,4,0,0,3.18
3,3309,https://www.airbnb.com/rooms/3309,20190711004031,2019-07-11,BerlinSpot Schöneberg near KaDeWe,First of all: I prefer short-notice bookings. ...,"Your room is really big and has 26 sqm, is ver...",First of all: I prefer short-notice bookings. ...,none,"My flat is in the middle of West-Berlin, direc...",...,f,f,strict_14_with_grace_period,f,f,1,0,1,0,0.38
9,16644,https://www.airbnb.com/rooms/16644,20190711004031,2019-07-11,In the Heart of Berlin - Kreuzberg,Light and sunny 2-Room-turn of the century-fla...,Rent in the heart of Berlin - Kreuzberg Newly ...,Light and sunny 2-Room-turn of the century-fla...,none,Our Part of Kreuzberg is just the best. Good v...,...,f,f,strict_14_with_grace_period,f,t,2,2,0,0,0.43
10,17904,https://www.airbnb.com/rooms/17904,20190711004031,2019-07-11,Beautiful Kreuzberg studio/WiFi (reg. pend.),,- beautiful studio apt in downtown Berlin - br...,- beautiful studio apt in downtown Berlin - br...,none,"Die Wohnung liegt in Kreuzberg, das für seine ...",...,f,f,strict_14_with_grace_period,f,f,1,1,0,0,2.11
12,21869,https://www.airbnb.com/rooms/21869,20190711004031,2019-07-11,Studio in the Heart of Kreuzberg,Light and sunny 1-Room-turn of the century-fla...,The apartment has two very comfortable high en...,Light and sunny 1-Room-turn of the century-fla...,none,Our Part of Kreuzberg is just the best. Good v...,...,f,f,strict_14_with_grace_period,f,t,2,2,0,0,0.56


In [13]:
# Save the listings from the top 10 neighbourhoods inside the low-emission zone
# as a new CSV file, for further analysis tomorrow
ab_top10_listing.to_csv('ab_top10_listing.csv')
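
By default `to_csv` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. A minimal sketch of the round trip, using an in-memory buffer and a tiny made-up frame instead of the real file:

```python
import io

import pandas as pd

# Tiny made-up frame with a non-contiguous index, like a filtered DataFrame.
df = pd.DataFrame(
    {"id": [2015, 3309], "neighbourhood_cleansed": ["Alexanderplatz", "Schöneberg-Nord"]},
    index=[1, 3],
)

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as the first, unnamed column

buffer.seek(0)
plain = pd.read_csv(buffer)  # the index comes back as an 'Unnamed: 0' column

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # the first column is used as the index again

print(plain.columns.tolist())
print(restored.columns.tolist())
```

If the original row numbers are not needed later, passing `index=False` to `to_csv` avoids the extra column altogether.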

The code and example files are available on [GitHub](https://github.com/tgnco1218/Data-Cleaning-and-Scraping-30Days).

Please let me know if there are any mistakes in this article. Thanks for reading.

References:

[1] [Inside Airbnb](http://insideairbnb.com/get-the-data.html)

[2] [利用Airbnb來更了解居住城市，以臺北為例 Python實作（上）](https://medium.com/finformation%E7%95%B6%E7%A8%8B%E5%BC%8F%E9%81%87%E4%B8%8A%E8%B2%A1%E5%8B%99%E9%87%91%E8%9E%8D/%E5%88%A9%E7%94%A8airbnb%E4%BE%86%E6%9B%B4%E4%BA%86%E8%A7%A3%E5%B1%85%E4%BD%8F%E5%9F%8E%E5%B8%82-%E4%BB%A5%E8%87%BA%E5%8C%97%E7%82%BA%E4%BE%8B-python%E5%AF%A6%E4%BD%9C-3f4903e8742)

[3] [Airbnb listings in Berlin](https://github.com/tgnco1218/Data-Cleaning-and-Scraping-30Days/blob/master/Day19_Airbnb_in_Berlin/Berlin_airbnb.ipynb)

[4] [Low-emission Zone Area](https://www.berlin.de/senuvk/umwelt/luftqualitaet/umweltzone/en/gebiet.shtml)

[5] [How To Use The Berlin Public Transport Without A Fine](https://smartstayberlin.com/how-to-use-the-berlin-public-transport-without-a-fine/)

