## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#b57edc; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #b57edc">Libraries</p>

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm, tqdm_notebook

import warnings
warnings.filterwarnings('ignore')

import re
import requests

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#b57edc; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #b57edc">Intro</p>

The goal is to identify safe areas within Tokyo and recommend Airbnb accommodations in those areas to travelers.

> We use EDA to explore key areas in Tokyo and collect real-estate-related incidents reported in those areas.

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#b57edc; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #b57edc">Data</p>

**The data** is public information compiled from the Airbnb website, including the availability calendar for the next 365 days and the reviews for each listing.

There are 74 independent variables:
<ul>
<li><strong>listings_gz.csv</strong><ul>
<li><code>id</code> Airbnb's unique identifier for the listing</li>
<li><code>scrape_id</code> The Inside Airbnb "scrape" this listing was part of</li>
<li><code>host_id</code> Airbnb's unique identifier for the host/user</li>
<li><code>listing_url</code></li>
<li><code>last_scraped</code> UTC. The date and time this listing was "scraped".</li>
<li><code>source</code> One of "neighbourhood search" or "previous scrape". "neighbourhood search" means that the listing was found by searching the city, while "previous scrape" means that the listing was seen in another scrape performed in the last 65 days, and the listing was confirmed to be still available on the Airbnb site.</li>
<li><code>description</code> Detailed description of the listing</li>
<li><code>neighborhood_overview</code> Host's description of the neighbourhood</li>
<li><code>picture_url</code> URL to the Airbnb hosted regular sized image for the listing</li>
<li><code>host_url</code> The Airbnb page for the host</li>
<li><code>host_name</code> Name of the host. Usually just the first name(s)</li>
<li><code>host_since</code> The date the host/user was created. For hosts that are Airbnb guests this could be the date they registered as a guest.</li>
<li><code>host_location</code> The host's self reported location</li>
<li><code>host_about</code> Description about the host</li>
<li><code>host_response_time</code></li>
<li><code>host_response_rate</code></li>
<li><code>host_acceptance_rate</code> The rate at which a host accepts booking requests.</li>
<li><code>host_is_superhost</code></li>
<li><code>host_thumbnail_url</code></li>
<li><code>host_picture_url</code></li>
<li><code>host_listings_count</code> The number of listings the host has (per Airbnb calculations)</li>
<li><code>host_total_listings_count</code> The number of listings the host has (per Airbnb calculations)</li>
<li><code>host_verifications</code></li>
<li><code>host_has_profile_pic</code></li>
<li><code>host_identity_verified</code></li>
<li><code>neighbourhood</code></li>
<li><code>neighbourhood_cleansed</code> The neighbourhood as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.</li>
<li><code>neighbourhood_group_cleansed</code> The neighbourhood group as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.</li>
<li><code>latitude</code> Uses the World Geodetic System (WGS84) projection for latitude and longitude.</li>
<li><code>longitude</code> Uses the World Geodetic System (WGS84) projection for latitude and longitude.</li>
<li><code>property_type</code> Self selected property type. Hotels and Bed and Breakfasts are described as such by their hosts in this field</li>
<li><code>room_type</code> Entire home/apt|Private room|Shared room|Hotel</li>
<li><code>accommodates</code> The maximum capacity of the listing</li>
<li><code>bathrooms</code> The number of bathrooms in the listing</li>
<li><code>bathrooms_text</code> The number of bathrooms in the listing, given as a textual description.</li>
<li><code>bedrooms</code> The number of bedrooms</li>
<li><code>beds</code> The number of bed(s)</li>
<li><code>price</code> Daily price in local currency</li>
<li><code>minimum_nights</code> Minimum number of nights per stay for the listing (calendar rules may be different)</li>
<li><code>maximum_nights</code> Maximum number of nights per stay for the listing (calendar rules may be different)</li>
<li><code>minimum_minimum_nights</code> The smallest minimum_nights value from the calendar (looking 365 nights in the future)</li>
<li><code>maximum_minimum_nights</code> The largest minimum_nights value from the calendar (looking 365 nights in the future)</li>
<li><code>minimum_maximum_nights</code> The smallest maximum_nights value from the calendar (looking 365 nights in the future)</li>
<li><code>maximum_maximum_nights</code> The largest maximum_nights value from the calendar (looking 365 nights in the future)</li>
<li><code>minimum_nights_avg_ntm</code> The average minimum_nights value from the calendar (looking 365 nights in the future)</li>
<li><code>maximum_nights_avg_ntm</code> The average maximum_nights value from the calendar (looking 365 nights in the future)</li>
<li><code>calendar_updated</code></li>
<li><code>has_availability</code></li>
<li><code>availability_30</code> The availability of the listing 30 days in the future as determined by the calendar. Note that a listing may be unavailable because it has been booked by a guest or blocked by the host.</li>
<li><code>availability_60</code> The availability of the listing 60 days in the future as determined by the calendar. Note that a listing may be unavailable because it has been booked by a guest or blocked by the host.</li>
<li><code>availability_90</code> The availability of the listing 90 days in the future as determined by the calendar. Note that a listing may be unavailable because it has been booked by a guest or blocked by the host.</li>
<li><code>availability_365</code> The availability of the listing 365 days in the future as determined by the calendar. Note that a listing may be unavailable because it has been booked by a guest or blocked by the host.</li>
<li><code>number_of_reviews</code> The number of reviews the listing has</li>
<li><code>number_of_reviews_ltm</code> The number of reviews the listing has (in the last 12 months)</li>
<li><code>number_of_reviews_l30d</code> The number of reviews the listing has (in the last 30 days)</li>
<li><code>first_review</code> The date of the first/oldest review</li>
<li><code>last_review</code> The date of the last/newest review</li>
<li><code>review_scores_rating</code></li>
<li><code>review_scores_accuracy</code></li>
<li><code>review_scores_cleanliness</code></li>
<li><code>review_scores_checkin</code></li>
<li><code>review_scores_communication</code></li>
<li><code>review_scores_location</code></li>
<li><code>review_scores_value</code></li>
<li><code>license</code> The licence/permit/registration number</li>
<li><code>calculated_host_listings_count</code> The number of listings the host has in the current scrape, in the city/region geography.</li>
<li><code>calculated_host_listings_count_entire_homes</code> The number of Entire home/apt listings the host has in the current scrape, in the city/region geography</li>
<li><code>calculated_host_listings_count_private_rooms</code> The number of Private room listings the host has in the current scrape, in the city/region geography</li>
<li><code>calculated_host_listings_count_shared_rooms</code> The number of Shared room listings the host has in the current scrape, in the city/region geography</li>
<li><code>reviews_per_month</code> The average number of reviews per month over the lifetime of the listing</li>
</ul></li>
</ul>
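
Several of the fields listed above are stored as text in the raw CSV rather than as typed values. A minimal sketch of converting them, assuming that rates such as `host_response_rate` are percent strings, that flags such as `host_is_superhost` use `t`/`f`, and that dates such as `host_since` are ISO strings. These encodings are assumptions to verify against the actual file, the sample rows are made up, and `tidy_listing_types` is a hypothetical helper name:

```python
import pandas as pd

# Made-up sample rows; the encodings ("95%", "t"/"f", ISO dates) are assumptions
sample = pd.DataFrame({
    "host_response_rate": ["95%", "100%", None],
    "host_is_superhost": ["t", "f", None],
    "host_since": ["2011-08-13", "2012-11-07", "2013-01-02"],
})

def tidy_listing_types(df):
    """Return a copy with rate, flag, and date columns converted to typed values."""
    out = df.copy()
    # "95%" -> 0.95; missing values become NaN
    out["host_response_rate"] = (
        pd.to_numeric(out["host_response_rate"].str.rstrip("%")) / 100
    )
    # "t"/"f" -> True/False; anything else (including missing) becomes NaN
    out["host_is_superhost"] = out["host_is_superhost"].map({"t": True, "f": False})
    # ISO date strings -> datetime64
    out["host_since"] = pd.to_datetime(out["host_since"])
    return out

typed = tidy_listing_types(sample)
print(typed.dtypes)
```

Converting these columns once, right after loading, keeps later filters (for example, restricting to superhosts) from comparing against raw strings.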

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#b57edc; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #b57edc">EDA</p>

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#1A5D1A; font-size:75%; text-align:left;padding: 0px; border-bottom: 3px solid #1A5D1A">Input Data</p>

In [2]:
listing = pd.read_csv('/Users/genie/PycharmProjects/PROJECT/AirbnbWise/Oshimaland_data/listings_gz.csv')
listing.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,...,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,197677,https://www.airbnb.com/rooms/197677,20230629055629,2023-06-29,city scrape,Rental unit in Sumida · ★4.78 · 1 bedroom · 2 ...,<b>The space</b><br />We are happy to welcome ...,,https://a0.muscache.com/pictures/38437056/d27f...,964081,...,4.83,4.53,4.79,M130003350,f,1,1,0,0,1.21
1,776070,https://www.airbnb.com/rooms/776070,20230629055629,2023-06-29,city scrape,Home in Kita-ku · ★4.98 · 1 bedroom · 1 bed · ...,We have been in airbnb since 2011 and it has g...,We love Nishinippori because is nearer to Toky...,https://a0.muscache.com/pictures/efd9f039-dbd2...,801494,...,4.98,4.83,4.91,,f,1,0,1,0,1.89
2,905944,https://www.airbnb.com/rooms/905944,20230629055629,2023-06-29,city scrape,Rental unit in Shibuya · ★4.76 · 2 bedrooms · ...,NEWLY RENOVATED property entirely for you & yo...,Hatagaya is a great neighborhood located 4 min...,https://a0.muscache.com/pictures/miso/Hosting-...,4847803,...,4.9,4.77,4.77,Hotels and Inns Business Act | 渋谷区保健所長 | 31渋健生...,t,5,5,0,0,1.49
3,1016831,https://www.airbnb.com/rooms/1016831,20230629055629,2023-06-29,city scrape,Home in Setagaya · ★4.94 · 1 bedroom · 2 beds ...,"Hi there, I am Wakana and I live with my two f...",The location is walkable distance to famous Sh...,https://a0.muscache.com/pictures/airflow/Hosti...,5596383,...,4.98,4.92,4.89,,f,1,0,1,0,1.96
4,1196177,https://www.airbnb.com/rooms/1196177,20230629055629,2023-06-29,city scrape,Home in 足立区 · ★4.71 · 1 bedroom · 1.5 shared b...,Ｓtay with host.We can help your travel.<br />B...,There are shopping mall near Senjuohashi stati...,https://a0.muscache.com/pictures/72890882/05ec...,5686404,...,4.88,4.67,4.75,,f,1,0,1,0,0.79


In [3]:
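#* Select the columns from the listings file to use as input data
#* (the bathrooms column is left out because it contains no values)
#* .copy() makes inputDF an independent frame, so later column assignments do not touch `listing`
inputDF = listing[['latitude', 'longitude', 'price', 'room_type', 'accommodates', 'bedrooms', 'beds', 'review_scores_rating']].copy()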
inputDF

Unnamed: 0,latitude,longitude,price,room_type,accommodates,bedrooms,beds,review_scores_rating
0,35.717070,139.826080,"$11,000.00",Entire home/apt,2,1.0,2.0,4.78
1,35.738440,139.769170,"$7,208.00",Private room,1,,1.0,4.98
2,35.678780,139.678470,"$23,066.00",Entire home/apt,6,2.0,4.0,4.76
3,35.658000,139.671340,"$16,000.00",Private room,2,,2.0,4.94
4,35.744731,139.797384,"$10,000.00",Private room,4,,,4.71
...,...,...,...,...,...,...,...,...
11172,35.697773,139.706543,"$12,000.00",Entire home/apt,4,1.0,3.0,
11173,35.698980,139.694320,"$16,000.00",Entire home/apt,3,1.0,2.0,
11174,35.700080,139.695020,"$16,000.00",Entire home/apt,4,1.0,2.0,
11175,35.699860,139.693340,"$40,000.00",Entire home/apt,9,3.0,6.0,
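
The column selection above leaves out `bathrooms` because it holds no values. That kind of claim can be checked directly by measuring the share of missing values per column. A minimal sketch, using a small made-up frame in place of `listing`; `missing_share` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def missing_share(df):
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)

# Made-up stand-in for `listing`
demo = pd.DataFrame({
    "bathrooms": [np.nan, np.nan, np.nan, np.nan],
    "bedrooms": [1.0, np.nan, 2.0, np.nan],
    "beds": [2.0, 1.0, 4.0, 2.0],
})

share = missing_share(demo)
print(share)

# Columns that are entirely empty carry no information and can be dropped
empty_cols = share[share == 1.0].index.tolist()
print(empty_cols)
```

The same call on the real frame would also show how sparse `bedrooms` and `review_scores_rating` are, both of which have gaps in the preview above.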


In [4]:
#* Strip currency symbols and thousands separators from a price string
def remove_special_characters(text):
    #* [^\d.] matches every character that is not a digit or a decimal point,
    #* so "$11,000.00" becomes "11000.00" (removing the "." as well would inflate prices 100x)
    pattern = r'[^\d.]'
    return re.sub(pattern, '', text)

#* Clean the price column (object dtype)
inputDF['price'] = inputDF['price'].apply(remove_special_characters)
#* Convert the price column to float
inputDF['price'] = inputDF['price'].astype('float64')
#* Replace the slash in room_type values ("Entire home/apt" -> "Entire home apt")
inputDF['room_type'] = inputDF['room_type'].str.replace('/', ' ')
inputDF.head()

Unnamed: 0,latitude,longitude,price,room_type,accommodates,bedrooms,beds,review_scores_rating
0,35.71707,139.82608,11000.0,Entire home apt,2,1.0,2.0,4.78
1,35.73844,139.76917,7208.0,Private room,1,,1.0,4.98
2,35.67878,139.67847,23066.0,Entire home apt,6,2.0,4.0,4.76
3,35.658,139.67134,16000.0,Private room,2,,2.0,4.94
4,35.744731,139.797384,10000.0,Private room,4,,,4.71


In [5]:

import os

def get_address_from_latlng(latitude, longitude, api_key):
    """Reverse-geocode a coordinate with the Google Geocoding API; return None on failure."""
    url = f'https://maps.googleapis.com/maps/api/geocode/json?latlng={latitude},{longitude}&key={api_key}'
    response = requests.get(url)
    data = response.json()
    if data['status'] == 'OK':
        return data['results'][0]['formatted_address']
    else:
        return None

# Google Maps API key, read from the environment so the secret is not stored in the notebook
# (the variable name GOOGLE_MAPS_API_KEY is an assumption; use whatever name you export)
api_key = os.environ['GOOGLE_MAPS_API_KEY']

addressList = []
for latitude, longitude in tqdm_notebook(zip(inputDF['latitude'], inputDF['longitude'])):
    address = get_address_from_latlng(latitude, longitude, api_key)
    if address:
        #print(f'Address : {address}')
        addressList.append(address)
    else:
        #print('No address found for this location')
        addressList.append(None)

0it [00:00, ?it/s]

Address : 2-chōme-27-16 Yahiro, Sumida City, Tokyo 131-0041, Japan
Address : 1-chōme-26-7 Tabatashinmachi, Kita City, Tokyo 114-0012, Japan
Address : 2-chōme-1-2-1 Hatagaya, Shibuya City, Tokyo 151-0072, Japan
Address : Japan, 〒155-0032 Tokyo, Setagaya City, Daizawa, 2-chōme−16, 池ノ上フラット
Address : 1-2 Senjumiyamotochō, Adachi City, Tokyo 120-0043, Japan
Address : 5-chōme-4-20 Hiroo, Shibuya City, Tokyo 150-0012, Japan
Address : 2-chōme-46-14 Okusawa, Setagaya City, Tokyo 158-0083, Japan
Address : 8-chōme-14-18 Tateishi, Katsushika City, Tokyo 124-0012, Japan
Address : 2-2 Tsukudochō, Shinjuku City, Tokyo 162-0821, Japan
Address : Japan, 〒152-0002 Tokyo, Meguro City, Megurohonchō, 5-chōme−19−６ セザール武蔵小山
Address : 4-chōme-37-13 Higashitateishi, Katsushika City, Tokyo 124-0013, Japan
Address : Japan, 〒169-0072 Tokyo, Shinjuku City, Ōkubo, 1-chōme−11−３ 大東ビル １階
Address : Japan, 〒169-0072 Tokyo, Shinjuku City, Ōkubo, 1-chōme−11−１ ŌMORI BLD
Address : Japan, 〒169-0072 Tokyo, Shinjuku City, Ōkubo, 1-chōme−11−１ ŌMORI BLD
Address : Japan, 〒169-0072 Tokyo, Shinjuku City, Ō

KeyboardInterrupt: 
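
The run above was interrupted partway through, and the loop issues one request per listing even when several listings resolve to the same address. A rough sketch of one way to make such a loop resumable: cache results by rounded coordinate and persist the cache to disk, so a restart only queries coordinates it has not seen. The helper name `geocode_all`, the cache file name, the rounding precision, and the save interval are all assumptions for illustration; `lookup` stands for any callable shaped like `get_address_from_latlng` with the key already bound, and a fake lookup is used here so the example runs without network access:

```python
import json
import os
import tempfile

def geocode_all(coords, lookup, cache_path="address_cache.json",
                precision=5, save_every=100):
    """Resolve (lat, lng) pairs to addresses, reusing and persisting a cache.

    `lookup(lat, lng)` is any callable returning an address string or None.
    """
    # Load results from an earlier (possibly interrupted) run
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)

    def save():
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f, ensure_ascii=False)

    addresses = []
    stored = 0
    try:
        for lat, lng in coords:
            key = f"{round(lat, precision)},{round(lng, precision)}"
            if key in cache:
                address = cache[key]
            else:
                address = lookup(lat, lng)
                # Failed lookups are not cached, so they are retried on the next run
                if address is not None:
                    cache[key] = address
                    stored += 1
                    if stored % save_every == 0:
                        save()
            addresses.append(address)
    finally:
        # Runs on normal completion and on KeyboardInterrupt alike
        save()
    return addresses

# Demo with a fake lookup that records how often it is called
calls = []

def fake_lookup(lat, lng):
    calls.append((lat, lng))
    return f"address for {lat:.2f},{lng:.2f}"

demo_path = os.path.join(tempfile.mkdtemp(), "address_cache.json")
coords = [(35.71707, 139.82608), (35.71707, 139.82608), (35.73844, 139.76917)]
first = geocode_all(coords, fake_lookup, cache_path=demo_path)
second = geocode_all(coords, fake_lookup, cache_path=demo_path)
print(len(calls))  # number of real lookups made across both runs
```

With the notebook's own function, the call might look like `geocode_all(zip(inputDF['latitude'], inputDF['longitude']), lambda lat, lng: get_address_from_latlng(lat, lng, api_key))`.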