## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#b57edc; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #b57edc">Libraries</p>

In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm
from pandas.io.formats.style import Styler

import warnings
warnings.filterwarnings('ignore')

pd.options.display.max_columns = 999
pd.options.display.max_colwidth = 999

tqdm.pandas()

rc = {
    "axes.facecolor": "#F8F8F8", 
    "figure.facecolor": "#F8F8F8", 
    "axes.edgecolor": "#000000",  
    "grid.color": "#EBEBE7" + "30",
    "font.family": "serif",
    "axes.labelcolor": "#000000",
    "xtick.color": "#000000", 
    "ytick.color": "#000000",
    "grid.alpha": 0.4 
}

sns.set(rc=rc) 
palette = ['#ff7f50', '#ffd700', '#ffdab9', '#9fe2bf',
           '#d2b48c', '#008080', '#98ff98', '#000080']


from colorama import Style, Fore 
blk = Style.BRIGHT + Fore.BLACK
gld = Style.BRIGHT + Fore.YELLOW
grn = Style.BRIGHT + Fore.GREEN
red = Style.BRIGHT + Fore.RED
blu = Style.BRIGHT + Fore.BLUE
res = Style.RESET_ALL

import json

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#b57edc; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #b57edc">Intro</p>

This notebook aims to identify safe areas within Tokyo and recommend Airbnb accommodations in those areas to travelers.

> We will conduct EDA to explore key areas in Tokyo and collect incidents related to Tokyo real estate in those areas.

## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#b57edc; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #b57edc">Data</p>

**The data** uses public information compiled from the Airbnb website, including the availability calendar for the next 365 days and the reviews for each listing.

There are 74 independent variables:
<ul>
<li><strong>listings_gz.csv</strong><ul>
<li><code>id</code> Airbnb's unique identifier for the listing</li>
<li><code>scrape_id</code> Inside Airbnb "Scrape" this was part of</li>
<li><code>host_id</code> Airbnb's unique identifier for the host/user</li>
<li><code>listing_url</code></li>
<li><code>last_scraped</code> UTC. The date and time this listing was "scraped".</li>
<li><code>source</code> One of "neighbourhood search" or "previous scrape". "neighbourhood search" means that the listing was found by searching the city, while "previous scrape" means that the listing was seen in another scrape performed in the last 65 days, and the listing was confirmed to be still available on the Airbnb site.</li>
<li><code>description</code> Detailed description of the listing</li>
<li><code>neighborhood_overview</code> Host's description of the neighbourhood</li>
<li><code>picture_url</code> URL to the Airbnb hosted regular sized image for the listing</li>
<li><code>host_url</code> The Airbnb page for the host</li>
<li><code>host_name</code> Name of the host. Usually just the first name(s)</li>
<li><code>host_since</code> The date the host/user was created. For hosts that are Airbnb guests this could be the date they registered as a guest.</li>
<li><code>host_location</code> The host's self reported location</li>
<li><code>host_about</code> Description about the host</li>
<li><code>host_response_time</code></li>
<li><code>host_response_rate</code></li>
<li><code>host_acceptance_rate</code> The rate at which a host accepts booking requests.</li>
<li><code>host_is_superhost</code></li>
<li><code>host_thumbnail_url</code></li>
<li><code>host_picture_url</code></li>
<li><code>host_listings_count</code> The number of listings the host has (per Airbnb calculations)</li>
<li><code>host_total_listings_count</code> The number of listings the host has (per Airbnb calculations)</li>
<li><code>host_verifications</code></li>
<li><code>host_has_profile_pic</code></li>
<li><code>host_identity_verified</code></li>
<li><code>neighbourhood</code></li>
<li><code>neighbourhood_cleansed</code> The neighbourhood as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.</li>
<li><code>neighbourhood_group_cleansed</code> The neighbourhood group as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.</li>
<li><code>latitude</code> Uses the World Geodetic System (WGS84) projection for latitude and longitude.</li>
<li><code>longitude</code> Uses the World Geodetic System (WGS84) projection for latitude and longitude.</li>
<li><code>property_type</code> Self selected property type. Hotels and Bed and Breakfasts are described as such by their hosts in this field</li>
<li><code>room_type</code> Entire home/apt|Private room|Shared room|Hotel</li>
<li><code>accommodates</code> The maximum capacity of the listing</li>
<li><code>bathrooms</code> The number of bathrooms in the listing</li>
<li><code>bathrooms_text</code> The number of bathrooms in the listing.</li>
<li><code>bedrooms</code> The number of bedrooms</li>
<li><code>beds</code> The number of bed(s)</li>
<li><code>price</code> daily price in local currency</li>
<li><code>minimum_nights</code> minimum number of night stay for the listing (calendar rules may be different)</li>
<li><code>maximum_nights</code> maximum number of night stay for the listing (calendar rules may be different)</li>
<li><code>minimum_minimum_nights</code> the smallest minimum_night value from the calendar (looking 365 nights in the future)</li>
<li><code>maximum_minimum_nights</code> the largest minimum_night value from the calendar (looking 365 nights in the future)</li>
<li><code>minimum_maximum_nights</code> the smallest maximum_night value from the calendar (looking 365 nights in the future)</li>
<li><code>maximum_maximum_nights</code> the largest maximum_night value from the calendar (looking 365 nights in the future)</li>
<li><code>minimum_nights_avg_ntm</code> the average minimum_night value from the calendar (looking 365 nights in the future)</li>
<li><code>maximum_nights_avg_ntm</code> the average maximum_night value from the calendar (looking 365 nights in the future)</li>
<li><code>calendar_updated</code></li>
<li><code>has_availability</code></li>
<li><code>availability_30</code> availability_x. The availability of the listing x days in the future as determined by the calendar. Note a listing may not be available because it has been booked by a guest or blocked by the host.</li>
<li><code>availability_60</code> availability_x. The availability of the listing x days in the future as determined by the calendar. Note a listing may not be available because it has been booked by a guest or blocked by the host.</li>
<li><code>availability_90</code> availability_x. The availability of the listing x days in the future as determined by the calendar. Note a listing may not be available because it has been booked by a guest or blocked by the host.</li>
<li><code>availability_365</code> availability_x. The availability of the listing x days in the future as determined by the calendar. Note a listing may not be available because it has been booked by a guest or blocked by the host.</li>
<li><code>number_of_reviews</code> The number of reviews the listing has</li>
<li><code>number_of_reviews_ltm</code> The number of reviews the listing has (in the last 12 months)</li>
<li><code>number_of_reviews_l30d</code> The number of reviews the listing has (in the last 30 days)</li>
<li><code>first_review</code> The date of the first/oldest review</li>
<li><code>last_review</code> The date of the last/newest review</li>
<li><code>review_scores_rating</code></li>
<li><code>review_scores_accuracy</code></li>
<li><code>review_scores_cleanliness</code></li>
<li><code>review_scores_checkin</code></li>
<li><code>review_scores_communication</code></li>
<li><code>review_scores_location</code></li>
<li><code>review_scores_value</code></li>
<li><code>license</code> The licence/permit/registration number</li>
<li><code>calculated_host_listings_count</code> The number of listings the host has in the current scrape, in the city/region geography.</li>
<li><code>calculated_host_listings_count_entire_homes</code> The number of Entire home/apt listings the host has in the current scrape, in the city/region geography</li>
<li><code>calculated_host_listings_count_private_rooms</code> The number of Private room listings the host has in the current scrape, in the city/region geography</li>
<li><code>calculated_host_listings_count_shared_rooms</code> The number of Shared room listings the host has in the current scrape, in the city/region geography</li>
<li><code>reviews_per_month</code> The number of reviews the listing has over the lifetime of the listing</li>
</ul></li>
</ul>

In [3]:
PATH_DF = '/Users/genie/PycharmProjects/PROJECT/AirbnbWise/Oshimaland_data/listings_gz.csv'
df = pd.read_csv(PATH_DF)

In [4]:
print(f'{blk}[INFO] Shapes:'
      f'{blk}\n listings_gz.csv --> {red}{df.shape}')

[INFO] Shapes:
 listings_gz.csv --> (11177, 75)


**Note**

* Some values are missing from the data. The next cell shows where and how many.

In [5]:
missing = df.isna().sum().reset_index()
missing.columns = ['columns', 'missing_count']

print(f'{blk}[INFO] Any missing values:'
      f'\n\n{red}{missing}{res}')

[INFO] Any missing values:

                                         columns  missing_count
0                                             id              0
1                                    listing_url              0
2                                      scrape_id              0
3                                   last_scraped              0
4                                         source              0
..                                           ...            ...
70                calculated_host_listings_count              0
71   calculated_host_listings_count_entire_homes              0
72  calculated_host_listings_count_private_rooms              0
73   calculated_host_listings_count_shared_rooms              0
74                             reviews_per_month           1252

[75 rows x 2 columns]
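
The truncated printout above hides most of the 75 columns. A minimal sketch of a more compact summary follows; it assumes a hypothetical helper `missing_summary` (not part of the original notebook) that keeps only the columns that actually contain missing values and adds their share of rows. The toy frame and its values are made up for illustration.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing counts and ratios, limited to columns with gaps."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        'missing_count': counts,
        'missing_ratio': counts / len(frame),
    })
    # Keep only columns that have at least one missing value
    summary = summary[summary['missing_count'] > 0]
    return summary.sort_values('missing_count', ascending=False)


# Toy frame standing in for `df` (values are illustrative only)
toy = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'reviews_per_month': [1.21, None, 0.93, None],
    'license': [None, 'M130003350', 'x', 'y'],
})
print(missing_summary(toy))
```

On the real data the same call would be `missing_summary(df)`, which avoids the `..` elision in the default printout.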


In [6]:
def magnify(is_test : bool = False): 
    base_color = '#b57edc'
    if is_test:
        highlight_target_row = []
    else:
        # Wrap the rule in a list so it can be concatenated with the styles below
        highlight_target_row = [dict(selector = 'tr:last-child',
                            props = [('background-color', f'{base_color}' + '20')])]
    
    return [dict(selector="th", 
                props=[("font-size", "11pt"),
                    ('background-color', f'{base_color}'),
                    ('color', 'white'),
                    ('font-weight', 'bold'),
                    ('border-bottom', '0.1px solid white'), 
                    ('border-left', '0.1px solid white'), 
                    ('text-align', 'right')]),
        
            dict(selector='th.blank.level0', 
                props=[('font-weight', 'bold'),
                        ('border-left', '1.7px solid white'),
                        ('background-color', 'white')]),

            dict(selector="td", 
                    props=[('padding', "0.5em 1em"), 
                        ('text-align', 'right')]),

            dict(selector="th:hover",
                    props=[("font-size", "14pt")]),

            dict(selector="tr:hover td:hover",
                    props=[('max-width', '250px'),
                        ('font-size', '14pt'),
                        ('color', f'{base_color}'),
                        ('font-weight', 'bold'),
                        ('background-color', 'white'),
                        ('border', f'1px dashed {base_color}')]),
            
            dict(selector="caption", 
                props=[(('caption-side', 'bottom'))])] + highlight_target_row

def stylize_simple(df: pd.DataFrame, caption: str) -> Styler:
    """
        Args:
            df: any dataframe (train/test/origin)
            caption: caption text shown below the table

        Returns:
            s: the dataframe wrapped into Styler.
    """
    s = df
    s = s.style.set_table_styles(magnify(True)).set_caption(f"{caption}")
    return s

display(stylize_simple(df.head(1), 'listing dataset 1 top row (hover to magnify).'))

Unnamed: 0,id,listing_url,scrape_id,last_scraped,source,name,description,neighborhood_overview,picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,latitude,longitude,property_type,room_type,accommodates,bathrooms,bathrooms_text,bedrooms,beds,amenities,price,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,license,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,197677,https://www.airbnb.com/rooms/197677,20230629055629,2023-06-29,city scrape,Rental unit in Sumida · ★4.78 · 1 bedroom · 2 beds · 1 bath,"The space We are happy to welcome you to our apartment, located in the heart of Tokyo downtown. This is an authentic Japanese apartment with Tatami mattress room and sleeping on Japanese Futon, like Ryokan style. Fully equipped and convienient kitchen will give you oportunity to feel like at home. Automatic bath tub. Separate toilet with heating seat and washlet. Direct acces from both Narita and Haneda airports. Easy access to most of Tokyo attractions. 10min walk from Oshiage Station, 7min walk from Tobu Hikifune Station, 8min walk from Heisei Hikifune Station. Free internet access. Air conditioning, 2 semi-double futon bed (for 2 person each), LCD 32 inch TV, full kitchen, microwave, toster, electric pot, refrigerator, coffee maker, iron, hair dryer, washing machine, bathroom with a bath tub and shower, gas grill. Cooking utensils and linens provided. 
Our apartment is locate",,https://a0.muscache.com/pictures/38437056/d27fa43f_original.jpg,964081,https://www.airbnb.com/users/show/964081,Yoshimi & Marek,2011-08-13,"Tokyo, Japan",Would love to travel all over the world and meet and feel the different people and cultures.,within a few hours,100%,88%,t,https://a0.muscache.com/im/users/964081/profile_pic/1319512318/original.jpg?aki_policy=profile_small,https://a0.muscache.com/im/users/964081/profile_pic/1319512318/original.jpg?aki_policy=profile_x_medium,Sumida District,1,2,"['email', 'phone']",t,t,,Sumida Ku,,35.71707,139.82608,Entire rental unit,Entire home/apt,2,,1 bath,1.0,2.0,"[""Dryer"", ""Free washer \u2013 In unit"", ""Smoke alarm"", ""Shampoo"", ""Carbon monoxide alarm"", ""Air conditioning"", ""Wifi"", ""TV"", ""Kitchen"", ""Fire extinguisher"", ""Hair dryer"", ""Iron"", ""Refrigerator"", ""Essentials"", ""Dishes and silverware"", ""Heating"", ""Hangers"", ""Hot water"", ""Self check-in"", ""Lockbox"", ""Microwave""]","$11,000.00",3,1125,3.0,3.0,1125.0,1125.0,3.0,1125.0,,t,0,0,0,24,2023-06-29,173,8,1,2011-09-21,2023-05-30,4.78,4.74,4.92,4.84,4.83,4.53,4.79,M130003350,f,1,1,0,0,1.21


**The data** is the detailed review data.

There are 4 independent variables (excluding `id` and `listing_id`):
<ul>
<li><strong>reviews.csv</strong><ul>
<li><code>date</code></li>
<li><code>reviewer_id</code></li>
<li><code>reviewer_name</code></li>
<li><code>comments</code></li>
</ul></li>
</ul>

In [7]:
PATH_REVIEW = '/Users/genie/PycharmProjects/PROJECT/AirbnbWise/Oshimaland_data/reviews.csv'
reviews = pd.read_csv(PATH_REVIEW)
display(stylize_simple(reviews.head(4), 'The reviews dataset 4 top rows (hover to magnify).'))

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,197677,554285,2011-09-21,1002142,Pablo,"Couldn’t get any better! The apartment itself is great; it has everything you could need. Besides that the neighborhood itself is very friendly fill with family life energy. Really close to a supermarket, some convenience stores and train stations, you will feel like home from the first moment and if you forget to pack something you could easily get it."
1,197677,627651,2011-10-14,1031940,Ana & Ricardo,"The apartment is bigger than it looks in the pictures. Perfect for a couple. Clean, well maintained, safe and with lot of useful information. As Westerners we found the tatami floor to have a strange smell but we later discovered that every tatami room have the same smell :) It is located about 10 min on foot from subway station (JR pass not valid on this line) and within walking distance from supermarket, shops and restaurants. Overall we had a great experience and we'd recommend it to anybody traveling to Tokyo."
2,197677,733040,2011-11-21,1097040,Samuel,"The appartement is perfect for a couple! It is a bit small but the really furniture which is really complete makes up for it, everything you need in tne everyday life is provided! We also enjoyed the neighbourhood a lot, it is a charming place close to the principal cities of tokyo. Thanks for letting us stay, we will visit again!"
3,197677,755841,2011-11-30,1183674,Lisa,"We had a terrific stay at Yoshimi and Marek's lovely apartment. Everything was brand new, organized and tidy, and we had all the comforts and conveniences we needed. The ""Survival Kit"" they left for us was packed with information on the home and the neighborhood, and even included emergency numbers and other details that we had not seen at other rentals. The apartment is Japanese in scale, small by our American standards, but everything is designed so well that it was quite comfortable for our family of 4. The neighborhood wasn't particularly picturesque, but it was definitely convenient and interesting. Great value, and a great stay."
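
The review rows can be cross-checked against the `number_of_reviews`, `first_review`, and `last_review` columns of the listings table. A minimal sketch, assuming the column names shown above and using only the four review rows displayed (comments omitted), aggregates reviews per listing:

```python
import pandas as pd

# The four rows shown above, without the free-text comments
toy_reviews = pd.DataFrame({
    'listing_id': [197677, 197677, 197677, 197677],
    'id': [554285, 627651, 733040, 755841],
    'date': ['2011-09-21', '2011-10-14', '2011-11-21', '2011-11-30'],
})
toy_reviews['date'] = pd.to_datetime(toy_reviews['date'])

# One row per listing: review count plus earliest and latest review date
per_listing = toy_reviews.groupby('listing_id').agg(
    n_reviews=('id', 'count'),
    first_review=('date', 'min'),
    last_review=('date', 'max'),
)
print(per_listing)
```

For listing 197677 the earliest date here, 2011-09-21, agrees with the `first_review` value in the listings row shown earlier.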


**The data** is the neighbourhood list for the geo filter, sourced from city or open-source GIS files.

There are 2 independent variables:
<ul>
<li><strong>neighbourhoods.csv</strong><ul>
<li><code>neighbourhood_group</code></li>
<li><code>neighbourhood</code></li>
</ul></li>
</ul>


In [8]:
PATH_NEIGHBOR = '/Users/genie/PycharmProjects/PROJECT/AirbnbWise/Oshimaland_data/neighbourhoods.csv'

neighbor = pd.read_csv(PATH_NEIGHBOR)
display(stylize_simple(neighbor.head(4), 'neighbor dataset 4 top rows (hover to magnify).'))

Unnamed: 0,neighbourhood_group,neighbourhood
0,,Adachi Ku
1,,Akiruno Shi
2,,Akishima Shi
3,,Aogashima Mura


**The data** is a GeoJSON file of the city's neighbourhoods.

In [9]:
PATH_JSON = '/Users/genie/PycharmProjects/PROJECT/AirbnbWise/Oshimaland_data/neighbourhoods.geojson'
with open(PATH_JSON) as f:
    json_f = json.load(f)  # parse the GeoJSON directly from the file handle
jsondf = pd.DataFrame(json_f)
display(stylize_simple(jsondf.head(1), 'json dataset 1 top row (hover to magnify).'))

Unnamed: 0,type,features
0,FeatureCollection,"{'type': 'Feature', 'geometry': {'type': 'MultiPolygon', 'coordinates': [[[[139.857803, 35.635799], [139.855301, 35.636868], [139.850296, 35.637665], [139.849899, 35.639267], [139.850494, 35.639999], [139.858398, 35.638134], [139.857895, 35.635864], [139.857803, 35.635799]]], [[[139.889999, 35.750801], [139.890396, 35.750401], [139.899796, 35.743732], [139.900101, 35.7356], [139.897095, 35.726665], [139.901398, 35.721466], [139.912109, 35.711266], [139.916107, 35.707733], [139.918701, 35.6978], [139.905899, 35.683731], [139.889099, 35.680599], [139.885803, 35.6768], [139.885895, 35.671131], [139.886597, 35.669868], [139.886993, 35.653866], [139.882095, 35.644001], [139.875504, 35.640533], [139.872894, 35.638535], [139.871796, 35.639332], [139.863495, 35.638401], [139.849701, 35.642666], [139.845215, 35.642979], [139.845215, 35.65062], [139.845505, 35.656666], [139.846497, 35.669868], [139.848206, 35.679066], [139.846207, 35.682064], [139.847397, 35.695465], [139.840195, 35.701065], [139.835205, 35.703133], [139.835999, 35.707001], [139.839508, 35.710667], [139.833206, 35.714867], [139.836105, 35.717934], [139.839294, 35.717133], [139.842209, 35.7192], [139.851898, 35.711266], [139.8564, 35.714397], [139.860703, 35.714333], [139.864807, 35.713001], [139.867706, 35.712467], [139.867203, 35.716999], [139.868195, 35.723], [139.870605, 35.724667], [139.871109, 35.729065], [139.871796, 35.729797], [139.873703, 35.736866], [139.874802, 35.7374], [139.876495, 35.737064], [139.877594, 35.741135], [139.880005, 35.7402], [139.879807, 35.7416], [139.884903, 35.746601], [139.880493, 35.748867], [139.889999, 35.750801]]], [[[139.867004, 35.633598], [139.866699, 35.633801], [139.861008, 35.636002], [139.862198, 35.637199], [139.863708, 35.637135], [139.867599, 35.634201], [139.867004, 35.633598]]]]}, 'properties': {'neighbourhood': 'Edogawa Ku', 'neighbourhood_group': None}}"
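
Wrapping the whole FeatureCollection in a DataFrame leaves each feature as a nested dictionary in a single cell. A minimal sketch of an alternative flattens the features with `pd.json_normalize`, assuming only that every feature carries the `properties` and `geometry` keys seen above. The tiny `sample` collection and its coordinates are made up for illustration.

```python
import pandas as pd

# Illustrative stand-in for the loaded GeoJSON (coordinates are invented)
sample = {
    'type': 'FeatureCollection',
    'features': [
        {
            'type': 'Feature',
            'geometry': {
                'type': 'MultiPolygon',
                'coordinates': [[[[139.85, 35.63], [139.86, 35.64],
                                  [139.85, 35.65], [139.85, 35.63]]]],
            },
            'properties': {'neighbourhood': 'Edogawa Ku',
                           'neighbourhood_group': None},
        },
    ],
}

# Nested dicts become dotted column names; coordinate lists stay as list cells
features = pd.json_normalize(sample['features'])
print(features[['properties.neighbourhood', 'geometry.type']])
```

With the real file, the equivalent call would be `pd.json_normalize(json_f['features'])`, giving one row per neighbourhood.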


**The data** is the detailed calendar data.

In [10]:
PATH_CALENDAR = '/Users/genie/PycharmProjects/PROJECT/AirbnbWise/Oshimaland_data/calendar.csv'
calender = pd.read_csv(PATH_CALENDAR)
display(stylize_simple(calender.head(4), 'calendar dataset 4 top rows (hover to magnify).'))

Unnamed: 0,listing_id,date,available,price,adjusted_price,minimum_nights,maximum_nights
0,197677,2023-06-29,f,"$11,000.00","$11,000.00",3.0,1125.0
1,197677,2023-06-30,f,"$11,000.00","$11,000.00",3.0,1125.0
2,197677,2023-07-01,f,"$11,000.00","$11,000.00",3.0,1125.0
3,197677,2023-07-02,f,"$11,000.00","$11,000.00",3.0,1125.0
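
The `price` and `adjusted_price` columns are stored as strings such as `$11,000.00`, and `available` uses `t`/`f` flags, so neither can be aggregated directly. A minimal sketch of one way to convert them follows; `clean_calendar` is a hypothetical helper (not in the original notebook), and it assumes every price follows the `$1,234.00` pattern seen above. The second toy row is made up for illustration.

```python
import pandas as pd


def clean_calendar(cal: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with numeric prices, boolean flags, real dates."""
    out = cal.copy()
    for col in ('price', 'adjusted_price'):
        # Strip the currency sign and thousands separators, then parse as numbers
        out[col] = (out[col].astype(str)
                    .str.replace(r'[$,]', '', regex=True)
                    .pipe(pd.to_numeric, errors='coerce'))
    # Map the 't'/'f' flags to booleans (unexpected values become NaN)
    out['available'] = out['available'].map({'t': True, 'f': False})
    out['date'] = pd.to_datetime(out['date'])
    return out


toy_calendar = pd.DataFrame({
    'listing_id': [197677, 776070],
    'date': ['2023-06-29', '2023-06-29'],
    'available': ['f', 't'],
    'price': ['$11,000.00', '$7,208.00'],
    'adjusted_price': ['$11,000.00', '$7,208.00'],
})
cleaned = clean_calendar(toy_calendar)
print(cleaned.dtypes)
```

Working on a copy keeps the raw strings available in case a value fails to parse and needs inspection.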


**The data** is summary information and metrics for listings in Tokyo (good for visualisations).

There are 19 independent variables:
<ul>
<li><strong>listings.csv</strong><ul>
<li><code>id</code> Airbnb's unique identifier for the listing</li>
<li><code>name</code></li>
<li><code>host_id</code></li>
<li><code>host_name</code></li>
<li><code>neighbourhood_group</code> The neighbourhood group as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.</li>
<li><code>neighbourhood</code> The neighbourhood as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles.</li>
<li><code>district</code> </li>
<li><code>latitude</code> Uses the World Geodetic System (WGS84) projection for latitude and longitude.</li>
<li><code>longitude</code> Uses the World Geodetic System (WGS84) projection for latitude and longitude.</li>
<li><code>room_type</code> Entire home/apt|Private room|Shared room|Hotel</li>
<li><code>price</code> daily price in local currency. Note, $ sign may be used despite locale</li>
<li><code>minimum_nights</code> minimum number of night stay for the listing (calendar rules may be different)</li>
<li><code>number_of_reviews</code> The number of reviews the listing has</li>
<li><code>last_review</code> The date of the last/newest review</li>
<li><code>reviews_per_month</code> The number of reviews the listing has over the lifetime of the listing</li>
<li><code>calculated_host_listings_count</code> The number of listings the host has in the current scrape, in the city/region geography.</li>
<li><code>availability_365</code> availability_x. The availability of the listing x days in the future as determined by the calendar. Note a listing may not be available because it has been booked by a guest or blocked by the host.</li>
<li><code>number_of_reviews_ltm</code> The number of reviews the listing has (in the last 12 months)</li>
<li><code>license</code></li>
</ul></li>
</ul>

In [11]:
# `df` still holds listings_gz.csv here; listings.csv is loaded into `df2` in the next cell
print(f'{blk}[INFO] Shapes:'
      f'{blk}\n listings_gz.csv --> {red}{df.shape}')

[INFO] Shapes:
 listings_gz.csv --> (11177, 75)


In [12]:
PATH_DF2 = '/Users/genie/PycharmProjects/PROJECT/AirbnbWise/Oshimaland_data/listings.csv'

df2 = pd.read_csv(PATH_DF2)
display(stylize_simple(df2.head(4), 'listings dataset 4 top rows (hover to magnify).'))

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,district,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
0,197677.0,Rental unit in Sumida · ★4.78 · 1 bedroom · 2 beds · 1 bath,964081,Yoshimi & Marek,,Sumida Ku,___ (Hong Kong Sen),35.71707,139.82608,Entire home/apt,11000,3,173,2023-05-30,1.21,1,24,8,M130003350
1,776070.0,Home in Kita-ku · ★4.98 · 1 bedroom · 1 bed · 1 shared bath,801494,Kei,,Kita Ku,____ (Toki Sen),35.73844,139.76917,Private room,7208,3,243,2023-06-20,1.89,1,67,15,
2,3427384.0,Rental unit in Edogawa · ★4.82 · 1 bedroom · 2 beds · 1.5 baths,13018876,Masakatsu,,Edogawa Ku,___ (Edo Sen),35.68374,139.85971,Entire home/apt,7847,2,100,2023-05-22,0.93,2,231,19,Hotels and Inns Business Act | 東京都江戸川区保健所 | 18江衛環01第42号
3,905944.0,Rental unit in Shibuya · ★4.76 · 2 bedrooms · 4 beds · 1 bath,4847803,Best Stay In Tokyo!,,Shibuya Ku,___ (New York Sen),35.67878,139.67847,Entire home/apt,23066,3,186,2023-06-26,1.49,5,229,1,Hotels and Inns Business Act | 渋谷区保健所長 | 31渋健生収第4972号


**Note**

* Some values are missing from the data. The next cell shows where and how many.

In [13]:
missing = df2.isna().sum().reset_index()
missing.columns = ['columns', 'missing_count']

print(f'{blk}[INFO] Any missing values:'
      f'\n\n{red}{missing}{res}')

[INFO] Any missing values:

                           columns  missing_count
0                               id              0
1                             name              0
2                          host_id              0
3                        host_name              0
4              neighbourhood_group          11177
5                    neighbourhood              0
6                         district              0
7                         latitude              0
8                        longitude              0
9                        room_type              0
10                           price              0
11                  minimum_nights              0
12               number_of_reviews              0
13                     last_review           1252
14               reviews_per_month           1252
15  calculated_host_listings_count              0
16                availability_365              0
17           number_of_reviews_ltm              0
18  

**The data** is the Tokyo district dataset.

In [14]:
PATH_TOKYO_DISTRICT = '/Users/genie/PycharmProjects/PROJECT/AirbnbWise/Oshimaland_data/tokyo_district.xlsx'
tokyo_district = pd.read_excel(PATH_TOKYO_DISTRICT)
tokyo_district = tokyo_district.drop(columns = ['Unnamed: 2'])
display(stylize_simple(tokyo_district.head(10), 'tokyo_district data'))

Unnamed: 0,아다치구,足立区
0,中央本町,츄오혼쵸
1,伊興,이코
2,梅田,우메다
3,本木,모토키
4,小台,오다이
5,江北,고호쿠
6,佐野,사노
7,鹿浜,시카하마
8,新田,신덴
9,千住,센주


## <p style="font-family:JetBrains Mono; font-weight:normal; letter-spacing:2px; color:#b57edc; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #b57edc">EDA</p>

In [15]:
main_district_df = df['neighbourhood'].value_counts().reset_index()
main_district_df.columns = ['neighbourhood', 'count']
display(stylize_simple(main_district_df.head(10), 'main neighbourhood data'))

Unnamed: 0,neighbourhood,count
0,"Shinjuku City, Tokyo, Japan",623
1,"Taito City, Tokyo, Japan",458
2,"Sumida City, Tokyo, Japan",394
3,"Shinjuku City, Tōkyō-to, Japan",334
4,"Toshima City, Tokyo, Japan",321
5,"Taito City, Tōkyō-to, Japan",310
6,"Shinjuku-ku, Tōkyō-to, Japan",301
7,"Sumida City, Tōkyō-to, Japan",268
8,"Shibuya City, Tokyo, Japan",222
9,"Taitō-ku, Tōkyō-to, Japan",160
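
The free-text `neighbourhood` column splits the same ward across several spellings ("Shinjuku City, Tokyo, Japan", "Shinjuku-ku, Tōkyō-to, Japan"), so its counts understate each ward. A minimal sketch of one way to merge them follows; `normalize_ward` is a hypothetical helper (not in the original notebook), and it assumes the ward is always the first comma-separated segment and ends in " City", "-ku", or " Ku". Only the Shinjuku and Taito rows from the top-10 table above are used.

```python
import unicodedata

import pandas as pd


def normalize_ward(raw: str) -> str:
    """Hypothetical helper: reduce a free-text neighbourhood to a bare ward name."""
    first = raw.split(',')[0].strip()
    # Drop macrons and other combining marks, e.g. "Taitō" -> "Taito"
    name = ''.join(
        ch for ch in unicodedata.normalize('NFKD', first)
        if not unicodedata.combining(ch)
    )
    for suffix in (' City', '-ku', ' Ku'):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name


# Counts copied from the top-10 table above (Shinjuku and Taito rows only)
counts = pd.Series({
    'Shinjuku City, Tokyo, Japan': 623,
    'Shinjuku City, Tōkyō-to, Japan': 334,
    'Shinjuku-ku, Tōkyō-to, Japan': 301,
    'Taito City, Tokyo, Japan': 458,
    'Taito City, Tōkyō-to, Japan': 310,
    'Taitō-ku, Tōkyō-to, Japan': 160,
})

# Passing a function to groupby applies it to each index label
merged = counts.groupby(normalize_ward).sum().sort_values(ascending=False)
print(merged)  # Shinjuku 1258, Taito 928
```

The `neighbourhood_cleansed` column already provides geocoded ward names, so this kind of normalization is mainly useful for checking the free-text field against it.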


In [16]:
main_district_df2 = df['neighbourhood_cleansed'].value_counts().reset_index()
main_district_df2.columns = ['neighbourhood_cleansed', 'count']
display(stylize_simple(main_district_df2.head(10), 'main neighbourhood_cleansed data'))

Unnamed: 0,neighbourhood_cleansed,count
0,Shinjuku Ku,2278
1,Taito Ku,1597
2,Sumida Ku,1290
3,Toshima Ku,1002
4,Shibuya Ku,660
5,Minato Ku,409
6,Setagaya Ku,394
7,Ota Ku,360
8,Nakano Ku,316
9,Chuo Ku,290


In [17]:
main_district_df3 = df2['district'].value_counts().reset_index()
main_district_df3.columns = ['district', 'count']
display(stylize_simple(main_district_df2, 'special ward / ward office data'))

Unnamed: 0,neighbourhood_cleansed,count
0,Shinjuku Ku,2278
1,Taito Ku,1597
2,Sumida Ku,1290
3,Toshima Ku,1002
4,Shibuya Ku,660
5,Minato Ku,409
6,Setagaya Ku,394
7,Ota Ku,360
8,Nakano Ku,316
9,Chuo Ku,290
