## Compare superhost vs regular_host

```python
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
```

```python
# Show all columns, and at least 30 rows when a DataFrame is truncated
pd.set_option('display.max_columns', None)
pd.set_option('display.min_rows', 30)
# pd.set_option('display.precision', 4)
```

### 1.1 Load the data

```python
# Forward slashes work on every platform and avoid backslash escapes such as '\d'
df_boston = pd.read_csv('results/df_listings_boston.csv', index_col='id')
df_seattle = pd.read_csv('results/df_listings_seattle.csv', index_col='id')
print("Boston listing data shape:  {}".format(df_boston.shape))
print("Seattle listing data shape: {}".format(df_seattle.shape))
```

```text
Boston listing data shape:  (3560, 127)
Seattle listing data shape: (3798, 187)
```


```python
# Counts of regular_host (0) vs superhost (1) listings in Boston
df_boston.groupby('host_is_superhost_t').size().reset_index(name='counts')
```

| | host_is_superhost_t | counts |
|---:|---:|---:|
| 0 | 0 | 3157 |
| 1 | 1 | 403 |


```python
# Counts of regular_host (0) vs superhost (1) listings in Seattle
df_seattle.groupby('host_is_superhost_t').size().reset_index(name='counts')
```

| | host_is_superhost_t | counts |
|---:|---:|---:|
| 0 | 0 | 3024 |
| 1 | 1 | 774 |
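The counts above can be turned into shares of listings, which makes the two cities directly comparable. This is a minimal sketch, assuming the flag is coded 0 for regular hosts and 1 for superhosts as in the outputs above; the helper name `superhost_share` is hypothetical, and the toy frames are rebuilt only from the reported counts rather than loaded from the CSV files.

```python
import pandas as pd


def superhost_share(df: pd.DataFrame, flag: str = "host_is_superhost_t") -> float:
    """Return the fraction of rows (listings) whose superhost flag equals 1."""
    # value_counts(normalize=True) gives each flag value's share of all rows
    shares = df[flag].value_counts(normalize=True)
    # .get falls back to 0.0 if no row has flag == 1
    return float(shares.get(1, 0.0))


# Toy frames rebuilt only from the counts reported above (0 = regular host, 1 = superhost)
boston = pd.DataFrame({"host_is_superhost_t": [0] * 3157 + [1] * 403})
seattle = pd.DataFrame({"host_is_superhost_t": [0] * 3024 + [1] * 774})

for city, frame in [("Boston", boston), ("Seattle", seattle)]:
    print(f"{city}: {superhost_share(frame):.1%} of listings belong to superhosts")
```

This prints 11.3% for Boston and 20.4% for Seattle. In the notebook itself, `superhost_share(df_boston)` and `superhost_share(df_seattle)` would give the same figures directly from the loaded data.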


### 1.2 Compare superhost vs regular_host in Boston
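Boston has 403 superhost listings and 3157 regular-host listings (11.3% superhost), while Seattle has 774 superhost listings and 3024 regular-host listings (20.4% superhost), so the share of superhost listings in Seattle is nearly twice that in Boston.
We compare the mean of each numeric column between superhosts and regular hosts and keep the columns where the superhost mean is more than 30% higher, measured as `(superhost - regular_host) / regular_host * 100`, ignoring all neighbourhood columns. For 0/1 columns such as the `amenity_*` dummies, the mean is the share of listings that have the feature.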
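The same comparison can be wrapped in one reusable helper so that the Boston and Seattle cells share the same logic. This is a minimal sketch: the function name `compare_groups`, its parameters and the toy DataFrame are hypothetical, and the code assumes the superhost flag is coded 0/1. Like the notebook cells, it keeps only columns where superhosts score higher; filtering on the absolute value of the difference instead would also catch columns where superhosts are more than 30% lower.

```python
import pandas as pd


def compare_groups(df: pd.DataFrame,
                   flag: str = "host_is_superhost_t",
                   threshold: float = 30.0,
                   exclude_prefix: str = "neighbourhood") -> pd.DataFrame:
    """Compare per-column means of regular hosts (flag 0) and superhosts (flag 1).

    Keeps columns whose superhost mean is more than `threshold` percent above the
    regular-host mean, drops columns starting with `exclude_prefix`, and sorts by
    the superhost mean as the notebook cells do.
    """
    # One row per original column, one column per flag value (0 and 1)
    means = df.groupby(flag).mean(numeric_only=True).T
    out = means.rename(columns={0: "regular_host", 1: "superhost"})
    # Percentage difference relative to the regular-host mean;
    # a zero regular-host mean yields inf, which passes the filter, so inspect such rows
    out["host_diff_%"] = (out["superhost"] - out["regular_host"]) / out["regular_host"] * 100
    keep = (out["host_diff_%"] > threshold) & ~out.index.str.startswith(exclude_prefix)
    return out[keep].sort_values("superhost", ascending=False)


# Hypothetical toy data: four regular-host listings, two superhost listings
toy = pd.DataFrame({
    "host_is_superhost_t":    [0, 0, 0, 0, 1, 1],
    "reviews_per_month":      [1.0, 2.0, 1.5, 1.5, 3.0, 2.0],
    "amenity_Shampoo":        [1, 0, 1, 0, 1, 1],
    "price":                  [100, 120, 110, 90, 105, 95],
    "neighbourhood_Back Bay": [0, 0, 0, 0, 1, 1],
})

result = compare_groups(toy)
print(result)
```

On the toy data this keeps `reviews_per_month` (+66.7%) and `amenity_Shampoo` (+100%). `price` is lower for superhosts and is dropped, and the neighbourhood dummy is excluded by its prefix even though its difference is infinite. Unlike the notebook cells, the original column names stay in the index instead of a `column` column.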

```python
# Boston: mean of each numeric column for regular hosts (0) vs superhosts (1)
df_boston_groupby_host = (df_boston.groupby('host_is_superhost_t')
                          .mean(numeric_only=True)
                          .T
                          .reset_index())
df_boston_groupby_host.rename(columns={'index': 'column', 0: 'regular_host', 1: 'superhost'}, inplace=True)

# Percentage difference of the superhost mean relative to the regular-host mean
df_boston_groupby_host['host_diff_%'] = ((df_boston_groupby_host['superhost'] - df_boston_groupby_host['regular_host'])
                                         / df_boston_groupby_host['regular_host'] * 100)

# Keep columns where superhosts are more than 30% higher, ignoring neighbourhood columns
mask = ((df_boston_groupby_host['host_diff_%'] > 30)
        & ~df_boston_groupby_host['column'].str.startswith('neighbourhood'))
df_boston_groupby_host[mask].sort_values(by='superhost', ascending=False)
```

| | column | regular_host | superhost | host_diff_% |
|---:|---|---:|---:|---:|
| 25 | reviews_per_month | 1.861839 | 2.877515 | 54.552314 |
| 61 | amenity_Shampoo | 0.652835 | 0.868486 | 33.033062 |
| 45 | amenity_Hair Dryer | 0.485588 | 0.665012 | 36.950044 |
| 40 | amenity_Fire Extinguisher | 0.419385 | 0.60794 | 44.959818 |
| 11 | extra_people | 0.36332 | 0.555831 | 52.986862 |
| 27 | amenity_24-Hour Check-in | 0.33006 | 0.491315 | 48.856227 |
| 41 | amenity_First Aid Kit | 0.272411 | 0.491315 | 80.358359 |
| 8 | security_deposit | 0.361419 | 0.473945 | 31.134589 |
| 121 | cancellation_policy_moderate | 0.248337 | 0.334988 | 34.892325 |
| 42 | amenity_Free Parking on Premises | 0.219512 | 0.330025 | 50.344637 |


### Discussion
In Boston, compared with regular hosts, superhosts tend to:
1. receive more reviews per month (2.88 vs 1.86 on average, about 55% more).
2. charge for extra people more often.
3. offer more amenities, being more likely to list Shampoo, Hair Dryer, Fire Extinguisher, First Aid Kit, 24-Hour Check-in, Safety Card, Breakfast, Free Parking on Premises and Indoor Fireplace; they also choose a moderate cancellation policy more often.
4. have pets living on the property more often, i.e. the Dog(s), Cat(s), Other Pet(s) and Pets live on this property amenities.
5. require a security deposit and a guest profile picture more often.
6. list properties of type House more often.

### 1.3 Compare superhost vs regular_host in Seattle

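```python
# Seattle: mean of each numeric column for regular hosts (0) vs superhosts (1)
df_seattle_groupby_host = (df_seattle.groupby('host_is_superhost_t')
                           .mean(numeric_only=True)
                           .T
                           .reset_index())
df_seattle_groupby_host.rename(columns={'index': 'column', 0: 'regular_host', 1: 'superhost'}, inplace=True)

# Percentage difference of the superhost mean relative to the regular-host mean
df_seattle_groupby_host['host_diff_%'] = ((df_seattle_groupby_host['superhost'] - df_seattle_groupby_host['regular_host'])
                                          / df_seattle_groupby_host['regular_host'] * 100)

# Keep columns where superhosts are more than 30% higher, ignoring neighbourhood columns
mask = ((df_seattle_groupby_host['host_diff_%'] > 30)
        & ~df_seattle_groupby_host['column'].str.startswith('neighbourhood'))
df_seattle_groupby_host[mask].sort_values(by='superhost', ascending=False)
```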

| | column | regular_host | superhost | host_diff_% |
|---:|---|---:|---:|---:|
| 25 | reviews_per_month | 1.865071 | 2.945044 | 57.905165 |
| 11 | extra_people | 0.435847 | 0.569767 | 30.726612 |
| 41 | amenity_First Aid Kit | 0.409722 | 0.556848 | 35.908553 |
| 56 | amenity_Pets live on this property | 0.210979 | 0.310078 | 46.970912 |
| 58 | amenity_Safety Card | 0.169643 | 0.270026 | 59.173127 |
| 44 | amenity_Hair Dryer | 0.189153 | 0.254522 | 34.558465 |
| 181 | instant_bookable_t | 0.13955 | 0.21447 | 53.686763 |
| 27 | amenity_24-Hour Check-in | 0.151124 | 0.200258 | 32.51234 |
| 34 | amenity_Dog(s) | 0.119048 | 0.186047 | 56.27907 |
| 185 | require_guest_phone_verification_t | 0.082672 | 0.157623 | 90.660465 |


### Discussion
In Seattle, compared with regular hosts, superhosts tend to:
1. receive more reviews per month (2.95 vs 1.87 on average, about 58% more).
2. charge for extra people more often.
3. offer more amenities, such as Doorman, Safety Card, First Aid Kit, Hair Dryer and 24-Hour Check-in, and make their listings instantly bookable more often.
4. have pets living on the property more often (Dog(s), Cat(s) and Pets live on this property).
5. require a guest profile picture and guest phone verification more often.
6. offer unusual property types more often, such as Yurt, Cabin, Treehouse, Camper/RV and Loft. Keep in mind, however, that only a small number of superhost listings use these property types, so the large percentage differences rest on few listings.
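The caveat in point 6 can be checked by putting absolute counts next to the shares. This is a minimal sketch, assuming the property-type columns are 0/1 dummies; the helper `feature_counts`, the column name `property_type_Yurt` and the toy data are hypothetical illustrations, not taken from the notebook's files.

```python
import pandas as pd


def feature_counts(df: pd.DataFrame, column: str,
                   flag: str = "host_is_superhost_t") -> pd.DataFrame:
    """Listings per host group, how many have `column` == 1, and the resulting share."""
    grouped = df.groupby(flag)[column]
    summary = pd.DataFrame({
        "listings": grouped.size(),     # number of listings in each group
        "with_feature": grouped.sum(),  # count of 1s in a 0/1 column
    })
    summary["share"] = summary["with_feature"] / summary["listings"]
    return summary.rename(index={0: "regular_host", 1: "superhost"})


# Hypothetical toy data: 10 regular-host and 4 superhost listings, one yurt in each group
toy = pd.DataFrame({
    "host_is_superhost_t": [0] * 10 + [1] * 4,
    "property_type_Yurt":  [1] + [0] * 9 + [1] + [0] * 3,
})

summary = feature_counts(toy, "property_type_Yurt")
print(summary)
```

Here the superhost share (0.25) is 2.5 times the regular-host share (0.10), a +150% difference, yet each group contains a single yurt listing.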