This project extracts Airbnb reviews from MongoDB, uses TextBlob to compute a sentiment score from the review comments, and plots a scatterplot of the sentiment score against the review score, along with histograms of both. These plots are combined into a dashboard built with Dash, and additional graphs are created with Google Data Studio. Finally, the data is loaded into Google Cloud Storage.

In [1]:
!pip install dnspython

Collecting dnspython
  Downloading dnspython-2.1.0-py3-none-any.whl (241 kB)
[?25l[K     |█▍                              | 10 kB 22.4 MB/s eta 0:00:01[K     |██▊                             | 20 kB 27.3 MB/s eta 0:00:01[K     |████                            | 30 kB 14.0 MB/s eta 0:00:01[K     |█████▍                          | 40 kB 10.0 MB/s eta 0:00:01[K     |██████▊                         | 51 kB 7.2 MB/s eta 0:00:01[K     |████████▏                       | 61 kB 7.4 MB/s eta 0:00:01[K     |█████████▌                      | 71 kB 6.3 MB/s eta 0:00:01[K     |██████████▉                     | 81 kB 7.0 MB/s eta 0:00:01[K     |████████████▏                   | 92 kB 6.5 MB/s eta 0:00:01[K     |█████████████▌                  | 102 kB 6.8 MB/s eta 0:00:01[K     |███████████████                 | 112 kB 6.8 MB/s eta 0:00:01[K     |████████████████▎               | 122 kB 6.8 MB/s eta 0:00:01[K     |█████████████████▋              | 133 kB 6.8 MB/s eta 0:00:0

In [2]:
import dns

In [3]:
# Public IP address of this Colab runtime
 
!curl ipecho.net/plain

34.125.254.178

Restart the runtime, then create a new database user in MongoDB Atlas and add the IP address above to the cluster's network access list.

In [4]:
import pymongo
import pprint
import json
import warnings
import dns
import pandas as pd

In [5]:
warnings.filterwarnings('ignore')

In [6]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated


In [7]:
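from getpass import getpass  # prompt for the password instead of hard-coding credentials in the notebook
client = pymongo.MongoClient(f"mongodb+srv://colab:{getpass('MongoDB password: ')}@cluster0.oacne.mongodb.net/sample_airbnb?retryWrites=true&w=majority")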

In [8]:
db = client['sample_airbnb']
db

Database(MongoClient(host=['cluster0-shard-00-00.oacne.mongodb.net:27017', 'cluster0-shard-00-01.oacne.mongodb.net:27017', 'cluster0-shard-00-02.oacne.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-9qbpww-shard-0', ssl=True), 'sample_airbnb')

In [9]:
# Data frame of Airbnb listings with columns name, description, house_rules, property_type, review_scores and reviews
 
projection = {"name": 1, "description": 1, "house_rules": 1, "property_type": 1,
              "review_scores": {"review_scores_rating": 1}, "reviews": {"comments": 1}, "_id": 0}
df = pd.DataFrame(list(db.listingsAndReviews.find({}, projection)))
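The projection above uses MongoDB's nested form for embedded fields (`{"review_scores": {"review_scores_rating": 1}}`), which MongoDB 4.4 and later treats the same as dot notation. As a sketch, the hypothetical helper `to_dot_notation` below flattens that projection into the dot-notation form (`"review_scores.review_scores_rating": 1`, `"reviews.comments": 1`), which older servers also accept. It only rewrites the dictionary and does not connect to MongoDB.

```python
def to_dot_notation(projection, prefix=""):
    """Flatten a nested projection document into dot-notation keys (hypothetical helper)."""
    flat = {}
    for key, value in projection.items():
        path = prefix + key
        if isinstance(value, dict):
            # Descend into embedded-document projections such as {"review_scores": {...}}
            flat.update(to_dot_notation(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat


nested = {
    "name": 1, "description": 1, "house_rules": 1, "property_type": 1,
    "review_scores": {"review_scores_rating": 1},
    "reviews": {"comments": 1},
    "_id": 0,
}
dotted = to_dot_notation(nested)
print(dotted)
# With a live connection: db.listingsAndReviews.find({}, dotted)
```

For an array field such as `reviews`, `"reviews.comments": 1` keeps only the `comments` field of every element in the array.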

In [10]:
# Not all units have reviews
df.head(3)

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews
0,Catete's Colonial Big Hause Room B,"Old floor, intirely reformed, destined to rece...",,House,{'review_scores_rating': 80},[{'comments': 'A Beatriz foi bastante atencios...
1,Ótimo Apto proximo Parque Olimpico,Apartamento próximo ao centro dos Jogos Olímpi...,,Apartment,{},[]
2,Ribeira Charming Duplex,Fantastic duplex apartment with three bedrooms...,Make the house your home...,House,{'review_scores_rating': 89},[{'comments': 'A casa da Ana e do Gonçalo fora...


In [11]:
len(df)

5555

In [12]:
# The second listed unit has no reviews
df['reviews'][1]

[]

In [13]:
# The seventh listed unit has one review

df['reviews'][6]

[{'comments': "Zeynep was a most welcoming and generous host, with a gorgeous, comfortable flat - as advertised! The flat is light and spacious, kitchen well-equipped, bed comfortable (both beds actually), and bathroom clean, with great shower pressure. Zeynep prepared a note with key information about the flat, which was great to have for reference. I especially appreciated the ground coffee and coffee maker. The fact that there was a desk in the house made my stay all the more comfortable - I had a proper place to sit down at my computer.\r\n\r\nIt's clear that Zeynep has put a lot of care into making her flat a home - it's an awesome flat! \r\n\r\nZeynep lives a five min walk to the sea, with a great park along the water front. There are plenty of hip cafes and coffee shops in the neighborhood (Moda), all a short walk from the flat. And it's only a 15 mins walk to the Kadikoy ferry, which offers easy access to the rest of Istanbul."}]

In [14]:
# Number of distinct listing names; it is slightly lower than len(df), so some names are shared by several listings

df['name'].nunique()

5538

In [15]:
print('Number of different property types:', df['property_type'].nunique())
df['property_type'].unique()

Number of different property types: 36


array(['House', 'Apartment', 'Loft', 'Condominium', 'Serviced apartment',
       'Bed and breakfast', 'Guesthouse', 'Hostel', 'Treehouse',
       'Bungalow', 'Guest suite', 'Townhouse', 'Villa', 'Cabin', 'Other',
       'Farm stay', 'Chalet', 'Boutique hotel', 'Boat', 'Cottage',
       'Earth house', 'Aparthotel', 'Resort', 'Tiny house',
       'Nature lodge', 'Casa particular (Cuba)', 'Hotel', 'Barn', 'Hut',
       'Camper/RV', 'Heritage hotel (India)', 'Pension (South Korea)',
       'Campsite', 'Castle', 'Houseboat', 'Train'], dtype=object)

In [16]:
# Take a look at all reviews for the third unit
df['reviews'][2]

[{'comments': 'A casa da Ana e do Gonçalo foram o local escolhido para a passagem de ano com um grupo de amigos. Fomos super bem recebidos com uma grande simpatia e predisposição a ajudar com qualquer coisa que fosse necessário.\r\nA casa era ainda melhor do que parecia nas fotos, totalmente equipada, com mantas, aquecedor e tudo o que pudessemos precisar.\r\nA localização não podia ser melhor! Não há melhor do que acordar de manhã e ao virar da esquina estar a ribeira do Porto.'},
 {'comments': "We are french's students, we traveled some days in Porto, this space was good and we can cooking easly. It was rainning so we eard every time the water fall to the ground in the street when we sleeping. But It was cool and or was well received by Ana et Gonçalo"},
 {'comments': "We had a spledid time in the old centre of Porto.\r\nThe appartment is very well situated next to the old Ribeira square. It's perfect to have such an appartment to your disposal, you feel home, and have a place to rel

In [17]:
# Create a column for number of reviews per unit

df['rev_len'] = df['reviews'].apply(lambda review: len(review))
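`Series.str.len()` also works on list-valued cells, so the review count can be computed without a lambda. The sketch below uses a made-up three-row frame (`toy`) as a stand-in for `df` and also shows how the `> 0` and `> 1` thresholds differ when filtering on the count.

```python
import pandas as pd

# Made-up stand-in for df: each cell holds a list of review dicts
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "reviews": [[], [{"comments": "Great stay"}],
                [{"comments": "Nice"}, {"comments": "Clean"}]],
})

# Series.str.len() returns the length of each list-valued cell
toy["rev_len"] = toy["reviews"].str.len()

at_least_one = toy[toy["rev_len"] > 0]   # units with any review: B and C
more_than_one = toy[toy["rev_len"] > 1]  # units with two or more reviews: only C
print(toy["rev_len"].tolist())
print(at_least_one["name"].tolist(), more_than_one["name"].tolist())
```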

In [18]:
# 3 of the first 5 entries have at least one review, and 2 have more than one

df.head()

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len
0,Catete's Colonial Big Hause Room B,"Old floor, intirely reformed, destined to rece...",,House,{'review_scores_rating': 80},[{'comments': 'A Beatriz foi bastante atencios...,1
1,Ótimo Apto proximo Parque Olimpico,Apartamento próximo ao centro dos Jogos Olímpi...,,Apartment,{},[],0
2,Ribeira Charming Duplex,Fantastic duplex apartment with three bedrooms...,Make the house your home...,House,{'review_scores_rating': 89},[{'comments': 'A casa da Ana e do Gonçalo fora...,51
3,New York City - Upper West Side Apt,"Murphy bed, optional second bedroom available....",No smoking is permitted in the apartment. All ...,Apartment,{'review_scores_rating': 94},[{'comments': 'i had a really pleasant stay at...,70
4,Nice room in Barcelona Center,Hi! Cozy double bed room in amazing flat next...,,Apartment,{},[],0


In [19]:
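# Create a new data frame containing only units with more than one review
# (.copy() makes it an independent frame, so the column assignments below do not trigger SettingWithCopyWarning)

df_airbnb = df[df['rev_len'] > 1].copy()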

In [20]:
# Number of units with more than one review

len(df_airbnb)

3459

In [21]:
df_airbnb.head(7)

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len
2,Ribeira Charming Duplex,Fantastic duplex apartment with three bedrooms...,Make the house your home...,House,{'review_scores_rating': 89},[{'comments': 'A casa da Ana e do Gonçalo fora...,51
3,New York City - Upper West Side Apt,"Murphy bed, optional second bedroom available....",No smoking is permitted in the apartment. All ...,Apartment,{'review_scores_rating': 94},[{'comments': 'i had a really pleasant stay at...,70
8,Deluxe Loft Suite,Loft Suite Deluxe @ Henry Norman Hotel Located...,Guest must leave a copy of credit card with fr...,Apartment,{'review_scores_rating': 88},[{'comments': 'I could not have found a better...,5
10,Be Happy in Porto,Be Happy Apartment is an amazing space. Renova...,. No smoking inside the apartment. . Is forbid...,Loft,{'review_scores_rating': 97},[{'comments': 'Fábio has everything you can lo...,178
13,"Soho Cozy, Spacious and Convenient","Clean, fully furnish, Spacious 1 bedroom flat ...",,Apartment,{'review_scores_rating': 100},[{'comments': 'The host canceled this reservat...,3
15,Copacabana Apartment Posto 6,"The Apartment has a living room, toilet, bedro...",Entreguem o imóvel conforme receberam e respei...,Apartment,{'review_scores_rating': 98},"[{'comments': 'Bom, foi uma experiencia incrív...",70
17,Ocean View Waikiki Marina w/prkg,A short distance from Honolulu's billion dolla...,The general welfare and well being of all the ...,Condominium,{'review_scores_rating': 84},[{'comments': 'Our stay was excellent. The pl...,96


In [22]:
# Replace each review_scores dict with its review_scores_rating value (NaN where the rating is missing)

df_airbnb['review_scores'] = df_airbnb['review_scores'].apply(lambda scores: scores.get('review_scores_rating'))
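An alternative, sketched here under the assumption that the raw documents are still available as a list of dicts, is `pd.json_normalize`, which flattens nested dicts into dotted column names and leaves NaN where a document has no rating. The two documents below are made up and only mimic the shape of the projection output.

```python
import pandas as pd

# Made-up documents shaped like the MongoDB projection output
docs = [
    {"name": "A", "review_scores": {"review_scores_rating": 89}, "reviews": [{"comments": "Great"}]},
    {"name": "B", "review_scores": {}, "reviews": []},
]

# Nested dicts become dotted columns; the empty dict yields NaN; list values are kept as-is
flat = pd.json_normalize(docs)
print(flat[["name", "review_scores.review_scores_rating"]])
```

In the notebook this would look like `pd.json_normalize(list(db.listingsAndReviews.find({}, projection)))`, producing a `review_scores.review_scores_rating` column instead of a dict column.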

In [23]:
df_airbnb.head()

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len
2,Ribeira Charming Duplex,Fantastic duplex apartment with three bedrooms...,Make the house your home...,House,89.0,[{'comments': 'A casa da Ana e do Gonçalo fora...,51
3,New York City - Upper West Side Apt,"Murphy bed, optional second bedroom available....",No smoking is permitted in the apartment. All ...,Apartment,94.0,[{'comments': 'i had a really pleasant stay at...,70
8,Deluxe Loft Suite,Loft Suite Deluxe @ Henry Norman Hotel Located...,Guest must leave a copy of credit card with fr...,Apartment,88.0,[{'comments': 'I could not have found a better...,5
10,Be Happy in Porto,Be Happy Apartment is an amazing space. Renova...,. No smoking inside the apartment. . Is forbid...,Loft,97.0,[{'comments': 'Fábio has everything you can lo...,178
13,"Soho Cozy, Spacious and Convenient","Clean, fully furnish, Spacious 1 bedroom flat ...",,Apartment,100.0,[{'comments': 'The host canceled this reservat...,3


In [24]:
df_airbnb.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3459 entries, 2 to 5552
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   name           3459 non-null   object 
 1   description    3459 non-null   object 
 2   house_rules    3459 non-null   object 
 3   property_type  3459 non-null   object 
 4   review_scores  3451 non-null   float64
 5   reviews        3459 non-null   object 
 6   rev_len        3459 non-null   int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 216.2+ KB


In [25]:
# Convert house rules and reviews to string type so TextBlob can process them

df_airbnb['house_rules'] = df_airbnb['house_rules'].astype(str)
df_airbnb['reviews'] = df_airbnb['reviews'].astype(str)
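Converting the list of review dicts with `astype(str)` passes the whole Python representation to TextBlob, including the `{'comments': ...}` syntax and escaped `\r\n` sequences (visible in the raw outputs further down). A minimal sketch of an alternative, using a hypothetical helper `join_comments`, keeps only the comment text:

```python
def join_comments(reviews, sep="\n"):
    """Concatenate only the comment text from a list of review dicts (hypothetical helper)."""
    return sep.join(r.get("comments", "") for r in reviews)


sample = [{"comments": "Great location."}, {"comments": "Very clean flat."}]
print(join_comments(sample))  # plain prose, one comment per line
print(str(sample))            # what astype(str) feeds to TextBlob instead
```

It would have to be applied to the original list-valued column, for example `df.loc[df_airbnb.index, 'reviews'].apply(join_comments)` stored in a new, hypothetical `reviews_text` column, before any string conversion. The resulting sentiment scores would differ somewhat from the ones reported below.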

In [26]:
# Create a column for sentiment analysis
# The polarity score is listed first and ranges from -1 to 1, with -1 being the most negative sentiment and 1 the most positive
# The subjectivity score is listed second and ranges from 0 to 1, with 0 implying that the statement is factual and 1 implying a highly subjective statement
from textblob import TextBlob

df_airbnb['sen_analysis'] = df_airbnb['reviews'].apply(lambda reviews : TextBlob(reviews).sentiment)

In [27]:
df_airbnb.head()

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len,sen_analysis
2,Ribeira Charming Duplex,Fantastic duplex apartment with three bedrooms...,Make the house your home...,House,89.0,[{'comments': 'A casa da Ana e do Gonçalo fora...,51,"(0.3490009920634923, 0.5599203296703297)"
3,New York City - Upper West Side Apt,"Murphy bed, optional second bedroom available....",No smoking is permitted in the apartment. All ...,Apartment,94.0,"[{'comments': ""i had a really pleasant stay at...",70,"(0.36191643882433294, 0.5949511063984751)"
8,Deluxe Loft Suite,Loft Suite Deluxe @ Henry Norman Hotel Located...,Guest must leave a copy of credit card with fr...,Apartment,88.0,[{'comments': 'I could not have found a better...,5,"(0.22561573178594457, 0.5468181818181818)"
10,Be Happy in Porto,Be Happy Apartment is an amazing space. Renova...,. No smoking inside the apartment. . Is forbid...,Loft,97.0,[{'comments': 'Fábio has everything you can lo...,178,"(0.38881534146580443, 0.6138009599649817)"
13,"Soho Cozy, Spacious and Convenient","Clean, fully furnish, Spacious 1 bedroom flat ...",,Apartment,100.0,[{'comments': 'The host canceled this reservat...,3,"(0.39999999999999997, 0.45)"


In [28]:
# Convert the sentiment analysis from a tuple into a list

df_airbnb['sen_analysis'] = df_airbnb['sen_analysis'].apply(lambda review : list(review))

In [29]:
df_airbnb.head(3)

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len,sen_analysis
2,Ribeira Charming Duplex,Fantastic duplex apartment with three bedrooms...,Make the house your home...,House,89.0,[{'comments': 'A casa da Ana e do Gonçalo fora...,51,"[0.3490009920634923, 0.5599203296703297]"
3,New York City - Upper West Side Apt,"Murphy bed, optional second bedroom available....",No smoking is permitted in the apartment. All ...,Apartment,94.0,"[{'comments': ""i had a really pleasant stay at...",70,"[0.36191643882433294, 0.5949511063984751]"
8,Deluxe Loft Suite,Loft Suite Deluxe @ Henry Norman Hotel Located...,Guest must leave a copy of credit card with fr...,Apartment,88.0,[{'comments': 'I could not have found a better...,5,"[0.22561573178594457, 0.5468181818181818]"


In [30]:
# Extract the polarity (negative/positive) score from the sentiment analysis into a new column called comments_score

df_airbnb['comments_score'] = df_airbnb['sen_analysis'].apply(lambda review : review[0])
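TextBlob's `sentiment` result is a namedtuple with `polarity` and `subjectivity` fields, so the polarity can be read by name without the tuple-to-list step. The sketch below uses a local `Sentiment` namedtuple as a stand-in for TextBlob's (values copied from the output above); on the real column this would be `df_airbnb['sen_analysis'].apply(lambda s: s.polarity)`, applied before the list conversion.

```python
from collections import namedtuple

# Stand-in for TextBlob's Sentiment namedtuple
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])
results = [Sentiment(0.3490009920634923, 0.5599203296703297),
           Sentiment(0.39999999999999997, 0.45)]

# Read the polarity field by name instead of converting to a list and indexing [0]
polarities = [s.polarity for s in results]
print(polarities)
```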

In [32]:
df_airbnb.head()

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len,sen_analysis,comments_score
2,Ribeira Charming Duplex,Fantastic duplex apartment with three bedrooms...,Make the house your home...,House,89.0,[{'comments': 'A casa da Ana e do Gonçalo fora...,51,"[0.3490009920634923, 0.5599203296703297]",0.349001
3,New York City - Upper West Side Apt,"Murphy bed, optional second bedroom available....",No smoking is permitted in the apartment. All ...,Apartment,94.0,"[{'comments': ""i had a really pleasant stay at...",70,"[0.36191643882433294, 0.5949511063984751]",0.361916
8,Deluxe Loft Suite,Loft Suite Deluxe @ Henry Norman Hotel Located...,Guest must leave a copy of credit card with fr...,Apartment,88.0,[{'comments': 'I could not have found a better...,5,"[0.22561573178594457, 0.5468181818181818]",0.225616
10,Be Happy in Porto,Be Happy Apartment is an amazing space. Renova...,. No smoking inside the apartment. . Is forbid...,Loft,97.0,[{'comments': 'Fábio has everything you can lo...,178,"[0.38881534146580443, 0.6138009599649817]",0.388815
13,"Soho Cozy, Spacious and Convenient","Clean, fully furnish, Spacious 1 bedroom flat ...",,Apartment,100.0,[{'comments': 'The host canceled this reservat...,3,"[0.39999999999999997, 0.45]",0.4


In [33]:
# Pearson correlation between the comments score and the review score

corr1 = df_airbnb['comments_score'].corr(df_airbnb['review_scores'],method='pearson', min_periods=1)
print(corr1)

0.3387435966950198
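A correlation of about 0.34 indicates a moderate positive linear relationship. For reference, the sketch below computes the Pearson coefficient by hand, dropping a pair whenever either value is missing, which is how `Series.corr` treats the 8 rows of `df_airbnb` without a review score. The input values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    def present(v):
        return v is not None and not (isinstance(v, float) and math.isnan(v))

    # Drop a pair if either value is missing, as Series.corr does
    pairs = [(x, y) for x, y in zip(xs, ys) if present(x) and present(y)]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)


# Made-up values; the last review score is missing
comments = [0.10, 0.25, 0.40, 0.35]
ratings = [80.0, 90.0, 100.0, float("nan")]
r = pearson(comments, ratings)
print(r)
```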


In [34]:
# Scatterplot between comments scores and review scores
 
import plotly.express as px
 
fig = px.scatter(df_airbnb, x="comments_score", y="review_scores")
fig.update_layout(title='Scatterplot between comments scores and review scores')
fig.show()

In [35]:
# Histogram of comments_score

fig2 = px.histogram(df_airbnb, x="comments_score")
fig2.update_layout(title='Histogram of comments_score')
fig2.show()

In [36]:
# Histogram of review_scores

fig3 = px.histogram(df_airbnb, x="review_scores")
fig3.update_layout(title='Histogram of review_scores')
fig3.show()

In [None]:
# Dashboard with Dash
 
# Scatterplot between comments scores and review scores
# Histogram of comments_score
# Histogram of review_scores
 
!pip install jupyter-dash
 
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html
 
app = JupyterDash(__name__)
 
app.layout = html.Div([
    dcc.Graph(figure=fig),
    dcc.Graph(figure=fig2),
    dcc.Graph(figure=fig3)
])
 
app.run_server(mode='external')

In [None]:
# The following cells use ngrok to share the dashboard at a public URL
 
! pip install pyngrok

In [None]:
# Use your own token from the ngrok dashboard; avoid saving a real token in a shared notebook
! ngrok authtoken YOUR_NGROK_AUTHTOKEN

Authtoken saved to configuration file: /root/.ngrok2/ngrok.yml


In [None]:
from pyngrok import ngrok
 
# Open an HTTP tunnel to the Dash server running on port 8050
public_url = ngrok.connect(addr='8050')

In [None]:
# Public URL for sharing the dashboard
 
public_url

<NgrokTunnel: "http://fca1e3715085.ngrok.io" -> "http://localhost:8050">

In [None]:
ngrok.kill()

In [37]:
# Listings with a perfect comments score (1.0) and a perfect review score (100)

df_airbnb[(df_airbnb['comments_score']==1) & (df_airbnb['review_scores']==100)]

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len,sen_analysis,comments_score
251,Beautiful in Ipanema for your Family,Just family.,"REGRAS O não cumprimento das regras, listadas...",Apartment,100.0,[{'comments': 'La estadía fue hermosa. Viajamo...,4,"[1.0, 1.0]",1.0
2069,Sydney CBD Unique Loft Apartment with WiFi,You will be comfortable in this unique warehou...,- Quiet hours from 11pm to 7am.,Apartment,100.0,[{'comments': 'Best that you could ever asked ...,3,"[1.0, 0.3]",1.0
2657,Private room at Barra / Quarto privativo na Barra,"Private room located in Barra da Tijuca, only ...",,Condominium,100.0,[{'comments': 'Foi uma excelente estadia! Volt...,2,"[1.0, 0.6666666666666666]",1.0
3049,Cozy Luxury Apartment Downtown MTL,*****REVEW: There are 4 pools and great outdoo...,,Condominium,100.0,"[{'comments': 'Excellent emplacement, hôte trè...",2,"[1.0, 1.0]",1.0


In [38]:
df_airbnb[(df_airbnb['comments_score']==1) & (df_airbnb['review_scores']==100)]['reviews']

251     [{'comments': 'La estadía fue hermosa. Viajamo...
2069    [{'comments': 'Best that you could ever asked ...
2657    [{'comments': 'Foi uma excelente estadia! Volt...
3049    [{'comments': 'Excellent emplacement, hôte trè...
Name: reviews, dtype: object

In [39]:
df_airbnb['reviews'][2657]

"[{'comments': 'Foi uma excelente estadia! Voltarei mais vezes! Paula é um pessoa incrível, alto astral, super prestativa e acolhedora! Me senti em casa. O apartamento é lindo e muito limpo e organizado! Recomendo a estadia!'}, {'comments': 'Paula e sua irmã foram prestativas pra me ajudar em tudo. Paula me mostrou as redondezas, restaurantes, mercado, salão (rssss) tudo em volta. Facilitou muito a minha vida!!! O quarto é muuuito limpinho e aconchegante, o chuveiro é uma delicia!'}]"

In [40]:
# Listings with a negative comments score (below -0.2) but a high review score (above 70)

df_airbnb[(df_airbnb['comments_score'] <-0.2) & (df_airbnb['review_scores']>70)]

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len,sen_analysis,comments_score
2786,Tranquillité en ville près de tous,Joli appartement dans un quartier calme mais d...,,Condominium,100.0,"[{'comments': 'Location extrêmement propre, tr...",3,"[-0.2296875, 0.85]",-0.229687
2886,Ruby Charm Houses 7,Acesso a toda a área comum: -Lavandaria -Barbe...,Silêncio após as 22h00,House,100.0,[{'comments': 'El apartamento está totalmente ...,3,"[-0.4069010416666666, 0.3333333333333333]",-0.406901
4962,2 BedRooms Apartment with Terrace #1,"The apartment is new, with 1 room with twin be...","No parties, not noise... This is a residential...",Apartment,80.0,"[{'comments': ""Merci à Eduardo et son équipe p...",3,"[-0.25, 0.6]",-0.25
5395,Triple studio apartment in Taksim (K5),Welcome to the Hotel Element Taksim... Hotel E...,"Сheck-in at 14:00, Check out at 12:00. Every 3...",Serviced apartment,75.0,[{'comments': 'Отличное место! Очень рекоменду...,4,"[-0.23333333333333325, 0.5472222222222222]",-0.233333


In [43]:
df_airbnb['reviews'][2886]

'[{\'comments\': \'El apartamento está totalmente reformado.\\nTiene aire acondicionado y calefacción.\\nUna gran ducha con hidromasaje.\\n\\nNos dejaron cosas para desayunar, y vino típico de oporto.\\nTiene una zona chill out chulisima.\\n\\nLimpísimo y súper bien ubicado ya que había parada de metro a 300 metros.\\n\\nLo recomiendo 100%.\'}, {\'comments\': "Maria, Miguel et leur fils sont des hôtes (Website hidden by Airbnb) d\'une générosité, d\'une sympathie inimaginable....les appartements quand à eux sont tout simplement parfaits, décorés avec énormément de goût, l\'endroit est encore plus beau que les photos, tout est prévu pour un séjour inoubliable. Le quartier est parfait pour partir visiter Porto à pied mais aussi à 5 minutes des bus, du métro, des commerces.\\n RUBY CHARM HOUSES est l\'adresse à ne surtout pas rater pour réussir son séjour à Porto !!!"}, {\'comments\': "Logement parfait!\\nNous avons passé 8 jours exceptionnels chez Maria & Miguel.\\nIls sont très gentils,

In [45]:
# Listings with a high comments score (above 0.6) but a low review score (below 80)

df_airbnb[(df_airbnb['comments_score']>0.6) & (df_airbnb['review_scores']<80)]

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len,sen_analysis,comments_score
720,Twinbeds room with shared bathroom,"我的房源靠近適合家庭的活動､市中心､夜生活｡因為舒適的床和溫馨,您一定會愛上我的房源｡我的房...",大廈內設有24小時保安 住客24小時都可辦理入住手續(24 hr Check-in) 禁止...,Hostel,76.0,[{'comments': '가격대비 만족 합니다. 위치가 바로 몽콕 야시장과 가까워...,5,"[0.75, 0.8194444444444444]",0.75
1987,Chammbre lit queen,Chambre dans un 61/2 en collocation. Apparteme...,On peu fumer sur les balcons,Apartment,60.0,[{'comments': 'The host canceled this reservat...,2,"[0.8666666666666667, 1.0]",0.866667
2319,Tranquillità,Apartment is situated in Causeway Bay is conve...,,Apartment,70.0,[{'comments': 'Great location and great value ...,2,"[0.8, 0.75]",0.8
2869,Queen Yataklı Oda & Queen Room,Queen Yataklı Oda & Queen Room,- Gece 01:00 - Sabah 10:00,Hotel,70.0,"[{'comments': '.'}, {'comments': 'Very good lo...",2,"[0.9099999999999999, 0.7800000000000001]",0.91
3247,Lovely Room in the Heart of the Plateau,This charming 2 bedroom apartment is located i...,"- Must be respectful of Kieran, the summer ten...",Apartment,70.0,[{'comments': 'Appartement très bien situé à M...,2,"[1.0, 1.0]",1.0
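The two mismatch filters above can be bundled into one function so the thresholds are adjustable in one place. This is a sketch with a hypothetical helper `score_mismatches`; its defaults are the thresholds used in the two cells above, and the toy rows reuse rounded values from the outputs.

```python
import pandas as pd


def score_mismatches(frame, pos_polarity=0.6, low_rating=80, neg_polarity=-0.2, high_rating=70):
    """Return (positive text / low rating, negative text / high rating) subsets (hypothetical helper)."""
    positive_text_low_rating = frame[(frame["comments_score"] > pos_polarity)
                                     & (frame["review_scores"] < low_rating)]
    negative_text_high_rating = frame[(frame["comments_score"] < neg_polarity)
                                      & (frame["review_scores"] > high_rating)]
    return positive_text_low_rating, negative_text_high_rating


# Toy rows with rounded values from the outputs above
toy = pd.DataFrame({"comments_score": [0.87, -0.41, 0.36],
                    "review_scores": [60.0, 100.0, 94.0]},
                   index=[1987, 2886, 3])
pos_low, neg_high = score_mismatches(toy)
print(pos_low.index.tolist(), neg_high.index.tolist())
```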


In [46]:
# A comments score of about 0.87 seems high given the reviews here

df_airbnb['reviews'][1987]

"[{'comments': 'The host canceled this reservation 27 days before arrival. This is an automated posting.'}, {'comments': 'Wonderful place, wonderful location, really nice people, but the internet was a problem.  Lots of interference, it seems.'}]"

In [49]:
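# Score each review on its own. TextBlob averages polarity over the opinion words found in the whole text,
# so reviews with more opinion words carry more weight, and the automated cancellation notice (no opinion words) adds nothing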
print(TextBlob('The host canceled this reservation 27 days before arrival.').sentiment)
print(TextBlob('Wonderful place, wonderful location, really nice people, but the internet was a problem. Lots of interference, it seems.').sentiment)

Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=0.8666666666666667, subjectivity=1.0)
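These outputs are consistent with TextBlob's default analyzer averaging polarity over all opinion words in the text rather than averaging per review: the cancellation notice scores 0.0 and leaves the combined score equal to the second review's 0.87. The toy scorer below illustrates that behaviour with a hypothetical three-word lexicon (`TOY_LEXICON`, not TextBlob's real lexicon, and without its intensifier and negation handling).

```python
# Hypothetical word scores for illustration only
TOY_LEXICON = {"wonderful": 1.0, "nice": 0.6, "problem": -0.2}


def toy_polarity(text):
    """Average the scores of the lexicon words found anywhere in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


cancel = "The host canceled this reservation 27 days before arrival."
review = "Wonderful place, wonderful location, really nice people, but the internet was a problem."

combined = toy_polarity(cancel + " " + review)   # same as the review alone
per_review_mean = (toy_polarity(cancel) + toy_polarity(review)) / 2  # half as large
print(toy_polarity(cancel), toy_polarity(review), combined, per_review_mean)
```

Scoring each review separately and then averaging, as `per_review_mean` does, would be one way to give every review equal weight.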


In [50]:
# Create a column with the sentiment analysis of the house rules
df_airbnb['sen_house_rules'] = df_airbnb['house_rules'].apply(lambda rules : TextBlob(rules).sentiment)

In [51]:
# Converts the sen_house_rules from a tuple into a list

df_airbnb['sen_house_rules'] = df_airbnb['sen_house_rules'].apply(lambda review : list(review))

# Keep only the polarity (negative/positive) score of sen_house_rules

df_airbnb['sen_house_rules'] = df_airbnb['sen_house_rules'].apply(lambda review : review[0])

In [52]:
df_airbnb.head(3)

Unnamed: 0,name,description,house_rules,property_type,review_scores,reviews,rev_len,sen_analysis,comments_score,sen_house_rules
2,Ribeira Charming Duplex,Fantastic duplex apartment with three bedrooms...,Make the house your home...,House,89.0,[{'comments': 'A casa da Ana e do Gonçalo fora...,51,"[0.3490009920634923, 0.5599203296703297]",0.349001,0.0
3,New York City - Upper West Side Apt,"Murphy bed, optional second bedroom available....",No smoking is permitted in the apartment. All ...,Apartment,94.0,"[{'comments': ""i had a really pleasant stay at...",70,"[0.36191643882433294, 0.5949511063984751]",0.361916,0.0
8,Deluxe Loft Suite,Loft Suite Deluxe @ Henry Norman Hotel Located...,Guest must leave a copy of credit card with fr...,Apartment,88.0,[{'comments': 'I could not have found a better...,5,"[0.22561573178594457, 0.5468181818181818]",0.225616,0.0


In [53]:
# No correlation between the comments score and the number of reviews per unit (rev_len)

corr2 = df_airbnb['comments_score'].corr(df_airbnb['rev_len'],method='pearson', min_periods=1)
print(corr2)

-0.004114809810730125


In [54]:
fig4 = px.scatter(df_airbnb, x="comments_score", y="rev_len")
fig4.show()

In [55]:
# Save df_airbnb to Google Drive
 
from google.colab import drive
drive.mount('drive')

df_airbnb.to_csv('airbnb_reviews.csv')
!cp airbnb_reviews.csv "drive/My Drive/datasets/"

Mounted at drive
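Because `to_csv` writes the index and stringifies list-valued cells such as `sen_analysis`, reading the file back needs two adjustments: `index_col=0` restores the index (otherwise it appears as an `Unnamed: 0` column), and `ast.literal_eval` turns the stored strings back into lists. A minimal round-trip sketch using an in-memory buffer in place of `airbnb_reviews.csv`:

```python
import ast
import io

import pandas as pd

# Small frame with a list-valued column like sen_analysis (values from the outputs above)
frame = pd.DataFrame({"sen_analysis": [[0.3490009920634923, 0.5599203296703297], [0.4, 0.45]]},
                     index=[2, 13])

buffer = io.StringIO()
frame.to_csv(buffer)  # the index is written as an unnamed first column
buffer.seek(0)

# index_col=0 restores the index; literal_eval parses "[0.349..., 0.559...]" back into a list
restored = pd.read_csv(buffer, index_col=0, converters={"sen_analysis": ast.literal_eval})
print(restored)
```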


In [56]:
project_id = 'sixth-flag-316719'

In [57]:
!gcloud config set project {project_id}

# Bucket names must be globally unique across all Google Cloud projects
bucket_name = 'mongodb_airbnb_analysis'

!gsutil mb gs://{bucket_name}

!gsutil cp airbnb_reviews.csv gs://{bucket_name}/

Updated property [core/project].



Creating gs://mongodb_airbnb_analysis/...
ServiceException: 409 A Cloud Storage bucket named 'mongodb_airbnb_analysis' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.
Copying file://airbnb_reviews.csv [Content-Type=text/csv]...
Operation completed over 1 objects/49.8 MiB.                                     
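The `409` above means the bucket name is already taken; the upload still succeeded here, which suggests the bucket already belonged to this project. For a fresh project, one option is a random suffix. This is a sketch with a hypothetical helper `unique_bucket_name`; the regular expression checks only the basic naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit), not every Cloud Storage constraint.

```python
import re
import uuid

# Simplified check of the basic Cloud Storage bucket naming rules
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def unique_bucket_name(prefix):
    """Return the prefix plus a random 12-character hex suffix (hypothetical helper)."""
    name = f"{prefix}-{uuid.uuid4().hex[:12]}"
    if not BUCKET_NAME_RE.match(name):
        raise ValueError(f"invalid bucket name: {name}")
    return name


name = unique_bucket_name("mongodb_airbnb_analysis")
print(name)
```

The result would replace the fixed `bucket_name` before running `gsutil mb gs://{bucket_name}`.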


Data Studio link
https://datastudio.google.com/reporting/d1c399f4-08df-49dc-9dff-51ff5d9f851a