# Prediction Modelling for Hotel Ratings from Reviews, Reviewer Nationality and Hotel Location



## Business Problem

#### Goals
- Determine a relationship between reviews and the rating a reviewer gives.
- Determine whether reviewer nationality affects ratings.
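A first check for the nationality goal is the mean score for each nationality. The sketch below does this in plain Python on the `Reviewer_Nationality` and `Reviewer_Score` values from the four rows in the data preview. The helper name is hypothetical. On the full DataFrame, the pandas equivalent would be `hotels.groupby('Reviewer_Nationality')['Reviewer_Score'].mean()`. Four rows cannot show an effect, so the point here is only the grouping step.

```python
from collections import defaultdict
from statistics import mean

# (Reviewer_Nationality, Reviewer_Score) pairs from the four preview rows
rows = [
    ("Russia", 2.9),
    ("Ireland", 7.5),
    ("Australia", 7.1),
    ("United Kingdom", 3.8),
]


def mean_score_by_nationality(pairs):
    """Group reviewer scores by nationality and return the mean score per group."""
    groups = defaultdict(list)
    for nationality, score in pairs:
        # Strip surrounding whitespace so ' Russia ' and 'Russia' share a group
        groups[nationality.strip()].append(score)
    return {nat: mean(scores) for nat, scores in groups.items()}


print(mean_score_by_nationality(rows))
```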

#### Other Impacts on Review Ratings
- Stay Duration
- Reviewer Nationality
- Location of the Hotel
- Number of Reviews
- Tags associated with the Trip
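Stay duration is not a column of its own, so it has to be derived. A minimal sketch, assuming the `Tags` column stores a Python list literal as a string (as the preview suggests) and that stay length appears as a tag of the form `' Stayed N nights '`. The helper names and the example tag string are hypothetical.

```python
import ast
import re

# Hypothetical Tags value: a string holding a Python list literal.
# The ' Stayed N nights ' entry is an assumption about how stay length is tagged.
example_tags = "[' Leisure trip ', ' Couple ', ' Stayed 6 nights ']"

# Matches both 'Stayed 1 night' and 'Stayed 6 nights'
STAY_PATTERN = re.compile(r"Stayed (\d+) nights?")


def parse_tags(raw):
    """Turn the string form of a list into a list of whitespace-trimmed tags."""
    return [tag.strip() for tag in ast.literal_eval(raw)]


def stay_duration(raw):
    """Return the number of nights from a 'Stayed N night(s)' tag, or None if absent."""
    for tag in parse_tags(raw):
        match = STAY_PATTERN.fullmatch(tag)
        if match:
            return int(match.group(1))
    return None


print(parse_tags(example_tags))
print(stay_duration(example_tags))
```

`ast.literal_eval` is used instead of `eval` because it only accepts Python literals, so a malformed cell raises an error rather than executing code.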

#### Why?
- This could enable review apps and websites to pre-fill a rating based on the text of a review.
- Help hotels identify improvements based on their reviews.


## Risks and Limitations
- Reviews are assumed to be honest.
- Each hotel is treated as constant over time.
- A low score is assumed to be due solely to the factors discussed in the written review.
- Each review is treated as independent of the others.
- External factors are ignored.

### Data Access
Data was obtained from [Kaggle](https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe)

## The Review Data

#### Acknowledgements
The data was scraped from Booking.com by [Jason Liu](https://www.kaggle.com/jiashenliu)

#### Data Context
- 515,000 customer reviews
- 1493 luxury hotels within Europe

#### Data Content
- **Hotel_Address**: Address of hotel.
- **Review_Date:** Date when reviewer posted the corresponding review.
- **Average_Score:** Average Score of the hotel, calculated based on the latest comment in the last year.
- **Hotel_Name:** Name of Hotel
- **Reviewer_Nationality:** Nationality of Reviewer
- **Negative_Review:** Negative Review the reviewer gave to the hotel. If the reviewer does not give the negative review, then it should be: 'No Negative'
- **Review_Total_Negative_Word_Counts:** Total number of words in the negative review.
- **Positive_Review:** Positive Review the reviewer gave to the hotel. If the reviewer does not give the positive review, then it should be: 'No Positive'
- **Review_Total_Positive_Word_Counts:** Total number of words in the positive review.
- **Reviewer_Score:** Score the reviewer has given to the hotel, based on his/her experience
- **Total_Number_of_Reviews_Reviewer_Has_Given:** Number of reviews the reviewer has given in the past.
- **Total_Number_of_Reviews:** Total number of valid reviews the hotel has.
- **Tags:** Tags the reviewer gave the hotel.
- **days_since_review:** Duration between the review date and scrape date.
- **Additional_Number_of_Scoring:** Some guests only gave a score for the service rather than writing a review. This number indicates how many valid scores without a review the hotel has.
- **lat:** Latitude of the hotel
- **lng:** Longitude of the hotel
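Two of these columns need light parsing before modelling: the review text columns use the placeholders 'No Negative' and 'No Positive' when one side of the review is empty, and `days_since_review` holds strings such as '0 days'. A minimal sketch under those assumptions; the helper names are hypothetical, and accepting a singular 'day' is a defensive guess rather than something the preview shows.

```python
import re

# Placeholder text used when the reviewer left one side of the review empty
PLACEHOLDERS = {"Negative_Review": "No Negative", "Positive_Review": "No Positive"}


def clean_review(column, text):
    """Return '' for the column's placeholder, otherwise the whitespace-stripped text."""
    text = text.strip()
    return "" if text == PLACEHOLDERS[column] else text


def parse_days_since_review(value):
    """Convert strings such as '0 days' or '3 days' to an integer number of days."""
    match = re.fullmatch(r"(\d+) days?", value.strip())
    if match is None:
        raise ValueError(f"unexpected days_since_review value: {value!r}")
    return int(match.group(1))


print(clean_review("Negative_Review", "No Negative"))
print(parse_days_since_review("3 days"))
```

Mapping the placeholders to empty strings keeps them from being treated as real review text (for example, 'No Negative' contains the word 'Negative').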

```python
# Script Name: EDA of Hotel Reviews Data
# Author: Rahul Kumar
# Date: 2-Jan-20
# Description: Clean up the data in preparation for modelling

import pandas as pd
import numpy as np
from math import sqrt
import seaborn as sns
import scipy as sp
from textblob import TextBlob, Word
from nltk.stem.snowball import SnowballStemmer

# pycountry provides ISO 3166 country data; intended here for
# identifying countries mentioned in a line of text
import pycountry

import matplotlib.pyplot as plt
%matplotlib inline

# Show all rows and columns when displaying a DataFrame,
# instead of truncating the middle
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Load the Kaggle hotel reviews CSV (path relative to the notebook)
hotels = pd.read_csv('../Hotel_Reviews.csv')
```
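The `pycountry` import is meant for pulling a country out of a line of text. A minimal stdlib sketch of that idea, assuming `Hotel_Address` ends with the country name. The country list, helper name and example address are hypothetical; a full list could instead be built from `pycountry.countries`, whose entries expose a `.name` attribute.

```python
# Hypothetical, illustrative country list; a complete one could come from pycountry
COUNTRY_NAMES = ["Netherlands", "United Kingdom", "France", "Spain", "Italy", "Austria"]


def country_from_address(address, countries=COUNTRY_NAMES):
    """Return the country name that the address ends with, or None if none matches."""
    address = address.strip()
    # Try longer names first so multi-word names are matched before shorter ones
    for name in sorted(countries, key=len, reverse=True):
        if address.endswith(name):
            return name
    return None


# Hypothetical address, assumed to follow the 'street ... city country' layout
print(country_from_address("1 Example Street London W1 United Kingdom"))
```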

### Preview of the Data
- Note: `Reviewer_Score` is the label to be predicted.

```python
hotels.head(4)
```

| | Hotel_Address | Additional_Number_of_Scoring | Review_Date | Average_Score | Hotel_Name | Reviewer_Nationality | Negative_Review | Review_Total_Negative_Word_Counts | Total_Number_of_Reviews | Positive_Review | Review_Total_Positive_Word_Counts | Total_Number_of_Reviews_Reviewer_Has_Given | Reviewer_Score | Tags | days_since_review | lat | lng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 8/3/2017 | 7.7 | Hotel Arena | Russia | I am so angry that i made this post available... | 397 | 1403 | Only the park outside of the hotel was beauti... | 11 | 7 | 2.9 | [' Leisure trip ', ' Couple ', ' Duplex Double... | 0 days | 52.360576 | 4.915968 |
| 1 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 8/3/2017 | 7.7 | Hotel Arena | Ireland | No Negative | 0 | 1403 | No real complaints the hotel was great great ... | 105 | 7 | 7.5 | [' Leisure trip ', ' Couple ', ' Duplex Double... | 0 days | 52.360576 | 4.915968 |
| 2 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 7/31/2017 | 7.7 | Hotel Arena | Australia | Rooms are nice but for elderly a bit difficul... | 42 | 1403 | Location was good and staff were ok It is cut... | 21 | 9 | 7.1 | [' Leisure trip ', ' Family with young childre... | 3 days | 52.360576 | 4.915968 |
| 3 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 7/31/2017 | 7.7 | Hotel Arena | United Kingdom | My room was dirty and I was afraid to walk ba... | 210 | 1403 | Great location in nice surroundings the bar a... | 26 | 1 | 3.8 | [' Leisure trip ', ' Solo traveler ', ' Duplex... | 3 days | 52.360576 | 4.915968 |
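As a first look at the relationship between review text and rating, the sketch below computes the Pearson correlation between `Review_Total_Negative_Word_Counts` and `Reviewer_Score` for the four preview rows. The helper is hypothetical; on the full DataFrame, `hotels['Review_Total_Negative_Word_Counts'].corr(hotels['Reviewer_Score'])` computes the same statistic (Pearson is the pandas default). On these rows the coefficient comes out strongly negative, but four rows are far too few to support any conclusion about the full dataset.

```python
from math import sqrt

# Review_Total_Negative_Word_Counts and Reviewer_Score from the four preview rows
negative_word_counts = [397, 0, 42, 210]
reviewer_scores = [2.9, 7.5, 7.1, 3.8]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


print(round(pearson_r(negative_word_counts, reviewer_scores), 3))
```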


## Exploratory Data Analysis