# Obtain Data

The main data source is TripAdvisor, a popular travel review website, which I scraped using Scrapy. I chose to scrape every hotel and resort review for Punta Cana, a Caribbean vacation destination that is rising in popularity.

The Scrapy spider crawled the site and wrote the scraped data to a JSON file, although the framework also supports sending items through an item pipeline into a MongoDB database.

Please see /src/tripdadvisor_reviews for the Scrapy source code.
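
For readers unfamiliar with item pipelines, here is a minimal sketch of the hooks Scrapy calls on a pipeline class. It is not the project's actual code: the class name `JsonLinesPipeline` and the default file name are assumptions made for illustration, and it writes JSON Lines rather than the single JSON file used later in this notebook.

```python
import json


class JsonLinesPipeline:
    """Hypothetical item pipeline that writes each scraped item as one JSON line."""

    def __init__(self, path="all_reviews.jl"):  # file name is an assumption
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Scrapy calls this once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Scrapy calls this once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # Write one JSON object per line, then hand the item to later pipelines.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

A MongoDB pipeline would use the same three hooks, opening a client in `open_spider`, inserting in `process_item`, and closing the client in `close_spider`. A pipeline only runs once it is listed in the project's `ITEM_PIPELINES` setting.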

# Scrub Data

This is an NLP project, so scrubbing the data follows a different path than it would in a supervised learning project.

After importing the data, scrubbing takes three steps:

1. Clean the data
2. Tokenize the data
3. Vectorize the data
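
To make the three steps concrete, here is a minimal sketch using only the standard library. The helper names `clean`, `tokenize`, and `vectorize` and the two sample sentences are made up for illustration. The notebook itself may use different cleaning rules or a library vectorizer.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and replace everything except letters and whitespace with spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def vectorize(docs):
    """Return a sorted vocabulary and one row of term counts per document."""
    counts = [Counter(tokenize(clean(doc))) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


docs = ["Great beach, great food!", "The food was average."]
vocab, matrix = vectorize(docs)
print(vocab)   # ['average', 'beach', 'food', 'great', 'the', 'was']
print(matrix)  # [[0, 1, 1, 2, 0, 0], [1, 0, 1, 0, 1, 1]]
```

Each row of `matrix` is a bag-of-words vector: position `i` holds how often `vocab[i]` occurs in that document.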

In [3]:
# Imports
import pandas as pd
import numpy as np
import json

In [38]:
# Decorator Functions
from functools import wraps

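def my_logger(orig_func):
    """Log the arguments of every call to '<function name>.log'."""
    import logging
    # basicConfig is a no-op once the root logger already has handlers, so
    # later decorated functions may keep logging to the first file.
    logging.basicConfig(filename='{}.log'.format(orig_func.__name__), level=logging.INFO)

    @wraps(orig_func)
    def wrapper(*args, **kwargs):
        logging.info(
            'Ran with args: {}, and kwargs: {}'.format(args, kwargs))
        return orig_func(*args, **kwargs)

    return wrapper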


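def my_timer(orig_func):
    """Print how long each call to the wrapped function takes."""
    import time

    @wraps(orig_func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock, suited to timing intervals
        result = orig_func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print('{} ran in: {} sec'.format(orig_func.__name__, elapsed))
        return result

    return wrapper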

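def remove_list(df):
    """Join each list-valued cell into a single comma-separated string."""
    df = df.copy()  # leave the caller's DataFrame untouched
    for feature in df.columns:
        df[feature] = df[feature].apply(','.join)
    return df

def rearrange(df):
    """Reorder the columns to hotel, title, content."""
    return df[["hotel","title","content"]]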
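
Both decorators above rely on `functools.wraps`. The sketch below shows what it buys: without it, a decorated function reports the wrapper's name, so `my_timer` would print `wrapper ran in ...` for any function that was already wrapped by another decorator. The decorators `plain` and `wrapped` and the two dummy functions are hypothetical names used only for this demonstration.

```python
from functools import wraps


def plain(func):
    # No @wraps: the returned function keeps the name "wrapper".
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def wrapped(func):
    # @wraps copies __name__, __doc__, and other metadata from func.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain
def load_a():
    """Load dataset A."""


@wrapped
def load_b():
    """Load dataset B."""


print(load_a.__name__, load_a.__doc__)  # wrapper None
print(load_b.__name__, load_b.__doc__)  # load_b Load dataset B.
```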

In [32]:
reviews = pd.read_json("../data/raw/all_reviews.json")

In [33]:
reviews

| | content | hotel | title |
|---|---|---|---|
| 0 | [Upon checking in, fruit juice was provided. C... | [Iberostar Punta Cana] | [Great service, very average food at buffets] |
| 1 | [I was very impressed with the resort. The oc... | [Iberostar Punta Cana] | [Birthday Trip] |
| 2 | [Nice hotel, great beach location, nice garden... | [Iberostar Punta Cana] | [Nice Hotel with best sports ever!] |
| 3 | [Here is an honest, real review., Checkin: 1/5... | [Iberostar Punta Cana] | [A real, honest review] |
| 4 | [Our party of 11 was well taken care by Julio ... | [Hotel Riu Palace Punta Cana] | [Relaxing, fun and no worries!] |
| 5 | [Firstly rooms don’t have air conditioning wit... | [Iberostar Punta Cana] | [Ok stay. Great time had but lots of issues....] |
| 6 | [Samuel Moron was one of the BEST wait staff p... | [BlueBay Grand Punta Cana] | [Service with a smile. "It's my pleasure". Wha... |
| 7 | [The resort was immaculate! The food was over ... | [Hotel Riu Palace Punta Cana] | [Our family vacation] |
| 8 | [We are currently staying at the Iberostar Pun... | [Iberostar Punta Cana] | [Horrible] |
| 9 | [the french and brasilian restaurant were amaz... | [BlueBay Grand Punta Cana] | [amazing] |


## Clean the text

Every scraped field arrives as a list of strings, so the data needs some preprocessing before it can be treated as text.

In [34]:
reviews = remove_list(reviews)

In [39]:
reviews = rearrange(reviews)

In [40]:
reviews

| | hotel | title | content |
|---|---|---|---|
| 0 | Iberostar Punta Cana | Great service, very average food at buffets | Upon checking in, fruit juice was provided. Ch... |
| 1 | Iberostar Punta Cana | Birthday Trip | I was very impressed with the resort. The oce... |
| 2 | Iberostar Punta Cana | Nice Hotel with best sports ever! | Nice hotel, great beach location, nice garden ... |
| 3 | Iberostar Punta Cana | A real, honest review | Here is an honest, real review.,Checkin: 1/5. ... |
| 4 | Hotel Riu Palace Punta Cana | Relaxing, fun and no worries! | Our party of 11 was well taken care by Julio O... |
| 5 | Iberostar Punta Cana | Ok stay. Great time had but lots of issues.... | Firstly rooms don’t have air conditioning with... |
| 6 | BlueBay Grand Punta Cana | Service with a smile. "It's my pleasure". What... | Samuel Moron was one of the BEST wait staff pe... |
| 7 | Hotel Riu Palace Punta Cana | Our family vacation | The resort was immaculate! The food was over t... |
| 8 | Iberostar Punta Cana | Horrible | We are currently staying at the Iberostar Punt... |
| 9 | BlueBay Grand Punta Cana | amazing | the french and brasilian restaurant were amazi... |
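
The flattening and reordering steps above can be reproduced on a toy frame. This is a sketch with made-up rows, not the scraped data, and it uses a column-wise `apply` rather than the notebook's helper functions.

```python
import pandas as pd

# Toy rows shaped like the scraped data: every cell is a list of strings.
raw = pd.DataFrame({
    "content": [["Nice hotel.", "Great beach."], ["Loved it."]],
    "hotel": [["Hotel A"], ["Hotel B"]],
    "title": [["Good stay"], ["Amazing"]],
})

# Join each list into one string, then reorder the columns.
flat = raw.apply(lambda col: col.map(",".join))
flat = flat[["hotel", "title", "content"]]

print(flat.loc[0, "content"])  # Nice hotel.,Great beach.
```

Joining on `","` leaves no space between fragments, as in row 3 of the output above (`review.,Checkin`). Joining on `" "` instead would be one way to avoid that, if the later tokenization step does not already handle it.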


# Explore Data

## CountVectorizer

## TF-IDF

## Reduce Dimensionality

# Modeling Data

## Word2Vec

# Interpret Data