# Amtrak Reviews Analysis

## Executive Summary

The goal of this analysis is to examine customer reviews of Amtrak's Northeast Regional service between Washington, D.C. and Boston, gauge how customers feel about the service, and recommend where Amtrak should invest to improve the customer experience. By scraping reviews from Yelp, Trustpilot, Viewpoints.com, and Sitejabber, I aim to understand what riding Amtrak is like today and what the company can do better to help bring back train travel in America!

## Data Scraping & Dataset Formation

In [1]:
from scrape import add_reviews_to_dict, print_review_count
import pandas as pd
from pprint import pprint

In [2]:
stops = ['2 S Station', 'Boston, MA 02111', 'Yelp users haven’t asked any questions yet about Amtrak.', 'Start your review of Amtrak.']
reviews = {}
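The `scrape` module is not shown, so exactly how `stop_list` is applied is an assumption. One plausible reading is that any scraped text block containing a stop phrase (the station address and Yelp page boilerplate above) is dropped before it reaches `reviews`. A minimal sketch with a hypothetical helper `is_valid_review`, using substring matching:

```python
def is_valid_review(text, stop_list):
    """Hypothetical filter (not from the scrape module): keep non-empty text
    that contains none of the stop phrases."""
    cleaned = text.strip()
    if not cleaned:
        return False
    # Substring match, so boilerplate embedded in a longer block is also dropped
    return not any(stop in cleaned for stop in stop_list)


stops = ['2 S Station', 'Boston, MA 02111',
         'Yelp users haven’t asked any questions yet about Amtrak.',
         'Start your review of Amtrak.']
samples = ['2 S Station', 'Great ride, on time!', '   ', 'Start your review of Amtrak.']
print([s for s in samples if is_valid_review(s, stops)])  # → ['Great ride, on time!']
```

Substring matching is aggressive: a genuine review that happens to quote the station address would also be discarded, so exact matching may be the safer choice if the boilerplate always arrives as its own text block.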

In [3]:
# Scrape Yelp review pages and add valid reviews to the dictionary
add_reviews_to_dict(0, 90, step=10, base_url='https://www.yelp.com/biz/amtrak-boston-3?start={}',
                    tag_type='span', class_names='raw__09f24__T4Ezm', stop_list=stops, review_dct=reviews)
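Yelp paginates reviews with a `start` query offset, which is why the call passes `0, 90, step=10`. Assuming `add_reviews_to_dict` formats `base_url` with each value of `range(start, stop, step)` (the module isn't shown, so whether the end bound is inclusive is unknown), the requested URLs would look like this. `page_urls` is a hypothetical helper for illustration:

```python
def page_urls(base_url, start, stop, step=1):
    """Hypothetical helper: one URL per page offset, end bound exclusive like range()."""
    # str.format ignores extra positional arguments, so a base_url without
    # a '{}' placeholder simply yields the same URL for every offset
    return [base_url.format(offset) for offset in range(start, stop, step)]


yelp = page_urls('https://www.yelp.com/biz/amtrak-boston-3?start={}', 0, 90, step=10)
print(len(yelp))  # → 9
print(yelp[-1])   # → https://www.yelp.com/biz/amtrak-boston-3?start=80
```

If the real helper behaves like `range`, offset 90 is never requested, and the Trustpilot call with `(1, 30)` would fetch pages 1 through 29. The Viewpoints.com and Sitejabber calls pass `(0, 1)` with no placeholder, so each would fetch its single page once.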

In [4]:
# Loop through each page of Trustpilot and add valid reviews to the dictionary
add_reviews_to_dict(1, 30, base_url="https://www.trustpilot.com/review/www.amtrak.com?page={}",
                    tag_type='p', review_dct=reviews,
                    class_names='typography_body-l__KUYFJ typography_appearance-default__AAY17 typography_color-black__5LYEn',
                    stop_list=stops,
                    attrs={'data-service-review-text-typography': 'true'})
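The Trustpilot call filters on three long class names whose suffixes (such as `__KUYFJ`) look like build-generated hashes, so they may change whenever the site is redeployed; the `data-service-review-text-typography` attribute is probably a steadier hook. Since the `scrape` module isn't shown, the following is only a sketch of how `tag_type`, `class_names`, and `attrs` might be combined, written with the standard library's `html.parser` rather than whatever parser `scrape` actually uses. The class `ReviewExtractor` is hypothetical, as is the rule that a space-separated string requires all of its classes while a list (as in the Viewpoints.com call) requires any one of them:

```python
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Hypothetical extractor: collects the text of <tag_type> elements whose
    classes and attributes match. Assumes matching elements are not nested."""

    def __init__(self, tag_type, class_names=None, attrs=None):
        super().__init__()
        self.tag_type = tag_type
        if isinstance(class_names, str):
            # Assumed rule: a space-separated string means every class must be present
            self.class_groups = [set(class_names.split())]
        elif class_names:
            # Assumed rule: a list means any single listed class is enough
            self.class_groups = [{c} for c in class_names]
        else:
            self.class_groups = []
        self.required_attrs = attrs or {}
        self.reviews = []
        self._buffer = None  # text chunks of the element currently being captured

    def handle_starttag(self, tag, attrs):
        if tag != self.tag_type or self._buffer is not None:
            return
        attr_map = dict(attrs)
        classes = set((attr_map.get('class') or '').split())
        if self.class_groups and not any(g <= classes for g in self.class_groups):
            return
        if any(attr_map.get(k) != v for k, v in self.required_attrs.items()):
            return
        self._buffer = []  # start capturing text for this element

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag_type and self._buffer is not None:
            self.reviews.append(''.join(self._buffer).strip())
            self._buffer = None


html = ('<p class="a b" data-service-review-text-typography="true">On time, clean seats.</p>'
        '<p class="a b">Footer text</p>'
        '<p class="a">Missing class b</p>')
parser = ReviewExtractor('p', class_names='a b',
                         attrs={'data-service-review-text-typography': 'true'})
parser.feed(html)
print(parser.reviews)  # → ['On time, clean seats.']
```

Matching on the attribute alone (passing `class_names=None`) would keep working if the hashed class names change.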

In [5]:
# Scrape the Viewpoints.com review page and add valid reviews to the dictionary
add_reviews_to_dict(0, 1, base_url="https://www.viewpoints.com/Amtrak-Train-TRavel-reviews",
                    tag_type='p', review_dct=reviews,
                    class_names=['pr-review-faceoff-review', 'pr-comments', 'pr-review-faceoff-review-full'],
                    stop_list=stops)

In [6]:
# Scrape the Sitejabber review page and add valid reviews to the dictionary
add_reviews_to_dict(0, 1, base_url="https://www.sitejabber.com/reviews/amtrak.com",
                    tag_type='p', review_dct=reviews,
                    stop_list=stops)

In [7]:
# Export data to dataframe
rows = []
# Loop through the dictionary and extract website_name and review
for website_name, review_list in reviews.items():
    for r in review_list:
        rows.append({'Website': website_name, 'Review': r})

# Create the DataFrame
df = pd.DataFrame(rows)
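Review pages can repeat text (for example, when the same review is picked up on more than one page), and the stop list only removes known boilerplate. Assuming `reviews` maps each website name to a list of review strings, as the loop above implies, the flattening can be done in one step and exact duplicates within a site dropped. `reviews_to_frame` and the sample dictionary are illustrative, not part of the original notebook:

```python
import pandas as pd


def reviews_to_frame(review_dct):
    """Hypothetical helper: flatten {website_name: [review, ...]} into a
    Website/Review DataFrame with exact per-site duplicates removed."""
    df = pd.DataFrame(
        [(site, text) for site, texts in review_dct.items() for text in texts],
        columns=['Website', 'Review'],
    )
    # Drop repeats of the same review on the same site, then renumber rows 0..n-1
    return df.drop_duplicates(subset=['Website', 'Review']).reset_index(drop=True)


sample = {'yelp': ['Great ride.', 'Late again.', 'Great ride.'],
          'trustpilot': ['Late again.']}
print(reviews_to_frame(sample).shape)  # → (3, 2)
```

Deduplicating on both columns keeps identical text that appears on two different sites; dropping on `Review` alone would treat cross-posted reviews as one.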

In [10]:
# Store as a CSV file next to the notebook (index=False omits the row-number column)
df.to_csv('reviews.csv', index=False)

## EDA

In [None]:
df.shape

In [None]:
df.nunique()
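Beyond `shape` and `nunique()`, two quick checks may be useful here: how many reviews each site contributed (the sources may be unbalanced) and how long the reviews are. If `nunique()` reports fewer unique `Review` values than `df.shape` has rows, some text is duplicated. A sketch, assuming `df` has the `Website` and `Review` columns built above; `summarize_reviews` and the small frame are illustrative only:

```python
import pandas as pd


def summarize_reviews(df):
    """Hypothetical helper: per-site counts, word-count stats, and duplicate count."""
    counts = df['Website'].value_counts()
    # Split each review on whitespace and count the resulting words
    word_counts = df['Review'].str.split().str.len()
    # Rows whose exact text already appeared earlier in the frame
    duplicate_rows = int(df['Review'].duplicated().sum())
    return counts, word_counts.describe(), duplicate_rows


demo = pd.DataFrame({'Website': ['yelp', 'yelp', 'trustpilot'],
                     'Review': ['On time and clean.', 'Late again.', 'Late again.']})
counts, lengths, dupes = summarize_reviews(demo)
print(counts.to_dict())  # → {'yelp': 2, 'trustpilot': 1}
print(dupes)             # → 1
```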