#### CSC 180 Intelligent Systems 

#### William Lorence, Ajaydeep Singh

#### California State University, Sacramento


# Project 1: Yelp Business Rating Prediction using TensorFlow

The following block of code reads the Yelp dataset and shows the first 5 rows of the business DataFrame and the review DataFrame separately. At most 1,000,000 lines are read from each file (nrows = 1000000), and businesses with fewer than 20 reviews are dropped from the business DataFrame.

In [17]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics

path = "./yelp_dataset/"   # directory holding the Yelp JSON files
save_path = "./models/"    # directory for saved models

# Each line of the file is one JSON object; read at most 1,000,000 lines.
df_business = pd.read_json(path + 'yelp_academic_dataset_business.json', lines=True, nrows=1000000)
# Keep only businesses with at least 20 reviews.
df_business = df_business[df_business['review_count'] >= 20]

df_business.head()

| | business_id | name | address | city | state | postal_code | latitude | longitude | stars | review_count | is_open | attributes | categories | hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | tUFrWirKiKi_TAnsVWINQQ | Target | 5255 E Broadway Blvd | Tucson | AZ | 85711 | 32.223236 | -110.880452 | 3.5 | 22 | 0 | {'BikeParking': 'True', 'BusinessAcceptsCredit... | Department Stores, Shopping, Fashion, Home & G... | {'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', ... |
| 3 | MTSW4McQd7CbVtyjqoe9mw | St Honore Pastries | 935 Race St | Philadelphia | PA | 19107 | 39.955505 | -75.155564 | 4.0 | 80 | 1 | {'RestaurantsDelivery': 'False', 'OutdoorSeati... | Restaurants, Food, Bubble Tea, Coffee & Tea, B... | {'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ... |
| 12 | il_Ro8jwPlHresjw9EGmBg | Denny's | 8901 US 31 S | Indianapolis | IN | 46227 | 39.637133 | -86.127217 | 2.5 | 28 | 1 | {'RestaurantsReservations': 'False', 'Restaura... | American (Traditional), Restaurants, Diners, B... | {'Monday': '6:0-22:0', 'Tuesday': '6:0-22:0', ... |
| 14 | 0bPLkL0QhhPO5kt1_EXmNQ | Zio's Italian Market | 2575 E Bay Dr | Largo | FL | 33771 | 27.916116 | -82.760461 | 4.5 | 100 | 0 | {'OutdoorSeating': 'False', 'RestaurantsGoodFo... | Food, Delis, Italian, Bakeries, Restaurants | {'Monday': '10:0-18:0', 'Tuesday': '10:0-20:0'... |
| 15 | MUTTqe8uqyMdBl186RmNeA | Tuna Bar | 205 Race St | Philadelphia | PA | 19106 | 39.953949 | -75.143226 | 4.0 | 245 | 1 | {'RestaurantsReservations': 'True', 'Restauran... | Sushi Bars, Restaurants, Japanese | {'Tuesday': '13:30-22:0', 'Wednesday': '13:30-... |
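
The read-and-filter step can be tried on a tiny in-memory sample before running it on the full file. The sketch below is only an illustration: the three JSON lines and the name df_toy are made up for this example and are not part of the Yelp data.

```python
import io
import pandas as pd

# Hypothetical three-line JSON Lines sample standing in for the Yelp business file.
sample = io.StringIO(
    '{"business_id": "a", "review_count": 5}\n'
    '{"business_id": "b", "review_count": 20}\n'
    '{"business_id": "c", "review_count": 150}\n'
)

# lines=True parses one JSON object per line; nrows caps how many lines are read.
df_toy = pd.read_json(sample, lines=True, nrows=3)

# Keep only businesses with at least 20 reviews (a count of exactly 20 is kept).
df_toy = df_toy[df_toy["review_count"] >= 20]

print(df_toy["business_id"].tolist())  # ['b', 'c']
print(df_toy.index.tolist())           # [1, 2]
```

Boolean filtering keeps the original row labels, which is why the index of the business table above starts at 2 rather than 0.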


In [18]:
# Read at most the first 1,000,000 reviews (one JSON object per line).
df_review = pd.read_json(path + 'yelp_academic_dataset_review.json', lines=True, nrows=1000000)
df_review.head()

| | review_id | user_id | business_id | stars | useful | funny | cool | text | date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | KU_O5udG6zpxOg-VcAEodg | mh_-eMZ6K5RLWhZyISBhwA | XQfwVwDr-v0ZS3_CbbE5Xw | 3 | 0 | 0 | 0 | If you decide to eat here, just be aware it is... | 2018-07-07 22:09:11 |
| 1 | BiTunyQ73aT9WBnpR9DZGw | OyoGAe7OKpv6SyGZT5g77Q | 7ATYjTIgM3jUlt4UM3IypQ | 5 | 1 | 0 | 1 | I've taken a lot of spin classes over the year... | 2012-01-03 15:28:18 |
| 2 | saUsX_uimxRlCVr67Z4Jig | 8g_iMtfSiwikVnbP2etR0A | YjUWPpI6HXG530lwP-fb2A | 3 | 0 | 0 | 0 | Family diner. Had the buffet. Eclectic assortm... | 2014-02-05 20:30:30 |
| 3 | AqPFMleE6RsU23_auESxiA | _7bHUi9Uuf5__HHc_Q8guQ | kxX2SOes4o-D3ZQBkiMRfA | 5 | 1 | 0 | 1 | Wow! Yummy, different, delicious. Our favo... | 2015-01-04 00:01:03 |
| 4 | Sx8TMOWLNuJBWer-0pcmoA | bcjbaE6dDog4jkNY91ncLQ | e4Vwtrqf-wpJfwesgvdgxQ | 4 | 1 | 0 | 1 | Cute interior and owner (?) gave us tour of up... | 2017-01-14 20:54:15 |


The next block of code groups the review text by business_id and concatenates every review for a business into a single string, so each business ends up with exactly one entry holding all of its review text. Because .sum() joins strings with no separator, consecutive reviews run together without a space between them.

df_ready_to_be_sent_to_sklearn then converts the df_review_agg Series into a **DataFrame** with two columns, business_id and all_reviews.


In [22]:
df_review_agg = df_review.groupby('business_id')['text'].sum()
df_ready_to_be_sent_to_sklearn = pd.DataFrame({'business_id': df_review_agg.index, 'all_reviews': df_review_agg.values})
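
To see what the aggregation produces, the sketch below runs the same groupby on a made-up three-review DataFrame (df_toy_review and its contents are hypothetical). It also shows one possible alternative, agg with " ".join, which inserts a space between reviews; this is a suggestion for comparison, not what the cell above does.

```python
import pandas as pd

# Hypothetical reviews: two for business "a", one for business "b".
df_toy_review = pd.DataFrame({
    "business_id": ["a", "a", "b"],
    "text": ["Great food.", "Slow service.", "Friendly staff."],
})

# Same approach as the cell above: summing strings concatenates them directly.
joined_sum = df_toy_review.groupby("business_id")["text"].sum()

# Alternative (an assumption, not the notebook's method): join with a space
# so the last word of one review does not fuse with the first word of the next.
joined_space = df_toy_review.groupby("business_id")["text"].agg(" ".join)

print(joined_sum["a"])    # Great food.Slow service.
print(joined_space["a"])  # Great food. Slow service.
```

Whether the missing separator matters depends on how the text is tokenized later; with word-level features, fused words at review boundaries become spurious tokens.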

The cell below merges df_ready_to_be_sent_to_sklearn (the concatenated reviews for each business) with df_business (the other information about each business, such as name, categories, and location) on business_id. pd.merge performs an inner join by default, so only businesses that appear in both DataFrames are kept.

df_review_business.shape returns the shape of the merged DataFrame (df_review_business) as a (rows, columns) tuple.

In [23]:
df_review_business = pd.merge(df_ready_to_be_sent_to_sklearn, df_business, on='business_id')
df_review_business.shape


(11927, 15)
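
The 15 columns are the all_reviews column plus the 14 business columns, with business_id shared between the two. The row count is lower than either input because the default inner join drops unmatched keys. A minimal sketch of that behavior on made-up data (the toy frames and their values are hypothetical):

```python
import pandas as pd

# Hypothetical inputs: "x" has reviews but no business row,
# "c" has a business row but no reviews.
toy_reviews = pd.DataFrame({
    "business_id": ["a", "b", "x"],
    "all_reviews": ["r1", "r2", "r3"],
})
toy_business = pd.DataFrame({
    "business_id": ["a", "b", "c"],
    "name": ["Alpha", "Beta", "Gamma"],
    "stars": [4.0, 3.5, 5.0],
})

# Default how="inner": only keys present in both frames survive.
merged = pd.merge(toy_reviews, toy_business, on="business_id")
print(merged.shape)  # (2, 4)

# An outer join with indicator=True shows which side each key came from.
audit = pd.merge(toy_reviews, toy_business, on="business_id",
                 how="outer", indicator=True)
print(audit["_merge"].value_counts())
```

Running the audit variant on the real DataFrames would show how many businesses were lost because their reviews fell outside the first 1,000,000 review rows, and how many review groups were dropped by the 20-review filter.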

In [24]:
df_review_business.head()

| | business_id | all_reviews | name | address | city | state | postal_code | latitude | longitude | stars | review_count | is_open | attributes | categories | hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | --ZVrH2X2QXBFdCilbirsw | This place is sadly perm closed. I was hoping ... | Chris's Sandwich Shop | 1531 W Wynnewood Rd | Ardmore | PA | 19003.0 | 39.997299 | -75.292207 | 4.5 | 32 | 0 | {'GoodForKids': 'True', 'RestaurantsAttire': '... | American (Traditional), Restaurants, Pizza, Sa... | {'Monday': '11:0-21:0', 'Tuesday': '11:0-21:0'... |
| 1 | --sXnWH9Xm6_NvIjyuA99w | Ich war das erste mal in Philadelphia und ich ... | Philadelphia | | Philadelphia | PA | | 39.952584 | -75.165222 | 4.0 | 29 | 1 | {'GoodForKids': 'True'} | Public Services & Government, Local Flavor | |
| 2 | -02xFuruu85XmDn2xiynJw | Dr. Curtis Dechant has an excellent chair-side... | Family Vision Center | 7475 E Tanque Verde Rd | Tucson | AZ | 85715.0 | 32.251039 | -110.833173 | 4.5 | 109 | 1 | {'ByAppointmentOnly': 'True', 'BusinessParking... | Shopping, Ophthalmologists, Optometrists, Doct... | {'Monday': '0:0-0:0', 'Tuesday': '8:30-17:30',... |
| 3 | -06OYKiIzxsdymBMDAKZug | Had catalytic converters replaced on our Subur... | Washoe Metal Fabricating | 905 Bergin Way | Sparks | NV | 89431.0 | 39.525558 | -119.739221 | 4.5 | 34 | 1 | {'BusinessAcceptsCreditCards': 'True', 'ByAppo... | RV Dealers, Home Services, Shopping, Tires, Au... | {'Monday': '7:30-17:30', 'Tuesday': '7:30-17:3... |
| 4 | -06ngMH_Ejkm_6HQBYxB7g | I have an old main line that really should be ... | Stewart's De Rooting & Plumbing | 415 E Montecito St | Santa Barbara | CA | 93101.0 | 34.419838 | -119.688029 | 4.0 | 25 | 1 | {'BusinessAcceptsCreditCards': 'True', 'ByAppo... | Plumbing, Home Services | {'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W... |
