# Web Scraping and Sentiment Analysis of 8MilePi Pizza Restaurant Reviews

In this project I scrape the reviews of 8MilePi Pizza from Yelp, a website for finding restaurants, home services, and other local businesses. First, I scrape the reviews using BeautifulSoup. Then, for sentiment analysis, I pass them through BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art NLP model pretrained by Google. Because the model has already been trained on a large amount of data, we only use it to predict the sentiment of the reviews scraped from Yelp.com. This is a good example of transfer learning.

## 1. Install and Import Dependencies

In [2]:
!pip install torch torchvision torchaudio

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
!pip install transformers requests beautifulsoup4 pandas numpy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1


In [4]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np

## 2. Instantiate Model

In [5]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')


## 3. Encode and Calculate Sentiment

In [6]:
tokens = tokenizer.encode('I loved it, the pizza is very delicious', return_tensors = 'pt')

In [7]:
tokens

tensor([[  101,   151, 46747, 10197,   117, 10103, 59371, 10127, 12495, 27254,
         47838,   102]])

In [8]:
# Optional step: decode the token IDs back into text to check the encoding
tokenizer.decode(tokens[0])

'[CLS] i loved it, the pizza is very delicious [SEP]'

In [9]:
result = model(tokens)
result

SequenceClassifierOutput(loss=None, logits=tensor([[-2.6552, -2.2700, -0.3486,  1.8020,  2.7059]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

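The model output contains `logits`: five raw, unnormalized scores, one for each rating from 1 to 5. They are not one-hot encoded and they are not probabilities. The position with the highest score represents the predicted sentiment rating.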
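To read these scores as probabilities, one option is to apply a softmax. The following is a minimal pure-Python sketch that uses the logits printed above; in the notebook itself, `torch.softmax(result.logits, dim=-1)` should give the same numbers. The helper name `softmax` is illustrative and not part of the original pipeline.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the model output above.
logits = [-2.6552, -2.2700, -0.3486, 1.8020, 2.7059]
probs = softmax(logits)

for rating, p in enumerate(probs, start=1):
    print(f"{rating} star(s): {p:.3f}")

# The most likely rating is the position of the largest probability, plus 1.
predicted_rating = probs.index(max(probs)) + 1
print("Predicted rating:", predicted_rating)
```

Softmax keeps the ordering of the scores, so the predicted rating is the same as the one obtained from the raw logits.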

In [10]:
result.logits

tensor([[-2.6552, -2.2700, -0.3486,  1.8020,  2.7059]],
       grad_fn=<AddmmBackward0>)

`torch.argmax` returns the index of the highest value in the tensor. Because indexing starts at 0, I add 1 so that the result maps to a rating from 1 to 5.

In [11]:
int(torch.argmax(result.logits))+1



5

Now we have a number from 1 to 5. The higher the number, the more positive the sentiment, and vice versa.
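If a coarse label is easier to work with than a 1 to 5 rating, the ratings can be bucketed. The sketch below is only an illustration: the function name `rating_to_label` and the cut-offs (1 to 2 negative, 3 neutral, 4 to 5 positive) are assumptions, not something the model defines.

```python
def rating_to_label(rating):
    """Map a 1-5 rating to a coarse sentiment label (assumed cut-offs)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"

# Label for the rating predicted for the example sentence above.
print(rating_to_label(5))
```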

In [12]:
# let's try this on one more review
tokens_a = tokenizer.encode('It was the worst thing i have ever had', return_tensors = 'pt')
result_a = model(tokens_a)
int(torch.argmax(result_a.logits))+1

1

## 4. Import the Review Dataset

In [23]:
r = requests.get('https://www.yelp.com/biz/8milepi-detroit-style-pizza-san-francisco-3')
soup = BeautifulSoup(r.text,'html.parser')
regex = re.compile('.*comment.*')
results = soup.find_all('p',{'class':regex})
reviews = [result.text for result in results]
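The class filter in the cell above uses a regular expression because the class names on the page carry generated suffixes. As far as I know, BeautifulSoup tests a compiled pattern against each CSS class of a tag with `search`, so the shorter pattern `comment` should select the same tags as `.*comment.*`. The sketch below checks both patterns with the `re` module alone. The class names are taken from this page's markup, except `photo-box`, which is made up for contrast.

```python
import re

# Pattern from the scraping cell, plus a shorter candidate.
original = re.compile('.*comment.*')
shorter = re.compile('comment')

class_names = [
    'comment__09f24__gu0rG',
    'comment__09f24__ZU8MN',
    'truncated__09f24__lSBbT',
    'css-qgunke',
    'photo-box',  # hypothetical class name, not from the page
]

# Class names that the original pattern would select.
matches = [name for name in class_names if original.search(name)]

for name in class_names:
    print(name, bool(original.search(name)), bool(shorter.search(name)))
```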

In [24]:
results

[<p class="comment__09f24__gu0rG css-qgunke"><span class=" raw__09f24__T4Ezm" lang="en">If you love thick pizza, Detroit style pizza from 8MilePi is the way to go! So cheesy and filled with toppings. Smog pizza is my fav. Super convenient to pick up from this cloud kitchen location. They also have yummy wings and a packed chopped salad if you want other options. Their pizzas are huge and are so filling for lunch or dinner.</span></p>,
 <p class="comment__09f24__ZU8MN truncated__09f24__lSBbT css-qgunke"><span class=" css-qgunke"><span class=" raw__09f24__T4Ezm">Hi Farrah, great to hear that you enjoy our SMOG pizza! We appreciate the kind review.</span></span></p>,
 <p class="comment__09f24__gu0rG css-qgunke"><span class=" raw__09f24__T4Ezm" lang="en">First time trying Detroit-style pizza and I'm def a fan of it! These pizzas are thick with a chewy and slightly crisp crust. I was full from just eating two slices! I got the Smog and Fun Guy Forno pizza, both were delicious!<br/><br/>I al

In [25]:
results[0].text

'If you love thick pizza, Detroit style pizza from 8MilePi is the way to go! So cheesy and filled with toppings. Smog pizza is my fav. Super convenient to pick up from this cloud kitchen location. They also have yummy wings and a packed chopped salad if you want other options. Their pizzas are huge and are so filling for lunch or dinner.'

In [26]:
reviews

['If you love thick pizza, Detroit style pizza from 8MilePi is the way to go! So cheesy and filled with toppings. Smog pizza is my fav. Super convenient to pick up from this cloud kitchen location. They also have yummy wings and a packed chopped salad if you want other options. Their pizzas are huge and are so filling for lunch or dinner.',
 'Hi Farrah, great to hear that you enjoy our SMOG pizza! We appreciate the kind review.',
 "First time trying Detroit-style pizza and I'm def a fan of it! These pizzas are thick with a chewy and slightly crisp crust. I was full from just eating two slices! I got the Smog and Fun Guy Forno pizza, both were delicious!I also tried the cheesy bread sticks and bbq wings. Big fan of the cheesy bread sticks, especially the crispy cheese edges.",
 'Hi Kristine, so great to hear that you enjoyed our food! We appreciate the kind review.',
 "In the past year and a half, I've fallen in love with Detroit-style pizza and been on the hunt to try every place SF, w

## 5. Load Reviews into a DataFrame and Score

In [27]:
df = pd.DataFrame(np.array(reviews),columns = ['review'])

In [28]:
df['review'].iloc[0]

'If you love thick pizza, Detroit style pizza from 8MilePi is the way to go! So cheesy and filled with toppings. Smog pizza is my fav. Super convenient to pick up from this cloud kitchen location. They also have yummy wings and a packed chopped salad if you want other options. Their pizzas are huge and are so filling for lunch or dinner.'

In [30]:
# Wrap the steps carried out earlier in a reusable function
def sentiment_score(review):
  tokens = tokenizer.encode(review, return_tensors = 'pt')
  result = model(tokens)
  return int(torch.argmax(result.logits))+1


The function above encapsulates the sentiment pipeline, which makes it easier to process multiple strings. We will apply it to each review in the DataFrame.
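BERT models accept at most 512 tokens, so long reviews have to be shortened somewhere. One option is to let the tokenizer truncate by tokens rather than slicing the string by characters, as `x[:512]` does in the scoring cell. The sketch below assumes the `tokenizer` and `model` objects created in section 2 and takes them as arguments; the name `sentiment_score_truncated` and the `max_length` parameter are illustrative.

```python
def sentiment_score_truncated(review, tokenizer, model, max_length=512):
    """Return a 1-5 rating, truncating the input to max_length tokens."""
    # truncation=True asks the tokenizer to cut the sequence at max_length
    # tokens (special tokens included) instead of slicing characters.
    tokens = tokenizer.encode(
        review,
        return_tensors='pt',
        truncation=True,
        max_length=max_length,
    )
    result = model(tokens)
    # argmax gives a 0-based class index, so add 1 to get the rating.
    return int(result.logits.argmax()) + 1
```

With this variant the scoring step could be written as `df['review'].apply(lambda x: sentiment_score_truncated(x, tokenizer, model))`. Wrapping the call in `torch.no_grad()` would also skip gradient tracking, which is not needed for prediction.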

In [31]:
sentiment_score(df['review'].iloc[0])

5

In [32]:
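# Score only the first 512 characters of each review; BERT accepts at most 512 tokens, so very long reviews must be shortened
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))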

In [33]:
df

| | review | sentiment |
|---|---|---|
| 0 | If you love thick pizza, Detroit style pizza f... | 5 |
| 1 | Hi Farrah, great to hear that you enjoy our SM... | 4 |
| 2 | First time trying Detroit-style pizza and I'm ... | 5 |
| 3 | Hi Kristine, so great to hear that you enjoyed... | 5 |
| 4 | In the past year and a half, I've fallen in lo... | 4 |
| 5 | 3.5 starsOrdered via door dash and some of the... | 3 |
| 6 | The ultimate Detroit style pizzas with that de... | 5 |
| 7 | Hi Sonam, thank you for sharing this great fee... | 5 |
| 8 | So we tried this again after a not so great fi... | 3 |
| 9 | This was a totally disappointing experience. Q... | 1 |
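With the scores in place, a quick summary helps. The sketch below uses only the standard library and the ten sentiment values shown in the table above; on the DataFrame itself, `df['sentiment'].mean()` and `df['sentiment'].value_counts()` should give the same information. Rows 1, 3, and 7 look like replies from the business rather than customer reviews (they open with "Hi" and a name), so the sketch also computes the mean without them. That reading of the rows is an assumption.

```python
from collections import Counter
from statistics import mean

# Sentiment column copied from the table above (list index = row index).
sentiment = [5, 4, 5, 5, 4, 3, 5, 5, 3, 1]

# Rows assumed to be owner replies, judged by their "Hi <name>, ..." openings.
owner_reply_rows = {1, 3, 7}

overall_mean = mean(sentiment)
distribution = Counter(sentiment)
customer_only = [s for i, s in enumerate(sentiment) if i not in owner_reply_rows]
customer_mean = mean(customer_only)

print("Mean rating (all rows):", overall_mean)
print("Rating counts:", dict(sorted(distribution.items())))
print("Mean rating (without assumed owner replies):", round(customer_mean, 2))
```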


To run the same script for another restaurant or business, copy its link from the Yelp website and paste it into the `requests.get` call (assigned to `r`) in the dataset import section.

Caution: if the structure of the website changes in the future, the scraping step may throw an error.
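One way to make such a change visible early is to check that the scrape actually returned something before scoring. The helper below is a hypothetical addition (the name `require_reviews` is not from the notebook); it assumes `reviews` is the list of strings built in section 4.

```python
def require_reviews(reviews):
    """Return the non-empty reviews, or raise if the scrape found nothing."""
    cleaned = [text.strip() for text in reviews if text and text.strip()]
    if not cleaned:
        raise ValueError(
            "No review text found; the page structure or class names "
            "may have changed, or the request may have been blocked."
        )
    return cleaned

# Example with a small stand-in list instead of a live scrape.
sample = ["Great pizza!", "   ", ""]
print(require_reviews(sample))
```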

To run the pipeline as a script or in an IDE such as PyCharm, drop the `!pip` magic commands and install the dependencies in the environment beforehand.
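As a starting point for such a script, the URL can be taken from the command line instead of being hard-coded. This is a minimal sketch using `argparse` from the standard library; the function name `parse_args` and the `--max-chars` option are assumptions for illustration, and the scraping and scoring steps from the sections above would go where the placeholder comment is.

```python
import argparse

def parse_args(argv=None):
    """Parse the command-line arguments for a hypothetical scoring script."""
    parser = argparse.ArgumentParser(
        description="Scrape Yelp reviews and score their sentiment."
    )
    parser.add_argument("url", help="Yelp business page to scrape")
    parser.add_argument(
        "--max-chars",
        type=int,
        default=512,
        help="number of leading characters of each review to score",
    )
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    # Placeholder: fetch args.url, extract the reviews, and score each one
    # with sentiment_score(review[:args.max_chars]) as in the notebook.
    print(f"Would score reviews from {args.url} (first {args.max_chars} chars)")
    return args

# Demonstration with an explicit argument list; a real script would call
# main() with no arguments so that argparse reads sys.argv.
demo_args = main(["https://www.yelp.com/biz/8milepi-detroit-style-pizza-san-francisco-3"])
```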

## I hope you liked this web scraping and sentiment analysis project.

# Thanks