In machine learning, **sentiment analysis** is the use of natural language processing, computational linguistics, and text analysis to identify and classify subjective opinions in source documents.

# Project Objective
In this project, I'm going to apply BERT to conduct sentiment analysis on Yelp reviews.

# Workflow
1. Install and import dependencies
2. Instantiate model
3. Encode and calculate sentiment
4. Collect Yelp reviews
5. Load reviews into DataFrame and score

## 1. Install and Import Dependencies

In [4]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

In [7]:
!pip install transformers beautifulsoup4

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd
import numpy as np

## 2. Instantiate Model

In [2]:
# Load model directly

tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment") # from huggingface.co

model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment", num_labels = 5)

## 3. Encode and Calculate Sentiment

In [3]:
tokens = tokenizer.encode('I love it, absolutely the best', return_tensors = 'pt')

In [4]:
tokens

tensor([[  151, 11157, 10197,   117, 35925, 10563, 10103, 11146]])

In [5]:
tokenizer.decode(tokens[0].tolist())

'i love it, absolutely the best'

In [6]:
# perform sentiment analysis
result = model(tokens)

In [7]:
result

(tensor([[-1.9718, -2.4058, -1.2501,  0.6275,  4.3026]],
        grad_fn=<AddmmBackward0>),)

### Understanding results
The output from the model is a list of raw scores (logits), one per class, not a one-hot vector. The five positions correspond to ratings from 1 to 5 stars.

The position with the highest score is the predicted rating. For example, [0.9, 0.2, 0.1, -0.2, -0.5] is a rating of 1.

Here the fifth score (4.3026) is the highest, so the rating for this result is 5.
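
The raw scores can also be turned into probabilities with a softmax, which makes it easier to see how confident the model is. The snippet below is a minimal sketch in plain Python that reuses the five scores printed above; the `softmax` helper is written here for illustration and is not part of the notebook. On the tensor itself, `torch.softmax(result[0], dim=-1)` should give the same values.

```python
import math

# Raw scores (logits) copied from the model output above
logits = [-1.9718, -2.4058, -1.2501, 0.6275, 4.3026]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 (illustrative helper)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
rating = probs.index(max(probs)) + 1  # positions are zero-based, ratings start at 1

for stars, p in enumerate(probs, start=1):
    print(f"{stars} star(s): {p:.3f}")
print("predicted rating:", rating)
```

Softmax preserves the ordering of the scores, so the predicted rating is the same as the one obtained from the raw scores.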

In [8]:
# index of the highest score, plus 1 because positions are zero-based and ratings start at 1
int(torch.argmax(result[0])) + 1

5

In [9]:
sen = 'Meh, it was okay'

In [10]:
tokens_sen = tokenizer.encode(sen, return_tensors = 'pt')
result_sen = model(tokens_sen)
print(int(torch.argmax(result_sen[0])) + 1)

3


## 4. Collect Reviews from Yelp

Note:
BeautifulSoup can extract almost any element from a page's HTML. The tag and class pattern used below (`p` elements whose class contains `comment`) are specific to Yelp's markup, so adjust them when scraping a different site.

In [13]:
r = requests.get('https://www.yelp.com/biz/lee-cafe-fort-mill-2')
soup = BeautifulSoup(r.text, 'html.parser')
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class': regex})
reviews = [result.text for result in results]

In [14]:
r

<Response [200]>

In [19]:
# r.text # get the text out of the page
results[0]

<p class="comment__09f24__D0cxf css-qgunke"><span class="raw__09f24__T4Ezm" lang="en">So far, the most authentic Chinese restaurant we've had in the Fort Mill area. We ordered:<br/>Beef chow fun<br/>Mongolian beef<br/>Pork fried rice<br/>Stir fried pea shoots<br/>Eggrolls <br/><br/>Everything came out blazing hot and freshly cooked. Everything was very good. We look forward to trying more of the menu.</span></p>

In [20]:
results[0].text

"So far, the most authentic Chinese restaurant we've had in the Fort Mill area. We ordered:Beef chow funMongolian beefPork fried riceStir fried pea shootsEggrolls Everything came out blazing hot and freshly cooked. Everything was very good. We look forward to trying more of the menu."

In [21]:
reviews[0]

"So far, the most authentic Chinese restaurant we've had in the Fort Mill area. We ordered:Beef chow funMongolian beefPork fried riceStir fried pea shootsEggrolls Everything came out blazing hot and freshly cooked. Everything was very good. We look forward to trying more of the menu."

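In the extracted text above, the `<br/>` line breaks are dropped without a separator, so the list of dishes runs together ("We ordered:Beef chow fun..."). The sketch below shows one way to keep the pieces apart. It uses only the standard library's `HTMLParser` on a shortened copy of the markup above, so it runs without a network request; the `ReviewParser` class and the `sample_html` string are illustrative names, not part of the notebook. With BeautifulSoup itself, `result.get_text(separator=' ')` should have a similar effect.

```python
from html.parser import HTMLParser

class ReviewParser(HTMLParser):
    """Collect text from <p> tags whose class contains 'comment' (illustrative)."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._parts = None  # text pieces of the <p> currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        if tag == "p" and "comment" in classes:
            self._parts = []
        elif tag == "br" and self._parts is not None:
            self._parts.append(" ")  # keep a space where the line break was

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            # join the pieces and collapse repeated whitespace
            self.reviews.append(" ".join("".join(self._parts).split()))
            self._parts = None

# Shortened copy of the review markup shown above, plus one unrelated paragraph
sample_html = (
    '<p class="comment__09f24__D0cxf css-qgunke">'
    '<span class="raw__09f24__T4Ezm" lang="en">We ordered:<br/>Beef chow fun'
    '<br/>Mongolian beef<br/>Pork fried rice</span></p>'
    '<p class="other">ignore me</p>'
)

parser = ReviewParser()
parser.feed(sample_html)
parser.close()
print(parser.reviews)
```

Only the paragraph whose class contains `comment` is kept, and each line break becomes a single space.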
## 5. Load Reviews into DataFrame and Calculate Scores

In [24]:
df = pd.DataFrame(np.array(reviews), columns = ['review'])

In [25]:
df.head()

| | review |
|---|---|
| 0 | So far, the most authentic Chinese restaurant ... |
| 1 | Tucked away in a plaza in Fort Mill. This plac... |
| 2 | Ordered take out on 6/6. I wanted to try somet... |
| 3 | Get the spicy fish and veggies, also don't be ... |
| 4 | One of the best Chinese restaurants in the are... |


In [26]:
df.shape

(11, 1)

In [27]:
df['review'].iloc[0]

"So far, the most authentic Chinese restaurant we've had in the Fort Mill area. We ordered:Beef chow funMongolian beefPork fried riceStir fried pea shootsEggrolls Everything came out blazing hot and freshly cooked. Everything was very good. We look forward to trying more of the menu."

In [30]:
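# Score one review: encode it, run the model, and map the highest score to a 1-5 rating
def sentiment_score(review):
    # truncation=True caps the input at the model's 512-token limit
    tokens = tokenizer.encode(review, return_tensors = 'pt', truncation = True, max_length = 512)
    with torch.no_grad():  # inference only, so skip gradient tracking
        result = model(tokens)
    return int(torch.argmax(result[0])) + 1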

In [31]:
sentiment_score(df['review'].iloc[0])

5

In [32]:
sentiment_score(df['review'].iloc[1])

3

In [33]:
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512])) # keep the first 512 characters (not tokens) as a rough guard against the model's 512-token limit
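
Note that `x[:512]` slices characters, while the model's limit is counted in tokens. The sketch below illustrates the difference on a made-up review, using whitespace-separated words as a crude stand-in for tokens; the real tokenizer splits text into subword pieces, so its counts will differ. The tokenizer also accepts `truncation=True` and `max_length=512`, which cut at the token level instead.

```python
# Hypothetical long review, used only for illustration
review = "word " * 300        # 1,500 characters, 300 words

clipped = review[:512]        # same slicing as x[:512] above

print(len(clipped))           # characters kept
print(len(clipped.split()))   # whitespace-separated words kept (rough proxy for tokens)
```

For English text, a 512-character slice usually stays well under 512 tokens, so it avoids the length error but may also discard text the model could have read.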

In [34]:
df.head()

| | review | sentiment |
|---|---|---|
| 0 | So far, the most authentic Chinese restaurant ... | 5 |
| 1 | Tucked away in a plaza in Fort Mill. This plac... | 4 |
| 2 | Ordered take out on 6/6. I wanted to try somet... | 3 |
| 3 | Get the spicy fish and veggies, also don't be ... | 4 |
| 4 | One of the best Chinese restaurants in the are... | 5 |


In [35]:
df

| | review | sentiment |
|---|---|---|
| 0 | So far, the most authentic Chinese restaurant ... | 5 |
| 1 | Tucked away in a plaza in Fort Mill. This plac... | 4 |
| 2 | Ordered take out on 6/6. I wanted to try somet... | 3 |
| 3 | Get the spicy fish and veggies, also don't be ... | 4 |
| 4 | One of the best Chinese restaurants in the are... | 5 |
| 5 | Got a late lunch here. The dishes were a hit o... | 3 |
| 6 | This is our go-to for Chinese in Ft. Mill. My ... | 5 |
| 7 | We all came here on a Sunday evening. This pla... | 5 |
| 8 | Wow! What a hidden gem in a strip mall next to... | 5 |
| 9 | If you are looking for good Americanized Chine... | 2 |
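
The scores in the table above can be summarized with a quick count and average. This is a minimal sketch in plain Python with the sentiment values copied by hand from rows 0 to 9; in the notebook itself, `df['sentiment'].value_counts()` and `df['sentiment'].mean()` should give the same information.

```python
from collections import Counter
from statistics import mean

# Sentiment column copied from the DataFrame output above (rows 0-9)
sentiments = [5, 4, 3, 4, 5, 3, 5, 5, 5, 2]

counts = Counter(sentiments)   # how many reviews received each rating
average = mean(sentiments)

for rating in range(1, 6):
    print(f"{rating}: {'#' * counts[rating]} ({counts[rating]})")
print(f"average rating: {average:.1f}")
```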


In [36]:
df['review'].iloc[9]

'If you are looking for good Americanized Chinese restaurant, this will not be the place for you. \xa0Go to Baoding at Southpark or other Chinese restaurant for your beef and broccoli or favorite fried rice.This is the place you go if you are looking for some authentic Cantonese Chinese dishes that are not served elsewhere such as stir fry snow pea tips or braised pork belly.Service here reminded me so much of typical Chinatown restaurants LOL! , again very authentic so expect curt and indifference. \xa0I wish this place was close to my house so I can get takeout.'