# Summary

This notebook demonstrates how a BERT model can be used for sentiment analysis, using reviews gathered from Yelp for a restaurant in Stuttgart, Germany.

# Contents

1. Imports
2. Model Initialisation
3. Play with the model
4. Collect the reviews from Yelp
5. Score the reviews

# 1. Imports

In [None]:
import re

import pandas as pd
import requests
import torch
from bs4 import BeautifulSoup
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# 2. Model Initialisation

In [None]:
tokenizer = AutoTokenizer.from_pretrained(
    "nlptown/bert-base-multilingual-uncased-sentiment"
)

model = AutoModelForSequenceClassification.from_pretrained(
    "nlptown/bert-base-multilingual-uncased-sentiment"
)

# 3. Play with the model

In [None]:
tokens = tokenizer.encode(
    "Horrible experience, cockroaches found in the food. But it was a sweet :)",
    return_tensors="pt",
)

In [None]:
# Check how the tokens are generated
tokens

In [None]:
# How to decode the tokens
tokenizer.decode(tokens[0])

In [None]:
# Predict the sentiment of the text
result = model(tokens)
print(result)

In [None]:
# The model outputs a vector of 5 logits (raw, unnormalised scores), one per star rating; it is not one-hot encoded.
# The position of the largest logit is the predicted sentiment: index 0 = 1 star (worst), index 4 = 5 stars (best).
result.logits

In [None]:
int(torch.argmax(result.logits)) + 1
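
The logits above are raw scores, so it can help to turn them into probabilities to see how confident the prediction is. The following is a minimal sketch in plain Python; the `softmax` helper and the `example_logits` values are made up for illustration and are not real model output. With the actual tensor, `torch.softmax(result.logits, dim=-1)` should give the equivalent result.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one review (illustrative values only)
example_logits = [2.1, 1.3, 0.2, -1.0, -1.8]

probabilities = softmax(example_logits)

# Same rule as above: index of the largest value, plus 1, is the star rating
predicted_stars = probabilities.index(max(probabilities)) + 1

for stars, p in enumerate(probabilities, start=1):
    print(f"{stars} star(s): {p:.3f}")
print("Predicted rating:", predicted_stars)
```

Softmax preserves the ordering of the logits, so the predicted rating is the same whether `argmax` is applied to the logits or to the probabilities.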

# 4. Collect the reviews from Yelp

If we open the restaurant's Yelp page, right-click a review, and select 'Inspect', we can see that the `<p>` elements holding the review text have class names that begin with 'comment__'. We can therefore use this pattern to filter out the review paragraphs.
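
To see the class-name filter in isolation, here is a small offline sketch. It assumes a made-up HTML snippet (`sample_html`) that only mimics the class naming described above, and it uses Python's built-in `html.parser` instead of BeautifulSoup so that it runs without the network or extra packages. The `CommentParagraphParser` class is a hypothetical helper, not part of any library.

```python
import re
from html.parser import HTMLParser


class CommentParagraphParser(HTMLParser):
    """Collect the text of <p> elements whose class attribute matches a pattern."""

    def __init__(self, class_pattern):
        super().__init__()
        self.class_pattern = class_pattern
        self.reviews = []
        self._inside_match = False

    def handle_starttag(self, tag, attrs):
        class_value = dict(attrs).get("class") or ""
        if tag == "p" and self.class_pattern.search(class_value):
            # Start a new review entry for this matching paragraph
            self._inside_match = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside_match = False

    def handle_data(self, data):
        if self._inside_match:
            self.reviews[-1] += data


# Made-up markup for illustration; real Yelp pages are far more complex
sample_html = """
<div>
  <p class="comment__abc123 css-xyz">Great steak, friendly staff.</p>
  <p class="footer-text">About this site</p>
  <p class="comment__def456">Too salty for my taste.</p>
</div>
"""

parser = CommentParagraphParser(re.compile("comment"))
parser.feed(sample_html)
sample_reviews = parser.reviews
print(sample_reviews)
```

Only the two paragraphs whose class contains "comment" are kept; the footer paragraph is skipped. The BeautifulSoup cells that follow apply the same idea to the live page.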

In [None]:
# A timeout prevents the cell from hanging indefinitely if the server does not respond
r = requests.get("https://www.yelp.com/biz/block-house-eberhardstra%C3%9Fe-stuttgart-2", timeout=30)

In [None]:
restaurant_page_html = BeautifulSoup(r.text, "html.parser")

In [None]:
regex = re.compile(".*comment.*")

In [None]:
results = restaurant_page_html.find_all("p", {"class": regex})
reviews = [result.text for result in results]

In [None]:
reviews

# 5. Score the reviews

In [None]:
reviews_df = pd.DataFrame(reviews, columns=["review"])

In [None]:
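def sentiment_score(review):
    # Truncate to the model's 512-token input limit
    tokens = tokenizer.encode(review, return_tensors="pt", truncation=True, max_length=512)
    # Inference only, so gradient tracking is not needed
    with torch.no_grad():
        result = model(tokens)
    # Logit indices 0-4 correspond to ratings of 1-5 stars
    return int(torch.argmax(result.logits)) + 1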

In [None]:
reviews_df["sentiment"] = reviews_df["review"].apply(lambda x: sentiment_score(x[:512]))

In [None]:
reviews_df