# Installing and Importing Libraries

In [2]:
!pip3 install torch torchvision torchaudio



In [2]:
!pip install transformers requests beautifulsoup4 pandas numpy

Collecting transformers
  Obtaining dependency information for transformers from https://files.pythonhosted.org/packages/c1/bd/f64d67df4d3b05a460f281defe830ffab6d7940b7ca98ec085e94e024781/transformers-4.34.1-py3-none-any.whl.metadata
  Downloading transformers-4.34.1-py3-none-any.whl.metadata (121 kB)

### Transformers lets us download and use the pretrained BERT NLP model.
### Requests lets us fetch the Yelp page for scraping.
### BeautifulSoup4 lets us parse the page and extract the data that we need.

In [3]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification



### AutoTokenizer converts a string into a sequence of numbers (token IDs) that we can pass to the NLP model. AutoModelForSequenceClassification provides the Transformers architecture needed to load the pretrained NLP model.

In [4]:
import torch
import requests
from bs4 import BeautifulSoup
import re

# Instantiate the Model

In [5]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

# Encode and Calculate Sentiment

In [6]:
tokens = tokenizer.encode('Fine! It was ok', return_tensors='pt')

In [7]:
tokens

tensor([[  101, 12922,   106, 10197, 10140, 13563,   102]])

In [8]:
# The token IDs can also be decoded back into text
tokenizer.decode(tokens[0])

'[CLS] fine! it was ok [SEP]'
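
To make the text-to-numbers mapping concrete, the sketch below pairs the seven IDs from the `tokens` output with the pieces of the decoded string. The split into `pieces` is an assumption: it relies on each piece mapping to exactly one ID, which is consistent with seven IDs and seven pieces but is not read from the tokenizer itself.

```python
# IDs copied from the `tokens` output above (batch dimension dropped).
token_ids = [101, 12922, 106, 10197, 10140, 13563, 102]

# Assumption: the decoded string splits into these seven pieces, one per ID.
pieces = ['[CLS]', 'fine', '!', 'it', 'was', 'ok', '[SEP]']

# Pair each ID with its piece, in order.
id_to_piece = dict(zip(token_ids, pieces))
for token_id, piece in id_to_piece.items():
    print(f"{token_id:>6} -> {piece}")
```

In the notebook itself, `tokenizer.convert_ids_to_tokens(tokens[0].tolist())` should report the actual pieces, so the hand-written list is not needed there.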

In [9]:
result = model(tokens)

In [10]:
result

SequenceClassifierOutput(loss=None, logits=tensor([[-2.6455, -1.2697,  1.6994,  1.7555,  0.2988]],
       grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [11]:
result.logits

tensor([[-2.6455, -1.2697,  1.6994,  1.7555,  0.2988]],
       grad_fn=<AddmmBackward0>)

In [12]:
int(torch.argmax(result.logits))+1

4
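
The `argmax` step picks the highest logit but hides how close the other classes are. As a minimal sketch, the logits printed above can be turned into probabilities with a softmax. Plain Python is used here so the snippet runs without the model; the helper name `softmax` is introduced only for illustration.

```python
import math

# Logits copied from the result.logits output above.
logits = [-2.6455, -1.2697, 1.6994, 1.7555, 0.2988]

def softmax(values):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
stars = probs.index(max(probs)) + 1  # class indices 0-4 map to 1-5 stars

for star, p in enumerate(probs, start=1):
    print(f"{star} star(s): {p:.3f}")
print("predicted rating:", stars)
```

The logits for 3 and 4 stars (1.6994 and 1.7555) are nearly equal, so the rating of 4 is a narrow call rather than a confident one. Inside the notebook, `torch.softmax(result.logits, dim=1)` would give the same probabilities.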

# Collecting Reviews from Yelp

In [39]:
r = requests.get('https://www.yelp.com/biz/tommaso-ristorante-italiano-san-francisco-2')
soup = BeautifulSoup(r.text, 'html.parser')
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class':regex})
reviews = [result.text for result in results]

In [40]:
reviews[0]

"OMFG. FREAKING TO DIE FOR. If you like real Italian food, come. I got the lasagna and linguini with clams. Portions were perfect, not to the point you'd get food coma but still leave you more than satisfied. The educated guess glass of wine was $10 and very well worth. Well known for lasagna ($16.50). A taste of heaven, perfectly cheesy and the sauce isn't injected in the lasagna but poured on top, a delicious meat tomato sauce.The linguini, OMFG. I got the white sauce and it was seriously. So light, not the super creamy sauce that you get sick of halfway through. Definitely a hidden gem between hustler clubs and other strip clubs. It's a hole in the wall joint, very homey and small. Make reservations if you don't want to wait."
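
The scraping cell in this section selects every `<p>` tag whose `class` attribute matches the pattern `.*comment.*`. The sketch below shows that selection on a small inline HTML string so it can be run without a network request. The markup and class names are invented for illustration and do not reflect Yelp's actual page structure.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; the class names are invented for illustration.
html = """
<div>
  <p class="comment__abc123 text">Great lasagna.</p>
  <p class="footer-note">Terms of service</p>
  <p class="user-comment">Service was slow.</p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
pattern = re.compile('.*comment.*')

# Keep only <p> tags with a class that matches the pattern.
matches = soup.find_all('p', {'class': pattern})
reviews = [p.text for p in matches]
print(reviews)
```

Because the pattern accepts any class containing `comment`, it can also pick up paragraphs that are not customer reviews, such as owner replies, if the page gives them a similar class.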

# Loading Reviews into a DataFrame

In [19]:
import numpy as np
import pandas as pd

In [20]:
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [29]:
df['review'].iloc[2]

'Ambiance is historic given location and Italian heritage of the place.Pizza brick oven is out of this world, we loved it.The rest, the pasta is well made and tasteful.  The chicken parmigiana was good as well as the eggplant dish.Price to portions ratio is adequate.Finally service was fast and effective.'

In [26]:
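def score_cal(review):
    # Encode the review text as a tensor of token IDs
    tokens = tokenizer.encode(review, return_tensors='pt')
    # Run the classifier; logits holds one score per star rating
    result = model(tokens)
    # Class indices 0-4 map to 1-5 stars
    return int(torch.argmax(result.logits))+1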

In [30]:
score_cal(df['review'].iloc[2])

4

In [31]:
df['sentiment'] = df['review'].apply(lambda x: score_cal(x[:512]))
# x[:512] keeps the first 512 characters of each review (a character slice, not a token count).
# The model accepts at most 512 tokens per input, so this is a rough guard against over-long reviews.
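
Slicing the string with `x[:512]` limits characters, while the model's limit is counted in tokens. A possible alternative, sketched below, is to let the tokenizer truncate at the token level through the `truncation` and `max_length` arguments of `encode`. The function name `score_review` and its `max_tokens` parameter are hypothetical. The tokenizer and model are passed in as arguments so the function does not depend on notebook globals.

```python
def score_review(review, tokenizer, model, max_tokens=512):
    """Return a 1-5 star rating, truncating the input at the token level."""
    tokens = tokenizer.encode(
        review,
        return_tensors='pt',
        truncation=True,        # cut sequences longer than max_length
        max_length=max_tokens,  # counted in tokens, special tokens included
    )
    result = model(tokens)
    # Class indices 0-4 map to 1-5 stars.
    return int(result.logits.argmax()) + 1
```

With the objects defined earlier in the notebook, this could be applied as `df['review'].apply(lambda x: score_review(x, tokenizer, model))`, which passes the whole review and keeps as much text as fits in the model's input.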

In [32]:
df

|   | review | sentiment |
|---|--------|-----------|
| 0 | OMFG. FREAKING TO DIE FOR. If you like real It... | 1 |
| 1 | OMFG. FREAKING TO DIE FOR. If you like real It... | 1 |
| 2 | Ambiance is historic given location and Italia... | 4 |
| 3 | This place is definitely a legacy restaurant (... | 3 |
| 4 | Thank your for your review, Aubany. I do apolo... | 4 |
| 5 | There's so many things to love about North Bea... | 5 |
| 6 | Daniel, thank you and your coworkers for payin... | 5 |
| 7 | Not quite four stars, but slightly closer to f... | 3 |
| 8 | In a foodie city flush with 4 and 5 star resta... | 4 |
| 9 | What a beautifully written review, Frances. It... | 5 |


In [33]:
df['review'].iloc[3]

"This place is definitely a legacy restaurant (not sure if it officially is but it should be)- a historical destination. ***Come for the nostalgia and the pizza. This time we didn't have any pizza and opted for the plates. Not the greatest decision on our part.The ambiance is the fun part as it's still the original paintings from the 1930s. Service seems like they are just there to do what they've always done and they don't like their jobs. It felt like diner service instead of quality restaurant service.The plating and quality of the food could definitely be upped. We had calamari, Veal Rollettini, and Chicken Parmesan. The sauce that was served with the calamari was the same sauce on the chicken parm and pasta. It needed help. The calamari was edible, but nothing exciting. The Veal Rollettini was the favorite, although the plate looked sad. Three rolls on a large plate sitting atop just grease. The chicken parm was edible, but I never want it again. I was super disappointed in this m

In [34]:
df['review'].iloc[11]

'In a neighborhood full of Italian this was pretty mediocre - not terrible. Service was meh/fine. We were starving so we ate and got outta there.'