![NYPLogo.png](attachment:NYPLogo.png)

# Practical 8b: Aspect Sentiment Classification


## Objectives

- Construct aspect-based sentiment classification models and evaluate using appropriate performance metrics.


## Aspect-Based Sentiment Analysis
**Aspect-Based Sentiment Analysis (ABSA)**, also known as fine-grained opinion mining, is the task of determining the sentiment of a text with respect to a specific aspect. 

Traditional sentiment analysis methods treat a text as a whole and assign it a single sentiment label (e.g., positive, negative, or neutral). This is adequate for many tasks, but there are also many situations where it would be useful to know the sentiment of a text with respect to a specific aspect. For example, consider a review of a restaurant. The reviewer might give the restaurant a positive overall rating, but mention that the service was poor. In this case, it would be useful to know that the sentiment towards the aspect “service” is negative, even though the overall sentiment is positive.

ABSA is a difficult task because it requires the identification of “aspects” in text, as well as the assignment of sentiment labels to those aspects. There are a number of ways to approach ABSA, but one common approach is to first identify aspects in text, and then use an ABSA model to label the sentiment of each aspect.

**Aspect Identification** is a task that can be approached in a number of ways, but one common approach is to use a rule-based method, like using a dictionary. For example, every time we find the words “iPhone X” or “MacBook Pro” we may consider them as aspects.
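The dictionary approach can be sketched in a few lines of plain Python. This is only an illustration: `ASPECT_TERMS` and `find_aspects` are hypothetical names, and a real dictionary would be built for your own domain.

```Python
import re

# Hypothetical aspect dictionary; a real one would be built for the domain.
ASPECT_TERMS = ["iPhone X", "MacBook Pro"]

def find_aspects(text, aspect_terms=ASPECT_TERMS):
    """Return the dictionary terms that occur in `text`, ignoring case."""
    found = []
    for term in aspect_terms:
        # Word boundaries stop "iPhone X" from matching inside "iPhone XS".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found

print(find_aspects("The iPhone X camera is great, but my MacBook Pro runs hot."))
```

A lookup like this is easy to maintain, but it only finds aspects that are already in the dictionary.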

Once aspects have been identified, we need to train an ABSA classifier that predicts the sentiment toward an aspect, using the sentence as context. There are a number of different approaches to building ABSA classifiers, but one common approach is supervised machine learning. In this case, we need a training dataset, i.e. a collection of texts that have been labeled with aspects and their sentiment. For example:
* The sentence “We had a great experience at the restaurant, food was delicious, but the service was kinda bad”, with aspect “service” and label “negative”.
* The sentence “We had a great experience at the restaurant, food was delicious, but the service was kinda bad”, with aspect “food” and label “positive”.
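As a rough sketch, the two labeled examples above could be stored as one record per (sentence, aspect) pair. The field names `sentence`, `aspect` and `label` are assumptions made for illustration, not a required format.

```Python
# Each training example pairs one sentence with one aspect and its label,
# so a sentence that mentions two aspects appears twice.
review = ("We had a great experience at the restaurant, food was delicious, "
          "but the service was kinda bad")

training_examples = [
    {"sentence": review, "aspect": "service", "label": "negative"},
    {"sentence": review, "aspect": "food", "label": "positive"},
]

# Collect the label assigned to each aspect of this review.
labels_by_aspect = {ex["aspect"]: ex["label"] for ex in training_examples}
print(labels_by_aspect)
```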

In the first exercise, we will use spaCy, a natural language processing library for Python, along with TextBlob, which offers simple tools for sentiment analysis and text processing.

You will need spaCy's en_core_web_sm model. If you haven't installed it already, download it by executing the command below:
> python -m spacy download en_core_web_sm
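If you are unsure whether the library and the model are already installed, a quick check like the one below may help. It is a minimal sketch that relies on the model being installed as a regular Python package; `missing_packages` is a hypothetical helper name.

```Python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be found as importable packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# The spaCy model is installed as a package, so it can be checked the same way.
required = ["spacy", "en_core_web_sm"]
print("Missing:", missing_packages(required))
```

An empty list means both are available. Anything listed still needs to be installed.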


## Import libraries and download the packages

```Python
import spacy

nlp = spacy.load('en_core_web_sm')
```

In [None]:
# Enter code here


Let's define a few simple test sentences.
```Python
sentences = [
  'The food we had yesterday was delicious',
  'My time in Italy was very enjoyable',
  'I found the meal to be tasty',
  'The internet was slow.',
  'Our experience was suboptimal'
]
```

In [None]:
# Enter code here


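First, we will parse each sentence so that we can separate the aspects (e.g. food) from their sentiment descriptions (e.g. delicious). For each token, the loop below prints its text, its dependency label, its head, the head's POS tag, its own POS tag, and its children.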
```Python
for sentence in sentences:
    doc = nlp(sentence)
    for token in doc:
        print(token.text, token.dep_, token.head.text, token.head.pos_,
              token.pos_,[child for child in token.children])
    print("\n")
```

In [None]:
# Enter code here


From spaCy's dependency parse, we can see the POS (Part-of-Speech) tag of each token in our sentences. Note the child tokens: through them we can pick up intensifiers such as "very", "quite", etc.

*Note: This is a simplistic algorithm hence it may not be able to pick up semantically important information such as "not" in "not delicious".*

Next, let's see how to pick up the sentiment descriptions.
```Python
for sentence in sentences:
    doc = nlp(sentence)
    descriptive_term = ''
    
    for token in doc:
        if token.pos_ == 'ADJ':
            descriptive_term = token
    
    print(sentence)
    print(descriptive_term)
```

In [None]:
# Enter code here


You can see that the algorithm picks up all the descriptive adjectives such as delicious, enjoyable, and tasty. But the intensifiers like "very" are missing. To capture them, we prepend any adverb children of the adjective to the descriptive term.

```Python
for sentence in sentences:
    doc = nlp(sentence)
    descriptive_term = ''
    
    for token in doc:
        if token.pos_ == 'ADJ':
            prepend = ''
            for child in token.children:
                if child.pos_ != 'ADV':
                    continue
                prepend += child.text + ' '
            descriptive_term = prepend + token.text
    
    print(sentence)
    print(descriptive_term)
```

In [None]:
# Enter code here


In a real scenario, we would also need to catch negations such as "not". This practical does not cover negation handling, but you can modify the code using the same idea to handle it.
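One possible way to extend the loop is sketched below. It assumes the parser marks negation words with the dependency label `neg`, and that the negation attaches either to the adjective or to the adjective's head (for example the verb "was" in "was not delicious"). The function name `describe_with_negation` is hypothetical, and the actual parse may differ from sentence to sentence.

```Python
def describe_with_negation(doc):
    """Return the last adjective in `doc` with its adverbs and any negation word.

    Assumes spaCy-style tokens with .text, .pos_, .dep_, .head and .children.
    """
    descriptive_term = ''
    for token in doc:
        if token.pos_ != 'ADJ':
            continue
        # Adverb children of the adjective, e.g. "very" in "very enjoyable".
        adverbs = [child.text for child in token.children
                   if child.pos_ == 'ADV' and child.dep_ != 'neg']
        # Look for a negation among the adjective's children and, when the
        # adjective is not the root, among its head's children as well.
        neighbours = list(token.children)
        if token.head != token:
            neighbours += list(token.head.children)
        negations = [t.text for t in neighbours if t.dep_ == 'neg']
        descriptive_term = ' '.join(negations + adverbs + [token.text])
    return descriptive_term
```

With the `nlp` object loaded earlier, you could try `describe_with_negation(nlp('The food was not delicious'))`. Because the check looks at every child of the head, a negation that belongs to a different part of a long sentence may be picked up by mistake, so treat this as a starting point only.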

Now we are ready to identify the targets that are being described.
```Python
from pprint import pprint

aspects = []
for sentence in sentences:
    doc = nlp(sentence)
    descriptive_term = ''
    target = ''
    
    for token in doc:
        if token.dep_ == 'nsubj' and token.pos_ == 'NOUN':
            target = token.text
        if token.pos_ == 'ADJ':
            prepend = ''
            for child in token.children:
                if child.pos_ != 'ADV':
                    continue
                prepend += child.text + ' '
            descriptive_term = prepend + token.text
    
    aspects.append({'aspect': target,
                    'description': descriptive_term})

pprint(aspects)
```

In [None]:
# Enter code here


Now that we successfully extracted the aspects and descriptions, it’s time to classify them as positive or negative. The goal here is to help the computer understand that tasty food is positive, while slow internet is negative. Computers don’t understand English, so we will need to try a few things before we have a working solution.

TextBlob is a library that offers sentiment analysis out of the box. It has a bag-of-words approach, meaning that it has a list of words such as “good”, “bad”, and “great” that have a sentiment score attached to them. It is also able to pick up modifiers (such as “not”) and intensifiers (such as “very”) that affect the sentiment score.
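To make the idea of a word list with modifiers concrete, here is a toy scorer in plain Python. The word scores, the intensifier weights and the averaging rule are all made up for illustration. This is not TextBlob's actual lexicon or algorithm.

```Python
# Toy lexicon and modifier weights: hypothetical values for illustration only,
# not TextBlob's actual lexicon or algorithm.
LEXICON = {"good": 0.7, "great": 0.8, "bad": -0.7, "slow": -0.3}
INTENSIFIERS = {"very": 1.3, "extremely": 1.5}
NEGATIONS = {"not", "never"}

def toy_polarity(text):
    """Average the scores of known words, applying modifiers that precede them."""
    scores = []
    scale, negate = 1.0, False
    for word in text.lower().split():
        if word in NEGATIONS:
            negate = True
        elif word in INTENSIFIERS:
            scale *= INTENSIFIERS[word]
        elif word in LEXICON:
            score = LEXICON[word] * scale
            if negate:
                score = -score
            # Keep each score inside the range [-1, 1].
            scores.append(max(-1.0, min(1.0, score)))
            scale, negate = 1.0, False
        else:
            # An unknown word cancels any pending modifier.
            scale, negate = 1.0, False
    return sum(scores) / len(scores) if scores else 0.0

for phrase in ["good", "very good", "not good", "tasty"]:
    print(phrase, toy_polarity(phrase))
```

In this sketch an intensifier pushes the score further from zero, a negation flips its sign, and a word that is missing from the lexicon contributes nothing, so the phrase scores 0.0.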

In this practical, we will be using the default sentiment analysis in the TextBlob package. To install the TextBlob package, execute the following command:
> pip install textblob

```Python
from textblob import TextBlob

for aspect in aspects:
  aspect['sentiment'] = TextBlob(aspect['description']).sentiment

pprint(aspects)
```

In [None]:
# Enter code here


Looking at the result, we can see that it works pretty well, except that the words "tasty" and "suboptimal" are considered neutral. It seems that they are not part of TextBlob's lexicon, so they are not picked up.

Another potential issue is that some descriptive terms or adjectives can be positive in some cases and negative in others, depending on the word they’re describing. The default algorithm used by TextBlob is not able to know that cold weather can be neutral, cold food can be negative while a cold drink can be positive.
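One simple way to account for the described word is to score the (aspect, description) pair rather than the description alone. The sketch below uses a small hand-written table. `DOMAIN_POLARITY`, `contextual_polarity` and the scores are hypothetical and only illustrate the idea.

```Python
# Hypothetical domain lexicon keyed on (aspect, description); scores are illustrative.
DOMAIN_POLARITY = {
    ("food", "cold"): -0.5,
    ("drink", "cold"): 0.5,
    ("weather", "cold"): 0.0,
}

def contextual_polarity(aspect, description, fallback=0.0):
    """Look up the pair first; use `fallback` when the pair is unknown."""
    return DOMAIN_POLARITY.get((aspect.lower(), description.lower()), fallback)

for name in ["food", "drink", "weather"]:
    print(name, contextual_polarity(name, "cold"))
```

For pairs that are not in the table, the `fallback` argument could be a context-free score such as `TextBlob(description).sentiment.polarity`.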

TextBlob allows us to run a quick initial analysis. To handle more complex and domain-specific cases, you will need to train your own sentiment classifier.


## Public Pre-Trained Model
We can see from the above exercise that building an ABSA system by hand is time-consuming. There are public pre-trained models that we can use instead, for example one based on **DeBERTa**. This aspect-based sentiment analysis model is trained on 30k+ ABSA samples.
https://huggingface.co/yangheng/deberta-v3-base-absa-v1.1

First, let's install the transformers library along with the SentencePiece tokenizer (which is needed by some models of the library, such as DeBERTa).
> pip install transformers[sentencepiece]

You will also need the PyTorch and TensorFlow libraries to be installed.
> pip install torch

> pip install tensorflow --user

**Note you will need to restart your kernel after the installation before running the code below to load the two different models**

Next, we will import the necessary libraries and load two different models:
* The **absa_model** and **absa_tokenizer** to test the pre-trained ABSA model.
* The **sentiment_model** to test a standard sentiment model. We’ll try the twitter-xlm-roberta-base-sentiment model, trained on ~198M tweets and finetuned for sentiment analysis.

```Python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
from transformers import pipeline

# Load Aspect-Based Sentiment Analysis model
absa_tokenizer = AutoTokenizer.from_pretrained("yangheng/deberta-v3-base-absa-v1.1")
absa_model = AutoModelForSequenceClassification \
  .from_pretrained("yangheng/deberta-v3-base-absa-v1.1")

# Load a traditional Sentiment Analysis model
sentiment_model_path = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
sentiment_model = pipeline("sentiment-analysis", model=sentiment_model_path,
                          tokenizer=sentiment_model_path)
```

In [None]:
# Enter code here


Given this sentence: **"We had a great experience at the restaurant, food was delicious, but the service was kinda bad"**

Let's compute the sentiment toward the aspect **"food"**. 
```Python
sentence = "We had a great experience at the restaurant, food was delicious, but the service was kinda bad"
print(f"Sentence: {sentence}")
print()

# ABSA of "food"
aspect = "food"
inputs = absa_tokenizer(f"[CLS] {sentence} [SEP] {aspect} [SEP]", return_tensors="pt")
outputs = absa_model(**inputs)
probs = F.softmax(outputs.logits, dim=1)
probs = probs.detach().numpy()[0]

print(f"Sentiment of aspect '{aspect}' is:")
for prob, label in zip(probs, ["negative", "neutral", "positive"]):
    print(f"Label {label}: {prob}")
print()
```

In [None]:
# Enter code here


For the food aspect, the model predicts **"positive"** with a score of ~0.997.
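The score is a probability obtained by applying softmax to the model's three output logits. The plain-Python sketch below shows that step and how the top label is picked. The logits are made-up numbers, and `softmax` and `top_label` are hypothetical helpers written for illustration; in the notebook the real values come from `outputs.logits`.

```Python
import math

LABELS = ["negative", "neutral", "positive"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(probs, labels=LABELS):
    """Return the (label, probability) pair with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Made-up logits for illustration only.
example_logits = [-2.0, -1.0, 4.0]
print(top_label(softmax(example_logits)))
```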

Next, let's compute the sentiment toward the aspect **"service"**.

In [None]:
# Enter code here


For the service aspect, the model predicts **"negative"** with a score of ~0.995.

Next, let's compute the overall sentiment of the sentence. 
```Python
# Overall sentiment of the sentence
sentiment = sentiment_model([sentence])[0]
print(f"Overall sentiment: {sentiment['label']} with score {sentiment['score']}")
```

In [None]:
# Enter code here


We can see that even though the overall sentiment of the sentence is negative, the ABSA model correctly assigns a positive sentiment to "food" and a negative sentiment to "service".