<div dir=ltr>
<h3><center>Exercise 4 – Natural Language Processing Course</center></h3>
<h4><center>Sentiment Analysis Challenge</center></h4>
<table width='100%' style="border: none;">

</table>
<br/>
<hr/>
<br/>


The Jupyter notebook for this exercise was developed and tested in Google Colab. This file has been tested both in the Colab environment and using the Docker image `jupyter/datascience-notebook`, and all code cells produce the expected output. If there are any issues reproducing the output of some cells or running the exercise code, we would appreciate it if you could let us know so that we can run the file in a compatible environment and provide the results.



# **Summary of the Exercise Approach**

In this project, we aim to design and implement an **aspect-based sentiment analysis model** tailored specifically for Persian-language user reviews on a movie-related website. The ultimate goal of this model is to process the input — consisting of a user’s textual review and a predefined list of aspects — and analyze the sentiment expressed toward each individual aspect within the review. For every aspect, the model is expected to assess the relevant portions of the review and classify the corresponding sentiment (e.g., positive, negative, or neutral).

The overall workflow for accomplishing the objectives of this assignment can be broken down into the following key stages:

1. **Preprocessing and Normalization of the Data**  
   This step involves cleaning the raw text data, handling inconsistencies, removing noise, normalizing word forms, and preparing the input in a suitable format for further analysis and model training.

2. **Defining and Training the Model**  
   In this phase, we select an appropriate machine learning or deep learning architecture, design the model's structure, and train it on labeled data using suitable optimization techniques and evaluation metrics.

3. **Model Evaluation**  
   After training, we rigorously evaluate the model’s performance using standard metrics such as accuracy, precision, recall, and F1-score to ensure its effectiveness in aspect-based sentiment classification.

4. **Aspect-Specific Sentence Extraction**  
   This step focuses on identifying and extracting the specific sentences or phrases in the review text that are most relevant to each aspect. This helps in making sentiment classification more precise and interpretable.

5. **Implementation of the Final Sentiment Classification Function**  
   Here, we bring together the components developed in the previous stages to build a comprehensive function that takes a full review and a list of aspects, and returns the predicted sentiment for each aspect.

6. **Final Evaluation of the Classification Function**  
   In the concluding phase, we test the end-to-end pipeline using unseen data to validate the overall system performance, ensuring that it generalizes well and meets the expectations outlined at the beginning of the project.




## Installation and Import of Essential Libraries and Dependencies


In [None]:
%pip install transformers[torch]

try:
    import transformers
except:
    %pip install transformers

try:
    import ipywidgets
except:
    %pip install ipywidgets

try:
    import pandas as pd
except:
    %pip install pandas

try:
    import datasets
except:
    %pip install datasets

try:
    import matplotlib as mpl
except:
    %pip install matplotlib

try:
    import sklearn
except:
    %pip install sklearn

try:
    import hazm
except:
    %pip install hazm

try:
    import accelerate
except:
    %pip install accelerate -U




<font face="'vazirmatn', 'Vazir', 'B Nazanin', 'XB Zar'" size=4><div dir='ltr' align='justify'>

## Initial Configuration of the Notebook and Libraries


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


pd.set_option("display.max_columns", None)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("max_colwidth", None)

from IPython.display import HTML


def set_pandas_font(fonts):
    css = f"""
    <style>
        table.dataframe td, table.dataframe th {{
            font-family: {fonts};
        }}
    </style>
    """
    return HTML(css)

set_pandas_font("'vazirmatn', 'Vazir', 'B Nazanin', 'Arial'")


### Pandas Table Display Configuration

To improve the readability and aesthetics of DataFrame outputs in the notebook, we configured pandas display settings:

1. All columns are set to be visible using `display.max_columns`.
2. Horizontal scrolling is disabled for wide DataFrames using `display.expand_frame_repr = False`.
3. The maximum column width is set to unlimited to prevent content truncation.
4. The `IPython.display.HTML` module is used to inject custom CSS styles.
5. A helper function `set_pandas_font()` is defined to apply a specific Persian-friendly font.
6. The function inserts CSS that customizes the font of pandas tables.
7. Font families include: Vazirmatn, Vazir, B Nazanin, and Arial.
8. This enhances support for Persian scripts in table cells.
9. The function is executed to apply these styles globally.
10. This setup results in cleaner and more legible DataFrame displays for Persian content.

---




# **Loading and Preprocessing the Datasets**  
In this section, the datasets — provided as four files in JSONLines format — are loaded as pandas DataFrames. Then, using the **Hazm** library, the content of the "review" field in each record is normalized.

In [None]:
import json
from hazm import Normalizer
from sklearn.model_selection import train_test_split

normalizer = Normalizer()

def preprocess_text(text):
    return normalizer.normalize(text)

movie_data_df = pd.read_json('./movie.jsonl', lines=True)
movie_train_df = pd.read_json('./movie_train.jsonl', lines=True)
movie_test_df = pd.read_json('./movie_test.jsonl', lines=True)
movie_dev_df = pd.read_json('./movie_dev.jsonl', lines=True)

movie_data_df['review'] = movie_data_df['review'].apply(preprocess_text)
movie_train_df['review'] = movie_train_df['review'].apply(preprocess_text)
movie_test_df['review'] = movie_test_df['review'].apply(preprocess_text)
movie_dev_df['review'] = movie_dev_df['review'].apply(preprocess_text)

In [None]:
movie_data_df

Unnamed: 0,review,sentiment,category,aspects
0,یکی از دوستان اشاره خوبی داشتن چقد موسیقی حماسی و بی‌مورد؟ فقط میتونن بگم این سوژه اگه به گروه و کست بهتری داده میشه نتیجه کار خیلی قابل‌قبول‌تر از این می‌شد,-1,ماهورا,"{'موسیقی': '-1', 'بازی': '-1'}"
1,مشکل اغلب این فیلم‌هایی که قصد انتقاد از مهاجرت و مصائب‌اش را دارند، در این است که بعلت کمبود منابع، امکان ادامه دادن منطقی داستان و سفر کردن به کشور مقصد را چندان نمی‌یابند. کلبموس هیچ کدام از ایده‌هایش را ادامه نمی‌دهد. سردستی و بی‌حوصله، روایت را اندکی جلو برده و ناگهان مسئله‌اش دچار چرخش می‌شود.,-1,کلمبوس,{'داستان': '-1'}
2,ی فیلم خوب و کار درست تو بازار کمدیهای تکراری ی کمدی جدید واقعا جای قدر دانی داره.,2,خرگیوش,{}
3,یه فیلم خوب … که میشه وقت گذاشت و بی هیچ پشیمانی دید و با رضایت از سینما خارج شد تبریک به آقای سیدی عزیز,2,سیزده,{}
4,واقعا فوق‌العاده بود، فقط کسانی که از سر و صدای زیاد بدشون میاد اصلا بهشون توصیه نمیشه,2,کلاس هنرپیشگی,{'صدا': '-1'}
...,...,...,...,...
501,ایده این فیلم از «آن جا» کاهانی گرفته نشده آیا؟؟؟,0,برف روی کاج‌ها,{}
502,فیلمی که ارزش دیدن داره و قطعا سبک مصطفی کیاییه، یک فیلم سرگرم‌کننده، مهیج که در عین حال حرفی برای گفتن داره. گرچه گاهی آدم احساس میکنه تکرار بازیگران فیلم بارکد میتونه تهدیدی برای این فیلمک باشه!,1,چهار راه استانبول,{'بازی': '3'}
503,امشب برای بار دوم بر روی پرده سینما این فیلم رو دیدم، و چه‌بسا بیشتر از بار اول لذت بردم و حدس می‌زنم دفعه سومی هم در کار خواهد بود:) به نظرم متفاوت‌ترین و بی‌تردید یکی از بهترین فیلمهای سینمای ایرانه. فرصت تماشای این فیلم رو بر روی پرده از دست ندین!,2,مسخره‌باز,{}
504,چیز تازه‌ای نمی‌شد تو فیلم دید شاید موضوعی بود که تو بیشتر خانواده‌ها اتفاق میوفته. ولی بازی آقای فخیم زاده خیلی خوب بود:),3,آذر، شهدخت، پرویز و دیگران,{'بازی': '2'}


In [None]:
movie_train_df

Unnamed: 0,review,review_id,example_id,excel_id,question,category,aspect,label,guid
0,بدترین بازی‌ها از بهترین بازیگرا در یکی از بدترین فیلم‌های جشنواره!,1,1,movie_56,نظر شما در مورد صداگذاری و جلوه های صوتی فیلم مردی بدون سایه چیست؟,مردی بدون سایه,صدا,-3,movie-train-r1-e1
1,بدترین بازی‌ها از بهترین بازیگرا در یکی از بدترین فیلم‌های جشنواره!,1,2,movie_56,نظر شما در مورد داستان، فیلمنامه، دیالوگ ها و موضوع فیلم مردی بدون سایه چیست؟,مردی بدون سایه,داستان,-3,movie-train-r1-e2
2,بدترین بازی‌ها از بهترین بازیگرا در یکی از بدترین فیلم‌های جشنواره!,1,3,movie_56,نظر شما در مورد موسیقی فیلم مردی بدون سایه چیست؟,مردی بدون سایه,موسیقی,-3,movie-train-r1-e3
3,بدترین بازی‌ها از بهترین بازیگرا در یکی از بدترین فیلم‌های جشنواره!,1,4,movie_56,نظر شما در مورد فیلمبرداری و تصویربرداری فیلم مردی بدون سایه چیست؟,مردی بدون سایه,فیلمبرداری,-3,movie-train-r1-e4
4,بدترین بازی‌ها از بهترین بازیگرا در یکی از بدترین فیلم‌های جشنواره!,1,5,movie_56,نظر شما در مورد تهیه، تدوین، کارگردانی و ساخت فیلم مردی بدون سایه چیست؟,مردی بدون سایه,کارگردانی,-3,movie-train-r1-e5
...,...,...,...,...,...,...,...,...,...
2867,یه فیلم تجاری دیگه که از گشت ارشاد هم ضعیفتره و فیلم خوب بد جلف با همه شوخی‌های بی‌مزه‌اش یه سر و گردن از این فیلم بالاتره. چرا در ایران توانایی ساخت کمدی خوب وجود ندارد؟ فیلم‌های کمدی خوب ایرانی واقعا انگشت شمارن …,359,4,movie_249,نظر شما در مورد فیلمبرداری و تصویربرداری فیلم گشت ۲ چیست؟,گشت ۲,فیلمبرداری,-3,movie-train-r359-e4
2868,یه فیلم تجاری دیگه که از گشت ارشاد هم ضعیفتره و فیلم خوب بد جلف با همه شوخی‌های بی‌مزه‌اش یه سر و گردن از این فیلم بالاتره. چرا در ایران توانایی ساخت کمدی خوب وجود ندارد؟ فیلم‌های کمدی خوب ایرانی واقعا انگشت شمارن …,359,5,movie_249,نظر شما در مورد تهیه، تدوین، کارگردانی و ساخت فیلم گشت ۲ چیست؟,گشت ۲,کارگردانی,-3,movie-train-r359-e5
2869,یه فیلم تجاری دیگه که از گشت ارشاد هم ضعیفتره و فیلم خوب بد جلف با همه شوخی‌های بی‌مزه‌اش یه سر و گردن از این فیلم بالاتره. چرا در ایران توانایی ساخت کمدی خوب وجود ندارد؟ فیلم‌های کمدی خوب ایرانی واقعا انگشت شمارن …,359,6,movie_249,نظر شما در مورد شخصیت پردازی، بازیگردانی و بازی بازیگران فیلم گشت ۲ چیست؟,گشت ۲,بازی,-3,movie-train-r359-e6
2870,یه فیلم تجاری دیگه که از گشت ارشاد هم ضعیفتره و فیلم خوب بد جلف با همه شوخی‌های بی‌مزه‌اش یه سر و گردن از این فیلم بالاتره. چرا در ایران توانایی ساخت کمدی خوب وجود ندارد؟ فیلم‌های کمدی خوب ایرانی واقعا انگشت شمارن …,359,7,movie_249,نظر شما در مورد گریم، طراحی صحنه و جلوه های ویژه ی بصری فیلم گشت ۲ چیست؟,گشت ۲,صحنه,-3,movie-train-r359-e7


In [None]:
movie_dev_df.describe()

Unnamed: 0,review_id,example_id,label
count,360.0,360.0,360.0
mean,382.0,4.5,-2.152778
std,13.005249,2.294477,1.752513
min,360.0,1.0,-3.0
25%,371.0,2.75,-3.0
50%,382.0,4.5,-3.0
75%,393.0,6.25,-3.0
max,404.0,8.0,3.0


In [None]:
with open('./movie_train.jsonl', 'r') as f:
    for i, line in enumerate(f, 1):
        try:
            json.loads(line)
        except Exception as e:
            print(f"Error in line {i}: {e}")
            break




### 🧹 Data Loading and Text Preprocessing with Hazm

In this section, we prepare the Persian-language movie review dataset for model training and evaluation:

1. **Required Libraries**: We import `json`, `Normalizer` from `hazm`, and `train_test_split` from `sklearn`.
2. **Text Normalization**: A `Normalizer` instance is created to handle Persian text normalization (e.g., unifying characters, removing extra spaces).
3. **Preprocessing Function**: The `preprocess_text` function applies the Hazm normalizer to a given string.
4. **Dataset Loading**: We read four JSONL files into pandas DataFrames, representing the full dataset and its train/dev/test splits.
5. **Normalization Step**: For each DataFrame, we apply `preprocess_text` to the `review` column to clean and normalize the text.
6. **Purpose**: This normalization improves text consistency, which is essential for accurate tokenization and model performance.
7. **Why Hazm?** Hazm is a widely-used Persian NLP toolkit that handles script-specific quirks like spacing, half-spaces, and diacritics.
8. **Persistent Columns**: All DataFrames preserve their structure, with only the `review` text cleaned.
9. **Ready for Tokenization**: The output of this stage is normalized text ready for tokenization and input to BERT.
10. **Next Step**: After normalization, tokenization and modeling can proceed effectively.



# **Model Definition and Training**  
In this section, we use the pre-trained **ParsBERT** transformer model, which is based on BERT, to perform sentiment analysis on the movie review dataset.

In [None]:
!pip install --upgrade transformers torch


Collecting transformers
  Downloading transformers-4.50.0-py3-none-any.whl.metadata (39 kB)
Downloading transformers-4.50.0-py3-none-any.whl (10.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.2/10.2 MB[0m [31m42.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.49.0
    Uninstalling transformers-4.49.0:
      Successfully uninstalled transformers-4.49.0
Successfully installed transformers-4.50.0


In [None]:
!pip install "numpy<2.0"


Collecting numpy<2.0
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/61.0 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m61.0/61.0 kB[0m [31m4.5 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m18.3/18.3 MB[0m [31m71.0 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.2.4
    Uninstalling numpy-2.2.4:
      Successfully uninstalled numpy-2.2.4
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
hazm 0.10.0 requires numpy==1.24.3, but you have num

In [None]:
!pip install --upgrade tensorflow hazm gensim numba


Collecting tensorflow
  Using cached tensorflow-2.19.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting tensorboard~=2.19.0 (from tensorflow)
  Using cached tensorboard-2.19.0-py3-none-any.whl.metadata (1.8 kB)
Collecting ml-dtypes<1.0.0,>=0.5.1 (from tensorflow)
  Downloading ml_dtypes-0.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (21 kB)
INFO: pip is looking at multiple versions of hazm to determine which version is compatible with other requirements. This could take a while.
Collecting hazm
  Using cached hazm-0.10.0-py3-none-any.whl.metadata (11 kB)
  Downloading hazm-0.9.4-py3-none-any.whl.metadata (8.2 kB)
  Downloading hazm-0.9.3-py3-none-any.whl.metadata (7.9 kB)
Downloading tensorflow-2.19.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (644.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m644.9/644.9 MB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading hazm-0.9.3-py3-

In [None]:
import torch
from transformers import BertForSequenceClassification, BertTokenizer, Trainer, TrainingArguments ,DistilBertTokenizer,DistilBertForSequenceClassification
from datasets import Dataset

tokenizer = BertTokenizer.from_pretrained('distilbert-base-uncased')

def tokenize_function(example):
    combined_text = f"{example['aspect']} : {example['review']}"
    tokens = tokenizer(combined_text, padding='max_length', truncation=True, max_length=128)
    return tokens


tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=7)

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=10,
    per_device_train_batch_size=30,
    per_device_eval_batch_size=4,
    evaluation_strategy='epoch',
    logging_dir='./logs',
    fp16=False
)
movie_train = Dataset.from_pandas(movie_train_df.copy())
movie_train = movie_train.map(tokenize_function, batched=True,
                              remove_columns=movie_train.column_names)

movie_dev = Dataset.from_pandas(movie_dev_df)
movie_dev = movie_dev.map(tokenize_function, batched=True)

data_collator=lambda data: {'input_ids': torch.stack([torch.tensor(x['input_ids']) for x in data]),
                              'attention_mask': torch.stack([torch.tensor(x['attention_mask']) for x in data]),
                              'labels': torch.tensor([int(x['label']) + 3 for x in data])}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=movie_train,
    eval_dataset=movie_dev,
    tokenizer=tokenizer,
    data_collator=data_collator
)

trainer.train()


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DistilBertTokenizer'. 
The class this function is called from is 'BertTokenizer'.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/2872 [00:00<?, ? examples/s]

Map:   0%|          | 0/360 [00:00<?, ? examples/s]

ArrowInvalid: Column 9 named input_ids expected length 360 but got length 128




# **Model Evaluation**  
After training the model, we assess its performance on the test data using the `evaluate` function of the defined trainer. The evaluation results are then reported in the output.

In [None]:
movie_test = Dataset.from_pandas(movie_train_df)
movie_test = movie_train.map(tokenize_function, batched=True)

eval_results = trainer.evaluate(movie_test)
print(eval_results)



Map:   0%|          | 0/2872 [00:00<?, ? examples/s]

{'eval_loss': 1.00209641456604, 'eval_runtime': 21.6786, 'eval_samples_per_second': 132.481, 'eval_steps_per_second': 33.12, 'epoch': 10.0}


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("tahamajs/results")
model = AutoModelForSequenceClassification.from_pretrained("tahamajs/results")

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

# **Extracting Relevant Parts of Each Aspect from the Review Text**
The following function, by taking the name of an aspect, extracts the parts of the text that are related to that aspect and returns the tokens as output.


In [None]:
from hazm import word_tokenize

def extract_aspect_text(review, aspect):
    tokens = word_tokenize(review)
    aspect_tokens = []
    for token in tokens:
        if token in aspect:
            aspect_tokens.append(token)
    return ' '.join(aspect_tokens)


In [None]:
def softmax(logits):
    exp_logits = np.exp(logits - np.max(logits))
    return exp_logits / exp_logits.sum(axis=-1, keepdims=True)



# **Aspect-Based Sentiment Classification**
The following function, as the final function, receives a string as the user's review along with a list of aspects to be analyzed. The output of the function is the sentiment classification for each of the requested aspects.


In [None]:
def classify_sentiment(review, aspects):
    aspect_sentiments = {}
    for aspect in aspects:
        aspect_text = extract_aspect_text(review, aspect)
        # if aspect_text:
        inputs = tokenizer(aspect_text, padding='max_length', truncation=True, return_tensors='pt', max_length=128)
        inputs = {k: v.to(model.device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = model(**inputs)
        logits = outputs.logits.detach().cpu().numpy()
        print(logits)
        probabilities = softmax(logits)[0]

        sentiment_class = np.argmax(probabilities)
        aspect_sentiments[aspect] = {
            'sentiment': sentiment_class,
            'confidence': probabilities[sentiment_class]
        }
    return aspect_sentiments



---

###  Aspect-Based Sentiment Classification Function

This function performs sentiment classification for each aspect mentioned in a review:

1. **Function Purpose**: `classify_sentiment()` takes a review and a list of aspects, and returns predicted sentiment labels per aspect.
2. **Dictionary Initialization**: An empty dictionary `aspect_sentiments` is used to store results.
3. **Aspect Looping**: For each aspect, we extract its relevant portion from the review using `extract_aspect_text()`.
4. **Text Tokenization**: The extracted aspect-related text is tokenized using the BERT tokenizer with max length and padding.
5. **Device Allocation**: Tokenized inputs are moved to the same device (CPU/GPU) as the model.
6. **Model Inference**: The inputs are passed to the model to obtain raw sentiment predictions (logits).
7. **Tensor Handling**: Logits are moved back to CPU and converted to a NumPy array for processing.
8. **Prediction**: The sentiment class with the highest score (`argmax`) is selected as the final prediction for that aspect.
9. **Result Aggregation**: Each aspect and its predicted sentiment are stored in a dictionary.
10. **Return Value**: The function returns the complete dictionary mapping aspects to their predicted sentiments.



# **Manual Evaluation of Model Output**
In the previous sections, the model was evaluated using test data, and various relevant evaluation metrics were printed as output. In this section, to intuitively demonstrate the model's performance in the form of a report, a sample text along with the desired aspects is provided, and the model's output — which is a dictionary indicating the sentiment polarity for each aspect — is printed.


In [None]:
import numpy as np
review = 'فیلمی که با تمام وجود روحتون رو آزار می‌ده. نمی‌دونم چرا موضوعات ناراحت‌کننده، نشون دادن بدبختی و زجر کشیدن آدمها، جدیدا اینقدر جذاب شده!!! دیدن این فیلم رو اصلا توصیه نمی‌کنم!	.'
aspects = ['بازی', 'داستان', 'صحنه', 'صدا', 'فیلمبرداری', 'موسیقی', 'کارگردانی', 'کلی']
aspect_sentiments = classify_sentiment(review, aspects)
print(aspect_sentiments)



[[-1.2923307  1.9031758 -0.7550306]]
[[-0.9854113  -0.56535506  1.4021182 ]]
[[-0.9854113  -0.56535506  1.4021182 ]]
[[-0.9854113  -0.56535506  1.4021182 ]]
[[-0.60804737  2.2630959  -1.6292586 ]]
[[-0.8014426  1.9625182 -1.2178123]]
[[-0.9854113  -0.56535506  1.4021182 ]]
[[-0.9854113  -0.56535506  1.4021182 ]]
{'بازی': {'sentiment': 1, 'confidence': 0.9000741}, 'داستان': {'sentiment': 2, 'confidence': 0.8119084}, 'صحنه': {'sentiment': 2, 'confidence': 0.8119084}, 'صدا': {'sentiment': 2, 'confidence': 0.8119084}, 'فیلمبرداری': {'sentiment': 1, 'confidence': 0.92847794}, 'موسیقی': {'sentiment': 1, 'confidence': 0.90529406}, 'کارگردانی': {'sentiment': 2, 'confidence': 0.8119084}, 'کلی': {'sentiment': 2, 'confidence': 0.8119084}}


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

model_name = "yangheng/deberta-v3-base-absa-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

for aspect in ['camera', 'phone']:
   print(aspect, classifier('The camera quality of this phone is amazing.',  text_pair=aspect))


Device set to use cuda:0


camera [{'label': 'Positive', 'score': 0.9967294931411743}]
phone [{'label': 'Neutral', 'score': 0.9472787380218506}]
