# Aspect-Based Sentiment Analysis Pipeline for Comments

This notebook applies aspect-based sentiment analysis (ABSA) to YouTube comments using the `pyabsa` library. The workflow loads comments that already carry sentiment scores, extracts aspect terms and the sentiment expressed toward each one in batches, and saves the results for further analysis. The existing scores describe each comment as a whole. This step attaches sentiment to individual aspects (for example, `look` or `candy` in the preview below), which gives fine-grained insight across a large comment dataset.

### Install Required Package: pyabsa

This cell installs the `pyabsa` package, which provides the aspect-based sentiment model used in the following steps. The outputs in this notebook were produced with `pyabsa` 2.4.2.

In [None]:
!pip install pyabsa

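The PyABSA log further down suggests downgrading `transformers` to 4.29.0 or earlier if loading fails, so it can help to record which package versions are actually installed before going further. A minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the list of packages worth checking is an assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for this notebook, built on importlib.metadata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Packages assumed to matter for this pipeline
for pkg in ("pyabsa", "transformers", "torch", "pandas"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```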

### Import Libraries for Data Processing

This cell imports the necessary libraries for data manipulation (`pandas`) and progress tracking (`tqdm`).

In [None]:
from tqdm import tqdm
import pandas as pd

### Load Sentiment-Scored Comments

This cell loads the output of the earlier pipeline stages (spam filtering, English detection, relevance scoring, and comment-level sentiment scoring) into the DataFrame `sample`, which the following cells annotate with aspects.

In [None]:
file_path = 'dataset/final_after_spam_eng_relevance_sentiment.csv'
sample = pd.read_csv(file_path)
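The extraction step later reads the `cleanedText` column, so a quick check before the long-running cells can catch a missing column or empty comments early. A minimal sketch; the helper `summarize_text_column` is hypothetical, and counting whitespace-only strings as blank is an assumption about how the cleaned text was produced.

```python
import pandas as pd


def summarize_text_column(df, column="cleanedText"):
    """Report row count, missing values, and blank strings in a text column.

    Hypothetical helper; raises KeyError if the column is absent.
    """
    if column not in df.columns:
        raise KeyError(f"expected a '{column}' column, found: {list(df.columns)}")
    values = df[column]
    missing = int(values.isna().sum())
    # Blank = present but empty after stripping whitespace
    blank = int(values.dropna().astype(str).str.strip().eq("").sum())
    return {"rows": len(df), "missing": missing, "blank": blank}


# Usage in this notebook: summarize_text_column(sample)
```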

### Load Aspect-Based Sentiment Model

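This cell loads a pretrained English ATEPC (Aspect Term Extraction and Polarity Classification) model from the `pyabsa` library. The model extracts aspect terms from each comment and classifies the sentiment toward each term. On the first run, the checkpoint (about 579 MB) is downloaded to `./checkpoints`.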

In [None]:
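from pyabsa import ATEPCCheckpointManager

# Load the pretrained English ATEPC checkpoint.
# auto_device=True uses a GPU when one is available; set it to False to load the model on CPU.
aspect_extractor = ATEPCCheckpointManager.get_aspect_extractor(
    checkpoint='english',
    auto_device=True,
)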




[2025-09-10 06:00:44] (2.4.2) PyABSA(2.4.2): If your code crashes on Colab, please use the GPU runtime. Then run "pip install pyabsa[dev] -U" and restart the kernel.
Or if it does not work, you can use v1.x versions, e.g., pip install pyabsa<2.0 -U

Try to downgrade transformers<=4.29.0.

[2025-09-10 06:00:49] (2.4.2) ********** Available ATEPC model checkpoints for Version:2.4.2 (this version) **********
[2025-09-10 06:00:49] (2.4.2) Downloading checkpoint:english 
[2025-09-10 06:00:49] (2.4.2) Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets


Downloading checkpoint: 579MB [00:02, 251.35MB/s]                         


Find zipped checkpoint: ./checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43.zip, unzipping
Done.
[2025-09-10 06:01:02] (2.4.2) If the auto-downloading failed, please download it via browser: https://huggingface.co/spaces/yangheng/PyABSA/resolve/main/checkpoints/English/ATEPC/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43.zip 
[2025-09-10 06:01:02] (2.4.2) Load aspect extractor from checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43
[2025-09-10 06:01:02] (2.4.2) config: checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43/fast_lcf_atepc.config
[2025-09-10 06:01:02] (2.4.2) state_dict: checkpoints/ATEPC_ENGLISH_CHECKPOINT/fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43/fast_lcf_atepc.state_dict
[2025-09-10 06:01:02] (2.4.2) model: None
[2025-09-10 06:01:02] (2.4.2) tokenizer: checkpoints/ATEPC_ENGLISH_CHECKPO

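Because the full batched run below takes about half an hour, it may be worth running the extractor on a couple of sentences first and confirming that each result carries the `aspect` and `sentiment` keys the batching code reads. A minimal sketch: `smoke_test` is a hypothetical helper, the example sentences are made up, and the extractor is passed in as an argument so the check itself does not depend on `pyabsa`.

```python
def smoke_test(extractor, sentences):
    """Run `extractor` on a few sentences and check the result schema used later.

    Hypothetical helper. Returns a list of (sentence, aspects, sentiments) tuples.
    """
    results = extractor.extract_aspect(
        inference_source=list(sentences),
        pred_sentiment=True,
        print_result=False,
    )
    if len(results) != len(sentences):
        raise ValueError(f"expected {len(sentences)} results, got {len(results)}")
    checked = []
    for sentence, result in zip(sentences, results):
        for key in ("aspect", "sentiment"):
            if key not in result:
                raise KeyError(f"result for {sentence!r} has no '{key}' key")
        # Each extracted aspect should come with exactly one sentiment label
        if len(result["aspect"]) != len(result["sentiment"]):
            raise ValueError(f"aspect/sentiment length mismatch for {sentence!r}")
        checked.append((sentence, result["aspect"], result["sentiment"]))
    return checked


# Usage in this notebook (illustrative sentences):
# smoke_test(aspect_extractor, ["The audio is great but the video is blurry.", "Nice."])
```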



### Extract Aspects and Sentiments in Batches

This cell defines `batch_extract_aspects`, which runs the aspect extractor over the `cleanedText` column in batches of 5,000 comments (the default `batch_size`). The extracted aspect terms and their sentiments are then stored in two new DataFrame columns, `aspect` and `sentiment`.

In [None]:
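import math


def batch_extract_aspects(texts, batch_size=5000):
    """Run the pyabsa aspect extractor over `texts` in chunks of `batch_size`.

    Returns two lists aligned with `texts`: the aspect terms extracted from
    each comment and the predicted sentiment for each of those aspects.
    """
    all_aspects = []
    all_sentiments = []

    n_batches = math.ceil(len(texts) / batch_size)

    for i in tqdm(range(0, len(texts), batch_size), total=n_batches, desc="Processing batches"):
        batch = texts[i:i + batch_size]

        atepc_results = aspect_extractor.extract_aspect(
            inference_source=batch,
            pred_sentiment=True,   # predict a polarity for every extracted aspect
            print_result=False,    # suppress per-example console output
        )

        for result in atepc_results:
            all_aspects.append(result["aspect"])
            all_sentiments.append(result["sentiment"])

    # One result per input comment is required to assign the lists back as columns
    assert len(all_aspects) == len(texts), "aspect results are misaligned with the input texts"
    return all_aspects, all_sentiments


# Run in batches and assign the results back as new columns
aspects, sentiments = batch_extract_aspects(sample["cleanedText"].tolist())
sample["aspect"] = aspects
sample["sentiment"] = sentiments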

[2025-09-10 06:31:58] (2.4.2) The results of aspect term extraction have been saved in /content/Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json

Processing batches: 100%|██████████| 8/8 [30:38<00:00, 229.87s/it]
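Each batch above took roughly four minutes and the full run about half an hour, so an interruption late in the run loses all finished work. One option is to persist every batch's results as soon as it completes and reuse them on restart. A minimal sketch of that idea, not part of the original pipeline: `batch_extract_with_checkpoints`, the per-batch JSON file layout, and the checkpoint directory name are all hypothetical. The extraction call is passed in as a function so the logic can be tested without `pyabsa`.

```python
import json
import math
from pathlib import Path


def batch_extract_with_checkpoints(texts, extract_fn, checkpoint_dir, batch_size=5000):
    """Process `texts` in batches, saving each batch's output to its own JSON file.

    Hypothetical helper. Batches whose file already exists are loaded instead of
    recomputed. `extract_fn` takes a list of strings and returns one dict per
    string with "aspect" and "sentiment" keys.
    """
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    aspects, sentiments = [], []
    n_batches = math.ceil(len(texts) / batch_size)
    for b in range(n_batches):
        path = checkpoint_dir / f"batch_{b:05d}.json"
        if path.exists():
            records = json.loads(path.read_text())
        else:
            batch = texts[b * batch_size:(b + 1) * batch_size]
            results = extract_fn(batch)
            records = [{"aspect": list(r["aspect"]), "sentiment": list(r["sentiment"])}
                       for r in results]
            path.write_text(json.dumps(records))
        for r in records:
            aspects.append(r["aspect"])
            sentiments.append(r["sentiment"])
    return aspects, sentiments


# Usage in this notebook (directory name is illustrative):
# extract_fn = lambda batch: aspect_extractor.extract_aspect(
#     inference_source=batch, pred_sentiment=True, print_result=False)
# aspects, sentiments = batch_extract_with_checkpoints(
#     sample["cleanedText"].tolist(), extract_fn, "checkpoints/absa_batches")
```

Cached files are matched only by batch number, so the checkpoint directory should be cleared whenever the input data or `batch_size` changes.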


### Preview Aspect-Based Sentiment Results

This cell displays the first few rows of the DataFrame after aspect and sentiment extraction, allowing inspection of the new columns.

In [None]:
sample.head()

| Unnamed: 0 | commentId | channelId | videoId | authorId | textOriginal | parentCommentId | likeCount | publishedAt | updatedAt | duplicatedFlag | ... | regex_spam | predicted_spam | isSpam | is_english | relevance_score | negative | neutral | positive | aspect | sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 940000 | 3366412 | 45729 | 58551 | 509078 | I love you for you ❤ |  | 1 | 2025-06-05 04:30:35+00:00 | 2025-06-05 04:30:53+00:00 | 0 | ... | 0 | 0.0 | 0 | 1 | 0.195588 | 0.044726 | 0.072665 | 0.882609 | [] | [] |
| 940001 | 2082133 | 10895 | 46768 | 750829 | She looks pretty matter what if someone call h... |  | 0 | 2025-06-05 04:31:03+00:00 | 2025-06-05 04:31:03+00:00 | 0 | ... | 0 | 0.0 | 0 | 1 | 0.08999 | 0.79518 | 0.161583 | 0.043236 | [look] | [Negative] |
| 940002 | 1239265 | 45729 | 58551 | 300126 | mm my favorite is probably happy |  | 0 | 2025-06-05 04:31:09+00:00 | 2025-06-05 04:31:09+00:00 | 0 | ... | 0 | 0.0 | 0 | 1 | 0.19032 | 0.066521 | 0.168984 | 0.764495 | [] | [] |
| 940003 | 4113283 | 23845 | 84473 | 359133 | My 4 year old brother said “Oh yeah I know wha... |  | 0 | 2025-06-05 04:31:38+00:00 | 2025-06-05 04:31:38+00:00 | 0 | ... | 0 | 0.0 | 0 | 1 | 0.208737 | 0.294894 | 0.145905 | 0.559201 | [candy] | [Neutral] |
| 940004 | 2851378 | 8388 | 65139 | 2545679 | and you've got soooo many wrinkles don't you |  | 0 | 2025-06-05 04:33:03+00:00 | 2025-06-05 04:33:03+00:00 | 0 | ... | 0 | 0.0 | 0 | 1 | 0.561001 | 0.712064 | 0.216603 | 0.071333 | [wrinkle] | [Negative] |
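The preview shows that `aspect` and `sentiment` hold parallel lists per comment, including empty lists when no aspect was found. For aspect-level analysis it is often easier to work with one row per (comment, aspect) pair. A minimal sketch using pandas `DataFrame.explode` on both columns at once (this needs pandas 1.3 or newer) and `pd.crosstab` for per-aspect counts. The helper names are hypothetical, and the sketch assumes the columns hold Python lists, as they do in memory here, rather than the strings they become after a CSV round trip.

```python
import pandas as pd


def aspect_level_table(df, aspect_col="aspect", sentiment_col="sentiment"):
    """Return one row per (comment, aspect) with that aspect's sentiment label.

    Hypothetical helper. Comments with no extracted aspects are dropped.
    """
    long_df = df.explode([aspect_col, sentiment_col], ignore_index=True)
    # explode turns empty lists into NaN; those rows carry no aspect
    return long_df.dropna(subset=[aspect_col])


def aspect_sentiment_counts(long_df, aspect_col="aspect", sentiment_col="sentiment"):
    """Count sentiment labels per aspect term, most frequent aspects first."""
    counts = pd.crosstab(long_df[aspect_col], long_df[sentiment_col])
    return counts.loc[counts.sum(axis=1).sort_values(ascending=False).index]


# Usage in this notebook:
# aspect_rows = aspect_level_table(sample)
# aspect_sentiment_counts(aspect_rows).head(20)
```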




### Save Aspect-Based Sentiment Results to CSV

This cell saves the DataFrame with extracted aspects and sentiments to a CSV file for further analysis or downstream tasks.

In [None]:
sample.to_csv("dataset/final_after_spam_eng_relevance_sentiment_aspect_final_2M.csv")
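`to_csv` writes each list in the `aspect` and `sentiment` columns as its string representation (for example `['look']`), and it writes the DataFrame index as an extra, unnamed first column. That is likely where the `Unnamed: 0` column in the input file came from. When reloading the saved file for analysis, the lists need to be parsed back. A minimal sketch using `ast.literal_eval`; the helper name `load_aspect_csv` is hypothetical.

```python
import ast

import pandas as pd


def load_aspect_csv(path_or_buffer, list_columns=("aspect", "sentiment")):
    """Read a CSV written by DataFrame.to_csv and turn list columns back into lists.

    Hypothetical helper. index_col=0 restores the index that to_csv wrote as the
    first column instead of adding it as an extra unnamed column.
    """
    df = pd.read_csv(path_or_buffer, index_col=0)
    for col in list_columns:
        # Cells look like "['look']" or "[]"; missing cells become empty lists
        df[col] = df[col].apply(lambda s: ast.literal_eval(s) if isinstance(s, str) else [])
    return df


# Usage in this notebook:
# results = load_aspect_csv("dataset/final_after_spam_eng_relevance_sentiment_aspect_final_2M.csv")
```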