
Name: Yolanda Azzahra

NPM (student ID number): 2306319514

Model (Hugging Face Hub ID): ayameRushia/roberta-base-indonesian-1.5G-sentiment-analysis-smsa

Install the necessary libraries:

In [22]:
!pip install transformers datasets torch




Load the model and tokenizer, then run a test prediction

In [23]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import pipeline

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("ayameRushia/roberta-base-indonesian-1.5G-sentiment-analysis-smsa")
model = AutoModelForSequenceClassification.from_pretrained("ayameRushia/roberta-base-indonesian-1.5G-sentiment-analysis-smsa")

# Initialize sentiment analysis pipeline
sentiment_analyzer = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Predict sentiment for a given sentence
sentence = "Produk ini sangat baik dan saya suka!"
result = sentiment_analyzer(sentence)
print(result)


[{'label': 'POSITIVE', 'score': 0.9997664093971252}]
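The pipeline returns only the top label and its score. As a rough sketch of where that score comes from, assuming the text-classification pipeline applies a softmax to the model's logits (its default for single-label models with more than one class) and maps the winning index through `model.config.id2label`, the post-processing looks roughly like this. The logits and the `id2label` mapping below are made-up placeholders, not values read from this model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits, id2label):
    """Return a pipeline-style {'label', 'score'} dict for the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits and label mapping, for illustration only
id2label = {0: "POSITIVE", 1: "NEUTRAL", 2: "NEGATIVE"}
print(top_prediction([4.0, 0.5, -2.0], id2label))
```

A score close to 1.0, as in the output above, means the logit for the chosen class is far larger than the others.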


Check Model Accuracy on a Small Synthetic Dataset

In [24]:
synthetic_data = [
    {"text": "Produk ini sangat baik dan saya suka!", "label": "positive"},
    {"text": "Pelayanan sangat buruk dan tidak memuaskan.", "label": "negative"},
    {"text": "Barangnya biasa saja, tidak terlalu istimewa.", "label": "neutral"},
]

correct = 0
total = len(synthetic_data)

for item in synthetic_data:
    prediction = sentiment_analyzer(item["text"])[0]["label"]
    if prediction.lower() == item["label"]:
        correct += 1

accuracy = correct / total
print(f"Accuracy: {accuracy * 100:.2f}%")


Accuracy: 66.67%
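With only three sentences, each one is worth a third of the score, so 66.67% means two of the three predictions matched. A small helper that also lists the misclassified sentences makes such results easier to read. This is a sketch that assumes the predicted labels have already been collected from `sentiment_analyzer` into a list; the texts and predictions below are hypothetical placeholders, not this notebook's actual outputs.

```python
def accuracy_report(texts, gold_labels, predicted_labels):
    """Return (accuracy, mismatches), comparing labels case-insensitively."""
    if not texts:
        return 0.0, []
    correct = 0
    mismatches = []
    for text, gold, pred in zip(texts, gold_labels, predicted_labels):
        if pred.lower() == gold.lower():
            correct += 1
        else:
            mismatches.append((text, gold, pred))
    return correct / len(texts), mismatches

# Hypothetical placeholders, not the notebook's real predictions
texts = ["example 1", "example 2", "example 3"]
gold = ["positive", "negative", "neutral"]
preds = ["POSITIVE", "NEGATIVE", "POSITIVE"]

acc, mismatches = accuracy_report(texts, gold, preds)
print(f"Accuracy: {acc * 100:.2f}%")  # Accuracy: 66.67%
for text, g, p in mismatches:
    print(f"  {text!r}: expected {g}, got {p}")
```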


**Inspecting Transformer Attention**

Extract Attention Scores from the Pre-Trained Model

In [25]:
import torch

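# Tokenize the input sentence and run a forward pass that also returns attention weights
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer; take the last layer
attentions = outputs.attentions[-1]
print(attentions.shape)  # (batch_size, num_heads, sequence_length, sequence_length)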


torch.Size([1, 12, 10, 10])
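Each of the 12 heads in the last layer produces a 10 × 10 matrix in which row i holds the softmax weights that token i assigns to every token in the input (the 10 positions should include the special start and end tokens the tokenizer adds). As a minimal sketch of how one such matrix is computed, the function below implements single-head scaled dot-product attention, softmax(QKᵀ/√d_k)·V, in plain Python. The tiny Q, K and V matrices are arbitrary toy values, not weights taken from the model.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention; Q, K, V are lists of row vectors, one row per token."""
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    # scores[i][j] = (q_i . k_j) / sqrt(d_k)
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / scale for k_row in K] for q_row in Q]
    # Softmax per row: each row sums to 1, like each row of outputs.attentions
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value vectors
    output = [[sum(w * v_row[c] for w, v_row in zip(w_row, V)) for c in range(len(V[0]))]
              for w_row in weights]
    return weights, output

# Toy 3-token example with 2-dimensional queries, keys and values
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights, output = scaled_dot_product_attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

To look at the real tensor in the same way, `attentions[0].mean(dim=0)` averages the 12 heads into a single 10 × 10 matrix, and `tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())` gives the tokens that label its rows and columns.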


Fine-Tune the Model on the IndoNLU SmSA Dataset

In [26]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

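# Reload a fresh copy of the tokenizer and model for fine-tuning
# (this rebinds `model`; the earlier sentiment_analyzer still wraps the previous instance)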
tokenizer = AutoTokenizer.from_pretrained("ayameRushia/roberta-base-indonesian-1.5G-sentiment-analysis-smsa")
model = AutoModelForSequenceClassification.from_pretrained("ayameRushia/roberta-base-indonesian-1.5G-sentiment-analysis-smsa")

# Load the dataset (Indonesian SMSA sentiment analysis dataset)
dataset = load_dataset("indonlu", "smsa")

# Tokenize the dataset
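def tokenize_function(examples):
    # An explicit max_length avoids the "no maximum length" warning; RoBERTa-base accepts at most 512 tokens
    return tokenizer(examples["text"], truncation=True, padding=True, max_length=512)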

# Tokenize a shuffled subset (1,000 training / 200 validation examples) to keep training time manageable;
# a fixed seed makes the subset reproducible
tokenized_train_dataset = dataset["train"].shuffle(seed=42).select(range(1000)).map(tokenize_function, batched=True)
tokenized_eval_dataset = dataset["validation"].shuffle(seed=42).select(range(200)).map(tokenize_function, batched=True)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # evaluate on the validation set after each epoch (named eval_strategy in newer transformers releases)
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    report_to="none"  # Disable W&B
)

# Define Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_eval_dataset,
)

# Fine-tune the model
trainer.train()



Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1 | No log | 0.589947 |
| 2 | No log | 0.567581 |
| 3 | No log | 0.678255 |
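Validation loss falls from epoch 1 to epoch 2 and then rises in epoch 3, while the run's reported average training loss is only 0.164; together these are a common sign of overfitting on the 1,000-example subset. The sketch below simply picks the epoch with the lowest logged validation loss from the table. In `TrainingArguments`, setting `load_best_model_at_end=True` with `metric_for_best_model="eval_loss"` (and a `save_strategy` that matches the evaluation strategy) asks the Trainer to restore that checkpoint automatically instead of keeping the last epoch.

```python
# Validation losses copied from the training log above, keyed by epoch
val_loss = {1: 0.589947, 2: 0.567581, 3: 0.678255}

best_epoch = min(val_loss, key=val_loss.get)
print(f"Best epoch: {best_epoch} (validation loss {val_loss[best_epoch]:.4f})")

# Flag epochs where validation loss went up compared with the previous epoch
epochs = sorted(val_loss)
increases = [e for prev, e in zip(epochs, epochs[1:]) if val_loss[e] > val_loss[prev]]
print("Validation loss increased at epoch(s):", increases)
```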


TrainOutput(global_step=375, training_loss=0.16413154093424479, metrics={'train_runtime': 4125.1812, 'train_samples_per_second': 0.727, 'train_steps_per_second': 0.091, 'total_flos': 171126500202000.0, 'train_loss': 0.16413154093424479, 'epoch': 3.0})
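The figures in `TrainOutput` follow from the settings: 1,000 training examples at a batch size of 8 give 125 optimizer steps per epoch, so 3 epochs give the reported `global_step=375`. `TrainingArguments` logs the training loss every `logging_steps` steps (500 by default), and this run never reached step 500, which is why the "Training Loss" column shows "No log"; passing `logging_strategy="epoch"` would log it once per epoch instead. The arithmetic below assumes a single device and no gradient accumulation.

```python
import math

train_examples = 1000   # size of the shuffled training subset
batch_size = 8          # per_device_train_batch_size (one device assumed)
epochs = 3              # num_train_epochs
logging_steps = 500     # TrainingArguments default
runtime_s = 4125.1812   # train_runtime reported above

steps_per_epoch = math.ceil(train_examples / batch_size)
global_steps = steps_per_epoch * epochs
print("steps per epoch:", steps_per_epoch)  # 125
print("global steps:", global_steps)        # 375
print("training loss logged at least once:", global_steps >= logging_steps)  # False
print(f"samples/s: {train_examples * epochs / runtime_s:.3f}")
print(f"steps/s: {global_steps / runtime_s:.3f}")
```

The last two lines should reproduce the reported `train_samples_per_second` and `train_steps_per_second`.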

Check Model Accuracy After Fine-Tuning

In [28]:
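# Rebuild the pipeline around the fine-tuned model: the earlier sentiment_analyzer
# still wraps the model loaded before fine-tuning, because `model` was reassigned above
finetuned_model = trainer.model
finetuned_model.eval()  # disable dropout for inference
finetuned_analyzer = pipeline("sentiment-analysis", model=finetuned_model, tokenizer=tokenizer, device=finetuned_model.device)

correct_finetuned = 0

for item in synthetic_data:
    prediction_finetuned = finetuned_analyzer(item["text"])[0]["label"].lower()  # compare in lowercase
    if prediction_finetuned == item["label"]:
        correct_finetuned += 1

accuracy_finetuned = correct_finetuned / total
print(f"Accuracy after fine-tuning: {accuracy_finetuned * 100:.2f}%")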


Accuracy after fine-tuning: 66.67%
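Three hand-written sentences can only produce scores of 0%, 33.33%, 66.67% or 100%, so they cannot reveal small changes in quality. A more informative check is accuracy on the 200-example validation subset, which `Trainer` can report at every evaluation when it is given a `compute_metrics` function. A minimal sketch, assuming the function receives the `EvalPrediction` that `Trainer` passes in, which unpacks into `(predictions, label_ids)`:

```python
def compute_metrics(eval_pred):
    """Accuracy for Trainer: argmax of each row of logits vs. the gold label ids."""
    logits, labels = eval_pred
    # Index of the largest logit in each row (works for nested lists and NumPy arrays)
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}

# Toy check with made-up logits for three examples
print(compute_metrics(([[2.0, 0.1, -1.0], [0.0, 3.0, 0.5], [1.0, 0.2, 2.5]], [0, 1, 0])))
```

It would be passed as `Trainer(..., compute_metrics=compute_metrics)`, after which each evaluation should also report `eval_accuracy` alongside the validation loss.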
