# Sentiment Analysis via Prompt Engineering

- Use groq.com to call the Llama 3 70B API and run sentiment analysis on IMDB. The assignment is deliberately open-ended, and we will follow up during the week based on the partial results you obtain.

# 1. Imports

In [1]:
import torch
import pandas as pd
from datasets import load_dataset
import os
from groq import Groq
from tqdm import tqdm

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))


# 2. Dataset

This notebook uses the IMDB movie review dataset and its train and test splits.

In [2]:
# Load the IMDb movie reviews dataset
train_dataset_full = load_dataset('imdb', split='train')
test_dataset_full = load_dataset('imdb', split='test')
test_dataset_full  = test_dataset_full.shuffle(seed=42).select(range(1000)) # evaluate on 1000 random samples from the test set

In [18]:
# Select 463 reviews from each class: indices 11112-11574 and 23612-24074
train_dataset = train_dataset_full.select(list(range(11112, 11575)) + list(range(23612, 24075)))
train_dataset = train_dataset.shuffle(seed=42).select(range(len(train_dataset)))

In [19]:
print('Number of samples:', len(train_dataset))
print('Number of positive samples:', sum(1 for label in train_dataset['label'] if label == 1))
print('Number of negative samples:', sum(1 for label in train_dataset['label'] if label == 0))


Number of samples: 926
Number of positive samples: 463
Number of negative samples: 463
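
The subset above is built from two contiguous index ranges. As an alternative, a balanced sample can be drawn at random from each class. The sketch below is only an illustration: `balanced_indices` is a hypothetical helper, and the toy label list stands in for `train_dataset_full['label']`.

```python
import random
from collections import Counter


def balanced_indices(labels, per_class, seed=42):
    """Pick per_class random indices for each label value (hypothetical helper)."""
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    chosen = []
    for label in sorted(by_label):
        # Sample without replacement inside each class
        chosen.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(chosen)  # mix the classes so they are not grouped
    return chosen


# Toy labels standing in for train_dataset_full['label']
toy_labels = [0] * 10 + [1] * 10
idx = balanced_indices(toy_labels, per_class=3)
print(Counter(toy_labels[i] for i in idx))
```

The returned list could then be passed to `train_dataset_full.select(idx)`, the same method the cell above already uses.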


# 3. Groq

Groq serves LLMs through a low-latency API, which makes it a practical option for applications that need near-real-time responses. Using it requires installing the groq Python client and setting an API key, read here from the GROQ_API_KEY environment variable.

## 3.1. Client Key

In [6]:
# Create the Groq client, reading the API key from the environment
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
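
It can help to fail early with an explicit message when the key is missing. The sketch below assumes a hypothetical helper named `require_api_key`; it is not part of the groq package.

```python
import os


def require_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the API key stored in env_var, or raise a clear error (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set or is empty")
    return key
```

The result could then be passed to the client as `Groq(api_key=require_api_key())`.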

## 3.2. Llama 3 on Groq for Sentiment Analysis

In [7]:
def sentiment_analysis(review, prompt):
    # Append the review to the instruction prompt
    prompt = f"""{prompt}. This is the movie review: '{review}'. """

    # Call the API with the assembled prompt
    response = client.chat.completions.create(
        messages=[{"role": "system", "content": prompt}],
        model="llama3-70b-8192")

    # Return the text of the model's reply
    return response.choices[0].message.content
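
Evaluating hundreds of reviews means hundreds of API calls, and any one of them can fail transiently (for example, on a rate limit). A minimal sketch of a retry wrapper with exponential backoff follows. `call_with_retries` is a hypothetical helper, and catching every `Exception` is a simplification; in practice it would be narrowed to the client's own error types.

```python
import time


def call_with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Give up and re-raise once the last attempt has failed
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call would then look like `call_with_retries(lambda: sentiment_analysis(review, prompt))`.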

# 4. Prompt Engineering

Below are three prompts with different structures: zero-shot, few-shot, and chain-of-thought. The goal is to compare the accuracy each one achieves on the same reviews.

In [8]:
prompt_zero_shot = """Your task is to analyse the sentiment of the movie review and classify if it is 'positive' or 'negative'. Your answer MUST be ONLY one word, either 'positive' or 'negative', nothing else.
    Is the following movie review positive or negative?"""

prompt_few_shot = """Your task is to analyse the sentiment of the movie review and classify if it is 'positive' or 'negative'. Your answer MUST be ONLY one word, either 'positive' or 'negative', nothing else.
    Below are four examples to guide your understanding:
    Review 1: "This movie is very bad. I did not like it." - Sentiment: "negative"
    Review 2: "This movie is very good. I've loved it." - Sentiment: "positive"
    Review 3: "Terrible movie. Nuff Said.<br /><br />These Lines are Just Filler. The movie was bad. Why I have to expand on that I don't know. This is already a waste of my time. I just wanted to warn others. Avoid this movie. The acting sucks and the writing is just moronic. Bad in every way. The only nice thing about the movie are Deniz Akkaya's breasts. Even that was ruined though by a terrible and unneeded rape scene. The movie is a poorly contrived and totally unbelievable piece of garbage.<br /><br />OK now I am just going to rag on IMDb for this stupid rule of 10 lines of text minimum. First I waste my time watching this offal. Then feeling compelled to warn others I create an account with IMDb only to discover that I have to write a friggen essay on the film just to express how bad I think it is. Totally unnecessary." - Sentiment: "negative"
    Review 4: "I can't remember many films where a bumbling idiot of a hero was so funny throughout. Leslie Cheung is such the antithesis of a hero that he's too dense to be seduced by a gorgeous vampire... I had the good luck to see it on a big screen, and to find a video to watch again and again." - Sentiment: "positive" 
    Now is your turn. Is the following movie review positive or negative? """

prompt_chain_of_thought_cot = """Your task is to analyse the sentiment of the movie review and classify if it is 'positive' or 'negative' using a step-by-step reasoning approach. Your answer MUST be ONLY one word, either 'positive' or 'negative', nothing else.
    First, identify key words or phrases that indicate emotion or judgment. Next, assess whether these words have a positive or negative connotation. Finally, based on the predominance of positive or negative words, conclude the sentiment of the review.
    Steps to follow:
        1. Identification of sentiment-indicating keywords.
        2. Evaluation of each keyword's sentiment connotation (positive or negative).
        3. Conclusion based on the majority sentiment of keywords.
        4. Classify if the final concluision is 'positive' or 'negative', and return only the class, nothing else.
        
    Below are four examples of the classification to guide your understanding:
        Review 1: "This movie is very bad. I did not like it." - Sentiment: "negative"
        Review 2: "This movie is very good. I've loved it." - Sentiment: "positive"
        Review 3: "Terrible movie. Nuff Said.<br /><br />These Lines are Just Filler. The movie was bad. Why I have to expand on that I don't know. This is already a waste of my time. I just wanted to warn others. Avoid this movie. The acting sucks and the writing is just moronic. Bad in every way. The only nice thing about the movie are Deniz Akkaya's breasts. Even that was ruined though by a terrible and unneeded rape scene. The movie is a poorly contrived and totally unbelievable piece of garbage.<br /><br />OK now I am just going to rag on IMDb for this stupid rule of 10 lines of text minimum. First I waste my time watching this offal. Then feeling compelled to warn others I create an account with IMDb only to discover that I have to write a friggen essay on the film just to express how bad I think it is. Totally unnecessary." - Sentiment: "negative"
        Review 4: "I can't remember many films where a bumbling idiot of a hero was so funny throughout. Leslie Cheung is such the antithesis of a hero that he's too dense to be seduced by a gorgeous vampire... I had the good luck to see it on a big screen, and to find a video to watch again and again." - Sentiment: "positive" 
        Now is your turn. Follow the steps and answer: Is the following movie review positive or negative? """
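
The few-shot prompts above are written out by hand. If the examples need to vary between runs, the same layout can be assembled from a list. This is only a sketch: `build_few_shot_prompt` is a hypothetical helper, and the two examples are the short reviews already used above.

```python
def build_few_shot_prompt(instruction, examples, closing):
    """Assemble instruction + numbered examples + closing question (hypothetical helper)."""
    lines = [
        instruction,
        f"Below are {len(examples)} examples to guide your understanding:",
    ]
    for i, (text, label) in enumerate(examples, start=1):
        lines.append(f'Review {i}: "{text}" - Sentiment: "{label}"')
    lines.append(closing)
    return "\n".join(lines)


examples = [
    ("This movie is very bad. I did not like it.", "negative"),
    ("This movie is very good. I've loved it.", "positive"),
]
prompt = build_few_shot_prompt(
    "Your answer MUST be ONLY one word, either 'positive' or 'negative'.",
    examples,
    "Is the following movie review positive or negative?",
)
print(prompt)
```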

In [9]:
# Usage example
review = "The movie was normal! The story was little engaging and the characters were little well-developed."
print(sentiment_analysis(review, prompt_chain_of_thought_cot))

negative


# 5. Inference Test

In [10]:
def evaluate_accuracy_without_bar(dataset, prompt):
    predictions = [sentiment_analysis(review=review, prompt = prompt) for review in dataset['text']]
    actual_labels = ['positive' if label == 1 else 'negative' for label in dataset['label']]
    
    correct_predictions = sum([pred == true for pred, true in zip(predictions, actual_labels)])
    accuracy = correct_predictions*100 / len(dataset)
    return accuracy
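
The comparison `pred == true` is an exact string match, so a reply such as "Positive." or "negative" followed by a newline would be counted as wrong. One way to guard against that is to normalize the reply first. The sketch below assumes a hypothetical `normalize_label` helper and treats a reply that mentions both labels, or neither, as unusable.

```python
import re


def normalize_label(reply):
    """Map a raw model reply to 'positive', 'negative', or None if ambiguous."""
    words = set(re.findall(r"[a-z]+", reply.lower()))
    has_pos = "positive" in words
    has_neg = "negative" in words
    if has_pos == has_neg:
        # Both labels or neither one: no clear answer
        return None
    return "positive" if has_pos else "negative"
```

Applied inside the evaluation loop, this would be `normalize_label(sentiment_analysis(review, prompt))`.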

In [11]:
def evaluate_accuracy(dataset, prompt):
    """Evaluates the accuracy of sentiment analysis on a dataset with a progress bar."""
    predictions = []
    actual_labels = ['positive' if label == 1 else 'negative' for label in dataset['label']]
    
    # Process each review in the dataset and update the progress bar
    for review in tqdm(dataset['text'], desc="Analyzing Sentiments"):
        prediction = sentiment_analysis(review=review, prompt=prompt)
        predictions.append(prediction)
    
    # Calculate the number of correct predictions
    correct_predictions = sum(pred == true for pred, true in zip(predictions, actual_labels))
    accuracy = correct_predictions * 100 / len(dataset)
    return f"{accuracy:.4f}" 
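
A single accuracy figure hides which class the errors fall in. If the evaluation function were changed to also return its `predictions` list (an assumption, since the version above returns only the accuracy), a per-class breakdown could be sketched as follows. Both helper names are hypothetical.

```python
from collections import Counter


def confusion_counts(predictions, actual_labels):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual_labels, predictions))


def per_class_accuracy(predictions, actual_labels):
    """Fraction of correct predictions within each actual class."""
    counts = confusion_counts(predictions, actual_labels)
    totals = Counter(actual_labels)
    return {label: counts[(label, label)] / totals[label] for label in totals}


# Toy data standing in for real model output
toy_preds = ["positive", "negative", "negative", "positive"]
toy_actual = ["positive", "positive", "negative", "negative"]
print(confusion_counts(toy_preds, toy_actual))
print(per_class_accuracy(toy_preds, toy_actual))
```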

In [13]:
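# Compute zero-shot accuracy
# Note: executed as In [13], before the In [18] cell that builds the current train_dataset;
# 95.05% does not correspond to a whole number of correct answers out of 926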
zeroshot_accuracy = evaluate_accuracy(train_dataset, prompt_zero_shot)
print(f"Zero-Shot Accuracy: {zeroshot_accuracy}")

Zero-Shot Accuracy: 95.0500


In [21]:
# Compute few-shot accuracy
fewshot_accuracy = evaluate_accuracy(train_dataset, prompt_few_shot)
print(f"Few-Shot Accuracy: {fewshot_accuracy}%")

Analyzing Sentiments: 100%|██████████| 926/926 [1:14:44<00:00,  4.84s/it]

Few-Shot Accuracy: 94.6004%





In [22]:
# Compute chain-of-thought accuracy
cot_accuracy = evaluate_accuracy(train_dataset, prompt_chain_of_thought_cot)
print(f"Chain-of-Thoughts Accuracy: {cot_accuracy}%")

Analyzing Sentiments: 100%|██████████| 926/926 [1:29:50<00:00,  5.82s/it]

Chain-of-Thoughts Accuracy: 94.4924%
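
The few-shot and chain-of-thought figures can be put in context with a little arithmetic. The sketch below converts each percentage back into a count of correct answers out of 926 and estimates the standard error of one accuracy figure. It assumes the usual binomial approximation, and both helper names are hypothetical.

```python
import math


def implied_correct(accuracy_pct, n):
    """Number of correct answers implied by a percentage (unrounded)."""
    return accuracy_pct / 100 * n


def standard_error_pct(correct, n):
    """Binomial standard error of an accuracy, in percentage points."""
    p = correct / n
    return 100 * math.sqrt(p * (1 - p) / n)


n = 926
few_shot_correct = round(implied_correct(94.6004, n))
cot_correct = round(implied_correct(94.4924, n))
print(few_shot_correct, cot_correct)
print(f"{standard_error_pct(few_shot_correct, n):.2f}")
```

Under these assumptions the two prompts differ by a single review (876 versus 875 correct), while the standard error is roughly 0.7 percentage points, so the gap between them is well within sampling noise.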



