# Sentiment Analysis via Prompt Engineering

- Use groq.com to access the Llama 3 70B API and run sentiment analysis on IMDB. The assignment is deliberately open-ended, and we will follow up during the week based on the partial results you obtain.

# 1. Imports

In [1]:
import torch
import pandas as pd
import os
from groq import Groq
from tqdm import tqdm
import pyserini
import requests
import tarfile
import json

# 2. Dataset

This notebook uses the IIRC dataset.

## 2.1. Downloading Dataset

In [2]:
def download_and_extract(url, save_path, extract_to):
    """Download a TAR.GZ file from a URL and extract its contents.

    Arguments:
    url -- URL of the TAR.GZ file to download
    save_path -- path where the TAR.GZ file is saved
    extract_to -- directory where the archive contents are extracted
    """
    # Download the file in chunks so it is never held fully in memory
    response = requests.get(url, stream=True)
    if response.status_code == 200:
        with open(save_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        print("Download complete.")

        # Extract the archive (only .tgz files are handled)
        if save_path.endswith('.tgz'):
            with tarfile.open(save_path, 'r:gz') as tar:
                tar.extractall(path=extract_to)
            print("Extraction complete.")
    else:
        print("Download failed. Status code:", response.status_code)

In [3]:
# URL of the IIRC dataset
url = 'https://iirc-dataset.s3.us-west-2.amazonaws.com/iirc_train_dev.tgz'

# Path where the .tgz file is saved
save_path = '/workspace/aimsbirdclef/ia024/iirc_train_dev.tgz'

# Directory where the contents of the .tgz file are extracted
extract_to = '/workspace/aimsbirdclef/ia024/'

# Call the download and extraction function
# download_and_extract(url, save_path, extract_to)

train_json_path = '/workspace/aimsbirdclef/ia024/iirc_train_dev/train.json'
dev_json_path = '/workspace/aimsbirdclef/ia024/iirc_train_dev/dev.json'

## 2.2. Testing Dataset

In [8]:
df_train = pd.read_json(train_json_path).head(150)
df_train.iloc[149]

questions    [{'context': [{'text': 'In 1984, a revised pro...
links        [{'indices': [202, 229], 'target': 'Leicester ...
text         The musical was revived in 1941, 1945 and 1949...
title                                           Me and My Girl
pid                                                      p_149
Name: 149, dtype: object
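
The `links` field in the row above pairs a `target` article title with an `indices` pair. A minimal sketch of recovering the anchor text of each link, assuming `indices` holds `[start, end)` character offsets into the row's `text` (an assumption that the output shown does not confirm). The record below is made up for illustration, and `link_anchors` is a hypothetical helper:

```python
def link_anchors(row):
    """Return (anchor_text, target) pairs for one record.

    Assumes each link's 'indices' are [start, end) character offsets
    into row['text']; this is an assumption about the dataset layout.
    """
    anchors = []
    for link in row["links"]:
        start, end = link["indices"]
        anchors.append((row["text"][start:end], link["target"]))
    return anchors


# Hypothetical record that mimics the fields shown above
example_row = {
    "text": "The show opened in Leicester before moving to London.",
    "links": [{"indices": [19, 28], "target": "Leicester"}],
}
print(link_anchors(example_row))
```

Because the function only indexes by field name, it should also accept a pandas row such as `df_train.iloc[149]`.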

# 3. Groq

Groq provides a hosted API for running LLMs such as Llama 3 70B with low latency, which makes it a practical option for applications that need near-real-time responses. To use it, install the `groq` Python client and set an API key.
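
One way to handle the key setup is to read the key from the environment and fail early when it is missing, rather than writing it into the notebook. A minimal sketch; `load_api_key` is a hypothetical helper, not part of the `groq` package:

```python
import os


def load_api_key(var_name="GROQ_API_KEY"):
    """Return the API key stored in the given environment variable.

    Raises RuntimeError with a readable message if the variable is unset or empty.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before creating the client."
        )
    return key
```

The result can then be passed to the client constructor as `Groq(api_key=load_api_key())`.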

## 3.1. Client Key

In [8]:
os.environ["GROQ_API_KEY"] = "x"  # placeholder; replace "x" with a real key
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

## 3.2. Groq Llama 3 testing

In [9]:
def sentiment_analysis(review, prompt):
    """Send the instruction plus the review to Llama 3 70B and return the raw reply."""
    # Build the full prompt: the instruction followed by the review
    prompt = f"""{prompt}. This is the movie review: '{review}'. """

    # Call the API with the prompt
    response = client.chat.completions.create(
        messages=[{"role": "system", "content": prompt}],
        model="llama3-70b-8192")

    # Return the model's reply
    return response.choices[0].message.content
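
An evaluation run calls `sentiment_analysis` once per review, so a single transient failure (a timeout or a rate-limit response, for example) would abort the whole run. A minimal sketch of a generic retry wrapper with exponential backoff; `call_with_retries` and its parameters are hypothetical, and it catches every `Exception` because no specific error types of the client are assumed here:

```python
import time


def call_with_retries(fn, *args, max_attempts=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception.

    Waits base_delay * 2**attempt seconds between attempts and re-raises
    the last exception once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call such as `call_with_retries(sentiment_analysis, review, prompt)` would then replace the direct call. The `sleep` parameter exists only so the wait can be stubbed out in tests.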

In [15]:
# Example usage (the instruction is in Portuguese: "classify whether this movie review is positive or negative")
review = "The movie was normal! The story was little engaging and the characters were little well-developed."
print(sentiment_analysis(review, 'classifique se essa review de filme é positiva ou negativa'))

Eu classificaria essa review como negativa. Embora o autor use a palavra "normal", que pode soar neutra, as outras declarações da review são negativas. "Little engaging" (pouco atraente) e "little well-developed" (pouco desenvolvidas) sugerem que o filme teve problemas em capturar a atenção do espectador e em criar personagens interessantes. Além disso, a falta de entusiasmo e a linguagem utilizada sugerem que o autor não ficou impressionado com o filme.
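
The reply above is a full paragraph (in Portuguese, judging the review negative) rather than a bare label, so it would never compare equal to the strings `'positive'` or `'negative'`. A minimal sketch of mapping a free-form reply to a label; `extract_label` is a hypothetical helper, and the keyword lists, including the Portuguese forms, are assumptions chosen for illustration:

```python
import re

# Keyword patterns per label; word boundaries avoid matching inside longer words
LABEL_PATTERNS = {
    "positive": re.compile(r"\b(positive|positiva|positivo)\b", re.IGNORECASE),
    "negative": re.compile(r"\b(negative|negativa|negativo)\b", re.IGNORECASE),
}


def extract_label(reply):
    """Return 'positive' or 'negative', whichever keyword appears first.

    Returns None when the reply mentions neither label.
    """
    first_hits = {}
    for label, pattern in LABEL_PATTERNS.items():
        match = pattern.search(reply)
        if match:
            first_hits[label] = match.start()
    if not first_hits:
        return None
    return min(first_hits, key=first_hits.get)
```

This first-keyword heuristic is deliberately simple and misreads negated phrasing such as "not positive". Asking the model for a one-word answer in the prompt is likely the more reliable fix.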


# 4. Visconde


# 5. Inference Test

In [10]:
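def evaluate_accuracy_without_bar(dataset, prompt):
    """Evaluate sentiment-analysis accuracy (in percent) without a progress bar."""
    predictions = [sentiment_analysis(review=review, prompt=prompt) for review in dataset['text']]
    # Map the numeric labels (1 = positive, anything else = negative) to strings
    actual_labels = ['positive' if label == 1 else 'negative' for label in dataset['label']]

    # A prediction counts as correct only if it equals the label string exactly
    correct_predictions = sum(pred == true for pred, true in zip(predictions, actual_labels))
    accuracy = correct_predictions * 100 / len(dataset)
    return accuracy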

In [11]:
def evaluate_accuracy(dataset, prompt):
    """Evaluates the accuracy of sentiment analysis on a dataset with a progress bar."""
    predictions = []
    actual_labels = ['positive' if label == 1 else 'negative' for label in dataset['label']]
    
    # Process each review in the dataset and update the progress bar
    for review in tqdm(dataset['text'], desc="Analyzing Sentiments"):
        prediction = sentiment_analysis(review=review, prompt=prompt)
        predictions.append(prediction)
    
    # Calculate the number of correct predictions
    correct_predictions = sum(pred == true for pred, true in zip(predictions, actual_labels))
    accuracy = correct_predictions * 100 / len(dataset)
    return f"{accuracy:.4f}" 
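
The exact-match comparison above only works if the model answers with the bare word `positive` or `negative`. The prompts used in the runs that follow are not shown in this notebook, so the sketch below is purely illustrative: every string and name in it is hypothetical and should not be read as the prompts that produced the reported numbers.

```python
# Hypothetical instruction shared by all variants: ask for a bare label
ANSWER_FORMAT = "Answer with exactly one word, either positive or negative"

# Illustrative prompt variants; none of these strings come from the original runs
EXAMPLE_PROMPTS = {
    "zero_shot": f"Classify the sentiment of the movie review. {ANSWER_FORMAT}",
    "few_shot": (
        "Classify the sentiment of the movie review. "
        "Example: 'A moving, beautifully shot film' -> positive. "
        "Example: 'Two hours I will never get back' -> negative. "
        f"{ANSWER_FORMAT}"
    ),
    "chain_of_thought": (
        "Classify the sentiment of the movie review. "
        "Silently weigh the praise against the criticism before deciding. "
        f"{ANSWER_FORMAT}"
    ),
}
```

The strings end without a period because `sentiment_analysis` appends one before adding the review text.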

In [13]:
# Compute the zero-shot accuracy
zeroshot_accuracy = evaluate_accuracy(train_dataset, prompt_zero_shot)
print(f"Zero-Shot Accuracy: {zeroshot_accuracy}")

Zero-Shot Accuracy: 95.0500


In [21]:
# Compute the few-shot accuracy
fewshot_accuracy = evaluate_accuracy(train_dataset, prompt_few_shot)
print(f"Few-Shot Accuracy: {fewshot_accuracy}%")

Analyzing Sentiments: 100%|██████████| 926/926 [1:14:44<00:00,  4.84s/it]

Few-Shot Accuracy: 94.6004%





In [22]:
# Compute the chain-of-thought accuracy
cot_accuracy = evaluate_accuracy(train_dataset, prompt_chain_of_thought_cot)
print(f"Chain-of-Thoughts Accuracy: {cot_accuracy}%")

Analyzing Sentiments: 100%|██████████| 926/926 [1:29:50<00:00,  5.82s/it]

Chain-of-Thoughts Accuracy: 94.4924%
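
To put the few-shot and chain-of-thought figures in context, both measured on the 926 reviews shown in the progress bars, the percentages can be converted back to counts of correct predictions. A minimal sketch; the helper names are hypothetical, and the binomial standard error is a rough approximation that treats each review as an independent trial, which is an assumption:

```python
import math


def correct_count(accuracy_pct, n_reviews):
    """Recover the number of correct predictions from a percentage."""
    return round(accuracy_pct / 100 * n_reviews)


def std_error_points(accuracy_pct, n_reviews):
    """Binomial standard error of the accuracy, in percentage points."""
    p = accuracy_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n_reviews)


n_reviews = 926  # taken from the progress bars above
reported = {"few_shot": 94.6004, "chain_of_thought": 94.4924}  # percentages printed above

for name, pct in reported.items():
    print(name, correct_count(pct, n_reviews), round(std_error_points(pct, n_reviews), 2))
```

The counts come out to 876 and 875 correct predictions, so the two settings differ by a single review, well inside a standard error of roughly 0.7 percentage points.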



