Skip to content

tali0n-git/Prompt_Eng_Lab

Repository files navigation

Coconut Water Sentiment Analysis

An AI-Powered Pipeline for Historical Review Classification (1999–2012)

📌 Project Overview

This project is a high-precision sentiment analysis pipeline designed to categorize customer feedback for a coconut water brand. Using OpenAI's GPT-4o-mini, the system transforms raw JSON review data into sentiment labels (positive, negative, neutral, irrelevant).

The pipeline utilizes a 'Ralph Wiggum' agentic workflow (seen in label.py); this pattern ensures that the classification logic and API calls are autonomously iterated upon until they satisfy all rigorous unit tests before deployment.

Implementing the Ralph Wiggum pattern: Ralph Wiggum

"I'm helping!" - Keep looping until tests pass.

while not tests_passed:
    rerun_sentiment_analysis()

Technical Architecture

The pipeline is modularized into three main components:

  • label.py (AI Engine): Interfaces with the OpenAI API using advanced prompt engineering. It features robust input validation to handle data-type anomalies and empty datasets.
  • visualize.py: Aggregates sentiment distribution and generates a simple pie chart, automatically exporting them to a dedicated images/ directory.
  • main.py (Pipeline Orchestration): The "brain" of the project that handles file I/O and executes the end-to-end flow from raw JSON to final classification.

visualize.py output

Fig 1. Output from one execution of the visualize.py script.


Engineering Challenges & Solutions

1. Advanced Prompt Engineering

Instead of basic queries, I implemented a System-Prompt strategy that provides the LLM with cultural context and specific examples of nuanced sentiment. This ensures that a review like "its a ring" is correctly identified as irrelevant rather than neutral.

2. Test-Driven Development (TDD)

To ensure long-term maintainability, the project includes a comprehensive suite of automated tests (test_*.py). These verify:

  • API response consistency.
  • Correct visualization output formatting.
  • Error handling for "Wrong input" scenarios.

Getting Started

Prerequisites

  • Python 3.10+
  • OpenAI API Key (Stored securely via environment variables)

Installation & Execution

  1. Clone the repository: git clone https://github.com/your-username/sentiment-pipeline.git
  2. Install dependencies: pip install -r requirements.txt
  3. Run the core pipeline: python main.py
  4. Execute tests: python test_run.py

📁 Repository Structure

├── images/             # Generated sentiment distribution plots
├── reviews.json        # Source dataset (Coconut water reviews 1999-2012)
├── label.py            # GPT-4o-mini integration logic
├── visualize.py        # Data visualization module
├── main.py             # Pipeline entry point
├── writeup.md          # Qualitative analysis of results
└── .gitignore          # Safeguards for API keys and data artifacts

About

Running sentiment analysis using Ralph Wiggum-like code (seen in label.py)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages