# Demo: Build a Sentiment Classifier

This notebook demonstrates how to use a foundation model to classify text by building a simple **sentiment classifier** for movie reviews.

The steps are:
1.  **Prepare Data**: Create a small, sample dataset of movie reviews.
2.  **Build a Basic Classifier**: Use a "zero-shot" prompt to classify the reviews.
3.  **Build an Improved Classifier**: Use a "few-shot" prompt to improve the model's accuracy.

In [9]:
import litellm
import os

if os.getenv("OPENAI_API_KEY"):
    litellm.openai_key = os.getenv("OPENAI_API_KEY")

# If using Vocareum, you can also set your API key here directly
# Uncomment and replace the string with your actual Vocareum API key
# litellm.openai_key = "voc-**********"

if (litellm.openai_key or "").startswith("voc-"):
    litellm.api_base = "https://openai.vocareum.com/v1"
    print("Detected vocareum API key. Using Vocareum OpenAI API base URL.")

Detected vocareum API key. Using Vocareum OpenAI API base URL.


## Step 1: Prepare the Data

First, we'll create a small, hard-coded dataset of movie reviews. Each review has a piece of text (`review`) and a numeric label (`label`), where `1` is for a positive review and `0` is for a negative one.

In [10]:
# A tiny, sample dataset of movie reviews
demo_dataset = [
    {"review": "An absolute masterpiece, the best film of the year!", "label": 1},  # 0
    {
        "review": "I was bored from start to finish. A total waste of time.",
        "label": 0,
    },  # 1
    {
        "review": "The acting was incredible and the story was so moving.",
        "label": 1,
    },  # 2
    {"review": "While not perfect, it had some good moments.", "label": 1},  # 3
    {"review": "A confusing plot and terrible dialogue.", "label": 0},  # 4
]

print(f"Dataset created with {len(demo_dataset)} reviews.")

Dataset created with 5 reviews.


The numeric labels aren't very descriptive. Let's map them to human-readable strings.

In [11]:
# Map numeric labels to human-readable labels
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

# Print the first two entries to see the new labels
for i in range(2):
    review = demo_dataset[i]["review"]
    label_id = demo_dataset[i]["label"]
    print(f"label={id2label[label_id]}, review={review}")

label=POSITIVE, review=An absolute masterpiece, the best film of the year!
label=NEGATIVE, review=I was bored from start to finish. A total waste of time.


## Step 2: Build and Evaluate a Basic Classifier (Zero-Shot)

Now, let's ask the foundation model to classify our reviews. We'll start with a **zero-shot** prompt, which means we will simply tell the model what to do without giving it any examples.

In [12]:
# First, format our reviews into a single string for the prompt
reviews_string = ""
for i, entry in enumerate(demo_dataset):
    reviews_string += f"{i} -> {entry['review']}\n"

# Define the prompts
SYSTEM_PROMPT = """You are a helpful assistant that classifies movie reviews as POSITIVE or NEGATIVE.
Respond only with a JSON object where the keys are the review numbers and the values are the classification."""

USER_PROMPT = reviews_string

print("SYSTEM PROMPT:")
print(SYSTEM_PROMPT)
print("\nUSER PROMPT:")
print(USER_PROMPT)

SYSTEM PROMPT:
You are a helpful assistant that classifies movie reviews as POSITIVE or NEGATIVE.
Respond only with a JSON object where the keys are the review numbers and the values are the classification.

USER PROMPT:
0 -> An absolute masterpiece, the best film of the year!
1 -> I was bored from start to finish. A total waste of time.
2 -> The acting was incredible and the story was so moving.
3 -> While not perfect, it had some good moments.
4 -> A confusing plot and terrible dialogue.



In [13]:
from litellm import completion

resp = completion(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)

print("\nMODEL RESPONSE:")
print(resp.choices[0].message.content)


MODEL RESPONSE:
{
  "0": "POSITIVE",
  "1": "NEGATIVE",
  "2": "POSITIVE",
  "3": "POSITIVE",
  "4": "NEGATIVE"
}


In [15]:
# Paste the model response from the previous cell here as a Python dict
response_1 = {
    "0": "POSITIVE",
    "1": "NEGATIVE",
    "2": "POSITIVE",
    "3": "POSITIVE",
    "4": "NEGATIVE",
}
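
Copying the response by hand is fine for a demo, but the same dictionary can be obtained by parsing the model's text output. Below is a minimal sketch, assuming the reply is a bare JSON object as the system prompt requests. `raw_reply` is a hard-coded stand-in for `resp.choices[0].message.content`, and `parse_predictions` is a hypothetical helper, not part of litellm:

```python
import json

# Stand-in for resp.choices[0].message.content (assumption: bare JSON, no code fences)
raw_reply = """{
  "0": "POSITIVE",
  "1": "NEGATIVE",
  "2": "POSITIVE",
  "3": "POSITIVE",
  "4": "NEGATIVE"
}"""

ALLOWED_LABELS = {"POSITIVE", "NEGATIVE"}


def parse_predictions(text, num_reviews):
    """Parse the model reply and check it covers every review with a valid label."""
    predictions = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(predictions, dict):
        raise ValueError("Expected a JSON object mapping review numbers to labels")
    expected_keys = {str(i) for i in range(num_reviews)}
    if set(predictions) != expected_keys:
        raise ValueError(f"Unexpected review numbers: {sorted(predictions)}")
    for key, label in predictions.items():
        if label not in ALLOWED_LABELS:
            raise ValueError(f"Review {key} has an unknown label: {label!r}")
    return predictions


parsed_response = parse_predictions(raw_reply, num_reviews=5)
print(parsed_response)
```

Validating the keys and labels up front means a malformed reply fails loudly here instead of producing a misleading accuracy figure later.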

Now let's check the accuracy of this response.

In [None]:
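def get_accuracy(response, dataset):
    """Return the percentage of predictions in `response` that match `dataset`."""
    correct = 0
    # Accuracy is computed over the predictions returned, so a review the
    # model skipped is not counted against it.
    total = len(response)

    for entry_number, prediction in response.items():
        # JSON keys are strings, so convert them back to list indices
        entry_number = int(entry_number)
        actual_label_id = dataset[entry_number]["label"]
        actual_label = id2label[actual_label_id]

        # Compare case-insensitively in case the model changes capitalization
        if prediction.lower() == actual_label.lower():
            correct += 1
        else:
            print(
                f"Mismatch for entry {entry_number}: predicted={prediction}, actual={actual_label}"
            )

    accuracy = correct / total * 100
    return accuracy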


accuracy_1 = get_accuracy(response_1, demo_dataset)
print(f"Accuracy: {accuracy_1}")

Accuracy: 100.0


That's great! The zero-shot prompt classified all five reviews correctly. On harder inputs a model may need additional guidance to get the right answer, so let's try a few-shot prompt next.

## Step 3: Build an Improved Classifier (Few-Shot)

Let's try a **few-shot** prompt. We will provide a couple of correctly labeled examples *within the prompt itself* to guide the model.

In [18]:
# Create a string with the first two reviews as examples
example_string = """
Here are some examples:

EXAMPLE INPUT:
0 -> An absolute masterpiece, the best film of the year!
1 -> I was bored from start to finish. A total waste of time.

EXAMPLE OUTPUT:
{
  "0": "POSITIVE",
  "1": "NEGATIVE"
}
"""

# The new SYSTEM_PROMPT now includes the examples
IMPROVED_SYSTEM_PROMPT = SYSTEM_PROMPT + example_string

print("SYSTEM PROMPT:")
print(IMPROVED_SYSTEM_PROMPT)
print("\nUSER PROMPT:")
print(USER_PROMPT)

SYSTEM PROMPT:
You are a helpful assistant that classifies movie reviews as POSITIVE or NEGATIVE.
Respond only with a JSON object where the keys are the review numbers and the values are the classification.
Here are some examples:

EXAMPLE INPUT:
0 -> An absolute masterpiece, the best film of the year!
1 -> I was bored from start to finish. A total waste of time.

EXAMPLE OUTPUT:
{
  "0": "POSITIVE",
  "1": "NEGATIVE"
}


USER PROMPT:
0 -> An absolute masterpiece, the best film of the year!
1 -> I was bored from start to finish. A total waste of time.
2 -> The acting was incredible and the story was so moving.
3 -> While not perfect, it had some good moments.
4 -> A confusing plot and terrible dialogue.



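The added examples show the model the expected input and output format, which can help it on harder inputs. Note that both examples also appear among the reviews being classified, so the model's answers for reviews 0 and 1 are not an independent test.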
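
Few-shot examples do not have to live in the system prompt. A common alternative is to present each example as a prior user/assistant exchange in the `messages` list. The sketch below only builds that list and makes no API call. `build_few_shot_messages` and `FEW_SHOT_EXAMPLES` are hypothetical names introduced for illustration, and whether this layout works better than the system-prompt version is something to test rather than assume:

```python
import json

SYSTEM_PROMPT = (
    "You are a helpful assistant that classifies movie reviews as POSITIVE or NEGATIVE.\n"
    "Respond only with a JSON object where the keys are the review numbers "
    "and the values are the classification."
)

# Labeled examples as (example input, expected output) pairs
FEW_SHOT_EXAMPLES = [
    (
        "0 -> An absolute masterpiece, the best film of the year!\n"
        "1 -> I was bored from start to finish. A total waste of time.\n",
        {"0": "POSITIVE", "1": "NEGATIVE"},
    ),
]


def build_few_shot_messages(system_prompt, examples, user_prompt):
    """Return a chat message list with each example as a user/assistant turn."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append(
            {"role": "assistant", "content": json.dumps(example_output, indent=2)}
        )
    # The real request always comes last
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_few_shot_messages(
    SYSTEM_PROMPT,
    FEW_SHOT_EXAMPLES,
    "0 -> The acting was incredible and the story was so moving.\n",
)
for message in messages:
    print(message["role"])
```

The resulting list could be passed as the `messages` argument of `completion` in place of the system-prompt version used in the cell that follows.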

In [21]:
resp = completion(
    model="gpt-5-mini",
    messages=[
        {"role": "system", "content": IMPROVED_SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print("\nMODEL RESPONSE:")
print(resp.choices[0].message.content)


MODEL RESPONSE:
{
  "0": "POSITIVE",
  "1": "NEGATIVE",
  "2": "POSITIVE",
  "3": "POSITIVE",
  "4": "NEGATIVE"
}


In [None]:
response_2 = {
    "0": "POSITIVE",
    "1": "NEGATIVE",
    "2": "POSITIVE",
    "3": "POSITIVE",
    "4": "NEGATIVE",
}

accuracy_2 = get_accuracy(response_2, demo_dataset)
print(f"Accuracy: {accuracy_2}")

Accuracy: 100.0
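
Reviews 0 and 1 were used as the few-shot examples, so scoring them again rewards the model for repeating answers it was given. A fairer check scores only the reviews the prompt did not reveal. The sketch below restates the labels and the response so that it runs on its own; `FEW_SHOT_IDS` and `held_out` are names introduced here for illustration:

```python
# Ground-truth labels for the five demo reviews, in dataset order
demo_labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE", "NEGATIVE"]

# The few-shot response from the cell above
response_2 = {
    "0": "POSITIVE",
    "1": "NEGATIVE",
    "2": "POSITIVE",
    "3": "POSITIVE",
    "4": "NEGATIVE",
}

# Reviews that were shown to the model as labeled examples
FEW_SHOT_IDS = {0, 1}

# Keep only predictions for reviews the model did not see labeled
held_out = {
    number: prediction
    for number, prediction in response_2.items()
    if int(number) not in FEW_SHOT_IDS
}
correct = sum(
    prediction == demo_labels[int(number)] for number, prediction in held_out.items()
)
held_out_accuracy = correct / len(held_out) * 100

print(f"Held-out accuracy: {held_out_accuracy} ({correct}/{len(held_out)})")
```

With only three held-out reviews the number says very little, which is one more reason to evaluate on a larger dataset.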


## Conclusion

It's important to note the following: **prompt engineering techniques such as guiding the model with examples may improve the model's accuracy, but showing that they do in your specific use case requires a larger dataset and a more rigorous evaluation than we have done here.**
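
One way to see why five reviews are not enough is to put a confidence interval around the measured accuracy. The sketch below uses the Wilson score interval, a standard interval for a proportion. It is an illustration added here, not part of the evaluation above, and `wilson_interval` is a hypothetical helper:

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries
    return max(0.0, center - half_width), min(1.0, center + half_width)


low, high = wilson_interval(correct=5, total=5)
print(f"5/5 correct: plausible accuracy between {low:.1%} and {high:.1%}")
```

With 5 of 5 correct, the lower bound is only about 57%, so a perfect score on this dataset cannot separate an excellent classifier from a mediocre one, or a zero-shot prompt from a few-shot one.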
