
# 🧠 Few-Shot Document Classification with LLMs

This notebook demonstrates how to classify documents into predefined categories using **few-shot prompting** with OpenAI's GPT-3.5 model. This approach is useful when you:
- Don't have enough labeled data for training
- Want to quickly prototype classification behavior using LLMs


In [None]:
!pip install openai langchain tiktoken python-dotenv

## 📦 Import Libraries

In [None]:

import os
from dotenv import load_dotenv
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# Load environment variables (including OPENAI_API_KEY) from a local .env file
load_dotenv()


## 🔍 Few-Shot Training Examples
We’ll use five labeled samples, one per category. They are placed in the prompt as demonstrations; no model training takes place.

In [None]:

fewshot_examples = [
    {"text": "This document is an agreement between two parties regarding software services.", "label": "contract"},
    {"text": "Invoice No: 12345. Amount due: $500. Due Date: July 1, 2024.", "label": "invoice"},
    {"text": "Name: John Smith. Medical diagnosis: Hypertension. Treatment: Medication.", "label": "medical_report"},
    {"text": "Dear Mr. Johnson, Please find attached the proposal for your review.", "label": "letter"},
    {"text": "Claim ID: 78910. Policy Number: XZ-456. Claimed amount: $2000.", "label": "insurance_form"}
]
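
Because the prompt names five categories, it can help to confirm that the few-shot set covers each of them before spending any API calls. The check below is a minimal sketch: `CATEGORIES` and `missing_labels` are illustrative names that do not appear in the notebook, and the example texts are shortened copies.

```python
# Hypothetical helper: find categories that have no few-shot example.
CATEGORIES = ["contract", "invoice", "medical_report", "letter", "insurance_form"]

# Shortened copies of the notebook's examples, repeated so this cell runs on its own.
examples = [
    {"text": "Agreement between two parties regarding software services.", "label": "contract"},
    {"text": "Invoice No: 12345. Amount due: $500.", "label": "invoice"},
    {"text": "Medical diagnosis: Hypertension.", "label": "medical_report"},
    {"text": "Dear Mr. Johnson, Please find attached the proposal.", "label": "letter"},
    {"text": "Claim ID: 78910. Policy Number: XZ-456.", "label": "insurance_form"},
]

def missing_labels(examples, categories):
    """Return the categories that no example is labeled with, in order."""
    seen = {ex["label"] for ex in examples}
    return [c for c in categories if c not in seen]

print(missing_labels(examples, CATEGORIES))      # empty list when every category is covered
print(missing_labels(examples[:2], CATEGORIES))  # categories left without an example
```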


## ✍️ Prompt Template
The template has two placeholders: `{examples}` for the few-shot examples and `{doc}` for the document to classify.

In [None]:

# The outer string uses ''' so that the """ delimiters around {doc} stay part of the prompt text.
prompt_template = '''
You are an intelligent document classifier. Classify the following document into one of the categories: contract, invoice, medical_report, letter, insurance_form.

Use the examples below for reference:

{examples}

Document:
"""{doc}"""

Label:
'''


## 🧱 Prompt Constructor
`build_prompt` fills the template with the pre-joined example blocks and the input document.

In [None]:

example_blocks = "\n\n".join([
    f"Document:\n{ex['text']}\nLabel: {ex['label']}"
    for ex in fewshot_examples
])

def build_prompt(doc_text):
    return prompt_template.format(examples=example_blocks, doc=doc_text)
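
To see what the model actually receives, it can be useful to render a prompt and print it. The sketch below is self-contained and uses a shortened template with two examples; `demo_template`, `demo_examples`, and `render_prompt` are illustrative stand-ins for the notebook's objects, not part of it.

```python
# Shortened stand-in for the notebook's template (assumption: same two placeholders).
demo_template = (
    "Classify the document into one of: contract, invoice.\n\n"
    "{examples}\n\n"
    "Document:\n{doc}\n\n"
    "Label:"
)

demo_examples = [
    {"text": "This agreement is made between two parties.", "label": "contract"},
    {"text": "Invoice No: 12345. Amount due: $500.", "label": "invoice"},
]

def render_prompt(doc_text, examples=demo_examples, template=demo_template):
    """Join the examples into Document/Label blocks and fill the template."""
    blocks = "\n\n".join(
        f"Document:\n{ex['text']}\nLabel: {ex['label']}" for ex in examples
    )
    return template.format(examples=blocks, doc=doc_text)

print(render_prompt("Total billed: $800. Invoice No. 54321."))
```

Because `str.format` treats braces in the template as placeholders, a literal `{` or `}` in the template text would need to be doubled. Braces inside documents or examples are safe, since those are passed in as values.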


## 🧠 Load LLM
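We use OpenAI's GPT-3.5 (`gpt-3.5-turbo`) through LangChain's `ChatOpenAI` interface. Setting `temperature=0` keeps the output as repeatable as possible, which suits classification. The client reads the API key from the `OPENAI_API_KEY` environment variable.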

In [None]:

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)


## 🧪 Classify a Single Example

In [None]:

example_doc = "Policyholder: Jane Doe. Total amount claimed: $1200. Date of incident: 01/02/2024."
prompt = build_prompt(example_doc)
response = llm.predict(prompt)

print("Document:", example_doc)
print("Predicted Label:", response.strip())
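
Model output is free text, so the returned string may carry extra whitespace, different casing, trailing punctuation, or a label outside the allowed set. One way to guard against that is a small normalizer. In this sketch, `ALLOWED_LABELS`, `normalize_label`, and the `"unknown"` fallback are assumptions for illustration, not part of the notebook.

```python
ALLOWED_LABELS = {"contract", "invoice", "medical_report", "letter", "insurance_form"}

def normalize_label(raw, allowed=ALLOWED_LABELS, fallback="unknown"):
    """Map raw model text to an allowed label, or to the fallback value."""
    # Keep only the first non-empty line in case the model adds an explanation.
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if not lines:
        return fallback
    candidate = lines[0].lower()
    # Drop an echoed "label:" prefix, then surrounding quotes and punctuation.
    if candidate.startswith("label:"):
        candidate = candidate[len("label:"):]
    candidate = candidate.strip(" \t.\"'`").replace(" ", "_")
    return candidate if candidate in allowed else fallback

print(normalize_label(" Invoice.\n"))             # invoice
print(normalize_label("Label: Medical Report"))   # medical_report
print(normalize_label("memo"))                    # unknown
```

Returning an explicit fallback rather than an arbitrary category makes out-of-scope documents easy to spot downstream.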


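## 📊 Classify a Batch of Documents
Note that the second test document (a memo) does not match any of the five categories, so its predicted label shows how the prompt handles out-of-scope input.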

In [None]:

test_docs = [
    "Total billed: $800. Due by August 1. Invoice No. 54321.",
    "This memo outlines project milestones and team assignments.",
    "Dear Ms. Lee, Thank you for your application. We will be in touch."
]

for doc in test_docs:
    result = llm.predict(build_prompt(doc))
    print("\nDoc:", doc)
    print("Label:", result.strip())
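
The loop above prints results as it goes. When the labels feed later steps, collecting them into a structure is often more convenient. The sketch below takes the classifier as a callable so that it runs without an API key; `classify_batch` and the keyword-based `fake_classifier` are hypothetical stand-ins, not part of the notebook.

```python
from collections import Counter

def classify_batch(docs, classify):
    """Apply `classify` to each document and return a list of result rows."""
    return [{"doc": doc, "label": classify(doc).strip()} for doc in docs]

def fake_classifier(doc):
    """Offline stand-in for the LLM call; a crude keyword rule, not a real model."""
    return "invoice" if "invoice" in doc.lower() else "letter"

docs = [
    "Total billed: $800. Due by August 1. Invoice No. 54321.",
    "Dear Ms. Lee, Thank you for your application. We will be in touch.",
]

rows = classify_batch(docs, fake_classifier)
for row in rows:
    print(row["label"], "<-", row["doc"])

# Tally how many documents received each label.
print(Counter(row["label"] for row in rows))
```

In the notebook, the callable could be `lambda doc: llm.predict(build_prompt(doc))`, which keeps the batching logic independent of the model client.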



## ✅ Summary

You’ve just built a few-shot document classifier using GPT-3.5 and LangChain with no fine-tuning required.

This approach is well suited to:
- Quick prototyping of document understanding pipelines
- Exploring LLM-based classification without large datasets
- Creating intelligent routing logic for document processing systems
