# Problem Statement
Customer support teams often struggle with high volumes of incoming tickets. Reading each ticket and assigning a category by hand is slow and error-prone, and it delays resolution. An automated system is needed that can accurately sort free-text tickets into predefined labels with high confidence.

# Objective
Build an AI-driven system that uses Large Language Models to classify support tickets automatically. The system uses prompt engineering (zero-shot and few-shot), and potentially fine-tuning, to predict the three most likely categories for any given ticket, improving organizational efficiency.

# Environment Setup & Data Loading

In [1]:
pip install pandas scikit-learn openai python-dotenv



# Set Up the Script & Load Data

In [3]:
import pandas as pd
from sklearn.datasets import fetch_20newsgroups

try:
    # 1. Fetch a subset of data that looks like support tickets
    # Categories: IBM PC hardware, Mac hardware, electronics, and cryptography (security)
    categories = ['comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware',
                  'sci.electronics', 'sci.crypt']

    dataset = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))

    # 2. Convert to Dataframe
    df = pd.DataFrame({
        'text': dataset.data,
        'actual_category': [dataset.target_names[t] for t in dataset.target]
    })

    # 3. Clean: Remove empty strings and take a small sample for testing
    df = df[df['text'].str.strip() != ""]
    df = df.head(50).reset_index(drop=True)

    print("✅ Success! Reliable dataset loaded.")
    print(f"Total rows: {len(df)}")
    print("\nSample Categories found:", df['actual_category'].unique())
    print("\n--- Text Preview ---")
    print(df['text'].iloc[0][:200] + "...")

except Exception as e:
    print(f"❌ Still hitting an error: {e}")

✅ Success! Reliable dataset loaded.
Total rows: 50

Sample Categories found: ['sci.crypt' 'comp.sys.mac.hardware' 'sci.electronics'
 'comp.sys.ibm.pc.hardware']

--- Text Preview ---

(a) To use for sensitive but not strategically important traffic,
(b) if the system was cheap.

For example, I don't own a cordless phone.  With Clipper, I would.  If the 
local men in blue really wa...
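
The `actual_category` column holds raw newsgroup names such as `sci.crypt`, while the classifier prompts later in this notebook use display labels such as "Cryptography". To compare predictions against the ground truth, the two need to be aligned. A minimal sketch of one way to do that follows; `NEWSGROUP_TO_LABEL` and `to_label` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
# Hypothetical mapping from 20 Newsgroups names to the display labels
# used in the classifier prompts later in this notebook.
NEWSGROUP_TO_LABEL = {
    "sci.crypt": "Cryptography",
    "comp.sys.ibm.pc.hardware": "IBM PC Hardware",
    "comp.sys.mac.hardware": "Mac Hardware",
    "sci.electronics": "Electronics",
}

def to_label(newsgroup_name):
    """Return the display label for a newsgroup name, or "Other" if unmapped."""
    return NEWSGROUP_TO_LABEL.get(newsgroup_name, "Other")
```

With the DataFrame above, this could be applied as `df['actual_label'] = df['actual_category'].map(to_label)`.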


# Step 2: Connecting the LLM

In [9]:
pip install groq

Collecting groq
  Downloading groq-1.0.0-py3-none-any.whl.metadata (16 kB)
Downloading groq-1.0.0-py3-none-any.whl (138 kB)
Installing collected packages: groq
Successfully installed groq-1.0.0


In [None]:
from groq import Groq
import os

# 1. Initialize the Groq client
# Read the key from the GROQ_API_KEY environment variable instead of hardcoding it
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

def classify_ticket_top3_groq(ticket_text):
    categories = ["Cryptography", "IBM PC Hardware", "Mac Hardware", "Electronics", "Other"]

    # We use a system message for better "instruction following" in Llama models
    prompt = f"""
    Analyze this support ticket and provide the TOP 3 most relevant categories from: {categories}

    Ticket: "{ticket_text}"

    Output format: Just a Python list of strings.
    Example: ["Category A", "Category B", "Category C"]
    """

    # Using llama-3.3-70b-versatile for high accuracy
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": "You are a specialized support ticket classifier. Output only the requested list format."},
            {"role": "user", "content": prompt}
        ],
        temperature=0, # Keep it deterministic for classification
    )

    return completion.choices[0].message.content

# 2. Test Execution
sample_id = 2
sample_text = df['text'].iloc[sample_id]

print(f"--- TICKET TEXT ---\n{sample_text[:200]}...\n")

try:
    prediction = classify_ticket_top3_groq(sample_text)
    print(f"--- TOP 3 PREDICTIONS (Groq) ---\n{prediction}")
except Exception as e:
    print(f"❌ Error: {e}")

--- TICKET TEXT ---

Ergo, if your life is sufficiently boring, you have no need for privacy?

(This is not meant to be personal, just the logical conclusion of your
statement.)...

--- TOP 3 PREDICTIONS (Groq) ---
["Cryptography", "Other", "Electronics"]
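
The function above returns the model's reply as a plain string, not as a Python list. Before the predictions can be stored or scored, that string has to be parsed and checked against the allowed labels. The following is a minimal sketch under that assumption; `parse_top3` and `ALLOWED` are hypothetical names, and the approach (using `ast.literal_eval` and silently dropping unknown labels) is one option among several.

```python
import ast

# Same label set as the `categories` list used in the prompts.
ALLOWED = ["Cryptography", "IBM PC Hardware", "Mac Hardware", "Electronics", "Other"]

def parse_top3(raw_output, allowed=ALLOWED):
    """Parse the model's text reply into a list of at most 3 allowed labels."""
    try:
        # literal_eval only evaluates Python literals, so it is safer than eval()
        parsed = ast.literal_eval(raw_output.strip())
    except (ValueError, SyntaxError):
        return []  # The reply was not a valid Python literal
    if not isinstance(parsed, (list, tuple)):
        return []
    # Keep only labels from the allowed set, preserving the model's ordering
    return [label for label in parsed if label in allowed][:3]
```

An empty list signals that the reply could not be used, which a caller may treat as a cue to retry or to fall back to "Other".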


# Few-Shot Learning

In [11]:
# These are our 'Shots' (Examples)
few_shot_examples = [
    {
        "text": "I'm having trouble with my motherboard. The BIOS doesn't recognize the hard drive.",
        "label": "IBM PC Hardware"
    },
    {
        "text": "Does anyone know how to implement RSA encryption in Python? I need to secure my messages.",
        "label": "Cryptography"
    },
    {
        "text": "My Mac SE/30 is showing a checkerboard pattern on the screen. Is the logic board dead?",
        "label": "Mac Hardware"
    }
]

In [12]:
def classify_ticket_fewshot_groq(ticket_text):
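    # NOTE: `categories` is not interpolated into the prompt below, so the model
    # is free to return labels outside this list.
    categories = ["Cryptography", "IBM PC Hardware", "Mac Hardware", "Electronics", "Other"]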

    # Building the few-shot string
    example_str = ""
    for ex in few_shot_examples:
        example_str += f"Ticket: {ex['text']}\nCategories: ['{ex['label']}', 'Other', 'Other']\n\n"

    prompt = f"""
    You are an expert technical support classifier.
    Below are some examples of how to classify tickets into the top 3 categories.

    {example_str}

    Now, classify this new ticket:
    Ticket: "{ticket_text}"

    Output format: Just a Python list of strings.
    Example: ["Category 1", "Category 2", "Category 3"]
    """

    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": "You are a specialized support ticket classifier. Output only the requested list format."},
            {"role": "user", "content": prompt}
        ],
        temperature=0,
    )

    return completion.choices[0].message.content

# Test it out!
try:
    fewshot_prediction = classify_ticket_fewshot_groq(sample_text)
    print("--- FEW-SHOT PREDICTIONS ---")
    print(fewshot_prediction)
except Exception as e:
    print(f"❌ Error: {e}")

--- FEW-SHOT PREDICTIONS ---
['Philosophy', 'Other', 'Other']
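
The few-shot prediction contains 'Philosophy', which is not one of the predefined labels. One plausible contributing factor is that the few-shot prompt never states the allowed categories. The sketch below builds a prompt that lists them explicitly; `build_fewshot_prompt` is a hypothetical helper, and whether it changes the model's answer for this ticket has not been verified here.

```python
def build_fewshot_prompt(ticket_text, examples, categories):
    """Build a few-shot prompt that states the allowed labels explicitly."""
    # Render each example in the same format the notebook's prompt uses
    example_str = "".join(
        f"Ticket: {ex['text']}\nCategories: ['{ex['label']}', 'Other', 'Other']\n\n"
        for ex in examples
    )
    return (
        "You are an expert technical support classifier.\n"
        f"Choose only from these categories: {categories}\n\n"
        "Examples:\n\n"
        f"{example_str}"
        "Now, classify this new ticket:\n"
        f'Ticket: "{ticket_text}"\n\n'
        "Output format: Just a Python list of three strings."
    )
```

The returned string could be passed as the user message in place of the `prompt` variable inside `classify_ticket_fewshot_groq`.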


# Building the Gradio Interface

In [13]:
pip install gradio



In [None]:
import gradio as gr
import pandas as pd
import time
from groq import Groq

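# 1. Initialize the client
# Read the key from the GROQ_API_KEY environment variable instead of hardcoding it
import os
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))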

def batch_process(file_obj, mode):
    """Reads a CSV, tags every row in the 'text' column, and returns a new CSV."""
    df = pd.read_csv(file_obj.name)

    if 'text' not in df.columns:
        return None, "Error: CSV must have a column named 'text'"

    results = []
    for index, row in df.iterrows():
        # Choose function based on UI selection
        if mode == "Zero-Shot":
            tag = classify_ticket_top3_groq(row['text'])
        else:
            tag = classify_ticket_fewshot_groq(row['text'])

        results.append(tag)
        time.sleep(1) # Slow down to avoid 429 errors on free tier

    df['predicted_tags'] = results
    output_path = "tagged_tickets.csv"
    df.to_csv(output_path, index=False)
    return output_path, "Batch Processing Complete!"

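def process_ticket(ticket_text, mode):
    """Tags a single ticket using the classifier selected in the UI."""
    if mode == "Zero-Shot":
        return classify_ticket_top3_groq(ticket_text)
    return classify_ticket_fewshot_groq(ticket_text)

# --- UI Layout ---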
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown("# üé´ SmartSupport Pro: Batch Tagger")

    with gr.Tab("Single Ticket"):
        ticket_input = gr.Textbox(label="Ticket Text", lines=5)
        single_mode = gr.Radio(["Zero-Shot", "Few-Shot"], value="Few-Shot", label="Mode")
        btn_single = gr.Button("Tag Single")
        output_single = gr.Label(label="Predictions")
        btn_single.click(process_ticket, [ticket_input, single_mode], output_single)

    with gr.Tab("Batch Processing (CSV)"):
        gr.Markdown("Upload a CSV with a column named **'text'**.")
        file_input = gr.File(label="Upload CSV")
        batch_mode = gr.Radio(["Zero-Shot", "Few-Shot"], value="Few-Shot", label="Mode")
        btn_batch = gr.Button("Start Batch Process")
        file_output = gr.File(label="Download Tagged CSV")
        status_text = gr.Textbox(label="Status")

        btn_batch.click(batch_process, [file_input, batch_mode], [file_output, status_text])

demo.launch()



It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://062a1f20786e88fa2f.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
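
The batch function above relies on a fixed `time.sleep(1)` between requests to avoid 429 errors. A common alternative is to retry a failed call with exponentially growing delays. The following is a minimal sketch of that idea; `call_with_backoff` and its parameters are hypothetical, and it catches every `Exception` for simplicity, whereas real code would normally narrow this to the client's rate-limit error.

```python
import time

def call_with_backoff(fn, *args, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args); on failure, wait base_delay * 2**attempt and try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn(*args)
        except Exception:
            # Assumption: any exception is treated as retryable in this sketch
            if attempt == max_retries:
                raise  # Out of retries, surface the last error
            sleep(base_delay * (2 ** attempt))
```

Inside the batch loop this could be used as `tag = call_with_backoff(classify_ticket_top3_groq, row['text'])`. The `sleep` parameter exists only so the delay can be replaced in tests.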




In [None]:
# !pip install groq python-dotenv gradio pandas

import os
import time
import pandas as pd
import gradio as gr
from groq import Groq

# --- 1. Load API Key Safely ---
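def load_key_from_file(filepath='env.txt'):
    """Return the GROQ_API_KEY value from a KEY=VALUE file, or None if absent."""
    if os.path.exists(filepath):
        with open(filepath, 'r') as f:
            for line in f:
                if line.strip().startswith('GROQ_API_KEY') and '=' in line:
                    # Split on the first '=' only, in case the value contains '='
                    return line.split('=', 1)[1].strip()
    return None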

api_key = load_key_from_file()
client = Groq(api_key=api_key)

# --- 2. Classification Logic ---
categories = ["Cryptography", "IBM PC Hardware", "Mac Hardware", "Electronics", "Other"]

few_shot_examples = [
    {"text": "My motherboard isn't recognizing the HDD.", "label": "IBM PC Hardware"},
    {"text": "How to implement RSA in Python?", "label": "Cryptography"},
    {"text": "Mac SE/30 logic board repair tips.", "label": "Mac Hardware"}
]

def get_prediction(text, mode):
    example_str = ""
    if mode == "Few-Shot":
        for ex in few_shot_examples:
            example_str += f"Ticket: {ex['text']}\nTags: ['{ex['label']}', 'Other', 'Other']\n\n"
    
    prompt = f"""
    {example_str}
    Analyze this ticket and provide the TOP 3 categories from {categories}:
    Ticket: "{text}"
    Output: Just a Python list of strings.
    """
    
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "system", "content": "You are a support tagger. Output only the list."},
                  {"role": "user", "content": prompt}],
        temperature=0
    )
    return completion.choices[0].message.content

# --- 3. UI Functions ---
def single_tag_ui(text, mode):
    return get_prediction(text, mode)

def batch_tag_ui(file_obj, mode):
    df = pd.read_csv(file_obj.name)
    if 'text' not in df.columns:
        return None, "Error: Missing 'text' column."
    
    results = []
    for t in df['text']:
        results.append(get_prediction(t, mode))
        time.sleep(0.5) # Anti-rate-limit
        
    df['top_3_tags'] = results
    df.to_csv("tagged_results.csv", index=False)
    return "tagged_results.csv", "Success: Processed all tickets."

# --- 4. Launch Gradio ---
with gr.Blocks(theme=gr.themes.Monochrome()) as demo:
    gr.Markdown("# üöÄ AutoTag AI: Support Ticket Classifier")
    
    with gr.Tab("Single Testing"):
        inp = gr.Textbox(label="Ticket Content", lines=4)
        m = gr.Radio(["Zero-Shot", "Few-Shot"], value="Few-Shot", label="Mode")
        out = gr.Label(label="Top 3 Predictions")
        btn = gr.Button("Classify")
        btn.click(single_tag_ui, [inp, m], out)
        
    with gr.Tab("Batch Upload"):
        f_in = gr.File(label="Upload CSV (must have 'text' column)")
        m_b = gr.Radio(["Zero-Shot", "Few-Shot"], value="Few-Shot")
        f_out = gr.File(label="Download Result")
        status = gr.Textbox(label="Status")
        b_btn = gr.Button("Run Batch")
        b_btn.click(batch_tag_ui, [f_in, m_b], [f_out, status])

demo.launch()
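
The objective is to predict the top 3 categories for each ticket, so a natural check is how often the true label appears among the three predictions. A minimal sketch of that metric follows; `top3_hit_rate` is a hypothetical helper, and it assumes the predictions have already been parsed into lists and that the true labels use the same names as the predicted ones.

```python
def top3_hit_rate(predictions, actual_labels):
    """Fraction of tickets whose true label appears in its top-3 prediction list."""
    if not actual_labels:
        return 0.0
    hits = sum(
        1
        for preds, actual in zip(predictions, actual_labels)
        if actual in preds[:3]
    )
    return hits / len(actual_labels)
```

For example, `top3_hit_rate([["A", "B", "C"], ["B", "C", "D"]], ["A", "A"])` counts one hit out of two tickets.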