# KUchat Production Deployment

**Enterprise-Grade AI Assistant for Kasetsart University Curriculum Information**

## Overview

This notebook provides a complete production deployment of the KUchat AI assistant system, featuring:

- GPT-OSS-20B language model (20 billion parameters)
- 4-bit quantization using Unsloth optimization framework
- Retrieval-Augmented Generation (RAG) system with Qdrant + BGE-M3-Thai + Reranking
- Web search integration (DuckDuckGo and Wikipedia)
- Public-facing Gradio web interface
- Zero local configuration required

## System Requirements

1. **Google Colab Pro+ subscription** (A100 GPU access required)
2. **Documentation repository** (automatically downloaded from GitHub)
3. **Internet connection** for model downloads and web search

## Deployment Instructions

1. Navigate to Runtime → Change runtime type → Select **A100 GPU**
2. Execute all cells sequentially from top to bottom
3. Wait for automatic documentation download from repository
4. Access the generated public URL for the demo interface

## Technical Specifications

- **Model**: GPT-OSS-20B (20B parameters)
- **Quantization**: 4-bit BitsAndBytes (Unsloth optimized)
- **Memory Footprint**: Approximately 12GB VRAM (model only)
- **Total VRAM Usage**: ~22GB (including RAG components)
- **Inference Speed**: 40-80 tokens per second
- **Optimization**: 2x faster inference, 75% memory reduction vs FP16
- **Quality**: Production-ready with enterprise-grade responses
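
The 75% memory reduction follows from bit widths alone. The sketch below is a rough back-of-the-envelope check, assuming exactly 20e9 parameters and counting weight storage only. It ignores quantization constants, any layers kept in higher precision, activations, and the CUDA context, which presumably account for the gap between the 10 GB computed here and the ~12 GB quoted above. The helper name `weight_memory_gb` is illustrative and not part of the notebook.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return raw weight storage in GB (10**9 bytes) at the given precision."""
    return num_params * bits_per_param / 8 / 1e9


params = 20e9  # 20B parameters, as listed in the specifications (assumed exact)
fp16_gb = weight_memory_gb(params, 16)
int4_gb = weight_memory_gb(params, 4)
reduction = 1 - int4_gb / fp16_gb

print(f"FP16 weights:  {fp16_gb:.0f} GB")
print(f"4-bit weights: {int4_gb:.0f} GB")
print(f"Reduction:     {reduction:.0%}")
```

Under these assumptions the script reports 40 GB for FP16, 10 GB for 4-bit, and a 75% reduction.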

---

## Step 1: Define Configuration Constants

In [1]:
# Model Configuration
MODEL_NAME = "unsloth/gpt-oss-20b-unsloth-bnb-4bit"

# UPDATED: BGE-M3-Thai with normalization for better performance
EMBEDDING_MODEL = "jaeyong2/bge-m3-Thai"
RERANKER_MODEL = "BAAI/bge-reranker-v2-m3"  # Cross-encoder for reranking

DOCS_FOLDER = "./docs"
HF_TOKEN = "YOUR_HUGGINGFACE_TOKEN_HERE"  # Replace with your token from https://huggingface.co/settings/tokens

print("="*60)
print("Configuration initialized")
print("="*60)
print(f"LLM Model: {MODEL_NAME}")
print(f"Embedding Model: {EMBEDDING_MODEL}")
print(f"  - Dimensions: 1024 (BGE-M3)")
print(f"  - Normalized: Yes")
print(f"  - Thai-optimized: Yes")
print(f"Reranker Model: {RERANKER_MODEL}")
print(f"Vector DB: Qdrant (in-memory)")
print(f"Expected VRAM: ~22GB (12GB model + 10GB RAG)")
print("="*60)

Configuration initialized
LLM Model: unsloth/gpt-oss-20b-unsloth-bnb-4bit
Embedding Model: jaeyong2/bge-m3-Thai
  - Dimensions: 1024 (BGE-M3)
  - Normalized: Yes
  - Thai-optimized: Yes
Reranker Model: BAAI/bge-reranker-v2-m3
Vector DB: Qdrant (in-memory)
Expected VRAM: ~22GB (12GB model + 10GB RAG)


## Step 2: Install Dependencies and Verify GPU

In [2]:
# Quiet installs (-q) replace %%capture so the GPU check below still prints its output.
!pip install -q -U unsloth transformers accelerate bitsandbytes
!pip install -q langchain langchain-community langchain-huggingface
!pip install -q sentence-transformers torch tiktoken
!pip install -q qdrant-client  # vector database
!pip install -q fastapi uvicorn pyngrok
!pip install -q gradio duckduckgo-search wikipedia
!pip install -q pypdf

import warnings
warnings.filterwarnings("ignore")

import torch
print(f"\nPyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")

print("\nAll dependencies installed (including Qdrant)")
print("Ready for BGE-M3-Thai + Reranking + Qdrant")

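## Step 3: System Configuration

No further configuration is needed at this point: `MODEL_NAME`, `EMBEDDING_MODEL`, `RERANKER_MODEL`, `DOCS_FOLDER`, and `HF_TOKEN` were already set in the `In [1]` cell above.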

## Step 4: Download Documentation from Repository

Documentation files are automatically downloaded from the GitHub repository.

In [3]:
print("Downloading documentation from GitHub repository...\n")

# Clone repository with sparse checkout (docs folder only)
!git clone --depth 1 --filter=blob:none --sparse https://github.com/themistymoon/KUchat.git
%cd KUchat
!git sparse-checkout set docs

# Move docs folder from /content/KUchat/docs to /content/docs
%cd /content
!mv KUchat/docs docs
!rm -rf KUchat

print("\nDocumentation downloaded successfully")
print(f"Location: {DOCS_FOLDER}")

# Display folder structure
!echo "\nFolder structure:"
!ls -lh docs/ | head -20

Downloading documentation from GitHub repository...

Cloning into 'KUchat'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 24 (delta 0), reused 22 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (24/24), 6.96 KiB | 6.96 MiB/s, done.
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 5 (delta 0), reused 3 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (5/5), 55.45 KiB | 18.48 MiB/s, done.
/content/KUchat
remote: Enumerating objects: 134, done.
remote: Counting objects: 100% (134/134), done.
remote: Compressing objects: 100% (134/134), done.
remote: Total 134 (delta 0), reused 133 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (134/134), 150.66 MiB | 14.51 MiB/s, done.
Updating files: 100% (139/139), done.
/content

Documentation
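
Before building the index, it can help to confirm that the download produced what the RAG cell expects: PDF files under the docs folder and a `curricula_catalog.json` whose entries are keyed by relative `file_path`. The sketch below is a minimal check under those assumptions. The function name `summarize_docs_folder` is hypothetical, and it only understands the `{"curricula": [...]}` catalog layout, so with the older layout every PDF is reported as unlisted.

```python
import json
import os


def summarize_docs_folder(docs_folder: str,
                          catalog_name: str = "curricula_catalog.json") -> dict:
    """Count PDFs under docs_folder and list those the catalog does not mention."""
    pdf_paths = []
    for root, _dirs, files in os.walk(docs_folder):
        for name in files:
            if name.lower().endswith(".pdf"):
                # Relative, forward-slash paths, matching the catalog lookup key
                rel = os.path.relpath(os.path.join(root, name), docs_folder)
                pdf_paths.append(rel.replace("\\", "/"))

    catalog_path = os.path.join(docs_folder, catalog_name)
    catalog_found = os.path.exists(catalog_path)
    listed = set()
    if catalog_found:
        with open(catalog_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: {"curricula": [{"file_path": ...}, ...]} layout
        if isinstance(data, dict):
            listed = {entry["file_path"] for entry in data.get("curricula", [])}

    return {
        "pdf_count": len(pdf_paths),
        "catalog_found": catalog_found,
        "unlisted_pdfs": sorted(p for p in pdf_paths if p not in listed),
    }
```

Calling `summarize_docs_folder(DOCS_FOLDER)` after the download returns a small dictionary. A non-empty `unlisted_pdfs` list means those files would be indexed with the fallback metadata (`'Unknown'` faculty and degree).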

## Step 5: Load GPT-OSS-20B Language Model

In [4]:
import unsloth
import time
from unsloth import FastLanguageModel

print("=" * 60)
print("[1/4] Loading GPT-OSS-20B Language Model")
print("=" * 60)
print("Estimated loading time: 2-3 minutes (first run)")
print("\nModel: GPT-OSS-20B (20 billion parameters)")
print("Quantization: 4-bit BitsAndBytes (Unsloth optimized)")
print("Expected VRAM usage: ~12GB\n")

start_time = time.time()

# Configure model parameters
max_seq_length = 16384  # Maximum context length

# Load model using Unsloth's FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=max_seq_length,
    dtype=None,  # Automatic dtype detection
    load_in_4bit=True,  # Enable 4-bit quantization
)

# Optimize for inference
FastLanguageModel.for_inference(model)

load_time = time.time() - start_time

print(f"\nModel loaded successfully in {load_time:.1f} seconds")
print(f"Model: GPT-OSS-20B (20B parameters)")
print(f"Quantization: 4-bit BitsAndBytes")
print(f"Maximum sequence length: {max_seq_length} tokens")
if torch.cuda.is_available():
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
    print(f"GPU memory reserved: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")
    print(f"Total VRAM usage: ~{torch.cuda.memory_allocated() / 1024**3:.0f} GB")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
[1/4] Loading GPT-OSS-20B Language Model
Estimated loading time: 2-3 minutes (first run)

Model: GPT-OSS-20B (20 billion parameters)
Quantization: 4-bit BitsAndBytes (Unsloth optimized)
Expected VRAM usage: ~12GB

==((====))==  Unsloth 2025.10.3: Fast Gpt_Oss patching. Transformers: 4.56.2.
   \\   /|    NVIDIA A100-SXM4-80GB. Num GPUs = 1. Max memory: 79.318 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 8.0. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!




Model loaded successfully in 92.9 seconds
Model: GPT-OSS-20B (20B parameters)
Quantization: 4-bit BitsAndBytes
Maximum sequence length: 16384 tokens
GPU memory allocated: 11.67 GB
GPU memory reserved: 19.30 GB
Total VRAM usage: ~12 GB


## Step 6: Initialize RAG System
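
The `RAGSystem.query` method in the cell below retrieves a broad candidate set, reranks it with a cross-encoder, and then adds small keyword boosts before taking the top `k`. The following sketch isolates that last stage on toy data, with no models or Qdrant involved. The boost weights (0.1 for a keyword in the chunk text, 0.2 for a match in the program name, 0.1 for a match in the faculty name) are taken from the cell. The function name `boost_and_rank` and the sample candidates are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def boost_and_rank(
    candidates: List[Dict],
    rerank_scores: List[float],
    keywords: List[str],
    k: int = 5,
) -> List[Tuple[Dict, float]]:
    """Add keyword boosts to reranker scores and return the top-k candidates."""
    scored = []
    for cand, score in zip(candidates, rerank_scores):
        boost = 0.0
        text = cand.get("text", "").lower()
        program = cand.get("program", "").lower()
        faculty = cand.get("faculty", "").lower()
        for kw in (w.lower() for w in keywords):
            if kw in text:
                boost += 0.1  # keyword appears in the chunk text
            if kw in program:
                boost += 0.2  # keyword matches the program name
            if kw in faculty:
                boost += 0.1  # keyword matches the faculty name
        scored.append((cand, score + boost))
    # Highest boosted score first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy candidates standing in for Qdrant payloads (hypothetical data)
candidates = [
    {"text": "calculus syllabus", "program": "Mathematics", "faculty": "Science"},
    {"text": "computer science year 1 courses", "program": "Computer Science", "faculty": "Science"},
]
ranked = boost_and_rank(candidates, rerank_scores=[0.55, 0.50],
                        keywords=["computer science"], k=2)
for cand, score in ranked:
    print(f"{score:.2f}  {cand['program']}")
```

In this toy run the Computer Science chunk starts behind on reranker score (0.50 vs 0.55) but ends up first after boosting (0.80 vs 0.55). Because the boosts are additive constants, their influence depends on the scale of the reranker scores.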

In [None]:
import os
import json
import numpy as np
from pathlib import Path
from typing import List, Dict, Tuple, Optional
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from sentence_transformers import SentenceTransformer, CrossEncoder
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter, FieldCondition, MatchAny
import re

class RAGSystem:
    def __init__(self):
        """
        Initialize RAG System with Lenient Search Strategy

        Strategy: Broad → Rerank → Post-filter + Boost + Fallback
        """
        print("Initializing RAGSystem with strategy: Broad → Rerank → Post-filter + Boost")

        self.embedding_model = None
        self.reranker = None
        self.qdrant_client = None
        self.collection_name = "ku_curricula"
        self.docs_loaded = False
        self.catalog = {}
        self.gened_catalog = None  # General Education catalog
        self.all_chunks = []
        self.chunk_metadata = []

        # Thai keyword patterns
        self.year_keywords = {
            "1": ["ปีหนึ่ง", "ปี 1", "ปี1", "ปีที่ 1", "ปีที่หนึ่ง", "ชั้นปีที่ 1", "ชั้นปีที่หนึ่ง"],
            "2": ["ปีสอง", "ปี 2", "ปี2", "ปีที่ 2", "ปีที่สอง", "ชั้นปีที่ 2", "ชั้นปีที่สอง"],
            "3": ["ปีสาม", "ปี 3", "ปี3", "ปีที่ 3", "ปีที่สาม", "ชั้นปีที่ 3", "ชั้นปีที่สาม"],
            "4": ["ปีสี่", "ปี 4", "ปี4", "ปีที่ 4", "ปีที่สี่", "ชั้นปีที่ 4", "ชั้นปีที่สี่"]
        }

        self.semester_keywords = {
            "1": ["ภาคต้น", "ภาค 1", "ภาค1", "ภาคการศึกษาที่ 1", "เทอม 1", "เทอม1"],
            "2": ["ภาคปลาย", "ภาค 2", "ภาค2", "ภาคการศึกษาที่ 2", "เทอม 2", "เทอม2"],
            "summer": ["ภาคฤดูร้อน", "ภาคร้อน", "summer"]
        }

        # Faculty/Major mappings (EXPANDED - added missing faculties)
        self.faculty_mappings = {
            # Faculty of Science (คณะวิทยาศาสตร์)
            "วิทยาการคอมพิวเตอร์": ["คอมพิวเตอร์", "computer science", "cs", "คอม", "compsci"],
            "เคมี": ["chemistry", "เคมี", "chem"],
            "ฟิสิกส์": ["physics", "ฟิสิกส์", "phy"],
            "คณิตศาสตร์": ["mathematics", "math", "คณิต"],
            "ชีววิทยา": ["biology", "bio", "ชีวะ"],
            "จุลชีววิทยา": ["microbiology", "micro", "จุลชีว"],

            # Faculty of Engineering (คณะวิศวกรรมศาสตร์)
            "วิศวกรรมไฟฟ้า": ["electrical engineering", "ee", "ไฟฟ้า"],
            "วิศวกรรมเครื่องกล": ["mechanical engineering", "me", "เครื่องกล"],
            "วิศวกรรมโยธา": ["civil engineering", "ce", "โยธา"],
            "วิศวกรรมเคมี": ["chemical engineering", "เคมี"],
            "วิศวกรรมอุตสาหการ": ["industrial engineering", "ie", "อุตสาหการ"],
            "วิศวกรรมคอมพิวเตอร์": ["computer engineering", "cpe", "คอมพิวเตอร์"],

            # Faculty of Agriculture (คณะเกษตร)
            "เกษตรศาสตร์": ["agriculture", "เกษตร", "agri"],
            "พืชศาสตร์": ["plant science", "พืช"],
            "สัตวศาสตร์": ["animal science", "สัตว์"],
            "กีฏวิทยา": ["entomology", "แมลง"],
            "คหกรรมศาสตร์": ["home economics", "คหกรรม"],

            # Faculty of Business Administration (คณะบริหารธุรกิจ)
            "บริหารธุรกิจ": ["business administration", "ba", "บธ", "business"],
            "บัญชี": ["accounting", "acc", "บัญชี"],
            "การจัดการ": ["management", "จัดการ"],
            "การตลาด": ["marketing", "mkt"],
            "การเงิน": ["finance", "fin"],

            # Faculty of Humanities (คณะมนุษยศาสตร์)
            "ภาษาไทย": ["thai", "ไทย"],
            "ภาษาอังกฤษ": ["english", "eng", "อังกฤษ"],
            "ภาษาจีน": ["chinese", "จีน"],
            "ภาษาญี่ปุ่น": ["japanese", "ญี่ปุ่น"],
            "ภาษาฝรั่งเศส": ["french", "ฝรั่งเศส"],
            "ปรัชญา": ["philosophy", "ปรัชญา"],
            "ประวัติศาสตร์": ["history", "ประวัติศาสตร์"],

            # Faculty of Social Sciences (คณะสังคมศาสตร์)
            "เศรษฐศาสตร์": ["economics", "econ", "เศรษฐ"],
            "รัฐศาสตร์": ["political science", "รัฐศาสตร์"],
            "สังคมวิทยา": ["sociology", "สังคม"],
            "จิตวิทยา": ["psychology", "จิตวิทยา"],

            # Faculty of Education (คณะศึกษาศาสตร์)
            "ครุศาสตร์": ["education", "ศึกษาศาสตร์", "ครู"],

            # Faculty of Medicine (คณะแพทยศาสตร์) (NEW!)
            "แพทยศาสตร์": ["medicine", "medical", "แพทย์", "หมอ", "แพทย์ศาสตร์"],

            # Faculty of Nursing (คณะพยาบาลศาสตร์)
            "พยาบาลศาสตร์": ["nursing", "พยาบาล", "นางพยาบาล"],

            # Faculty of Pharmacy (คณะเภสัชศาสตร์)
            "เภสัชศาสตร์": ["pharmacy", "เภสัช", "เภสัชกร"],

            # Faculty of Veterinary Medicine (คณะสัตวแพทยศาสตร์)
            "สัตวแพทยศาสตร์": ["veterinary medicine", "veterinary", "สัตวแพทย์", "หมอสัตว์"],

            # Faculty of Forestry (คณะป่าไม้)
            "ป่าไม้": ["forestry", "ป่าไม้", "forest"],

            # Faculty of Fisheries (คณะประมง)
            "ประมง": ["fisheries", "ประมง", "fish"],

            # Faculty of Environment (คณะสิ่งแวดล้อม)
            "สิ่งแวดล้อม": ["environment", "environmental", "สิ่งแวดล้อม"],

            # Faculty of Agro-Industry (คณะอุตสาหกรรมเกษตร)
            "อุตสาหกรรมเกษตร": ["agro-industry", "agro", "อุตสาหกรรมเกษตร"],

            # Faculty of Architecture (คณะสถาปัตยกรรมศาสตร์)
            "สถาปัตยกรรม": ["architecture", "สถาปัตย์", "arch"],

            # Faculty of Economics (คณะเศรษฐศาสตร์)
            # Same key and value as the entry under Social Sciences above, so this repeat is a no-op.
            "เศรษฐศาสตร์": ["economics", "econ", "เศรษฐ"],

            # College of Interdisciplinary Studies (วิทยาลัยบูรณาการศาสตร์)
            "บูรณาการศาสตร์": ["interdisciplinary", "บูรณาการ"],
        }

        print("✓ RAG System initialized")
        print("  Next step: load_models() then load_documents()")

    def load_models(self):
        """Load embedding model and reranker"""
        print("Loading embedding & reranker models...")

        # Embedding: BGE-M3-Thai (1024 dimensions)
        self.embedding_model = SentenceTransformer('jaeyong2/bge-m3-Thai')
        print("✓ Loaded BGE-M3-Thai (1024d, normalized)")

        # Reranker: BGE-Reranker-v2-m3
        self.reranker = CrossEncoder('BAAI/bge-reranker-v2-m3')
        print("✓ Loaded BGE-Reranker-v2-m3 (Cross-Encoder)")

        # Initialize Qdrant client
        self.qdrant_client = QdrantClient(":memory:")
        print("✓ Initialized Qdrant (in-memory)")

    def load_catalog(self, docs_folder: str):
        """Load program catalog from JSON"""
        catalog_path = os.path.join(docs_folder, "curricula_catalog.json")

        if not os.path.exists(catalog_path):
            print(f"⚠ Warning: Catalog not found at {catalog_path}")
            return False

        try:
            with open(catalog_path, 'r', encoding='utf-8') as f:
                catalog_data = json.load(f)

            # Handle new catalog format with "curricula" array
            if isinstance(catalog_data, dict) and "curricula" in catalog_data:
                # New format: {"version": "2.0", "curricula": [...]}
                curricula_list = catalog_data["curricula"]
                for program in curricula_list:
                    file_path = program['file_path']
                    self.catalog[file_path] = program

                print(f"✓ Loaded catalog v{catalog_data.get('version', '2.0')}: {len(self.catalog)} programs")
                print(f"  Faculties: {catalog_data.get('total_faculties', 'N/A')}")
                print(f"  Keywords: {catalog_data.get('total_keywords', 'N/A')}")
            else:
                # Old format: {program_id: {...}, ...}
                for program_id, metadata in catalog_data.items():
                    file_path = metadata['file_path']
                    self.catalog[file_path] = metadata

                print(f"✓ Loaded catalog: {len(self.catalog)} programs")
                print(f"  Faculties: {len(set(m['faculty'] for m in self.catalog.values()))}")

            # Load General Education catalog
            gened_path = os.path.join(docs_folder, "general_education_catalog.json")
            if os.path.exists(gened_path):
                try:
                    with open(gened_path, 'r', encoding='utf-8') as f:
                        self.gened_catalog = json.load(f)
                    print(f"✓ Loaded Gen Ed catalog: {self.gened_catalog['total_courses']} courses")
                    print(f"  Categories: {', '.join([cat['category_th'] for cat in self.gened_catalog['categories']])}")
                except Exception as e:
                    print(f"⚠ Warning: Could not load Gen Ed catalog: {e}")
                    self.gened_catalog = None

            return True
        except Exception as e:
            print(f"Error loading catalog: {e}")
            import traceback
            traceback.print_exc()
            return False

    def load_documents(self, docs_folder: str):
        """Load curriculum PDFs with proper metadata"""
        if not self.load_catalog(docs_folder):
            return False

        print(f"\nLoading documents from {docs_folder}...")

        # Find all curriculum PDFs
        pdf_files = []
        for root, dirs, files in os.walk(docs_folder):
            for file in files:
                if file.endswith('.pdf'):
                    file_path = os.path.join(root, file)
                    pdf_files.append(file_path)

        if not pdf_files:
            print("No PDF files found!")
            return False

        print(f"Found {len(pdf_files)} PDF files")
        print(f"Catalog has metadata for {len(self.catalog)} programs")

        # Text splitter
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1500,
            chunk_overlap=300,
            separators=["\n\n", "\n", ".", "!", "?", " ", ""],
            length_function=len
        )

        # Process PDFs
        all_chunks = []
        all_metadata = []

        for file_path in pdf_files:
            try:
                loader = PyPDFLoader(file_path)
                docs = loader.load()

                # Convert absolute path to relative path for catalog lookup
                relative_path = os.path.relpath(file_path, docs_folder).replace('\\', '/')
                
                # Get metadata from catalog using relative path
                catalog_metadata = self.catalog.get(relative_path, {})

                # Split into chunks
                chunks = text_splitter.split_documents(docs)

                for i, chunk in enumerate(chunks):
                    metadata = {
                        'file_path': file_path,
                        'file_name': os.path.basename(file_path),
                        'chunk_index': i,
                        'total_chunks': len(chunks),
                        'program': catalog_metadata.get('program', os.path.basename(file_path)),
                        'faculty': catalog_metadata.get('faculty', 'Unknown'),
                        'degree': catalog_metadata.get('degree', 'Unknown'),
                        'id': catalog_metadata.get('id', ''),
                        'keywords': catalog_metadata.get('keywords', [])
                    }

                    all_chunks.append(chunk.page_content)
                    all_metadata.append(metadata)

            except Exception as e:
                print(f"Error processing {file_path}: {e}")

        if not all_chunks:
            print("No chunks created!")
            return False

        # Create Qdrant collection
        try:
            self.qdrant_client.recreate_collection(
                collection_name=self.collection_name,
                vectors_config=VectorParams(
                    size=1024,  # BGE-M3-Thai dimension
                    distance=Distance.COSINE
                )
            )
            print(f"✓ Created Qdrant collection: {self.collection_name}")
        except Exception as e:
            print(f"Error creating collection: {e}")
            return False

        # Generate embeddings and upload
        print(f"\nGenerating embeddings for {len(all_chunks)} chunks...")

        batch_size = 32
        for i in range(0, len(all_chunks), batch_size):
            batch_chunks = all_chunks[i:i+batch_size]
            batch_metadata = all_metadata[i:i+batch_size]

            # Generate embeddings
            embeddings = self.embedding_model.encode(
                batch_chunks,
                normalize_embeddings=True,
                show_progress_bar=False
            )

            # Upload to Qdrant
            points = [
                PointStruct(
                    id=i+j,
                    vector=embeddings[j].tolist(),
                    payload={
                        'text': batch_chunks[j],
                        **batch_metadata[j]
                    }
                )
                for j in range(len(batch_chunks))
            ]

            self.qdrant_client.upsert(
                collection_name=self.collection_name,
                points=points
            )

            if (i+batch_size) % 100 == 0:
                print(f"  Processed {min(i+batch_size, len(all_chunks))}/{len(all_chunks)} chunks")

        self.all_chunks = all_chunks
        self.chunk_metadata = all_metadata
        self.docs_loaded = True

        print(f"\n✓ Loaded {len(all_chunks)} chunks from {len(pdf_files)} PDFs")
        print(f"  Embedding dimensions: 1024 (BGE-M3-Thai)")
        print(f"  Vector DB: Qdrant (cosine similarity)")

        return True

    def extract_keywords_from_question(self, question: str) -> List[str]:
        """Extract keywords from Thai question"""
        keywords = []

        # Extract year
        for year, patterns in self.year_keywords.items():
            if any(pattern in question for pattern in patterns):
                keywords.append(f"ปี {year}")
                keywords.append(f"ชั้นปีที่ {year}")

        # Extract semester
        for sem, patterns in self.semester_keywords.items():
            if any(pattern in question for pattern in patterns):
                if sem != "summer":
                    keywords.append(f"ภาค {sem}")
                    keywords.append(f"ภาคการศึกษาที่ {sem}")
                else:
                    keywords.append("ภาคฤดูร้อน")

        # Extract major/program (lenient: check if ANY synonym appears)
        for major, synonyms in self.faculty_mappings.items():
            if any(syn.lower() in question.lower() for syn in synonyms):
                keywords.append(major)

        return keywords

    def query(self, question: str, k: int = 5, initial_k: int = 50, score_threshold: float = 0.3):
        """
        LENIENT Query Strategy:
        1. Broad search (NO pre-filtering, retrieve 50 candidates)
        2. Rerank with Cross-Encoder
        3. Post-filter + Boost + Fallback
        """
        if not self.docs_loaded:
            print("No documents loaded!")
            return None

        print(f"\n{'='*70}")
        print(f"Query: {question}")
        print(f"{'='*70}")

        # Extract keywords for BOOSTING (not filtering!)
        keywords = self.extract_keywords_from_question(question)

        if keywords:
            print(f"Keywords detected: {keywords[:5]}")

        # Stage 1: BROAD Semantic Search (ALL programs)
        query_embedding = self.embedding_model.encode(
            question,
            normalize_embeddings=True
        )

        print(f"\n[Stage 1] Semantic Search (Qdrant)")
        print(f"  Searching ALL programs (no pre-filter)")
        print(f"  Query embedding norm: {np.linalg.norm(query_embedding):.4f}")

        # Search ALL programs
        results = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=query_embedding.tolist(),
            limit=initial_k
        )

        if not results:
            print(f"\nNo results found")
            return None

        print(f"  Retrieved {len(results)} candidates")
        print(f"  Top 3 semantic scores: {[f'{r.score:.3f}' for r in results[:3]]}")

        # Stage 2: Reranking
        print(f"\n[Stage 2] Reranking (Cross-Encoder)")
        print(f"  Reranking {len(results)} candidates...")

        pairs = [[question, result.payload['text']] for result in results]
        rerank_scores = self.reranker.predict(pairs)

        reranked_results = [
            (result, float(rerank_score))
            for result, rerank_score in zip(results, rerank_scores)
        ]

        reranked_results.sort(key=lambda x: x[1], reverse=True)

        print(f"  Reranking complete")
        print(f"  Top 3 rerank scores: {[f'{score:.3f}' for _, score in reranked_results[:3]]}")

        # Stage 3: Post-filter + Boost
        print(f"\n[Stage 3] Post-filter + Boost + Fallback")

        filtered_results = []
        for result, rerank_score in reranked_results:
            # Keyword Boosting
            boost = 0.0
            metadata = result.payload
            text_content = metadata.get('text', '').lower()

            if keywords:
                # Check if chunk contains keywords
                for keyword in keywords:
                    if keyword.lower() in text_content:
                        boost += 0.1

                # Check if metadata matches program/faculty
                for keyword in keywords:
                    if keyword.lower() in metadata.get('program', '').lower():
                        boost += 0.2
                    if keyword.lower() in metadata.get('faculty', '').lower():
                        boost += 0.1

            final_score = rerank_score + boost
            filtered_results.append((result, final_score))

        # Sort by boosted score
        filtered_results.sort(key=lambda x: x[1], reverse=True)

        # Take top K
        final_results = filtered_results[:k]

        if not final_results:
            print("  No results after filtering!")
            return None

        print(f"  Final results: {len(final_results)}")
        print(f"  Top 3 boosted scores: {[f'{score:.3f}' for _, score in final_results[:3]]}")

        # Build context
        result_texts = []
        source_files_metadata = []
        seen_files = set()

        for i, (result, score) in enumerate(final_results):
            metadata = result.payload
            text = metadata['text']

            # Source tracking
            file_name = metadata.get('file_name', 'Unknown')
            if file_name not in seen_files:
                source_files_metadata.append({
                    'file_name': file_name,
                    'program': metadata.get('program', 'Unknown'),
                    'faculty': metadata.get('faculty', 'Unknown')
                })
                seen_files.add(file_name)

            # Add rank and source info
            chunk_info = f"[Document {i+1}] {metadata.get('program', 'Unknown')}"
            result_texts.append(f"{chunk_info}\n\n{text}")

        print(f"\n[Result Summary]")
        print(f"  Total chunks: {len(result_texts)}")
        print(f"  Unique programs: {len(seen_files)}")
        print(f"  Sources: {', '.join([m['program'] for m in source_files_metadata[:3]])}")

        # Show related programs
        if len(seen_files) > 1:
            related_programs = set()
            for metadata in [r[0].payload for r in final_results]:
                related_programs.add(metadata.get('id', ''))

            related_programs_ids = [pid for pid in related_programs if pid]

            suggestions = []
            for rel_id in list(related_programs_ids)[:3]:
                for file_path, metadata in self.catalog.items():
                    if metadata.get('id') == rel_id:
                        suggestions.append(f"- {metadata['program']}")
                        break

            if suggestions:
                result_texts.append(
                    f"\nหลักสูตรที่เกี่ยวข้อง:\n" + "\n".join(suggestions)
                )

        # AUTO-APPEND: Add General Education info for year/course queries
        gen_ed_appended = False
        if re.search(r'(ปี\s*\d+|รายวิชา|เรียนอะไร|วิชาเลือก|วิชาเสรี|ศึกษาทั่วไป|อยู่ดีมีสุข|สุนทรียศาสตร์|ผู้ประกอบการ|พลเมือง|ภาษาและการสื่อสาร|แนะนำ)', question.lower()):
            if self.gened_catalog:
                # Check if question asks about specific Gen Ed category
                category_match = None
                category_map = {
                    'อยู่ดีมีสุข': 'living_well',
                    'สุนทรียศาสตร์': 'aesthetics',
                    'ผู้ประกอบการ': 'entrepreneurship',
                    'พลเมือง': 'citizenship',
                    'ภาษา': 'communication',
                    'การสื่อสาร': 'communication'
                }

                for keyword, cat_id in category_map.items():
                    if keyword in question.lower():
                        category_match = cat_id
                        break

                # Build Gen Ed context from JSON catalog
                gen_ed_context = f"""
[วิชาศึกษาทั่วไป - General Education]
นักศึกษาทุกหลักสูตรต้องลงทะเบียนวิชาศึกษาทั่วไป รวม {self.gened_catalog['credit_requirements']['total_minimum']} หน่วยกิต

**วิชาบังคับ:**
"""
                # Add required courses
                for req in self.gened_catalog['required_courses']:
                    gen_ed_context += f"- {req['course_code']} {req['course_name_th']} ({req['credits']}) - บังคับทุกหลักสูตร\n"

                gen_ed_context += "\n**วิชาเลือก (เลือกตามกลุ่มสาระ):**\n"

                # Add categories with sample courses
                for category in self.gened_catalog['categories']:
                    cat_req = self.gened_catalog['credit_requirements']['by_category'].get(
                        category['category_id'], {}
                    )
                    min_credits = cat_req.get('minimum', 3)

                    # If specific category is asked, show ALL courses
                    if category_match == category['category_id']:
                        gen_ed_context += f"\n**{category['category_th']} ({category['category_en']})** - ขั้นต่ำ {min_credits} หน่วยกิต\n"
                        gen_ed_context += f"  {category['description']}\n"
                        gen_ed_context += f"  **รายวิชาทั้งหมด {category['total_courses']} วิชา:**\n\n"

                        # Show ALL courses for this category
                        for course in category['courses']:
                            gen_ed_context += f"  - {course['course_code']} {course['course_name_th']} ({course['credits']})\n"
                    else:
                        # Show summary for other categories
                        gen_ed_context += f"\n**{category['category_th']} ({category['category_en']})** - ขั้นต่ำ {min_credits} หน่วยกิต\n"
                        gen_ed_context += f"  {category['description']}\n"

                        # Show 3-5 sample courses
                        sample_courses = category['courses'][:5]
                        for course in sample_courses:
                            gen_ed_context += f"  - {course['course_code']} {course['course_name_th']} ({course['credits']})\n"

                        if category['total_courses'] > 5:
                            gen_ed_context += f"  ... และอีก {category['total_courses'] - 5} วิชา\n"

                gen_ed_context += f"\n**รวมหน่วยกิต:** {self.gened_catalog['credit_requirements']['required']} (บังคับ) + {self.gened_catalog['credit_requirements']['elective_minimum']} (เลือก) = {self.gened_catalog['credit_requirements']['total_minimum']} หน่วยกิต"

                result_texts.append(gen_ed_context)
                gen_ed_appended = True

                if category_match:
                    print(f"  ✓ Auto-appended FULL Gen Ed info for category: {category_match}")
                else:
                    print(f"  ✓ Auto-appended General Education info from catalog ({self.gened_catalog['total_courses']} courses)")

        # Return both context and source metadata
        context_text = "\n\n---\n\n".join(result_texts)

        # Add Gen Ed to sources metadata if appended
        if gen_ed_appended:
            source_files_metadata.append({
                'file_name': 'general_education_catalog.json',
                'program': 'วิชาศึกษาทั่วไป',
                'faculty': 'ทุกคณะ'
            })

        return {'context': context_text, 'sources': source_files_metadata}

# Initialize RAG system with LENIENT mode
print("[2/4] Initializing RAG System...")
print("Strategy: Broad search → Rerank → Post-filter + Boost + Fallback")
print()

rag_system = RAGSystem()

[2/4] Initializing RAG System...
Strategy: Broad search → Rerank → Post-filter + Boost + Fallback

Initializing RAGSystem with strategy: Broad → Rerank → Post-filter + Boost
✓ RAG System initialized
  Next step: load_models() then load_documents()
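
The Gen Ed auto-append step in the RAG cell picks a category by scanning the question for Thai keywords. The lookup can be sketched on its own as below. The `detect_category` helper and the `CATEGORY_MAP` constant are hypothetical names for illustration; the keyword table follows the `category_map` used in the cell.

```python
from typing import Optional

# Keyword -> category ID table, following category_map in the RAG cell.
CATEGORY_MAP = {
    "อยู่ดีมีสุข": "living_well",
    "สุนทรียศาสตร์": "aesthetics",
    "ผู้ประกอบการ": "entrepreneurship",
    "พลเมือง": "citizenship",
    "ภาษา": "communication",
    "การสื่อสาร": "communication",
}


def detect_category(question: str) -> Optional[str]:
    """Return the category of the first keyword found in the question."""
    text = question.lower()
    for keyword, category_id in CATEGORY_MAP.items():
        if keyword in text:
            return category_id  # dict insertion order decides ties
    return None  # no category keyword present


print(detect_category("อยากเรียนวิชาสุนทรียศาสตร์"))
```

Because matching is by substring, a broad keyword such as `ภาษา` also fires on questions about a specific language course, so the table order matters when keywords overlap.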


## Step 7: Load Curriculum Documents

In [6]:
print("[3/4] Loading documents into RAG system...\n")

# First, load the models (embedding, reranker, Qdrant client)
print("Loading models...")
rag_system.load_models()

# Then, load the documents
print("\nLoading documents...")
rag_system.load_documents(DOCS_FOLDER)

[3/4] Loading documents into RAG system...

Loading models...
Loading embedding & reranker models...
✓ Loaded BGE-M3-Thai (1024d, normalized)
✓ Loaded BGE-Reranker-v2-m3 (Cross-Encoder)
✓ Initialized Qdrant (in-memory)

Loading documents...
✓ Loaded catalog v2.0: 131 programs
  Faculties: 20
  Keywords: 863
✓ Loaded Gen Ed catalog: 204 courses
  Categories: พลเมืองไทยและพลเมืองโลก, ภาษากับการสื่อสาร, ศาสตร์แห่งผู้ประกอบการ, สุนทรียศาสตร์, อยู่ดีมีสุข

Loading documents from ./docs...
Found 130 PDF files
Catalog has metadata for 131 programs
✓ Created Qdrant collection: ku_curricula

Generating embeddings for 968 chunks...
  Processed 800/968 chunks

✓ Loaded 968 chunks from 130 PDFs
  Embedding dimensions: 1024 (BGE-M3-Thai)
  Vector DB: Qdrant (cosine similarity)


True
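
The load log reports normalized embeddings and cosine similarity. For unit-length vectors, cosine similarity is simply the dot product, which is why normalizing once at indexing time is convenient. A minimal sketch with toy 3-dimensional vectors (the real embeddings are 1024-dimensional; all helper names here are illustrative assumptions, not part of the notebook):

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return vec if norm == 0 else [x / norm for x in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from the raw, unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [3.0, 4.0, 0.0]
chunk = [4.0, 3.0, 0.0]
q_unit, c_unit = l2_normalize(query), l2_normalize(chunk)

# Both expressions give the same similarity score.
print(cosine(query, chunk), dot(q_unit, c_unit))
```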

## Step 8: Initialize Web Search System

In [7]:
class WebSearchSystem:
    @staticmethod
    def search_duckduckgo(query: str, max_results: int = 3):
        try:
            with DDGS() as ddgs:
                results = list(ddgs.text(query, max_results=max_results))
                return "\n\n".join([f"**{r['title']}**\n{r['body']}" for r in results])
        except Exception as e:
            return f"Web search failed: {str(e)}"

    @staticmethod
    def search_wikipedia(query: str):
        try:
            wikipedia.set_lang('th')
            summary = wikipedia.summary(query, sentences=3)
            return summary
        except Exception:  # Thai lookup failed; fall back to English
            try:
                wikipedia.set_lang('en')
                summary = wikipedia.summary(query, sentences=3)
                return summary
            except Exception as e:
                return f"Wikipedia search failed: {str(e)}"

web_search = WebSearchSystem()
print("Web search system initialized")

Web search system initialized
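
`search_wikipedia` tries the Thai edition first and falls back to English. The same ordered-fallback idea can be written as a small generic helper, sketched below. `first_successful` and the two stand-in lookup functions are hypothetical; they do not call the `wikipedia` package or the network.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


def first_successful(query: str, providers: Sequence[Provider]) -> str:
    """Try each provider in order and return the first result that succeeds."""
    errors: List[str] = []
    for name, fetch in providers:
        try:
            return fetch(query)
        except Exception as exc:  # record the failure and try the next one
            errors.append(f"{name}: {exc}")
    return "Search failed: " + "; ".join(errors)


# Stand-in providers for illustration only (no network access).
def thai_lookup(query: str) -> str:
    raise LookupError("no Thai article")


def english_lookup(query: str) -> str:
    return f"Summary for {query}"


result = first_successful(
    "Kasetsart University",
    [("th", thai_lookup), ("en", english_lookup)],
)
print(result)
```

Collecting the error messages keeps the final failure string informative when every provider fails.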


## Step 9: Chat Function with RAG and Web Search

In [8]:
def chat_with_bot(
    message: str,
    history: List[Tuple[str, str]],
    use_rag: bool,
    use_web_search: bool,
    temperature: float,
    max_tokens: int
) -> "Iterator[Tuple[str, str]]":  # quoted, so no extra typing import is needed
    """
    Main chat function with RAG, web search, and STREAMING support.
    This is a generator: it yields (response, log) tuples as the text streams in.
    """
    import time
    from transformers import TextIteratorStreamer
    from threading import Thread

    log_messages = []

    try:
        start_time = time.time()
        log_msg = f"[{time.strftime('%H:%M:%S')}] Processing query: '{message[:50]}...'"
        log_messages.append(log_msg)

        # Build context
        context = ""

        if use_rag and rag_system.docs_loaded:
            log_msg = f"[{time.strftime('%H:%M:%S')}] Searching curriculum documents..."
            log_messages.append(log_msg)

            rag_result = rag_system.query(message, k=5)  # Returns dict with 'context' and 'sources'

            if rag_result and isinstance(rag_result, dict):
                rag_context = rag_result['context']
                sources = rag_result['sources']
                word_count = len(rag_context.split())

                # Check if retrieved context is relevant (has enough content)
                if word_count < 20:
                    log_msg = f"[{time.strftime('%H:%M:%S')}] Found only {word_count} words - insufficient information"
                    log_messages.append(log_msg)
                    context += f"\n\nเอกสารที่พบ: ไม่เพียงพอหรือไม่เกี่ยวข้อง (เพียง {word_count} คำ)"
                else:
                    context += f"\n\nเอกสารจากหลักสูตรมหาวิทยาลัยเกษตรศาสตร์:\n{rag_context}"

                    # Log source documents
                    log_msg = f"[{time.strftime('%H:%M:%S')}] Found {word_count} words from {len(sources)} document(s)"
                    log_messages.append(log_msg)

                    # Add detailed source information to log
                    unique_sources = []
                    for src in sources:
                        source_label = f"{src['program']} ({src['file_name']})" if src['program'] else src['file_name']
                        if source_label not in unique_sources:
                            unique_sources.append(source_label)

                    if unique_sources:
                        log_msg = f"[{time.strftime('%H:%M:%S')}] Sources used:"
                        log_messages.append(log_msg)
                        for i, src in enumerate(unique_sources[:5], 1):
                            log_msg = f"    {i}. {src}"
                            log_messages.append(log_msg)
                        if len(unique_sources) > 5:
                            log_msg = f"    ... and {len(unique_sources) - 5} more"
                            log_messages.append(log_msg)
            else:
                log_msg = f"[{time.strftime('%H:%M:%S')}] No relevant documents found"
                log_messages.append(log_msg)
                context += "\n\nไม่พบเอกสารที่เกี่ยวข้อง"

        if use_web_search:
            log_msg = f"[{time.strftime('%H:%M:%S')}] Searching web..."
            log_messages.append(log_msg)

            web_results = web_search.search_duckduckgo(message, max_results=2)
            context += f"\n\nข้อมูลจากอินเทอร์เน็ต:\n{web_results}"

            log_msg = f"[{time.strftime('%H:%M:%S')}] Web search completed"
            log_messages.append(log_msg)

        # Build conversation history
        conversation = ""
        for user_msg, bot_msg in history[-3:]:  # Last 3 exchanges
            conversation += f"คำถาม: {user_msg}\nคำตอบ: {bot_msg}\n\n"

        # Build prompt with strict instructions and few-shot examples
        system_instruction = """คุณเป็นผู้ช่วยตอบคำถามเกี่ยวกับหลักสูตรมหาวิทยาลัยเกษตรศาสตร์

กฎสำคัญ:
1) ตอบเป็นภาษาไทยเท่านั้น และไม่อธิบายกระบวนการคิด
2) หลีกเลี่ยงประโยคเชิงสั่งเช่น "We need to", "Let's ...", "Then ..."
3) ใช้เฉพาะข้อมูลจากเอกสารที่ให้ไว้ (ห้ามเดา)
4) สรุปเฉพาะ “เนื้อหาที่เกี่ยวข้องกับคำถาม” เป็นหลัก ถ้ามีหลายส่วน ให้จัดลำดับจากสำคัญสุดก่อน
5) ถ้าคำถามระบุปี/ชั้นปี ให้ระบุภาคการศึกษาที่ 1 และ 2 (ถ้ามีในเอกสาร) และรวมวิชาศึกษาทั่วไปที่เกี่ยวข้อง
6) ถ้ามีการกล่าวถึงให้รวมวิชาศึกษาทั่วไป เช่น ภาษาไทย ภาษาอังกฤษ ศาสตร์แห่งแผ่นดิน ด้วย
7) ห้ามตัดข้อมูลออก
8) ตรวจสอบให้แน่ใจว่าข้อมูลรายวิชานั้นเป็นของปีไหน

ตัวอย่างที่ 1:
คำถาม: "ปี 1 หลักสูตรวิทยาการคอมพิวเตอร์ต้องเรียนอะไรบ้าง"
เอกสาร: "ปีที่ 1 ภาค 1: 01417111 แคลคูลัส I (3), 01418111 วิทยาการคอมพิวเตอร์เบื้องต้น (2), 01418112 แนวคิดการโปรแกรม (3), 01418141 ทรัพย์สินทางปัญญา (3), 01999111 ศาสตร์แห่งแผ่นดิน (2), ภาษาไทย (3), ภาษาอังกฤษ (3). ภาค 2: 01417322 พีชคณิตเชิงเส้นพื้นฐาน (3), 01418113 การโปรแกรมคอมพิวเตอร์ (3), 01418131 การโปรแกรมทางสถิติ (3), 01418132 หลักมูลการคณนา (3), กิจกรรมพลศึกษา (1), วิชาศึกษาทั่วไป 2 วิชา (6)"

คำตอบที่ถูก:
"ปีที่ 1 ของหลักสูตรวิทยาการคอมพิวเตอร์ มีรายวิชาดังนี้:

**ภาคการศึกษาที่ 1:**
1. 01417111 แคลคูลัส I (3 หน่วยกิต)
2. 01418111 วิทยาการคอมพิวเตอร์เบื้องต้น (2 หน่วยกิต)
3. 01418112 แนวคิดการโปรแกรมเบื้องต้น (3 หน่วยกิต)
4. 01418141 ทรัพย์สินทางปัญญาและจรรยาบรรณวิชาชีพ (3 หน่วยกิต)
5. 01999111 ศาสตร์แห่งแผ่นดิน (2 หน่วยกิต)
6. วิชาภาษาไทย (3 หน่วยกิต)
7. วิชาภาษาอังกฤษ (3 หน่วยกิต)

รวม 19 หน่วยกิต

**ภาคการศึกษาที่ 2:**
1. 01417322 พีชคณิตเชิงเส้นพื้นฐาน (3 หน่วยกิต)
2. 01418113 การโปรแกรมคอมพิวเตอร์ (3 หน่วยกิต)
3. 01418131 การโปรแกรมทางสถิติ (3 หน่วยกิต)
4. 01418132 หลักมูลการคณนา (3 หน่วยกิต)
5. 01175xxx กิจกรรมพลศึกษา (1 หน่วยกิต)
6. วิชาศึกษาทั่วไปกลุ่มสาระศาสตร์แห่งผู้ประกอบการ (3 หน่วยกิต)
7. วิชาศึกษาทั่วไปกลุ่มสาระสุนทรียศาสตร์ (3 หน่วยกิต)

รวม 19 หน่วยกิต

**รวมทั้งปี:** 38 หน่วยกิต"

คำตอบที่ผิด:
- ตอบเฉพาะภาค 1 (ต้องตอบทั้งภาค 1 และ 2)
- ข้ามวิชาศึกษาทั่วไป (ต้องรวมทุกวิชา)
- มีคำว่า "We have a long text..." หรือ "Let's produce final answer" (ห้ามแสดงกระบวนการคิด)
- บอกว่า "ไม่มีข้อมูลภาค 2" ทั้งที่มีในเอกสาร

ตัวอย่างที่ 2:
คำถาม: "ค่าเทอมคณะวิทยาศาสตร์เท่าไหร่"
เอกสาร: "ไม่มีข้อมูลค่าเทอม"
คำตอบที่ถูก:
"ขออภัยค่ะ ไม่พบข้อมูลค่าเทอมในเอกสารหลักสูตร กรุณาติดต่อสำนักทะเบียนและประมวลผล มหาวิทยาลัยเกษตรศาสตร์ โทร. 02-942-8000"

ตัวอย่างที่ 3:
คำถาม: "วิศวกรรมคอมพิวเตอร์ต่างจากวิทยาการคอมพิวเตอร์อย่างไร"
เอกสาร: "วิศวกรรมคอมพิวเตอร์ เน้นฮาร์ดแวร์และระบบฝังตัว วิทยาการคอมพิวเตอร์ เน้นซอฟต์แวร์และอัลกอริทึม"
คำตอบที่ถูก:
"ความแตกต่างระหว่าง 2 หลักสูตร:

**วิศวกรรมคอมพิวเตอร์:**
- เน้นด้านฮาร์ดแวร์และระบบฝังตัว
- เรียนวิชาวงจรดิจิทัล ไมโครโปรเซสเซอร์

**วิทยาการคอมพิวเตอร์:**
- เน้นด้านซอฟต์แวร์และอัลกอริทึม
- เรียนวิชาโครงสร้างข้อมูล ปัญญาประดิษฐ์"

เริ่มตอบเลย:
"""

        user_query = f"""
{conversation}

คำถาม: {message}

เอกสาร:
{context if context else "ไม่พบเอกสาร"}

คำตอบ:"""

        prompt = system_instruction + user_query

        # Generate response with STREAMING
        log_msg = f"[{time.strftime('%H:%M:%S')}] Generating response with streaming (max {max_tokens} tokens)..."
        log_messages.append(log_msg)

        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

        # Setup streaming
        streamer = TextIteratorStreamer(
            tokenizer,
            skip_prompt=True,
            skip_special_tokens=True
        )

        generation_kwargs = dict(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=temperature,
            do_sample=True,
            top_p=0.9,
            top_k=50,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=True,
            streamer=streamer
        )

        # Start generation in separate thread
        gen_start = time.time()
        thread = Thread(target=model.generate, kwargs=generation_kwargs)
        thread.start()

        log_msg = f"[{time.strftime('%H:%M:%S')}] 🔄 Streaming started..."
        log_messages.append(log_msg)

        # Stream tokens as they are generated
        full_response = ""
        token_count = 0

        import re

        for new_text in streamer:
            full_response += new_text
            token_count += 1

            # Yield partial response for real-time display
            # Apply post-processing on the fly
            display_text = full_response

            # Basic cleanup during streaming
            if "คำตอบ:" in display_text:
                display_text = display_text.split("คำตอบ:")[-1].strip()

            yield display_text, "\n".join(log_messages)

        thread.join()  # Wait for generation to complete

        gen_time = time.time() - gen_start
        tokens_per_sec = token_count / gen_time if gen_time > 0 else 0

        log_msg = f"[{time.strftime('%H:%M:%S')}] ✅ Streaming completed: {token_count} tokens in {gen_time:.2f}s ({tokens_per_sec:.1f} tok/s)"
        log_messages.append(log_msg)

        # Final post-processing
        answer = full_response

        # Extract answer (remove prompt)
        if "คำตอบ:" in answer:
            answer = answer.split("คำตอบ:")[-1].strip()

        # POST-PROCESSING: Remove chain of thought in English (ENHANCED)
        original_answer = answer

        # STAGE 1: Remove model control tokens
        answer = re.sub(r'(assistant|user|system)(final|response|answer)?\*{0,2}', '', answer, flags=re.IGNORECASE)

        # STAGE 2: Remove English thinking patterns (TARGETED - preserve content)
        # Only remove leading English chain-of-thought before first Thai content
        thinking_patterns = [
            r'^.*?(We have to|We need to|We can|We should|We must|Let\'s produce|Let\'s answer).*?\n',
            r'^.*?(The (user|question|document) (is asking|wants|needs)).*?\n',
            r'^.*?Then (list|answer|provide).*?\n',
        ]

        for pattern in thinking_patterns:
            answer = re.sub(pattern, '', answer, flags=re.IGNORECASE)

        # STAGE 3: Line-by-line filtering (PRESERVE Thai content)
        lines = answer.split('\n')
        filtered_lines = []
        found_thai_header = False  # Track if we've found main Thai content

        for line in lines:
            line_stripped = line.strip()

            # Skip empty lines at the start
            if not line_stripped and not found_thai_header:
                continue

            # Check if this line starts the real answer
            is_thai_header = (
                line_stripped.startswith('**ปี') or
                line_stripped.startswith('ปีที่') or
                line_stripped.startswith('**ภาค') or
                line_stripped.startswith('หลักสูตร') or
                line_stripped.startswith('รายวิชา') or
                line_stripped.startswith('ขออภัย') or
                re.match(r'^\d+\.', line_stripped) or  # Numbered list
                line_stripped.startswith('|')  # Table
            )

            # Once we find Thai header, keep ALL remaining lines
            if is_thai_header:
                found_thai_header = True

            # Skip English thinking ONLY before Thai content starts
            if not found_thai_header:
                is_english_thinking = re.match(
                    r'^(We |Let\'s |They |The |This |Should |Must |Need |Then )',
                    line_stripped,
                    re.IGNORECASE
                )
                if is_english_thinking:
                    continue

            # Keep this line
            filtered_lines.append(line)

        answer = '\n'.join(filtered_lines).strip()

        # STAGE 4: Remove leading English junk (TARGETED)
        # Only remove leading English before first Thai markdown/content
        match = re.search(r'(^\*\*[ก-๙]|^ปี|^หลักสูตร|^รายวิชา|^ขออภัย|^\|.*\||^\d+\.\s+\d+)', answer, re.MULTILINE)
        if match and match.start() > 0:
            # Found Thai content, remove everything before it
            removed_text = answer[:match.start()].strip()
            if removed_text:
                answer = answer[match.start():].strip()

        # STAGE 5: Remove trailing incomplete sentences (optional)
        # If answer ends mid-sentence, keep it (don't truncate valid content)

        if answer != original_answer:
            log_msg = f"[{time.strftime('%H:%M:%S')}] ✂️ Removed English chain-of-thought"
            log_messages.append(log_msg)

        total_time = time.time() - start_time

        log_msg = f"[{time.strftime('%H:%M:%S')}] Total time: {total_time:.2f}s"
        log_messages.append(log_msg)

        # Final yield with cleaned answer
        yield answer, "\n".join(log_messages)

    except Exception as e:
        import traceback
        error_traceback = traceback.format_exc()
        error_log = "\n".join(log_messages) + f"\n[{time.strftime('%H:%M:%S')}] ERROR: {str(e)}\n{error_traceback}"
        yield f"Error: {str(e)}", error_log

print("Chat function initialized with STREAMING support ✓")

Chat function initialized with STREAMING support ✓
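
The post-processing stages in the chat function aim to drop English reasoning lines that appear before the Thai answer. The core idea can be sketched in a few lines. Note that this sketch is a simplification and not the cell's logic: it detects the first line containing any character from the Thai Unicode block instead of matching the specific headers listed in the cell, and `strip_leading_english` is a hypothetical name.

```python
import re

# Any character in the Thai Unicode block (U+0E00 to U+0E7F).
THAI_CHAR = re.compile(r"[\u0E00-\u0E7F]")


def strip_leading_english(answer: str) -> str:
    """Drop every line that precedes the first line containing Thai text."""
    lines = answer.split("\n")
    for index, line in enumerate(lines):
        if THAI_CHAR.search(line):
            return "\n".join(lines[index:]).strip()
    return answer.strip()  # no Thai found: leave the text as it is


raw = "We need to list the courses.\nLet's answer.\nปีที่ 1 มีรายวิชาดังนี้:\n1. 01417111"
cleaned = strip_leading_english(raw)
print(cleaned)
```

Everything from the first Thai line onward is kept, so English course names or codes inside the answer body are not removed.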


## Step 10: Launch Gradio Demo Interface

In [None]:
import gradio as gr

print("\n[4/4] Launching Gradio interface...\n")

# Create Gradio interface
with gr.Blocks(
    title="KUchat - Kasetsart University AI Assistant",
    theme=gr.themes.Soft()
) as demo:

    gr.Markdown("""
    # KUchat - Kasetsart University AI Assistant

    ผู้ช่วยตอบคำถามเกี่ยวกับหลักสูตรมหาวิทยาลัยเกษตรศาสตร์

    **Powered by GPT-OSS-20B (4-bit, Unsloth) on A100 GPU**

    ---
    """)

    with gr.Row():
        with gr.Column(scale=3):
            chatbot = gr.Chatbot(
                height=400,
                label="Chat History",
                show_copy_button=True,
                type="messages"
            )

            # Log display
            log_box = gr.Textbox(
                label="System Log",
                lines=4,
                max_lines=4,
                interactive=False,
                show_copy_button=True
            )

            msg = gr.Textbox(
                label="Your Question",
                placeholder="ถามคำถามเกี่ยวกับหลักสูตร เช่น 'หลักสูตรวิศวกรรมคอมพิวเตอร์มีอะไรบ้าง'",
                lines=2
            )

            with gr.Row():
                submit_btn = gr.Button("Send", variant="primary")
                clear_btn = gr.Button("Clear")

        with gr.Column(scale=1):
            gr.Markdown("### Settings")

            use_rag = gr.Checkbox(
                label="Use RAG (Curriculum Documents)",
                value=True,
                info="Search in curriculum documents"
            )

            use_web_search = gr.Checkbox(
                label="Use Web Search",
                value=False,
                info="Search online for latest information"
            )

            temperature = gr.Slider(
                minimum=0.1,
                maximum=1.0,
                value=0.7,
                step=0.1,
                label="Temperature",
                info="Higher values increase creativity"
            )

            max_tokens = gr.Slider(
                minimum=128,
                maximum=6000,   # Upper bound on generated tokens (leaves context room for the prompt)
                value=2048,      # Default: 2048 tokens
                step=256,
                label="Max Tokens",
                info="Maximum response length"
            )

            gr.Markdown("""
            ---
            ### System Information
            - **Model**: GPT-OSS-20B
            - **Parameters**: 20B (4-bit)
            - **GPU**: A100 80GB
            - **VRAM Usage**: ~12GB (model) + ~10GB (RAG) = ~22GB total
            - **Free VRAM**: ~58GB
            - **Inference Speed**: 40-80 tokens/sec
            - **Optimization**: Unsloth Framework
            - **Vector DB**: Qdrant
            - **Embedding**: BGE-M3-Thai (1024d)
            - **Documents**: 131 programs
            """)

    gr.Markdown("""
    ---
    ### Example Questions:
    - หลักสูตรวิศวกรรมคอมพิวเตอร์มีอะไรบ้าง
    - คณะวิทยาศาสตร์มีกี่สาขา
    - ค่าเทอมคณะบริหารธุรกิจเท่าไหร่
    - วิทยาการคอมพิวเตอร์ต่างจากวิศวกรรมคอมพิวเตอร์อย่างไร
    """)

    # Chat function with STREAMING support
    def respond(message, chat_history, use_rag, use_web_search, temperature, max_tokens):
        # Convert chat_history from messages format to tuples for chat_with_bot
        history_tuples = [(msg["content"], resp["content"])
                          for msg, resp in zip(chat_history[::2], chat_history[1::2])] if chat_history else []

        # Add user message immediately
        chat_history.append({"role": "user", "content": message})

        # Add empty assistant message for streaming
        chat_history.append({"role": "assistant", "content": ""})

        # Stream the response
        for bot_message, log in chat_with_bot(
            message, history_tuples, use_rag, use_web_search, temperature, max_tokens
        ):
            # Update assistant message with streaming content
            chat_history[-1]["content"] = bot_message

            # Yield updated chat history and log for real-time display
            yield "", chat_history, log

    # Event handlers
    submit_btn.click(
        respond,
        inputs=[msg, chatbot, use_rag, use_web_search, temperature, max_tokens],
        outputs=[msg, chatbot, log_box]
    )

    msg.submit(
        respond,
        inputs=[msg, chatbot, use_rag, use_web_search, temperature, max_tokens],
        outputs=[msg, chatbot, log_box]
    )

    clear_btn.click(lambda: ([], ""), None, [chatbot, log_box], queue=False)

# Launch with public URL
print("="*60)
print("LAUNCHING KUCHAT DEMO")
print("="*60)
print("\nStarting Gradio interface...\n")

demo.launch(
    share=True,  # Creates public URL
    debug=True,  # Show errors in Colab notebook
    show_error=True,
    server_port=7860
)

print("\n" + "="*60)
print("KUCHAT IS LIVE")
print("="*60)
print("\nPublic URL generated above")
print("Share the URL with anyone to use the chatbot")
print("Keep this cell running to keep the demo alive")
print("="*60)


[4/4] Launching Gradio interface...

LAUNCHING KUCHAT DEMO

Starting Gradio interface...

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://0b8dacf857abb148b0.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)



Query: ปี 1 วิศวะคอมเรียนอะไรบ้าง
Keywords detected: ['ปี 1', 'ชั้นปีที่ 1', 'วิทยาการคอมพิวเตอร์']

[Stage 1] Semantic Search (Qdrant)
  Searching ALL programs (no pre-filter)
  Query embedding norm: 1.0000
  Retrieved 50 candidates
  Top 3 semantic scores: ['0.542', '0.539', '0.525']

[Stage 2] Reranking (Cross-Encoder)
  Reranking 50 candidates...
  Reranking complete
  Top 3 rerank scores: ['0.194', '0.150', '0.103']

[Stage 3] Post-filter + Boost + Fallback
  Final results: 5
  Top 3 boosted scores: ['0.194', '0.150', '0.103']

[Result Summary]
  Total chunks: 5
  Unique programs: 4
  Sources: Bachelor of Engineering in Digital Manufacturing and Robotics Integration (International Program).pdf, Bachelor of Science (Physics).pdf, Bachelor of Science (Computer Science).pdf
  ✓ Auto-appended General Education info from catalog (204 courses)

Query: อยากลงวิช
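
The query log above shows three stages: a broad candidate search, a rerank pass, and a final cut to the top results. The shape of that pipeline can be sketched with toy scoring functions. Everything in this sketch is an illustrative assumption: the word-overlap scorers stand in for the Qdrant vector search and the cross-encoder, and the notebook's post-filter and boost rules are left out.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    chunks: List[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    broad_k: int = 50,
    final_k: int = 5,
) -> List[Tuple[str, float]]:
    """Stage 1: broad search. Stage 2: rerank. Stage 3: keep the top results."""
    candidates = sorted(
        chunks, key=lambda c: cheap_score(query, c), reverse=True
    )[:broad_k]
    reranked = sorted(
        ((c, precise_score(query, c)) for c in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:final_k]


def word_overlap(query: str, chunk: str) -> float:
    """Toy stand-in for vector search: number of shared words."""
    return float(len(set(query.split()) & set(chunk.split())))


def overlap_ratio(query: str, chunk: str) -> float:
    """Toy stand-in for a cross-encoder: shared words over chunk length."""
    words = chunk.split()
    return word_overlap(query, chunk) / len(words) if words else 0.0


chunks = [
    "year 1 computer science courses",
    "year 1 physics courses and labs and electives",
    "tuition fees",
]
top = retrieve_then_rerank(
    "year 1 computer courses", chunks, word_overlap, overlap_ratio,
    broad_k=2, final_k=1,
)
print(top)
```

The cheap scorer only has to keep good candidates in the pool; the more expensive scorer decides the final order, which is why the first stage can afford a wide `broad_k`.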

---

## Demo Features

### Chat Interface
- Gradio UI with chat history
- Real-time streaming responses from GPT-OSS-20B
- Copy/paste support

### Controls
- **RAG Toggle**: Search curriculum documents
- **Web Search**: Get latest online information
- **Temperature**: Adjust creativity (0.1-1.0)
- **Max Tokens**: Control response length
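
To make the Temperature control concrete: sampling divides the model's logits by the temperature before the softmax, so low values concentrate probability on the top token and higher values spread it out. A minimal sketch with made-up logits (the function name and the numbers are illustrative only):

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.1)  # near-deterministic
flat = softmax_with_temperature(logits, 1.0)   # more varied sampling

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```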

### Public Access
- Share URL works for 1 week (per the Gradio launch message)
- Anyone can access without login
- Multiple users can chat simultaneously

### Performance
- **Model**: GPT-OSS-20B (20B parameters)
- **Quantization**: 4-bit BNB (Unsloth optimized)
- **GPU**: A100 80GB (~22GB VRAM used)
- **Speed**: 40-80 tokens/second (2x faster)
- **Quality**: Production-ready
- **VRAM Savings**: 75% compared to FP16
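
The 75% figure follows from the bit widths alone: 4 bits per weight is a quarter of FP16's 16 bits. A back-of-the-envelope check, assuming 20 billion parameters and counting weight storage only (the function name is illustrative):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return num_params * bits_per_param / 8 / 1e9


params = 20e9  # 20B parameters, as listed in the Technical Specifications
fp16_gb = weight_memory_gb(params, 16)
int4_gb = weight_memory_gb(params, 4)
savings = 1 - int4_gb / fp16_gb

print(f"FP16: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB, savings: {savings:.0%}")
```

The estimate for 4-bit weights comes out somewhat below the ~12GB model footprint quoted in the specifications; the gap is plausibly quantization metadata and layers kept at higher precision, though the notebook does not break this down.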

---

## Troubleshooting

### No A100 GPU?
- Go to: Runtime → Change runtime type → A100 GPU
- Requires Google Colab Pro+ subscription
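
To confirm which GPU the runtime actually received before loading the model, one option is to query `nvidia-smi`. The sketch below is an assumption-laden helper rather than part of the notebook: the function names are hypothetical, and it returns an empty list when `nvidia-smi` is missing or fails.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_line(line: str) -> Tuple[str, int]:
    """Parse a 'name, 12345 MiB' line into (name, memory in MiB)."""
    name, memory = (part.strip() for part in line.rsplit(",", 1))
    return name, int(memory.split()[0])


def list_gpus() -> List[Tuple[str, int]]:
    """Return (name, MiB) per visible GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if proc.returncode != 0:
        return []
    return [parse_gpu_line(line) for line in proc.stdout.splitlines() if line.strip()]


def has_a100(gpus: List[Tuple[str, int]]) -> bool:
    """True if any reported GPU name mentions A100."""
    return any("A100" in name for name, _ in gpus)


print("A100 available:", has_a100(list_gpus()))
```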

### Model loading fails?
- Check HuggingFace token is set
- Verify internet connection
- Try restarting runtime

### Demo stops working?
- Cell must keep running for demo to work
- Colab disconnects after ~12 hours idle
- Re-run the Step 10 cell (Launch Gradio Demo Interface) to restart the demo

---

## Cost Estimate

**Google Colab Pro+**: ~$50/month
- A100 GPU access
- ~$1-2 per hour of usage
- Background execution
- Priority access

**Alternative**: Use test version (T4 GPU) for free
- See `colab_backend_test.ipynb`
- Smaller model but still functional
- Good for testing/development

---