# Final Project Protocol: Foundations of Machine Learning Frameworks (CSCN8010)

## Project Title

## **AI-Enhanced Self-Service Portal for Student Affairs: A Proof of Concept**

## Context
Student Success Advisors (SSAs) are currently overwhelmed by low-complexity, high-volume inquiries. A reduction in SSA staffing highlights the urgent need for a scalable solution. This project proposes the design and implementation of a chatbot-based self-service tool to handle frequently asked questions (FAQs), integrated with a fallback mechanism ("off-ramp") to escalate complex queries to human advisors.

## Objectives
- Develop a functional Proof of Concept (PoC) chatbot interface for handling common student inquiries.
- Implement keyword-based detection for distress or complex issues to trigger escalation.
- Use publicly available policy documents and a list of FAQs as the chatbot's knowledge base.
- Build the chatbot using NLP techniques, transformer-based language models (e.g., BERT or OpenAI's ChatGPT), and deep learning classifiers.
- Ensure replicable deployment with sound software engineering practices.

## 💬 Sample Chatbot Conversation: College Student Support

**Student:**  
Hi, I’m trying to figure out how to pay my tuition fees.

**Chatbot:**  
You can pay your tuition fees online through the Student Portal. Log in, navigate to the “Finance” section, and select “Make a Payment.” Here’s the link: [https://collegeportal.edu/finance](https://collegeportal.edu/finance)

**Student:**  
Thanks. Do I need to pay the full amount at once?

**Chatbot:**  
Good question! You can choose to pay in full or set up a payment plan. The payment plan allows you to pay in installments throughout the semester. Details and deadlines are available on the same “Finance” page.

**Student:**  
What happens if I miss a payment?

**Chatbot:**  
Missing a payment could result in late fees or a hold on your account, which may block access to registration and transcripts. If you’re having difficulty, it’s best to contact the Student Accounts Office as soon as possible.

---

**Student:**  
I'm feeling overwhelmed and not sure I can keep up this term.

**Chatbot:**  
I'm really sorry you're feeling this way. You don’t have to go through it alone. Please speak with a Student Success Advisor who can support you. You can book an appointment [here](https://collegeportal.edu/ssa-booking) or call us directly at 555-123-4567.


## Deliverables
- A working chatbot with a simple GUI (Streamlit or Flask-based web UI).
- NLP model embedded in chatbot to process user queries.
- LLM backbone (e.g., BERT) with a deep learning classifier head.
- Distress detection model for escalation.
- Requirements file and structured repo.
- README.md with project documentation and user manual.

## 4-Week Timeline
| Week | Milestone |
|------|-----------|
| 1    | Requirements gathering, knowledge base curation, and data preparation |
| 2    | Model prototyping and chatbot interface scaffolding |
| 3    | Integration of NLP model, off-ramp logic, and GUI polishing |
| 4    | Testing, documentation, and final presentation/demo |

## Logical Steps
1. Collect FAQs and college website content
2. Prepare text data and preprocess (tokenization, embedding)
3. Build NLP model using LLM architecture (e.g., BERT)
4. Add Deep Learning classifier for intent detection
5. Implement off-ramp detection via emotion classification
6. Develop UI in Flask or Streamlit
7. Test, document, freeze model, finalize submission

## Integration of NLP, LLM, and Deep Learning
| **Component**                    | **Technology Used**                         | **Purpose in the Chatbot**                                                                 |
|----------------------------------|---------------------------------------------|---------------------------------------------------------------------------------------------|
| NLP Preprocessing                | `spaCy`, `NLTK`, `transformers` tokenizers | Clean, tokenize, lemmatize, and vectorize student queries                                  |
| LLM Backbone                     | Pretrained `BERT`, `DistilBERT`, `GPT2`     | Understand user intent and generate context-aware representations                          |
| Deep Learning Classifier         | `PyTorch` / `Keras` / `TensorFlow`          | Fine-tune a classification head on top of LLM to route queries or trigger off-ramps        |
| Response Generation              | Rule-based / Template / LLM Decoder         | Return answers based on classification or use decoder models to generate responses         |
| Off-Ramp Detection               | Deep model trained to recognize key phrases | Classify emotional tone or distress based on LLM embeddings                                 |
| GUI Interface                    | `Flask`, `Tkinter`, or `Gradio`             | Provide chat input/output, and display escalation messages                                 |

#### Chatbot Architecture (Student-Level)

```simpletext
[Student Query]
     ↓
[NLP Preprocessing Layer]
     ↓
[Tokenizer → LLM Backbone (e.g., BERT, or GPT-4o via the OpenAI API)]
     ↓
[Dual Deep Learning Classifier Head]
   ↙                          ↘
Intent                        Emotion
Classifier                    Classifier
   ↓                             ↓
[Intent Classification]       [Distress Detection]
   ↓                             ↓
[FAQ Match]                   [Escalate to SSA]
   ↓
[Response Generator]
   ↓
[Rule-based Answer or Retrieval]
   ↓
[Flask or Streamlit GUI]
```

## Sample PoC Code (Hard-Coded Simulation)

```python
from transformers import pipeline

# Load a small extractive QA pipeline. Naming the model explicitly avoids the
# "No model was supplied" warning and keeps runs reproducible.
qa_pipeline = pipeline(
    "question-answering",
    model="distilbert/distilbert-base-cased-distilled-squad",
)

# Hard-coded knowledge base standing in for the curated FAQs
context = """
You can pay your fees online via the Student Portal under the 'Finance' section.
To drop a course, login to the registration system and select 'Drop Course'.
If you are in distress, contact a Student Success Advisor.
"""

query = "How do I pay my fees?"
result = qa_pipeline(question=query, context=context)
print(f"Answer: {result['answer']}")
```

Output:

```plaintext
Answer: online via the Student Portal
```



## UI Description
The chatbot will have a web-based GUI built with Flask or Streamlit.

**Features:**
- Input textbox for student questions
- Real-time response area
- Always-visible emergency off-ramp button
- Chat history panel

## Folder Structure

```plaintext
student-bot/
├── data/                  # Training data and raw documents
├── model/                 # Saved LLM + classifiers
├── src/
│   ├── preprocessing.py   # spaCy/NLP functions
│   ├── inference.py       # LLM + DL inference logic
│   ├── chatbot.py         # Response generator + off-ramp logic
│   └── ui.py              # Flask or Streamlit GUI
├── documentation/
│   └── user_manual.md
├── requirements.txt
├── .env                   # API keys (keep out of version control; never hard-code them in scripts)
└── app.py                 # Main entry for chatbot

```

## 🔌 Modular Architecture Breakdown

### 1. **Preprocessing Layer**
- Use `spaCy` or `NLTK` to clean and normalize text (lowercase, remove stopwords, punctuation).
- Tokenize queries if needed using `transformers` or simply pass raw queries to the OpenAI API for processing.
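
The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's required implementation: the `preprocess` function and the tiny `STOPWORDS` set are hypothetical, and a real project would more likely use the stopword lists and tokenizers that ship with `spaCy` or `NLTK`.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for illustration);
# spaCy and NLTK provide much fuller ones.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "i", "my", "to", "how", "of"}


def preprocess(query: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, split, and drop stopwords."""
    text = query.lower()
    # Anything that is not a word character or whitespace becomes a space.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    return [token for token in tokens if token not in STOPWORDS]


print(preprocess("How do I pay my tuition fees?"))
# ['pay', 'tuition', 'fees']
```

Aggressive cleaning like this suits keyword matching and classic classifiers. When the raw query goes straight to a transformer tokenizer or the OpenAI API, it is usually better left untouched.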

### 2. **LLM Encoder (ChatGPT via OpenAI API)**
- Connect to ChatGPT (`gpt-3.5-turbo` or `gpt-4`) using the [OpenAI Python SDK](https://platform.openai.com/docs/).
- Use the model to **understand intent**, **rephrase queries**, or **generate structured responses**.
- Example usage:

```python
import os

import openai
from dotenv import load_dotenv

load_dotenv()  # Load OPENAI_API_KEY from the .env file

client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def query_chatgpt(prompt):
    """Send a single user prompt to the chat model and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful college support assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content


# Example usage
print(query_chatgpt("How do I appeal a grade?"))
```

Output (truncated):

```plaintext
To appeal a grade, you typically need to follow the procedures established by your college or university. Here is a general guide you can follow:

1. Review the grading policy: Start by reviewing your school's policies and procedures regarding grade appeals. This information is usually found in the student handbook, on the school's website, or by contacting the registrar's office.

2. Understand the grounds for appeal: Most schools allow grade appeals only under specific circumstances, such as clerical errors, calculation mistakes, or unfair treatment. Make sure your reason for appealing falls within the acceptable grounds.

3. Contact your professor: Before formally appealing the grade, it's a good idea to reach out to your professor to discuss your concerns. Seek clarification on how your grade was determined and present any evidence you have to support your appeal.

4. Submit a formal appeal: If you are unsatisfied with the outcome of your discussion with the professor, or if you be
```

* Prompts can be structured to:

  * Return an **intent classification** (e.g., `"intent: timetable"`).
  * Detect **distress indicators** based on emotional language.
  * Return **direct answers** or links if desired.
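
One way to structure such a prompt is to ask for a fixed two-line reply and parse it defensively. The sketch below is an assumption-laden illustration: the `INTENTS` label set, the `build_prompt` and `parse_reply` helpers, and the two-line reply format are all hypothetical, not part of the OpenAI API. The parser never calls the API, so it can be tested offline with canned replies.

```python
# Hypothetical label set (assumption); a real one would come from the curated FAQs.
INTENTS = ("fees", "timetable", "grades", "other")


def build_prompt(query: str) -> str:
    """Build a prompt that asks the model for a fixed, parseable reply."""
    labels = ", ".join(INTENTS)
    return (
        "Classify the student message below.\n"
        "Reply with exactly two lines:\n"
        f"intent: <one of {labels}>\n"
        "distress: <yes or no>\n\n"
        f"Message: {query}"
    )


def parse_reply(reply: str) -> dict:
    """Parse 'key: value' lines; fall back to defaults on malformed output."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # only keep lines that actually contain a colon
            fields[key.strip().lower()] = value.strip().lower()
    intent = fields.get("intent", "other")
    if intent not in INTENTS:
        intent = "other"
    return {"intent": intent, "distress": fields.get("distress") == "yes"}


print(parse_reply("intent: timetable\ndistress: no"))
# {'intent': 'timetable', 'distress': False}
```

Because model replies are not guaranteed to follow the requested format, unknown or missing fields collapse to `"other"` and `False` here. A deployment might instead prefer to route unparseable replies to human review.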

### 3. **Deep Learning Heads (Optional)**

* If students want hands-on experience building models, they can:

  * Add a Deep Learning classifier using PyTorch or Keras that runs on the student's query before the API call (e.g., distress detection or fallback classification).
  * Use the ChatGPT response as input to a simple binary classifier for escalation detection.
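
As a rough sketch of what such a head could look like in PyTorch, the module below takes a sentence embedding and produces intent logits plus a single distress logit. Everything specific here is an assumption for illustration: the `DualHead` name, the 768-dimensional input (the hidden size of BERT-base and DistilBERT), the four intents, and the layer sizes. Random tensors stand in for real embeddings, and the head is untrained.

```python
import torch
from torch import nn


class DualHead(nn.Module):
    """Shared hidden layer feeding an intent head and a binary distress head."""

    def __init__(self, embed_dim: int = 768, num_intents: int = 4, hidden: int = 128):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
        )
        self.intent_head = nn.Linear(hidden, num_intents)  # multi-class logits
        self.distress_head = nn.Linear(hidden, 1)          # single binary logit

    def forward(self, embeddings: torch.Tensor):
        hidden = self.shared(embeddings)
        intent_logits = self.intent_head(hidden)
        distress_logit = self.distress_head(hidden).squeeze(-1)
        return intent_logits, distress_logit


model = DualHead()
model.eval()  # disable dropout for inference
fake_embeddings = torch.randn(2, 768)  # stand-in for two encoded queries
with torch.no_grad():
    intent_logits, distress_logit = model(fake_embeddings)
print(intent_logits.shape, distress_logit.shape)
```

For training, the intent logits would typically go through `nn.CrossEntropyLoss` and the distress logit through `nn.BCEWithLogitsLoss`, with the two losses summed.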

### 4. **Response System**

* **Template-Based Response**: Use predefined answers matched to classified intents (for reproducibility).
* **ChatGPT Generation**: Use OpenAI's model to generate rich, dynamic responses (with citations or links where needed).
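
The two response modes can be combined so that templates win whenever an intent has one, with generation as the fallback. In this minimal sketch the `TEMPLATES` mapping, the `FALLBACK` text, and the `respond` function are hypothetical. The template wording is borrowed from the sample knowledge base earlier in this document, and `generate` is any callable that takes the query and returns text (for example, a wrapper around the OpenAI API).

```python
# Hypothetical intent -> answer mapping (assumption); real answers would come
# from the curated FAQ list.
TEMPLATES = {
    "fees": "You can pay your fees online via the Student Portal under the 'Finance' section.",
    "drop_course": "To drop a course, log in to the registration system and select 'Drop Course'.",
}

# Hypothetical default used when no template and no generator is available.
FALLBACK = "I'm not sure about that. Please contact a Student Success Advisor."


def respond(intent, query, generate=None):
    """Return (answer, source), preferring a template over generated text."""
    if intent in TEMPLATES:
        return TEMPLATES[intent], "template"
    if generate is not None:
        return generate(query), "generated"
    return FALLBACK, "fallback"


answer, source = respond("fees", "How do I pay my fees?")
print(source, "->", answer)
```

Returning the source alongside the answer makes it easy to log how often the bot relied on generated text rather than a vetted template.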

### 5. **Off-Ramp Escalation**

* Triggered when:

  * The API response or local classifier indicates distress (e.g., "I can't take this anymore").
  * A predefined keyword list matches high-risk phrases.
* The chatbot should:

  * Display a supportive message.
  * Show a booking link to speak with a Student Success Advisor (SSA).
  * Optionally log or flag the session for follow-up.
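
The trigger and response rules above could be combined as in the following sketch. It is an illustration under stated assumptions: `HIGH_RISK_PHRASES` is a short hypothetical list built from the example phrases in this document, `distress_score` is assumed to be a probability from some classifier, and the `0.5` threshold is arbitrary. The supportive message and booking link reuse the sample conversation.

```python
from dataclasses import dataclass

# Hypothetical phrase list (assumption); a real list needs expert review.
HIGH_RISK_PHRASES = ("can't take this anymore", "can't keep doing this", "overwhelmed")
BOOKING_URL = "https://collegeportal.edu/ssa-booking"  # from the sample conversation


@dataclass
class OffRampDecision:
    escalate: bool
    reason: str
    message: str = ""


def check_off_ramp(query: str, distress_score: float = 0.0, threshold: float = 0.5):
    """Escalate on a keyword match or on a high classifier score."""
    # Normalize curly apostrophes so "can\u2019t" matches "can't".
    text = query.lower().replace("\u2019", "'")
    matched = [phrase for phrase in HIGH_RISK_PHRASES if phrase in text]
    if matched:
        reason = f"keyword: {matched[0]}"
    elif distress_score >= threshold:
        reason = f"classifier score {distress_score:.2f}"
    else:
        return OffRampDecision(False, "none")
    message = (
        "I'm really sorry you're feeling this way. You don't have to go through "
        f"it alone. You can book an appointment with a Student Success Advisor: {BOOKING_URL}"
    )
    return OffRampDecision(True, reason, message)


decision = check_off_ramp("I can't take this anymore")
print(decision.escalate, decision.reason)
```

Keeping the keyword check independent of the classifier score means either signal alone is enough to escalate, and the `reason` field gives a value that can be logged for follow-up.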

### 6. **GUI (Flask Web App or Streamlit)**

* User-friendly chat interface should include:

  * A text input field for the student’s query.
  * A display panel showing the bot’s response.
  * An emergency off-ramp panel/banner when triggered.
  * Optional: conversation history, session logging, or theme customization.
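
Whichever framework renders the page, the interface needs a small piece of state behind it: the conversation so far and whether the off-ramp banner should be shown. The sketch below is framework-agnostic and hypothetical (`ChatSession` and its fields are not part of Flask or Streamlit). A Flask view or a Streamlit script would hold one such object per user session and render from it.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """UI-independent chat state: history plus a sticky off-ramp flag."""

    history: list = field(default_factory=list)  # (speaker, text) pairs
    show_off_ramp: bool = False                  # stays on once triggered

    def add_turn(self, query: str, reply: str, escalate: bool = False) -> None:
        self.history.append(("student", query))
        self.history.append(("bot", reply))
        # Once escalation has been triggered, keep the banner visible.
        self.show_off_ramp = self.show_off_ramp or escalate


session = ChatSession()
session.add_turn("How do I pay my fees?", "Use the Student Portal.")
session.add_turn("I'm feeling overwhelmed.", "Please speak with an advisor.", escalate=True)
print(len(session.history), session.show_off_ramp)
# 4 True
```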


## Sample Tools to Use
| Task                          | Suggested Library         |
|-------------------------------|---------------------------|
| Tokenization & Embeddings     | Hugging Face Transformers |
| Deep Learning Model           | PyTorch or Keras          |
| Emotion Classification        | Custom LSTM/CNN Classifier |
| GUI Interface                 | Flask or Gradio           |
| Escalation Logic              | Custom Python function    |

## Student Activities (Modular)
You could divide the implementation into parts:
- **Part 1:** Use `spaCy` or `transformers` to clean and tokenize queries.
- **Part 2:** Load a pretrained `BERT` or `DistilBERT` model and extract embeddings.
- **Part 3:** Add a classification layer on top (fine-tuning).
- **Part 4:** Detect escalation keywords or distress using a separate binary classifier.
- **Part 5:** Display interaction using a local web UI.

## Summary
This chatbot will combine NLP preprocessing, a transformer-based LLM (e.g., BERT), and Deep Learning classifiers to simulate realistic, scalable student support. Students will gain experience with language modeling, text classification, and GUI application design.

### 🧠 Why Build Steps 3–6 If GPT Can Do It All?

| Step | Component              | Why It’s Still Valuable to Implement                                                                                                                                                                                        |
| ---- | ---------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 3    | **Intent Classifier**  | 🎓 **Educational value**: Students learn how to build and evaluate classifiers. <br>🧪 **Control**: Allows you to explicitly route queries by intent (e.g., “fees”, “courses”) instead of relying on GPT's latent behavior. |
| 4    | **Distress Detector**  | 🛡️ **Ethical safety net**: Don’t rely solely on GPT to detect sensitive queries. <br>🚨 Helps flag specific keywords or emotional tone with a **transparent, explainable model**.                                          |
| 5    | **Response Generator** | 📚 **Standardization**: Use templated answers when accuracy, formality, or compliance matter (e.g., fee deadlines, withdrawal policies). <br>🧩 Mix GPT and templates for hybrid response generation.                       |
| 6    | **Off-Ramp Logic**     | ✅ **Fail-safe mechanism**: Allows independent escalation triggers even if GPT misses the signal. <br>💡 Integrates classifiers, rules, and GPT into one robust decision pipeline.                                           |

---

### 🔁 GPT Is Powerful, But Not Always the Best Alone

While GPT-3.5/4 is impressive for understanding and generating language:

* It’s a **black box** — hard to control or audit.
* It may **hallucinate** answers or miss subtle emotional cues.
* It’s not guaranteed to catch every **escalation** scenario (e.g., “I can’t keep doing this”).
* For institutions, answers often need to follow **official templates** or link to verified resources — GPT might skip those.

---

### 🎓 From a Learning Perspective, Steps 3–6 Offer:

* Hands-on experience with **model training, evaluation, and deployment**.
* A chance to understand when and how to **blend LLMs with traditional ML pipelines**.
* Exposure to **real-world software architecture**, where redundancy, modularity, and failover matter.

