<a href="https://colab.research.google.com/github/valvekhris-eng/SPU-Assistant/blob/main/spuAI_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [128]:
import sys
import time
import google.generativeai as genai
import pandas as pd
import io
import requests
from bs4 import BeautifulSoup
from pdfminer.high_level import extract_text
import ipywidgets as widgets
from IPython.display import display, clear_output
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import json
import os


# Try to get the API key from Colab secrets, with a fallback to environment variables.
# google.colab is imported inside the try block so the cell can also run outside Colab.
try:
    from google.colab import userdata
    GOOGLE_API_KEY = userdata.get('Smash')
    if not GOOGLE_API_KEY:  # treat an empty secret the same as a missing one
        raise ValueError("Colab secret 'Smash' not found or empty.")
except Exception as e:
    print(f"Warning: Failed to retrieve API key from Colab secrets ('Smash'): {e}.")
    print("Attempting to load from environment variable 'GOOGLE_API_KEY' or 'SMASH_API_KEY'.")
    GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY") or os.getenv("SMASH_API_KEY")
    if not GOOGLE_API_KEY:
        print("Error: API key not found in environment variables. Please make the 'Smash' secret available in Colab or set the GOOGLE_API_KEY/SMASH_API_KEY environment variable.")
        # Placeholder so the cell keeps running; genai.configure does not validate it,
        # so the failure only surfaces when a Gemini request is made.
        GOOGLE_API_KEY = "dummy_api_key_if_not_set"

genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-3-pro-preview')
gemini_pro_model = genai.GenerativeModel('gemini-pro')

chat_state = "Start"
user_role = None
conversation_history = []
history_limit = 5
target_url = "https://www.spu.ac.za"
user_feedback = []

CONVERSATION_HISTORY_FILE = 'conversation_history.json'
USER_FEEDBACK_FILE = 'user_feedback.json'

spu_info = """
You are the SPU Student Assistant for Sol Plaatje University.
Your job is to help students with a variety of things.

1. FINDING RESIDENCES:
- Moroka Hall: Male only, located on Central Campus.
- Ra-Thaga Hall: Mixed-gender, has laundry facilities, and is a designated Quiet Zone.
- Tauna Hall: Modern, mixed-gender, located near the sports fields.
- There are also off-campus residence options near the campus, such as Liquor Board, Puma and Snake Park.

2. WATER SAVING (Kimberley is a water-scarce area):
- Students must report leaks to maintenance immediately.
- If water usage goes above 1000 litres per floor, it is a 'Critical Leak.'
- SPU's goal is to be the greenest campus in the Northern Cape.

3. APPLYING TO FACULTIES
- Sol Plaatje University (SPU) in Kimberley, South Africa, offers academic programmes through specialised schools rather than
traditional faculties. These schools include the School of Humanities, the School of Education, the School of Natural and Applied
Sciences, and the School of Economic and Management Sciences, offering degrees in fields like ICT, heritage and education.

Key Academic Schools (Faculties)
- School of Humanities: offers qualifications in Heritage Studies, Court Interpreting, Languages (Afrikaans/English), and
Social Sciences.
- School of Education: focuses on initial teacher education, covering various school phases, including Foundation Phase,
Intermediate Phase, and Senior/FET Phase.
- School of Natural and Applied Sciences: provides programmes in Physical Sciences, Biological Sciences, Mathematics,
ICT, Data Science and Computer Science.
- School of Economic and Management Sciences: offers qualifications in Retail Business Management as well as other
related Accounting courses.

THE UNIVERSITY IS CONTINUOUSLY DEVELOPING NEW PROGRAMMES AND STUDENTS ARE ADVISED TO VISIT OUR WEBSITE (www.spu.ac.za)
ON A REGULAR BASIS TO KEEP UP WITH THE LATEST ADDITIONS OF ACCREDITED PROGRAMMES.

4. FINDING FUNDING
There is a wide range of possible sources of financial support for higher education students in South Africa. These range
from bursaries to study loans from government or private institutions, e.g. commercial banks.
It is the student's responsibility to apply for financial aid, so you are encouraged to investigate all possibilities
and make sure you apply in time, as each funder has specific deadlines.

Various funding options include:
- NSFAS
Available to financially and academically deserving students in any area of study. Contact +27 (0)860 067327
- FUNZA LUSHAKA BURSARY
Only applicable to the following B.Ed programmes:
* B.Ed Senior Phase and FET (Life Sciences and Mathematics)
* B.Ed Senior Phase and FET (Geography and Mathematics)
* B.Ed Senior Phase and FET (Mathematics and Natural Sciences)
Students in all other B.Ed programmes must apply for other funding. Only online applications are accepted.
- ETDP SETA
Telephone +27 (0)53 832 0051/2
Fax +27 (0)86 581 8322
- W&R SETA
Students who are interested in studying the Diploma in Retail Business Management can apply for this bursary.
Shop 16b, Flaxley House, 24 - 28 Du Toitspan Road, Kimberley. Telephone +27 (0)53 831 4117.
- THE NORTHERN CAPE PREMIER'S EDUCATION TRUST FUND
Tel: +27 (0)53 839 9147
- NORTHERN CAPE DEPARTMENT OF EDUCATION
Tel no: +27 (0)53 839 6500 (Bursary Office)
Fax to e-mail: +27 (0)867 733 842
- FUNDI
Covers all qualifications
Call Centre +27 (0)860 55 55 44
E-mail support@fundi.co.za
- BURSARIES SOUTH AFRICA
Website www.bursaries2018.co.za

Use the contact details above to get more detailed information from each institution about applying.

       DO YOU NEED ANY FURTHER INFORMATION OR ASSISTANCE?
       CONTACT THE FOLLOWING:

For funding information, kindly contact:
- Mrs Chrizelle Mally, +27 (0)53 491 0102, e-mail: chrizelle.mally@spu.ac.za

For tuition and other information (quotations) and fee details, kindly contact:
- Mrs Valerie Herman, +27 (0)53 491 0156, e-mail: valerie.herman@spu.ac.za
- Lebo Tlhagang, +27 (0)53 491 0215, e-mail: lebo.tlhagang@spu.ac.za


"""

pdf_path = '2021-Prospectus-Brochure.pdf'
try:
    with open(pdf_path, 'rb') as f:
        prospectus_text = extract_text(f)
    print(f"Prospectus text loaded successfully, length: {len(prospectus_text)} characters.")
except FileNotFoundError:
    prospectus_text = "Error: The prospectus PDF file '2021-Prospectus-Brochure.pdf' was not found. Please ensure it is in the correct directory."
    print(prospectus_text)
except Exception as e:
    prospectus_text = f"Error: An unexpected issue occurred during PDF text extraction: {e}"
    print(prospectus_text)

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
print("SentenceTransformer model 'all-MiniLM-L6-v2' loaded.")

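def chunk_text(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size character windows; consecutive windows share chunk_overlap characters."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])  # slicing past the end of a string is safe
        if end >= len(text):
            break
        start += chunk_size - chunk_overlap
    return chunks

def get_embeddings(texts):
    """Encode a list of strings into a tensor of sentence embeddings, one row per string."""
    return embedding_model.encode(texts, convert_to_tensor=True)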

spu_info_chunks = chunk_text(spu_info)
spu_info_embeddings = get_embeddings(spu_info_chunks)

prospectus_chunks = chunk_text(prospectus_text)
prospectus_embeddings = get_embeddings(prospectus_chunks)

print(f"SPU Info chunked into {len(spu_info_chunks)} pieces and embedded.")
print(f"Prospectus chunked into {len(prospectus_chunks)} pieces and embedded.")
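The chunk counts printed by this cell follow directly from the window arithmetic in `chunk_text`: each window starts `chunk_size - chunk_overlap = 400` characters after the previous one, so a text of n > 500 characters yields ⌈(n - 500) / 400⌉ + 1 chunks. For the 33,334-character prospectus that is ⌈32,834 / 400⌉ + 1 = 83 + 1 = 84, matching the output. The sketch below copies the loop and checks the closed form against it; `expected_chunk_count` is a hypothetical helper, not part of the notebook.

```python
import math


def chunk_text(text, chunk_size=500, chunk_overlap=100):
    # Same sliding-window logic as the notebook's chunk_text
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start += chunk_size - chunk_overlap
    return chunks


def expected_chunk_count(n, chunk_size=500, chunk_overlap=100):
    """Closed-form number of windows chunk_text produces for a text of length n."""
    if n == 0:
        return 0
    if n <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    return math.ceil((n - chunk_size) / step) + 1


# Compare the formula with the loop for several lengths, including the prospectus size
for n in (0, 1, 500, 501, 900, 901, 33334):
    assert len(chunk_text("x" * n)) == expected_chunk_count(n)

print(expected_chunk_count(33334))  # 84
```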

def retrieve_relevant_chunks(query, top_k=3, chunks=None, embeddings=None):
    if chunks is None or embeddings is None or len(chunks) == 0:
        return []

    query_embedding = embedding_model.encode([query], convert_to_tensor=True)
    similarities = cosine_similarity(query_embedding.cpu(), embeddings.cpu())
    top_k_indices = np.argsort(similarities[0])[-top_k:][::-1]
    relevant_chunks = [chunks[i] for i in top_k_indices]
    return relevant_chunks
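`retrieve_relevant_chunks` ranks chunks by cosine similarity and keeps the `top_k` highest. The same ranking can be sketched with NumPy alone, which makes it testable without loading the embedding model. The 2-D vectors below are illustrative stand-ins for real `all-MiniLM-L6-v2` embeddings (which are 384-dimensional), and `top_k_by_cosine` is a hypothetical name.

```python
import numpy as np


def top_k_by_cosine(query_vec, chunk_vecs, chunks, top_k=3):
    """Return the top_k chunks whose vectors have the highest cosine similarity to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(chunk_vecs, dtype=float)
    # After L2-normalising, a plain dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    # argsort is ascending, so reverse it and keep the first top_k indices
    order = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in order]


# Toy example with hypothetical 2-D "embeddings" for three chunks
chunks = ["residence info", "funding info", "residence and funding"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_by_cosine([1.0, 0.2], vectors, chunks, top_k=2))
# ['residence info', 'residence and funding']
```

Taking `np.argsort(scores)[::-1][:top_k]` selects the same indices as the notebook's `[-top_k:][::-1]` (apart from tie order), so either form works.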

def is_valid_id(id_input):
  return id_input.isdigit() and len(id_input) == 9
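`str.isdigit()` is broader than "0-9": it is also true for other Unicode digit characters, such as Arabic-Indic digits or superscripts, so `is_valid_id` would accept `"١٢٣٤٥٦٧٨٩"`. Assuming SPU numbers are plain ASCII digits, a stricter check can anchor a regular expression to the whole string. The function name and the sample number are hypothetical.

```python
import re


def is_valid_id_strict(id_input):
    """True only for exactly nine ASCII digits (0-9)."""
    # [0-9] rather than \d: in Python 3, \d on str patterns also matches other Unicode digits
    return re.fullmatch(r"[0-9]{9}", id_input) is not None


print("١٢٣٤٥٦٧٨٩".isdigit())             # True: Arabic-Indic digits count as digits
print(is_valid_id_strict("١٢٣٤٥٦٧٨٩"))   # False
print(is_valid_id_strict("202312345"))   # True
```

An equivalent one-liner without `re` is `id_input.isascii() and id_input.isdigit() and len(id_input) == 9`.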

def get_spu_web_info(url):
    """
    Retrieves live web information from a given URL with enhanced error handling
    and more specific content extraction.
    """
    print(f"Attempting to retrieve web info from: {url}")
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, 'html.parser')
        extracted_text_parts = []

        # Prefer headings and paragraphs inside <main>, falling back to the whole page.
        # Collecting <main> itself as well as its children would duplicate the same text.
        root = soup.find('main') or soup
        target_tags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p']

        # find_all with a list of tag names returns matches in document order
        for element in root.find_all(target_tags):
            text_content = element.get_text(separator=' ', strip=True)
            if text_content:
                extracted_text_parts.append(text_content)

        extracted_text = '\n'.join(extracted_text_parts)

        if not extracted_text.strip():
            extracted_text = "Could not extract significant content using targeted tags."
            print(extracted_text)
        elif len(extracted_text) > 2000:
            # Truncate to a maximum of 2000 characters to keep the prompt context small
            extracted_text = extracted_text[:2000] + "... [truncated]"
            print(f"Content from {url} truncated to 2000 characters.")
        else:
            print(f"Successfully extracted content from {url}, length: {len(extracted_text)} characters.")

        return extracted_text

    except requests.exceptions.HTTPError as e:
        error_message = f"HTTP Error for {url}: {e.response.status_code} - {e.response.reason}"
        print(error_message)
        return error_message
    except requests.exceptions.ConnectionError as e:
        error_message = f"Connection Error for {url}: Could not connect to the server. {e}"
        print(error_message)
        return error_message
    except requests.exceptions.Timeout as e:
        error_message = f"Timeout Error for {url}: The request timed out. {e}"
        print(error_message)
        return error_message
    except requests.exceptions.RequestException as e:
        error_message = f"An unexpected Request Error occurred for {url}: {e}"
        print(error_message)
        return error_message
    except Exception as e:
        error_message = f"An unexpected error occurred while fetching web info from {url}: {e}"
        print(error_message)
        return error_message
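In the `READY` branch, `get_spu_web_info(target_url)` is called (and its result re-chunked and re-embedded) for every question. A small time-based cache could avoid refetching the same page within a short window. The sketch wraps any `fetch(url) -> str` function; the names `make_cached_fetcher` and `ttl_seconds`, and the 10-minute default, are assumptions rather than part of the notebook.

```python
import time


def make_cached_fetcher(fetch, ttl_seconds=600, clock=time.monotonic):
    """Wrap fetch(url) so repeated calls within ttl_seconds reuse the previous result."""
    cache = {}  # url -> (timestamp, text)

    def cached_fetch(url):
        now = clock()
        entry = cache.get(url)
        if entry is not None and now - entry[0] < ttl_seconds:
            return entry[1]  # still fresh: skip the network request
        text = fetch(url)
        cache[url] = (now, text)
        return text

    return cached_fetch


# Usage with a fake fetcher and a controllable clock (both hypothetical)
calls = []
fake_now = [0.0]

def fake_fetch(url):
    calls.append(url)
    return f"contents of {url}"

fetch_spu = make_cached_fetcher(fake_fetch, ttl_seconds=600, clock=lambda: fake_now[0])
fetch_spu("https://www.spu.ac.za")
fetch_spu("https://www.spu.ac.za")   # served from the cache
fake_now[0] = 601.0
fetch_spu("https://www.spu.ac.za")   # entry expired, fetched again
print(len(calls))  # 2
```

In the notebook this might look like `cached_web_info = make_cached_fetcher(get_spu_web_info)`. One caveat: `get_spu_web_info` returns error messages as ordinary strings, so a failed fetch would also be cached for the full TTL.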

def load_data(filepath):
    if os.path.exists(filepath):
        with open(filepath, 'r') as f:
            return json.load(f)
    return []

def save_data(filepath, data):
    with open(filepath, 'w') as f:
        json.dump(data, f, indent=4)
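`save_data` rewrites the whole JSON file after every message. If the kernel is interrupted mid-write, the file can be left truncated, and `json.load` in `load_data` will then raise on the next run. A common remedy, sketched here with only the standard library, is to write to a temporary file in the same directory and swap it in with `os.replace`, which the Python documentation describes as atomic on POSIX systems. `save_data_atomic` is a hypothetical name.

```python
import json
import os
import tempfile


def save_data_atomic(filepath, data):
    """Write data as JSON to a temp file, then swap it into place with os.replace."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # Temp file in the same directory so os.replace never crosses filesystems
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=4)
        os.replace(tmp_path, filepath)
    except BaseException:
        # Remove the partial temp file; the original file is left untouched
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_data(filepath):
    if os.path.exists(filepath):
        with open(filepath, "r") as f:
            return json.load(f)
    return []


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "conversation_history.json")
    save_data_atomic(path, ["User: hi", "Assistant: Hello!"])
    print(load_data(path))  # ['User: hi', 'Assistant: Hello!']
```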

conversation_history = load_data(CONVERSATION_HISTORY_FILE)
user_feedback = load_data(USER_FEEDBACK_FILE)
print(f"Loaded {len(conversation_history)} conversation history entries and {len(user_feedback)} feedback entries.")

def record_feedback(message_id, feedback_type, feedback_text=None):
    feedback_entry = {
        'message_id': message_id,
        'feedback_type': feedback_type,
        'feedback_text': feedback_text,
        'timestamp': time.time()
    }
    user_feedback.append(feedback_entry)
    save_data(USER_FEEDBACK_FILE, user_feedback)
    print(f"Feedback recorded for message_id {message_id}: {feedback_type}")

def get_user_intent(query):
    query_lower = query.lower()
    if any(keyword in query_lower for keyword in ["residence", "housing", "dorm"]):
        return "residence"
    elif any(keyword in query_lower for keyword in ["course", "program", "faculty", "admission", "apply", "school"]):
        return "academic"
    elif any(keyword in query_lower for keyword in ["news", "events", "updates", "announcements"]):
        return "web_info"
    else:
        return "general"

problem_suggestions = {
    'stress': "If you're feeling stressed, please reach out to SPU's Student Counselling Services. You can find their contact details on the SPU website or in the prospectus.",
    'financial difficulty': "For financial assistance, please contact Mrs. Chrizelle Mally at +27 (0)53 491 0102 or chrizelle.mally@spu.ac.za, or consider applying for NSFAS or other bursaries mentioned in the SPU info.",
    'academic struggles': "If you're struggling academically, please connect with your school's academic support services or tutors. The School of Humanities, Education, Natural and Applied Sciences, and Economic and Management Sciences all offer support. Check the prospectus for more details.",
    'residence issues': "For residence issues, you can report leaks to maintenance, or if you need general assistance with accommodation, check the SPU info for residence options like Moroka Hall, Ra-Thaga Hall, Tauna Hall, or off-campus options."
}
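The loop that builds `proactive_suggestion` only fires when a key of `problem_suggestions` appears verbatim in the message, so a student who writes "I can't afford my fees" never receives the financial suggestion, because the phrase "financial difficulty" is absent. One option is to map each key to several trigger words. The trigger lists and the function name below are illustrative assumptions, not part of the notebook.

```python
import re

# Hypothetical trigger words for each key of problem_suggestions
SUGGESTION_TRIGGERS = {
    "stress": {"stress", "stressed", "anxious", "overwhelmed"},
    "financial difficulty": {"financial", "money", "fees", "afford", "nsfas", "bursary"},
    "academic struggles": {"failing", "struggling", "marks", "tutor", "exam"},
    "residence issues": {"residence", "leak", "room", "roommate"},
}


def match_suggestion_keys(message, triggers=SUGGESTION_TRIGGERS):
    """Return the problem_suggestions keys whose trigger words appear in the message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return [key for key, trigger_words in triggers.items() if words & trigger_words]


print(match_suggestion_keys("I can't afford my fees"))
# ['financial difficulty']
print(match_suggestion_keys("I'm stressed about failing maths"))
# ['stress', 'academic struggles']
```

The matched keys could then be looked up in `problem_suggestions` in place of the substring test.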

def assistant_response(message):
    global chat_state, user_role, conversation_history
    message_id = time.time()
    user_input = message.strip()

    conversation_history.append(f"User: {user_input}")
    save_data(CONVERSATION_HISTORY_FILE, conversation_history)

    if chat_state == "Start":
        chat_state = 'AWAITING_ROLE'
        response_text = "Hello! I am your SPU Assistant. To help you better, please state whether you are a Student, Personnel, Guest, or Alumni."
        conversation_history.append(f"Assistant: {response_text}")
        save_data(CONVERSATION_HISTORY_FILE, conversation_history)
        return response_text, message_id

    if chat_state == 'AWAITING_ROLE':
        choice = user_input.lower()
        if choice == "student":
            user_role = "student"
            chat_state = "AWAITING_ID"
            response_text = "Welcome! Please provide your Student Number. (If you are a student without one, say 'No Number')"
        elif choice == "personnel":
            user_role = "personnel"
            chat_state = "AWAITING_ID"
            response_text = "Welcome! Please provide your Personnel Number."
        elif choice == "alumni":
            user_role = "alumni"
            chat_state = "AWAITING_ID"
            response_text = "Welcome! Please provide your Alumni Number."
        elif choice in ["guest", "visitor", "none"]:
            user_role = "guest"
            chat_state = "READY"
            response_text = "Welcome to Sol Plaatje University! How may I help you today?"
        else:
            response_text = "I did not quite catch that. Are you a Student, Personnel, Guest, or Alumni?"
        conversation_history.append(f"Assistant: {response_text}")
        save_data(CONVERSATION_HISTORY_FILE, conversation_history)
        return response_text, message_id

    if chat_state == "AWAITING_ID":
        if user_role == "student" and ('no' in user_input.lower() or "don't" in user_input.lower()):
            chat_state = "READY"
            response_text = "Thank you. As a first-year student or applicant without a student number, how can I help you with your application today?"
        elif is_valid_id(user_input):
            chat_state = "READY"
            response_text = f"Thank you. {user_role.capitalize()} ID {user_input} verified. How can I assist you now?"
        else:
            response_text = f"That doesn't look right. Please enter exactly 9 digits for your {user_role} number."
        conversation_history.append(f"Assistant: {response_text}")
        save_data(CONVERSATION_HISTORY_FILE, conversation_history)
        return response_text, message_id

    if chat_state == "READY":
        live_web_info_content = get_spu_web_info(target_url)
        web_chunks = chunk_text(live_web_info_content)
        web_embeddings = get_embeddings(web_chunks)

        user_intent = get_user_intent(user_input)
        print(f"Detected user intent: {user_intent}")

        top_k_spu_info = 2
        top_k_prospectus = 2
        top_k_web_info = 2

        if user_intent == "residence":
            top_k_spu_info = 4
        elif user_intent == "academic":
            top_k_prospectus = 4
        elif user_intent == "web_info":
            top_k_web_info = 4
        elif user_intent == "general":
            top_k_spu_info = 3
            top_k_prospectus = 3
            top_k_web_info = 1

        relevant_spu_info = retrieve_relevant_chunks(user_input, top_k=top_k_spu_info, chunks=spu_info_chunks, embeddings=spu_info_embeddings)
        relevant_prospectus = retrieve_relevant_chunks(user_input, top_k=top_k_prospectus, chunks=prospectus_chunks, embeddings=prospectus_embeddings)
        relevant_web_info = retrieve_relevant_chunks(user_input, top_k=top_k_web_info, chunks=web_chunks, embeddings=web_embeddings)

        all_relevant_snippets = []
        all_relevant_snippets.extend(relevant_spu_info)
        all_relevant_snippets.extend(relevant_prospectus)
        all_relevant_snippets.extend(relevant_web_info)

        context_prompt = "Relevant information from SPU resources and website:\n" + "\n".join(all_relevant_snippets)
        if user_intent != "general":
            context_prompt = f"Prioritized based on '{user_intent}' intent.\n" + context_prompt

        proactive_suggestion = []
        user_input_lower = user_input.lower()
        for keyword, suggestion in problem_suggestions.items():
            if keyword in user_input_lower:
                proactive_suggestion.append(suggestion)
        if proactive_suggestion:
            context_prompt += "\n\nProactive Suggestion: " + " ".join(proactive_suggestion)

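        history_prompt = ""
        # Skip the current message (appended at the top of this function); it is added once at the end of the prompt
        previous_turns = conversation_history[:-1][-history_limit:]
        if previous_turns:
            history_prompt = "Previous conversation:\n" + "\n".join(previous_turns) + "\n"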

        role_instructions = ""
        if user_role == "student":
            role_instructions = "The user is a student, most likely looking for information on academic programmes, funding, residences, and student life. Keep the tone encouraging and helpful, focusing on details relevant to a student's journey at SPU."
        elif user_role == "personnel":
            role_instructions = "The user is a member of staff, likely looking for operational information, policies, or specific departmental details. Provide concise and accurate information, referencing official procedures where possible."
        elif user_role == "alumni":
            role_instructions = "The user is an SPU graduate (alumni), who may be interested in university news, events, ways to reconnect, or postgraduate opportunities. Offer engaging and informative responses that highlight SPU's developments and alumni engagement opportunities."
        elif user_role == "guest":
            role_instructions = "The user is a guest, looking for general information about the university, its offerings, or how to get started with applications. Provide welcoming and clear information, guiding them to relevant sections or contacts."

        final_prompt = f"""
You are the SPU Student Assistant. {role_instructions} Adopt an empathetic and engaging tone. If a query is ambiguous, ask clarifying questions. Proactively offer related insights or information that might be helpful.
        {history_prompt}

        {context_prompt}

        User: {user_input}
        Assistant:
        """

        print(f"Prompt sent to LLM:\n{final_prompt}")

        try:
            response = gemini_pro_model.generate_content(final_prompt)
            response_text = response.text
        except Exception as e:
            response_text = f"An API error occurred while generating the response: {e}. Please check your API key and network connection."
            print(response_text)
            conversation_history.append(f"Assistant: {response_text}")
            save_data(CONVERSATION_HISTORY_FILE, conversation_history)
            return response_text, message_id


        conversation_history.append(f"Assistant: {response_text}")
        save_data(CONVERSATION_HISTORY_FILE, conversation_history)

        if len(conversation_history) > history_limit * 2:
            conversation_history = conversation_history[-(history_limit * 2):]

        return response_text, message_id
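Before the `READY` branch, `assistant_response` behaves as a small state machine: `Start` → `AWAITING_ROLE` → `AWAITING_ID` (students, personnel, alumni) or straight to `READY` (guests). Pulling just the transitions into a pure function, as sketched below, makes them testable without calling Gemini or touching the JSON files. The names `next_state` and `ROLE_TRANSITIONS` are hypothetical; the rules mirror the branches above.

```python
# Role keywords accepted in AWAITING_ROLE, mapped to (stored role, next state)
ROLE_TRANSITIONS = {
    "student": ("student", "AWAITING_ID"),
    "personnel": ("personnel", "AWAITING_ID"),
    "alumni": ("alumni", "AWAITING_ID"),
    "guest": ("guest", "READY"),
    "visitor": ("guest", "READY"),
    "none": ("guest", "READY"),
}


def next_state(state, user_input, role=None):
    """Return (new_state, role) after one user message, mirroring assistant_response."""
    text = user_input.strip().lower()
    if state == "Start":
        # The first message only triggers the greeting
        return "AWAITING_ROLE", role
    if state == "AWAITING_ROLE":
        if text in ROLE_TRANSITIONS:
            new_role, new_state = ROLE_TRANSITIONS[text]
            return new_state, new_role
        return "AWAITING_ROLE", role  # unrecognised input: ask again
    if state == "AWAITING_ID":
        # Students without a number skip verification, as in the notebook
        if role == "student" and ("no" in text or "don't" in text):
            return "READY", role
        if text.isdigit() and len(text) == 9:
            return "READY", role
        return "AWAITING_ID", role
    return state, role  # READY stays READY


# Walk through a hypothetical student session
state, role = "Start", None
for message in ["hi", "Student", "12345", "202312345"]:
    state, role = next_state(state, message, role)
    print(message, "->", state, role)
```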


output_area = widgets.Output()

input_box = widgets.Text(
    placeholder='Ask me anything',
    description=':',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='70%')
)

button = widgets.Button(description='Ask Assistant',
                        button_style='info',
                        tooltip='Click to ask Me',
                        icon='paper-plane'
)

h_button = widgets.Button(description='👍 Helpful', button_style='success', layout=widgets.Layout(width='auto'))
unh_button = widgets.Button(description='👎 Not Helpful', button_style='danger', layout=widgets.Layout(width='auto'))

clear_chat_button = widgets.Button(description='Clear Chat',
                                   button_style='warning',
                                   tooltip='Click to clear chat and reset',
                                   icon='trash'
)

def on_helpful_clicked(b, msg_id):
    record_feedback(msg_id, 'helpful')
    b.disabled = True
    unh_button.disabled = True

def on_unhelpful_clicked(b, msg_id):
    record_feedback(msg_id, 'not_helpful')
    b.disabled = True
    h_button.disabled = True

def on_clear_chat_clicked(b):
    global chat_state, user_role, conversation_history, user_feedback
    with output_area:
        clear_output(wait=True)
        chat_state = "Start"
        user_role = None
        conversation_history = []
        user_feedback = []
        save_data(CONVERSATION_HISTORY_FILE, conversation_history)
        save_data(USER_FEEDBACK_FILE, user_feedback)
        print("Chat cleared! Starting a new conversation.")
        input_box.value = ""
        response_text, message_id = assistant_response("Start")
        print(f"Assistant: {response_text}")

clear_chat_button.on_click(on_clear_chat_clicked)

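# Handlers currently attached to the feedback buttons, so they can be detached
# before new ones are attached for the next answer.
_feedback_handlers = []

def on_button_clicked(b):
    with output_area:
        clear_output(wait=True)

        question = input_box.value
        if question.strip() != "":
            print(f"Searching SPU records for: {question}...")

            assistant_response_text, message_id = assistant_response(question)
            print(f"Assistant: {assistant_response_text}")
            print(f"\n(Message ID for feedback: {message_id})\n")

            # Without this, each answer would add another handler, and a single click
            # would record feedback for every earlier message ID as well.
            for btn, handler in _feedback_handlers:
                btn.on_click(handler, remove=True)
            _feedback_handlers.clear()

            helpful_handler = lambda btn: on_helpful_clicked(btn, message_id)
            unhelpful_handler = lambda btn: on_unhelpful_clicked(btn, message_id)
            h_button.on_click(helpful_handler)
            unh_button.on_click(unhelpful_handler)
            _feedback_handlers.extend([(h_button, helpful_handler), (unh_button, unhelpful_handler)])

            h_button.disabled = False
            unh_button.disabled = False
            display(widgets.HBox([h_button, unh_button]))
        else:
            print("Please type a question first.")

button.on_click(on_button_clicked)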
display(widgets.VBox([widgets.HBox([input_box, button, clear_chat_button]), output_area]))

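def detect_spu_leaks(meter_readings):
    """Ask Gemini to look for leak-like spikes in daily water meter readings."""
    # f-string, so the readings are actually interpolated into the prompt
    analysis_prompt = f"""
    You are an SPU Water Engineer.
    Analyze these daily meter readings:
    {meter_readings}

    Tasks:
    1. Is there a spike that looks like a pipe burst or a leak?
    2. On which day did it happen?
    3. Write a short alert message to send to the SPU Maintenance Team.
    """
    response = model.generate_content(analysis_prompt)
    print(response.text)
    return response.text

# Leak-detection button; not yet connected to a click handler or displayed.
leak_button = widgets.Button(description='Run Leak Detection',
                             button_style='warning',
                             icon='tint'
)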

print("SPU Assistant initialized and ready for interaction!")




Prospectus text loaded successfully, length: 33334 characters.



BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


SentenceTransformer model 'all-MiniLM-L6-v2' loaded.
SPU Info chunked into 10 pieces and embedded.
Prospectus chunked into 84 pieces and embedded.
Loaded 16 conversation history entries and 0 feedback entries.



SPU Assistant initialized and ready for interaction!
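The SPU info defines a 'Critical Leak' as usage above 1000 litres per floor. A deterministic check like the one sketched below could flag those days before (or alongside) asking the LLM in `detect_spu_leaks`, and also shows one way to format readings so they interpolate cleanly into `analysis_prompt`. It assumes each reading is one day's usage for a single floor, passed as a mapping from day label to litres; that input shape, the function names, and the sample numbers are hypothetical.

```python
CRITICAL_LEAK_LITRES = 1000  # threshold from the SPU info: above 1000 litres per floor


def critical_leak_days(daily_litres, threshold=CRITICAL_LEAK_LITRES):
    """Return the days whose usage exceeds the critical-leak threshold, in input order."""
    return [day for day, litres in daily_litres.items() if litres > threshold]


def format_readings(daily_litres):
    """One 'day: N litres' line per reading, suitable for an f-string prompt."""
    return "\n".join(f"{day}: {litres} litres" for day, litres in daily_litres.items())


# Hypothetical readings for one floor
readings = {"Mon": 420, "Tue": 455, "Wed": 1380, "Thu": 470}
print(critical_leak_days(readings))  # ['Wed']
print(format_readings(readings))
```

With these helpers, a call could look like `detect_spu_leaks(format_readings(readings))`, with the threshold result used as a sanity check on the model's answer.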
