# Notebook: [ Week #7 - Quick Prototyping with Streamlit ]

💡💡💡 We strongly suggest that you at least go through the following materials in our notes before proceeding with this hands-on session:

- *"Setting up a Streamlit Project with pip and venv"*

- *"Working with Streamlit"*

---

💎 The **best way** to go through this hands-on:
- Try not to look at the **Solution Reference** (included in this notebook) or the solution video(s) unless you get stuck for more than 15-20 minutes.

- At this point, you should be comfortable with fiddling with the code and resolving errors on your own.

---
---

# Getting Familiar with Streamlit: A Simple Form Submission and Processing in Streamlit

- Create a new folder for this hands-on. We will refer to this new folder as the `project folder`.

- Create a **new virtual environment** in the `project folder` and activate it.
    - The virtual environment must have at least the following packages:
      - `streamlit`
      - `openai`
      - `tiktoken`
      - `python-dotenv`
      - `pandas`
    

- Run this script in VS Code to verify that the Streamlit project folder has been set up correctly.

    - 1. Create a new file named `main.py` in the root of your project folder and copy the code below into it.

    - 2. In the `terminal` of VS Code, execute the command `streamlit run main.py`.

In [None]:
# Set up and run this Streamlit App
import streamlit as st

# region <--------- Streamlit App Configuration --------->
st.set_page_config(
    layout="centered",
    page_title="My Streamlit App"
)
# endregion <--------- Streamlit App Configuration --------->

st.title("Streamlit App")

form = st.form(key="form")
form.subheader("Prompt")

user_prompt = form.text_area("Enter your prompt here", height=200)

if form.form_submit_button("Submit"):
    st.toast(f"User Input Submitted - {user_prompt}")
    print(f"User Input is {user_prompt}")
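
If `streamlit run main.py` fails with an import error, one of the packages listed above may be missing from the active virtual environment. The sketch below is one possible way to check this using only the standard library. The helper name `missing_packages` is our own and not part of the hands-on; note that the `python-dotenv` package is imported under the name `dotenv`.

```python
from importlib.util import find_spec

# Import names of the packages required for this hands-on.
# Note: the `python-dotenv` package is imported as `dotenv`.
REQUIRED_MODULES = ["streamlit", "openai", "tiktoken", "dotenv", "pandas"]


def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_packages(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Run it with the virtual environment activated, otherwise it reports on the system-wide Python instead.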

---
---
<br>


# Developing a Customer Service Bot

- The subsequent sections of this notebook contain the instructions for building a simple Q&A Bot using Streamlit.

- The Bot relies on the functions that we created in Week 3.

    - We suggest that you take a look at Week 3 to understand the overall use case.

    - The notebook from Week 3 will also help you understand how we derived the functions that we will be using in this notebook.
    
    - Note that we did not include all the functions from Week 3 to allow us to better focus on the Streamlit part of the project.


- To make it easier to refer to, we have included the functions from Week 3 in this notebook.

---

## Project Structure

![](https://abc-notes.data.tech.gov.sg/resources/img/topic-07-streamlit-llm.png)

---
---
<br>

## Setting Up the Project

### Download Required Data

- Step 1: Download the data from the link below
    - https://abc-notes.data.tech.gov.sg/resources/data/courses-full.json
- Step 2: Create a folder "data" in the project folder and copy the `courses-full.json` into the `data` folder.

---
<br>

### Copy your `.env` file into the Root of the Project Folder

- Copy and paste the `.env` file that you created in `Topic 6` into the root of the project folder.

- The `.env` file should contain at least the `OPENAI_API_KEY` key-value pair for the purpose of this hands-on.
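
Before moving on, it can help to confirm that both files are where the later scripts expect them. The sketch below is a rough sanity check under the layout described above (`data/courses-full.json` and `.env` at the project root). The function name `check_setup` is hypothetical, and the `.env` check is a simple text search, not a replacement for `python-dotenv`.

```python
import json
from pathlib import Path


def check_setup(project_root="."):
    """Return a list of human-readable problems found in the project folder."""
    root = Path(project_root)
    problems = []

    # The course data is expected at data/courses-full.json
    data_file = root / "data" / "courses-full.json"
    if not data_file.is_file():
        problems.append(f"Missing data file: {data_file}")
    else:
        try:
            json.loads(data_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"Not valid JSON: {data_file}")

    # The .env file is expected at the root and should mention OPENAI_API_KEY
    env_file = root / ".env"
    if not env_file.is_file():
        problems.append(f"Missing .env file: {env_file}")
    elif "OPENAI_API_KEY" not in env_file.read_text(encoding="utf-8"):
        problems.append("OPENAI_API_KEY not found in .env")

    return problems


for problem in check_setup("."):
    print("Problem:", problem)
```

An empty result means both files were found; it does not verify that the API key itself is valid.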

---
---
<br>


# Task 1: Convert Helper Functions into a Python Script that can be Imported

The purpose of this task is to create a new Python script named `llm.py` under the folder `helper_functions`, which is located at the root of the project folder.
- The `llm.py` script will contain the `helper functions` below:
    - 1. `get_embedding`
    - 2. `get_completion`
    - 3. `get_completion_by_messages`
    - 4. `count_tokens`
    - 5. `count_tokens_from_message`


- > 💡 Note that besides the functions themselves, the script MUST also contain the necessary `import statements` and other dependencies that are required for the functions to work.

- This script will allow us to import the functions into the main script `main.py` for the purpose of building the Streamlit app.

## Packages to be Imported

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI
import tiktoken

## Load the `.env` file in this new Script

In [None]:
load_dotenv('.env')

In [None]:
# Pass the API Key to the OpenAI Client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

## Function for Generating Embedding

In [None]:
def get_embedding(input, model='text-embedding-3-small'):
    response = client.embeddings.create(
        input=input,
        model=model
    )
    return [x.embedding for x in response.data]

## Function for Text Generation

In [None]:
# This is the "Updated" helper function for calling the LLM
def get_completion(prompt, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=1024, n=1, json_output=False):
    if json_output:
        output_json_structure = {"type": "json_object"}
    else:
        output_json_structure = None

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create( #originally was openai.chat.completions
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=1,
        response_format=output_json_structure,
    )
    return response.choices[0].message.content

## Function for Text Generation from `Messages`

In [None]:
# Note that this function directly takes in "messages" as the parameter.
def get_completion_by_messages(messages, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=1024, n=1):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=1
    )
    return response.choices[0].message.content
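
For reference, the `messages` argument is a list of dictionaries, each with a `role` and a `content` key. The sketch below builds such a list in the same shape that the later tasks in this notebook use (a system message plus a delimited user message). The helper name `build_messages` is hypothetical and is not part of `llm.py`.

```python
def build_messages(system_message, user_message, delimiter="####"):
    """Build a chat `messages` list with one system and one user message."""
    return [
        {"role": "system", "content": system_message},
        # The user text is wrapped in delimiters so the prompt can refer to it
        {"role": "user", "content": f"{delimiter}{user_message}{delimiter}"},
    ]


messages = build_messages("You are a helpful assistant.", "What courses do you offer?")
print(messages)
# The list can then be passed to the helper defined above, for example:
# reply = get_completion_by_messages(messages)
```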

## Functions for Token Counting

In [None]:
# These functions count the tokens in a piece of text or in a list of messages
# ⚠️ This is a simplified implementation that is good enough for a rough estimation

def count_tokens(text):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    return len(encoding.encode(text))


def count_tokens_from_message(messages):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    value = ' '.join([x.get('content') for x in messages])
    return len(encoding.encode(value))


---
✅ **CheckPoint**

<details>
    <summary><font size="4" color="darkgreen"><b>See the Reference Solution (👆🏼 Click to expand)</b></font></summary>

This is one way you can organize your script:

------------------------------ START OF SCRIPT ------------------------------

```Python
import os
from dotenv import load_dotenv
from openai import OpenAI
import tiktoken


load_dotenv('.env')

# Pass the API Key to the OpenAI Client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

def get_embedding(input, model='text-embedding-3-small'):
    response = client.embeddings.create(
        input=input,
        model=model
    )
    return [x.embedding for x in response.data]


# This is the "Updated" helper function for calling the LLM
def get_completion(prompt, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=1024, n=1, json_output=False):
    if json_output:
        output_json_structure = {"type": "json_object"}
    else:
        output_json_structure = None

    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create( #originally was openai.chat.completions
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=1,
        response_format=output_json_structure,
    )
    return response.choices[0].message.content


# Note that this function directly takes in "messages" as the parameter.
def get_completion_by_messages(messages, model="gpt-4o-mini", temperature=0, top_p=1.0, max_tokens=1024, n=1):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        n=1
    )
    return response.choices[0].message.content


# These functions count the tokens in a piece of text or in a list of messages
# ⚠️ This is a simplified implementation that is good enough for a rough estimation
def count_tokens(text):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    return len(encoding.encode(text))


def count_tokens_from_message(messages):
    encoding = tiktoken.encoding_for_model('gpt-4o-mini')
    value = ' '.join([x.get('content') for x in messages])
    return len(encoding.encode(value))

```

------------------------------ END OF SCRIPT ------------------------------
</details>

---
---
<br>

## Testing out the `llm.py` Script

1. Copy the code below to replace all the code in the `main.py` script.

2. Start the Streamlit app by running `python -m streamlit run main.py` in the terminal.

In [None]:
# Set up and run this Streamlit App
import streamlit as st
from helper_functions import llm # <--- This is the helper function that we have created 🆕


# region <--------- Streamlit App Configuration --------->
st.set_page_config(
    layout="centered",
    page_title="My Streamlit App"
)
# endregion <--------- Streamlit App Configuration --------->

st.title("Streamlit App")

form = st.form(key="form")
form.subheader("Prompt")

user_prompt = form.text_area("Enter your prompt here", height=200)

if form.form_submit_button("Submit"):
    st.toast(f"User Input Submitted - {user_prompt}")
    response = llm.get_completion(user_prompt) # <--- This calls the helper function that we have created 🆕
    st.write(response) # <--- This displays the response generated by the LLM onto the frontend 🆕
    print(f"User Input is {user_prompt}")

🎉🥳 Congratulations! You have successfully created a web application that allows users to interact with the LLM!!! 🎉🥳

---
---

# Task 2: Writing the Main Logic into Functions

In this task, you will create a new Python script `customer_query_handler.py` in a new folder `logics` at the project's root folder.
- This script will contain the functions shown in the cells below:
    - 1. `identify_category_and_courses`
    - 2. `get_course_details`
    - 3. `generate_response_based_on_course_details`

- Some of these functions need the course data, so the script also needs code to read in the `courses-full.json` file from the `data` folder.

---
<br>

- 💡💡💡 Take note that, in the script, you will need to:
    - Import the necessary packages.
    - Change some of the code in the functions to make it work with the `helper functions` that we have created in the previous task.


## Function #1: `identify_category_and_courses`

In [None]:
category_n_course_name = {'Programming and Development': ['Web Development Bootcamp',
                                                          'Introduction to Cloud Computing',
                                                          'Advanced Web Development',
                                                          'Cloud Architecture Design'],
                          'Data Science & AI': ['Data Science with Python',
                                                'AI and Machine Learning for Beginners',
                                                'Machine Learning with R',
                                                'Deep Learning with TensorFlow'],
                          'Marketing': ['Digital Marketing Masterclass',
                                        'Social Media Marketing Strategy'],
                          'Cybersecurity': ['Cybersecurity Fundamentals',
                                            'Ethical Hacking for Beginners'],
                          'Business and Management': ['Project Management Professional (PMP)® Certification Prep',
                                                      'Agile Project Management'],
                          'Writing and Literature': ['Creative Writing Workshop',
                                                     'Advanced Creative Writing'],
                          'Design': ['Graphic Design Essentials', 'UI/UX Design Fundamentals']}


def identify_category_and_courses(user_message):
    delimiter = "####"

    system_message = f"""
    You will be provided with customer service queries. \
    The customer service query will be enclosed in
    the pair of {delimiter}.

    Decide if the query is relevant to any specific courses
    in the Python dictionary below, in which each key is a `category`
    and the value is a list of `course_name`.

    If there are any relevant course(s) found, output the pair(s) of a) the `course_name` of the relevant courses and b) the associated `category` into a
    list of dictionary objects, where each item in the list is a relevant course
    and each course is a dictionary that contains two keys:
    1) category
    2) course_name

    {category_n_course_name}

    If no relevant courses are found, output an empty list.

    Ensure your response contains only the list of dictionary objects or an empty list, \
    without any enclosing tags or delimiters.
    """

    messages =  [
        {'role':'system',
         'content': system_message},
        {'role':'user',
         'content': f"{delimiter}{user_message}{delimiter}"},
    ]
    category_and_product_response_str = llm.get_completion_by_messages(messages)
    category_and_product_response_str = category_and_product_response_str.replace("'", "\"")
    category_and_product_response = json.loads(category_and_product_response_str)
    return category_and_product_response
    

In [None]:
# Testing the function to make sure it works
# This part should not be included as part of the Python script.

user_query = "I'm interested in learning about artificial intelligence."
result = identify_category_and_courses(user_query)
print(result) 
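
The `replace("'", "\"")` step in the function above assumes the model never returns an apostrophe inside a value; a name containing one would produce invalid JSON. As a hedged alternative that is not part of the Week 3 code, the sketch below first tries `json.loads` and then falls back to `ast.literal_eval`, which can read Python-style single-quoted lists. The function name `parse_course_list` and the course name in the sample are made up for illustration.

```python
import ast
import json


def parse_course_list(response_str):
    """Parse the LLM output into a list of dicts; return [] if it cannot be parsed."""
    text = response_str.strip()
    try:
        parsed = json.loads(text)  # valid JSON (double-quoted strings)
    except json.JSONDecodeError:
        try:
            parsed = ast.literal_eval(text)  # Python literal (single quotes allowed)
        except (ValueError, SyntaxError):
            return []
    # Guard against outputs that parse but are not a list
    return parsed if isinstance(parsed, list) else []


# Hypothetical model output that mixes quote styles and contains an apostrophe
sample = "[{'category': 'Design', 'course_name': \"Designer's Toolkit\"}]"
print(parse_course_list(sample))
print(parse_course_list("Sorry, I could not find anything."))
```

Returning an empty list on failure keeps the rest of the pipeline running, at the cost of hiding malformed responses; logging the raw string is a reasonable addition if that matters.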

## Function #2: `get_course_details`

In [None]:
import json

filepath = './data/courses-full.json'
# filepath = 'courses-full.json'

with open(filepath, 'r', encoding='utf-8') as file:
    json_string = file.read()
    dict_of_courses = json.loads(json_string)


def get_course_details(list_of_category_n_course: list[dict]):
    course_names_list = []
    for x in list_of_category_n_course:
        course_names_list.append(x.get('course_name')) # x["course_name"]

    list_of_course_details = []
    for course_name in course_names_list:
        list_of_course_details.append(dict_of_courses.get(course_name))
    return list_of_course_details

In [None]:
# Testing the function to make sure it works
# This part should not be included as part of the Python script.

# This is a sample input to test the function
sample_input = [{'category': 'Programming and Development','course_name': 'Web Development Bootcamp'},
                {'category': 'Data Science & AI', 'course_name': 'Data Science with Python'},
                {'category': 'Data Science & AI', 'course_name': 'AI and Machine Learning for Beginners'}]


product_details = get_course_details(sample_input)
product_details
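
Note that `dict.get` returns `None` when a course name is not a key in `dict_of_courses`, so the returned list can contain `None` entries if the LLM outputs a name that does not match exactly. The sketch below shows one possible way to skip such entries. The function name and the tiny catalogue are made up for illustration and do not reflect the real contents of `courses-full.json`.

```python
def get_known_course_details(list_of_category_n_course, courses):
    """Look up each `course_name` in `courses`, skipping names that are not found."""
    details = []
    for item in list_of_category_n_course:
        course = courses.get(item.get("course_name"))
        if course is not None:
            details.append(course)
    return details


# Hypothetical catalogue, standing in for the contents of courses-full.json
sample_courses = {"Data Science with Python": {"price": 100}}
sample_input = [
    {"category": "Data Science & AI", "course_name": "Data Science with Python"},
    {"category": "Data Science & AI", "course_name": "A Course That Does Not Exist"},
]
print(get_known_course_details(sample_input, sample_courses))
```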

## Function #3: `generate_response_based_on_course_details`

In [None]:
def generate_response_based_on_course_details(user_message, product_details):
    delimiter = "####"

    system_message = f"""
    Follow these steps to answer the customer queries.
    The customer query will be delimited with a pair {delimiter}.

    Step 1:{delimiter} If the user is asking about course, \
    understand the relevant course(s) from the following list.
    All available courses shown in the json data below:
    {product_details}

    Step 2:{delimiter} Use the information about the course to \
    generate the answer for the customer's query.
    You must only rely on the facts or information in the course information.
    Your response should be as detailed as possible and \
    include information that is useful for the customer to better understand the course.

    Step 3:{delimiter} Answer the customer in a friendly tone.
    Make sure the statements are factually accurate.
    Your response should be comprehensive and informative to help \
    the customers to make their decision.
    Complete with details such as rating, pricing, and skills to be learnt.
    Use Neuro-Linguistic Programming to construct your response.

    Use the following format:
    Step 1:{delimiter} <step 1 reasoning>
    Step 2:{delimiter} <step 2 reasoning>
    Step 3:{delimiter} <step 3 response to customer>

    Make sure to include {delimiter} to separate every step.
    """

    messages =  [
        {'role':'system',
         'content': system_message},
        {'role':'user',
         'content': f"{delimiter}{user_message}{delimiter}"},
    ]

    response_to_customer = llm.get_completion_by_messages(messages)
    response_to_customer = response_to_customer.split(delimiter)[-1]
    return response_to_customer

In [None]:
# Testing the function to make sure it works
# This part should not be included as part of the Python script.

user_query = "Do you have any coding or data related courses that are under $1000?"

response = generate_response_based_on_course_details(user_query, product_details)
print(response)
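
To see what `response_to_customer.split(delimiter)[-1]` does, the sketch below applies it to a made-up response that follows the requested `Step N:####` format. Only the text after the last delimiter is kept, which is the Step 3 reply. The sample text is invented for illustration and is not actual model output; the extra `.strip()` is our addition to trim the leading space.

```python
delimiter = "####"

# Invented example of a response in the format requested by the system message
raw_response = (
    "Step 1:#### The customer is asking about data courses.\n"
    "Step 2:#### Two courses match the query.\n"
    "Step 3:#### We have two courses that may suit you."
)

# Keep only the text after the last delimiter, then trim surrounding whitespace
final_reply = raw_response.split(delimiter)[-1].strip()
print(final_reply)

# If the model omits the delimiter, split() returns the whole text unchanged
print("No delimiter here".split(delimiter)[-1])
```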

# Task 3: Creating the main function `process_user_message` that calls all the functions above

- Now that we have all the functions in place, we will combine the functions into a main function `process_user_message` in the same script `customer_query_handler.py`.

- This main function will call the functions we have created in the previous task to process the user query.
  - It acts like a pipeline that combines the functions that act as the `steps` in the pipeline.
  - The user query will be passed into this main function and the function will return the response to the user query.

- This function will be called in the main script `main.py`.


In [None]:
def process_user_message(user_input):
    # Process 1: Identify the relevant categories and courses
    category_n_course_name = identify_category_and_courses(user_input)
    print("category_n_course_name : ", category_n_course_name)

    # Process 2: Get the Course Details
    course_details = get_course_details(category_n_course_name)

    # Process 3: Generate Response based on Course Details
    reply = generate_response_based_on_course_details(user_input, course_details)

    return reply
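
Because each step is an ordinary function, the flow of data through the pipeline can be checked without calling the LLM. The sketch below passes the three steps in as arguments and uses stand-in functions that return fixed values. All names here (`run_pipeline` and the `fake_*` functions) are hypothetical and are not part of `customer_query_handler.py`.

```python
def run_pipeline(user_input, identify, get_details, respond):
    """Run the three steps in order, passing each result to the next step."""
    courses = identify(user_input)        # Step 1: relevant categories and courses
    details = get_details(courses)        # Step 2: details for those courses
    return respond(user_input, details)   # Step 3: reply for the customer


# Stand-in steps that return fixed values instead of calling the LLM
def fake_identify(user_input):
    return [{"category": "Design", "course_name": "Graphic Design Essentials"}]


def fake_get_details(courses):
    return [{"name": c["course_name"]} for c in courses]


def fake_respond(user_input, details):
    return f"Found {len(details)} course(s) for: {user_input}"


print(run_pipeline("Any design courses?", fake_identify, fake_get_details, fake_respond))
```

Swapping the stand-ins for the real functions gives the same flow as `process_user_message`.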

---
✅ **CheckPoint**

<details>
    <summary><font size="4" color="darkgreen"><b>See the Reference Solution (👆🏼 Click to expand)</b></font></summary>

This is one way you can organize your script:

------------------------------ START OF SCRIPT ------------------------------

```Python
import json
from helper_functions import llm

category_n_course_name = {'Programming and Development': ['Web Development Bootcamp',
                                                          'Introduction to Cloud Computing',
                                                          'Advanced Web Development',
                                                          'Cloud Architecture Design'],
                          'Data Science & AI': ['Data Science with Python',
                                                'AI and Machine Learning for Beginners',
                                                'Machine Learning with R',
                                                'Deep Learning with TensorFlow'],
                          'Marketing': ['Digital Marketing Masterclass',
                                        'Social Media Marketing Strategy'],
                          'Cybersecurity': ['Cybersecurity Fundamentals',
                                            'Ethical Hacking for Beginners'],
                          'Business and Management': ['Project Management Professional (PMP)® Certification Prep',
                                                      'Agile Project Management'],
                          'Writing and Literature': ['Creative Writing Workshop',
                                                     'Advanced Creative Writing'],
                          'Design': ['Graphic Design Essentials', 'UI/UX Design Fundamentals']}

# Load the JSON file
filepath = './data/courses-full.json'
with open(filepath, 'r', encoding='utf-8') as file:
    json_string = file.read()
    dict_of_courses = json.loads(json_string)


def identify_category_and_courses(user_message):
    delimiter = "####"

    system_message = f"""
    You will be provided with customer service queries. \
    The customer service query will be enclosed in
    the pair of {delimiter}.

    Decide if the query is relevant to any specific courses
    in the Python dictionary below, in which each key is a `category`
    and the value is a list of `course_name`.

    If there are any relevant course(s) found, output the pair(s) of a) the `course_name` of the relevant courses and b) the associated `category` into a
    list of dictionary objects, where each item in the list is a relevant course
    and each course is a dictionary that contains two keys:
    1) category
    2) course_name

    {category_n_course_name}

    If no relevant courses are found, output an empty list.

    Ensure your response contains only the list of dictionary objects or an empty list, \
    without any enclosing tags or delimiters.
    """

    messages =  [
        {'role':'system',
         'content': system_message},
        {'role':'user',
         'content': f"{delimiter}{user_message}{delimiter}"},
    ]
    category_and_product_response_str = llm.get_completion_by_messages(messages)
    category_and_product_response_str = category_and_product_response_str.replace("'", "\"")
    category_and_product_response = json.loads(category_and_product_response_str)
    return category_and_product_response
    

def get_course_details(list_of_relevant_category_n_course: list[dict]):
    course_names_list = []
    for x in list_of_relevant_category_n_course:
        course_names_list.append(x.get('course_name')) # x["course_name"]

    list_of_course_details = []
    for course_name in course_names_list:
        list_of_course_details.append(dict_of_courses.get(course_name))
    return list_of_course_details


def generate_response_based_on_course_details(user_message, product_details):
    delimiter = "####"

    system_message = f"""
    Follow these steps to answer the customer queries.
    The customer query will be delimited with a pair {delimiter}.

    Step 1:{delimiter} If the user is asking about course, \
    understand the relevant course(s) from the following list.
    All available courses shown in the json data below:
    {product_details}

    Step 2:{delimiter} Use the information about the course to \
    generate the answer for the customer's query.
    You must only rely on the facts or information in the course information.
    Your response should be as detailed as possible and \
    include information that is useful for the customer to better understand the course.

    Step 3:{delimiter} Answer the customer in a friendly tone.
    Make sure the statements are factually accurate.
    Your response should be comprehensive and informative to help \
    the customers to make their decision.
    Complete with details such as rating, pricing, and skills to be learnt.
    Use Neuro-Linguistic Programming to construct your response.

    Use the following format:
    Step 1:{delimiter} <step 1 reasoning>
    Step 2:{delimiter} <step 2 reasoning>
    Step 3:{delimiter} <step 3 response to customer>

    Make sure to include {delimiter} to separate every step.
    """

    messages =  [
        {'role':'system',
         'content': system_message},
        {'role':'user',
         'content': f"{delimiter}{user_message}{delimiter}"},
    ]

    response_to_customer = llm.get_completion_by_messages(messages)
    response_to_customer = response_to_customer.split(delimiter)[-1]
    return response_to_customer


def process_user_message(user_input):
    # Process 1: Identify the relevant categories and courses
    category_n_course_name = identify_category_and_courses(user_input)
    print("category_n_course_name : ", category_n_course_name)

    # Process 2: Get the Course Details
    course_details = get_course_details(category_n_course_name)

    # Process 3: Generate Response based on Course Details
    reply = generate_response_based_on_course_details(user_input, course_details)

    return reply
```

------------------------------ END OF SCRIPT ------------------------------
</details>

---
---

# Task 4: Build a Chat Interface for the App

- Modify your `main.py` script to:

    1. Import the `process_user_message` function from the `customer_query_handler` script.

    2. Call the `process_user_message` function in the Streamlit app to process the user query.

---
✅ **CheckPoint**

<details>
    <summary><font size="4" color="darkgreen"><b>See the Reference Solution (👆🏼 Click to expand)</b></font></summary>

This is one way you can organize your script:

------------------------------ START OF SCRIPT ------------------------------

```Python
# Set up and run this Streamlit App
import streamlit as st
# from helper_functions import llm # <--- Not needed anymore. The helper function is now directly called by `customer_query_handler` 🆕
from logics.customer_query_handler import process_user_message

# region <--------- Streamlit App Configuration --------->
st.set_page_config(
    layout="centered",
    page_title="My Streamlit App"
)
# endregion <--------- Streamlit App Configuration --------->

st.title("Streamlit App")

form = st.form(key="form")
form.subheader("Prompt")

user_prompt = form.text_area("Enter your prompt here", height=200)

if form.form_submit_button("Submit"):
    st.toast(f"User Input Submitted - {user_prompt}")
    response = process_user_message(user_prompt) #<--- This calls the `process_user_message` function that we have created 🆕
    st.write(response)
    print(f"User Input is {user_prompt}")
```

------------------------------ END OF SCRIPT ------------------------------
</details>