In [1]:
from pathlib import Path
import os

from openai import OpenAI
from dotenv import dotenv_values

# Load the API key from the .env file
env_path = Path('../.env')
config = dotenv_values(env_path)
client = OpenAI(api_key=config['OPENAI_API_KEY'])

Create a description and instructions for the assistant. The description should be a brief summary of what the bot does and must be at most 512 characters. It is only used for display in the OpenAI portal, so it is not too important.

The 'instructions', however, are critical. This is where 'prompt engineering' comes in: finding the best instructions to give the bot so that it behaves the way you want. Because the instructions text is easier to format and edit in its own file, it is stored separately in 'Data/tutor_prompt.md' and loaded from there.

In [2]:
description = "A Chemistry Tutor bot specializing in Quantum Chemistry, utilizing the textbook QUANTUM CHEMISTRY SECOND EDITION by Donald A. McQuarrie. Bot will assist students after they have completed exams questions by providing feedback and guidance on where their answers could be improved."

bot_name = "Chemistry Tutor Toy_1"

print(f"Description length: {len(description)}.")
assert len(description) <= 512, "Description must be at most 512 characters."

Description length: 280.


Now we read in the instructions we want to use for our assistant from a text file.

In [3]:
# Path to the prompt file
prompt_path = Path('../Data/tutor_prompt.md')

# Read the prompt from the file
with open(prompt_path, 'r') as file:
    instructions = file.read()
    print(f"Instructions length: {len(instructions)}.")

Instructions length: 9549.


We now need to create an OpenAI "Assistant". The API call returns an Assistant object that contains a unique id for that assistant. We need this id whenever we send a POST/GET request about this assistant to the OpenAI API endpoints.

The 'temperature' parameter controls the randomness of the generated response: 0 is virtually deterministic and not very creative, whereas the maximum of 2 can be very "creative".
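To make the effect of temperature more concrete, here is a minimal sketch of the standard "softmax with temperature" idea used when sampling from language models. It is an illustration only, not OpenAI's actual implementation: the function name `softmax_with_temperature` and the example scores are made up for this notebook, and treating a temperature of 0 as "always pick the top score" is an assumption made to keep the sketch simple.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Assumption: treat 0 as greedy decoding, all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability sits on the highest-scoring candidate, while a higher temperature spreads it across the alternatives, which is why a tutor bot that should answer consistently is given a low value.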

You also have to decide which model to use. At the time of writing (May 2024), "gpt-4-turbo" and the newer "gpt-4o" (used in the cell below) are OpenAI's most capable models, but they are also the most [expensive](https://openai.com/api/pricing/). For very basic bots it may be worth using "gpt-3.5-turbo", which is roughly 20 times cheaper than "gpt-4-turbo".

In [4]:
# Set the model and temperature for the tutor agent
model = "gpt-4o"
temperature = 0.1

# Create the tutor agent using the beta OpenAI Assistants API
tutor_agent = client.beta.assistants.create(
    model=model,
    name=bot_name,
    description=description,
    instructions=instructions,
    tools=[{"type": "file_search"}],
    temperature=temperature,
)

Now we need to add knowledge to our assistant in the form of files that the assistant can search using its "file_search" tool. The OpenAI API automatically parses and chunks the files, then stores a representation of the knowledge base in a [vector store](https://platform.openai.com/docs/assistants/tools/file-search/vector-stores).
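OpenAI performs the chunking on its servers, so nothing like the following needs to be written for this notebook. Purely as an illustration of what "chunking" means, here is a toy sketch that splits text into overlapping windows of words. The helper name `chunk_words`, the use of words instead of tokens, and the tiny window sizes are all assumptions chosen for readability.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (a stand-in for token-based chunking)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

sample = (
    "The Schrodinger equation describes how the quantum state "
    "of a physical system changes over time"
)
for i, chunk in enumerate(chunk_words(sample)):
    print(i, chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps a search over the chunks find it.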

In [5]:
# Generate Paths to any files in the KnowledgeBase directory
base_path = Path('../KnowledgeBase')
files = sorted(os.listdir(base_path))
# Keep regular files only; sub-directories cannot be opened for upload
file_paths = [Path(base_path, file) for file in files if Path(base_path, file).is_file()]
print(f"Found {len(file_paths)} Knowledge Files.")

Found 1 Knowledge Files.


In [6]:
# Create a vector store named after the bot (bot_name + "_Vector_Store")
vector_store = client.beta.vector_stores.create(name=bot_name + "_Vector_Store")

# Open the files for upload to OpenAI
file_streams = [open(path, "rb") for path in file_paths]

# Use the upload-and-poll SDK helper to upload the files, add them to the
# vector store, and wait for the operation to complete
try:
    file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id, files=file_streams
    )
finally:
    # Close the file handles whether or not the upload succeeded
    for stream in file_streams:
        stream.close()

# Show the status of the file batch and the number of files added to the vector store
print(file_batch.status)
print(file_batch.file_counts)

completed
FileCounts(cancelled=0, completed=1, failed=0, in_progress=0, total=1)


Now we have to update our assistant to associate it with the vector store, which enables the assistant to use the [file_search](https://platform.openai.com/docs/assistants/tools/file-search) tool. The assistant decides for itself, based on its instructions and any messages sent to it, whether to use "file_search" to augment its context with information retrieved from its knowledge base (the vector store).

In [7]:
tutor_agent = client.beta.assistants.update(
    assistant_id=tutor_agent.id,
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

Now save the OpenAI ids into the .env file so they won't leak accidentally. They will be read back from the .env file and used by our `Use_tutor_bot.ipynb` notebook.

In [8]:
# Add the tutor agent id and the vector store id to the .env file
config['TUTOR_AGENT_ID'] = tutor_agent.id
config['VECTOR_STORE_ID'] = vector_store.id

with open(env_path, 'w') as file:
    for key, value in config.items():
        file.write(f"{key}={value}\n")
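Note that the cell above rewrites the whole .env file from the parsed values, so any comments or quoting in the original file are not preserved. If that matters, one option is to update only the keys that change. The sketch below uses just the standard library; the helper name `upsert_env_vars` is hypothetical, the ids in the demo are placeholders, and it assumes simple `KEY=value` lines (no `export` prefixes or multi-line values). The python-dotenv package also provides a `set_key` function that could be used instead.

```python
import tempfile
from pathlib import Path

def upsert_env_vars(env_file, updates):
    """Set KEY=value pairs in a .env file, keeping unrelated lines and comments."""
    env_file = Path(env_file)
    lines = env_file.read_text(encoding="utf-8").splitlines() if env_file.exists() else []
    remaining = dict(updates)
    result = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if "=" in line and not line.lstrip().startswith("#") and key in remaining:
            # Replace the existing entry in place.
            result.append(f"{key}={remaining.pop(key)}")
        else:
            result.append(line)
    # Append any keys that were not already present.
    result.extend(f"{key}={value}" for key, value in remaining.items())
    env_file.write_text("\n".join(result) + "\n", encoding="utf-8")

# Demo on a throwaway file with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / ".env"
    demo.write_text("# secrets\nOPENAI_API_KEY=sk-placeholder\n", encoding="utf-8")
    upsert_env_vars(demo, {"TUTOR_AGENT_ID": "asst_placeholder"})
    print(demo.read_text(encoding="utf-8"))
```

In this notebook the call would look like `upsert_env_vars(env_path, {"TUTOR_AGENT_ID": tutor_agent.id, "VECTOR_STORE_ID": vector_store.id})`.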