# Semantic Kernel OpenAI Assistant Agent File Search

## Azure Resources Needed
1. Azure OpenAI
    - Deploy GPT-4o

## Prepare the files

In [1]:
import os

file_directory = "Data/nasabooks"

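# List all files in the directory
try:
    filenames = os.listdir(file_directory)
    print(filenames)
except FileNotFoundError:
    # Fall back to an empty list so later cells that use `filenames` do not raise NameError
    filenames = []
    print(f"Directory '{file_directory}' not found.")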

# Get the full path of a file
def get_filepath_for_filename(filename: str) -> str:
    base_directory = file_directory
    return os.path.join(base_directory, filename)



['page-69.pdf', 'page-41.pdf', 'page-13.pdf', 'page-33.pdf', 'page-59.pdf', 'page-9.pdf', 'page-17.pdf', 'page-23.pdf', 'page-61.pdf', 'page-11.pdf', 'page-55.pdf', 'page-35.pdf', 'page-25.pdf', 'page-57.pdf', 'page-45.pdf', 'page-19.pdf', 'page-43.pdf', 'page-21.pdf', 'page-65.pdf', 'page-7.pdf', 'page-8.pdf', 'page-49.pdf', 'page-63.pdf', 'page-51.pdf', 'page-31.pdf', 'page-73.pdf', 'page-27.pdf', 'page-67.pdf', 'page-15.pdf', 'page-71.pdf', 'page-39.pdf']


## Reformat citations with the proper filenames

In [2]:
from semantic_kernel.contents.annotation_content import AnnotationContent

async def reformat_citations(agent, response):
    # Extract the annotations
    annotations = [item for item in response.items if isinstance(item, AnnotationContent)]
    
    # Original response
    paragraph = response.content
    
    # Dictionary to store key-value pairs of text and filename
    text_filename_pairs = {}

    # Iterate over the annotations and extract the relevant information
    for annotation in annotations:
        file_id = annotation.file_id
        text = annotation.quote
        # Retrieve the filename from the file_id
        cited_file = await agent.client.files.retrieve(file_id)
        filename = cited_file.filename

        if text not in text_filename_pairs:
            text_filename_pairs[text] = []
        text_filename_pairs[text].append(filename)

    # Replace the citation texts with their corresponding filenames prefixed with " Source: "
    for text, filenames in text_filename_pairs.items():
        sources = " Source: " + ", ".join(filenames)
        paragraph = paragraph.replace(text, sources)

    return paragraph
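
The replacement step in `reformat_citations` can be tried on its own, without an agent or any uploaded files. The sketch below is a minimal illustration, assuming the annotation text is the raw citation marker that appears in the message content (such as `【8:0†source】`). The helper name `replace_citation_markers` and the hard-coded mapping are hypothetical and stand in for the lookup that the function above performs through `agent.client.files.retrieve`.

```python
def replace_citation_markers(paragraph: str, marker_to_filenames: dict[str, list[str]]) -> str:
    """Swap each citation marker for ' Source: <file1>, <file2>, ...'."""
    for marker, names in marker_to_filenames.items():
        sources = " Source: " + ", ".join(names)
        paragraph = paragraph.replace(marker, sources)
    return paragraph


# Hypothetical input: a sentence ending in a raw citation marker
raw = "They form the Lower Amazon River【8:0†source】."
cleaned = replace_citation_markers(raw, {"【8:0†source】": ["page-61.pdf"]})
print(cleaned)  # They form the Lower Amazon River Source: page-61.pdf.
```

Because `str.replace` substitutes every occurrence of the marker, a marker that appears more than once in the paragraph is rewritten at each position.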

## Step 1-2: Create an Agent and Thread

In [3]:
from semantic_kernel.agents.open_ai.azure_assistant_agent import AzureAssistantAgent
from semantic_kernel.contents.chat_message_content import ChatMessageContent
from semantic_kernel.contents.utils.author_role import AuthorRole
from semantic_kernel.kernel import Kernel

# Step 1: Create an assistant agent
agent = await AzureAssistantAgent.create(
        kernel=Kernel(),
        service_id="agent",
        name="SK_OpenAI_Assistant_Agent_File_Search",
        instructions="""
            The document store contains pages from a NASA book.
            Always analyze the document store to provide an answer to the user's question.
            Never rely on your knowledge of information not included in the document store.
            Always format response using markdown.
            """,
        enable_file_search=True,
        vector_store_filenames=[get_filepath_for_filename(filename) for filename in filenames],
    )

# Step 2: Create a thread
thread_id = await agent.create_thread()

## Step 3-5: Helper Function
3. Add a message to the thread
4. Run the Assistant
5. Display the Assistant's Response

In [4]:
async def run_agent(user_question):
    # STEP 3: Add a user question to the thread
    await agent.add_chat_message(
            thread_id=thread_id, 
            message=ChatMessageContent(role=AuthorRole.USER, content=user_question)
    )

    # STEP 4: Invoke the agent to get a response
    async for response in agent.invoke(thread_id=thread_id):
        annotations = [item for item in response.items if isinstance(item, AnnotationContent)]
        # STEP 5: Print the Assistant response
        if not annotations:
            # No citations to rewrite, so print the text as-is
            print(f"{response.content}", end="", flush=True)
        else:
            print(f"{await reformat_citations(agent, response)}", end="", flush=True)

In [5]:
user_question = "How did the wide floodplains in Queensland originate?"
await run_agent(user_question)

The wide floodplains in Queensland, also known as the Channel Country, are unique and believed to have been formed due to the extreme variation in water and sediment discharges from the rivers. These floodplains experience periods of no rainfall, leading to the rivers effectively being non-existent. During years of modest rainfall, water flows in the main channels and sometimes spills over into billabongs. Every few decades, tropical storms lead to extremely high discharges of water, inundating the entire width of the floodplain. Such occasions transform the floodplain into a series of brown and green water surfaces with only treetops marking the islands Source: page-49.pdf.

## Appending Messages to the Thread

In [6]:
user_question = "What forms the Lower Amazon River?"
await run_agent(user_question)

The Lower Amazon River is formed by the confluence of the Rio Solimões and the Rio Negro. The Rio Solimões, which is rich with sediment, flows down from the Andes Mountains and carries café-au-lait-colored water. The Rio Negro, with waters nearly sediment-free and colored by decayed leaf and plant matter, flows from the Colombian hills and jungles. When these two rivers meet east of Manaus, Brazil, they flow side by side within the same channel for several kilometers. Eventually, turbulent eddies mix these two streams, forming the Lower Amazon River Source: page-61.pdf.

## Display Chat History

In [7]:
async for message in agent.get_thread_messages(thread_id):
    print(f"{message.role} : {message.content}")

AuthorRole.ASSISTANT : The Lower Amazon River is formed by the confluence of the Rio Solimões and the Rio Negro. The Rio Solimões, which is rich with sediment, flows down from the Andes Mountains and carries café-au-lait-colored water. The Rio Negro, with waters nearly sediment-free and colored by decayed leaf and plant matter, flows from the Colombian hills and jungles. When these two rivers meet east of Manaus, Brazil, they flow side by side within the same channel for several kilometers. Eventually, turbulent eddies mix these two streams, forming the Lower Amazon River【8:0†source】.
AuthorRole.USER : What forms the Lower Amazon River?
AuthorRole.ASSISTANT : The wide floodplains in Queensland, also known as the Channel Country, are unique and believed to have been formed due to the extreme variation in water and sediment discharges from the rivers. These floodplains experience periods of no rainfall, leading to the rivers effectively being non-existent. During years of modest rainfall
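
The output above lists the most recent message first. If a chronological transcript is preferred, one option is to collect the async iterator into a list and reverse it. The sketch below is only an illustration under that assumption: `collect_chronological` and `fake_thread_messages` are hypothetical names, and the fake generator stands in for `agent.get_thread_messages(thread_id)` so the example runs without an agent.

```python
import asyncio


async def fake_thread_messages():
    """Hypothetical stand-in that yields messages newest-first, like the output above."""
    for text in ["second answer", "second question", "first answer", "first question"]:
        yield text


async def collect_chronological(messages):
    """Drain an async iterator and return its items oldest-first."""
    collected = [message async for message in messages]
    collected.reverse()
    return collected


# Outside a notebook, drive the coroutine with asyncio.run;
# inside a notebook cell, use `await collect_chronological(...)` directly.
if __name__ == "__main__":
    print(asyncio.run(collect_chronological(fake_thread_messages())))
```

In the notebook, the same helper could be applied as `await collect_chronological(agent.get_thread_messages(thread_id))`, with each item printed using its `role` and `content`.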

## Deleting Files, Thread, Agent

In [8]:
if agent is not None:
    # Delete each uploaded file before removing the thread and the agent
    for file_id in agent.file_search_file_ids:
        await agent.delete_file(file_id)
    await agent.delete_thread(thread_id)
    await agent.delete()
