#### Use Assistants API Browser tool to retrieve and summarize web search results with annotations   

**[Note: The Assistants Browser tool is currently in private alpha. This cookbook is to be published when the tool is offered as a public beta.]**

OpenAI's Assistants API enables developers to integrate AI-powered assistants into their applications, allowing them to perform a variety of tasks such as answering questions, generating text, translating languages, and more. Additionally, the Assistants API provides access to built-in tools like File Search and Code Interpreter.

This cookbook focuses on the new Browser tool, which allows developers to search, retrieve, and summarize web content based on an input prompt. The Browser tool enhances the Assistants API by incorporating more recent data from web searches into its output.

### Step 1: Set up an assistant with the Browser tool
In the code cell below, we set up an assistant to use the Browser tool. You can learn more about setting up and invoking assistants in the [Assistants overview](https://platform.openai.com/docs/assistants/overview). In summary, you need to create an assistant, create a thread, add a message to the thread, and start a run on that thread with the assistant's ID.

In [1]:
from openai import OpenAI

# Model for assistant 
model = "gpt-4o"

# Definition of the browser tool 
BROWSER_TOOL = [
    {
        "type": "browser",
        "browser": {
            "locale": "en-US"
        }
    }
]

# System prompts 
system_prompt = "Conduct a thorough search on the topic provided by the user and output your findings. Provide more detailed text of each citation."

# Create a client 
client = OpenAI()

# Create an assistant 
assistant = client.beta.assistants.create(
            name="INTERNET SEARCH ASSISTANT",
            instructions=system_prompt,
            model=model,
            tools=BROWSER_TOOL
        )

### Step 2: Invoke the assistant with a user message
In the step below, we create a thread, add a user message to it, and start a run that executes the assistant on that thread (without streaming).

In [2]:
user_message = "Housing market in the US in the next two years 2025 and 2026"

# Create a thread of conversation 
thread = client.beta.threads.create()

# Create a message on the thread 
messages = client.beta.threads.messages.create(
            thread_id=thread.id,
            role="user",
            content=user_message
        )

# Run the assistant with given thread 
run = client.beta.threads.runs.create_and_poll(
            thread_id=thread.id,
            assistant_id=assistant.id,
            temperature=1,
            top_p=1
        )
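
`create_and_poll` returns once the run reaches a terminal state, which is not necessarily `completed`. The following is a minimal sketch of a guard you could call before reading messages. `ensure_run_completed` is a hypothetical helper name, and the demo uses `SimpleNamespace` stand-ins rather than a real run object so that it executes without an API call.

```python
from types import SimpleNamespace


def ensure_run_completed(run):
    """Return the run if it completed; raise RuntimeError otherwise.

    Hypothetical helper: it only reads the `status` and `last_error`
    attributes of the object it is given.
    """
    if run.status != "completed":
        last_error = getattr(run, "last_error", None)
        detail = f" ({last_error})" if last_error else ""
        raise RuntimeError(f"Run ended with status '{run.status}'{detail}")
    return run


# Stand-in objects for illustration only; in the notebook you would pass `run`.
ok_run = SimpleNamespace(status="completed", last_error=None)
failed_run = SimpleNamespace(status="failed", last_error="rate_limit_exceeded")

ensure_run_completed(ok_run)
try:
    ensure_run_completed(failed_run)
except RuntimeError as exc:
    print(exc)
```

In the notebook, `ensure_run_completed(run)` would sit right after the `create_and_poll` call, so that a failed or expired run surfaces as an error instead of as an empty message list.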

### Step 3: Retrieve the message text value and annotations 
The Browser tool returns a response from the LLM that summarizes a variety of sources on the topic, with annotations. You can retrieve the message and its annotations as outlined below.

In [3]:
# Get the assistant's reply: list the messages added after the user message, oldest first
message = client.beta.threads.messages.list(
            thread_id=thread.id,
            after=messages.id,
            order="asc"
        )

message_content = message.data[0].content[0].text.value 
print("----- Start of message content output -----")
print(message_content)
print("----- End of message content output -----")

----- Start of message content output -----
The U.S. housing market in the next two years, 2025 and 2026, is expected to experience various dynamics influenced by factors such as mortgage rates, inventory levels, and economic conditions. Here is a synthesis of expert predictions:

### Home Prices and Market Trends
1. **Moderate Price Growth**:
   - According to Goldman Sachs, home prices are projected to rise by approximately 3.8% in 2025 and 4.9% in 2026【8†source】【9†source】. This growth represents a recovery from the slower pace expected in 2024 but still reflects a tempered increase due to ongoing market corrections from previous highs during the pandemic.

2. **Sellers' Market Continues**:
   - The market is likely to remain a sellers' market, characterized by tight inventory and sustained demand. While there is some expectation of new home construction, it may not significantly alleviate the pressure on supply until 2027 or later【6†source】【7†source】.

### Inventory and New Construc
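
The retrieval code reads `content[0].text.value`, which assumes the first content block of the reply is a text block. If a reply could contain several blocks, a sketch like the one below collects every text block instead. `collect_text_blocks` is a hypothetical helper, and the message here is a `SimpleNamespace` stand-in, not real API output.

```python
from types import SimpleNamespace


def collect_text_blocks(message):
    """Join the `value` of every text block in a message's content list."""
    parts = []
    for block in message.content:
        # Skip non-text blocks such as images.
        if getattr(block, "type", None) == "text":
            parts.append(block.text.value)
    return "\n".join(parts)


# Stand-in message for illustration only (not real API output).
demo_message = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text=SimpleNamespace(value="First part.")),
        SimpleNamespace(type="image_file", image_file=SimpleNamespace(file_id="file-demo")),
        SimpleNamespace(type="text", text=SimpleNamespace(value="Second part.")),
    ]
)

print(collect_text_blocks(demo_message))
```

One caveat, stated as an assumption: annotation indices most likely refer to the text of their own block, so they should be applied to each block's `value` before the blocks are joined, not to the joined string.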

The citations to the URLs retrieved by the API are stored as a list of `FileCitationAnnotation` objects in the text's `annotations` field. Each object includes the annotation text as it appears in the `message_content` variable, such as `text='【7†source】'`; the position of that text (for example, `start_index=371` to `end_index=381`); and the best-matching URL and title for the source in `url_citation`. For example, the URL citation for one of the records in the list is:
`url_citation = {
    'url': 'https://www.noradarealestate.com/blog/goldman-sachs-housing-market-forecast/',
    'title': "Goldman Sachs' 5-Year Housing Forecast from 2024 to 2027"
}`

In [4]:
annotations = message.data[0].content[0].text.annotations

print(annotations)

[FileCitationAnnotation(end_index=425, file_citation=None, start_index=415, text='【8†source】', type='url_citation', url_citation={'url': 'https://finance.yahoo.com/news/four-years-housing-market-gridlock-183219157.html', 'title': 'Four years of housing market gridlock? Goldman Sachs issues U.S. home price predictions through 2026'}), FileCitationAnnotation(end_index=435, file_citation=None, start_index=425, text='【9†source】', type='url_citation', url_citation={'url': 'https://www.noradarealestate.com/blog/goldman-sachs-housing-market-forecast/', 'title': "Goldman Sachs' 5-Year Housing Forecast from 2024 to 2027"}), FileCitationAnnotation(end_index=913, file_citation=None, start_index=903, text='【6†source】', type='url_citation', url_citation={'url': 'https://realwealth.com/learn/housing-market-predictions/', 'title': '25+ Housing Market Predictions for the Next 5 Years [2024-2028]'}), FileCitationAnnotation(end_index=923, file_citation=None, start_index=913, text='【7†source】', type='url
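
In the output above, each marker spans exactly ten characters (`end_index - start_index == 10`, the length of `'【8†source】'`), which suggests the indices are character offsets into `message_content`. The sketch below checks that assumption before any replacement is done. `DemoAnnotation` and `find_misaligned` are hypothetical names, and the sample text is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DemoAnnotation:
    """Hypothetical stand-in exposing the fields shown in the output above."""
    text: str
    start_index: int
    end_index: int


def find_misaligned(message_content, annotations):
    """Return annotations whose text does not match the slice at their indices."""
    return [
        a for a in annotations
        if message_content[a.start_index:a.end_index] != a.text
    ]


# Made-up sample text with two back-to-back citation markers.
demo_text = "Prices may rise【8†source】【9†source】."
first = demo_text.index("【8†source】")
demo_annotations = [
    DemoAnnotation("【8†source】", first, first + 10),
    DemoAnnotation("【9†source】", first + 10, first + 20),
]

print(find_misaligned(demo_text, demo_annotations))  # prints []
```

An empty result means every annotation lines up with its marker, so slicing by `start_index` and `end_index` is safe. The real annotation objects expose the same three attributes, so `find_misaligned(message_content, annotations)` should work on them unchanged.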

### Step 4: Post-Processing

After retrieving the message containing the text and its annotations, you can process this information based on your application's requirements using the data in the `annotations` list. The two examples below illustrate this.

**Example 1: Adding Annotations to the Text**

The citation markers in the text returned by the Browser tool (stored in the `message_content` variable), such as `【8†source】`, do not include the actual URLs of the web pages.

The following `add_annotations_to_text` function takes a string `message_content` and a list of `annotations`. Each annotation identifies a span of the text, by start and end index, that should be replaced. The function replaces each span in `message_content` with a Markdown link built from the corresponding title and URL.

In [5]:
def add_annotations_to_text(message_content, annotations):
    """
    Replaces citation markers in message_content with Markdown links built from annotations.

    :param message_content: str, the original text with citation markers
    :param annotations: list, annotation objects with start_index, end_index, and url_citation
    :return: str, the modified message content with inline links
    """
    # Sort annotations by start_index in descending order so that replacing
    # later spans first leaves the indices of earlier spans valid
    annotations = sorted(annotations, key=lambda x: x.start_index, reverse=True)
    
    for annotation in annotations:
        start_index = annotation.start_index
        end_index = annotation.end_index
        url = annotation.url_citation['url']
        title = annotation.url_citation['title']
        
        # Construct the replacement text
        replacement_text = f'[{title}]({url})'
        
        # Replace the text in message_content
        message_content = message_content[:start_index] + replacement_text + message_content[end_index:]
    
    return message_content

# Invoke the function to update the annotations within the text 
annotated_message_content = add_annotations_to_text(message_content, annotations)

print("----- Start of message content output with inline annotations -----")
print(annotated_message_content)
print("----- End of message content output with inline annotations -----")

----- Start of message content output with inline annotations -----
The U.S. housing market in the next two years, 2025 and 2026, is expected to experience various dynamics influenced by factors such as mortgage rates, inventory levels, and economic conditions. Here is a synthesis of expert predictions:

### Home Prices and Market Trends
1. **Moderate Price Growth**:
   - According to Goldman Sachs, home prices are projected to rise by approximately 3.8% in 2025 and 4.9% in 2026[Four years of housing market gridlock? Goldman Sachs issues U.S. home price predictions through 2026](https://finance.yahoo.com/news/four-years-housing-market-gridlock-183219157.html)[Goldman Sachs' 5-Year Housing Forecast from 2024 to 2027](https://www.noradarealestate.com/blog/goldman-sachs-housing-market-forecast/). This growth represents a recovery from the slower pace expected in 2024 but still reflects a tempered increase due to ongoing market corrections from previous highs during the pandemic.

2. **Sel
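
Inline links with full titles can make the text hard to read when several sources are cited back to back. As an alternative, the sketch below replaces each marker with a short numbered reference and returns a separate reference list, reusing one number per URL. The name `to_numbered_references`, the `DemoAnnotation` stand-in, and the example URLs are assumptions for illustration and are not part of the API.

```python
from dataclasses import dataclass


@dataclass
class DemoAnnotation:
    """Hypothetical stand-in with the fields the function reads."""
    start_index: int
    end_index: int
    url_citation: dict


def to_numbered_references(message_content, annotations):
    """Replace citation markers with [n] and return (text, reference_lines)."""
    numbers = {}  # url -> reference number, in order of first appearance
    references = []
    for a in sorted(annotations, key=lambda x: x.start_index):
        url = a.url_citation["url"]
        if url not in numbers:
            numbers[url] = len(numbers) + 1
            references.append(f"[{numbers[url]}] {a.url_citation['title']} - {url}")
    # Replace from the end of the text so earlier indices stay valid.
    for a in sorted(annotations, key=lambda x: x.start_index, reverse=True):
        marker = f"[{numbers[a.url_citation['url']]}]"
        message_content = (
            message_content[:a.start_index] + marker + message_content[a.end_index:]
        )
    return message_content, references


# Made-up sample data: source A is cited twice, source B once.
m8, m9 = "【8†source】", "【9†source】"
demo_text = f"Prices may rise{m8}{m9}. Supply stays tight{m8}."
source_a = {"url": "https://example.com/a", "title": "Example A"}
source_b = {"url": "https://example.com/b", "title": "Example B"}
i1 = demo_text.index(m8)
i2 = demo_text.index(m9)
i3 = demo_text.index(m8, i1 + 1)
demo_annotations = [
    DemoAnnotation(i1, i1 + len(m8), source_a),
    DemoAnnotation(i2, i2 + len(m9), source_b),
    DemoAnnotation(i3, i3 + len(m8), source_a),
]

text, refs = to_numbered_references(demo_text, demo_annotations)
print(text)
print("\n".join(refs))
```

Numbering by URL rather than by marker keeps the reference list short when the same page is cited in several places.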

**Example 2: Retrieve and format the results as a reference JSON**

This example uses the text in `message_content` and the list of `annotations` to generate a JSON list of the web pages retrieved during the search, with each entry in the format: Title, Summary, and URL.

In [6]:
import json

def chunk_document(message_content, annotations):
    """
    Chunks the document into a JSON format based on annotations.

    :param message_content: str, the original text with annotations
    :param annotations: list, list of FileCitationAnnotation objects with replacement information
    :return: str, the chunked document serialized as a JSON list
    """
    # Sort annotations by start_index
    annotations = sorted(annotations, key=lambda x: x.start_index)
    
    chunks = []
    previous_end_index = 0

    for annotation in annotations:
        start_index = annotation.start_index
        end_index = annotation.end_index
        url = annotation.url_citation['url']
        title = annotation.url_citation['title']
        
        summary = message_content[previous_end_index:start_index].strip()
        
        # Post-processing for the annotations to get to the right format
        summary = summary.replace('\n','').replace('#','').replace("*",'').lstrip(".").lstrip("-").strip()    

        # Skip citations that have no preceding summary text
        if summary:
            # Create a chunk
            chunk = {
                "title": title,
                "Summary": summary,
                "URL": url
            }
            chunks.append(chunk)
        previous_end_index = end_index
    
        
    return json.dumps(chunks,  indent=4)
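
One consequence of the empty-summary check in `chunk_document`: when two citation markers are back to back, as with `【8†source】【9†source】` in the sample response, the second marker has no preceding text and its source is skipped. If you would rather keep every source, the sketch below attaches back-to-back citations to the same summary. The function name, the `sources` key, and the demo data are assumptions for illustration; the text cleanup is also simpler than in `chunk_document`.

```python
import json
from dataclasses import dataclass


@dataclass
class DemoAnnotation:
    """Hypothetical stand-in with the fields the function reads."""
    start_index: int
    end_index: int
    url_citation: dict


def chunk_with_grouped_sources(message_content, annotations):
    """Like chunk_document, but back-to-back citations share one summary."""
    chunks = []
    previous_end_index = 0
    for a in sorted(annotations, key=lambda x: x.start_index):
        summary = message_content[previous_end_index:a.start_index]
        summary = summary.strip().lstrip(".").strip()
        source = {"title": a.url_citation["title"], "URL": a.url_citation["url"]}
        if summary:
            chunks.append({"Summary": summary, "sources": [source]})
        elif chunks:
            # No text since the previous marker: attach to the previous chunk.
            chunks[-1]["sources"].append(source)
        # A marker at the very start of the text has no summary and is dropped.
        previous_end_index = a.end_index
    return json.dumps(chunks, indent=4)


# Made-up sample data with two back-to-back markers followed by a third.
demo_text = "Prices may rise【8†source】【9†source】. Supply stays tight【6†source】."
demo_sources = [
    ("【8†source】", {"url": "https://example.com/a", "title": "Example A"}),
    ("【9†source】", {"url": "https://example.com/b", "title": "Example B"}),
    ("【6†source】", {"url": "https://example.com/c", "title": "Example C"}),
]
demo_annotations = [
    DemoAnnotation(demo_text.index(m), demo_text.index(m) + len(m), c)
    for m, c in demo_sources
]

print(chunk_with_grouped_sources(demo_text, demo_annotations))
```

Here the first chunk lists both Example A and Example B as sources for the same sentence, instead of keeping only the first.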


In [7]:
json_search_summary = chunk_document(message_content, annotations)
print(json_search_summary)

[
    {
        "title": "Four years of housing market gridlock? Goldman Sachs issues U.S. home price predictions through 2026",
        "Summary": "The U.S. housing market in the next two years, 2025 and 2026, is expected to experience various dynamics influenced by factors such as mortgage rates, inventory levels, and economic conditions. Here is a synthesis of expert predictions: Home Prices and Market Trends1. Moderate Price Growth:   - According to Goldman Sachs, home prices are projected to rise by approximately 3.8% in 2025 and 4.9% in 2026",
        "URL": "https://finance.yahoo.com/news/four-years-housing-market-gridlock-183219157.html"
    },
    {
        "title": "25+ Housing Market Predictions for the Next 5 Years [2024-2028]",
        "Summary": "This growth represents a recovery from the slower pace expected in 2024 but still reflects a tempered increase due to ongoing market corrections from previous highs during the pandemic.2. Sellers' Market Continues:   - The market

To recap: In this cookbook, we explored how to use the Browser tool with the Assistants API. We demonstrated how to retrieve content and annotations for use in various applications with the help of two examples.