#### Use the Assistants API Browser tool to retrieve and summarize web search results with annotations

**[Note: The Assistants Browser tool is currently in private alpha. This cookbook will be published when the tool is offered as a public beta.]**

OpenAI's Assistants API allows developers to integrate AI-powered assistants into their applications. These assistants can perform a wide range of tasks, such as answering questions, generating text, and translating languages. The Assistants API also gives developers access to built-in tools such as File Search and Code Interpreter.

This cookbook walks through the new Browser tool, which lets developers search, retrieve, and summarize web content for a given input prompt. Browser tool extends the capability of 

##### Step 1: Set up an assistant with the browser tool
You can learn more about setting up and invoking assistants in the Assistants overview: https://platform.openai.com/docs/assistants/overview. In summary, you create an assistant, create a thread, add a message to the thread, and then run the assistant on that thread using the thread and assistant IDs.

In [118]:
from openai import OpenAI

# Model for assistant 
model = "gpt-4o"

# Definition of the browser tool 
BROWSER_TOOL = [
    {
        "type": "browser",
        "browser": {
            "locale": "en-US"
        }
    }
]

# System prompts 
system_prompt = "Conduct a thorough search on the topic provided by the user and output your findings. Provide more detailed text of each citation."

# Create a client 
client = OpenAI()

# Create an assistant 
assistant = client.beta.assistants.create(
            name="INTERNET SEARCH ASSISTANT",
            instructions=system_prompt,
            model=model,
            tools=BROWSER_TOOL
        )

##### Step 2: Invoke the assistant with a user message
In the step below, we create a thread, add a user message to it, and start a run that executes the assistant on that thread (without streaming).

In [119]:
user_message = "Housing market in the US in the next two years 2025 and 2026"

# Create a thread of conversation 
thread = client.beta.threads.create()

# Create a message on the thread 
messages = client.beta.threads.messages.create(
            thread_id=thread.id,
            role="user",
            content=user_message
        )

# Run the assistant with given thread 
run = client.beta.threads.runs.create_and_poll(
            thread_id=thread.id,
            assistant_id=assistant.id,
            temperature=1,
            top_p=1
        )
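
`create_and_poll` returns once the run reaches a terminal state, which is not necessarily a successful one. Before reading the thread, it can help to check `run.status`. The sketch below shows one way to do that. `ensure_run_completed` is a hypothetical helper name, not part of the SDK, and the stand-in objects at the bottom only mimic the `status` and `last_error` attributes of a run for illustration.

```python
from types import SimpleNamespace


def ensure_run_completed(run):
    """Return the run if it completed, otherwise raise with its status.

    `run` is any object exposing `status` and (optionally) `last_error`,
    such as the value returned by `create_and_poll`.
    """
    if run.status != "completed":
        last_error = getattr(run, "last_error", None)
        raise RuntimeError(f"Run ended with status {run.status!r}: {last_error}")
    return run


# Stand-in objects for illustration only; in the notebook, pass the real `run`.
ok_run = SimpleNamespace(status="completed", last_error=None)
failed_run = SimpleNamespace(status="failed", last_error="rate_limit_exceeded")

checked = ensure_run_completed(ok_run)
try:
    ensure_run_completed(failed_run)
    failure_message = None
except RuntimeError as exc:
    failure_message = str(exc)
```

In the notebook this would be called as `ensure_run_completed(run)` right after the run is created, so that a failed or expired run surfaces as an error instead of an empty message list.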

##### Step 3: Retrieve the message text value and annotations 
The Browser tool returns a response from the LLM that summarizes a variety of sources on the topic, with annotations. You can retrieve the message and its annotations as outlined below:

In [120]:
# get the message back from the run 
message = client.beta.threads.messages.list(
            thread_id=thread.id,
            after=messages.id,
            order="asc"
        )

message_content = message.data[0].content[0].text.value 

print(message_content)

The US housing market over the next two years (2025 and 2026) is expected to see modest growth, driven by a combination of demand from demographic trends and easing monetary policy, but tempered by ongoing affordability concerns.

### Home Price Projections

**Goldman Sachs** predicts a robust rebound in home prices, with growth rates of 3.8% for 2025 and 4.9% for 2026【7†source】. This outlook is supported by expectations of steady demand and a slight increase in housing supply. **Moody's**, in contrast, is more conservative, forecasting a slower growth rate of 0.3% in 2025 and 0.9% in 2026. They cite increasing housing supply from demographic changes and slightly lower mortgage rates as factors that will moderate price gains【5†source】.

### Factors Influencing the Market

The market dynamics will be influenced by several key factors:
1. **Demographic Trends**: The influx of Millennials and Generation Z entering prime home-buying years is expected to sustain demand. Additionally, new jo

The citations to the URLs retrieved by the API are stored as a list of `FileCitationAnnotation` objects. Each object includes the annotation text as it appears in the `message_content` variable (for example, `text='【7†source】'`), the position of that text (`start_index=371` to `end_index=381`), and the best-matching URL and title for the source in `url_citation`. For example, the URL citation for the first record in the list is:
```python
url_citation = {
    'url': 'https://www.noradarealestate.com/blog/goldman-sachs-housing-market-forecast/',
    'title': "Goldman Sachs' 5-Year Housing Forecast from 2024 to 2027"
}
```
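
To make the role of `start_index` and `end_index` concrete, here is a minimal sketch that slices a string with them and recovers the citation marker. It assumes the indices are character offsets into the message text, which is consistent with the 10-character span from 371 to 381 above. The text, URL, and title are synthetic placeholders, and `SimpleNamespace` is only a stand-in for the annotation object.

```python
from types import SimpleNamespace

# Synthetic text and annotation for illustration; real values come from
# message.data[0].content[0].text in the notebook.
text = "Prices may rise 3.8% in 2025【7†source】. Supply may grow."
marker = "【7†source】"
start = text.index(marker)

annotation = SimpleNamespace(
    text=marker,
    start_index=start,
    end_index=start + len(marker),
    url_citation={"url": "https://example.com/forecast", "title": "Example forecast"},
)

# The slice between the two indices is exactly the marker string.
cited_span = text[annotation.start_index:annotation.end_index]
matches = cited_span == annotation.text

print(cited_span, "->", annotation.url_citation["url"])
```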

In [121]:
annotations = message.data[0].content[0].text.annotations

print(annotations)

[FileCitationAnnotation(end_index=381, file_citation=None, start_index=371, text='【7†source】', type='url_citation', url_citation={'url': 'https://www.noradarealestate.com/blog/goldman-sachs-housing-market-forecast/', 'title': "Goldman Sachs' 5-Year Housing Forecast from 2024 to 2027"}), FileCitationAnnotation(end_index=744, file_citation=None, start_index=734, text='【5†source】', type='url_citation', url_citation={'url': 'https://www.fastcompany.com/91140840/housing-market-home-price-outlook-2024-2025-2026-2027', 'title': "Where will U.S. home prices go through 2027? Goldman Sachs Vs. Moody's\xa0forecasts - Fast Company"}), FileCitationAnnotation(end_index=1081, file_citation=None, start_index=1071, text='【6†source】', type='url_citation', url_citation={'url': 'https://gordcollins.com/real-estate/real-estate-forecast-next-5-years/', 'title': 'Housing Market Forecast 2024 2025 | Predictions for Next 5 Years'}), FileCitationAnnotation(end_index=1091, file_citation=None, start_index=1081, t

##### Step 4: Post-Processing

After retrieving the message text and its annotations, you can process them according to your application's requirements using the data in the `annotations` list. Below are two examples to illustrate this.

**Example 1: Adding Annotations to the Text**

The citation markers in the text returned by the Browser tool (stored in the `message_content` variable), such as `【7†source】`, do not include the actual URLs of the web pages.

The following `add_annotations_to_text` function takes a string `message_content` and a list of `annotations`. Each annotation identifies a span of the text that should be replaced. The function walks through the annotations and replaces each span in `message_content` with a Markdown link built from the corresponding title and URL.

In [122]:
def add_annotations_to_text(message_content, annotations):
    """
    Replaces citation markers in message_content with Markdown links built from annotations.

    :param message_content: str, the original text with citation markers
    :param annotations: list, annotation objects exposing start_index, end_index, and url_citation
    :return: str, the modified message content with Markdown links
    """
    # Sort annotations by start_index in descending order so that earlier
    # indices remain valid while later spans are replaced
    annotations = sorted(annotations, key=lambda x: x.start_index, reverse=True)
    
    for annotation in annotations:
        start_index = annotation.start_index
        end_index = annotation.end_index
        url = annotation.url_citation['url']
        title = annotation.url_citation['title']
        
        # Construct the replacement text
        replacement_text = f'[{title}]({url})'
        
        # Replace the text in message_content
        message_content = message_content[:start_index] + replacement_text + message_content[end_index:]
    
    return message_content

# Invoke the function to update the annotations within the text 
annotated_message_content = add_annotations_to_text(message_content, annotations)

print(annotated_message_content)

The US housing market over the next two years (2025 and 2026) is expected to see modest growth, driven by a combination of demand from demographic trends and easing monetary policy, but tempered by ongoing affordability concerns.

### Home Price Projections

**Goldman Sachs** predicts a robust rebound in home prices, with growth rates of 3.8% for 2025 and 4.9% for 2026[Goldman Sachs' 5-Year Housing Forecast from 2024 to 2027](https://www.noradarealestate.com/blog/goldman-sachs-housing-market-forecast/). This outlook is supported by expectations of steady demand and a slight increase in housing supply. **Moody's**, in contrast, is more conservative, forecasting a slower growth rate of 0.3% in 2025 and 0.9% in 2026. They cite increasing housing supply from demographic changes and slightly lower mortgage rates as factors that will moderate price gains[Where will U.S. home prices go through 2027? Goldman Sachs Vs. Moody's forecasts - Fast Company](https://www.fastcompany.com/91140840/housi
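
Inline links can make the text long when titles are verbose or when the same source is cited several times. As a variation on the function above, the following sketch replaces each marker with a short numbered reference and appends a reference list at the end. `add_numbered_references` is a hypothetical helper, and the sample text, URLs, and `SimpleNamespace` annotations are synthetic stand-ins for the objects returned by the API.

```python
from types import SimpleNamespace


def add_numbered_references(message_content, annotations):
    """Replace citation markers with [n] and append a numbered reference list."""
    numbers = {}  # url -> reference number, in order of first appearance
    titles = {}   # url -> title
    for ann in sorted(annotations, key=lambda a: a.start_index):
        url = ann.url_citation["url"]
        if url not in numbers:
            numbers[url] = len(numbers) + 1
            titles[url] = ann.url_citation["title"]

    # Replace from the end of the text so earlier indices stay valid.
    for ann in sorted(annotations, key=lambda a: a.start_index, reverse=True):
        n = numbers[ann.url_citation["url"]]
        message_content = (
            message_content[:ann.start_index] + f" [{n}]" + message_content[ann.end_index:]
        )

    references = [f"{n}. [{titles[url]}]({url})" for url, n in numbers.items()]
    return message_content + "\n\nReferences\n" + "\n".join(references)


# Synthetic example: the same source is cited twice.
text = "A rises【7†source】. B falls【5†source】. C holds【7†source】."


def make_annotation(marker, search_from, url, title):
    """Build a stand-in annotation for the first marker found at or after search_from."""
    start = text.index(marker, search_from)
    return SimpleNamespace(
        start_index=start,
        end_index=start + len(marker),
        url_citation={"url": url, "title": title},
    )


first = make_annotation("【7†source】", 0, "https://example.com/a", "Source A")
second = make_annotation("【5†source】", 0, "https://example.com/b", "Source B")
third = make_annotation("【7†source】", first.end_index, "https://example.com/a", "Source A")

result = add_numbered_references(text, [first, second, third])
print(result)
```

Numbering by URL rather than by marker keeps repeated citations of one page under a single reference entry.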

**Example 2: Retrieve and format the results as a reference JSON**

This example uses the text output in `message_content` and the list of `annotations` to generate a JSON list of the web pages retrieved during the search, with a title, summary, and URL for each entry.

In [123]:
import json

def chunk_document(message_content, annotations):
    """
    Chunks the document into a JSON format based on annotations.

    :param message_content: str, the original text with annotations
    :param annotations: list, list of FileCitationAnnotation objects with replacement information
    :return: str, JSON-formatted list of chunks
    """
    # Sort annotations by start_index
    annotations = sorted(annotations, key=lambda x: x.start_index)
    
    chunks = []
    previous_end_index = 0

    for annotation in annotations:
        start_index = annotation.start_index
        end_index = annotation.end_index
        url = annotation.url_citation['url']
        title = annotation.url_citation['title']
        
        summary = message_content[previous_end_index:start_index].strip()
        
        # Strip newlines and Markdown markup, then any leading punctuation
        # left over from the end of the previous sentence
        summary = summary.replace('\n','').replace('#','').replace("*",'').lstrip(".").lstrip("-").strip()

        # Skip citations that have no summary text before them
        if summary:
            # Create a chunk
            chunk = {
                "title": title,
                "Summary": summary,
                "URL": url
            }
            chunks.append(chunk)
        previous_end_index = end_index
    
        
    return json.dumps(chunks,  indent=4)
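
Note that when two citation markers sit directly next to each other (as with the annotations ending at 1081 and starting at 1081 in the output above), the second one has an empty summary and is dropped by `chunk_document`. If keeping every source matters, one possible alternative is to attach adjacent citations to the same summary. The sketch below illustrates that idea. `chunk_with_grouped_sources`, its lowercase key names, and the sample data are assumptions made for this example rather than part of the original notebook.

```python
from types import SimpleNamespace


def chunk_with_grouped_sources(message_content, annotations):
    """Return a list of {"summary", "sources"} dicts.

    Citations that follow one another with no text in between are attached
    to the same summary rather than dropped. A citation with no text before
    it and no earlier chunk to attach to is still skipped.
    """
    chunks = []
    previous_end = 0
    for ann in sorted(annotations, key=lambda a: a.start_index):
        summary = message_content[previous_end:ann.start_index].strip(" .\n")
        source = {"title": ann.url_citation["title"], "url": ann.url_citation["url"]}
        if summary:
            chunks.append({"summary": summary, "sources": [source]})
        elif chunks:
            # No new text since the previous marker: same claim, extra source.
            chunks[-1]["sources"].append(source)
        previous_end = ann.end_index
    return chunks


def make_annotation(text, marker, url, title):
    """Build a stand-in annotation for the first occurrence of marker in text."""
    start = text.index(marker)
    return SimpleNamespace(
        start_index=start,
        end_index=start + len(marker),
        url_citation={"url": url, "title": title},
    )


# Synthetic example with two adjacent citations on the first sentence.
sample = "Prices rise【6†source】【8†source】. Rates fall【5†source】."
sample_annotations = [
    make_annotation(sample, "【6†source】", "https://example.com/six", "Six"),
    make_annotation(sample, "【8†source】", "https://example.com/eight", "Eight"),
    make_annotation(sample, "【5†source】", "https://example.com/five", "Five"),
]

grouped = chunk_with_grouped_sources(sample, sample_annotations)
print(grouped)
```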


In [124]:
json_search_summary = chunk_document(message_content, annotations)

In [125]:
print(json_search_summary)

[
    {
        "title": "Goldman Sachs' 5-Year Housing Forecast from 2024 to 2027",
        "Summary": "The US housing market over the next two years (2025 and 2026) is expected to see modest growth, driven by a combination of demand from demographic trends and easing monetary policy, but tempered by ongoing affordability concerns. Home Price ProjectionsGoldman Sachs predicts a robust rebound in home prices, with growth rates of 3.8% for 2025 and 4.9% for 2026",
        "URL": "https://www.noradarealestate.com/blog/goldman-sachs-housing-market-forecast/"
    },
    {
        "title": "Where will U.S. home prices go through 2027? Goldman Sachs Vs. Moody's\u00a0forecasts - Fast Company",
        "Summary": "This outlook is supported by expectations of steady demand and a slight increase in housing supply. Moody's, in contrast, is more conservative, forecasting a slower growth rate of 0.3% in 2025 and 0.9% in 2026. They cite increasing housing supply from demographic changes and slightly