[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/weaviate/recipes/blob/main/integrations/llm-agent-frameworks/function-calling/openai/openai-swarm.ipynb)

## Set Up Weaviate

In [35]:
import weaviate
import weaviate.classes.config as wc
import os
import re
from weaviate.util import get_valid_uuid
from uuid import uuid4
import json

### Connect to Weaviate

In [37]:
weaviate_client = weaviate.connect_to_wcs(
    cluster_url=CLUSTER_URL,
    auth_credentials=weaviate.auth.AuthApiKey(AUTH_KEY),
    headers={"X-OpenAI-Api-Key": OPENAI_KEY},
)

            Please consider upgrading to the latest version. See https://weaviate.io/developers/weaviate/client-libraries/python for details.
  weaviate_client = weaviate.connect_to_wcs(
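
The connection cell uses `CLUSTER_URL`, `AUTH_KEY`, and `OPENAI_KEY` without defining them. One way to supply them is to read them from environment variables, as in the minimal sketch below. The variable names `WEAVIATE_URL`, `WEAVIATE_API_KEY`, and `OPENAI_API_KEY` and the helper `load_credentials` are assumptions for illustration, not part of the original notebook.

```python
import os

# Hypothetical environment variable names; adjust them to match your own setup.
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY", "OPENAI_API_KEY")

def load_credentials(env=None):
    """Return (cluster URL, Weaviate API key, OpenAI key), failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(env[name] for name in REQUIRED_VARS)

# Usage, before running the connection cell:
# CLUSTER_URL, AUTH_KEY, OPENAI_KEY = load_credentials()
```

Failing early with the list of missing names tends to be easier to debug than an authentication error raised later by the client.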


### Configure Schema

In [10]:
weaviate_client.collections.delete_all()

In [11]:
weaviate_client.collections.create(
    name="WeaviateBlogs",

    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(  # specify the vectorizer and model
        model="text-embedding-3-large",
    ),
    generative_config=wc.Configure.Generative.openai(
        model="gpt-4"
    ),

    properties=[
        wc.Property(name="content", data_type=wc.DataType.TEXT)
    ]
)

print("Successfully created collection: WeaviateBlogs.")

Successfully created collection: WeaviateBlogs.


### Ingest Blogs into Weaviate

In [14]:
import os
import re

def chunk_list(lst, chunk_size):
    """Break a list into chunks of the specified size."""
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

def split_into_sentences(text):
    """Split text into sentences using regular expressions."""
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text)
    return [sentence.strip() for sentence in sentences if sentence.strip()]

def read_and_chunk_index_files(main_folder_path):
    """Read index.mdx files from subfolders, split into sentences, and chunk every 5 sentences."""
    blog_chunks = []
    for folder_name in os.listdir(main_folder_path):
        subfolder_path = os.path.join(main_folder_path, folder_name)
        if os.path.isdir(subfolder_path):
            index_file_path = os.path.join(subfolder_path, 'index.mdx')
            if os.path.isfile(index_file_path):
                with open(index_file_path, 'r', encoding='utf-8') as file:
                    content = file.read()
                    sentences = split_into_sentences(content)
                    sentence_chunks = chunk_list(sentences, 5)
                    sentence_chunks = [' '.join(chunk) for chunk in sentence_chunks]
                    blog_chunks.extend(sentence_chunks)
    return blog_chunks

# Example usage
main_folder_path = '../../data'
blog_chunks = read_and_chunk_index_files(main_folder_path)
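
To see what the chunking helpers do without reading any files, the sketch below repeats the two functions from the cell above and runs them on a made-up sample string with a chunk size of 2 instead of 5. The sample text is an assumption for illustration only.

```python
import re

def chunk_list(lst, chunk_size):
    """Break a list into chunks of the specified size."""
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

def split_into_sentences(text):
    """Split text into sentences using the same regular expression as above."""
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text)
    return [sentence.strip() for sentence in sentences if sentence.strip()]

# Made-up sample text; "e.g." is there to show that abbreviations are not split.
sample = (
    "Weaviate is a vector database. It supports hybrid search, e.g. for product queries. "
    "Is it open source? Yes."
)

sentences = split_into_sentences(sample)
chunks = [' '.join(chunk) for chunk in chunk_list(sentences, 2)]

for chunk in chunks:
    print(chunk)
# Weaviate is a vector database. It supports hybrid search, e.g. for product queries.
# Is it open source? Yes.
```

The pattern only treats `.` and `?` followed by whitespace as sentence boundaries, so sentences ending in `!` stay attached to the next one, and front matter or markdown without sentence punctuation ends up in a single long "sentence", as the first chunk printed below illustrates.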

In [16]:
len(blog_chunks)

894

In [18]:
blog_chunks[0]

'---\ntitle: ChatGPT for Generative Search\nslug: generative-search\nauthors: [zain, erika, connor]\ndate: 2023-02-07\ntags: [\'search\', \'integrations\']\nimage: ./img/hero.png\ndescription: "Learn how you can customize Large Language Models prompt responses to your own data by leveraging vector databases."\n---\n![ChatGPT for Generative Search](./img/hero.png)\n\n<!-- truncate -->\n\nWhen OpenAI launched ChatGPT at the end of 2022, more than one million people had tried the model in just a week and that trend has only continued with monthly active users for the chatbot service reaching over 100 Million, quicker than any service before, as reported by [Reuters](https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/) and [Yahoo Finance](https://finance.yahoo.com/news/chatgpt-on-track-to-surpass-100-million-users-faster-than-tiktok-or-instagram-ubs-214423357.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_

In [None]:
blogs = weaviate_client.collections.get("WeaviateBlogs")

# Insert each chunk as its own object; the collection's vectorizer embeds "content" on ingest.
for blog_chunk in blog_chunks:
    blogs.data.insert(
        properties={
            "content": blog_chunk
        }
    )
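
Inserting one object per request can be slow for hundreds of chunks. A possible alternative, sketched below, groups the chunks and sends each group through the v4 client's `insert_many`. The helper name `insert_chunks_in_batches` and the batch size of 100 are assumptions, not part of the original notebook.

```python
def insert_chunks_in_batches(collection, chunks, batch_size=100):
    """Insert text chunks with one request per batch instead of one per chunk.

    `collection` is expected to be a Weaviate v4 collection object,
    e.g. weaviate_client.collections.get("WeaviateBlogs").
    """
    inserted = 0
    for start in range(0, len(chunks), batch_size):
        batch = [{"content": chunk} for chunk in chunks[start:start + batch_size]]
        # insert_many accepts a list of property dictionaries.
        collection.data.insert_many(batch)
        inserted += len(batch)
    return inserted

# Usage (requires a live connection):
# insert_chunks_in_batches(blogs, blog_chunks)
```

Whichever approach is used, running the ingestion cell more than once adds the same chunks again, so the object count can end up higher than `len(blog_chunks)`.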

In [9]:
collection = weaviate_client.collections.get("WeaviateBlogs")
response = collection.aggregate.over_all(total_count=True)

print(response.total_count)

1153


## Swarm

In [20]:
from openai import OpenAI

from swarm import Swarm, Agent

In [21]:
swarm_client = Swarm()

### Functions

In [30]:
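def search_weaviate(search_query: str) -> str:
    """Searches the Weaviate Collection with Hybrid Search."""

    response = collection.query.hybrid(
        query=search_query,
        limit=3,
        alpha=0.5,
        return_properties=["content"]
    )

    # Return the matched chunks as one string so the agent can read the results.
    return "\n\n".join(str(obj.properties["content"]) for obj in response.objects)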
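
The `alpha` argument controls the balance between the two searches: `alpha=0` is pure keyword (BM25) search, `alpha=1` is pure vector search, and `0.5` weights them equally. The sketch below illustrates the idea with a simplified weighted fusion of min-max normalized scores. It is loosely modeled on relative score fusion and is not Weaviate's actual implementation; the document IDs and scores are made up.

```python
def normalize(scores):
    """Min-max scale scores to [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (score - lo) / (hi - lo) for doc, score in scores.items()}

def fuse(keyword_scores, vector_scores, alpha=0.5):
    """Combine two score sets: alpha weights the vector side, (1 - alpha) the keyword side."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Made-up scores for four documents.
keyword_scores = {"a": 2.0, "b": 1.0, "c": 0.0}
vector_scores = {"a": 0.2, "b": 0.9, "d": 0.6}

for doc, score in fuse(keyword_scores, vector_scores, alpha=0.5):
    print(doc, round(score, 3))
```

With these numbers, document `b` ranks first at `alpha=0.5` because it scores reasonably well in both searches, while `a` wins at `alpha=0` and `b` at `alpha=1`.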

In [42]:
def transfer_to_agent_a():
    return agent_a

def transfer_to_agent_b():
    return agent_b

def transfer_to_agent_c():
    return agent_c

agent_a = Agent(
    name="Lead Marketer",
    model="gpt-4o",
    instructions="You are an expert in marketing and rely on experts from your company to sell new products.",
    functions=[transfer_to_agent_b, transfer_to_agent_c],
)

agent_b = Agent(
    name="Database Expert",
    model="gpt-4o",
    instructions="You are a world class database engineer with an extensive knowledge in indexes, multi-tenancy, and more.",
    functions=[transfer_to_agent_a, search_weaviate]
)

agent_c = Agent(
    name="Search Expert",
    model="gpt-4o",
    instructions="You are an expert in the field of information retrieval. You know everything there is to know about bm25, vector search, hybrid search, and all applications building off of search.",
    functions=[transfer_to_agent_a, search_weaviate]
)
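
In Swarm, a function that returns an `Agent` acts as a handoff: when the model calls it, the returned agent takes over the conversation. The toy sketch below mimics that pattern without calling any model or importing Swarm. `ToyAgent` and `call_function` are hypothetical stand-ins, not Swarm's own classes or internals.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: a name plus the functions it may call."""
    name: str
    functions: List[Callable] = field(default_factory=list)

def call_function(active_agent, function_name):
    """Call one of the agent's functions; if it returns an agent, that agent becomes active."""
    func = next(f for f in active_agent.functions if f.__name__ == function_name)
    result = func()
    if isinstance(result, ToyAgent):
        return result, f"handed off to {result.name}"
    return active_agent, str(result)

# The transfer functions refer to agents defined later; the names are resolved at call time,
# which is the same reason the notebook can define transfer_to_agent_a before agent_a.
def transfer_to_expert():
    return expert

def transfer_to_lead():
    return lead

lead = ToyAgent("Lead Marketer", [transfer_to_expert])
expert = ToyAgent("Search Expert", [transfer_to_lead])

active = lead
active, note = call_function(active, "transfer_to_expert")
print(active.name, "|", note)  # Search Expert | handed off to Search Expert
```

In the notebook's setup, the Lead Marketer can only hand off, while the two experts can either search Weaviate or hand control back.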

In [43]:
prompt = """
My company is releasing a new feature called hybrid search, which combines keyword and vector search.

Come up with a piece of content that promotes the hybrid search feature. Please include relevant context that is in the database.

"""

response = swarm_client.run(
    agent=agent_a,
    messages=[{"role": "user", "content": prompt}],
)

print(response.messages[-1]["content"])

For promoting the new hybrid search feature, we should emphasize its innovative approach of combining keyword and vector search capabilities. Below is a suggested content piece that highlights the benefits of this feature, along with relevant context:

**Introducing Hybrid Search: The Future of Intelligent Searching**

In today's rapidly evolving digital landscape, accessing the right information efficiently is crucial to success. Whether you're looking for specific documents, data insights, or the latest industry trends, the challenge always lies in sifting through vast amounts of data quickly and accurately. Enter our groundbreaking Hybrid Search feature, a game-changer in the realm of information retrieval.

**What is Hybrid Search?**

Hybrid Search is a unique blend of traditional keyword search and advanced vector search technologies. By merging these two powerful methodologies, Hybrid Search offers unprecedented accuracy and relevance, ensuring users can find exactly what they’re