# Query Agent Demo

## Connect to the Weaviate Cloud instance

> Reminder: Weaviate Agents are only available for Weaviate Cloud instances.

Connect to your Weaviate instance using the credentials from the Weaviate Cloud console. Here, they are loaded from a `.env` file as `WEAVIATE_URL` and `WEAVIATE_API_KEY`.

In [None]:
from dotenv import load_dotenv
import weaviate
import os

load_dotenv()

weaviate_url = os.getenv("WEAVIATE_URL")
weaviate_api_key = os.getenv("WEAVIATE_API_KEY")

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=weaviate_api_key,
)

assert client.is_ready()
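
`os.getenv` returns `None` when a variable is missing, so a mistyped `.env` entry tends to surface only later as a connection error. A minimal sketch of a stricter lookup is shown below. Note that `require_env` is a hypothetical helper introduced here for illustration; it is not part of the Weaviate client or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of any library.
    """
    value = os.getenv(name)
    if value is None or not value.strip():
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file"
        )
    # Strip stray whitespace that sometimes sneaks into .env files
    return value.strip()


# Usage (after load_dotenv()):
# weaviate_url = require_env("WEAVIATE_URL")
# weaviate_api_key = require_env("WEAVIATE_API_KEY")
```

Failing early with the variable name in the message makes it easier to tell a configuration problem apart from a network or authentication problem.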

## Add data

We add two datasets here, one with books and another with movies. The datasets are loaded from the Hugging Face Hub, and they are pre-vectorized using `Snowflake/snowflake-arctic-embed-l-v2.0`. 

### Load data & inspect it briefly

In [None]:
from datasets import load_dataset

movies_dataset = load_dataset("jphwang/weaviate-demos", "movies", split="train", streaming=True)
books_dataset = load_dataset("weaviate/agents", "query-agent-books", split="train", streaming=True)

In [None]:
for d in [movies_dataset, books_dataset]:
    print(f"Dataset: {d.config_name}")
    counter = 0
    for o in d:
        if counter >= 5:
            break
        print(o)
        counter += 1
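
The manual counter above can also be expressed with `itertools.islice`, which works on any iterable, including streaming ones. The sketch below is one possible alternative; `head` is a hypothetical helper name, not something provided by the `datasets` library.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def head(items: Iterable[T], n: int = 5) -> Iterator[T]:
    """Yield at most the first n items of an iterable (hypothetical helper)."""
    # islice stops pulling from the source once n items have been yielded
    return islice(items, n)


# Usage with the streaming datasets loaded above:
# for o in head(movies_dataset):
#     print(o)
```

Because the datasets are loaded with `streaming=True`, stopping after a few items avoids downloading the rest just to inspect the schema.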

### Prepare the Collections

Here we create collections and add the objects. 

> ❗️ The `QueryAgent` uses the descriptions of collections and properties to decide which ones to use when solving a query, and to learn more about each property. You can experiment with these descriptions, for example by adding more detail. It's good practice to describe every property as well.

In [None]:
# ONLY run this if you want to delete the existing collection & data
client.collections.delete(["Movie", "Book"])

In [None]:
from weaviate.classes.config import Configure, Property, DataType

if not client.collections.exists("Movie"):
    client.collections.create(
        "Movie",
        description="A dataset that lists movies, their ratings, original language, etc.",
        properties=[
            Property(
                name="title",
                data_type=DataType.TEXT,
                description="The title of the movie",
            ),
            Property(
                name="release_year",
                data_type=DataType.INT,
                description="The release year of the movie",
            ),
            Property(
                name="overview",
                data_type=DataType.TEXT,
                description="Short description of the movie",
            ),
            Property(
                name="genres",
                data_type=DataType.TEXT_ARRAY,
                description="The genres of the movie, in an array format",
            ),
            Property(
                name="vote_average",
                data_type=DataType.NUMBER,
                description="The average user rating of the movie; range is 0-10",
            ),
            Property(
                name="vote_count",
                data_type=DataType.INT,
                description="The number of user votes for the movie",
            ),
            Property(
                name="popularity",
                data_type=DataType.NUMBER,
                description="Calculated popularity of the movie by weighing multiple factors; range is 0-100",
            ),
            Property(
                name="poster_url",
                data_type=DataType.TEXT,
                description="A TMDB URL of the movie poster image",
            ),
            Property(
                name="original_language",
                data_type=DataType.TEXT,
                description="A two-letter code (e.g. 'en') representing the original language of the movie",
            ),
        ],
        vectorizer_config=[
            Configure.NamedVectors.text2vec_weaviate(
                name="default",
                source_properties=["title", "overview"],
                model="Snowflake/snowflake-arctic-embed-l-v2.0"
            )
        ],
    )

# Exercise: create a collection for books
if not client.collections.exists("Book"):
    client.collections.create(
        "Book",
        description="A dataset that lists books, their author, description and genres",
        properties=[
            # Create properties for 'title', 'author', 'description' (use text types), and 'genres' (text array)
        ],
        vectorizer_config=[
            # Add a `NamedVectors.text2vec_weaviate` vectorizer called "default"
            # Use the "Snowflake/snowflake-arctic-embed-l-v2.0" model
            # And set the source properties to ["title", "description"]
        ],
    )


### Import data

In [None]:
from tqdm import tqdm
from weaviate.util import generate_uuid5

movies = client.collections.get("Movie")

with movies.batch.fixed_size(batch_size=100) as batch:
    for item in tqdm(movies_dataset):
        obj = item["properties"]

        # Convert release_date to release_year
        obj["release_year"] = obj.pop("release_date").year

        # Add object to batch for import, with its pre-computed vector
        batch.add_object(
            properties=obj,
            uuid=generate_uuid5(obj["title"]),
            vector={"default": item["vector"]},
        )

# Check for any failed objects during import
if movies.batch.failed_objects:
    print(f"{len(movies.batch.failed_objects)} objects failed during import:")
    for failed in movies.batch.failed_objects[:3]:
        print(failed.message)
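
The import loop assumes that `release_date` arrives as a date-like object with a `.year` attribute. If the field were to arrive as a string instead (an assumption, not something the dataset is known to do), a small conversion helper could absorb the difference. `to_release_year` below is a hypothetical name used only for this sketch.

```python
from datetime import date, datetime
from typing import Union


def to_release_year(value: Union[date, datetime, str]) -> int:
    """Extract the year from a date, a datetime, or an ISO 8601 string."""
    if isinstance(value, (date, datetime)):
        return value.year
    # Assumption: strings are ISO 8601, e.g. "1999-03-31" or
    # "1999-03-31T00:00:00Z". Older Python versions do not parse a
    # trailing "Z", so it is rewritten as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).year


# Usage inside the import loop:
# obj["release_year"] = to_release_year(obj.pop("release_date"))
```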

In [None]:

from tqdm import tqdm
from weaviate.util import generate_uuid5

books = client.collections.get("Book")

# Import data similarly to movies, but for books
# Generate the UUID using the title of the book
# Remember to set the vector using the "default" vectorizer like {"default": item["vector"]}
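
Generating the UUID from the title makes the object ID a deterministic function of the title, so the same title always maps to the same ID. The sketch below illustrates the idea with the standard library's `uuid.uuid5`; it is an illustration only and is not claimed to produce the same IDs as `weaviate.util.generate_uuid5`. `title_uuid` is a hypothetical helper name.

```python
import uuid


def title_uuid(title: str) -> str:
    """Return a name-based (version 5) UUID derived from a title.

    Illustration only: the namespace choice here is arbitrary.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, title))


# The same input always yields the same ID; different inputs differ.
same_id = title_uuid("Dune") == title_uuid("Dune")
different_id = title_uuid("Dune") != title_uuid("Emma")
```

One consequence of this design is that two distinct items sharing a title also share an ID. If that matters for your data, consider deriving the ID from more fields, such as title plus author.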

## Use Weaviate Query Agent

### Set up the query agent

In [None]:
from weaviate.agents.query import QueryAgent
from weaviate.agents.classes import QueryAgentCollectionConfig

agent = QueryAgent(client=client, collections=[
    QueryAgentCollectionConfig(name="Movie", target_vector="default"),
    QueryAgentCollectionConfig(name="Book", target_vector="default"),
])

The `QueryAgent` determines whether a given query calls for a regular search (vector search), an aggregation, or both.

In [None]:
response = agent.run("What are some good fantasy films that involve elves?")

In [None]:
print(response.final_answer)

In [None]:
response.display()

You can ask follow-up questions, as shown below, by providing the previous response as `context`.

In [None]:
new_response = agent.run(
    "And what are some books with a similar vibe to these films?",
    context=response,
)

In [None]:
print(new_response.final_answer)

In [None]:
new_response.display()

In [None]:
# Try your own query here

The agent will select the appropriate collection and query type, based on the user's query and the available data.

In [None]:
response = agent.run("Which author has the most books listed in our collection?")
print(response.final_answer)

In [None]:
response = agent.run("What genres are the most common for this author?", context=response)
print(response.final_answer)

In [None]:
response.display()

#### Considerations - data structure & query limitations

Note that the agent can only form queries based on the data structure and the collections available. Consider whether the following is easily executable by the agent.

In [None]:
response = agent.run("What movies do we have in the collection that are based on this author's books?", context=response)
print(response.final_answer)

In [None]:
response.display()

The query above may be tricky, because the current data provides no easy way to tell whether a movie is based on a book or, if so, who the original author is.

### Search multiple collections at once

In [None]:
multi_collection_query = """
I'm interested in movies and books that are based on European historical events, modern or ancient.
Can you recommend any good ones?
"""

response = agent.run(multi_collection_query)

In [None]:
print(response.final_answer)

In [None]:
response.display()

In [None]:
client.close()