# Movies and Books Assistant with Weaviate Query Agent
**Demo for the Reimagining Data Workflows Webinar - April 29th 2025**

Example code showing how to build an agent that can answer questions about books and movies, using the [Weaviate Query Agent](https://weaviate.io/developers/agents?utm_source=webinar&utm_campaign=agents&utm_content=reimagining-data-workflows).

> 📚 You can read and learn more about this service in our ["Introducing the Weaviate Query Agent"](https://weaviate.io/blog/query-agent?utm_source=webinar&utm_campaign=agents&utm_content=reimagining-data-workflows) blog.

To get started, we've prepared a few open datasets, available on Hugging Face. The first step will be walking through how to populate your Weaviate Cloud collections.

- [**Movies:**](https://huggingface.co/datasets/weaviate/agents/viewer/personalization-agent-movies) A dataset that lists movies, their descriptions, ratings etc.
- [**Books:**](https://huggingface.co/datasets/weaviate/agents/viewer/query-agent-books) A dataset that lists books, their authors, descriptions and genres.


## 1. Setting Up Weaviate & Importing Data

To use the Weaviate Query Agent, first create a [Weaviate Cloud](https://weaviate.io/deployment/serverless?utm_source=webinar&utm_campaign=agents&utm_content=reimagining-data-workflows) account👇
1. [Create a Serverless Weaviate Cloud account](https://weaviate.io/deployment/serverless?utm_source=webinar&utm_campaign=agents&utm_content=reimagining-data-workflows) and set up a free [Sandbox](https://weaviate.io/developers/wcs/manage-clusters/create#sandbox-clusters?utm_source=webinar&utm_campaign=agents&utm_content=reimagining-data-workflows)
2. Go to 'Embedding' and enable it. By default, this uses `Snowflake/snowflake-arctic-embed-l-v2.0` as the embedding model
3. Take note of the `WEAVIATE_URL` and `WEAVIATE_API_KEY` to connect to your cluster below

> Info: We recommend using [Weaviate Embeddings](https://weaviate.io/developers/weaviate/model-providers/weaviate?utm_source=webinar&utm_campaign=agents&utm_content=reimagining-data-workflows) so you do not have to provide any extra keys for external embedding providers.

In [None]:
!pip install 'weaviate-client[agents]' datasets

In [3]:
import os
from getpass import getpass

if "WEAVIATE_API_KEY" not in os.environ:
  os.environ["WEAVIATE_API_KEY"] = getpass("Weaviate API Key")
if "WEAVIATE_URL" not in os.environ:
  os.environ["WEAVIATE_URL"] = getpass("Weaviate URL")

In [4]:
import weaviate
from weaviate.auth import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ.get("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.environ.get("WEAVIATE_API_KEY")),
)

### Prepare the Collections

In the following code blocks, we are pulling our demo datasets from Hugging Face and writing them to new collections in our Weaviate Serverless cluster.

> ❗️ The `QueryAgent` uses the descriptions of collections and properties to decide which ones to use when solving queries, and to access more information about properties. You can experiment with changing these descriptions, providing more detail, and more. It's good practice to provide property descriptions too.

In [None]:
from weaviate.classes.config import Configure, Property, DataType

# client.collections.delete("Movies")
# client.collections.delete("Books")
client.collections.create(
    "Books",
    description="A dataset that lists books, their author, description and genres",
    vectorizer_config=Configure.Vectorizer.text2vec_weaviate(),
    properties=[
        Property(name="title", data_type=DataType.TEXT, description="title of the book"),
        Property(name="author", data_type=DataType.TEXT, description="author of the book"),
        Property(name="description", data_type=DataType.TEXT, description="description of the book"),
        Property(name="genres", data_type=DataType.TEXT_ARRAY, description="genres of the book"),
      ]
)

client.collections.create(
    "Movies",
    description="A dataset that lists movies, their ratings, original language etc.",
    vectorizer_config=Configure.Vectorizer.text2vec_weaviate(),
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            description="The title of the movie",
        ),
        Property(
            name="release_date",
            data_type=DataType.TEXT,
            description="The release date of the movie",
        ),
        Property(
            name="overview",
            data_type=DataType.TEXT,
            description="Short description of the movie",
        ),
        Property(
            name="genres",
            data_type=DataType.TEXT_ARRAY,
            description="The genres of the movie",
        ),
        Property(
            name="vote_average",
            data_type=DataType.NUMBER,
            description="vote average of the movie",
        ),
        Property(
            name="vote_count",
            data_type=DataType.INT,
            description="vote count of the movie",
        ),
        Property(
            name="popularity",
            data_type=DataType.NUMBER,
            description="popularity of the movie",
        ),
        Property(
            name="poster_url",
            data_type=DataType.TEXT,
            description="poster path of the movie",
            skip_vectorization=True,
        ),
        Property(
            name="original_language",
            data_type=DataType.TEXT,
            description="Code of the language of the movie",
            skip_vectorization=True,
        ),
    ]
)
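
Since the `QueryAgent` relies on collection and property descriptions, it can help to check a schema for missing or very short descriptions before creating the collection. The following is a minimal sketch, assuming the properties are first written down as plain dictionaries. The `find_weak_descriptions` helper and its `min_words` threshold are hypothetical and not part of the Weaviate client.

```python
def find_weak_descriptions(properties, min_words=3):
    """Return names of properties whose description is missing or very short."""
    weak = []
    for prop in properties:
        description = (prop.get("description") or "").strip()
        if len(description.split()) < min_words:
            weak.append(prop["name"])
    return weak


# Hypothetical plain-dict schema, loosely modelled on the Books collection,
# with two deliberately weak entries for illustration.
book_properties = [
    {"name": "title", "description": "title of the book"},
    {"name": "author", "description": "author of the book"},
    {"name": "genres", "description": "genres"},
    {"name": "description"},
]

print(find_weak_descriptions(book_properties))
```

Any names it reports are candidates for a more detailed description before you call `client.collections.create`.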


In [5]:
from datasets import load_dataset

movies_dataset = load_dataset("weaviate/agents", "personalization-agent-movies", split="train", streaming=True)
books_dataset = load_dataset("weaviate/agents", "query-agent-books", split="train", streaming=True)

movies_collection = client.collections.get("Movies")
books_collection = client.collections.get("Books")

# Import each dataset in batches; every item carries its fields under "properties".
with movies_collection.batch.dynamic() as batch:
    for item in movies_dataset:
        batch.add_object(properties=item["properties"])

with books_collection.batch.dynamic() as batch:
    for item in books_dataset:
        batch.add_object(properties=item["properties"])



## 2. Set Up the Query Agent

Let's start with a simple agent. Here, we're creating an `agent` that has access to our `Books` & `Movies` datasets.

In [6]:
from weaviate.agents.query import QueryAgent

agent = QueryAgent(
    client=client, collections=["Books", "Movies"],
)

## 3. Run the Query Agent
The `QueryAgent` will determine whether a given query is a regular search query (vector search), whether it requires aggregations, or both.

In [7]:
response = agent.run("What are some good fantasy films that involve elves?")
response.display()





### Ask a follow up question

The agent can also be provided with additional context. For example, we can pass the previous response as `context` and get a `new_response`.

In [8]:
new_response = agent.run("And what are some books with a similar vibe to these films?", context=response)
new_response.display()
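
The same pattern extends to longer conversations: each response is passed as `context` to the next call. Below is a minimal sketch of that chaining. The `run_conversation` helper is hypothetical, and `EchoAgent` is a stand-in that only mimics the `run(query, context=...)` call shape used above so the example runs without a cluster.

```python
def run_conversation(agent, questions):
    """Ask questions in order, passing each response as context to the next."""
    responses = []
    previous = None
    for question in questions:
        if previous is None:
            response = agent.run(question)
        else:
            response = agent.run(question, context=previous)
        responses.append(response)
        previous = response
    return responses


class EchoAgent:
    """Hypothetical stand-in for QueryAgent so the sketch runs offline."""

    def run(self, query, context=None):
        prefix = f"{context} -> " if context is not None else ""
        return prefix + query


history = run_conversation(EchoAgent(), ["fantasy films with elves?", "similar books?"])
print(history[-1])
```

With a real `QueryAgent` in place of `EchoAgent`, each element of `history` would be a response object like `response` and `new_response` above.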





Now let's try a question that should require an aggregation. Let's see which author has the most books in our collection.

In [9]:
response = agent.run("Which author has the most books listed in our collection?")
response.display()
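
To make the distinction between a search and an aggregation concrete, here is a small offline sketch over toy records. The records are hypothetical stand-ins for objects in the `Books` collection, and the plain genre filter stands in for vector search. It illustrates the two kinds of operation, not how the agent runs them internally.

```python
from collections import Counter

# Toy records; hypothetical stand-ins for objects in the Books collection.
books = [
    {"title": "Book A", "author": "Author X", "genres": ["Fantasy"]},
    {"title": "Book B", "author": "Author Y", "genres": ["History"]},
    {"title": "Book C", "author": "Author X", "genres": ["Fantasy", "Adventure"]},
]

# "Search": return the objects that match a criterion.
# (A plain genre filter here; the agent itself would use vector search.)
fantasy_titles = [b["title"] for b in books if "Fantasy" in b["genres"]]

# "Aggregation": compute a statistic over the objects, here a count per author.
author, count = Counter(b["author"] for b in books).most_common(1)[0]

print(fantasy_titles)
print(author, count)
```

The question "which author has the most books" falls into the second category: it needs a count over the whole collection, not a list of matching objects.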





In [10]:
response = agent.run("And are there any films based on this author's books?", context=response)
response.display()





### Search over multiple collections

In some cases, we need to combine the results of searches across multiple collections.

In [11]:
response = agent.run("I'm interested in historical fiction books, can you recommend any good ones? "
                     "Are there any films based on historical fiction? And on average, what's the original language that they were filmed in?")

response.display()
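
The question above is built from two adjacent string literals. Python joins such literals without inserting any whitespace, so the separating space has to be part of one of the strings. A short illustration, using shortened example strings:

```python
# Adjacent string literals are concatenated exactly as written.
without_space = ("Can you recommend any good ones?"
                 "Are there any films?")
with_space = ("Can you recommend any good ones? "
              "Are there any films?")

print(without_space)
print(with_space)
```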





### Changing the System Prompt

In some cases, you may want to define a custom `system_prompt` for your agent. This can help you provide the agent with some default instructions as to how to behave. For example, let's create an agent that is designed to give short bullet point answers.


In [12]:
new_agent = QueryAgent(
    client=client, collections=["Books", "Movies"],
    system_prompt="You are a helpful movies and books assistant that always responds in short, bullet point answers."
)

This time, let's ask the same multi-collection question as before and print only the final answer.

In [13]:
response = new_agent.run("I'm interested in historical fiction books, can you recommend any good ones? "
                         "Are there any films based on historical fiction? And on average, what's the original language that they were filmed in?")
print(response.final_answer)

- Recommended historical fiction books:
  1. "Castles, Customs, and Kings: True Tales by English Historical Fiction Authors" by Debra Brown
  2. "Post Captain" by Patrick O'Brian
  3. "People of the Book" by Geraldine Brooks
  4. "The Shadow of the Wind" by Carlos Ruiz Zafón
  5. "The Winds of War" by Herman Wouk

- Films based on historical fiction:
  1. "Hamilton" (2020)
  2. "The Lost City of Z" (2017)
  3. "The Kashmir Files" (2022)
  4. "Suffragette" (2015)
  5. "Denial" (2016)
  6. "Hostiles" (2017)

- Average original language of films based on historical fiction is predominantly English.
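
Because the system prompt asks for bullet points, `response.final_answer` is plain text that is fairly easy to post-process. The sketch below groups numbered items under their top-level bullet. It assumes the layout printed above, which the model is not guaranteed to reproduce every time, and the `parse_bullets` helper is hypothetical.

```python
import re


def parse_bullets(answer):
    """Group numbered items under the top-level bullet that precedes them."""
    sections = {}
    current = None
    for line in answer.splitlines():
        if line.startswith("- "):
            # A top-level bullet starts a new section.
            current = line[2:].rstrip(":").strip()
            sections[current] = []
        else:
            # Indented "1. ..." lines belong to the current section.
            match = re.match(r"\s+\d+\.\s+(.*)", line)
            if match and current is not None:
                sections[current].append(match.group(1).strip())
    return sections


# Shortened copy of the answer printed above.
sample = """- Recommended historical fiction books:
  1. "Post Captain" by Patrick O'Brian
  2. "People of the Book" by Geraldine Brooks

- Films based on historical fiction:
  1. "Hamilton" (2020)
"""

for heading, items in parse_bullets(sample).items():
    print(heading, len(items))
```

In the notebook, you would call `parse_bullets(response.final_answer)` instead of using the hard-coded `sample`.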
