<a target="_blank" href="https://colab.research.google.com/github/weaviate/recipes/blob/main/weaviate-services/agents/transformation-agent-sleep-time-compute.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Sleep-time Compute with Weaviate Agents

This notebook shows how Weaviate Agents can help you build systems that use sleep-time compute!

> "Sleep-time compute is a new way to scale AI capabilities, letting models "think" during downtime. Instead of sitting idle between tasks, AI agents can now use their "sleep" time to process information and form new connections by rewriting their memory state." [Letta AI](https://www.letta.com/blog/sleep-time-compute)

This notebook reproduces the concept in the paper by extracting predicted questions for each document in a collection. We then use these predicted questions to create an enhanced set of documents. Both steps run on the Weaviate Transformation Agent!

1. We begin by chunking Weaviate's blog posts and importing them to Weaviate, resulting in 1,463 objects.

2. We then run the Transformation Agent to predict potential questions from each chunk. This produces 12 questions per chunk on average. This takes the Transformation Agent about **2.7 minutes!**

3. We then create new objects for each `<chunk, predicted_user_query>` pair, resulting in 17,933 objects.

4. We then run the Transformation Agent to create `enhanced_content` for each of the 17,933 objects. This takes the Transformation Agent **12 minutes!**

This notebook also contains interleaved examples using the Query Agent to inspect the results of the Transformation Agent! ♻️


### 1. Import Weaviate Blogs to Weaviate

If you want to use this particular dataset, you can get it by cloning the [`weaviate-io`](https://github.com/weaviate/weaviate-io) repository from GitHub. The markdown files for Weaviate's blogs are in the `blog` folder.

In [None]:
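import os

import weaviate
from dotenv import load_dotenv

# Load WEAVIATE_URL and WEAVIATE_API_KEY from a local .env file (if present)
# before reading them, so the connection below does not receive None.
load_dotenv()

weaviate_client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")),
)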

In [24]:
import os
import time

import tiktoken
import weaviate
import weaviate.collections.classes.config as wvcc
from dotenv import load_dotenv

load_dotenv()

local_blogs = []

main_folder_path = "./blog/"

# Each blog post lives in its own subfolder as an index.mdx file.
for folder_name in os.listdir(main_folder_path):
    subfolder_path = os.path.join(main_folder_path, folder_name)
    if os.path.isdir(subfolder_path):
        index_file_path = os.path.join(subfolder_path, "index.mdx")
        if os.path.isfile(index_file_path):
            with open(index_file_path, "r", encoding="utf-8") as file:
                content = file.read()
                local_blogs.append(
                    {
                        "content": content,
                    }
                )

if weaviate_client.collections.exists("Blogs"):
    weaviate_client.collections.delete("Blogs")
blogs = weaviate_client.collections.create(
    name="Blogs",
    vectorizer_config=wvcc.Configure.Vectorizer.text2vec_weaviate(),
    properties=[
        wvcc.Property(name="content", data_type=wvcc.DataType.TEXT),
    ],
)

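def chunk_text(text, max_tokens=300):
    """Split text into consecutive, non-overlapping chunks of at most max_tokens tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []

    for i in range(0, len(tokens), max_tokens):
        chunk_tokens = tokens[i:i + max_tokens]
        # Decode each token window back to text; avoid reusing the function's own name here.
        chunks.append(enc.decode(chunk_tokens))

    return chunks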

chunked_blogs = []
for blog in local_blogs:
    chunks = chunk_text(blog["content"])
    for chunk in chunks:
        chunked_blogs.append({
            "content": chunk
        })

start_time = time.time()
with weaviate_client.batch.dynamic() as batch:
    for blog_chunk in chunked_blogs:
        batch.add_object(
            collection="Blogs",
            properties={
                "content": blog_chunk["content"],
            }
        )
end_time = time.time()
upload_time = end_time - start_time

print(f"Successfully imported {len(chunked_blogs)} blog chunks into Weaviate.")
print(f"Upload time: {upload_time:.2f} seconds")


Successfully imported 1463 blog chunks into Weaviate.
Upload time: 8.76 seconds
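
The chunking step above uses fixed-size, non-overlapping token windows. The sketch below isolates that windowing logic so it can be run without `tiktoken`; the integer list is a stand-in for real token ids, and `chunk_tokens` is a hypothetical helper name, not part of the notebook.

```python
def chunk_tokens(tokens, max_tokens=300):
    """Split a token list into consecutive windows of at most max_tokens items."""
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


# Stand-in for tiktoken output: ten integer "tokens" instead of real BPE ids.
tokens = list(range(10))
chunks = chunk_tokens(tokens, max_tokens=4)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because the windows ignore sentence and markdown boundaries, a chunk can start or end mid-sentence or mid-link, which is visible in the sample chunk printed later in this notebook.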


### 2. Predict Questions per Chunk

In [25]:
from weaviate.agents.transformation import TransformationAgent
from weaviate.agents.classes import Operations
from weaviate.collections.classes.config import DataType

create_questions = Operations.append_property(
    property_name="predicted_user_queries",
    data_type=DataType.TEXT_ARRAY,
    view_properties=["content"],
    instruction="Based on this document, generate as many specific questions as possible that users might ask about this content. Focus on questions that would require reasoning about the information presented.",
)

agent = TransformationAgent(
    client=weaviate_client,
    collection="Blogs",
    operations=[create_questions],
)

response = agent.update_all()

In [27]:
agent.get_status(workflow_id=response.workflow_id)

{'workflow_id': 'TransformationWorkflow-aa4204ddb0b56291823cb50cf8be9ff1',
 'status': {'batch_count': 6,
  'end_time': '2025-04-27 17:50:57',
  'start_time': '2025-04-27 17:48:13',
  'state': 'completed',
  'total_duration': 164.486092,
  'total_items': 1463}}
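
`update_all()` returns before the workflow finishes, so `get_status` has to be called again until the state reads `completed`. Below is a minimal polling sketch, assuming the status dictionary has the shape shown above. `wait_for_workflow` is a hypothetical helper, `"completed"` is the only state name confirmed by this notebook, and the `"running"` state in the offline demo is made up for illustration.

```python
import time


def wait_for_workflow(get_status, workflow_id, poll_seconds=10.0,
                      timeout_seconds=3600.0, terminal_states=("completed",)):
    """Poll get_status(workflow_id=...) until the workflow reaches a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(workflow_id=workflow_id)
        if status["status"]["state"] in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Workflow {workflow_id} did not finish in {timeout_seconds}s")
        time.sleep(poll_seconds)


# Offline demo with a fake status function; "running" is a made-up state name.
_fake_states = iter(["running", "running", "completed"])


def fake_get_status(workflow_id):
    return {"workflow_id": workflow_id, "status": {"state": next(_fake_states)}}


final_status = wait_for_workflow(fake_get_status, "demo-workflow", poll_seconds=0)
print(final_status["status"]["state"])  # completed
```

In the notebook, the same helper could be called as `wait_for_workflow(agent.get_status, response.workflow_id)`. If the service reports other terminal states (for example, a failure state), they would need to be added to `terminal_states`.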

In [28]:
blogs_collection = weaviate_client.collections.get("Blogs")

response = blogs_collection.query.fetch_objects(limit=1)

# Extract and print content and predicted_user_queries in a readable format
for obj in response.objects:
    print("Content:")
    print(obj.properties.get("content"))
    print("\nPredicted User Queries:")
    for i, query in enumerate(obj.properties.get("predicted_user_queries", []), 1):
        print(f"{i}. {query}")

Content:
refs/schema/vector-index#hnsw-configuration-tips) to help with your use case!

### Text Search Configuration
Weaviate lets you tune how each property is indexed, or if they’re indexed at all! By skipping the indexes, you speed up the insert time and reduce the memory for indexing.

In this demo, we determined that search performance could be improved by exactly matching a particular `title` value. This requires creating an inverted index to find text matches on the `title` property. This is denoted in your schema by setting `indexFilterable` to `True`. We further want to use BM25 scoring on the title, so we create an inverted index for keyword scoring as well. This is achieved by setting `indexSearchable` to `True`. However, for the `url` and `content` properties, although we want the data stored, we do not need an index, so we turn it off by setting both `indexFilterable` and `indexSearchable` to `False`.

### Batch Imports
When importing data into Weaviate, we suggest using 

#### Inspect the results of the Transformation Agent with the Query Agent! ♻️

In [30]:
from weaviate.agents.query import QueryAgent
from weaviate.agents.utils import print_query_agent_response

qa = QueryAgent(
    client=weaviate_client, collections=["Blogs"]
)

response = qa.run("Can you analyze the predicted queries derived from the content? What are some common trends?")
print_query_agent_response(response)





### 3. Create a Unique Object for Each Predicted Query

In [31]:
from weaviate.classes.config import Configure, Property, DataType

# Start from a clean EnhancedBlogs collection with one object per <chunk, predicted_user_query> pair.
if weaviate_client.collections.exists("EnhancedBlogs"):
    weaviate_client.collections.delete("EnhancedBlogs")
enhanced_blogs = weaviate_client.collections.create(
    name="EnhancedBlogs",
    vectorizer_config=Configure.Vectorizer.text2vec_weaviate(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="predicted_user_query", data_type=DataType.TEXT),
    ],
)

In [32]:
import time

total_objects_created = 0
skipped_objects = 0
objects_to_create = []
objects_with_queries = []

start_time = time.time()

# Collect data phase
print("Starting data collection...")
collection_start = time.time()
for blog in blogs_collection.iterator():
    content = blog.properties.get("content")
    predicted_queries = blog.properties.get("predicted_user_queries", [])
    
    if predicted_queries is None:
        predicted_queries = []
        skipped_objects += 1
    
    # For each predicted query, create a new object in EnhancedBlogs collection
    for query in predicted_queries:
        objects_to_create.append({
            "content": content,
            "predicted_user_query": query
        })
        total_objects_created += 1
    
    # Keep track of source objects whose predicted_user_queries should be removed
    if predicted_queries:
        objects_with_queries.append({
            "uuid": blog.uuid,
            "properties": {k: v for k, v in blog.properties.items() if k != "predicted_user_queries"}
        })
collection_end = time.time()
print(f"Data collection completed in {collection_end - collection_start:.2f} seconds")
print(f"Found {len(objects_to_create)} objects to create and {len(objects_with_queries)} objects to update")

# Batch insert phase for EnhancedBlogs
print("Starting batch insert for EnhancedBlogs...")
batch_start = time.time()
with weaviate_client.batch.dynamic() as batch:
    for obj in objects_to_create:
        batch.add_object(
            collection="EnhancedBlogs",
            properties=obj
        )
batch_end = time.time()
print(f"Batch insert completed in {batch_end - batch_start:.2f} seconds")
print(f"Inserted {len(objects_to_create)} objects into EnhancedBlogs collection")

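# Batch update phase for Blogs collection: each object is re-added under its existing UUID
# without predicted_user_queries, which is intended to overwrite the stored copy.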
print("Starting batch update for Blogs collection...")
update_start = time.time()
with weaviate_client.batch.dynamic() as batch:
    for obj in objects_with_queries:
        batch.add_object(
            collection="Blogs",
            uuid=obj["uuid"],
            properties=obj["properties"]
        )
update_end = time.time()
print(f"Batch update completed in {update_end - update_start:.2f} seconds")
print(f"Updated {len(objects_with_queries)} objects in Blogs collection")

total_time = time.time() - start_time
print(f"Total execution time: {total_time:.2f} seconds")
print(f"Created {total_objects_created} new objects with individual predicted queries in EnhancedBlogs collection")
print(f"Skipped {skipped_objects} objects with no predicted queries")
print(f"Removed predicted_user_queries property from {len(objects_with_queries)} source objects in Blogs collection")

Starting data collection...
Data collection completed in 1.03 seconds
Found 17933 objects to create and 1463 objects to update
Starting batch insert for EnhancedBlogs...
Batch insert completed in 79.05 seconds
Inserted 17933 objects into EnhancedBlogs collection
Starting batch update for Blogs collection...
Batch update completed in 7.19 seconds
Updated 1463 objects in Blogs collection
Total execution time: 87.27 seconds
Created 17933 new objects with individual predicted queries in EnhancedBlogs collection
Skipped 0 objects with no predicted queries
Removed predicted_user_queries property from 1463 source objects in Blogs collection
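
The data collection phase above is a fan-out: each chunk with N predicted queries becomes N `<chunk, predicted_user_query>` objects. The sketch below reproduces that step on toy data without a Weaviate connection; `explode_queries` and the sample records are hypothetical and only illustrate the shape of the transformation.

```python
def explode_queries(chunks):
    """Turn each chunk with N predicted queries into N (content, query) records."""
    pairs = []
    for chunk in chunks:
        # Treat a missing or None query list as empty, as the notebook cell does.
        for query in chunk.get("predicted_user_queries") or []:
            pairs.append({"content": chunk["content"], "predicted_user_query": query})
    return pairs


toy_chunks = [
    {"content": "A", "predicted_user_queries": ["q1", "q2"]},
    {"content": "B", "predicted_user_queries": None},
    {"content": "C", "predicted_user_queries": ["q3"]},
]
pairs = explode_queries(toy_chunks)
print(len(pairs))  # 3
```

Note that the original `content` is duplicated once per query, which is why 1,463 chunks grow into 17,933 objects.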


### 4. Create Enhanced Content by Reasoning about the Predicted Query and the Original Content

In [33]:
create_enhanced_context = Operations.append_property(
    property_name="enhanced_content",
    data_type=DataType.TEXT,
    view_properties=["content", "predicted_user_query"],
    instruction="""
    Your task is to enhance this document to make it more useful for future queries.
    
    1. Identify the key concepts, entities, relationships, and facts in the original content
    2. Draw important inferences that might be useful for answering questions about this content
    3. Reorganize and augment the information to make it more accessible for future retrieval
    4. Pre-compute potentially useful calculations, summaries, or analyses
    5. Specifically address the predicted user query
    
    Create a comprehensive enhanced version of this context that includes:
    - All the factual information from the original content
    - Explicit connections between related concepts
    - Inferences and implications that might be useful for answering questions
    - Structured information that makes key relationships clear
    - Direct answers to the predicted user query
    
    The enhanced context should be self-contained and complete enough that when a user asks a question about this content, 
    the model can quickly provide an accurate answer by referencing this enhanced context without needing to perform extensive 
    additional reasoning. Make sure the enhanced content directly addresses the predicted user query.
    """
)

agent = TransformationAgent(
    client=weaviate_client,
    collection="EnhancedBlogs",
    operations=[create_enhanced_context],
)

response = agent.update_all()

In [34]:
agent.get_status(workflow_id=response.workflow_id)

{'workflow_id': 'TransformationWorkflow-1f5817a983af680f4830de8801cd8b2b',
 'status': {'batch_count': 72,
  'end_time': '2025-04-27 18:06:36',
  'start_time': '2025-04-27 17:54:42',
  'state': 'completed',
  'total_duration': 714.313995,
  'total_items': 17933}}
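
The two status reports can be turned into comparable figures with a little arithmetic. The sketch below uses only the `total_duration` and `total_items` values copied from the outputs in this notebook; `summarize` is a hypothetical helper, and the assumption that `total_duration` is measured in seconds comes from comparing it with the reported start and end times.

```python
def summarize(status):
    """Convert a workflow status dict into minutes and items processed per second."""
    details = status["status"]
    seconds = details["total_duration"]  # assumed to be seconds
    return {
        "minutes": round(seconds / 60, 1),
        "items_per_second": round(details["total_items"] / seconds, 1),
    }


# Values copied from the two get_status outputs in this notebook.
question_run = {"status": {"total_duration": 164.486092, "total_items": 1463}}
enhance_run = {"status": {"total_duration": 714.313995, "total_items": 17933}}

print(summarize(question_run))  # {'minutes': 2.7, 'items_per_second': 8.9}
print(summarize(enhance_run))   # {'minutes': 11.9, 'items_per_second': 25.1}
print(round(17933 / 1463, 1))   # 12.3 predicted queries per chunk on average
```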

In [53]:
qa = QueryAgent(
    client=weaviate_client, collections=["EnhancedBlogs"]
)

query = """
Can you find an example where the enhanced content does a great job of cleaning up the original content based on the predicted user query?
"""

response = qa.run(query)
print_query_agent_response(response)





In [54]:
query = """
How many null entries are there in enhanced_content?
"""

response = qa.run(query)
print_query_agent_response(response)





In [63]:
enhanced_blogs_collection = weaviate_client.collections.get("EnhancedBlogs")

response = enhanced_blogs_collection.query.hybrid(
    query="How does the Weaviate Transformation Agent work?",
    limit=1
)

# Pretty print the response
print("QueryReturn with", len(response.objects), "objects:")
for i, obj in enumerate(response.objects):
    print(f"\nObject {i+1}:")
    print(f"  UUID: {obj.uuid}")
    
    # Print properties with better formatting
    print("  Properties:")
    if 'content' in obj.properties:
        content = obj.properties['content']
        print(f"    content: {content}")
    
    if 'enhanced_content' in obj.properties:
        enhanced = obj.properties['enhanced_content']
        print(f"    enhanced_content: {enhanced}")
    
    if 'predicted_user_query' in obj.properties and obj.properties['predicted_user_query']:
        print(f"    predicted_user_query: {obj.properties['predicted_user_query']}")

QueryReturn with 1 objects:

Object 1:
  UUID: 15b591a9-eaa3-4989-bf9a-ed469a4a045f
  Properties:
    content: /weaviate/recipes/blob/main/weaviate-services/agents/transformation-agent-get-started.ipynb#scrollTo=Uiu5C8n7v-Xp) that you can use to get started.

⚠️ Since this service is currently in preview, please do not demo it on production collections.
:::

## What is the Transformation Agent

This is our first step into a future of database management where we can start to leave the tedious task of designing updates and changes for our database to an LLM.

You can configure the `TransformationAgent` to access any given collection in Weaviate Cloud and provide it with a list of transformation operations you’d like to perform on it.

For example, think of a scenario where you may have quarterly reports from teams in your company in a collection. With the transformation agent, you can define new properties such as “team” or “quarter” with the instructions “Based on the contents of the r