# Vector playground - Pre-vectorized data

Learn how to work with pre-computed vectors in Weaviate. This notebook shows how to create collections without vectorizers and insert your own embeddings.

## Connect to Weaviate

Connect to the Weaviate instance at the stored IP address, passing the AWS credentials as request headers.

In [1]:
# Refresh credentials & load the Weaviate IP
from helpers import update_creds

AWS_ACCESS_KEY, AWS_SECRET_KEY, AWS_SESSION_TOKEN = update_creds()

%store -r WEAVIATE_IP

In [2]:
import weaviate

client = weaviate.connect_to_local(
    host=WEAVIATE_IP,
    headers={
        "X-AWS-Access-Key": AWS_ACCESS_KEY,
        "X-AWS-Secret-Key": AWS_SECRET_KEY,
        "X-AWS-Session-Token": AWS_SESSION_TOKEN,
    },
)

client.is_ready()

True

## Create a collection with no vectorizer

When working with pre-computed vectors, configure the collection with `Configure.Vectors.self_provided()` so that Weaviate does not run an automatic vectorizer.

[Docs - Vector configuration](https://weaviate.io/developers/weaviate/config-refs/schema/vector-index)

In [3]:
from weaviate.classes.config import Configure, VectorDistances

# Delete the collection if it already exists
if client.collections.exists("PlaygroundCollection"):
    client.collections.delete("PlaygroundCollection")

client.collections.create(
    name="PlaygroundCollection",
    vector_config=Configure.Vectors.self_provided(
        vector_index_config=Configure.VectorIndex.hnsw(
            distance_metric=VectorDistances.COSINE  # Choose distance metric
        ),
    )
)

print("Successfully created collection: PlaygroundCollection")

Successfully created collection: PlaygroundCollection
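
The distance metric chosen above determines how every later search ranks results. As a rough illustration in plain Python (not Weaviate's implementation; the function names are hypothetical), the sketch below compares cosine distance with squared Euclidean distance on two vectors that point in the same direction but differ in length. Cosine distance ignores magnitude, so it treats the two as essentially identical, while the squared Euclidean distance does not.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l2_squared_distance(a, b):
    """Return the squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

base = [0.1] * 6
scaled = [0.5] * 6  # same direction as base, five times the length

print(f"cosine:     {cosine_distance(base, scaled):.6f}")
print(f"l2-squared: {l2_squared_distance(base, scaled):.6f}")
```

If the magnitude of your embeddings carries meaning, a metric other than cosine may be a better fit.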


## Insert an object with a vector

Add a single object with its pre-computed vector embedding.

In [4]:
playground = client.collections.use("PlaygroundCollection")

playground.data.insert(
    properties={
        "title": "First Document",
        "content": "This is a sample document about machine learning.",
        "category": "technology"
    },
    vector=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
)

UUID('0e77c825-35d9-4988-81ef-8f20afb370ab')

In [5]:
# Verify the object was inserted with its vector
response = playground.query.fetch_objects(include_vector=True, limit=1)

print("Properties:", response.objects[0].properties)
print("Vector:", response.objects[0].vector)

Properties: {'title': 'First Document', 'content': 'This is a sample document about machine learning.', 'category': 'technology'}
Vector: {'default': [0.10000000149011612, 0.10000000149011612, 0.10000000149011612, 0.10000000149011612, 0.10000000149011612, 0.10000000149011612]}
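
The vector comes back as `0.10000000149011612` rather than `0.1`. This is consistent with the values being stored as 32-bit floats and then converted back to Python's 64-bit floats. A small sketch using only the standard library reproduces the effect; `to_float32` is a hypothetical helper name, not part of the Weaviate client.

```python
import struct

def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

print(to_float32(0.1))  # 0.10000000149011612
```

When comparing returned vectors against the originals, use a tolerance (for example `math.isclose`) instead of exact equality.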


## Insert many objects with vectors using batch

Use batch processing for efficient bulk insertion of pre-vectorized data.

In [6]:
# Sample data with pre-computed vectors
sample_data = [
    {
        "title": "AI Research Paper",
        "content": "Deep learning advances in computer vision applications.",
        "category": "research",
        "vector": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    },
    {
        "title": "Financial Analysis",
        "content": "Market trends and economic indicators for Q4.",
        "category": "finance",
        "vector": [0.3, 0.1, -0.1, -0.3, -0.5, -0.7]
    },
    {
        "title": "Cloud Computing Guide",
        "content": "Best practices for AWS infrastructure deployment.",
        "category": "technology",
        "vector": [0.4, 0.41, 0.42, 0.43, 0.44, 0.45]
    },
    {
        "title": "Healthcare Innovation",
        "content": "Digital transformation in medical diagnosis systems.",
        "category": "healthcare",
        "vector": [0.5, 0.5, 0, 0, 0, 0]
    },
]

print(f"Sample data prepared: {len(sample_data)} documents")

Sample data prepared: 4 documents
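
All vectors in one vector index must have the same number of dimensions, so it can be worth checking pre-computed data before importing it. The following is a minimal sketch of such a check; `check_dimensions` and the `docs` list are hypothetical and use the same dictionary layout as the sample data.

```python
def check_dimensions(items, expected_dim=None):
    """Return the shared vector length, or raise ValueError on a mismatch."""
    for index, item in enumerate(items):
        dim = len(item["vector"])
        if expected_dim is None:
            expected_dim = dim  # the first vector sets the expected length
        elif dim != expected_dim:
            raise ValueError(
                f"item {index} ({item['title']!r}) has {dim} dimensions, "
                f"expected {expected_dim}"
            )
    return expected_dim

docs = [
    {"title": "AI Research Paper", "vector": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]},
    {"title": "Healthcare Innovation", "vector": [0.5, 0.5, 0, 0, 0, 0]},
]

print(check_dimensions(docs, expected_dim=6))  # 6
```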


In [7]:
# Batch insert the sample data
with playground.batch.dynamic() as batch:
    for item in sample_data:
        batch.add_object(
            properties={
                "title": item["title"],
                "content": item["content"],
                "category": item["category"],
            },
            vector=item["vector"]
        )

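# Report any objects that failed to import
if playground.batch.failed_objects:
    print(f"Failed imports: {len(playground.batch.failed_objects)}")

print(f"Total objects in collection: {len(playground)}")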

Total objects in collection: 5


## Vector search queries

Available query types when working with vector embeddings:

1. [near_vector](https://weaviate.io/developers/weaviate/search/similarity#search-with-a-vector) - Search with a query vector
2. [near_object](https://weaviate.io/developers/weaviate/search/similarity#search-with-an-existing-object) - Search with an existing object

### Near vector search

Search using a query vector to find similar documents.

In [8]:
# Basic vector search
response = playground.query.near_vector(
    near_vector=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    limit=3,
)

print("Basic vector search results:")
for item in response.objects:
    print(f"- {item.properties['title']} ({item.properties['category']})")
    print(f"  UUID: {item.uuid}\n")

Basic vector search results:
- First Document (technology)
  UUID: 0e77c825-35d9-4988-81ef-8f20afb370ab



In [9]:
# Vector search with distance and vector output
from weaviate.classes.query import MetadataQuery

response = playground.query.near_vector(
    near_vector=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    include_vector=True,
    return_metadata=MetadataQuery(distance=True),
    limit=2,
)

print("Vector search with metadata:")
for item in response.objects:
    print(f"Title: {item.properties['title']}")
    print(f"Distance: {item.metadata.distance:.4f}")
    print(f"Vector: {item.vector}\n")

Vector search with metadata:
Title: First Document
Distance: 0.1013
Vector: {'default': [0.10000000149011612, 0.10000000149011612, 0.10000000149011612, 0.10000000149011612, 0.10000000149011612, 0.10000000149011612]}
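
The reported distance can be checked by hand. With the cosine metric, the distance is one minus the cosine similarity of the query vector and the stored vector. The sketch below recomputes it in plain Python for the query and the "First Document" vector; it is an independent check, not how Weaviate computes distances internally.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
first_document = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

print(f"{cosine_distance(query, first_document):.4f}")  # 0.1013
```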



### Vector search with filters

Combine vector similarity with property-based filtering.

In [10]:
from weaviate.classes.query import Filter, MetadataQuery

# Search for technology-related documents only
response = playground.query.near_vector(
    near_vector=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    return_metadata=MetadataQuery(distance=True),
    filters=Filter.by_property("category").equal("technology"),
    limit=5,
)

print("Technology documents (filtered search):")
for item in response.objects:
    print(f"- {item.properties['title']}")
    print(f"  Category: {item.properties['category']}")
    print(f"  Distance: {item.metadata.distance:.4f}\n")

Technology documents (filtered search):
- First Document
  Category: technology
  Distance: 0.1013



### Near object search

Find similar objects using an existing object as reference.

> Note: The reference object is typically returned as the first result, since its distance to itself is 0.

In [11]:
# Get an object ID to use as reference
reference_response = playground.query.fetch_objects(limit=1)
reference_uuid = reference_response.objects[0].uuid

print(f"Using reference object: {reference_response.objects[0].properties['title']}")
print(f"Reference UUID: {reference_uuid}\n")

Using reference object: Financial Analysis
Reference UUID: 0dc498aa-c846-4639-b410-5598063ee573



In [12]:
from weaviate.classes.query import MetadataQuery

# Find objects similar to the reference object
response = playground.query.near_object(
    near_object=reference_uuid,
    return_metadata=MetadataQuery(distance=True),
    limit=4,
)

print("Objects similar to reference:")
for i, item in enumerate(response.objects):
    # Mark the reference object by UUID rather than by position
    marker = "(reference)" if item.uuid == reference_uuid else ""
    print(f"{i+1}. {item.properties['title']} {marker}")
    print(f"   Category: {item.properties['category']}")
    print(f"   Distance: {item.metadata.distance:.4f}\n")

Objects similar to reference:
1. First Document
   Category: technology
   Distance: 1.5053
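
Cosine distance ranges from 0 (same direction) to 2 (opposite direction), so a value above 1 means the two vectors point in broadly opposing directions. That fits here, because the "Financial Analysis" vector has mostly negative components while the "First Document" vector is all positive. As a rough aid to interpretation, the sketch below converts a cosine distance into an angle; `angle_from_cosine_distance` is a hypothetical helper.

```python
import math

def angle_from_cosine_distance(distance):
    """Convert a cosine distance (1 - cos(theta)) to an angle in degrees."""
    cosine = 1.0 - distance
    cosine = max(-1.0, min(1.0, cosine))  # clamp against rounding error
    return math.degrees(math.acos(cosine))

for distance in (0.0, 1.0, 1.5053, 2.0):
    print(f"distance {distance:.4f} -> {angle_from_cosine_distance(distance):.1f} degrees")
```

A distance of 1.0 corresponds to orthogonal vectors (90 degrees), and the 1.5053 reported above corresponds to an angle slightly over 120 degrees.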



## Data exploration

List the collection contents to confirm that every object was inserted with the expected properties.

In [13]:
# Show all documents in the collection
response = playground.query.fetch_objects(limit=10)

print(f"All documents in PlaygroundCollection ({len(response.objects)} total):")
for item in response.objects:
    print(f"- {item.properties['title']} [{item.properties['category']}]")
    print(f"  Content: {item.properties['content'][:50]}...")
    print()

All documents in PlaygroundCollection (5 total):
- Financial Analysis [finance]
  Content: Market trends and economic indicators for Q4....

- First Document [technology]
  Content: This is a sample document about machine learning....

- Cloud Computing Guide [technology]
  Content: Best practices for AWS infrastructure deployment....

- Healthcare Innovation [healthcare]
  Content: Digital transformation in medical diagnosis system...

- AI Research Paper [research]
  Content: Deep learning advances in computer vision applicat...



## Close the client

Always close your connection when finished.

In [14]:
client.close()