## Weaviate 1.32 enablement session

### Agenda

- Collection aliases
- Replica movement
- Rotational quantization
- HNSW connection compression
- Python: Vectorizer DX refactor
- RBAC updates
- New models

### Prep

In [1]:
import weaviate
import os

client = weaviate.connect_to_local(
    headers={
        "X-Cohere-Api-Key": os.getenv("COHERE_API_KEY"),
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"),
    }
)

client.is_ready()

True

In [2]:
client.collections.delete(["MoviesConfigA", "MoviesConfigB", "Movies"])
client.alias.delete(alias_name="Movies")

True

### Collection aliases


<small>Note to self - remember to change the Python client branch


uv pip install -U git+https://github.com/weaviate/weaviate-python-client@alias
</small>

#### How things work now

In [3]:
from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "MoviesConfigA",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
    vectorizer_config=[
        Configure.NamedVectors.text2vec_cohere(name="default"),
    ],
    generative_config=Configure.Generative.cohere()
)

<weaviate.collections.collection.sync.Collection at 0x104237b50>

Import some objects

In [4]:
objects = [
    {
        "title": "Tropic Thunder",
        "body": "When their frustrated director (Steve Coogan) drops them in the middle of a jungle and dies in an accident, they are forced to rely on their acting skills to survive the real action and danger."
    },
    {
        "title": "Dodgeball",
        "body": "A group of unlikely misfits enter a Las Vegas dodgeball tournament, needing the prize money to save their cherished gym from being taken over by a corporate health fitness chain."
    },
    {
        "title": "Zoolander",
        "body": "Derek Zoolander (Stiller) is tricked by fashion mogul Jacobim Mugatu (Will Ferrell) into assassinating the Prime Minister of Malaysia, whose progressive laws on the fashion industry would harm his businesses.",
    },
    {
        "title": "Blades of Glory",
        "body": "A mismatched pair of banned figure skaters become teammates upon discovering a loophole that will allow them to compete in the sport again."
    }
]

In [5]:
movies_a = client.collections.get("MoviesConfigA")

movies_a.data.insert_many(objects=objects)

BatchObjectReturn(_all_responses=[UUID('d1422f2d-3ef1-4bc8-9e18-3099ed6a24cf'), UUID('ce711cab-ad11-4a90-a577-314f92081650'), UUID('c3d712c7-6337-4cde-91a8-b203d8c4205e'), UUID('2ac5ae03-16b0-49f8-a582-5b6e8825021e')], elapsed_seconds=0.25997209548950195, errors={}, uuids={0: UUID('d1422f2d-3ef1-4bc8-9e18-3099ed6a24cf'), 1: UUID('ce711cab-ad11-4a90-a577-314f92081650'), 2: UUID('c3d712c7-6337-4cde-91a8-b203d8c4205e'), 3: UUID('2ac5ae03-16b0-49f8-a582-5b6e8825021e')}, has_errors=False)

What we've configured so far:

![img/1_32_aliases/1_collection_before.png](img/1_32_aliases/1_collection_before.png)

We can now perform queries

In [6]:
r = movies_a.query.hybrid(
    query="sports comedy",
    limit=2
)

for o in r.objects:
    print(o.properties["title"])

Dodgeball
Blades of Glory


#### Introducing: Aliases

Let's see how to create an alias for the collection we just created.

In [7]:
# ADD YOUR CODE HERE
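One possible way to complete the cell above. This sketch assumes the `client.alias` API from the `alias` client branch installed earlier, with `create(alias_name=..., target_collection=...)` and `get(alias_name=...)`. The helper name `create_movies_alias` is hypothetical. It points the alias `Movies` at `MoviesConfigA` and reads it back.

```python
def create_movies_alias(client, alias_name="Movies", collection_name="MoviesConfigA"):
    """Create an alias so that queries against `alias_name` resolve to `collection_name`.

    Assumes `client` is a connected Weaviate client that exposes `client.alias`.
    """
    client.alias.create(alias_name=alias_name, target_collection=collection_name)
    # Read the alias back to confirm which collection it points at
    return client.alias.get(alias_name=alias_name)


# Usage with the `client` from the Prep section:
# print(create_movies_alias(client))
```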

In [8]:
movies = client.collections.get("Movies")  # <-- Treat the alias as a collection

r = movies.query.hybrid(
    query="sports comedy",
    limit=2
)

for o in r.objects:
    print(o.properties["title"])

Dodgeball
Blades of Glory


This sets up:

![img/1_32_aliases/2_collection_with_alias.png](img/1_32_aliases/2_collection_with_alias.png)

#### Why aliases, though?

Now imagine that the desired configuration of the collection changes.

You'd have to set up a new collection, like this:

In [9]:
from weaviate.classes.config import Configure, Property, DataType

# Create a new collection with your desired config
# ADD YOUR CODE HERE

<weaviate.collections.collection.sync.Collection at 0x117329c90>
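One possible way to complete the cell above. This assumes the new configuration is the `MoviesConfigA` setup plus the `rating` and `release_year` properties that appear in the re-imported objects below. The helper name is hypothetical, and the imports sit inside the function only to keep it self-contained.

```python
def create_movies_config_b(client, name="MoviesConfigB"):
    """Create the 'v2' collection: same vectorizer and generative setup, plus two new properties."""
    from weaviate.classes.config import Configure, Property, DataType

    return client.collections.create(
        name,
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="body", data_type=DataType.TEXT),
            # New in this version of the config
            Property(name="rating", data_type=DataType.NUMBER),
            Property(name="release_year", data_type=DataType.INT),
        ],
        vectorizer_config=[
            Configure.NamedVectors.text2vec_cohere(name="default"),
        ],
        generative_config=Configure.Generative.cohere(),
    )
```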

Re-import data

In [10]:
objects = [
    {
        "title": "Tropic Thunder",
        "body": "When their frustrated director (Steve Coogan) drops them in the middle of a jungle and dies in an accident, they are forced to rely on their acting skills to survive the real action and danger.",
        "rating": 9.0,
        "release_year": 2008
    },
    {
        "title": "Dodgeball",
        "body": "A group of unlikely misfits enter a Las Vegas dodgeball tournament, needing the prize money to save their cherished gym from being taken over by a corporate health fitness chain.",
        "rating": 8.5,
        "release_year": 2004
    },
    {
        "title": "Zoolander",
        "body": "Derek Zoolander (Stiller) is tricked by fashion mogul Jacobim Mugatu (Will Ferrell) into assassinating the Prime Minister of Malaysia, whose progressive laws on the fashion industry would harm his businesses.",
        "rating": 7.5,
        "release_year": 2001
    },
    {
        "title": "Blades of Glory",
        "body": "A mismatched pair of banned figure skaters become teammates upon discovering a loophole that will allow them to compete in the sport again.",
        "rating": 8.0,
        "release_year": 2007
    }
]

In [11]:
movies_b = client.collections.get("MoviesConfigB")

movies_b.data.insert_many(objects=objects)

BatchObjectReturn(_all_responses=[UUID('a0b0adc9-5407-4618-be1b-c599f44626c5'), UUID('7ac44a76-4aa0-4c99-b06e-9a158bf69757'), UUID('f2602d38-768a-4a12-86b5-18ee28eaecd7'), UUID('6a2acd98-317d-4c4f-b4a2-38a3a7c9bec5')], elapsed_seconds=0.18184709548950195, errors={}, uuids={0: UUID('a0b0adc9-5407-4618-be1b-c599f44626c5'), 1: UUID('7ac44a76-4aa0-4c99-b06e-9a158bf69757'), 2: UUID('f2602d38-768a-4a12-86b5-18ee28eaecd7'), 3: UUID('6a2acd98-317d-4c4f-b4a2-38a3a7c9bec5')}, has_errors=False)

Then, either:
- Switch the application code to use the new collection (e.g. `MoviesConfigB`), or
- Take the application offline while you delete the old collection, re-create it under the same name (e.g. `Movies`), and re-import the data

![img/1_32_aliases/3_collection_to_another.png](img/1_32_aliases/3_collection_to_another.png)

#### Aliases

Now, with aliases, this becomes super easy!

Just redirect the alias to the new collection:

In [12]:
# ADD YOUR CODE HERE

True
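One possible way to complete the cell above. This assumes `client.alias.update(alias_name=..., new_target_collection=...)` returns `True` on success, which matches the `True` printed above. The helper name `swap_alias` is hypothetical.

```python
def swap_alias(client, alias_name="Movies", new_collection="MoviesConfigB"):
    """Re-point an existing alias at a different collection; no data is moved."""
    updated = client.alias.update(
        alias_name=alias_name,
        new_target_collection=new_collection,
    )
    if not updated:
        raise RuntimeError(f"Alias {alias_name!r} was not updated")
    return updated


# Usage with the `client` from the Prep section:
# swap_alias(client)
```

Rolling back is the same call pointed at `MoviesConfigA`, as long as that collection has not been deleted yet.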

The switch is effectively instant because no data is copied or moved. The alias just points at a different collection, and the application code keeps using the same alias name.

![img/1_32_aliases/4_collection_alias_swap.png](img/1_32_aliases/4_collection_alias_swap.png)

We can now even delete the old collection, if we want to.

In [14]:
client.collections.delete("MoviesConfigA")

Queries work the same as before!

In [15]:
r = movies.query.hybrid(
    query="sports comedy",
    limit=2
)

for o in r.objects:
    print(o.properties["title"])

Dodgeball
Blades of Glory


But now we can take advantage of the new collection configuration (e.g. the added `release_year` data):

In [16]:
# ADD YOUR CODE HERE

Blades of Glory
Tropic Thunder
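One possible version of the cell above is a hybrid query through the alias with a filter on the new `release_year` property. The 2005 cutoff is an assumption. With the data imported above, it keeps exactly Tropic Thunder (2008) and Blades of Glory (2007). The helper name is hypothetical.

```python
def recent_sports_comedies(client, alias_name="Movies", min_year=2005, limit=2):
    """Hybrid search via the alias, restricted to movies released after `min_year`."""
    from weaviate.classes.query import Filter

    movies = client.collections.get(alias_name)  # the alias now resolves to MoviesConfigB
    response = movies.query.hybrid(
        query="sports comedy",
        filters=Filter.by_property("release_year").greater_than(min_year),
        limit=limit,
    )
    return [o.properties["title"] for o in response.objects]


# Usage with the `client` from the Prep section:
# for title in recent_sports_comedies(client):
#     print(title)
```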


You can inspect a single alias, or list all of them :)

In [17]:
# Get details of one alias
print(client.alias.get(alias_name="Movies"))

# Get details of all aliases
print(client.alias.list_all())

alias='Movies' collection='MoviesConfigB'
{'Movies': AliasReturn(alias='Movies', collection='MoviesConfigB')}


<style>
.admonition {
    padding: 10px;
    margin: 10px; 
    border-left: 4px solid;
    border-radius: 4px;
}
.note { border-color: #007acc; background-color: #f0f8ff; }
.warning { border-color: #ff6b6b; background-color: #fff5f5; }
</style>

<div class="admonition note">
<strong>Note:</strong> Collection aliases are in technical preview in Weaviate 1.32
</div>

### Shard replica movement


<small>Note to self - remember to change the Python client branch


uv pip install -U git+https://github.com/weaviate/weaviate-python-client@1.31/move-replication-under-cluster
</small>

#### What are shards?

![img/1_32_shard_movement/1_what_are_shards.png](img/1_32_shard_movement/1_what_are_shards.png)

#### Shards in a multi-node cluster

- S1, S2, etc. are shard replicas

![img/1_32_shard_movement/2_shards_before.png](img/1_32_shard_movement/2_shards_before.png)

#### Shard movement in a multi-node cluster

1. If some nodes are more heavily loaded than others
    - e.g. because some nodes hold more data, serve more queries, etc.

![img/1_32_shard_movement/3_shard_replica_movement.png](img/1_32_shard_movement/3_shard_replica_movement.png)

Move shards between nodes to balance the load

2. If the cluster is reaching its capacity
    - e.g. too much data or too many queries overall

![img/1_32_shard_movement/4_shard_replica_movement_to_new_node.png](img/1_32_shard_movement/4_shard_replica_movement_to_new_node.png)
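Both scenarios come down to deciding which replica to move, and where. The toy planner below is hypothetical and not part of Weaviate. It approximates load by replica count, while a real cluster would also weigh data size and query traffic. It greedily moves replicas from the busiest node to the idlest one and never places two replicas of the same shard on one node. Each planned move could then be carried out with `client.cluster.replicate(...)` as shown in the cells below.

```python
from collections import Counter


def plan_rebalance(placement, nodes):
    """Greedy replica-rebalancing plan.

    placement: dict of shard name -> set of node names holding a replica of it.
    nodes: list of all node names (may include a newly added, empty node).
    Returns a list of (shard, source_node, target_node) moves.
    """
    placement = {shard: set(replicas) for shard, replicas in placement.items()}  # don't mutate input
    moves = []
    while True:
        load = Counter({node: 0 for node in nodes})
        for replicas in placement.values():
            load.update(replicas)
        busiest = max(nodes, key=lambda n: load[n])
        idlest = min(nodes, key=lambda n: load[n])
        if load[busiest] - load[idlest] <= 1:
            return moves  # balanced to within one replica
        # Pick a shard on the busiest node that the idlest node does not hold yet
        candidate = next(
            (s for s, r in sorted(placement.items()) if busiest in r and idlest not in r),
            None,
        )
        if candidate is None:
            return moves
        placement[candidate].remove(busiest)
        placement[candidate].add(idlest)
        moves.append((candidate, busiest, idlest))


# Example: three shards replicated on node1/node2, plus a new empty node3
# plan_rebalance({"S1": {"node1", "node2"}, "S2": {"node1", "node2"}, "S3": {"node1", "node2"}},
#                ["node1", "node2", "node3"])
```

Each move strictly reduces the sum of squared node loads, so the loop always terminates.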

#### Replica movement syntax

In [None]:
collection_name = "CollectionWithReplicas"

collection_sharding_state = client.cluster.query_sharding_state(
    collection=collection_name
)
if collection_sharding_state and collection_sharding_state.shards:
    print(f"Shards in '{collection_name}': {[s.name for s in collection_sharding_state.shards]}")

In [None]:
# Copy a shard name from the output above ⬆️

shard_name = "tF9DtxC59ykC"
specific_shard_state = client.cluster.query_sharding_state(collection=collection_name, shard=shard_name)

if specific_shard_state and specific_shard_state.shards:
    print(f"Nodes for shard '{shard_name}': {specific_shard_state.shards[0].replicas}")

Initiate shard movement

In [None]:
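from weaviate.cluster.models import TransferType

# Example node names: replace with nodes from the replica list printed above
source_node_name = "node1"  # a node that currently holds a replica of the shard
target_node_name = "node2"  # a node that does not hold a replica of it yet
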
operation_id = client.cluster.replicate(
    collection=collection_name,
    shard=shard_name,
    source_node=source_node_name,
    target_node=target_node_name,
    transfer_type=TransferType.COPY,  # or TransferType.MOVE
)
print(f"Replication initiated, ID: {operation_id}")

Check operation status

In [None]:
op_status = client.cluster.replications.get(
    uuid=operation_id,
    # include_history=True
)
print(f"Status for {operation_id}: {op_status.status.state}")

Operations can be:
- Cancelled
- Deleted (cancel, then delete)

Can also list all operations.
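A minimal sketch of those operations. It assumes the client branch above exposes `client.cluster.replications.cancel(uuid=...)`, `.delete(uuid=...)` and `.list_all()` next to the `.get(uuid=...)` call used earlier, so check these names against your client version. The helper names are hypothetical. Only the `status.state` field, which the status cell above also uses, is read from each operation.

```python
def cancel_and_delete(client, operation_id):
    """Cancel a replication operation, then delete its record (cancel first, as noted above)."""
    client.cluster.replications.cancel(uuid=operation_id)
    client.cluster.replications.delete(uuid=operation_id)


def list_operation_states(client):
    """Return the current state of every replication operation in the cluster."""
    return [op.status.state for op in client.cluster.replications.list_all()]


# Usage with `client` and `operation_id` from the cells above:
# print(list_operation_states(client))
# cancel_and_delete(client, operation_id)
```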

### Rotational quantization

Rotational quantization (RQ) is a new quantization method for vectors. RQ:

- Does not require training
- Increases query speed
- Has negligible impact on search quality
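To build intuition for why no training step is needed, here is a toy version of the general idea: apply a random rotation, then scalar-quantize each coordinate to 8 bits using only that vector's own min/max. This is a simplified sketch and not Weaviate's implementation, which uses its own fast rotation and encoding. A rotation preserves dot products, and spreading each vector's values more evenly across coordinates helps simple per-coordinate quantization work without learning anything from the dataset.

```python
import math
import random


def random_rotation(dim, seed=0):
    """Random orthogonal matrix from Gram-Schmidt on Gaussian vectors (O(dim^3): toy sizes only)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove the component along each existing row
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip (astronomically unlikely) degenerate draws
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, vec):
    """Matrix-vector product; an orthogonal matrix preserves lengths and dot products."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def quantize(vec, bits=8):
    """Scalar-quantize one vector using only its own min/max, so no training data is needed."""
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels or 1.0  # guard against constant vectors
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def approx_dot(query_rot, codes, lo, step):
    """Dot product of a rotated float query with a quantized, rotated stored vector."""
    return sum(q * (lo + c * step) for q, c in zip(query_rot, codes))
```

Usage: rotate the stored vectors and the query with the same matrix, keep only the 8-bit codes (plus `lo` and `step`) for stored vectors, which is 4x smaller than float32, and score with `approx_dot`.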

In [None]:
from weaviate.classes.config import Configure, Property, DataType

collection_name = "Movies"

client.collections.delete(collection_name)

client.collections.create(
    collection_name,
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
    vectorizer_config=[
        Configure.NamedVectors.text2vec_cohere(
            name="default",
            vector_index_config=Configure.VectorIndex.hnsw(
                # ADD YOUR CODE HERE
            )
        )
    ]
)
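One possible way to complete the placeholder above. It mirrors the `quantizer=Configure.VectorIndex.Quantizer.rq()` syntax used later in this notebook. The helper name and the `"default"` vector name are illustrative assumptions.

```python
def create_movies_with_rq(client, collection_name="Movies"):
    """Re-create the collection with rotational quantization on its HNSW index."""
    from weaviate.classes.config import Configure, Property, DataType

    client.collections.delete(collection_name)  # start fresh, as in the cell above
    return client.collections.create(
        collection_name,
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="body", data_type=DataType.TEXT),
        ],
        vectorizer_config=[
            Configure.NamedVectors.text2vec_cohere(
                name="default",
                vector_index_config=Configure.VectorIndex.hnsw(
                    # Enable RQ: no training step or sample data required
                    quantizer=Configure.VectorIndex.Quantizer.rq()
                ),
            )
        ],
    )
```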

![img/1_32_rq/rq-test-results.png](img/1_32_rq/rq-test-results.png)


<div class="admonition note">
<strong>Note:</strong> Rotational quantization is in technical preview in Weaviate 1.32
</div>

Coming soon (1.33?): further quantization improvements

- Lower bit precision
- Nudges to make it easier to use quantization

### HNSW connection compression

![img/1_32_hnsw/hnsw-overall.png](img/1_32_hnsw/hnsw-overall.png)

#### Connections

![img/1_32_hnsw/hnsw-params.png](img/1_32_hnsw/hnsw-params.png)

HNSW stores lots of connections between nodes:
- Each node (vector) typically has ~32 to 64 connections
- With millions of vectors, these connections take up a lot of memory
    - Especially for multi-vector embeddings, where each object adds many vectors to the graph

**Connection compression reduces the memory footprint of these connections**
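A back-of-the-envelope estimate shows why this matters, followed by one generic way to shrink neighbour lists. Assumptions: each connection takes an 8-byte ID before compression, and only the base layer is counted. Delta + varint encoding is a common integer-list compression technique shown here for illustration only. Weaviate's actual connection format may differ. The sketch also assumes the order within a neighbour list does not matter, so it can be sorted.

```python
def connection_bytes(num_vectors, connections_per_node, bytes_per_id=8):
    """Uncompressed size of the neighbour lists (ignores list overhead and upper layers)."""
    return num_vectors * connections_per_node * bytes_per_id


def encode_neighbors(ids):
    """Delta + varint encode a neighbour list: sort, store gaps, 7 payload bits per byte."""
    out = bytearray()
    prev = 0
    for node_id in sorted(ids):
        gap = node_id - prev
        prev = node_id
        while True:
            byte = gap & 0x7F
            gap >>= 7
            if gap:
                out.append(byte | 0x80)  # high bit set: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)


def decode_neighbors(data):
    """Inverse of encode_neighbors; returns the sorted neighbour IDs."""
    ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value
            ids.append(prev)
            value, shift = 0, 0
    return ids


# 10M vectors x 64 connections x 8 bytes = 5,120,000,000 bytes (~5.1 GB) before compression
# connection_bytes(10_000_000, 64)
```

Small gaps between sorted IDs fit in one or two bytes instead of eight, which is where the savings come from.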

#### User interface for compressed connections

Nothing! 😜

Just works out of the box, from 1.32 onwards.

### Vectorizer refactor


<small>Note to self - remember to change the Python client branch


uv pip install -U git+https://github.com/weaviate/weaviate-python-client@vectors/deprecate-legacy-introduce-named-vectors-only-syntax
</small>

History of our vectorizer syntax:

In [None]:
from weaviate.classes.config import Configure, Property, DataType

collection_name = "Movies"

client.collections.delete(collection_name)

client.collections.create(
    collection_name,
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(
            name="body",
            data_type=DataType.TEXT,
            skip_vectorization=True
        ),  # <-- indicator to not vectorize in vectorizer
    ],
    # Vectorizer
    vectorizer_config=Configure.Vectorizer.text2vec_cohere(),
    # Vectorize all properties unless indicated above
    # One vector per object
    # Vector index config separate
    vector_index_config=Configure.VectorIndex.hnsw(
        quantizer=Configure.VectorIndex.Quantizer.rq()
    )
)

In [None]:
from weaviate.classes.config import Configure, Property, DataType

collection_name = "Movies"

client.collections.delete(collection_name)

client.collections.create(
    collection_name,
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
    # Named vectors
    vectorizer_config=[
        # Enable multiple vectors per object
        Configure.NamedVectors.text2vec_cohere(
            name="default",  # every named vector needs a name
            # Source properties (to vectorize) defined here
            source_properties=["title"],
            # Vector index config inside each named vector
            vector_index_config=Configure.VectorIndex.hnsw(
                quantizer=Configure.VectorIndex.Quantizer.rq()
            )
        )
        # Syntax got a bit longer over time
    ]
)

In [None]:
from weaviate.classes.config import Configure, Property, DataType

collection_name = "Movies"

client.collections.delete(collection_name)

client.collections.create(
    collection_name,
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
    # If defining one vector
    # ADD YOUR CODE HERE
)

In [None]:
from weaviate.classes.config import Configure, Property, DataType

collection_name = "Movies"

client.collections.delete(collection_name)

client.collections.create(
    collection_name,
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
    # If defining multiple vectors
    # ADD YOUR CODE HERE
)
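One possible way to complete the two placeholders above. It assumes a client version where `client.collections.create` accepts `vector_config` and `Configure.Vectors` is available, which is the syntax this branch introduces. A single config object defines one vector, and a list defines several. The helper names and the `title_vector` / `body_vector` names are hypothetical.

```python
def _movie_properties():
    from weaviate.classes.config import Property, DataType

    return [
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ]


def create_movies_single_vector(client, collection_name="Movies"):
    """One vector built from `title`, with RQ enabled on its HNSW index."""
    from weaviate.classes.config import Configure

    client.collections.delete(collection_name)
    return client.collections.create(
        collection_name,
        properties=_movie_properties(),
        # A single vector config object (no list needed)
        vector_config=Configure.Vectors.text2vec_cohere(
            name="default",
            source_properties=["title"],
            vector_index_config=Configure.VectorIndex.hnsw(
                quantizer=Configure.VectorIndex.Quantizer.rq()
            ),
        ),
    )


def create_movies_multi_vector(client, collection_name="Movies"):
    """Two named vectors, one per text property."""
    from weaviate.classes.config import Configure

    client.collections.delete(collection_name)
    return client.collections.create(
        collection_name,
        properties=_movie_properties(),
        # A list of vector configs, one entry per named vector
        vector_config=[
            Configure.Vectors.text2vec_cohere(name="title_vector", source_properties=["title"]),
            Configure.Vectors.text2vec_cohere(name="body_vector", source_properties=["body"]),
        ],
    )
```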

From there, usage is the same as before:

In [None]:
movies = client.collections.get(collection_name)

movies.data.insert_many(objects=objects)

In [None]:
response = movies.query.fetch_objects(
    limit=3,
    include_vector=True
)

for o in response.objects:
    print(f"\nObject ID: {o.uuid}")
    for k, v in o.vector.items():
        print(f"Vector {k}: {v[:3]}...")

### Others

#### RBAC

- Option to include RBAC roles and users in Weaviate backups

#### New model integrations

Support added for:

- Gemini embedding models ([Google docs](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api))
  - `gemini-embedding-001`
  - `text-embedding-005`
  - `text-multilingual-embedding-002`

- Transformers models
  - `intfloat/multilingual-e5-large`
  - `Qwen/Qwen3-Embedding-0.6B` 
  - `Qwen/Qwen3-Embedding-4B` 