![Introduction to Weaviate](./img/01-cover.png)

![Introduction to Weaviate](./img/02-about-weaviate.png)

### Agenda:

#### What you will see:

- Examples of AI-powered searches
- Create and build a vector database
- Search with a vector database
- Retrieval augmented generation (RAG)
- Scalability considerations

#### You will learn:

- About vector, keyword & hybrid searches
    - When to use each one
- How to perform RAG
- How to build a scalable vector DB

## Search: An Introduction

Try searches using this (pre-populated) toy dataset. 

```python
animal_objs = [
    {"description": "brown dog"},
    {"description": "small domestic black cat"},
    {"description": "orange cheetah"},
    {"description": "black bear"},
    {"description": "large white seagull"},
    {"description": "yellow canary"},
]
```

In [1]:
# ================================================================================
# Prep script: Just run the cell for now - don't worry about the details
# ================================================================================
#
# This script:
#     connects to Weaviate,
#     creates a collection,
#     and populates it with the above demo dataset
#
# ================================================================================


import weaviate
from weaviate.classes.config import Configure, Property, DataType

client = weaviate.connect_to_local()

# Delete any existing "Animals" collection so the demo starts from a clean slate
if client.collections.exists("Animals"):
    client.collections.delete("Animals")

animals = client.collections.create(
    name="Animals",
    properties=[
        Property(name="description", data_type=DataType.TEXT),
    ],
    vectorizer_config=[
        Configure.NamedVectors.text2vec_ollama(
            name="description",
            source_properties=["description"],
            api_endpoint="http://host.docker.internal:11434",  # If using Docker, use this to contact your local Ollama instance
            model="nomic-embed-text",  # The model to use, e.g. "nomic-embed-text"
        )
    ],
    generative_config=Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",  # If using Docker, use this to contact your local Ollama instance
        model="gemma2:2b"
    ),
)

animal_objs = [
    {"description": "brown dog"},
    {"description": "small domestic black cat"},
    {"description": "orange cheetah"},
    {"description": "black bear"},
    {"description": "large white seagull"},
    {"description": "yellow canary"},
]

animals.data.insert_many(animal_objs)

BatchObjectReturn(_all_responses=[UUID('36416803-a5c5-45ef-8de5-dfe619830c24'), UUID('2d1945c4-6b1b-408d-b0e1-d04fc17dc3ef'), UUID('f4f5b6c0-f2d5-4db0-925b-cfcd2d9f60cb'), UUID('b5ce2c1f-9c45-4e1d-95a9-dc9ac17a74ee'), UUID('34a2db49-607a-428d-92e5-0f4d2356a85e'), UUID('98980a42-9342-48d5-84d4-f3aee4bd3718')], elapsed_seconds=0.9282820224761963, errors={}, uuids={0: UUID('36416803-a5c5-45ef-8de5-dfe619830c24'), 1: UUID('2d1945c4-6b1b-408d-b0e1-d04fc17dc3ef'), 2: UUID('f4f5b6c0-f2d5-4db0-925b-cfcd2d9f60cb'), 3: UUID('b5ce2c1f-9c45-4e1d-95a9-dc9ac17a74ee'), 4: UUID('34a2db49-607a-428d-92e5-0f4d2356a85e'), 5: UUID('98980a42-9342-48d5-84d4-f3aee4bd3718')}, has_errors=False)

### Traditional search

In [2]:
query = "cat"

response = animals.query.bm25(query)

print(f"{len(response.objects)} results returned:")
for o in response.objects:
    print(o.properties)

1 results returned:
{'description': 'small domestic black cat'}


But traditional keyword searches are not very robust.

In [3]:
query = "kitty"  # Try synonyms or even typos

response = animals.query.bm25(query)

print(f"{len(response.objects)} results returned:")
for o in response.objects:
    print(o.properties)

0 results returned:

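Why does "kitty" return nothing? Keyword search only matches tokens that literally appear in the data. The sketch below reproduces that behavior on the toy dataset. It is a deliberately simplified stand-in that uses lowercase whitespace tokenization and set overlap. It is not Weaviate's BM25 implementation, and `keyword_match` is a hypothetical helper.

```python
# Toy dataset from the prep script above
animal_objs = [
    {"description": "brown dog"},
    {"description": "small domestic black cat"},
    {"description": "orange cheetah"},
    {"description": "black bear"},
    {"description": "large white seagull"},
    {"description": "yellow canary"},
]


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace (a simplification of real tokenizers)."""
    return set(text.lower().split())


def keyword_match(query: str, objects: list[dict]) -> list[str]:
    """Return descriptions that share at least one token with the query."""
    query_tokens = tokenize(query)
    return [
        obj["description"]
        for obj in objects
        if query_tokens & tokenize(obj["description"])
    ]


print(keyword_match("cat", animal_objs))    # the exact token "cat" exists in the data
print(keyword_match("kitty", animal_objs))  # no object contains the token "kitty"
```

A synonym or a typo shares no tokens with the data, so it cannot match, however close its meaning is.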

### Vector search

But vector search is based on similarity, allowing more forgiving, nuanced search:

In [4]:
query = "cat"

response = animals.query.near_text(query)

print(f"{len(response.objects)} results returned:")
for o in response.objects:
    print(o.properties)

6 results returned:
{'description': 'small domestic black cat'}
{'description': 'orange cheetah'}
{'description': 'yellow canary'}
{'description': 'black bear'}
{'description': 'large white seagull'}
{'description': 'brown dog'}


In [5]:
query = "kitty"  # Try synonyms or even typos

response = animals.query.near_text(query)

print(f"{len(response.objects)} results returned:")
for o in response.objects:
    print(o.properties)

6 results returned:
{'description': 'small domestic black cat'}
{'description': 'orange cheetah'}
{'description': 'black bear'}
{'description': 'yellow canary'}
{'description': 'large white seagull'}
{'description': 'brown dog'}


Vector searches provide forgiving, nuanced, meaning-based similarity search. 

But - what is a vector?

## Introduction to Vectors

![Introduction to Vectors](./img/04-vectors-intro-01.png)

![Introduction to Vectors](./img/04-vectors-intro-02.png)

![Introduction to Vectors](./img/04-vectors-intro-03.png)

![Introduction to Vectors](./img/04-vectors-intro-04.png)

![Introduction to vectors](./img/04-vectors-intro-05.png)

## Why use vector search?

- Better search
    - Find contextually relevant info
    - Allow synonyms, different languages
    - More value from data
- Work together with generative AI models
    - Reduce hallucinations and the lack of specific or proprietary information

![Introduction to RAG](./img/06-rag-intro-01.png)

![Introduction to RAG](./img/06-rag-intro-02.png)

![Introduction to RAG](./img/06-rag-intro-03.png)

![Introduction to RAG](./img/06-rag-intro-04.png)

### Example RAG prompts:

- Summarise the corporate strategy of ACME Co for FY2024-25.
- What is our internal policy on food expenses?
- What smartphone issues do users commonly complain about?

#### 🤔 How can we find data for these prompts with *just* keyword searches?

It's very difficult!

### Example:

#### `What smartphone issues do users commonly complain about?`

How would you search for "smartphone" issues in your data?

- "*phone*"?
- "tablet"?
- "android" and "iphone"?
- Include every smartphone maker, model and name?

With a vector database, you can simply search for "smartphone". Semantic search matches on meaning, so related terms such as brands and model names are taken into account.

![Introduction to RAG](./img/06-rag-intro-05.png)

# Weaviate in practice

## Build a database

### Preparation: Get the data

We'll use a dataset of movies from TMDB. Let's download the data and preview it.

In [6]:
import pandas as pd

# movie_df = pd.read_csv("./data/movies.csv")
movie_df = pd.read_csv("https://raw.githubusercontent.com/weaviate-tutorials/intro-workshop/main/data/movies.csv")
movie_df.head()

| | backdrop_path | genre_ids | id | original_language | original_title | overview | popularity | poster_path | release_date | title | video | vote_average | vote_count | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /rH0DPF7pB35jxLxKb3JRUgCrrnp.jpg | [10751, 14, 16, 10749] | 11224 | en | Cinderella | Cinderella has faith her dreams of a better li... | 100.819 | /avz6S9HYWs4O8Oe4PenBFNX4uDi.jpg | 1950-02-22 | Cinderella | False | 7.044 | 6523 | 1950 |
| 1 | /p47ihFj4A7EpBjmPHdTj4ipyq1S.jpg | [18] | 599 | en | Sunset Boulevard | A hack screenwriter writes a screenplay for a ... | 57.74 | /sC4Dpmn87oz9AuxZ15Lmip0Ftgr.jpg | 1950-08-10 | Sunset Boulevard | False | 8.312 | 2485 | 1950 |
| 2 | /zyO6j74DKMWfp5snWg6Hwo0T3Mz.jpg | [80, 18, 9648] | 548 | ja | 羅生門 | Brimming with action while incisively examinin... | 21.011 | /vL7Xw04nFMHwnvXRFCmYYAzMUvY.jpg | 1950-08-26 | Rashomon | False | 8.091 | 2121 | 1950 |
| 3 | /b4yiLlIFuiULuuLTxT0Pt1QyT6J.jpg | [16, 10751, 14, 12] | 12092 | en | Alice in Wonderland | On a golden afternoon, young Alice follows a W... | 75.465 | /20cvfwfaFqNbe9Fc3VEHJuPRxmn.jpg | 1951-07-28 | Alice in Wonderland | False | 7.2 | 5697 | 1951 |
| 4 | /mxf8hJJkHTCqZP3m4o8E1TtwHHs.jpg | [35, 10749] | 872 | en | Singin' in the Rain | In 1927 Hollywood, a silent film production co... | 31.407 | /w03EiJVHP8Un77boQeE7hg9DVdU.jpg | 1952-04-09 | Singin' in the Rain | False | 8.2 | 3036 | 1952 |


### Step 1: Connect to Weaviate

You can run Weaviate as:

- a hosted instance on Weaviate Cloud, or
- a self-hosted instance using the open-source distribution, whether on a cloud provider such as AWS or GCP, or locally.

Today, we will run a local instance on our own devices with Docker.

You should already have Weaviate running (see the workshop README file).

In [7]:
import weaviate

# If you are running Weaviate locally with Docker:
client = weaviate.connect_to_local()

Check that our Weaviate instance is up and ready.

In [8]:
client.is_ready()

True

### Step 2: Add data to Weaviate

#### Add collection definition

The equivalent of a SQL "table" is called a "collection" in Weaviate.

We'll create a new collection definition here for "Movie":
- Two "named vectors", which store different "meanings" of the data,
- A "generative" module, which lets us use LLMs with our data, and
- Properties to store our movie data (the equivalent of SQL columns).
    - Just the title, overview, year and popularity for now.

In [9]:
from weaviate.classes.config import Configure, DataType, Property

# DO NOT DO THIS IN PRODUCTION - THIS IS TO DELETE DATA FROM MY PREVIOUS DEMOS
if client.collections.exists("Movie"):
    client.collections.delete("Movie")

# Create a collection
client.collections.create(
    name="Movie",
    # ================================================================================
    # Using our Ollama integration: https://weaviate.io/developers/weaviate/model-providers/ollama
    # Many other integrations available. See https://weaviate.io/developers/weaviate/model-providers/
    # ================================================================================
    vectorizer_config=[
        Configure.NamedVectors.text2vec_ollama(
            name="title",
            source_properties=["title"],
            api_endpoint="http://host.docker.internal:11434",  # If using Docker, use this to contact your local Ollama instance
            model="nomic-embed-text",  # The model to use, e.g. "snowflake-arctic-embed"
        ),
        Configure.NamedVectors.text2vec_ollama(
            name="all_text",
            source_properties=["title", "overview"],
            api_endpoint="http://host.docker.internal:11434",  # If using Docker, use this to contact your local Ollama instance
            model="nomic-embed-text",  # The model to use, e.g. "snowflake-arctic-embed"
        ),
    ],
    generative_config=Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="gemma2:2b"
    ),
    # ================================================================================
    # OPTIONAL - SPECIFY YOUR DATA SCHEMA OR HAVE IT INFERRED BY WEAVIATE
    # ================================================================================
    # properties=[
    #     Property(
    #         name="title",
    #         data_type=DataType.TEXT,
    #     ),
    #     Property(
    #         name="overview",
    #         data_type=DataType.TEXT,
    #     ),
    #     Property(
    #         name="popularity",
    #         data_type=DataType.NUMBER,
    #     ),
    #     Property(
    #         name="year",
    #         data_type=DataType.INT,
    #     ),
    # ],
)

<weaviate.collections.collection.sync.Collection at 0x1102ecc50>

Was our collection created successfully? Let's take a look.

In [10]:
client.collections.exists("Movie")

True

#### Add data

Let's add objects (the equivalent of SQL rows) to our collection.

In [11]:
movies = client.collections.get("Movie")

movies.data.insert({"title": "SpongeBob in vector space", "year": 2055})

UUID('7dadf9df-fed7-430b-acb0-399e8045e3ae')

In [12]:
new_movies_to_add = [
    {"title": f"SpongeBob in vector space {i+2}", "year": 2056 + i} for i in range(5)
]

movies.data.insert_many(new_movies_to_add)

BatchObjectReturn(_all_responses=[UUID('1a87ea83-3da5-4bd9-8319-5a141dc58273'), UUID('6f23b768-ff86-45c3-aa64-b1eae0c71703'), UUID('7a3696bc-d373-4982-a346-969bb918e9ce'), UUID('8319a603-4f02-40fe-b3bb-03640047c59e'), UUID('40e813ff-3393-4f92-8f39-773abbd11797')], elapsed_seconds=0.14514994621276855, errors={}, uuids={0: UUID('1a87ea83-3da5-4bd9-8319-5a141dc58273'), 1: UUID('6f23b768-ff86-45c3-aa64-b1eae0c71703'), 2: UUID('7a3696bc-d373-4982-a346-969bb918e9ce'), 3: UUID('8319a603-4f02-40fe-b3bb-03640047c59e'), 4: UUID('40e813ff-3393-4f92-8f39-773abbd11797')}, has_errors=False)

In [14]:
len(movies)

6

But what if we have lots of data, like our movie dataset?

Use `batch` imports!
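Batching groups many objects into fewer requests, which cuts per-request overhead. As a rough illustration, the sketch below splits a list into fixed-size chunks, similar in spirit to `fixed_size(200)` in the import that follows. It does not call Weaviate; the client's real batcher also handles concurrency and retries. `chunked` is a hypothetical helper.

```python
from typing import Iterator


def chunked(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 1,322 placeholder objects, the same count as the movies imported below
objects = [{"title": f"Movie {i}"} for i in range(1322)]
batches = list(chunked(objects, 200))

print(len(batches))               # number of requests
print([len(b) for b in batches])  # objects per request
```

With a batch size of 200, the 1,322 objects need 7 requests instead of 1,322.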

#### Batch imports

First, let's build objects to add - and take a look at a couple.

In [15]:
data_columns = ['title', 'overview', 'year', 'popularity']

df = movie_df[data_columns]

df.head()

| | title | overview | year | popularity |
|---|---|---|---|---|
| 0 | Cinderella | Cinderella has faith her dreams of a better li... | 1950 | 100.819 |
| 1 | Sunset Boulevard | A hack screenwriter writes a screenplay for a ... | 1950 | 57.74 |
| 2 | Rashomon | Brimming with action while incisively examinin... | 1950 | 21.011 |
| 3 | Alice in Wonderland | On a golden afternoon, young Alice follows a W... | 1951 | 75.465 |
| 4 | Singin' in the Rain | In 1927 Hollywood, a silent film production co... | 1952 | 31.407 |


> If it all looks fine - let's add objects:
> - https://weaviate.io/developers/weaviate/manage-data/import

In [16]:
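from tqdm import tqdm

# Send objects to Weaviate in batches of 200, rather than one request per object
with movies.batch.fixed_size(batch_size=200) as batch:
    for i, row in tqdm(df.iterrows()):
        # Build each object's properties from the selected DataFrame columns
        obj_body = {
            c: row[c] for c in data_columns
        }
        batch.add_object(
            properties=obj_body
        )

# Check whether any objects failed to import
if movies.batch.failed_objects:
    print(f"{len(movies.batch.failed_objects)} objects failed to import")
    print(movies.batch.failed_objects[0])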

1322it [00:17, 76.63it/s]


#### Confirm data load

Do we have data? 

Let's get an object count

In [17]:
print(len(movies))

1328


Does the data look right?

Let's grab a few objects from Weaviate!

In [18]:
response = movies.query.fetch_objects(limit=3)
for o in response.objects:
    print(o.properties)

{'overview': 'In October of 1994 three student filmmakers disappeared in the woods near Burkittsville, Maryland, while shooting a documentary. A year later their footage was found.', 'year': 1999.0, 'title': 'The Blair Witch Project', 'popularity': 87.325}
{'overview': "In director Baz Luhrmann's contemporary take on William Shakespeare's classic tragedy, the Montagues and Capulets have moved their ongoing feud to the sweltering suburb of Verona Beach, where Romeo and Juliet fall in love and secretly wed. Though the film is visually modern, the bard's dialogue remains.", 'year': 1996.0, 'title': 'Romeo + Juliet', 'popularity': 31.488}
{'year': 1981.0, 'title': 'An American Werewolf in London', 'overview': 'American tourists David and Jack are savaged by an unidentified vicious animal whilst hiking on the Yorkshire Moors. Retiring to the home of a beautiful nurse to recuperate, David soon experiences disturbing changes to his mind and body.', 'popularity': 40.785}


Let's pause for a second - because we've done a lot!

#### What did we just do?

Here is a conceptual diagram:

![img](https://github.com/weaviate-tutorials/intro-workshop/blob/main/images/object_import_process_full.png?raw=1)

### Step 3: Work with the data

Let's try a few more involved queries

#### Filtering (similar to a WHERE clause in SQL)

A filter reduces the number of objects based on specific criteria.

In [19]:
from weaviate.classes.query import Filter

response = movies.query.fetch_objects(
    filters=Filter.by_property("year").greater_than(2015),
    limit=3
)

for o in response.objects:
    print(o.properties["title"])

Jackie
The Magnificent Seven
Fantastic Beasts and Where to Find Them


But this does not rank the results in any meaningful way.

For that, we need a keyword *search*, as opposed to a *filter*.

#### Keyword search

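Keyword search ranks results by a keyword match "score" computed with the BM25 algorithm. An object scores higher when the query's tokens appear often in it and when those tokens are rare across the collection. A match in a short text also counts for more than a match in a long one.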
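To make the scoring concrete, here is a minimal sketch of one common BM25 formulation. It uses k1 = 1.2 and b = 0.75, which are also Weaviate's default BM25 parameters. It is an illustration only: the whitespace tokenizer and the IDF variant are assumptions, so the numbers will not match the scores Weaviate returns in the output below. The mini-corpus is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query using a common BM25 formulation."""
    tokenized = [doc.lower().split() for doc in docs]  # simplified whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    query_terms = query.lower().split()

    # Document frequency: how many documents contain each query term
    doc_freq = {term: sum(term in tokens for tokens in tokenized) for term in query_terms}

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms (low document frequency) receive a higher IDF weight
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # k1 saturates repeated matches; b normalizes for document length
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical mini-corpus for illustration
docs = [
    "guardians of the galaxy",
    "a long time ago in a galaxy far far away",
    "a hack screenwriter writes a screenplay",
]
scores = bm25_scores("galaxy", docs)
for doc, score in sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```

Both "galaxy" documents match, but the shorter one ranks higher, and the document without the term scores zero.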

In [20]:
from weaviate.classes.query import MetadataQuery

response = movies.query.bm25(
    query="galaxy",
    limit=5,
    return_metadata=MetadataQuery(score=True, last_update_time=True)
)

for o in response.objects:
    print(o.metadata.score)
    print(o.metadata.last_update_time)
    print(o.properties)

3.2652101516723633
2024-09-17 08:46:24.670000+00:00
{'year': 2017.0, 'title': 'Guardians of the Galaxy Vol. 2', 'overview': "The Guardians must fight to keep their newfound family together as they unravel the mysteries of Peter Quill's true parentage.", 'popularity': 142.267}
3.2652101516723633
2024-09-17 08:46:32.760000+00:00
{'year': 2023.0, 'title': 'Guardians of the Galaxy Vol. 3', 'overview': 'Peter Quill, still reeling from the loss of Gamora, must rally his team around him to defend the universe along with protecting one of their own. A mission that, if not completed successfully, could quite possibly lead to the end of the Guardians as we know them.', 'popularity': 165.416}
2.1303980350494385
2024-09-17 08:46:15.943000+00:00
{'year': 2002.0, 'title': 'Star Wars: Episode II - Attack of the Clones', 'overview': 'Following an assassination attempt on Senator Padmé Amidala, Jedi Knights Anakin Skywalker and Obi-Wan Kenobi investigate a mysterious plot that could change the galaxy f

#### Semantic search

A semantic search, on the other hand, ranks objects by how similar their meaning is to the query.

In [21]:
import json

response = movies.query.near_text(
    query="galaxy",
    limit=3,
    target_vector="title",
)

for o in response.objects:
    print(json.dumps(o.properties, indent=2))

{
  "year": 1999.0,
  "title": "Galaxy Quest",
  "overview": "For four years, the courageous crew of the NSEA protector - \"Commander Peter Quincy Taggart\" (Tim Allen), \"Lt. Tawny Madison (Sigourney Weaver) and \"Dr.Lazarus\" (Alan Rickman) - set off on a thrilling and often dangerous mission in space...and then their series was cancelled! Now, twenty years later, aliens under attack have mistaken the Galaxy Quest television transmissions for \"historical documents\" and beam up the crew of has-been actors to save the universe. With no script, no director and no clue, the actors must turn in the performances of their lives.",
  "popularity": 62.01
}
{
  "year": 1977.0,
  "title": "Star Wars",
  "overview": "Princess Leia is captured and held hostage by the evil Imperial forces in their effort to take over the galactic Empire. Venturesome Luke Skywalker and dashing captain Han Solo team together with the loveable robot duo R2-D2 and C-3PO to rescue the beautiful princess and restore p

#### How does this work?

- Under the hood, this uses a vector search. It looks for objects which are the most similar to a text input.
- We can inspect the similarity along with the results.

In [22]:
import json

response = movies.query.near_text(
    query="galaxy",
    limit=3,
    target_vector="title",
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.metadata)
    print(json.dumps(o.properties, indent=2))

MetadataReturn(creation_time=None, last_update_time=None, distance=0.2529594898223877, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None)
{
  "year": 1999.0,
  "title": "Galaxy Quest",
  "overview": "For four years, the courageous crew of the NSEA protector - \"Commander Peter Quincy Taggart\" (Tim Allen), \"Lt. Tawny Madison (Sigourney Weaver) and \"Dr.Lazarus\" (Alan Rickman) - set off on a thrilling and often dangerous mission in space...and then their series was cancelled! Now, twenty years later, aliens under attack have mistaken the Galaxy Quest television transmissions for \"historical documents\" and beam up the crew of has-been actors to save the universe. With no script, no director and no clue, the actors must turn in the performances of their lives.",
  "popularity": 62.01
}
MetadataReturn(creation_time=None, last_update_time=None, distance=0.4024823307991028, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_scor

This is where "vectors" come in. 

Each object in Weaviate stores a vector for each of its named vectors, like so:

In [23]:
response = movies.query.near_text(
    query="galaxy",
    limit=3,
    target_vector="title",  # or "all_text" (then print o.vector["all_text"] below)
    include_vector=True,
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.metadata.distance)
    print(json.dumps(o.properties, indent=2))
    print(o.vector["title"][:5])

0.2529594898223877
{
  "title": "Galaxy Quest",
  "overview": "For four years, the courageous crew of the NSEA protector - \"Commander Peter Quincy Taggart\" (Tim Allen), \"Lt. Tawny Madison (Sigourney Weaver) and \"Dr.Lazarus\" (Alan Rickman) - set off on a thrilling and often dangerous mission in space...and then their series was cancelled! Now, twenty years later, aliens under attack have mistaken the Galaxy Quest television transmissions for \"historical documents\" and beam up the crew of has-been actors to save the universe. With no script, no director and no clue, the actors must turn in the performances of their lives.",
  "year": 1999.0,
  "popularity": 62.01
}
[-0.5893528461456299, 0.5016435980796814, -3.231532096862793, 0.6001119017601013, -0.4437905550003052]
0.4024823307991028
{
  "year": 1977.0,
  "title": "Star Wars",
  "overview": "Princess Leia is captured and held hostage by the evil Imperial forces in their effort to take over the galactic Empire. Venturesome Luke Sk

These vector representations come from deep learning models, similar to those that power LLMs. They capture meaning and are called vector "embeddings".
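The `distance` values above compare these embeddings. With Weaviate's default cosine metric, distance is 1 minus the cosine similarity, so lower means more similar. That is why "Galaxy Quest" (0.25) ranks ahead of "Star Wars" (0.40). The sketch below computes cosine distance and ranks by it with a brute-force scan. Weaviate instead uses an approximate nearest neighbor index (HNSW by default). The 3-dimensional vectors are hypothetical; real embedding models produce hundreds of dimensions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def nearest(query_vec: list[float], vectors: dict[str, list[float]], limit: int = 3) -> list[tuple[str, float]]:
    """Rank stored vectors by cosine distance to the query (brute force, no index)."""
    ranked = sorted(
        ((name, cosine_distance(query_vec, vec)) for name, vec in vectors.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:limit]


# Hypothetical 3-dimensional "embeddings" for illustration only
vectors = {
    "small domestic black cat": [0.9, 0.1, 0.0],
    "orange cheetah": [0.7, 0.3, 0.1],
    "yellow canary": [0.1, 0.2, 0.9],
}
query = [0.95, 0.05, 0.0]  # pretend embedding of "kitty"

for name, dist in nearest(query, vectors):
    print(f"{dist:.3f}  {name}")
```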

#### Generative search

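A generative search retrieves objects and then passes them to an LLM along with your prompt, transforming your data at retrieval time. A `single_prompt` runs once per object, with `{property}` placeholders filled from that object. A `grouped_task` runs once over all retrieved objects.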
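The difference between the two prompt types can be sketched in plain Python. This is a conceptual illustration, not Weaviate's internal prompt format. `fill_single_prompt` and `build_grouped_prompt` are hypothetical helpers, and the movie overviews are shortened placeholders. No LLM is called here.

```python
import re


def fill_single_prompt(template: str, properties: dict) -> str:
    """Replace each {property} placeholder with that object's value (one prompt per object)."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: str(properties.get(m.group(1), m.group(0))),  # leave unknown placeholders as-is
        template,
    )


def build_grouped_prompt(task: str, objects: list[dict]) -> str:
    """Combine one task with all retrieved objects into a single prompt (one LLM call total)."""
    context = "\n".join(str(obj) for obj in objects)
    return f"{task}\n\nRetrieved objects:\n{context}"


# Hypothetical, shortened search results
retrieved = [
    {"title": "Galaxy Quest", "overview": "Washed-up actors save the galaxy"},
    {"title": "Star Wars", "overview": "A farm boy joins a rebellion"},
]
single = "Write a tweet promoting the movie with TITLE: {title} and OVERVIEW: {overview}."

for obj in retrieved:
    print(fill_single_prompt(single, obj))  # one prompt, and one LLM call, per object

print(build_grouped_prompt("What audience demographic might enjoy this group of movies?", retrieved))
```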

In [24]:
response = movies.generate.near_text(
    query="galaxy",
    limit=5,
    target_vector="title",
    single_prompt="Write a tweet promoting the movie with TITLE: {title} and OVERVIEW: {overview}.",
    grouped_task="What audience demographic might enjoy this group of movies?"
)

In [25]:
print(response.generated)

These movies appeal to a **broad audience** but primarily resonate with those who enjoy **genre-based escapism**:

* **Sci-Fi/Space Opera:**  The core themes of space exploration, aliens, and intergalactic conflict are staples in the genre. Each movie has its own unique take on this common trope.
* **Comedic Relief:** "Galaxy Quest" offers a hilariously lighthearted approach to science fiction, playing on pop culture references. This appeals to those who appreciate clever comedy mixed with action.
* **Classic Sci-Fi Fans:**  "Star Wars," specifically the original trilogy (and its prequel, Episode I) holds nostalgic significance for many fans and remains relevant in terms of storytelling and special effects. 
* **Fans of Adventure/Fantasy:** "Interstellar" appeals to those who enjoy space exploration with a touch of fantastical elements, while "Guardians of the Galaxy Vol.2" offers a more lighthearted but emotional journey of familial bonds.

**In summary,** this group of films caters t

In [26]:
for o in response.objects:
    print(o.generated)
    print(json.dumps(o.properties, indent=2))

🚀 Still haven't seen #GalaxyQuest?  Time travel through the cosmos! 🌌 This hilarious cult classic about washed-up actors saving the galaxy is a must-watch! 😂

Check it out now! 👉 [link to movie]  #TimAllen #SigourneyWeaver #AlanRickman #SpaceAdventure #ComedyGold 🎬 

{
  "overview": "For four years, the courageous crew of the NSEA protector - \"Commander Peter Quincy Taggart\" (Tim Allen), \"Lt. Tawny Madison (Sigourney Weaver) and \"Dr.Lazarus\" (Alan Rickman) - set off on a thrilling and often dangerous mission in space...and then their series was cancelled! Now, twenty years later, aliens under attack have mistaken the Galaxy Quest television transmissions for \"historical documents\" and beam up the crew of has-been actors to save the universe. With no script, no director and no clue, the actors must turn in the performances of their lives.",
  "year": 1999.0,
  "title": "Galaxy Quest",
  "popularity": 62.01
}
🚀🌌 **Get ready for a galactic adventure!** 💥  Princess Leia is captured 

Each object has been transformed into a tweet by the LLM based on our prompt!

## Bonus: Multi-tenancy

![Introduction to Weaviate](./img/08-mt-intro-01.png)

![Introduction to Weaviate](./img/08-mt-intro-03.png)

![Introduction to Weaviate](./img/08-mt-intro-04.png)

![Introduction to Weaviate](./img/08-mt-intro-05.png)

![Introduction to Weaviate](./img/08-mt-intro-06.png)

### Create a multi-tenant collection

At collection creation time, add:

```python
from weaviate.classes.config import Configure

collection = client.collections.create(
    name=collection_name,
    # ...
    multi_tenancy_config=Configure.multi_tenancy(
        enabled=True,
        # auto_tenant_creation=True,
        # auto_tenant_activation=True,
    ),
    # ...
)
```

#### Actual collection definition:

In [74]:
import weaviate
from weaviate.classes.config import Property, DataType, Configure


client = weaviate.connect_to_local()


if client.collections.exists("SupportChatMT"):
    client.collections.delete("SupportChatMT")


mt_collection = client.collections.create(
    name="SupportChatMT",
    # ========== MT Config ==========
    multi_tenancy_config=Configure.multi_tenancy(
        enabled=True,
        auto_tenant_creation=True,
        auto_tenant_activation=False
    ),
    # ========================================
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="dialogue_id", data_type=DataType.INT),
        Property(name="company_author", data_type=DataType.TEXT),
        Property(name="created_at", data_type=DataType.DATE),
    ],
    # ================================================================================
    # Using our Ollama integration: https://weaviate.io/developers/weaviate/model-providers/ollama
    # Many other integrations available. See https://weaviate.io/developers/weaviate/model-providers/
    # ================================================================================
    vectorizer_config=[
        Configure.NamedVectors.text2vec_ollama(
            name="text",
            source_properties=["text"],
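            # ========== Use the Dynamic Index ==========
            # Starts as a flat index and switches to HNSW once the tenant grows large enough.
            # Requires asynchronous indexing on the server (ASYNC_INDEXING=true).
            vector_index_config=Configure.VectorIndex.dynamic(),
            # ========================================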
            api_endpoint="http://host.docker.internal:11434",
            model="nomic-embed-text",
        ),
    ],
    generative_config=Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434", model="gemma2:2b"
    )
)

### Interact with tenants

#### Create tenants

In [75]:
new_tenants = ["AcmeCo", "FancyPhones", "BudgetAir", "LightningCars"]

mt_collection.tenants.create(new_tenants)

#### A multi-tenant (MT) `tenant` behaves like a single-tenant (ST) `collection`

In [76]:
tenant_coll = mt_collection.with_tenant("AcmeCo")

In [77]:
tenant_coll.data.insert(
    properties={"text": "Hello, I have a problem with my phone", "dialogue_id": 1},
)

UUID('2533e902-34a9-499c-898d-ea780f66e7fc')

In [78]:
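with client.batch.fixed_size(batch_size=100) as batch:
    for tenant_name in new_tenants:
        for i in range(3):
            # A client-level batch needs the collection and tenant for every object
            batch.add_object(
                collection="SupportChatMT",
                properties={
                    "text": f"Conversation {i}",
                    "dialogue_id": i,
                    "company_author": tenant_name,
                },
                tenant=tenant_name
            )

# Check whether any objects failed to import
if client.batch.failed_objects:
    print(f"{len(client.batch.failed_objects)} objects failed to import")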

In [79]:
for tenant_name in new_tenants:
    tenant_coll = mt_collection.with_tenant(tenant_name)
    print(tenant_name, len(tenant_coll))

AcmeCo 4
FancyPhones 3
BudgetAir 3
LightningCars 3


### Manage tenants

**Tenants**: data isolation & easy on- and off-boarding

#### Isolation

In [80]:
# Work with tenant data
tenant_coll = mt_collection.with_tenant("AcmeCo")
response = tenant_coll.query.hybrid("water damage", limit=2)
print(len(response.objects))

2


In [81]:
# Cannot search across multiple tenants
# response = mt_collection.query.hybrid("water damage", limit=2)  # <-- This throws an error

#### Off-boarding

In [82]:
mt_collection.tenants.remove("FancyPhones") 

This deletes all of this tenant's data!

#### On-boarding

In [83]:
mt_collection.tenants.create("NewStartup1") 

You can now add data to this tenant!

#### Manage resources

Tenants can be moved between hot (memory), warm (disk) and cold (cloud) storage to manage resource use.

This is done with the "activity status".

Create a tenant with a particular status:

In [84]:
from weaviate.classes.tenants import Tenant, TenantActivityStatus

mt_collection.tenants.create(
    Tenant(name="NewStartup2", activity_status=TenantActivityStatus.INACTIVE)
)

Inactive (or offloaded) tenants are not available:

In [88]:
my_tenant = mt_collection.with_tenant("NewStartup2")
# my_tenant.data.insert({"text": "Your product is amazing!"})  # Run this - you should see an error

Update a tenant's status:

In [86]:
mt_collection.tenants.update(
    Tenant(name="NewStartup2", activity_status=TenantActivityStatus.ACTIVE)
)

Now we can interact with it:

In [87]:
my_tenant.data.insert({"text": "Your product is amazing!"})

UUID('9663bed0-64fa-4513-a7c6-948b3107e378')

#### Tenant status: Balance resource use & availability

This allows us to balance resource usage & cost with availability. 

- Active: Available, on RAM/disk (depending on index type)
- Inactive: Unavailable, on disk
- Offloaded: Unavailable, on cloud storage
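The isolation and activity-status rules above can be illustrated with a hypothetical in-memory toy model. It does not reflect how Weaviate stores tenants (each tenant is actually its own shard). It only mimics the visible behavior: data is scoped per tenant, only active tenants accept reads and writes, and removing a tenant deletes its data. `ToyMultiTenantStore` and `Status` are illustrative names, not Weaviate classes.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"        # available for reads and writes
    INACTIVE = "inactive"    # kept on local disk, unavailable
    OFFLOADED = "offloaded"  # moved to cloud storage, unavailable


class ToyMultiTenantStore:
    """Hypothetical in-memory model of tenant isolation and activity status."""

    def __init__(self) -> None:
        self.data: dict[str, list[dict]] = {}
        self.status: dict[str, Status] = {}

    def create_tenant(self, name: str, status: Status = Status.ACTIVE) -> None:
        self.data[name] = []
        self.status[name] = status

    def set_status(self, name: str, status: Status) -> None:
        self.status[name] = status

    def insert(self, tenant: str, obj: dict) -> None:
        self._require_active(tenant)
        self.data[tenant].append(obj)  # each tenant only ever sees its own list

    def count(self, tenant: str) -> int:
        self._require_active(tenant)
        return len(self.data[tenant])

    def remove_tenant(self, name: str) -> None:
        # Off-boarding deletes the tenant's data along with the tenant itself
        del self.data[name]
        del self.status[name]

    def _require_active(self, tenant: str) -> None:
        if tenant not in self.status:
            raise KeyError(f"Tenant {tenant!r} does not exist")
        if self.status[tenant] is not Status.ACTIVE:
            raise RuntimeError(f"Tenant {tenant!r} is {self.status[tenant].value}, not active")


store = ToyMultiTenantStore()
store.create_tenant("AcmeCo")
store.create_tenant("NewStartup2", Status.INACTIVE)
store.insert("AcmeCo", {"text": "Hello, I have a problem with my phone"})

try:
    store.insert("NewStartup2", {"text": "Your product is amazing!"})
except RuntimeError as err:
    print(err)  # inactive tenants reject writes

store.set_status("NewStartup2", Status.ACTIVE)
store.insert("NewStartup2", {"text": "Your product is amazing!"})
print(store.count("AcmeCo"), store.count("NewStartup2"))
```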

### When you're done... 

Remember to close the client with `client.close()` to release its network connections and other resources.

In [None]:
client.close()