## Scenario 2 - Multi-tenancy

### **SupportWizard** - Support Analysis SaaS Platform

- Allows its users to sign up and upload their own customer support data
- Users then analyse that data on the platform to identify where they could improve their support processes

### Solution 

Each end user will have their own isolated "space", to which they can upload data. They can then use the SupportWizard dashboards and platform to see analyses of their own data.

## Set your preferred model type here

In [1]:
# model_type = "ollama"
model_type = "cohere"

### Then, run the cell below

In [2]:
from weaviate.classes.config import Configure

if model_type == "ollama":
    vectorizer_config = Configure.NamedVectors.text2vec_ollama(
        name="text_with_metadata",
        source_properties=["text", "company_author"],
        vector_index_config=Configure.VectorIndex.hnsw(),
        api_endpoint="http://host.docker.internal:11434",
        model="nomic-embed-text",
    )
    generative_config = Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="gemma2:2b"
    )
else:
    vectorizer_config = Configure.NamedVectors.text2vec_cohere(
        name="text_with_metadata",
        source_properties=["text", "company_author"],
        vector_index_config=Configure.VectorIndex.hnsw(),
        model="embed-multilingual-light-v3.0",
    )

    generative_config = Configure.Generative.cohere(
        model="command-r-plus"
    )



### Create the collection


In [3]:
import os
import weaviate
from weaviate.classes.config import Property, DataType, Configure
from dotenv import load_dotenv

load_dotenv()

client = weaviate.connect_to_local(
    headers={"X-Cohere-Api-Key": os.getenv("WORKSHOP_COHERE_KEY")}
)

collection_name = "SupportChat"

# For re-running the demo only: Delete existing collection if it exists
client.collections.delete(collection_name)

# Create a new collection with specified properties and vectorizer configuration
chunks = client.collections.create(
    name=collection_name,
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="dialogue_id", data_type=DataType.INT),
        Property(name="company_author", data_type=DataType.TEXT),
        Property(name="created_at", data_type=DataType.DATE),
    ],
    vectorizer_config=[vectorizer_config],
    generative_config=generative_config,
    # ============================================================
    # ⬇️⬇️ This is the only change from the previous script ⬇️⬇️
    # ============================================================
    multi_tenancy_config=Configure.multi_tenancy(enabled=True, auto_tenant_creation=True)
)

### Helper functions for loading data

In [4]:
import h5py
import json
import numpy as np
from typing import Literal
from pathlib import Path


def get_hdf5_obj(file_path):
    with h5py.File(file_path, "r") as hf:
        for uuid in hf.keys():
            src_obj = hf[uuid]

            # Get the object properties
            properties = json.loads(src_obj["object"][()])

            # Get the vector(s)
            vectors = {}
            for key in src_obj.keys():
                if key.startswith("vector_"):
                    vector_name = key.split("_", 1)[1]
                    vectors[vector_name] = np.asarray(src_obj[key])

            yield uuid, properties, vectors


def get_data_obj(model_type: Literal["ollama", "cohere"]):
    file_path = Path("data/twitter_customer_support_nomic.h5")
    if model_type == "cohere":
        file_path = Path("data/twitter_customer_support_cohere.h5")

    for uuid, properties, vectors in get_hdf5_obj(file_path):
        yield uuid, properties, vectors
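
The loader above treats every HDF5 dataset whose key starts with `vector_` as a named vector, and strips that prefix to recover the vector name. A minimal sketch of just that naming step, using a plain list of hypothetical keys instead of a real HDF5 file (the `vector_names` helper and the example keys are assumptions for illustration):

```python
def vector_names(keys):
    """Return the named-vector names encoded in keys of the form 'vector_<name>'."""
    names = []
    for key in keys:
        if key.startswith("vector_"):
            # Split on the first underscore only, so names that contain underscores stay intact
            names.append(key.split("_", 1)[1])
    return names


# Hypothetical keys for one stored object
example_keys = ["object", "vector_text_with_metadata"]
print(vector_names(example_keys))
```

Splitting with `maxsplit=1` matters here because the vector name used in this notebook, `text_with_metadata`, itself contains underscores.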

### Load data

In [None]:
from tqdm import tqdm

tenant_names = ["AcmeCo", "Globex", "Initech", "UmbrellaCorp", "WayneEnterprises"]

with client.batch.fixed_size(batch_size=200) as batch:
    for uuid, properties, vectors in tqdm(get_data_obj(model_type)):

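        # Assign each object to a tenant based on the length of its company author name
        # (an arbitrary rule, used here only to spread the demo data across tenants)
        tenant_index = len(properties['company_author']) % len(tenant_names)
        tenant_name = tenant_names[tenant_index]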

        # Add the object to the batch
        batch.add_object(
            collection=collection_name,
            uuid=uuid,
            properties=properties,
            vector={"text_with_metadata": vectors["text_with_metadata"]},
            tenant=tenant_name  # <===== This is the only line that changes during import
        )
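
The tenant assignment in the import loop can be looked at in isolation. The sketch below reproduces the same length-based rule as a standalone function; `assign_tenant` and the sample company authors are hypothetical and not taken from the dataset. In a real application the tenant would more likely be the identifier of the signed-up user than a value derived from the data.

```python
tenant_names = ["AcmeCo", "Globex", "Initech", "UmbrellaCorp", "WayneEnterprises"]


def assign_tenant(company_author: str) -> str:
    """Map a company author to a tenant by the length of its name (demo-only rule)."""
    return tenant_names[len(company_author) % len(tenant_names)]


# Hypothetical company authors, used only to show that the mapping is deterministic
sample_authors = ["AppleSupport", "AmazonHelp", "Uber_Support", "SpotifyCares"]
for author in sample_authors:
    print(author, "->", assign_tenant(author))
```

Because the rule depends only on the name length, every object from the same company author lands in the same tenant, although several authors can share one tenant.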


In [None]:
print(f"Processed {len(client.batch.results.objs.all_responses)} objects.")

In [7]:
if len(client.batch.failed_objects) > 0:
    print("*" * 80)
    print(f"***** Failed to add {len(client.batch.failed_objects)} objects *****")
    print("*" * 80)
    print(client.batch.failed_objects[:3])

### Confirm data load

In [8]:
support_chats = client.collections.get(collection_name)

In [None]:
support_chats.tenants.get()


In [10]:
tenant_data = support_chats.with_tenant(tenant_names[0])

In [11]:
response = tenant_data.query.fetch_objects(limit=2, include_vector=True)

In [None]:
print(response.objects[0].uuid)

In [None]:
for k, v in response.objects[0].properties.items():
    print(f"\n|| {k} || \n{v}")

In [None]:
for k, v in response.objects[0].vector.items():
    print(k, v[:3])

### Queries

#### Helper function for displaying objects

In [15]:
def display_objects(response):
    for o in response.objects:
        print(o.uuid, "\n")
        print(o.properties["text"][:200], "\n")

In [None]:
response = tenant_data.query.near_text("return process", limit=3)
display_objects(response)

In [None]:
response = tenant_data.query.bm25("return process", limit=3)
display_objects(response)

In [None]:
response = tenant_data.query.hybrid("return process", limit=3)
display_objects(response)

In [19]:
response = tenant_data.generate.fetch_objects(
    limit=20,
    grouped_task="What patterns are we seeing here in these issues?"
)

In [None]:
print(response.generated)

## Example use cases

- Each end user (tenant) can upload & analyse their own data
- Analyse different aspects of their own support processes

In [21]:
response = tenant_data.generate.near_text(
    query="return process",
    limit=15,
    grouped_task="What are some of the problems our customers are having? Suggest areas to investigate for improvement.",
)

In [None]:
print(response.generated)

In [23]:
response = tenant_data.generate.near_text(
    query="phone battery",
    limit=15,
    grouped_task="What types of issues are our users having with their phone batteries?",
)

In [None]:
print(response.generated)

## Tenant management

Given that our "tenants" represent different end users, it would be useful to have a way to manage them.

What can we do when:

- A new user signs up?
- A user wants to delete their account?
- A user asks about data privacy?
- A user is inactive for a long time?

#### Listing tenants

In [None]:
support_chats.tenants.get()

#### Tenant creation

In [26]:
from weaviate.classes.tenants import Tenant

support_chats.tenants.create(
    tenants=[
        Tenant(name="MarvellousCorp"),
        Tenant(name="InGenCompany"),
    ]
)

In [None]:
marvel_tenant = support_chats.with_tenant("MarvellousCorp")

some_objs = [
    {"text": "This comic is great", "dialogue_id": 123, "company_author": "Marvel"},
    {"text": "I am very excited about the new movie", "dialogue_id": 124, "company_author": "Marvel"},
]

marvel_tenant.data.insert_many(some_objs)

In [None]:
response = marvel_tenant.query.fetch_objects(limit=2)
for o in response.objects:
    print(o.properties["text"])

#### Tenant privacy

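Can multiple tenants be queried at once? No: every request against a multi-tenant collection must name a tenant, so the query below, which omits one, is expected to raise an error.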

In [None]:
response = support_chats.query.fetch_objects(limit=2)

print(response.objects)

#### Tenant state management



You can set tenant activity statuses to manage resource usage, trading off availability against cost. An inactive tenant's data stays on disk, but the tenant cannot be queried until it is reactivated.

In [30]:
from weaviate.classes.tenants import Tenant, TenantActivityStatus

support_chats.tenants.update(tenants=[
    Tenant(
        name="UmbrellaCorp",
        activity_status=TenantActivityStatus.INACTIVE
    ),
    Tenant(
        name="Globex",
        activity_status=TenantActivityStatus.INACTIVE
    ),
    Tenant(
        name="WayneEnterprises",
        activity_status=TenantActivityStatus.INACTIVE
    ),
])

#### Tenant deletion

Off-boarding customers is important, and straightforward with Weaviate.

Deleting a tenant deletes all of the associated data.
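
A minimal sketch of an off-boarding step, assuming the `support_chats` collection handle from earlier and the v4 Python client's `tenants.get()` and `tenants.remove()` methods. The `offboard_tenant` helper is hypothetical and not part of the client.

```python
def offboard_tenant(collection, tenant_name: str) -> bool:
    """Delete a tenant and all of its data; return False if the tenant does not exist."""
    # tenants.get() returns a mapping of tenant name -> Tenant
    if tenant_name not in collection.tenants.get():
        return False
    # Removal is irreversible: the tenant's objects are deleted along with it
    collection.tenants.remove([tenant_name])
    return True
```

For example, `offboard_tenant(support_chats, "MarvellousCorp")` would remove the tenant created earlier, after which `support_chats.tenants.get()` should no longer list it.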


### Demo application

- Outside of the notebook
