## Scenario 2 - Multi-tenancy

### **SupportWizard** - Support Analysis SaaS Platform

- Allows its users to sign up and upload their own customer support data
- Users then use the platform to identify where they could improve their support processes

### Solution 

Each end user will have their own isolated "space", to which they can upload data. Then, they can use the SupportWizard dashboards and platform to see analyses of their own data.

## Set your preferred model type here

In [1]:
# model_type = "ollama"
model_type = "cohere"

### Then, run the cell below

In [2]:
from weaviate.classes.config import Configure

if model_type == "ollama":
    vectorizer_config = Configure.NamedVectors.text2vec_ollama(
        name="text_with_metadata",
        source_properties=["text", "company_author"],
        vector_index_config=Configure.VectorIndex.hnsw(),
        api_endpoint="http://host.docker.internal:11434",
        model="nomic-embed-text",
    )
    generative_config = Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="gemma2:2b"
    )
else:
    vectorizer_config = Configure.NamedVectors.text2vec_cohere(
        name="text_with_metadata",
        source_properties=["text", "company_author"],
        vector_index_config=Configure.VectorIndex.hnsw(),
        model="embed-multilingual-light-v3.0",
    )

    generative_config = Configure.Generative.cohere(
        model="command-r-plus"
    )



### Create the collection


In [3]:
import os
import weaviate
from weaviate.classes.config import Property, DataType, Configure
from dotenv import load_dotenv

load_dotenv()

client = weaviate.connect_to_local(
    headers={"X-Cohere-Api-Key": os.getenv("WORKSHOP_COHERE_KEY")}
)

collection_name = "SupportChat"

# For re-running the demo only: Delete existing collection if it exists
client.collections.delete(collection_name)

# Create a new collection with specified properties and vectorizer configuration
chunks = client.collections.create(
    name=collection_name,
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="dialogue_id", data_type=DataType.INT),
        Property(name="company_author", data_type=DataType.TEXT),
        Property(name="created_at", data_type=DataType.DATE),
    ],
    vectorizer_config=[vectorizer_config],
    generative_config=generative_config,
    # ============================================================
    # ⬇️⬇️ This is the only change from the previous script ⬇️⬇️
    # ============================================================
    # STUDENT TODO: Add multi-tenancy configuration with Configure.multi_tenancy
    # ============================================================
    # Enable multi-tenancy and auto-tenant creation
    # multi_tenancy_config=
)

### Helper functions for loading data

In [4]:
import h5py
import json
import numpy as np
from typing import Literal
from pathlib import Path


def get_hdf5_obj(file_path):
    with h5py.File(file_path, "r") as hf:
        for uuid in hf.keys():
            src_obj = hf[uuid]

            # Get the object properties
            properties = json.loads(src_obj["object"][()])

            # Get the vector(s)
            vectors = {}
            for key in src_obj.keys():
                if key.startswith("vector_"):
                    vector_name = key.split("_", 1)[1]
                    vectors[vector_name] = np.asarray(src_obj[key])

            yield uuid, properties, vectors


def get_data_obj(model_type: Literal["ollama", "cohere"]):
    file_path = Path("data/twitter_customer_support_nomic.h5")
    if model_type == "cohere":
        file_path = Path("data/twitter_customer_support_cohere.h5")

    for uuid, properties, vectors in get_hdf5_obj(file_path):
        yield uuid, properties, vectors
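
The vector-name handling in `get_hdf5_obj` can be tried in isolation. The sketch below assumes the HDF5 keys follow the `vector_<name>` pattern that the helper checks for; `vector_name_from_key` is a hypothetical function used only for illustration.

```python
from typing import Optional


def vector_name_from_key(key: str) -> Optional[str]:
    """Return the named-vector name encoded in an HDF5 key, or None."""
    # Only keys with the "vector_" prefix hold vectors; others (e.g. "object") do not.
    if not key.startswith("vector_"):
        return None
    # Split on the first underscore only, so names containing underscores stay intact.
    return key.split("_", 1)[1]


print(vector_name_from_key("vector_text_with_metadata"))
print(vector_name_from_key("object"))
```

Splitting with `maxsplit=1` matters here because the vector name used in this notebook, `text_with_metadata`, itself contains underscores.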

### Load data

In [None]:
from tqdm import tqdm

tenant_names = ["AcmeCo", "Globex", "Initech", "UmbrellaCorp", "WayneEnterprises"]

with client.batch.fixed_size(batch_size=200) as batch:
    for uuid, properties, vectors in tqdm(get_data_obj(model_type)):

        # Assign a tenant to object based on the company author
        tenant_index = len(properties['company_author']) % 5
        tenant_name = tenant_names[tenant_index]

        # Add the object to the batch
        batch.add_object(
            collection=collection_name,
            uuid=uuid,
            properties=properties,
            vector={"text_with_metadata": vectors["text_with_metadata"]},

            # ============================================================
            # STUDENT TODO - Add data to a specific tenant
            # ============================================================
            # tenant=
        )
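
The tenant assignment in the loop above is a demo device rather than a realistic mapping: it uses the length of the `company_author` string, modulo the number of tenants, as an index. A minimal sketch of that rule on its own, where `assign_tenant` and the sample author strings are hypothetical:

```python
tenant_names = ["AcmeCo", "Globex", "Initech", "UmbrellaCorp", "WayneEnterprises"]


def assign_tenant(company_author: str) -> str:
    """Pick a tenant from the length of the author name (demo-only rule)."""
    tenant_index = len(company_author) % len(tenant_names)
    return tenant_names[tenant_index]


# Hypothetical author names, used only to show the mapping
for author in ["Delta", "ExampleAir", "SomeSupport"]:
    print(author, "->", assign_tenant(author))
```

Under this rule, a given author always lands in the same tenant, while several different authors may share one tenant.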


In [None]:
print(f"Processed {len(client.batch.results.objs.all_responses)} objects.")

In [7]:
if len(client.batch.failed_objects) > 0:
    print("*" * 80)
    print(f"***** Failed to add {len(client.batch.failed_objects)} objects *****")
    print("*" * 80)
    print(client.batch.failed_objects[:3])

### Confirm data load

In [8]:
support_chats = client.collections.get(collection_name)

In [None]:
support_chats.tenants.get()


In [10]:
# Instantiate a tenant object - analogous to a collection object in a single-tenant environment
tenant_data = support_chats.with_tenant(tenant_names[0])

In [11]:
# STUDENT TODO:
# Fetch the first two objects from the tenant with the vector included
# Hint - use the 'query.fetch_objects' method with the 'limit' and 'include_vector' parameters

In [None]:
# STUDENT TODO:
# Print the UUID of the first object in the response
# Hint - The response will have an `.objects` attribute which is a list of objects

In [None]:
# STUDENT TODO:
# Inspect the properties of the first object in the response
# Hint - the object will have a 'properties' attribute which is a dictionary of properties

In [None]:
# STUDENT TODO:
# Inspect the first few dimensions of the object's vector
# Hint - the object will have a 'vector' attribute which is a dictionary of vectors

### Queries

#### Helper function for displaying objects

In [15]:
def display_objects(response):
    for o in response.objects:
        print(o.uuid, "\n")
        print(o.properties["text"][:200], "\n")

In [None]:
# Near text search: Semantic search example
response = tenant_data.query.near_text("return process", limit=3)
display_objects(response)

In [None]:
# STUDENT TODO:
# Run a `bm25` query with the search term "return process" and a limit of 3, and display the results
# Hint - start with the previous cell, and vary the query method

In [None]:
# STUDENT TODO:
# Run a `hybrid` query with the same parameters and display the results
# Hint - start with the previous cell, and vary the query method

In [19]:
# Generative search (RAG) example
response = tenant_data.generate.fetch_objects(
    limit=20,
    grouped_task="What patterns are we seeing here in these issues?"
)

In [None]:
print(response.generated)

## Example use cases

- Each end user (tenant) can upload & analyse their own data
- Analyse different aspects of their own support processes

In [21]:
# How might our example end user use these capabilities?
# What types of RAG queries would be useful for them?

In [None]:
print(response.generated)

In [23]:
# STUDENT TODO:
# Try your own `grouped_task` query with a different question

In [None]:
print(response.generated)

## Tenant management

Given that our "tenants" represent different end users, it would be useful to have a way to manage them.

What can we do when:

- A new user signs up?
- A user wants to delete their account?
- A user asks about data privacy?
- A user is inactive for a long time?

#### Retrieve Tenant names

In [None]:
# STUDENT TODO:
# Fetch a list of all tenants in the collection
# Hint - start with the collection object, and look in the `tenants` namespace

#### Tenant creation

We can create new tenants at any time.

In [26]:
from weaviate.classes.tenants import Tenant

support_chats.tenants.create(
    tenants=[
        Tenant(name="MarvellousCorp"),
        Tenant(name="InGenCompany"),
    ]
)

We can then ingest data into each tenant's shard.

In [None]:
marvel_tenant = support_chats.with_tenant("MarvellousCorp")

some_objs = [
    {"text": "This comic is great", "dialogue_id": 123, "company_author": "Marvel"},
    {"text": "I am very excited about the new movie", "dialogue_id": 124, "company_author": "Marvel"},
]

marvel_tenant.data.insert_many(some_objs)

In [None]:
# STUDENT TODO:
# Fetch the first two objects from the new tenant - and inspect the results - they should be the objects you just added

#### Tenant privacy

Can multiple tenants be queried at once?

In [None]:
# STUDENT TODO:
# Try to fetch the first two objects from the entire collection - what happens?

In a multi-tenant collection, each tenant's data is isolated from the others. This means that a query will only return data from the tenant that the user is querying.

You cannot query multiple tenants at once. This ensures that each tenant's data is kept private.

It also means that if you want to be able to query the entire collection, multi-tenancy may not be the best choice for your use case.
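
In an application, one way to respect this isolation is to resolve the tenant first and route every query through it. The following is a minimal sketch, assuming the `with_tenant` and `query.near_text` calls used earlier in this notebook; `tenant_scoped_search` is a hypothetical helper name.

```python
def tenant_scoped_search(collection, tenant_name, query, limit=3):
    """Run a semantic search against a single tenant's data only."""
    # with_tenant returns a tenant-specific handle; queries made through it
    # are limited to that tenant's objects.
    tenant = collection.with_tenant(tenant_name)
    return tenant.query.near_text(query, limit=limit)
```

For example, `tenant_scoped_search(support_chats, "AcmeCo", "return process")` would search only the `AcmeCo` tenant. Because the tenant name is a required argument, calling code has no path to an unscoped query.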

#### Tenant state management



You can set tenant activity statuses to manage their resource usage, trading off availability against resource cost.

In [30]:
from weaviate.classes.tenants import Tenant, TenantActivityStatus

# STUDENT TODO:
# Set the activity status of the following tenants to INACTIVE
# UmbrellaCorp, Globex, WayneEnterprises
# Hint - use the `TenantActivityStatus` enum. The first example is partly completed for you

support_chats.tenants.update(tenants=[
    Tenant(
        name="UmbrellaCorp",
        # activity_status=
    ),
    # Globex
    # WayneEnterprises
])

In [None]:
# STUDENT TODO:
# Try to fetch the first two objects from one of our updated tenants (e.g. UmbrellaCorp) - what happens?

In [32]:
# STUDENT TODO:
# Update the activity status of one of the INACTIVE tenants to ACTIVE

In [None]:
# STUDENT TODO:
# Now, try to fetch the first two objects from the tenant you just updated - what happens?

#### Tenant deletion

Off-boarding customers is important, and straightforward with Weaviate.

Deleting a tenant deletes all of the associated data.
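
A minimal sketch of an off-boarding step, assuming the collection object exposes `tenants.remove` (taking a list of tenant names) and that `tenants.get()` returns a mapping keyed by tenant name, as in the v4 Python client. `offboard_tenant` is a hypothetical helper name.

```python
def offboard_tenant(collection, tenant_name):
    """Remove a tenant and report whether it is gone afterwards."""
    # Removing a tenant also deletes all of the data stored for it.
    collection.tenants.remove([tenant_name])
    # Assumption: tenants.get() returns a dict-like mapping of name -> tenant.
    return tenant_name not in collection.tenants.get()
```

For example, `offboard_tenant(support_chats, "MarvellousCorp")` would remove the tenant created earlier. The deletion is irreversible, so a real application would likely gate this call behind a confirmation step.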


### Demo application

- Outside of the notebook
