## Scenario 1 - Single collection RAG

### **SupportPatterns** - Support Training & Education Platform

- Develops training materials and courses for customer support professionals
- Uses aggregated, anonymized support conversations to create realistic training scenarios

### Solution

Collect as much conversation data between support agents and customers as possible. 

Analyse this data to identify common patterns and develop training materials based on these patterns.

### Helper functions for downloads

In [1]:
from pathlib import Path
from typing import Literal


def download_datafiles(setup: Literal["ollama", "cohere"]):
    filepaths_set = {
        "ollama": (
            "https://weaviate-workshops.s3.eu-west-2.amazonaws.com/odsc-europe-2024/twitter_customer_support_weaviate_export_50000_nomic.h5",
            Path("data/twitter_customer_support_nomic.h5")
        ),
        "cohere": (
            "https://weaviate-workshops.s3.eu-west-2.amazonaws.com/odsc-europe-2024/twitter_customer_support_weaviate_export_50000_cohere-embed-multilingual-light-v3.0.h5",
            Path("data/twitter_customer_support_cohere.h5"),
        )
    }

    filepaths = filepaths_set[setup]

    if not filepaths[1].exists():
        print(f"Downloading {filepaths[0]}")
        filepaths[1].parent.mkdir(parents=True, exist_ok=True)
        import urllib.request
        urllib.request.urlretrieve(filepaths[0], filepaths[1])
    else:
        print(f"File already exists: {filepaths[1]}")
    return True

## AI Models

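This workshop is set up to work with either local Ollama models or API-based Cohere models. Follow either the [Ollama](#ollama) or the [Cohere](#cohere) instructions below.

### Ollama

To use local Ollama models for this workshop, run the code cells below to pull the models and configure the variables:
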
In [None]:
!ollama pull nomic-embed-text && ollama pull gemma2:2b

pulling manifest
pulling 7462734796d6... 100% ▕████████████████▏ 1.6 GB
pulling e0a42594d802... 100% ▕████████████████▏  358 B
pulling 097a36493f71... 100% ▕████████████████▏ 8.4 KB
pulling 2490e7468436... 100% ▕████████████████▏   65 B
pulling e18ad7af7efb... 100% ▕████████████████▏  487 B
verifying sha256 digest
writing manifest
success
pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB
pulling c71d239df917... 100% ▕████████████████▏  11 KB

In [None]:
download_datafiles("ollama")

model_type = "ollama"

File already exists: data/twitter_customer_support_nomic.h5


### Cohere

To use the Cohere API for this workshop, run the code cell below to configure the variables:

In [4]:
download_datafiles("cohere")

model_type = "cohere"

File already exists: data/twitter_customer_support_cohere.h5



### Create the collection


In [None]:
from weaviate.classes.config import Configure

if model_type == "ollama":
    vectorizer_config = Configure.NamedVectors.text2vec_ollama(
        name="text_with_metadata",
        source_properties=["text", "company_author"],
        vector_index_config=Configure.VectorIndex.hnsw(),
        api_endpoint="http://host.docker.internal:11434",
        model="nomic-embed-text",
    )
    generative_config = Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",
        model="gemma2:2b"
    )
else:
    vectorizer_config = Configure.NamedVectors.text2vec_cohere(
        name="text_with_metadata",
        source_properties=["text", "company_author"],
        vector_index_config=Configure.VectorIndex.hnsw(),
        model="embed-multilingual-light-v3.0",
    )

    generative_config = Configure.Generative.cohere(
        model="command-r-plus"
    )


In [5]:
import os
import weaviate
from weaviate.classes.config import Property, DataType, Configure
from dotenv import load_dotenv

load_dotenv()

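# Only send the Cohere key when it is set, so the Ollama setup works without one
cohere_key = os.getenv("WORKSHOP_COHERE_KEY")
client = weaviate.connect_to_local(
    headers={"X-Cohere-Api-Key": cohere_key} if cohere_key else None
)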

collection_name = "SupportChat"

# For re-running the demo only: Delete existing collection if it exists
client.collections.delete(collection_name)

# Create a new collection with specified properties and vectorizer configuration
chunks = client.collections.create(
    name=collection_name,
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="dialogue_id", data_type=DataType.INT),
        Property(name="company_author", data_type=DataType.TEXT),
        Property(name="created_at", data_type=DataType.DATE),
    ],
    vectorizer_config=[vectorizer_config],
    generative_config=generative_config,
)

### Helper functions for loading data

In [6]:
import h5py
import json
import numpy as np
from typing import Literal
from pathlib import Path


def get_hdf5_obj(file_path):
    with h5py.File(file_path, "r") as hf:
        for uuid in hf.keys():
            src_obj = hf[uuid]

            # Get the object properties
            properties = json.loads(src_obj["object"][()])

            # Get the vector(s)
            vectors = {}
            for key in src_obj.keys():
                if key.startswith("vector_"):
                    vector_name = key.split("_", 1)[1]
                    vectors[vector_name] = np.asarray(src_obj[key])

            yield uuid, properties, vectors


def get_data_obj(model_type: Literal["ollama", "cohere"]):
    """Yield (uuid, properties, vectors) from the export that matches the chosen model."""
    file_path = Path("data/twitter_customer_support_nomic.h5")
    if model_type == "cohere":
        file_path = Path("data/twitter_customer_support_cohere.h5")

    yield from get_hdf5_obj(file_path)
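
The `vector_` prefix handling in `get_hdf5_obj` can be checked without an HDF5 file. The sketch below applies the same `split("_", 1)` logic to a plain list of keys. The helper name `named_vector_keys` and the key `vector_title` are hypothetical and used only for illustration.

```python
def named_vector_keys(keys):
    """Map HDF5 dataset keys to named-vector names, mirroring get_hdf5_obj."""
    names = {}
    for key in keys:
        if key.startswith("vector_"):
            # maxsplit=1 keeps underscores inside the vector name intact
            names[key] = key.split("_", 1)[1]
    return names


# "vector_title" is a hypothetical second named vector
print(named_vector_keys(["object", "vector_text_with_metadata", "vector_title"]))
```

Splitting only once matters here: a plain `key.split("_")` would break `text_with_metadata` into separate pieces.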

### Load data

In [7]:
from tqdm import tqdm

with client.batch.fixed_size(batch_size=200) as batch:
    for uuid, properties, vectors in tqdm(get_data_obj(model_type)):
        batch.add_object(
            collection=collection_name,
            uuid=uuid,
            properties=properties,
            vector={"text_with_metadata": vectors["text_with_metadata"]},
        )

50000it [00:18, 2707.28it/s]
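
Conceptually, a fixed-size batch groups the object stream into chunks of `batch_size` and sends one request per chunk. The sketch below shows that grouping in plain Python. `chunked` is a hypothetical helper, not part of the Weaviate client, and the client's actual batching logic is more involved.

```python
from itertools import islice


def chunked(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


# 50,000 objects in batches of 200 gives 250 batches
batch_sizes = [len(chunk) for chunk in chunked(range(50_000), 200)]
print(len(batch_sizes), batch_sizes[0], batch_sizes[-1])
```

Because `chunked` consumes a generator lazily, it never holds more than one batch in memory, which is the same reason `get_data_obj` yields objects instead of returning a list.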


In [8]:
print(f"Processed {len(client.batch.results.objs.all_responses)} objects.")

Processed 50000 objects.




In [9]:
if len(client.batch.failed_objects) > 0:
    print("*" * 80)
    print(f"***** Failed to add {len(client.batch.failed_objects)} objects *****")
    print("*" * 80)
    print(client.batch.failed_objects[:3])

### Confirm data load

In [10]:
support_chats = client.collections.get(collection_name)

In [11]:
response = support_chats.query.fetch_objects(limit=2, include_vector=True)

In [12]:
print(response.objects[0].uuid)

00014ba3-ea82-524b-a45c-99b4153b74bc


In [76]:
for k, v in response.objects[0].properties.items():
    print(f"\n|| {k} || \n{v}")


|| created_at || 
2017-10-03 14:47:07+00:00

|| text || 
User_164303: NRC Feature Request: Support for interval training. 😀
NikeSupport: Here to help. Have you checked out the option to run a Speed run which is interval training?
User_164303: I have not. Will definitely check it out. Thanks!

|| company_author || 
NikeSupport

|| dialogue_id || 
204089


In [13]:
for k, v in response.objects[0].vector.items():
    print(k, v[:3])

text_with_metadata [-0.042388916015625, 0.041412353515625, -0.0445556640625]


### Queries

#### Helper function for displaying objects

In [23]:
def display_objects(response):
    for o in response.objects:
        print(o.uuid, "\n")
        print(o.properties["text"][:100], "\n")

In [15]:
response = support_chats.query.near_text("return process", limit=3)
display_objects(response)

31a21f1b-61aa-5c01-bcbf-6dce68b2cb5e 

User_119904: @115850 I have bought a product and now it's size is not matching I want to return it and also requested return process.
AmazonHelp: You may refer here: https://t.co/M27c4qF86m for detail 

3247d280-de61-515f-a388-caaac81770c8 

User_206228: please check the DM sent to @AmazonHelp abd revert.
AmazonHelp: Hi, we have responded to you via DM. Please refer. ^RD
User_206228: @AmazonHelp Your response is of no help to me.I have al 

32f8710d-8ea7-5e04-a1f2-4be40ce92dab 

User_207558: Second DOA harddrive from @NeweggService Starting the return process again :/ Trying from Best Buy this time
NeweggService: Let us know if you need any assistance with setting up your ret 



In [16]:
response = support_chats.query.bm25("return process", limit=3)
display_objects(response)

32f8710d-8ea7-5e04-a1f2-4be40ce92dab 

User_207558: Second DOA harddrive from @NeweggService Starting the return process again :/ Trying from Best Buy this time
NeweggService: Let us know if you need any assistance with setting up your ret 

31a21f1b-61aa-5c01-bcbf-6dce68b2cb5e 

User_119904: @115850 I have bought a product and now it's size is not matching I want to return it and also requested return process.
AmazonHelp: You may refer here: https://t.co/M27c4qF86m for detail 

0e2bf8ee-425e-5936-b703-dd9d4494f9dc 

User_234840: Hi, I was charged $1.00 while making a return shipping label online. Why was this the case?
UPSHelp: When creating an Electronic Return Label there is a $1.00 fee involved to process it.  



In [17]:
response = support_chats.query.hybrid("return process", limit=3)
display_objects(response)

31a21f1b-61aa-5c01-bcbf-6dce68b2cb5e 

User_119904: @115850 I have bought a product and now it's size is not matching I want to return it and also requested return process.
AmazonHelp: You may refer here: https://t.co/M27c4qF86m for detail 

32f8710d-8ea7-5e04-a1f2-4be40ce92dab 

User_207558: Second DOA harddrive from @NeweggService Starting the return process again :/ Trying from Best Buy this time
NeweggService: Let us know if you need any assistance with setting up your ret 

3247d280-de61-515f-a388-caaac81770c8 

User_206228: please check the DM sent to @AmazonHelp abd revert.
AmazonHelp: Hi, we have responded to you via DM. Please refer. ^RD
User_206228: @AmazonHelp Your response is of no help to me.I have al 
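
The hybrid result above blends the vector and keyword rankings. One common way to merge two ranked lists is reciprocal rank fusion (RRF), sketched below with the UUID prefixes from the `near_text` and `bm25` results. This illustrates the general idea only: the function name and the `k=60` constant are assumptions, and Weaviate's own fusion algorithms and weighting may differ.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over all lists."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


vector_ranking = ["31a21f1b", "3247d280", "32f8710d"]   # near_text order
keyword_ranking = ["32f8710d", "31a21f1b", "0e2bf8ee"]  # bm25 order

print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))
```

With these inputs, objects that appear in both lists rise to the top, and the first three fused results come out in the same order as the hybrid query. That agreement should not be read as confirmation of how Weaviate computes its ranking.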



In [18]:
response = support_chats.generate.fetch_objects(
    limit=20,
    grouped_task="What patterns are we seeing here in these issues?"
)

In [19]:
print(response.generated)

Based on the provided data, there seem to be several patterns emerging from these customer issues:

- **Delivery and Shipping Issues**: Multiple users have raised concerns about late deliveries, missing packages, and issues with tracking. This suggests that delivery and shipping are areas where customers frequently encounter problems, and it might be beneficial to review and improve these processes.

- **Product or Service Quality**: Some customers have complained about the quality of products or services, such as in-flight meals, food delivery times, and technical issues with phone updates. This indicates that maintaining consistent product and service quality is crucial to customer satisfaction.

- **Account and Ordering Issues**: A few users have experienced problems with their accounts, such as issues with guest accounts or business accounts, which caused delays in resolving their concerns. This highlights the importance of having robust account management systems and user-friendly
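
A grouped task sends all retrieved objects to the generative model in a single prompt, rather than one prompt per object. The sketch below shows one way such a prompt could be assembled. `build_grouped_prompt` and its template are hypothetical; the prompt Weaviate actually builds is not shown in this notebook.

```python
def build_grouped_prompt(task, texts, max_chars=100):
    """Combine a task with several retrieved texts into one prompt string."""
    # Truncate each text so the prompt stays within a rough size budget
    snippets = [f"[{i}] {text[:max_chars]}" for i, text in enumerate(texts, start=1)]
    return task + "\n\n" + "\n".join(snippets)


prompt = build_grouped_prompt(
    "What patterns are we seeing here in these issues?",
    ["User_1: My parcel is late.", "User_2: I was charged twice."],  # made-up examples
)
print(prompt)
```

The more objects `limit` retrieves, the longer this combined prompt becomes, so `limit` is a trade-off between coverage and the model's context size.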

## Example use cases

- Develop training materials
    - Investigate common patterns in support conversations
    - Identify common issues and resolutions

In [24]:
response = support_chats.generate.near_text(
    query="return process",
    limit=15,
    grouped_task="Describe some common problems that customers may complain about.",
)

In [25]:
print(response.generated)

Here is a list of common problems that customers may complain about, based on the provided data:

- Issues with returning products, including incorrect sizing, missing parts, or damaged items.
- Confusion or dissatisfaction with return shipping charges and processes.
- Delays in receiving refunds for returned items.
- Difficulty in arranging return pick-ups or drop-offs.
- Inability to provide order details or personal information over social media, leading to challenges in resolving issues.
- Miscommunication or dissatisfaction with customer support responses.
- Threats of legal action if issues are not resolved promptly or satisfactorily.
- Concerns about being charged for returns that were not their fault.
- Frustration with the time taken for returns to be processed and refunds to be issued.
- Inconsistent information or updates on the status of refunds from different support channels.

These complaints highlight potential areas of improvement for customer service and support proce

In [26]:
response = support_chats.generate.near_text(
    query="return process",
    limit=15,
    grouped_task="Describe some common problems that customers may complain about, and suggest the top 5 training scenarios that may be useful for new support agents.",
)

In [27]:
print(response.generated)

Common customer complaints:
- Issues with returns and refunds: This is the most prevalent issue in the provided data. Customers are confused about the return process, unexpected charges for return shipping, and delays in receiving refunds. Some also express dissatisfaction with having to pay for return shipping.
- Delivery issues: Some customers complain about not receiving their orders or experiencing significant delays.
- Threatening legal action: A few customers, frustrated with the lack of response or resolution to their issues, threaten to take legal action or contact consumer courts.
- Inadequate or delayed responses from customer support: Several customers express frustration with the speed and effectiveness of the company's responses to their queries.
- Product quality issues: There are mentions of receiving defective or damaged products, missing parts, or products that do not match their descriptions.

Top 5 training scenarios for new support agents:
1. Return and Refund Proce

### Resource management

- How much memory are we using?
- How will this scale with more data?
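
A rough lower bound for the first question is the size of the raw vectors: number of objects × dimensions × 4 bytes for float32 values. The sketch below applies this to the 50,000 objects loaded above. The dimension counts (768 for `nomic-embed-text`, 384 for `embed-multilingual-light-v3.0`) are assumptions to verify against the vectors in your own collection, and the estimate ignores the HNSW graph, the inverted index, and the stored properties.

```python
def raw_vector_bytes(num_objects, dimensions, bytes_per_value=4):
    """Size of the raw float32 vectors only, excluding index overhead."""
    return num_objects * dimensions * bytes_per_value


# Dimension counts are assumptions; check len() of a stored vector to confirm
for model, dims in [("nomic-embed-text", 768), ("embed-multilingual-light-v3.0", 384)]:
    mib = raw_vector_bytes(50_000, dims) / 2**20
    print(f"{model}: {mib:.1f} MiB")
```

Because the estimate is linear in the number of objects, ten times the data needs roughly ten times the vector memory.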

## When to use this pattern

- Is any of the data isolated from the others?
- What use cases might not be covered by this architecture?


## Demo application

- Outside of the notebook
