## Scenario 1: Single-collection RAG

### **SupportPatterns** - Support Training & Education Platform

- Develops training materials and courses for customer support professionals
- Uses aggregated, anonymized support conversations to create realistic training scenarios

### Solution

Collect as much conversation data between support agents and customers as possible. 

Analyse this data to identify common patterns and develop training materials based on these patterns.

In [12]:
from pathlib import Path
from typing import Literal
import urllib.request


def download_datafiles(setup: Literal["ollama", "cohere"]) -> bool:
    # Pre-vectorized Weaviate exports (50,000 objects), one per embedding model
    filepaths_set = {
        "ollama": (
            "https://weaviate-workshops.s3.eu-west-2.amazonaws.com/odsc-europe-2024/twitter_customer_support_weaviate_export_50000_nomic.h5",
            Path("data/twitter_customer_support_nomic.h5"),
        ),
        "cohere": (
            "https://weaviate-workshops.s3.eu-west-2.amazonaws.com/odsc-europe-2024/twitter_customer_support_weaviate_export_50000_cohere-embed-multilingual-light-v3.0.h5",
            Path("data/twitter_customer_support_cohere.h5"),
        ),
    }

    url, local_path = filepaths_set[setup]

    if not local_path.exists():
        # urlretrieve does not create missing directories, so create data/ first
        local_path.parent.mkdir(parents=True, exist_ok=True)
        print(f"Downloading {url}")
        urllib.request.urlretrieve(url, local_path)
    else:
        print(f"File already exists: {local_path}")
    return True

## AI Models

This workshop is set up so you can work with either local models through Ollama or API-based Cohere models. Follow either the [Ollama](#ollama) or the [Cohere](#cohere) instructions below.
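Both setups below assign the same variable names, and the pre-vectorized data file you import must match the embedding model used at query time: different models produce vectors in different embedding spaces (and typically of different sizes). A minimal sketch, assuming you want to choose the setup in a single place; `SETUPS` and `get_setup` are hypothetical helpers, not part of the workshop code, and the model names and paths are copied from the cells in this notebook.

```python
from pathlib import Path
from typing import Literal, TypedDict


class SetupConfig(TypedDict):
    embedding_model: str
    generative_model: str
    data_file: Path


# Hypothetical single source of truth for the two workshop setups
SETUPS: dict[str, SetupConfig] = {
    "ollama": {
        "embedding_model": "nomic-embed-text",
        "generative_model": "gemma2:2b",
        "data_file": Path("data/twitter_customer_support_nomic.h5"),
    },
    "cohere": {
        "embedding_model": "embed-multilingual-light-v3.0",
        "generative_model": "command-r-plus",
        "data_file": Path("data/twitter_customer_support_cohere.h5"),
    },
}


def get_setup(setup: Literal["ollama", "cohere"]) -> SetupConfig:
    """Return the models and the matching pre-vectorized data file for one setup."""
    if setup not in SETUPS:
        raise ValueError(f"Unknown setup {setup!r}; expected one of {sorted(SETUPS)}")
    return SETUPS[setup]


SETUP = "ollama"  # change to "cohere" to use the Cohere API instead
config = get_setup(SETUP)
print(config["embedding_model"], config["data_file"])
```

With this in place, the import step can read `config["data_file"]` instead of a hard-coded path, so switching setups cannot leave the data and the vectorizer out of sync.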


### Ollama 

To use Ollama for this workshop, run the code cells below to download the models and configure the variables:

In [8]:
!ollama pull gemma2:2b && ollama pull nomic-embed-text

pulling manifest
pulling 7462734796d6... 100% ▕████████████████▏ 1.6 GB
pulling e0a42594d802... 100% ▕████████████████▏  358 B
pulling 097a36493f71... 100% ▕████████████████▏ 8.4 KB
pulling 2490e7468436... 100% ▕████████████████▏   65 B
pulling e18ad7af7efb... 100% ▕████████████████▏  487 B
verifying sha256 digest
writing manifest
success
pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB
pulling c71d239df917... 100% ▕████████████████▏  11 KB
pulling ce4a164fc046..

In [13]:
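from weaviate.classes.config import Configure

# Named vector built from the `text` and `company_author` properties.
# `host.docker.internal` assumes Weaviate runs in Docker and Ollama runs on the host.
vectorizer_config = Configure.NamedVectors.text2vec_ollama(
    name="text_with_metadata",
    source_properties=["text", "company_author"],
    vector_index_config=Configure.VectorIndex.hnsw(),
    api_endpoint="http://host.docker.internal:11434",
    model="nomic-embed-text",
)  # no trailing comma here: it would turn vectorizer_config into a tuple
generative_config = Configure.Generative.ollama(
    api_endpoint="http://host.docker.internal:11434",
    model="gemma2:2b",
)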

download_datafiles("ollama")

File already exists: data/twitter_customer_support_nomic.h5


True

### Cohere 

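To use the Cohere API for this workshop, run the code cell below to configure the variables. The Cohere API key is read from the `WORKSHOP_COHERE_KEY` environment variable when connecting to Weaviate.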

In [14]:
from weaviate.classes.config import Configure

# Named vector built from the `text` and `company_author` properties, embedded by Cohere
vectorizer_config = Configure.NamedVectors.text2vec_cohere(
    name="text_with_metadata",
    source_properties=["text", "company_author"],
    vector_index_config=Configure.VectorIndex.hnsw(),
    model="embed-multilingual-light-v3.0",
)  # no trailing comma here: it would turn vectorizer_config into a tuple
generative_config = Configure.Generative.cohere(
    model="command-r-plus",
)

download_datafiles("cohere")

File already exists: data/twitter_customer_support_cohere.h5


True


### Create the collection


In [15]:
import os
import weaviate
from weaviate.classes.config import Property, DataType
from dotenv import load_dotenv

load_dotenv()

# The Cohere key is only needed for the Cohere setup; skip the header if it is unset
cohere_key = os.getenv("WORKSHOP_COHERE_KEY")
headers = {"X-Cohere-Api-Key": cohere_key} if cohere_key else {}

client = weaviate.connect_to_local(headers=headers)

collection_name = "SupportChat"

# For re-running the demo only: delete the collection if it already exists
client.collections.delete(collection_name)

# Create a new collection with the specified properties, named vector and generative model
chunks = client.collections.create(
    name=collection_name,
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="dialogue_id", data_type=DataType.INT),
        Property(name="company_author", data_type=DataType.TEXT),
        Property(name="created_at", data_type=DataType.DATE),
    ],
    vectorizer_config=[vectorizer_config],
    generative_config=generative_config,
)

WeaviateConnectionError: Connection to Weaviate failed. Details: Error: All connection attempts failed. 
Is Weaviate running and reachable at http://localhost:8080?

In [90]:
import h5py
import json
import numpy as np


def get_hdf5_obj(file_path):
    with h5py.File(file_path, "r") as hf:
        for uuid in hf.keys():
            src_obj = hf[uuid]

            # Get the object properties
            properties = json.loads(src_obj["object"][()])

            # Get the vector(s)
            vectors = {}
            for key in src_obj.keys():
                if key.startswith("vector_"):
                    vector_name = key.split("_", 1)[1]
                    vectors[vector_name] = np.asarray(src_obj[key])

            yield uuid, properties, vectors

In [91]:
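from tqdm import tqdm

# Use the file that matches your setup:
# "data/twitter_customer_support_nomic.h5" (Ollama) or
# "data/twitter_customer_support_cohere.h5" (Cohere)
data_file = "data/twitter_customer_support_nomic.h5"

with client.batch.fixed_size(batch_size=200) as batch:
    for uuid, properties, vectors in tqdm(get_hdf5_obj(data_file)):
        batch.add_object(
            collection=collection_name,
            uuid=uuid,
            properties=properties,
            # Supplying the pre-computed vector means Weaviate does not re-embed the object
            vector={"text_with_metadata": vectors["text_with_metadata"]},
        )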

50000it [00:22, 2239.12it/s]
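`client.batch.fixed_size(batch_size=200)` buffers objects and sends them to Weaviate in groups of 200, so the 50,000 objects above go out in 50,000 / 200 = 250 batches. The grouping itself can be sketched in plain Python; `batched` is a hypothetical helper (Python 3.12 ships a similar `itertools.batched`, which yields tuples), and the sketch ignores everything else the Weaviate client does, such as sending the requests and collecting failed objects.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 50,000 objects in groups of 200
batches = list(batched(range(50_000), 200))
print(len(batches), len(batches[-1]))  # 250 200
```

Because `batched` consumes the iterator lazily, it works with a generator such as `get_hdf5_obj` without loading the whole file into memory.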


In [71]:
print(f"Processed {len(client.batch.results.objs.all_responses)} objects.")

Processed 50000 objects.




In [72]:
if len(client.batch.failed_objects) > 0:
    print("*" * 80)
    print(f"***** Failed to add {len(client.batch.failed_objects)} objects *****")
    print("*" * 80)
    print(client.batch.failed_objects[:3])

In [73]:
support_chats = client.collections.get(collection_name)

Please make sure to close the connection using `client.close()`.


In [74]:
response = support_chats.query.fetch_objects(limit=2, include_vector=True)

In [75]:
print(response.objects[0].uuid)

00014ba3-ea82-524b-a45c-99b4153b74bc


In [76]:
for k, v in response.objects[0].properties.items():
    print(f"\n|| {k} || \n{v}")


|| created_at || 
2017-10-03 14:47:07+00:00

|| text || 
User_164303: NRC Feature Request: Support for interval training. 😀
NikeSupport: Here to help. Have you checked out the option to run a Speed run which is interval training?
User_164303: I have not. Will definitely check it out. Thanks!

|| company_author || 
NikeSupport

|| dialogue_id || 
204089
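Each `text` value stores a whole conversation, one `Speaker: message` turn per line, with the company's support handle (here `NikeSupport`) as one of the speakers. To turn conversations like this into training scenarios you will probably want them split into turns. A minimal sketch, assuming every turn starts on a new line with a speaker name that contains no spaces or colons (true for the sample above, but not verified for the whole dataset); `split_dialogue` is a hypothetical helper:

```python
import re
from typing import List, Tuple

# Assumed format: each turn is on its own line as "<speaker>: <message>"
TURN_PATTERN = re.compile(r"^(?P<speaker>[^:\s]+): (?P<message>.*)$")


def split_dialogue(text: str) -> List[Tuple[str, str]]:
    """Split a stored dialogue into (speaker, message) turns."""
    turns: List[Tuple[str, str]] = []
    for line in text.splitlines():
        match = TURN_PATTERN.match(line)
        if match:
            turns.append((match["speaker"], match["message"]))
        elif turns:
            # Lines without a speaker prefix continue the previous message
            speaker, message = turns[-1]
            turns[-1] = (speaker, f"{message}\n{line}")
    return turns


sample = (
    "User_164303: NRC Feature Request: Support for interval training. 😀\n"
    "NikeSupport: Here to help. Have you checked out the option to run a Speed run which is interval training?\n"
    "User_164303: I have not. Will definitely check it out. Thanks!"
)
for speaker, message in split_dialogue(sample):
    print(f"{speaker} -> {message}")
```

Lines without a speaker prefix are appended to the previous turn, so multi-line messages stay together; a message line that itself starts with something like `Note: ` would be misread as a new turn.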


In [77]:
for k, v in response.objects[0].vector.items():
    print(k, v[:3])

text_with_metadata [-0.052056338638067245, 2.2214295864105225, -3.808542490005493]


### Queries

In [78]:
def display_objects(response):
    for o in response.objects:
        print(o.uuid, "\n")
        print(o.properties["text"][:200], "\n")

In [79]:
response = support_chats.query.near_text("return process", limit=3)
display_objects(response)

03975da7-7526-5ff2-8d7e-4035dfa73451 

User_227758: hello how long do refunds usually take please tracker says you received phone back on the 28th.thankyou
O2: Hi Chris 👋 Are you returning an order and waiting for a refund? Please chat wit 

3e350821-8e26-5746-8769-408223cef412 

User_187419: I've returned one product through self ship for which i beared courier charges. What is the procedure to take refund of the same?
AmazonHelp: Kindly reach out to our support team here: ht 

0de3be60-85ab-5011-a278-a5aa857992f5 

User_178546: Hi, how long do returns usually take to get back to you? Tracking status hasn't changed for 2 days on that pass my parcel site
AmazonHelp: After the carrier receives the item, it can take 



In [80]:
response = support_chats.query.bm25("return process", limit=3)
display_objects(response)

32f8710d-8ea7-5e04-a1f2-4be40ce92dab 

User_207558: Second DOA harddrive from @NeweggService Starting the return process again :/ Trying from Best Buy this time
NeweggService: Let us know if you need any assistance with setting up your ret 

31a21f1b-61aa-5c01-bcbf-6dce68b2cb5e 

User_119904: @115850 I have bought a product and now it's size is not matching I want to return it and also requested return process.
AmazonHelp: You may refer here: https://t.co/M27c4qF86m for detail 

0e2bf8ee-425e-5936-b703-dd9d4494f9dc 

User_234840: Hi, I was charged $1.00 while making a return shipping label online. Why was this the case?
UPSHelp: When creating an Electronic Return Label there is a $1.00 fee involved to process it.  



In [81]:
response = support_chats.query.hybrid("return process", limit=3)
display_objects(response)

31a21f1b-61aa-5c01-bcbf-6dce68b2cb5e 

User_119904: @115850 I have bought a product and now it's size is not matching I want to return it and also requested return process.
AmazonHelp: You may refer here: https://t.co/M27c4qF86m for detail 

03975da7-7526-5ff2-8d7e-4035dfa73451 

User_227758: hello how long do refunds usually take please tracker says you received phone back on the 28th.thankyou
O2: Hi Chris 👋 Are you returning an order and waiting for a refund? Please chat wit 

0de3be60-85ab-5011-a278-a5aa857992f5 

User_178546: Hi, how long do returns usually take to get back to you? Tracking status hasn't changed for 2 days on that pass my parcel site
AmazonHelp: After the carrier receives the item, it can take 



In [85]:
response = support_chats.generate.fetch_objects(
    limit=20,
    grouped_task="What patterns are we seeing here in these issues?"
)

In [86]:
print(response.generated)

Based on the provided data, there seem to be several patterns emerging from these customer issues:

- **Delivery and Shipping Issues**: Multiple users have raised concerns about late deliveries, missing packages, and issues with tracking. This suggests that delivery and shipping are areas where customers frequently encounter problems, and it might be beneficial to review and improve these processes.

- **Product or Service Quality**: Some customers have expressed dissatisfaction with the quality of products or services, such as in-flight meals, food orders, and technical issues with phone updates. Addressing these concerns could enhance customer satisfaction and loyalty.

- **Customer Service Accessibility**: A few users mentioned difficulties in reaching customer support or receiving timely responses. This indicates that there might be room for improvement in the accessibility and responsiveness of customer service channels.

- **Account and Ordering Issues**: There are instances wher

In [87]:
from weaviate.classes.query import Filter

response = support_chats.generate.fetch_objects(
    limit=20,
    grouped_task="What patterns are we seeing here in these issues?",
    filters=Filter.by_property("company_author").equal("AmazonHelp")
)

In [88]:
print(response.generated)

Based on the provided data, here are some patterns that can be observed:

- Many of the issues revolve around delivery, including delayed deliveries, missing packages, and incorrect delivery status updates.
- There are also several instances of customers reporting problems with refunds, including not receiving expected refunds and double charges on their accounts.
- Some customers have reported issues with making purchases, such as being unable to apply discounts or having difficulty checking out.
- In a few cases, customers have complained about the behavior or responsiveness of Amazon's customer support team, including accusations of being rude or unhelpful.
- The data also suggests that there may be some language barriers in customer support, as some conversations are in languages other than English.
- Finally, it appears that Amazon's support team often directs customers to fill out forms or contact specific teams to resolve their issues.

These patterns can provide insights for Am
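`grouped_task` asks the model a single question about all of the retrieved objects at once, rather than generating one answer per object. Weaviate assembles the actual prompt on the server, and its exact format is not shown in this notebook. The helper below (`build_grouped_prompt`, hypothetical) only sketches the idea: one task string followed by the selected properties of every retrieved object. The two example objects reuse dialogue snippets from the query results above.

```python
import json
from typing import Dict, List, Sequence


def build_grouped_prompt(
    task: str,
    objects: List[Dict[str, object]],
    properties: Sequence[str] = ("text",),
) -> str:
    """Hypothetical illustration: one task plus all retrieved objects in one prompt."""
    # Keep only the requested properties of each object
    records = [{name: obj[name] for name in properties if name in obj} for obj in objects]
    return f"{task}\n\n{json.dumps(records, ensure_ascii=False, indent=2)}"


retrieved = [
    {
        "text": "User_227758: hello how long do refunds usually take please tracker says you received phone back on the 28th.thankyou",
        "company_author": "O2",
    },
    {
        "text": "User_234840: Hi, I was charged $1.00 while making a return shipping label online. Why was this the case?",
        "company_author": "UPSHelp",
    },
]
prompt = build_grouped_prompt("What patterns are we seeing here in these issues?", retrieved)
print(prompt)
```

Because every retrieved object ends up in one prompt, `limit=20` directly controls how much text the model has to read; raising it makes the prompt longer and can run into the model's context limit, especially with a small local model.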


### Demo application

- Outside of the notebook
