In [1]:
from dotenv import load_dotenv

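# Load API keys from a local .env file (override=True lets .env values replace existing environment variables)
load_dotenv(override=True)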

True

## Basic RAG with Weaviate

Now, let's perform RAG with the chunks that we've created.

We will:
- Load and chunk a document
- Add the chunks to Weaviate and generate vectors
- Perform RAG queries

We assume some familiarity with Weaviate here. 

(If not, check out the [Weaviate Quickstart](https://docs.weaviate.io/weaviate/quickstart), or ask questions in the live session!)

### Load and chunk a document

In [2]:
from pathlib import Path

def get_chunks_using_markers(src_text: str) -> list[str]:
    """
    Split the source text into chunks using markers.
    """
    marker = "\n##"

    # Split by marker and reconstruct with markers (except first chunk)
    parts = src_text.split(marker)
    chunks = []

    # Add first chunk if it exists and isn't empty
    if parts[0].strip():
        chunks.append(parts[0].strip())

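    # Add remaining chunks with markers reattached; strip *after* reattaching so the
    # leading newline is dropped but the space between "##" and the heading text is kept
    for part in parts[1:]:
        if part.strip():
            chunks.append((marker + part).strip())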

    return chunks


md_file = Path("data/parsed/manual_bosch_WGG254Z0GR-parsed-text.md")
md_text = md_file.read_text(encoding="utf-8")
chunk_texts = get_chunks_using_markers(md_text)
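An alternative sketch (a swapped-in technique, not the notebook's method): a regex with a zero-width lookahead splits *before* every line that starts with `##`, so each heading stays attached to its section and nothing has to be reattached. It splits at the same places as `get_chunks_using_markers` (including `###` sub-headings, since they also start with `##`), and because nothing between the marker and the heading text is stripped, headings are kept verbatim. The function name and sample text are made up for illustration.

```python
import re


def split_on_headings(src_text: str) -> list[str]:
    """Split markdown text into chunks that each start at a line beginning with '##'."""
    # (?m) makes ^ match at the start of every line; (?=##) splits *before* the marker
    parts = re.split(r"(?m)^(?=##)", src_text)
    # Drop empty or whitespace-only pieces and trim surrounding whitespace
    return [p.strip() for p in parts if p.strip()]


sample = "Intro text\n## 1 Safety\nRead this.\n### 1.1 Detail\nMore.\n## 2 Cleaning\nRinse it."
for chunk in split_on_headings(sample):
    print(repr(chunk))
```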

### Set up Weaviate

In [3]:
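import os

import weaviate
from weaviate.classes.init import Auth

# Connect to a Weaviate Cloud cluster; the Cohere key is passed as a header so that
# Weaviate can call Cohere for vectorization and generation
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.getenv("WEAVIATE_API_KEY")),
    headers={
        "X-Cohere-Api-Key": os.getenv("COHERE_API_KEY"),
    },
)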
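If any of the three environment variables read above is missing, the connection or the later Cohere calls fail with less obvious errors. A minimal fail-fast check, assuming the variable names used in this cell; `missing_env_vars` and `REQUIRED_VARS` are hypothetical helpers, not part of the Weaviate client.

```python
import os

# Hypothetical list: the names match the variables read in the connection cell
REQUIRED_VARS = ("WEAVIATE_URL", "WEAVIATE_API_KEY", "COHERE_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these in your .env file before connecting:", ", ".join(missing))
```

Once connected, `client.is_ready()` returns `True` when the cluster is reachable.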



### Set up a collection

In [4]:
# Start fresh: delete any existing "Chunks" collection (and all of its objects)
client.collections.delete("Chunks")

In [5]:
from weaviate.classes.config import Property, DataType, Configure, Tokenization

client.collections.create(
    name="Chunks",
    properties=[
        Property(
            name="document_title",
            data_type=DataType.TEXT,
        ),
        Property(
            name="chunk",
            data_type=DataType.TEXT,
        ),
        Property(
            name="chunk_number",
            data_type=DataType.INT,
        ),
        Property(
            name="filename",
            data_type=DataType.TEXT,
            tokenization=Tokenization.FIELD
        ),
    ],
    vector_config=[
        # Add `Configure.Vectors.text2vec_cohere` vector to the collection with:
        # name: "default", source properties: ["document_title", "chunk"], and model: "embed-v4.0"
        # BEGIN_SOLUTION
        Configure.Vectors.text2vec_cohere(
            name="default",
            source_properties=["document_title", "chunk"],
            model="embed-v4.0"
        )
        # END_SOLUTION
    ],
    generative_config=Configure.Generative.cohere()
)

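Why does `filename` use `Tokenization.FIELD`? Under the default `word` tokenization, Weaviate keeps alphanumeric characters, lowercases them and splits on everything else, so a path becomes several separate tokens and an equality filter on it may also match other paths that share those words. `field` tokenization indexes the whole (whitespace-trimmed) value as one token, so filters match the exact filename. The two functions below are toy approximations for illustration only, not Weaviate's implementation.

```python
import re


def word_tokenize(value: str) -> list[str]:
    """Toy approximation of `word` tokenization: runs of letters/digits, lowercased.

    Underscores and punctuation act as separators here.
    """
    return [tok.lower() for tok in re.findall(r"[^\W_]+", value)]


def field_tokenize(value: str) -> list[str]:
    """Toy approximation of `field` tokenization: the whole trimmed value is one token."""
    return [value.strip()]


path = "data/pdfs/manual_bosch_WGG254Z0GR.pdf"
print(word_tokenize(path))
print(field_tokenize(path))
```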

### Import data

In [6]:
chunks = client.collections.get("Chunks")

In [7]:
from tqdm import tqdm

with chunks.batch.fixed_size(batch_size=100) as batch:
    for i, chunk_text in tqdm(enumerate(chunk_texts)):
        obj = {
            "document_title": "Bosch WGG254Z0GR Manual",
            "filename": "data/pdfs/manual_bosch_WGG254Z0GR.pdf",
            "chunk": chunk_text,
            "chunk_number": i + 1,
        }

        # Add the object to the batch for import with `batch.add_object()`
        # BEGIN_SOLUTION
        batch.add_object(
            properties=obj
        )
        # END_SOLUTION

127it [00:00, 54488.20it/s]
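Conceptually, `batch.fixed_size(batch_size=100)` sends the objects to Weaviate in groups of up to 100, so the 127 chunks above go out as one batch of 100 and one of 27. The plain-Python sketch below only illustrates that grouping and the per-chunk property dicts; `build_objects` and `fixed_size_batches` are hypothetical helpers, and the real client additionally manages the requests for you.

```python
def build_objects(chunk_texts: list[str]) -> list[dict]:
    """Build one property dict per chunk, mirroring the import cell (1-based chunk_number)."""
    return [
        {
            "document_title": "Bosch WGG254Z0GR Manual",
            "filename": "data/pdfs/manual_bosch_WGG254Z0GR.pdf",
            "chunk": text,
            "chunk_number": i + 1,
        }
        for i, text in enumerate(chunk_texts)
    ]


def fixed_size_batches(items: list, batch_size: int = 100) -> list[list]:
    """Group items into consecutive batches of at most `batch_size` (conceptual only)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# 127 placeholder chunks, matching the count reported by the import cell
objects = build_objects([f"chunk {n}" for n in range(127)])
batches = fixed_size_batches(objects, batch_size=100)
print([len(b) for b in batches])  # [100, 27]
```

After a real import, it is worth checking `chunks.batch.failed_objects`, which lists any objects that could not be imported.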


### RAG queries



In [8]:
# Try a RAG query with:
# query (what to search for): "how to clean the washing machine" and
# grouped_task (prompt): "Briefly, what tasks do I need to perform to regularly maintain and clean the washing machine?"
# limit (how many objects to fetch): 10
# BEGIN_SOLUTION
response = chunks.generate.near_text(
    query="how to clean the washing machine",
    limit=10,
    grouped_task="Briefly, what tasks do I need to perform to regularly maintain and clean the washing machine?"
)
# END_SOLUTION

print("Query response:")
print(response.generative.text)

Query response:
Here's a list of tasks to maintain and clean your washing machine:

1. Run an empty wash cycle periodically to clean the drum using a bleach-containing detergent. This helps prevent damage to the drum from low-temperature washing and lack of ventilation.

2. Clean the detergent drawer by removing and rinsing it with water. Ensure to clean the opening for the drawer as well.

3. Leave the appliance door and detergent drawer open after each use to allow residual water to evaporate.

4. Brush off sand and soil from laundry before washing. Sort and prepare your laundry appropriately.

5. Clean the drain pump by unscrewing the pump cap and removing any dirt and debris. Ensure the impeller can rotate freely.

6. Wipe down the rubber gasket around the door to remove any foreign objects and dry it.

7. Run a draining program after each wash to prevent unused detergent from flowing into the outlet.
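With `grouped_task`, all retrieved objects are sent to the LLM together with one prompt, and a single answer comes back in `response.generative.text` (by contrast, `single_prompt` runs a prompt once per retrieved object). The sketch below is a hypothetical illustration of how such a grouped prompt could be assembled; the template and the `build_grouped_prompt` name are made up, and Weaviate's actual prompt format is not shown in this notebook.

```python
def build_grouped_prompt(task: str, retrieved_chunks: list[str]) -> str:
    """Illustrative only: combine one task with all retrieved chunks into a single prompt."""
    # Number each passage so the model (and a reader) can refer back to it
    context = "\n\n".join(
        f"[{n}] {chunk.strip()}" for n, chunk in enumerate(retrieved_chunks, start=1)
    )
    return f"{task}\n\nUse only the following passages:\n\n{context}"


prompt = build_grouped_prompt(
    "Briefly, what tasks do I need to perform to regularly maintain and clean the washing machine?",
    [
        "## 17.2 Cleaning the detergent drawer\n1. Pull out the detergent drawer.",
        "## 17.1 Cleaning the drum",
    ],
)
print(prompt)
```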



### Recap: what's happening under the hood

![assets/llm_3_rag_weaviate.png](assets/llm_3_rag_weaviate.png)
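Under the hood, `near_text` embeds the query with the collection's vectorizer and returns the objects whose vectors are closest; the LLM then answers using those objects. Below is a brute-force sketch of the retrieval step using cosine similarity over made-up 3-dimensional vectors that stand in for `embed-v4.0` embeddings. Weaviate itself searches an approximate nearest-neighbor index (HNSW by default) with the collection's configured distance metric, so treat this as an illustration only.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int) -> list[str]:
    """Return the k document ids whose vectors are most similar to the query."""
    ranked = sorted(docs, key=lambda doc_id: cosine(query_vec, docs[doc_id]), reverse=True)
    return ranked[:k]


# Made-up vectors standing in for real embeddings
docs = {
    "cleaning the drum": [0.9, 0.1, 0.0],
    "detergent drawer": [0.7, 0.3, 0.1],
    "transport bolts": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # stands in for "how to clean the washing machine"
print(top_k(query, docs, k=2))
```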

We can review the supporting passages that were retrieved and passed to the LLM:

In [9]:
print("Supporting passages:")
for o in response.objects:
    print(f"\n> Object: {o.uuid}:")
    print(o.properties['chunk'][:200]+"...")

Supporting passages:

> Object: 2cdff00f-7a22-49ab-9f70-015503ce52af:

##6.1 Starting an empty washing cycle

Your appliance was inspected thoroughly before leaving the factory. To remove any residual water, run the first wash cycle without any laundry.

1. Turn the pro...

> Object: 5b5f1a47-5aec-45bd-86df-12fc6403b52c:

##17.2 Cleaning the detergent drawer

1. Pull out the detergent drawer.
2. Press down on the insert and remove the detergent drawer.

<!-- image -->

- en Cleaning and servicing
3. Pull out the inser...

> Object: c5984496-bdff-4a9c-9c1d-3c93b4456e23:

##Risk of injury!

Permanently washing at low temperatures and a lack of ventilation for the appliance may damage the drum and cause injury.

- Regularly run a programme for cleaning the drum or wash...

> Object: b6c69941-c03f-4127-9aa9-0afb9333f1db:

##17.1 Cleaning the drum

<!-- image -->...

> Object: f044bd00-bd51-47fb-b43c-636bae58c429:

##Note

The appliance and fabrics are protected when you prepare your laundr
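The loop above appends `...` to every passage, even short ones such as the `17.1 Cleaning the drum` chunk, and prints any leading whitespace stored in the chunk. A small hypothetical helper (`preview` is not part of the Weaviate client) that trims whitespace and only adds an ellipsis when the text is actually cut:

```python
def preview(text: str, max_chars: int = 200) -> str:
    """Trim surrounding whitespace and shorten to max_chars, adding '...' only if cut."""
    text = text.strip()
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + "..."


print(preview("\n##17.1 Cleaning the drum\n\n<!-- image -->"))
print(preview("x" * 250, max_chars=10))
```

In the loop, `print(preview(o.properties['chunk']))` could replace the existing `print` call.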

In [10]:
client.close()