# Programming Assignment: Improving a RAG System

---

You've made it to the final assignment of the RAG course—congrats on reaching this stage! Now that you’ve got a handle on building RAG systems, it's time to shift gears and make them even better. In this assignment, you’ll dive into:

1. **Cost Measurement**: Figure out the potential costs of running a RAG application.
2. **Prompt Improvement**: Enhance your prompts to speed up response times, while finding the right balance between time, performance, and cost.
3. **Tracing System**: Set up a system to keep track of the inputs and outputs during interactions with the RAG system.

These tasks are in your hands now. Good luck, and have fun with it!

---
<h4 style="color:white; font-weight:bold;">USING THE TABLE OF CONTENTS</h4>

JupyterLab provides an easy way for you to navigate through your assignment. It's located under the Table of Contents tab, found in the left panel, as shown in the picture below.

![TOC Location](images/toc.png)

---

<h4 style="color:green; font-weight:bold;">TIPS FOR SUCCESSFUL GRADING OF YOUR ASSIGNMENT:</h4>

- All cells are frozen except for the ones where you need to submit your solutions or when explicitly mentioned you can interact with it.

- You can add new cells to experiment but these will be omitted by the grader, so don't rely on newly created cells to host your solution code, use the provided places for this.

- Avoid using global variables unless you absolutely have to. The grader tests your code in an isolated environment without running all cells from the top. As a result, global variables may be unavailable when scoring your submission. Global variables that are meant to be used will be defined in UPPERCASE.

- - To submit your notebook for grading, first save it by clicking the 💾 icon on the top left of the page and then click on the <span style="background-color: blue; color: white; padding: 3px 5px; font-size: 16px; border-radius: 5px;">Submit assignment</span> button on the top right of the page.
---


# Table of Contents
- [ 1 - Introduction: Your Role at Fashion Forward Hub](#1)
  - [ 1.1 Importing the libraries](#1-1)
  - [ 1.2 Loading the Weaviate client](#1-2)
  - [ 1.3 Preparing the Tracing with Phoenix](#1-3)
  - [ 1.4 (Optional) Setting model cost per token](#1-4)
- [ 2 - A Quick Recap on the Database Structure](#2)
  - [ 2.1 Products Database](#2-1)
  - [ 2.2 FAQ Database](#2-2)
- [ 3 - Recap on LLM calls and new output](#3)
  - [ 3.1 Function to generate the parameters dictionary](#3-1)
- [ 4 - Improving task handling](#4)
  - [ 4.1 Refactoring the function to decide whether it is an FAQ or product-related question](#4-1)
    - [ Exercise 1](#ex01)
  - [ 4.2 Answering a FAQ question](#4-2)
  - [ 4.3 Querying on FAQ](#4-3)
    - [ Exercise 2](#ex02)
  - [ 4.4 Improving the Decision Between Creative or Technical Product Queries](#4-4)
    - [ Exercise 3](#ex03)
  - [ 4.5 Retrieving the parameters for a given task](#4-5)
- [ 5 - Retrieving Items Based on Metadata from a Query](#5)
  - [ 5.1 Generate metadata](#5-1)
  - [ 5.2 Loading the Weaviate Product Collection](#5-2)
  - [ 5.3 Filtering by Metadata](#5-3)
    - [ Exercise 4](#ex04)
  - [ 5.4 Generating the retrieved items as context](#5-4)
  - [ 5.5 Query on Products](#5-5)
- [ 6 - The final function! ](#6)
  - [ 6.1 The function to rule them all](#6-1)
- [ 7 - The ChatBot](#7)


<a id='1'></a>
## 1 - Introduction: Your Role at Fashion Forward Hub
---

Congratulations on the success of your ChatBot at Fashion Forward Hub! Customers are thrilled with its functionality, which has significantly reduced calls to customer service and increased sales by providing detailed information and personalized fashion suggestions. However, success has brought some challenges that need your attention:

1. **Rising Costs**: As more customers use the ChatBot, operating costs have increased significantly. This issue has caught the attention of directors, who need better cost monitoring. Currently, there's no system in place to identify where these costs are originating.

2. **High Response Times**: Occasionally, response times are too long for certain user queries, leading to customer dissatisfaction.

As the RAG expert responsible for this ChatBot, your next task is to address these challenges by:

1. Monitoring and controlling costs and response times effectively.
2. Enhancing prompts to strike a balance between cost, performance, and speed.

To accomplish these goals, you will need to develop new functions and refine existing ones in the RAG framework.

<a id='1-1'></a>
### 1.1 Importing the libraries



In [1]:
import json
from weaviate.classes.query import Filter
import weaviate
import joblib
import pandas as pd

In [2]:
import flask_app
import weaviate_server
import unittests
import json
from utils import (
    ChatWidget, 
    generate_with_single_input,
    parse_json_output,
    get_filter_by_metadata,
    generate_filters_from_query,
    process_and_print_query,
    print_properties,
    make_url
)

 * Serving Flask app 'flask_app'
 * Debug mode: off


<a id='1-2'></a>
### 1.2 Loading the Weaviate client

In this assignment you will use again the Weaviate API to load the vector database. Do not worry, you won't need to load the database. It is already given to you!

In [3]:
client = weaviate.connect_to_local(port=8079, grpc_port=50050)

<a id='1-3'></a>
### 1.3 Preparing the Tracing with Phoenix

Now you'll load the necessary libraries and setup the telemetry to be able to add the telemetry to your ChatBot! You can learn more about tracing with Phoenix in [this ungraded lab](https://www.coursera.org/learn/retrieval-augmented-generation-rag/item/zgDvm)! Don't worry, you won't be graded in the tracing part!

In [4]:
import phoenix as px
from phoenix.otel import register
from opentelemetry.trace import Status, StatusCode

In [5]:
# Launch the lab and the URL
make_url()
session = px.launch_app()

[1mFOLLOW THIS URL TO OPEN THE UI: http://emffpxgwdflg.labs.coursera.org[0m
🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://arize.com/docs/phoenix


<a id='1-4'></a>
### 1.4 (Optional) Setting model cost per token

This part is optional, so you can ignore it if you want to. However, here you will setup in Phoenix the two models that will be used in this assignment:

- meta-llama/Llama-3.2-3B-Instruct-Turbo
- meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo

Alongside with their cost per million of tokens. This will allow you to see the cost of each operation. 

**NOTE**: For illustration purposes, let's assume a cost of **1000 USD** per million tokens for `meta-llama/Llama-3.2-3B-Instruct-Turbo` and **2000 USD** for `meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo`. The real cost per token for these models is **MUCH LOWER** than this (together.ai offers 0.08 USD per million tokens for `meta-llama/Llama-3.2-3B-Instruct-Turbo`, for instance). However, since this is a small project, a higher value will help you visualize the costs better, as you will generally be dealing with a low token usage.

1. Access the settings in the UI here:

In [6]:
make_url("/settings/models")

[1mFOLLOW THIS URL TO OPEN THE UI: http://emffpxgwdflg.labs.coursera.org/settings/models[0m


![iu 1](images/settings_1.png)

2. Then click on Add Model:

![UI 1](images/create_model_1.png)

3. In this page, add the following:

- Model Name: meta-llama/Llama-3.2-3B-Instruct-Turbo
- Name Pattern: meta-llama/Llama-3.2-3B-Instruct-Turbo
- 1000 USD for input and 1000 USD for output tokens

![UI 2](images/create_model_2.png)

Then click in Create Model

4. Again, click in create model and insert:

- Model Name: meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
- Name Pattern: meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
- 2000 USD for input and 2000 USD for output tokens

Then you're ready to go!

In [7]:
# Setting up the telemetry

phoenix_project_name = "chatbot"

# With phoenix, we just need to register to get the tracer provider with the appropriate endpoint. 
# Different from the ungraded lab, you will NOT use auto_instrument = True, as there are LLM calls not needed to be traced (examples, calls within unittests etc.)

tracer_provider_phoenix = register(project_name=phoenix_project_name, endpoint="http://127.0.0.1:6006/v1/traces")

# Retrieve a tracer for manual instrumentation
tracer = tracer_provider_phoenix.get_tracer(__name__)

🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: chatbot
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: http://127.0.0.1:6006/v1/traces
|  Transport: HTTP + protobuf
|  Transport Headers: {}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



<a id='2'></a>
## 2 - A Quick Recap on the Database Structure
---

Let's have a quick recap on both the FAQ and Products databases.

Recall:
- Product database: Contains the products and their information.
- FAQ database: Contains the FAQ data.

<a id='2-1'></a>
### 2.1 Products Database

Let's explore the products database that Fashion Forward Hub has available. To make it easier to understand, let's load it as a list of JSON files first.

In [8]:
# Loading products data
products_data = joblib.load('dataset/clothes_json.joblib')

In [9]:
# Let's get one example
products_data[0]

{'gender': 'Men',
 'masterCategory': 'Apparel',
 'subCategory': 'Topwear',
 'articleType': 'Shirts',
 'baseColour': 'Navy Blue',
 'season': 'Fall',
 'year': 2011,
 'usage': 'Casual',
 'productDisplayName': 'Turtle Check Men Navy Blue Shirt',
 'price': 67.0,
 'product_id': 15970}

The features each product has are:

- **Gender:** Target audience for the product, such as "Men," "Women," or "Unisex."
- **Master Category:** Broad classification like "Apparel" or "Footwear."
- **Sub Category:** Specific category within a master category, such as "Topwear."
- **Article Type:** Exact type of product, e.g., "Shirts" or "Jackets."
- **Base Colour:** Main color of the product, important for customer choice.
- **Season:** Intended season for the product, e.g., "Summer" or "Winter."
- **Year:** Year of release or collection.
- **Usage:** Intended use or occasion, like "Casual" or "Formal."
- **Product Display Name:** Descriptive name used in marketing.
- **Price:** Cost of the product.
- **Product ID:** Unique identifier for managing and tracking inventory.

<a id='2-2'></a>
### 2.2 FAQ Database

Now, let's load the FAQ database and explore it.

In [10]:
faq = joblib.load("dataset/faq.joblib")

In [11]:
# Get an example
faq[:2]

[{'question': 'What are your store hours?',
  'answer': 'Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday.',
  'type': 'general information'},
 {'question': 'Where is Fashion Forward Hub located?',
  'answer': 'Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State.',
  'type': 'general information'}]

The FAQs are organized in a list, where each entry is a dictionary containing the following keys: `question`, `answer`, and `type`.

<a id='3'></a>
## 3 - Recap on LLM calls and new output
---

Let's recap the previous function you used to generate prompts. Now it has been enhanced to output the full OpenAI object! This includes a several parameters that will help you trace your application.
Recall:

```Python
generate_with_single_input(prompt: str,
                           role: str = 'user',
                           top_p: float = 1,
                           temperature: float = 1,
                           max_tokens: int = 500,
                           model: str = "meta-llama/Llama-3.2-3B-Instruct-Turbo")
```

Let's now understand the new behavior:

In [12]:
# The output is a dictionary containing the role and content from the LLM call, as well as the token usage.:
result = generate_with_single_input("What are the primary colors?")
print(json.dumps(result, indent = 2))

{
  "id": "o5hraHh-4msxKE-96980e90fdcefa7e",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "The primary colors are:\n\n1. Red\n2. Blue\n3. Yellow\n\nThese colors cannot be created by mixing other colors together, and they are the base colors used to create all other colors.",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": []
      },
      "seed": 17444515820048284000
    }
  ],
  "created": 1754247616,
  "model": "meta-llama/Llama-3.2-3B-Instruct-Turbo",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 42,
    "prompt_tokens": 41,
    "total_tokens": 83,
    "completion_tokens_details": null,
    "prompt_tokens_details": null,
    "cached_tokens": 0
  },
  "prompt": []
}


In [13]:
# To retreive the content, then you should do as follows:
print(result['choices'][0]['message']['content'])

The primary colors are:

1. Red
2. Blue
3. Yellow

These colors cannot be created by mixing other colors together, and they are the base colors used to create all other colors.


In [14]:
# The total tokens count (input + output) for this is:
print(result['usage']['total_tokens'])

83


Note that there is now a new key called `total_tokens`. Usually, LLM costs are measured by token count (both input and output).

You may think this is a complicated way of getting the information, but this is how OpenAI structure works!

<a id='3-1'></a>
### 3.1 Function to generate the parameters dictionary

This function was used in the previous assignment. You will also need it in this assignment.

In [15]:
def generate_params_dict(
    prompt: str,
    temperature: float = 1.0,
    role: str = 'user',
    top_p: float = 1.0,
    max_tokens: int = 500,
    model: str = "meta-llama/Llama-3.2-3B-Instruct-Turbo"
) -> dict:
    """
    Generates a dictionary of parameters for calling a Language Learning Model (LLM),
    allowing for the customization of several key options that can affect the output from the model. 

    Args:
        prompt (str): The input text that will be provided to the model to guide text generation.
        temperature (float): A value between 0 and 1 that controls the randomness of the model's output; 
            lower values result in more repetitive and deterministic results, while higher values enhance randomness.
        role (str): The role designation to be used in context, typically identifying the initiator of the interaction.
        top_p (float): A value between 0 and 1 that manages diversity through the technique of nucleus sampling; 
            this parameter limits the set of considered words to the smallest possible while maintaining 'top_p' cumulative probability.
        max_tokens (int): The maximum number of tokens that the model is allowed to generate in response, where a token can 
            be as short as one character or as long as one word.
        model (str): The specific model identifier to be utilized for processing the request. This typically specifies both 
            the version and configuration of the LLM to be employed.

    Returns:
        dict: A dictionary containing all specified parameters which can then be used to configure and execute a call to the LLM.
    """
    # Create the dictionary with the necessary parameters
    kwargs = {
        "prompt": prompt,
        "role": role,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "model": model
    }
    return kwargs

In [16]:
kwargs = generate_params_dict("Solve 3x^2 + 5 = 0")
print(kwargs)

{'prompt': 'Solve 3x^2 + 5 = 0', 'role': 'user', 'temperature': 1.0, 'top_p': 1.0, 'max_tokens': 500, 'model': 'meta-llama/Llama-3.2-3B-Instruct-Turbo'}


In [17]:
# Now you can call the LLM 
result = generate_with_single_input(**kwargs)
content = result['choices'][0]['message']['content']
total_tokens = result['usage']['total_tokens']
print(f"Content: {content}\n\nTotal Tokens: {total_tokens}")

Content: To solve the quadratic equation 3x^2 + 5 = 0, we can start by subtracting 5 from both sides:

3x^2 = -5

Next, we can divide both sides by 3 to isolate x^2:

x^2 = -5/3

Since the square of any real number cannot be negative, we know that this equation has no real solutions.

Total Tokens: 132


<a id='4'></a>
## 4 - Improving task handling
---

<a id='4-1'></a>
### 4.1 Refactoring the function to decide whether it is an FAQ or product-related question


This is the previous function you built to check if a query is related to FAQs or Products. Now, there are two main changes:

1. It now returns the total number of tokens used in the process (including both the input and output of the LLM) — this value is already given to you.
2. It has a new parameter called `simplified`. If `True`, it uses a shorter prompt — that’s your task in this exercise.

<a id='ex01'></a>

<a id='ex01'></a>
### Exercise 1

---

In this exercise, you need to improve the prompt that checks whether a query is FAQ-related or Product-related. The new prompt must use less than 180 tokens in total. It also needs to keep the same classification accuracy for the test set below — that means it must give the same result for each query.

<details>
    <summary style="color: green;"><strong>Hint 1</strong></summary>
Try making the text shorter and removing some examples. Make sure to include the user query in the prompt.
</details>

<details>
    <summary style="color: green;"><strong>Hint 2</strong></summary>
If your prompt doesn’t work for a specific query, consider adding that query (or a more difficult version of it) as an example. The better you handle tricky cases, the more reliable your prompt will be.
</details>


In [18]:
# GRADED CELL 
def check_if_faq_or_product(query, simplified = False):
    """
    Determines whether a given instruction prompt is related to a frequently asked question (FAQ) or a product inquiry.

    Parameters:
    - query (str): The instruction or query that needs to be labeled as either FAQ or Product related.
    - simplified (bool): If True, uses a simplified prompt.

    Returns:
    - str: The label 'FAQ' if the prompt is deemed a frequently asked question, 'Product' if it is related to product information, or
      None if the label is inconclusive.
    """
 
    # If not simplified, uses a more complex prompt
    if not simplified:
        PROMPT = f"""Label the following instruction as an FAQ related answer or a product related answer for a clothing store.
        Product related answers are answers specific about product information or that needs to use the products to give an answer.
        Examples:
                Is there a refund for incorrectly bought clothes? Label: FAQ
                Where are your stores located?: Label: FAQ
                Tell me about the cheapest T-shirts that you have. Label: Product
                Do you have blue T-shirts under 100 dollars? Label: Product
                What are the available sizes for the t-shirts? Label: FAQ
                How can I contact you via phone? Label: FAQ
                How can I find the promotions? Label: FAQ
                Give me ideas for a sunny look. Label: Product
        Return only one of the two labels: FAQ or Product, nothing more.
        Query to classify: {query}
                 """

    ##############################################
    ######### GRADED PART STARTS HERE ############
    ##############################################
    
    ### START CODE HERE ###

    # If simlpified, uses a simplified prompt.
    else:
        PROMPT = f"""Label the following instruction as an FAQ related answer or a product related answer for a clothing store.
        Product related answers are answers specific about product information or that needs to use the products to give an answer.
        Examples:
                Is there a refund for incorrectly bought clothes? Label: FAQ
                Where are your stores located?: Label: FAQ
                Tell me about the cheapest T-shirts that you have. Label: Product
                Do you have blue T-shirts under 100 dollars? Label: Product
        Return only one of the two labels: FAQ or Product, nothing more.
        Query to classify: {query}
                 """

        
    ### END CODE HERE ###

    ##############################################
    ######### GRADED PART ENDS HERE ############
    ##############################################
        
    with tracer.start_as_current_span("routing_faq_or_product", openinference_span_kind = 'tool') as span:
        span.set_input(str({"query":query, "simplified": simplified}))
        
        # Get the kwargs dictinary to call the llm, with PROMPT as prompt, low temperature (0 or near 0) and max_tokens = 10
        kwargs = generate_params_dict(PROMPT, temperature = 0, max_tokens = 10)

        # Call generate_with_single_input with **kwargs
        with tracer.start_as_current_span("router_call", openinference_span_kind = 'llm') as router_span:
            router_span.set_input(kwargs)
            try:
                response = generate_with_single_input(**kwargs) 
            except Exception as error:
                router_span.record_exception(error)
                router_span.set_status(Status(StatusCode.ERROR))
            else:
                # OpenInference Semantic Conventions for computing Costs
                router_span.set_attribute("llm.token_count.prompt", response['usage']['prompt_tokens'])
                router_span.set_attribute("llm.token_count.completion", response['usage']['completion_tokens'])
                router_span.set_attribute("llm.token_count.total", response['usage']['total_tokens'])
                router_span.set_attribute("llm.model_name", response['model'])
                router_span.set_attribute("llm.provider", 'together.ai')
                router_span.set_output(response)
                router_span.set_status(Status(StatusCode.OK))
        
    
        # Get the Label by accessing the content key of the response dictionary
        label = response['choices'][0]['message']['content']
        total_tokens = response['usage']['total_tokens']
        span.set_output(str({"label": label, 'total_tokens':total_tokens}))
        span.set_status(Status(StatusCode.OK))

        # Improvement to prevent cases where LLM outputs more than one word
        if 'faq' in label.lower():
            label = 'FAQ'
        elif 'product' in label.lower():
            label = 'Product'
        else:
            label = 'undefined'
    
        return label, total_tokens

In [20]:
unittests.test_check_if_faq_or_product(check_if_faq_or_product)

[91mFailed test case: Incorrect label for query=What is your return policy?.
Expected: FAQ
Got: Product

[91mFailed test case: Incorrect label for query=How can I contact the user support?.
Expected: FAQ
Got: Product




Let's test both versions:

In [21]:
queries = [
    'What is your return policy?', 
    'Give me three examples of blue T-shirts you have available.', 
    'How can I contact the user support?', 
    'Do you have blue Dresses?',
    'Create a look suitable for a wedding party happening during dawn.'
]

labels = ['FAQ', 'Product', 'FAQ', 'Product', 'Product']

for query, correct_label in zip(queries, labels):
    # Call check_if_faq_or_product and store the results
    response_std, tokens_std = check_if_faq_or_product(query, simplified=False)
    response_simp, tokens_simp = check_if_faq_or_product(query, simplified=True)
    
    # Print results
    process_and_print_query(query, correct_label, response_std, tokens_std, response_simp, tokens_simp)

Query: What is your return policy?
  Standard    → Label: [32mFAQ[0m | Tokens: [31m218[0m
  Simplified  → Label: [31mProduct[0m | Tokens: [32m165[0m

Query: Give me three examples of blue T-shirts you have available.
  Standard    → Label: [32mProduct[0m | Tokens: [31m224[0m
  Simplified  → Label: [32mProduct[0m | Tokens: [32m171[0m

Query: How can I contact the user support?
  Standard    → Label: [32mFAQ[0m | Tokens: [31m220[0m
  Simplified  → Label: [31mProduct[0m | Tokens: [32m167[0m

Query: Do you have blue Dresses?
  Standard    → Label: [32mProduct[0m | Tokens: [31m218[0m
  Simplified  → Label: [32mProduct[0m | Tokens: [32m165[0m

Query: Create a look suitable for a wedding party happening during dawn.
  Standard    → Label: [32mProduct[0m | Tokens: [31m224[0m
  Simplified  → Label: [32mProduct[0m | Tokens: [32m171[0m



<a id='4-2'></a>
### 4.2 Answering a FAQ question

Let's recap how to generate the FAQ layout. This won't be touched in this assignment. This function, given a list of dictionaries with the FAQ questions, returns a formatted string with the question/answer pairs.

In [22]:
@tracer.tool
def generate_faq_layout(faq_dict):
    """
    Generates a formatted string layout for a list of FAQs.

    This function iterates through a dictionary of frequently asked questions (FAQs) and constructs
    a string where each question is followed by its corresponding answer and type.

    Parameters:
    - faq_dict (list): A list of dictionaries, each containing keys 'question', 'answer', and 'type' 
      representing an FAQ entry.

    Returns:
    - str: A string representing the formatted layout of FAQs, with each entry on a separate line.
    """
    # Initialize an empty string
    t = ""
    
    # Iterate over every FAQ question in the FAQ list
    for f in faq_dict:
        # Append the question with formatted string (remember to use f-string and access the values as f['question'], f['answer'] and so on)
        # Also, do not forget to add a new line character (\n) at the end of each line.
        t += f"Question: {f['question']} Answer: {f['answer']} Type: {f['type']}\n" 

    return t

In [23]:
# You can generate a full faq_layout with the entire FAQ questions
faq_layout = generate_faq_layout(faq)
print(faq_layout[:1000])

Question: What are your store hours? Answer: Our online store is open 24/7. Customer service is available from 9:00 AM to 6:00 PM, Monday through Friday. Type: general information
Question: Where is Fashion Forward Hub located? Answer: Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State. Type: general information
Question: Do you have a physical store location? Answer: At this time, we operate exclusively online. This allows us to offer a broader selection and lower prices directly to you. Type: general information
Question: How can I create an account with Fashion Forward Hub? Answer: Click on 'Sign Up' in the top right corner of our website and follow the instructions to set up your account. Type: general information
Question: How do I subscribe to your newsletter? Answer: To receive the latest updates and promotions, sign up for our newsletter at the bottom of our homepage. Type: general information
Question:

In [24]:
# You can choose some faq questions and generate a layout from them. 
# They just need to be in a list with dictionaries with the necessary keys: 'question', 'answer' and 'type'
print(generate_faq_layout(faq[1:2]))

Question: Where is Fashion Forward Hub located? Answer: Fashion Forward Hub is primarily an online store. Our corporate office is located at 123 Fashion Lane, Trend City, Style State. Type: general information



<a id='4-3'></a>
### 4.3 Querying on FAQ

In the previous assignment, the entire FAQ was added in the query. This approach is useful to provide the entire information to the LLM, but it significantly increases token usage and execution time. Now that you are refining your chatbot, it's time to use a more efficient collection to handle it! Let's load the collection.

In [25]:
faq_collection = client.collections.get("Faq")

Let's add the FAQ questions into a collection.

In [26]:
from tqdm import tqdm
from weaviate.util import generate_uuid5
# Set up a batch process with specified fixed size and concurrency
with faq_collection.batch.fixed_size(batch_size=20, concurrent_requests=5) as batch:
    # Iterate over a subset of the dataset
    for document in tqdm(faq):
        # Generate a UUID based on the chunk text for unique identification
        uuid = generate_uuid5(document['question'])

        # Add the chunk object to the batch with properties and UUID
        batch.add_object(
            properties=document,
            uuid=uuid,
        )

100%|██████████| 25/25 [00:00<00:00, 18016.77it/s]


Now you can query them! Let's go over an example.

In [27]:
res = faq_collection.query.near_text("What is the return policy?", limit = 5)

In [28]:
for obj in res.objects:
    print_properties(obj)

{
  "answer": "We accept returns within 30 days of delivery. Conditions apply for specific categories like accessories.",
  "question": "What is your return policy timeframe?",
  "type": "returns and exchanges"
}
{
  "answer": "Sale items are final sale and cannot be returned or exchanged, unless stated otherwise.",
  "question": "Can I return a sale item?",
  "type": "returns and exchanges"
}
{
  "answer": "Return processing typically takes 5-7 business days from when the item is received at our warehouse.",
  "question": "How long does it take to process a return?",
  "type": "returns and exchanges"
}
{
  "answer": "We provide a prepaid return label for domestic returns. For international returns, shipping is at the customer's cost.",
  "question": "Are return shipping costs covered?",
  "type": "returns and exchanges"
}
{
  "answer": "Initiate an exchange through our Returns Center, selecting the item you wish to exchange and the desired replacement.",
  "question": "How do I exchan

<a id='ex02'></a>

<a id='ex02'></a>
### Exercise 2
---

Below is the function used to answer a FAQ question.
**It will only run if the question is already labeled as a FAQ.**
You’ve seen this question in a previous assignment, but now there’s one change:

1. A new parameter called `simplified` was added.
   This controls whether the function uses the full `faq` list or a smaller selection from it.
   If `simplified` is `True`, you should run a semantic search on the FAQ collection and use only the top 5 results.

That’s your task in this exercise.

Your solution must return **fewer than 500 tokens** for the query below.

<details>
    <summary style="color: green;"><strong>Hint 1</strong></summary>
To run a semantic search on the Weaviate database (Module 3), use:  
<code>faq_collection.query.near_text(query, limit=5)</code>
</details>

<details>
    <summary style="color: green;"><strong>Hint 2</strong></summary>
    Don’t forget to call <code>generate_faq_layout()</code> with the result list as its argument.
</details>

In [39]:
# GRADED CELL 

def query_on_faq(query, simplified = False, **kwargs):
    """
    Constructs a prompt to query an FAQ system and generates a response.

    This function integrates an FAQ layout into the prompt to help generate a suitable answer to the given query
    using a language model. It supports additional keyword arguments to customize the prompt generation process.

    Parameters:
    - query (str): The query about which the function seeks to provide an answer from the FAQ.
    - simplified (bool): If True, uses semantic search to extract a relevant subset of FAQ questions
    - **kwargs: Optional keyword arguments for extra configuration of prompt parameters.

    Returns:
    - str: The response generated from the language model based on the input query and FAQ layout.

    """

    
    # If not simplified, generate the faq layout with the entire FAQ questions
    if not simplified:
        # Set the tracer as a chain type, since in non-simplified version, the full FAQ is used
        with tracer.start_as_current_span("query_on_faq", openinference_span_kind="tool") as span:
            
            span.set_input({"query": query, "simplified": simplified})
            faq_layout = generate_faq_layout(faq)
            
            # Generate the prompt
            PROMPT = f"""You will be provided with an FAQ for a clothing store. 
        Answer the instruction based on it. You might use more than one question and answer to make your answer. Only answer the question and do not mention that you have access to a FAQ. 
        <FAQ_ITEMS>
        PROVIDED FAQ: {faq_layout}
        </FAQ_ITEMS>
        Question: {query}
            """ 
            span.set_attribute("prompt", PROMPT)

            # Generate the parameters dict with PROMPT and **kwargs 
            kwargs = generate_params_dict(PROMPT, **kwargs) 

            span.set_attribute("output", str(kwargs))
            span.set_status(Status(StatusCode.OK))
    
            return kwargs
        
   
    
    else:
        with tracer.start_as_current_span("query_on_faq", openinference_span_kind="tool") as span:
            span.set_input({"query": query, "simplified": simplified})
            with tracer.start_as_current_span("retrieve_faq_questions", openinference_span_kind="retriever") as retrieve_span:
                
                ##############################################
                ######### GRADED PART STARTS HERE ############
                ##############################################
                
                ### START CODE HERE ###
                
                # Get the 5 most relevant FAQ objects, in this case limit = None
                results = faq_collection.query.near_text(query, limit=5)

                ### END CODE HERE ###

                ##############################################
                ######### GRADED PART ENDS HERE ##############
                ##############################################
                
                # Set the retrieved documents as attributes on the span
                for i, document in enumerate(results.objects): 
                    retrieve_span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid)) 
                    retrieve_span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata)) 
                    retrieve_span.set_attribute( 
                        f"retrieval.documents.{i}.document.content", str(document.properties) 
                    )  
            # Transform the results in a list of dictionary
                results = [x.properties for x in results.objects] 
                # Reverse the order to add the most relevant objects in the bottom, so it gets closer to the end of the input
                results.reverse() 
                # Generate the faq layout with the new list of FAQ questions `results`
                faq_layout = generate_faq_layout(results) 

            # Different prompt to deal with this new scenario. 
            PROMPT = (f"You will be provided with a query for a clothing store regarding FAQ. It will be provided relevant FAQ from the clothing store." 
        f"Answer the query based on the relevant FAQ provided. They are ordered in decreasing relevance, so the first is the most relevant FAQ and the last is the least relevant."  
        f"Answer the instruction based on them. You might use more than one question and answer to make your answer. Only answer the question and do not mention that you have access to a FAQ.\n"  
        f"<FAQ>\n"  
        f"RELEVANT FAQ ITEMS:\n{faq_layout}\n"  
        f"</FAQ>\n" 
        f"Query: {query}")

    
        
            span.set_attribute("prompt", PROMPT)
        
            # Generate the parameters dict with PROMPT and **kwargs 
            kwargs = generate_params_dict(PROMPT, **kwargs) 
        
            span.set_attribute("output", str(kwargs))
            span.set_status(Status(StatusCode.OK))
    
            return kwargs

In [40]:
unittests.test_query_on_faq(query_on_faq)

[92m All tests passed!


In [41]:
# Get the dictionary of arguments
kwargs = query_on_faq("I received the dress I ordered but I don't like it. How can I return it?")

In [42]:
# The number of split tokens in this prompt is:
print(len(kwargs['prompt'].split()))

788


Note: The number mentioned above doesn’t match the exact token count the model will use. Tokenization is more complex than just splitting by words, so the actual count might be higher. Still, this gives you a rough idea of the text size and is useful for comparison.

In [43]:
# Run the inference
result = generate_with_single_input(**kwargs)

Let's check the content without the simplified version:

In [44]:
print(result['choices'][0]['message']['content'])

To initiate a return, please follow these steps:

1. Go to our Returns Center at the bottom of the webpage.
2. Select the order containing the item you wish to return and the item itself.
3. Choose the reason for return (if applicable) and click 'Continue.'
4. You will be provided with a prepaid return label, which can be printed and attached to the outside of your return package. If you are shipping internationally, please note that you are responsible for the return shipping costs.

Once we receive your returned item, we will process your refund or exchange according to the original payment method and the conditions stated for the specific category of the item being returned.


In [45]:
# Get the total tokens
print(result['usage']['total_tokens'])

1190


Note that for one query, the total tokens is around 1220.

Now let's check the simplified version.

In [46]:
# Get the dictionary of arguments
kwargs = query_on_faq("I received the dress I ordered but I don't like it. How can I return it?", simplified = True)

In [47]:
# The number of split tokens in this prompt is:
print(len(kwargs['prompt'].split()))

250


In [48]:
# Run the inference
result = generate_with_single_input(**kwargs)

In [49]:
print(result['choices'][0]['message']['content'])

You can initiate a return through our Returns Center. Select the dress you wish to return and the option for a return. We will provide a prepaid return label for domestic returns. For your convenience, our Returns Center can guide you through the rest of the process.


In [50]:
# Get the total tokens
print(result['usage']['total_tokens'])

400


Note that the answer is still correct and the final token count is way smaller!

<a id='4-4'></a>

<a id='4-4'></a>
### 4.4 Improving the Decision Between Creative or Technical Product Queries

<a id='ex03'></a>

<a id='ex03'></a>
### Exercise 3

---

This task is similar to what you did when deciding whether a query was about a product or a FAQ. The goal is the same: reduce token usage while keeping good accuracy.

This function has two updates compared to the previous version:

1. It now returns the total number of tokens used during processing.
2. It includes a new argument called `simplified`.

Your solution must meet both of the following conditions:

* Accuracy of at least **80%** on the test set (you can get **at most one** question wrong).
* Use **fewer than 150 tokens** for **every** query.

<details>
    <summary style="color: green;"><strong>Hint 1</strong></summary>
Try shortening the examples and removing any that aren’t essential. Don’t forget to include the query in the prompt!
</details>

<details>
    <summary style="color: green;"><strong>Hint 2</strong></summary>
If your prompt struggles with a particular query, try adding it—or a more challenging version of it—as an example. The more difficult edge cases you cover, the better the model will perform.
</details>

In [52]:
# GRADED CELL 
def decide_task_nature(query, simplified = True):
    """
    Determines the nature of a query, labeling it as either creative or technical.

    This function constructs a prompt for a language model to decide if a given query requires a creative response,
    such as making suggestions or composing ideas, or a technical response, like providing product details or prices.

    Parameters:
    - query (str): The query to be evaluated for its nature.
    - simplified (bool): If True, uses a simplified prompt.

    Returns:
    - str: The label 'creative' if the query requires creative input, or 'technical' if it requires technical information.
    """


    
    if not simplified:
        PROMPT = f"""Decide if the following query is a query that requires creativity (creating, composing, making new things) or technical (information about products, prices etc.). Label it as creative or technical.
          Examples:
          Give me suggestions on a nice look for a nightclub. Label: creative
          What are the blue dresses you have available? Label: technical
          Give me three Tshirts for summer. Label: technical
          Give me a look for attending a wedding party. Label: creative
          Give me suggestions on clothes that match a green Tshirt. Label: creative
          I would like a suggestion on which products match a green Tshirt I already have. Label: creative

          Query to be analyzed: {query}. Only output one token with the label
          """

    # If simplified, uses a simplified query

    ##############################################
    ######### GRADED PART STARTS HERE ############
    ##############################################

    ### START CODE HERE ###

    else:
        PROMPT = f"""Decide if the following query is a query that requires creativity (creating, composing, making new things) or technical (information about products, prices etc.). Label it as creative or technical.
          Examples:
          Give me suggestions on a nice look for a nightclub. Label: creative
          What are the blue dresses you have available? Label: technical
          Query to be analyzed: {query}. Only output one token with the label
          """

    ### END CODE HERE ###

    ##############################################
    ######### GRADED PART ENDS HERE ##############
    ##############################################

    
    with tracer.start_as_current_span("decide_task_nature", openinference_span_kind="tool") as span:
    # Generate the kwards dictionary by passing the PROMPT, low temperature and max_tokens = 1
        span.set_input({"query":query, "simplified": simplified})
        kwargs = generate_params_dict(PROMPT, temperature = 0, max_tokens = 1)

        with tracer.start_as_current_span("router_call", openinference_span_kind = 'llm') as router_span:
            router_span.set_input(kwargs)
            try:
                response = generate_with_single_input(**kwargs) 
            except Exception as error:
                router_span.record_exception(error)
                router_span.set_status(Status(StatusCode.ERROR))
            else:
                # OpenInference Semantic Conventions for computing Costs
                router_span.set_attribute("llm.token_count.prompt", response['usage']['prompt_tokens'])
                router_span.set_attribute("llm.token_count.completion", response['usage']['completion_tokens'])
                router_span.set_attribute("llm.token_count.total", response['usage']['total_tokens'])
                router_span.set_attribute("llm.model_name", response['model'])
                router_span.set_attribute("llm.provider", 'together.ai')
                router_span.set_output(response)
                router_span.set_status(Status(StatusCode.OK))

        # Get the Label by accessing the content key of the response dictionary
        label = response['choices'][0]['message']['content']
        total_tokens = response['usage']['total_tokens']
        span.set_output(str({"label": label, 'total_tokens':total_tokens}))
        span.set_status(Status(StatusCode.OK))    
    
        return label, total_tokens

In [53]:
unittests.test_decide_task_nature(decide_task_nature)

[92m All tests passed!


In [54]:
queries = ["Give me two sneakers with vibrant colors.",
           "What are the most expensive clothes you have in your catalogue?",
           "I have a green Dress and I like a suggestion on an accessory to match with it.",
           "Give me three trousers with vibrant colors you have in your catalogue.",
           "Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather."
           ]

labels = ['technical', 'technical', 'creative', 'technical', 'creative']

In [55]:
for query, correct_label in zip(queries, labels):
    response, total_tokens = decide_task_nature(query, simplified = True)
    label = response
    if label == correct_label:
        label = "\033[32m" + label + "\033[0m" 
    else:
        label = "\033[31m" + label + "\033[0m"
    if total_tokens > 150:
        total_tokens = "\033[31m"  + str(total_tokens) + "\033[0m"
    else:
        total_tokens = "\033[32m"  + str(total_tokens) + "\033[0m"
    print(f"Query: {query} Label Predicted: {label}. Correct Label: {correct_label} Total Tokens: {total_tokens}")

Query: Give me two sneakers with vibrant colors. Label Predicted: [32mtechnical[0m. Correct Label: technical Total Tokens: [32m128[0m
Query: What are the most expensive clothes you have in your catalogue? Label Predicted: [32mtechnical[0m. Correct Label: technical Total Tokens: [32m132[0m
Query: I have a green Dress and I like a suggestion on an accessory to match with it. Label Predicted: [31mtechnical[0m. Correct Label: creative Total Tokens: [32m138[0m
Query: Give me three trousers with vibrant colors you have in your catalogue. Label Predicted: [32mtechnical[0m. Correct Label: technical Total Tokens: [32m133[0m
Query: Create a look for a woman walking in a park on a sunny day. It must be fresh due to hot weather. Label Predicted: [32mcreative[0m. Correct Label: creative Total Tokens: [32m144[0m


<a id='4-5'></a>
### 4.5 Retrieving the parameters for a given task

This is the same function as the previous assignment. It uses different parameters for creative and technical questions.

In [56]:
@tracer.tool
def get_params_for_task(task):
    """
    Retrieves specific language model parameters based on the task nature.

    This function provides parameter sets tailored for creative or technical tasks to optimize
    language model behavior. For creative tasks, higher randomness is encouraged, while technical
    tasks are handled with more focus and precision. A default parameter set is provided for unexpected cases.

    Parameters:
    - task (str): The nature of the task ('creative' or 'technical').

    Returns:
    - dict: A dictionary containing 'top_p' and 'temperature' settings for the specified task.
    """
    # Create the parameters dict for technical and creative tasks
    PARAMETERS_DICT = {"creative": {'top_p': 0.9, 'temperature': 1},
                       "technical": {'top_p': 0.7, 'temperature': 0.3}} 
    
    # If task is technical, return the value for the key technical in PARAMETERS_DICT
    if task == 'technical':
        param_dict = PARAMETERS_DICT['technical'] 

    # If task is creative, return the value for the key creative in PARAMETERS_DICT
    if task == 'creative':
        param_dict = PARAMETERS_DICT['creative'] 

    # If task is a different value, fallback to another set of parameters
    else: # Fallback to a standard value
        param_dict = {'top_p': 0.5, 'temperature': 1} 

    
    return param_dict

<a id='5'></a>
## 5 - Retrieving Items Based on Metadata from a Query

---

In the previous framework, when a query is identified as a product query, you need to find and return relevant products from the vector database. This process works in three main steps:

1. **Generate a metadata JSON** — Use the LLM to guess likely values for some product categories based on the query.
2. **Run a semantic search** — Use those values as filters when querying the database.
3. **Return the results** — Provide the most relevant products found.

The metadata should include values for the following features:

* Gender
* Master Category
* Article Type
* Base Color
* Season
* Usage

These categories offer a good trade-off between being specific enough to improve relevance and general enough to avoid missing results. Using too many or overly detailed filters could lead to no matches, while including too few could make the query too broad and inefficient. This balance helps keep the system fast, accurate, and cost-effective in real-world use.

In [57]:
# Let's remember the data structure of a product
products_data[0]

{'gender': 'Men',
 'masterCategory': 'Apparel',
 'subCategory': 'Topwear',
 'articleType': 'Shirts',
 'baseColour': 'Navy Blue',
 'season': 'Fall',
 'year': 2011,
 'usage': 'Casual',
 'productDisplayName': 'Turtle Check Men Navy Blue Shirt',
 'price': 67.0,
 'product_id': 15970}

This is a dictionary with every possible value for the categories the LLM can pick from to generate a JSON. Note that this dictionary can become huge.

In [58]:
# Run this cell to generate the dictionary with the possible values for each key
values = {}
for d in products_data:
    for key, val in d.items():
        if key in ('product_id', 'price', 'productDisplayName', 'subCategory', 'year'):
            continue
        if key not in values.keys():
            values[key] = set()
        values[key].add(val)

In [59]:
# Example of possible values for the feature 'season'
values['season']

{'All seasons', 'Fall', 'Spring', 'Summer', 'Winter'}

<a id='5-1'></a>
### 5.1 Generate metadata

This function generates a metadata JSON with possible values for each clothing category. The possible values are passed through the dictionary "values". 

Note that the prompt is huge. Let's investigate the total tokens for a query.

In [60]:
def generate_metadata_from_query(query):
    """
    Generates metadata in JSON format based on a given query to filter clothing items.

    This function constructs a prompt for a language model to create a JSON object that will
    guide the filtering of a vector database query for clothing items. It takes possible values from
    a predefined set and ensures only relevant metadata is included in the output JSON.

    Parameters:
    - query (str): The query describing specific clothing-related needs.

    Returns:
    - str: A JSON string representing metadata with keys like gender, masterCategory, articleType,
      baseColour, price, usage, and season. Each value in the JSON is within a list, with prices specified
      as a dict containing "min" and "max" values. Unrestricted keys should use ["Any"] and unspecified
      prices should default to {"min": 0, "max": "inf"}.
    """

    # Set the prompt. Remember to include the query, the desired JSON format, the possible values (passing {values} at some point) 
    # and explain to the LLM what is going on. 
    # Explicitly tell the llm to include gender, masterCategory, ArticleType, baseColour, price, usage and season as keys.
    # Also mention to the llm that price key must be a json with "min" and "max" values (0 if no lower bound and inf if no upper bound)
    # If there is no price set, add min = 0 and max = inf.
    PROMPT = f"""
    One query will be provided. For the given query, there will be a call on vector database to query relevant clothing items. 
    Generate a JSON with useful metadata to filter the products in the query. Possible values for each feature is in the following json: {values}

    Provide a JSON with the features that best fit in the query (can be more than one, write in a list). Also, if present, add a price key, saying if there is a price range (between values, greater than or smaller than some value).
    Only return the JSON, nothing more. price key must be a JSON with "min" and "max" values (0 if no lower bound and inf if no upper bound). 
    Always include gender, masterCategory, articleType, baseColour, price, usage and season as keys. All values must be within lists.
    If there is no price set, add min = 0 and max = inf.
    Only include values that are given in the json above. 
    
    Example of expected JSON:

    {{
    "gender": ["Women"],
    "masterCategory": ["Apparel"],
    "articleType": ["Dresses"],
    "baseColour": ["Blue"],
    "price": {{"min": 0, "max": "inf"}},
    "usage": ["Formal"],
    "season": ["All seasons"]
    }}

    Query: {query}
             """
    with tracer.start_as_current_span("generate_metadata_from_query", openinference_span_kind="tool") as span:
        span.set_input(query)
        with tracer.start_as_current_span("llm_call", openinference_span_kind="llm") as metadata_span:
            # Generate the response with the generate_with_single_input, PROMPT, temperature = 0 (low randomness) and max_tokens = 1500.
            kwargs = {"prompt": PROMPT, 'temperature': 0, "max_tokens": 1500}  # @REPLACE EQUALS None
            metadata_span.set_input(kwargs)
            try:
                response = generate_with_single_input(**kwargs) 
            except Exception as error:
                metadata_span.record_exception(error)
                metadata_span.set_status(Status(StatusCode.ERROR))
            else:
                # OpenInference Semantic Conventions for computing Costs
                metadata_span.set_attribute("llm.token_count.prompt", response['usage']['prompt_tokens'])
                metadata_span.set_attribute("llm.token_count.completion", response['usage']['completion_tokens'])
                metadata_span.set_attribute("llm.token_count.total", response['usage']['total_tokens'])
                metadata_span.set_attribute("llm.model_name", response['model'])
                metadata_span.set_attribute("llm.provider", 'together.ai')
                metadata_span.set_output(response)
                metadata_span.set_status(Status(StatusCode.OK))

        # Get the Label by accessing the content key of the response dictionary
        content = response['choices'][0]['message']['content']
        total_tokens = response['usage']['total_tokens']
        span.set_output({"content": content, 'total_tokens':total_tokens})
        span.set_status(Status(StatusCode.OK))   

    
    return content, total_tokens

In [61]:
content, total_tokens = generate_metadata_from_query("Create a look for a man that suits a sunny day in the park. I don't want to spend more than 300 dollars on each piece.")

In [62]:
print(content)

{
    "gender": ["Men"],
    "masterCategory": ["Apparel"],
    "articleType": ["Shirts", "Shorts", "Sunglasses"],
    "baseColour": ["Yellow", "Orange", "Green", "Blue", "Red"],
    "price": {"min": 0, "max": 300},
    "usage": ["Casual", "Smart Casual"],
    "season": ["Summer"]
}


In [63]:
print(total_tokens)

1463


So far, each product query has involved processing around **1,500 tokens**—mainly because we generate a set of filters across multiple categories before searching.

You will now **simplify** this process.

Instead of creating detailed filters for each category (like gender, color, etc.), the system will just use **semantic search directly on the user query**. This means:

* No more generating metadata.
* Just take the user’s question and run a semantic search on the product collection.

This approach is faster, uses fewer tokens, and is still effective for most queries.

<a id='5-2'></a>
### 5.2 Loading the Weaviate Product Collection

Now it is time to work with the Weaviate collection. It is already given to you and it is the product_data you saw before, but added as a Weaviate collection, so we can query with semantic search and metadata filtering.

In [64]:
products_collection = client.collections.get('products')

In [65]:
len(products_collection)

44423

<a id='5-3'></a>

<a id='5-3'></a>
### 5.3 Filtering by Metadata

The functions used to filter by metadata have been moved to the **`utils.py`** file.
You can find this file in the **File Browser** on the left panel.

You worked with these functions in the previous assignment, but for this one, **you won’t need to use them directly**.

So, let’s go ahead and jump into the exercise!

In [66]:
@tracer.tool
def parse_json_output(llm_output):
    """
    Parses a string output from a language model into a JSON object.

    This function attempts to clean and parse a JSON-formatted string produced by a language model (LLM).
    The input string might contain minor formatting issues, such as unnecessary newlines or single quotes
    instead of double quotes. The function attempts to correct such issues before parsing.

    Parameters:
    - llm_output (str): The string output from the language model that is expected to be in JSON format.

    Returns:
    - dict or None: A dictionary if parsing is successful, or None if the input string cannot be parsed into valid JSON.

    Exception Handling:
    - In case of a JSONDecodeError during parsing, an error message is printed, and the function returns None.
    """
    try:
        # Since the input might be improperly formatted, ensure any single quotes are removed
        llm_output = llm_output.replace("\n", '').replace("'",'').replace("}}", "}").replace("{{", "{")  # Remove any erroneous structures
        
        # Attempt to parse JSON directly provided it is a properly-structured JSON string
        parsed_json = json.loads(llm_output)
        return parsed_json
    except json.JSONDecodeError as e:
        print(f"JSON parsing failed: {e}")
        return None

@tracer.tool
def get_filter_by_metadata(json_output: dict | None = None):
    """
    Generate a list of Weaviate filters based on a provided metadata dictionary.

    Parameters:
    - json_output (dict) or None: Dictionary containing metadata keys and their values.

    Returns:
    - list[Filter] or None: A list of Weaviate filters, or None if input is None.
    """
    # If the input dictionary is None, return None immediately
    if json_output is None:
        return None

    # Define a tuple of valid keys that are allowed for filtering
    valid_keys = (
        'gender',
        'masterCategory',
        'articleType',
        'baseColour',
        'price',
        'usage',
        'season',
    )

    # Initialize an empty list to store the filters
    filters = []

    # Iterate over each key-value pair in the input dictionary
    for key, value in json_output.items():
        # Skip the key if it is not in the list of valid keys
        if key not in valid_keys:
            continue

        # Special handling for the 'price' key
        if key == 'price':
            # Ensure the value associated with 'price' is a dictionary
            if not isinstance(value, dict):
                continue

            # Extract the minimum and maximum prices from the dictionary
            min_price = value.get('min')
            max_price = value.get('max')

            # Skip if either min_price or max_price is not provided
            if min_price is None or max_price is None:
                continue

            # Skip if min_price is non-positive or max_price is infinity
            if min_price <= 0 or max_price == 'inf':
                continue

            # Add filters for price greater than min_price and less than max_price
            filters.append(Filter.by_property(key).greater_than(min_price))
            filters.append(Filter.by_property(key).less_than(max_price))
        else:
            # For other valid keys, add a filter that checks for any of the provided values
            filters.append(Filter.by_property(key).contains_any(value))

    return filters


@tracer.tool
def generate_filters_from_query(query):
    json_string, total_tokens = generate_metadata_from_query(query)
    json_output = parse_json_output(json_string)
    filters = get_filter_by_metadata(json_output)
    return filters, total_tokens

<a id='ex04'></a>
### Exercise 4

---


The next function retrieves relevant products based on a query.
It’s a modified version of the one you used previously, with one key change:

* It now includes a boolean parameter called `simplified`.
* If `simplified` is `True`, the function **must skip metadata filtering** and perform a **simple semantic search** using the query.
* Choose an appropriate limit—5 may be too low. In the previous scenario, 20 items were returned, so you might want to stick with that.

Therefore, when `simplified = True`, you should only run a semantic search—**no metadata filters should be applied**.

<details>
  <summary style="color: green;"><strong>Hint 1</strong></summary>
  Use the <code>products_collection</code> (not <code>faq_collection</code>).  
  To query the Weaviate database, use:  
  <code>products_collection.query.near_text(query, limit=pick_your_limit)</code>
</details>

In [67]:
# GRADED CELL

def get_relevant_products_from_query(query, simplified = False):
    """
    Retrieve the most relevant products for a given query by applying semantic search and optional filters.

    This function generates metadata filters from the query and uses them to search for products 
    that best match the intended criteria. If `simplified` is True, it performs only a basic semantic 
    search with no filters. If the filtered search returns too few results, it progressively reduces 
    filtering constraints based on the predefined importance of each filter.

    Parameters:
    query (str): The query string used to search for relevant products.
    simplified (bool): If True, only a simple semantic search is performed without any metadata filters.

    Returns:
    list: A list of product objects that are most relevant to the query.
    total_tokens: The number of tokens used in the LLM call. Returns 0 if simplified search is used.
    """
    
    ##############################################
    ######### GRADED PART STARTS HERE ############
    ##############################################
    
    ### START CODE HERE ###
    
    # If simplified, just do a semantic search with 20 objects and return it
    if simplified:
        with tracer.start_as_current_span("get_relevant_products_from_query", openinference_span_kind="retriever") as span:  
            span.set_input({'query':query, 'simplified':simplified})
            
            ### YOUR CODE BELOW ###
            results = products_collection.query.near_text(query, limit=5)

            # Set the retrieved documents as attributes on the span
            for i, document in enumerate(results.objects): 
                span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid)) 
                span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata)) 
                span.set_attribute( 
                    f"retrieval.documents.{i}.document.content", str(document.properties) #@ KEEP
                )  
    
    ### END CODE HERE ###

    ##############################################
    ######### GRADED PART ENDS HERE #############
    ##############################################
            
            span.set_output({"results": results.objects, "total_tokens": 0})
            span.set_status(Status(StatusCode.OK))  
    
            return results.objects, 0  # Total tokens in this case is 0 because there was no LLM call!
    # If not simplified, perform the previous workflow by generating the filters and then doing a semantic search with them
    
    with tracer.start_as_current_span("get_relevant_products_from_query", openinference_span_kind="retriever") as span:  
        span.set_input({'query':query, 'simplified':simplified})
        filters, total_tokens = generate_filters_from_query(query)  # Generate filters based on the query

    # Check if there are no applicable filters
        if filters is None or len(filters) == 0:
            span.set_attribute("retrieval.filters", '')
            results = products_collection.query.near_text(query, limit=20) 
            # Set the retrieved documents as attributes on the span
            for i, document in enumerate(results.objects): 
                span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid))
                span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata)) 
                span.set_attribute( 
                    f"retrieval.documents.{i}.document.content", str(document.properties) 
                )  
            span.set_output({"results": results.objects, "total_tokens": total_tokens})
            span.set_status(Status(StatusCode.OK))  
            return results.objects, total_tokens
    # Query with filters and limit to the top 20 relevant objects
        span.set_attribute("retrieval.filters", str(filters))
        results = products_collection.query.near_text(query, filters=Filter.all_of(filters), limit=20)
        span.set_attribute("retrieval.len", len(results.objects))
        # Set the retrieved documents as attributes on the span
        for i, document in enumerate(results.objects): 
            span.set_attribute(f"retrieval.documents.{i}.document.id", str(document.uuid))
            span.set_attribute(f"retrieval.documents.{i}.document.metadata", str(document.metadata)) 
            span.set_attribute( 
                f"retrieval.documents.{i}.document.content", str(document.properties) 
            )
    
        # If the result set contains fewer than 10 products, try reducing filters to broaden the search
        importance_order = [ 'baseColour', 'masterCategory', 'usage', 'masterCategory', 'season', 'articleType', 'gender']
        if len(results.objects) < 10:
            # Iterate through the importance order of filters
            for i in range(len(importance_order)):
                with tracer.start_as_current_span(f"refilter_{i}", openinference_span_kind="chain") as refilter_span: 
                    # Create a list of filters that excludes less important ones
                    filtered_filters = [x for x in filters if x.target in importance_order[i+1:]]
                    refilter_span.set_input(str(filtered_filters))
                    
                    results = products_collection.query.near_text(query, filters=Filter.all_of(filtered_filters), limit=20)
                    # Set the retrieved documents as attributes on the span
                    for j, document in enumerate(results.objects): 
                        refilter_span.set_attribute(f"retrieval.documents.{j}.document.id", str(document.uuid))
                        refilter_span.set_attribute(f"retrieval.documents.{j}.document.metadata", str(document.metadata)) 
                        refilter_span.set_attribute( 
                            f"retrieval.documents.{j}.document.content", str(document.properties) 
                        )
                    # If sufficient products have been found, return early
                    if len(results.objects) >= 5:
                        refilter_span.set_output(results.objects)
                        refilter_span.set_status(Status(StatusCode.OK))  
                        span.set_output(results.objects)
                        span.set_status(Status(StatusCode.OK)) 
                        return results.objects, total_tokens
        span.set_output(results.objects)
        span.set_status(Status(StatusCode.OK)) 
        return results.objects, total_tokens  # Return the final set of relevant products

In [68]:
query = "Give me three T-shirts to use in sunny days"

In [69]:
t, total_tokens = get_relevant_products_from_query(query)

In [70]:
total_tokens

1527

Around 1500 tokens for this query! Let's try with the simplified version

In [71]:
t, total_tokens = get_relevant_products_from_query(query, simplified = True)

In [72]:
total_tokens

0

Note that this query took 0 tokens, as it didn't use the LLM. It directly used the query to retrieve the objects that are in the vector database.

In [73]:
# Test your solution!
unittests.test_get_relevant_products_from_query(get_relevant_products_from_query)

[91mFailed test case: Incorrect result for query = Give me three Tshirts to use in sunny days and simplified = True.
Expected: Product IDs must be {3328, 35983, 54935, 6939, 33565, 49964, 2863, 2866, 1844, 1845, 1846, 1847, 1853, 9539, 1866, 4298, 1867, 3431, 37608, 3318}
Got: Product IDs output are: {1847, 1844, 1846, 3318, 1853}




<a id='5-4'></a>

<a id='5-4'></a>
### 5.4 Generating the retrieved items as context

Now, for the given retrieved items, let's generate a simple context in the following format:

```
Product Name: Inkfruit Men's Little Bit More T-shirt. Product Category: Apparel. Product Usage: Casual. Product Gender: Men. Product Type: T-shirts. Product Category: Topwear. Product Color: Yellow. Product Season: Summer. Product Year: 2011.
```

In [74]:
@tracer.tool
def generate_items_context(results):
    """
    Compile detailed product information from a list of result objects into a formatted string.

    Parameters:
    results (list): A list of result objects, each having a `properties` attribute that is a dictionary 
                    containing product attributes such as 'product_id', 'productDisplayName', 
                    'masterCategory', 'usage', 'gender', 'articleType', 'subCategory', 
                    'baseColour', 'season', and 'year'.

    Returns:
    str: A multi-line string where each line contains the formatted details of a single product.
         Each product detail includes the product ID, name, category, usage, gender, type, color, 
         season, and year.
    """
    t = ""  # Initialize an empty string to accumulate product information

    for item in results:  # Iterate through each item in the results list
        item = item.properties  # Access the properties dictionary of the current item

        # Append formatted product details to the output string
        t += (
            f"Product ID: {item['product_id']}. "
            f"Product name: {item['productDisplayName']}. "
            f"Product Category: {item['masterCategory']}. "
            f"Product usage: {item['usage']}. "
            f"Product gender: {item['gender']}. "
            f"Product Type: {item['articleType']}. "
            f"Product Category: {item['subCategory']} "
            f"Product Color: {item['baseColour']}. "
            f"Product Season: {item['season']}. "
            f"Product Year: {item['year']}.\n"
        )

    return t  # Return the complete formatted string with product details

In [75]:
print(generate_items_context(t)[:1000])

Product ID: 1844. Product name: Inkfruit Mens D day T-shirt. Product Category: Apparel. Product usage: Casual. Product gender: Men. Product Type: Tshirts. Product Category: Topwear Product Color: Blue. Product Season: Summer. Product Year: 2011.
Product ID: 1853. Product name: Inkfruit Mens Little Bit More T-shirt. Product Category: Apparel. Product usage: Casual. Product gender: Men. Product Type: Tshirts. Product Category: Topwear Product Color: Yellow. Product Season: Summer. Product Year: 2011.
Product ID: 1847. Product name: Inkfruit Mens Messy T-shirt. Product Category: Apparel. Product usage: Casual. Product gender: Men. Product Type: Tshirts. Product Category: Topwear Product Color: Blue. Product Season: Summer. Product Year: 2011.
Product ID: 3431. Product name: Myntra Mens I Have a Helluva Attitude T-shirt. Product Category: Apparel. Product usage: Casual. Product gender: Men. Product Type: Tshirts. Product Category: Topwear Product Color: Black. Product Season: Summer. Produ

<a id='5-5'></a>
### 5.5 Query on Products

The next function will answer a product query. 

In [76]:
@tracer.tool
def query_on_products(query, simplified = False):
    """
    Execute a product query process to generate a response based on the nature of the query.

    Parameters:
    query (str): The input query string that needs to be analyzed and answered using product data.
    task_nature_prompt_function (func): The prompt function to be used to decide the task nature (if creative of technical)
    simplified (bool): If True, does not use LLM to generate metadata for filtering

    Returns:
    dict: A dictionary of keyword arguments (`kwargs`) containing the prompt and additional settings 
          for creating a response, suitable for input to an LLM or other processing system.
    int: Number of tokens used in the process to create the kwargs dictionary

    Outputs:
    str: The content of the generated response from the LLM based on the provided query and product 
         information.
    """
    total_tokens = 0
    # Determine if the query is technical or creative in nature
    
    query_label, tokens = decide_task_nature(query, simplified = simplified)
    
    # Sum the tokens used to decide the task nature (creative or technical)
    total_tokens += tokens

    # Obtain necessary parameters based on the query type
    parameters_dict = get_params_for_task(query_label)
    
    # Retrieve products that are relevant to the query
    relevant_products, tokens = get_relevant_products_from_query(query, simplified = simplified)
    
    # Sum the tokens used to get relevant products 
    total_tokens += tokens
     
    # Create a context string from the relevant products
    context = generate_items_context(relevant_products)

    # Construct a prompt including product details and the query. Remember to add the context and the query in the prompt, also, ask the LLM to provide the product ID in the answer
    PROMPT = (
        f"Given the available set of clothing products given by: "
        f"CLOTHING PRODUCTS AVAILABLE:\n{context}\n"
        f"Answer the question that follows.\n"
        f"Never use more than 5 clothing products available below to compose your answer.\n"
        f"Provide the item ID in your answers.\n"
        f"The other information might be provided but not necessarily all of them, pick only the relevant ones for the given query.\n"
        
        f"QUERY: {query}"
    )
    
    # Generate kwargs (parameters dict) for parameterized input to the LLM with , Prompt, role = 'assistant' and **parameters_dict
    kwargs = generate_params_dict(PROMPT, role='assistant', **parameters_dict) 

    
    return kwargs, total_tokens

Let's check with both the previous setup and the enhanced setup


#### Previous setup with simplified = False

In [77]:
kwargs, total_tokens = query_on_products('Make a wonderful look for a man attending a wedding party happening during night.', simplified = False)

In [78]:
result = generate_with_single_input(**kwargs)
print(result['choices'][0]['message']['content'])

To create a wonderful look for a man attending a wedding party at night, I recommend combining the following five clothing products:

1. Product ID: 12324 - Genesis Navy Formal Shirt (Product Category: Apparel, Product usage: Formal, Product gender: Men, Product Type: Shirts, Product Category: Topwear, Product Color: Navy Blue, Product Season: Winter, Product Year: 2011)
2. Product ID: 14712 - Genesis Off-White Formal Shirt (Product Category: Apparel, Product usage: Formal, Product gender: Men, Product Type: Shirts, Product Category: Topwear, Product Color: Off White, Product Season: Winter, Product Year: 2011)
3. Product ID: 12259 - Genesis Grey Formal Shirt (Product Category: Apparel, Product usage: Formal, Product gender: Men, Product Type: Shirts, Product Category: Topwear, Product Color: Grey, Product Season: Winter, Product Year: 2011)
4. Product ID: 12260 - Genesis Blue Striped Formal Shirt (Product Category: Apparel, Product usage: Formal, Product gender: Men, Product Type: Shi

Now let's sum the total tokens to generate the kwargs dictionary and the total tokens used in the final execution.

In [79]:
print(f"Total tokens used in the query is: {total_tokens + result['usage']['total_tokens']}")

Total tokens used in the query is: 3345


**New setup with <code>simplified = True</code>**

In [80]:
kwargs, total_tokens = query_on_products('Make a wonderful look for a man attending a wedding party happening during night.', simplified = True)

In [81]:
total_tokens

135

In [82]:
result = generate_with_single_input(**kwargs)
print(result['choices'][0]['message']['content'])

To create a wonderful look for a man attending a wedding party at night, I suggest combining the following three clothing products:

1. Product ID: 17378. Product name: Arrow Men Formal Purple Tie+Cufflink+Pocket square - Combo Pack.
2. Product ID: 17377. Product name: Arrow Men Formal Purple Tie+Cufflink+Pocket square - Combo Pack.
3. Product ID: 17373. Product name: Arrow Men Formal Purple Tie+Cufflink+Pocket square - Combo Pack.

These three products will complement each other perfectly for a formal look at night. The purple color will match the theme of the wedding, and the tie, cufflink, and pocket square will add a touch of elegance. The combination of these three products will create a stunning and memorable look for the man attending the wedding party.


In [83]:
print(f"Total tokens used in the query is: {total_tokens + result['usage']['total_tokens']}")

Total tokens used in the query is: 770


And the total tokens used in one query was way lower than before!

<a id='6'></a>
## 6 - The final function! 
---
<a id='6-1'></a>
### 6.1 The function to rule them all

Now let's consolidate the functions

The function will:

1. Check if the query is FAQ or Product
2. If FAQ, runs the FAQ related workflow
3. If Product, runs the Product related workflow
4. Add the information into a dataframe

It returns the kwargs dict with the appropriate arguments and the total tokens used to get to the kwargs dict.

In [84]:
@tracer.tool
def answer_query(query, model = "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",simplified=False):
    """
    Processes a user's query to determine its type (FAQ or Product) and executes the appropriate workflow.
    
    Parameters:
    - query (str): The query string provided by the user.
    - model (str): The model that will answer the question. Defaults to meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo'
    - simplified (bool): If True, uses a simplified version of the method. Defaults to False.
    
    Returns:
    - dict: A dictionary containing keyword arguments for further processing.
      If the query is neither FAQ nor Product-related, returns a default response dictionary instructing
      the assistant to answer based on existing context.
    """
    # Initialize the total tokens used to zero
    total_tokens = 0
    
    # Determine if the query is FAQ or Product and get the token count for this step
    label, tokens = check_if_faq_or_product(query, simplified=simplified)
    
    # Sum the tokens
    total_tokens += tokens
    
    # If the query is neither FAQ nor Product, return a default response
    if label not in ['FAQ', 'Product']:
        return {
            "role": "assistant",
            "prompt": (f"User provided a question that does not fit FAQ or Product-related categories. "
                       f"Answer it based on the context you already have. Query provided by the user: {query}")
        }
    
    # Process the query based on its label
    if label == 'FAQ':
        # Handle FAQ-related queries
        kwargs = query_on_faq(query, simplified=simplified)
    elif label == 'Product':
        try:
            # Handle Product-related queries, with error handling in place
            kwargs, tokens = query_on_products(query, simplified=simplified)
            # Add the tokens to the total tokens
            total_tokens += tokens
        except Exception:
            # Return an error response if an exception occurs during querying
            return {
                "role": "assistant",
                "prompt": (f"User provided a question that broke the querying system. "
                           f"Instruct them to rephrase it. Answer it based on the context you already have. "
                           f"Query provided by the user: {query}")
            }, total_tokens
    # Set the model to answer the final query - usually a better one         
    kwargs['model'] = model
    # Return the kwargs and total_tokens for further processing
    return kwargs, total_tokens

In [85]:
kwargs, total_tokens = answer_query("Give me three examples of blue t-shirts available on your catalogue.", simplified = False)

In [86]:
result = generate_with_single_input(**kwargs)
print(result['choices'][0]['message']['content'])

Here are three examples of blue t-shirts available on the catalogue:

1. Product ID: 1847. Product name: Inkfruit Mens Messy T-shirt. Product Color: Blue. Product Season: Summer. Product Year: 2011.
2. Product ID: 3103. Product name: Probase Men's Wtf Blue T-Shirt. Product Color: Navy Blue. Product Season: Summer. Product Year: 2011.
3. Product ID: 3754. Product name: Status Quo Men's Music Revolution Blue T-shirt. Product Color: Blue. Product Season: Summer. Product Year: 2011.


In [87]:
# To get the total tokens for the call, we must sum the total_tokens to get the kwargs dictionary + total tokens from the LLM call
total_tokens +  result['usage']['total_tokens']

3390

In [88]:
kwargs, total_tokens = answer_query("Give me three examples of blue t-shirts available on your catalogue.", simplified = True)

In [89]:
result = generate_with_single_input(**kwargs)
print(result['choices'][0]['message']['content'])

Based on the available clothing products, here are three examples of blue t-shirts:

1. Product ID: 1847. Product name: Inkfruit Mens Messy T-shirt. Product Color: Blue.
2. Product ID: 3995. Product name: Mr.Men Men's Thats Funny Blue T-shirt. Product Color: Blue.
3. Product ID: 3416. Product name: Myntra Mens Broken Guitar T-shirt. Product Color: Blue.


In [90]:
total_tokens +  result['usage']['total_tokens']

840

<a id='6-3'></a>

<a id='7'></a>
## 7 - The ChatBot

Now you will run the ChatBot again! There are minor changes compared to the previous version to allow result logging. There are two versions available: one with the same behavior as in the previous assignment, and another with the new behavior.

We suggest trying the following queries to compare results and token usage:

* I bought a T-shirt and I didn't like it. Can I get a refund?
* I want a look to wear to a beach party at night. It's winter, and I'm a woman.

In [91]:
chat_widget_standard = ChatWidget(generator_function = lambda x: answer_query(x, simplified = False), tracer = tracer)

VBox(children=(HTML(value=''), HBox(), HBox(children=(Text(value='', layout=Layout(width='90%'), placeholder='…

Now go to the UI to check!


In [92]:
make_url()

[1mFOLLOW THIS URL TO OPEN THE UI: http://emffpxgwdflg.labs.coursera.org[0m


Now test the simplified version!

In [None]:
chat_widget_simplified = ChatWidget(generator_function = lambda x: answer_query(x, simplified = True), tracer = tracer)

In [None]:
make_url()

Congratulations! You improved your ChatBot using RAG techniques to reduce token count and added a logging system!