## Purpose

This document is intended for the professor to test each intention implementation individually. Each test checks that a specific intent behaves as expected when given predefined inputs.

## Examples of Messages to Test Each Intention

1. **Intent Name**  
   _e.g., Retrieve Company Info_

2. **Example Inputs**  
   - "Tell me about the company."  
   - "What is your mission statement?"

3. **Expected Outputs**  
   - "Our company specializes in renewable energy."  
   - "The mission is to provide eco-friendly energy solutions."
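
The three-part template above (intent name, example inputs, expected outputs) can be captured as a small data structure so that every test case has the same shape. The following is a minimal sketch; the `IntentTestCase` class and its field names are hypothetical and not part of the `cobuy` package.

```python
from dataclasses import dataclass, field


@dataclass
class IntentTestCase:
    """One test case following the template: intent name, inputs, expected outputs."""

    intent_name: str
    example_inputs: list[str] = field(default_factory=list)
    expected_outputs: list[str] = field(default_factory=list)

    def pairs(self) -> list[tuple[str, str]]:
        """Pair each example input with its expected output, in order."""
        if len(self.example_inputs) != len(self.expected_outputs):
            raise ValueError("Each example input needs exactly one expected output.")
        return list(zip(self.example_inputs, self.expected_outputs))


# Values taken from the template above; the class itself is illustrative.
company_info = IntentTestCase(
    intent_name="Retrieve Company Info",
    example_inputs=["Tell me about the company.", "What is your mission statement?"],
    expected_outputs=[
        "Our company specializes in renewable energy.",
        "The mission is to provide eco-friendly energy solutions.",
    ],
)

for question, expected in company_info.pairs():
    print(f"{question} -> {expected}")
```

Since LLM responses are rarely reproduced word for word, the expected outputs are probably best treated as reference answers for a reviewer or an evaluator prompt rather than as strings for exact comparison.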

## Intention 1 - Product Information

### Example Inputs
- "I am looking for a laptop."
- "What products do you offer?"

In [1]:
from langchain_openai import ChatOpenAI
from cobuy.chatbot.memory import MemoryManager
from langchain_core.runnables.history import RunnableWithMessageHistory

user_id = "1"
conversation_id = "1"

llm = ChatOpenAI(temperature=0.0, model="gpt-4o-mini")
memory = MemoryManager(user_id, conversation_id)

def add_memory_to_chain(original_chain):

    new_chain = RunnableWithMessageHistory(
                original_chain,
                memory.get_session_history,
                input_messages_key="customer_input",
                history_messages_key="chat_history",
                history_factory_config=memory.get_history_factory_config(),
            )
    return new_chain

In [2]:
from cobuy.chatbot.chains.product_info import ProductInfoReasoningChain, ProductInfoResponseChain

reasoning_chain = ProductInfoReasoningChain(llm)
response_chain = ProductInfoResponseChain(llm)

response_chain = add_memory_to_chain(response_chain)

In [3]:
# Step 1: Initialize a list of user inputs
user_inputs = ["What products do you offer", "I am looking for a laptop"]


for user_input in user_inputs:

    user_input = {"customer_input": user_input}

    # Step 2: Use the user input to get the output from ReasoningChain
    reasoning_output = reasoning_chain.invoke(user_input)

    # Step 3: Use the output from ReasoningChain to get the response from ResponseChain
    response = response_chain.invoke(reasoning_output, memory.get_memory_config())

    response.pretty_print()


We offer a wide range of electronics, including:

1. **Televisions** - Smart TVs, 4K, OLED, and more.
2. **Laptops and Desktops** - Various brands and specifications for personal and professional use.
3. **Smartphones and Tablets** - The latest models from top brands.
4. **Audio Equipment** - Headphones, speakers, and sound systems.
5. **Home Appliances** - Refrigerators, microwaves, and more.
6. **Gaming Consoles and Accessories** - PlayStation, Xbox, Nintendo, and gaming peripherals.
7. **Wearable Technology** - Smartwatches and fitness trackers.

Is there a specific category or product you’re interested in?

Great! We have several laptops that might fit your needs. Here are a few options:

1. **TechPro Ultrabook**
   - **Price:** $799.99
   - **Features:** 13.3-inch display, 8GB RAM, 256GB SSD, Intel Core i5 processor
   - **Description:** A sleek and lightweight ultrabook for everyday use.
   - **Rating:** 4.5

2. **BlueWave Gaming Laptop**
   - **Price:** $1,199.99
   - **Feature

In [4]:
chat_history = memory.get_session_history(user_id=user_id, conversation_id=conversation_id)

In [9]:
chat_history

InMemoryHistory(messages=[HumanMessage(content='What products do you offer', additional_kwargs={}, response_metadata={}), AIMessage(content='We offer a wide range of electronics, including:\n\n1. **Televisions** - Smart TVs, 4K, OLED, and more.\n2. **Laptops and Desktops** - Various brands and specifications for personal and professional use.\n3. **Smartphones and Tablets** - The latest models from top brands.\n4. **Audio Equipment** - Headphones, speakers, and sound systems.\n5. **Home Appliances** - Refrigerators, microwaves, and more.\n6. **Gaming Consoles and Accessories** - PlayStation, Xbox, Nintendo, and gaming peripherals.\n7. **Wearable Technology** - Smartwatches and fitness trackers.\n\nIs there a specific category or product you’re interested in?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 145, 'prompt_tokens': 109, 'total_tokens': 254, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reaso

In [120]:
from typing import Optional

from pydantic import BaseModel, Field


class ChatbotEvaluationIntention(BaseModel):
    # Evaluation fields are Optional so that a fallback case can set them to None.
    correct_intent: Optional[bool] = Field(
        None,
        description="Indicates if the predicted intent matches the correct intent. None indicates a fallback case.",
    )
    suggested_intent: Optional[str] = Field(
        None,
        description="The intent the evaluator considers correct when the predicted intent is wrong. None if no correction is needed or in a fallback case.",
    )
    response_quality: Optional[int] = Field(
        None,
        ge=1,
        le=5,
        description="Rating of the chatbot's response quality, on a scale of 1 to 5.",
    )
    # context_management: Optional[int] = Field(
    #     None,
    #     ge=1,
    #     le=5,
    #     description="Rating of the chatbot's context management, on a scale of 1 to 5."
    # )
    feedback: Optional[str] = Field(
        None,
        description="Short feedback on intent recognition, response quality, or context management, with suggestions for improvement.",
    )
    fallback_case: bool = Field(
        False,
        description="Indicates if this instance is a fallback case where evaluation failed.",
    )

In [106]:
from langchain.output_parsers import PydanticOutputParser
output_parser = PydanticOutputParser(pydantic_object=ChatbotEvaluationIntention)

In [None]:
system_template = """
You are an expert evaluator for a task-specific chatbot. Your role is to assess the chatbot's performance by analyzing:
1. Intent recognition accuracy.
2. Response quality (clarity, relevance, and helpfulness).

You will be provided with:
- The most recent user query and the chatbot's predicted intent.
- The chatbot's response to the latest user query.
- A list of possible intentions with their detailed descriptions and examples.

User Query: {user_query}
Predicted Intent: {predicted_intent}
Chatbot Response: {chatbot_response}
Possible Intentions: {intent_list}

### Your Task:
1. **Intent Recognition**: Determine if the chatbot correctly identified the intent.
2. **Response Quality**: Evaluate if the chatbot's response is appropriate, clear, and relevant.
3. **Suggestions**: Recommend corrections for the intent or response if needed.

### Evaluation Format:
{format_instructions}

### Tips for Ratings:
- **Response Quality**: 
  - 5: Fully clear, relevant, and helpful.
  - 3: Partially relevant or slightly unclear but generally acceptable.
  - 1: Off-topic, confusing, or unhelpful.


### Fallback Cases:
If the evaluation cannot proceed due to invalid input or edge cases, follow these steps:
1. Mark the case as a fallback by setting fallback_case to true.
2. Set all other fields (correct_intent, suggested_intent, response_quality) to null or None.
3. In the feedback field, explain clearly why the evaluation could not proceed. Examples include:
    - The chatbot's response is incomprehensible or nonsensical.
    - The user's intent is ambiguous or unclear.
    - The conversation lacks sufficient context to evaluate the chatbot's response.

"""


In [115]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    template=system_template,
    input_variables=["user_query", "predicted_intent", "chatbot_response", "intent_list"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)



In [116]:
evaluation_chain = prompt | llm | output_parser

In [117]:
evaluation_chain.input_schema

langchain_core.utils.pydantic.PromptInput

In [119]:
evaluation_chain.invoke({"user_query": "I want a plan to create a company", "predicted_intent": "Product Information",
                          "chatbot_response": "Sure, I can help you with that. What type of company are you looking to create?",
                            "intent_list": ["Product Information", "Product Comparison", "Product Recommendation", "Fallback"]})

ChatbotEvaluationIntention(correct_intent=False, suggested_intent='Business Planning', response_quality=4, context_management=5, feedback="The predicted intent 'Product Information' does not match the user's intent of seeking a business plan. A more appropriate intent would be related to business planning or startup advice. The response is clear and relevant, asking for more details about the type of company, which helps in continuing the conversation effectively.", fallback_case=False)
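
The fallback rules stated in the evaluation prompt can also be checked in plain Python after the chain returns. The sketch below assumes the evaluation has been converted to a dictionary (for a Pydantic v2 model, `model_dump()` does this); the `check_evaluation` helper is hypothetical and not part of the notebook.

```python
from typing import Any

# Fields the prompt asks the evaluator to leave empty in a fallback case.
NULLABLE_FIELDS = ("correct_intent", "suggested_intent", "response_quality")


def check_evaluation(record: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means the record is consistent."""
    problems = []
    if record.get("fallback_case"):
        # Fallback: every evaluation field must be empty and feedback must be present.
        for name in NULLABLE_FIELDS:
            if record.get(name) is not None:
                problems.append(f"{name} must be None in a fallback case")
        if not record.get("feedback"):
            problems.append("feedback must explain why the evaluation could not proceed")
    else:
        # Regular case: the quality rating must be on the 1 to 5 scale.
        quality = record.get("response_quality")
        if not isinstance(quality, int) or not 1 <= quality <= 5:
            problems.append("response_quality must be an integer from 1 to 5")
    return problems


# Illustrative record shaped like the evaluation shown above.
evaluation = {
    "correct_intent": False,
    "suggested_intent": "Business Planning",
    "response_quality": 4,
    "feedback": "Predicted intent does not match the query.",
    "fallback_case": False,
}
print(check_evaluation(evaluation))
```

A check like this is useful because the language model is only instructed to follow the fallback rules; nothing in the schema itself enforces them.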

In [None]:
from typing import Literal


# Define the intent classification template with chat history
intent_classification_template = """
You are an expert classifier of user intentions. Your role is to accurately identify the user's intent based on:
- The user's query.
- The context provided in the conversation history.
- A list of possible intentions with their detailed descriptions and examples.

You will be provided with:
- The user's query.
- The conversation history up to the latest interaction.
- A list of possible intentions.

Conversation History: {conversation_history}  
User Query: {user_query}  
Possible Intentions: {intent_list}  

Your task:  
1. Analyze the user's query in the context of the conversation history and classify it into one of the possible intentions from the provided list.  
2. If the query does not match any of the provided intentions, label it as 'Unclear Intent' and provide an explanation.  

Output Format:  
- Classified Intent: (The most suitable intent from the list or 'Unclear Intent')  
- Explanation: (Why this intent was chosen or why the intent is unclear, referencing the conversation history if relevant)  
"""

class IntentClassification(BaseModel):
    # The allowed labels belong in the type annotation, not in the Field default.
    classified_intent: Literal["Product Information", "Product Comparison", "Product Recommendation", "Unclear Intent"] = Field(..., description="The classified intent based on the user query.")
    explanation: Optional[str] = Field(None, description="Explanation of the intent classification, including references to the conversation history.")
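
The intent classification template above asks for a two-line text answer rather than JSON, so the reply has to be parsed before it can populate a structured object. A minimal sketch using only the standard library follows; `parse_classification` and the sample reply are hypothetical, and a real model reply may deviate from the requested format.

```python
import re


def parse_classification(reply: str) -> dict[str, str]:
    """Extract the 'Classified Intent' and 'Explanation' fields from a text reply."""
    # The intent is expected on a single line; the explanation may span several.
    intent = re.search(r"Classified Intent:\s*(.+)", reply)
    explanation = re.search(r"Explanation:\s*(.+)", reply, flags=re.DOTALL)
    if intent is None:
        raise ValueError("Reply does not contain a 'Classified Intent:' line.")
    return {
        "classified_intent": intent.group(1).strip(),
        "explanation": explanation.group(1).strip() if explanation else "",
    }


# Hypothetical reply in the format requested by the template.
sample_reply = """- Classified Intent: Product Information
- Explanation: The user asks which products are available."""

print(parse_classification(sample_reply))
```

An alternative that avoids hand-written parsing is to append format instructions from an output parser to the template, as the evaluation prompts in this notebook do.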


In [None]:
from pydantic import BaseModel, Field
from typing_extensions import Literal
from langchain.output_parsers import PydanticOutputParser


class Intent(BaseModel):

    name: Literal["Product Information", "Product Comparison", "Product Recommendation", "Fallback"] = Field(..., description="The name of the intent.")

    description: str = Field(..., description="A detailed description of the intent, including examples of user queries that match this intent.")


# Pydantic models take keyword arguments, not positional ones.
product_info_intent = Intent(
    name="Product Information",
    description="Get information about the products offered by the company, including features, pricing, and availability. Example queries: 'What products do you offer?' 'Can you tell me about your laptops?'",
)

In [None]:
class UserIntentEvaluation(BaseModel):
    # Literal of the possible intents; 'Fallback' covers unclear or out-of-scope queries
    identified_intent: Literal["Product Information", "Product Comparison", "Product Recommendation", "Fallback"] = Field(..., description="The intent that best matches the user query, or 'Fallback' if the intent is unclear or not part of the predefined options.")

    # Feedback with reasoning behind the intent decision
    feedback: str = Field(..., description="A detailed explanation of why the identified intent was selected, considering the conversation history and user query.")


output_parser = PydanticOutputParser(pydantic_object=UserIntentEvaluation)

In [13]:
user_intent_evaluation_template = """
You are an expert evaluator tasked with analyzing a user's intent based on their current input and previous conversation history.

You will be provided with:
- The conversation history up to the latest interaction.
- The most recent user query.
- A list of possible intentions with their detailed descriptions and examples.

User Query: {user_query}
Conversation History: {conversation_history}
Possible Intentions: {intent_list}

### Your Task:
1. **Intent Evaluation**: 
   - Carefully review the conversation history along with the user query to determine the most likely user intent.
   - Consider the context established by the previous interactions to identify the intent behind the current query.
   - Choose the intent that best matches the user's query.

2. **Matching Intent**:
   - If the most likely intent is found in the provided list, identify it and provide the intent description.
   - If none of the listed intents match the user query, select the 'Fallback' intent.

### Evaluation Format:
{format_instructions}

"""


In [14]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    template=user_intent_evaluation_template,
    input_variables=["user_query", "conversation_history", "intent_list"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)
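
The `conversation_history` variable is interpolated into the prompt as text, so passing a history object directly inserts its string representation, which includes response metadata. One option is to render the messages as a short transcript first. The sketch below uses plain `(role, content)` tuples as a stand-in for the message objects; `format_history` is a hypothetical helper, not part of LangChain or `cobuy`.

```python
def format_history(messages: list[tuple[str, str]], max_turns: int = 6) -> str:
    """Render the most recent messages as 'Role: content' lines."""
    recent = messages[-max_turns:]
    return "\n".join(f"{role}: {content}" for role, content in recent)


# Stand-in for the stored conversation; contents abbreviated for illustration.
history = [
    ("Human", "What products do you offer"),
    ("AI", "We offer a wide range of electronics."),
    ("Human", "I am looking for a laptop"),
]
print(format_history(history, max_turns=2))
```

Limiting the transcript to the latest messages keeps the prompt short, at the cost of hiding earlier context from the evaluator.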

In [15]:
chain = prompt | llm | output_parser

In [17]:
chain.invoke({"user_query": "I want a plan to create a company",
                          "conversation_history": chat_history,
                            "intent_list": ["Product Information", "Product Comparison", "Product Recommendation", "Fallback"]})

UserIntentEvaluation(identified_intent='Fallback', feedback="The user query 'I want a plan to create a company' does not align with the previous conversation about product offerings and laptops. The previous interactions focused on product information and recommendations, while the current query indicates a desire for business planning, which is outside the scope of the provided intents.")
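
To turn individual checks like the one above into a repeatable test, predicted intents can be compared with expected labels over a small set of queries. The sketch below is one way to do that; `score_intents` is hypothetical, and `classify` stands in for a call such as `chain.invoke(...).identified_intent`. A keyword stub replaces the model so the example runs offline.

```python
from typing import Callable


def score_intents(cases: list[tuple[str, str]], classify: Callable[[str], str]) -> float:
    """Return the fraction of queries whose predicted intent equals the expected one."""
    if not cases:
        raise ValueError("At least one test case is required.")
    hits = sum(1 for query, expected in cases if classify(query) == expected)
    return hits / len(cases)


def stub_classifier(query: str) -> str:
    """Offline stand-in for the LLM chain: a crude keyword match."""
    text = query.lower()
    if "product" in text or "laptop" in text:
        return "Product Information"
    return "Fallback"


# Queries from this notebook with the intents they are expected to map to.
labelled_queries = [
    ("What products do you offer", "Product Information"),
    ("I am looking for a laptop", "Product Information"),
    ("I want a plan to create a company", "Fallback"),
]
print(score_intents(labelled_queries, stub_classifier))
```

Swapping the stub for the real chain would give an accuracy figure per intent list, which is easier to track across prompt changes than reading individual outputs.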