Steps in this notebook:
1. Indexing: We embed two kinds of data sources into two vector stores. The first holds pairs of customer queries and the corresponding replies, which supply few-shot examples for the prompt that generates the reply to the customer. The second holds product information from a knowledge base, stored here in a CSV file. Because the product attributes are stored in the Document metadata, a self-querying retriever is used to search them. The retrieved product information is passed as context in the LLM prompt.
2. Construct the chains with LangChain Expression Language (LCEL): The work is broken into several tasks, so a few LLM calls must be made in a fixed sequence. Each step is implemented as a subchain, and the subchains are then combined into one final chain.
3. Evaluation: Test the final chain with various test cases.

In [1]:
from operator import itemgetter

from dotenv import load_dotenv

from langchain.chains.query_constructor.base import (
    AttributeInfo, 
    StructuredQueryOutputParser, 
    get_query_constructor_prompt
)
from langchain.prompts import PromptTemplate
from langchain.retrievers.self_query.base import SelfQueryRetriever, ChromaTranslator
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.runnables import (
    RunnablePassthrough, 
    RunnableBranch, 
    RunnableLambda, 
    RunnableParallel
)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

import pandas as pd

load_dotenv()

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

### 1. Indexing

In [2]:
embedding = OpenAIEmbeddings(model='text-embedding-ada-002')

#### Create Few shot examples retriever

In [3]:
# Examples of pairs of customer query and corresponding response
few_shot_docs = [
    Document(
        page_content='My order was damaged during shipping.', 
        metadata={'reply': "We're sorry to hear that your order was damaged during shipping. We would be happy to send you a replacement order. Our customer service team will be in touch with you in 3 working days."}
    ),
    Document(
        page_content='What are your shipping options for international orders?', 
        metadata={'reply': "We offer various shipping options for international orders to cater to our customers worldwide. You can select your preferred shipping method during the checkout process. Our system will provide you with estimated delivery times and costs based on your location."}
    ),
    Document(
        page_content="I'm extremely disappointed with the delay in the delivery of my order. It's been two weeks since I placed the order, and I still haven't received it. Can you provide an update on its status?", 
        metadata={'reply': "We sincerely apologize for the delay in delivering your order. We understand the importance of receiving your books in a timely manner and regret any inconvenience this may have caused. Our team is actively working to resolve the issue and expedite the delivery process. We'll provide you with a detailed update on the status of your order shortly. Thank you for your patience and understanding."}
    ),
    Document(
        page_content="I recently received my order from your bookstore, and I'm very impressed with the packaging and the condition of the books. Thank you for the great service!", 
        metadata={'reply': "Thank you so much for your kind words! We're thrilled to hear that you're satisfied with your recent order. Providing excellent service and ensuring your books arrive in perfect condition is always our priority."}
    ),
    Document(
        page_content="Hi, I'm interested in purchasing the latest bestseller by [Author's Name]. Can you confirm if it's available?", 
        metadata={'reply': "Thank you for your interest in our bookstore. Yes, we do carry books by [Author's Name]. We have [Book Title] rating: 4.6 stars Price: $26.50 and [Book Title] rating: 4.4 stars Price: $18.40 "}
    ),
    Document(
        page_content='Do you have the book Atomic Habits by James Clear?', 
        metadata={'reply': 'Thank you for your interest! Yes, we do have Atomic Habits by James Clear. It is rated 4.8 stars and is retailing for $25.99.'}
    ),
]

# Create vector database and save to disk
vectordb_fewshot = Chroma.from_documents(
    persist_directory="vectordb/few_shot",
    embedding=embedding,
    documents=few_shot_docs
)
vectordb_fewshot.persist()

In [4]:
# Read vectordb_fewshot
vectordb_fewshot = Chroma(
    persist_directory="vectordb/few_shot",
    embedding_function=embedding
)

# Create a retriever that returns only the single closest example (k=1).
fewshot_retriever = vectordb_fewshot.as_retriever(search_kwargs={"k": 1})

#### Create self query retriever

In [5]:
# Read Product information as dataframe from csv
df = pd.read_csv(r"csv_data_source/products_listing.csv")
df['release_date'] = pd.to_datetime(df['release_date'], format="%d-%b-%y")
df['year'] = df['release_date'].dt.year
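# Strip surrounding whitespace from every string cell.
# Note: on pandas >= 2.1, DataFrame.map replaces the deprecated applymap.
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)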
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   product_ID        8 non-null      int64         
 1   book_name         8 non-null      object        
 2   author            8 non-null      object        
 3   genre             8 non-null      object        
 4   rating            8 non-null      float64       
 5   price             8 non-null      float64       
 6   release_date      8 non-null      datetime64[ns]
 7   book_description  8 non-null      object        
 8   year              8 non-null      int64         
dtypes: datetime64[ns](1), float64(2), int64(2), object(4)
memory usage: 708.0+ bytes
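
The `%d-%b-%y` format above uses a two-digit year, so it is worth checking how the century is inferred before relying on the derived `year` column. The following is a minimal sketch using the standard library's `datetime.strptime`, which accepts the same format directives; the sample strings are made up for illustration, and pandas is expected to follow the same convention but this should be verified against the real CSV.

```python
from datetime import datetime


def parse_year(text: str) -> int:
    """Parse a 'day-abbreviated month-two digit year' string and return the year."""
    return datetime.strptime(text, "%d-%b-%y").year


# Hypothetical sample values in the same layout as the release_date column.
# Two-digit years 69-99 map to 1969-1999 and 00-68 map to 2000-2068.
for sample in ["16-Oct-18", "01-Jan-69", "31-Dec-68"]:
    print(sample, "->", parse_year(sample))
```

If the catalogue ever contains books released before 1969, a four-digit year in the source file would avoid the ambiguity.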


In [6]:
# Convert dataframe to list of Documents
def load_df_as_documents(df: pd.DataFrame) -> list:
    return [
        Document(
            page_content= row['book_description'],
            metadata={
                "title": row['book_name'],
                "author": row['author'],
                "genre": row['genre'],
                "rating": row['rating'],
                "price": row['price'],
                "year": row['year']
            }
        )
        for _, row in df.iterrows()
    ]

docs = load_df_as_documents(df)

# Create vector database
persist_directory = './vectordb/product'
vectordb_product = Chroma.from_documents(
    documents=docs,
    embedding=embedding,
    persist_directory=persist_directory
)
vectordb_product.persist()

In [7]:
# Read from vectordb
vectordb_product = Chroma(
    persist_directory='./vectordb/product',
    embedding_function=embedding
)

# Describe metadata, document content and provide few shot examples for query constructor
metadata_field_info = [
    AttributeInfo(
        name='title',
        description="The book title",
        type="string",
    ),
    AttributeInfo(
        name='author',
        description="The author of the book",
        type="string",
    ),
    AttributeInfo(
        name='genre',
        description="The genre of the book. Valid values are: ['self-help', 'health', 'history', 'biography', 'science']",
        type="string",
    ),
    AttributeInfo(
        name='rating', 
        description="An average 0-5 rating by reviewers on the online book store", 
        type="float"
    ),
    AttributeInfo(
        name='price', 
        description="The price of the book in dollars", 
        type="float"
    ),
    AttributeInfo(
        name='year', 
        description="The year the book is published.", 
        type="integer"
    )
]

document_content_description = "Detailed book description"

examples = [
    (
        "Tell me more about the book Atomic Habits.",
        {
            "query": "Atomic Habits",
            "filter": '',
        },
    ),
    (
        "what books on the self-help genre do you have and have a rating of at least 4.7?",
        {
            "query": "",
            "filter": 'and(eq("genre", "self-help"), gte("rating", 4.7))',
        },
    ),
    (
        "what is the price of the book on human history by Yuval Noah Harari?",
        {
            "query": "human history",
            "filter": 'eq("author", "Yuval Noah Harari")',
        },
    ),
    (
        "I am a big fan of the author Walter Isaacson. I want to know more about the book he wrote on Benjamin Franklin.",
        {
            "query": "Benjamin Franklin",
            "filter": 'eq("author", "Walter Isaacson")',
        },
    ),
]

# Creates the prompt to query the vectorstore, customisable through provided parameters
prompt = get_query_constructor_prompt(
    document_contents=document_content_description,
    attribute_info=metadata_field_info,
    examples=examples
)

output_parser = StructuredQueryOutputParser.from_components()

query_constructor = prompt | llm | output_parser

selfqueryretriever = SelfQueryRetriever(
    query_constructor=query_constructor,
    vectorstore=vectordb_product,
    structured_query_translator=ChromaTranslator(),
)
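
To make the filter expressions in `examples` more concrete, the sketch below applies the equivalent of `and(eq("genre", "self-help"), gte("rating", 4.7))` to a few in-memory records. The Chroma-style `where` dictionary is an assumption about the shape the translator produces, the records are hypothetical, and `matches` is an illustrative helper rather than part of LangChain or Chroma.

```python
import operator
from typing import Any

# Hypothetical records standing in for Document.metadata.
books = [
    {"title": "Book A", "genre": "self-help", "rating": 4.8},
    {"title": "Book B", "genre": "self-help", "rating": 4.5},
    {"title": "Book C", "genre": "history", "rating": 4.9},
]

# Assumed translated form of and(eq("genre", "self-help"), gte("rating", 4.7)).
where = {"$and": [{"genre": {"$eq": "self-help"}}, {"rating": {"$gte": 4.7}}]}

# Comparison operators supported by this sketch.
OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(meta: dict[str, Any], cond: dict[str, Any]) -> bool:
    """Return True if one metadata dict satisfies a where-style condition."""
    if "$and" in cond:
        return all(matches(meta, c) for c in cond["$and"])
    if "$or" in cond:
        return any(matches(meta, c) for c in cond["$or"])
    # Leaf condition of the form {"field": {"$op": value}}.
    (field, test), = cond.items()
    (op, value), = test.items()
    return field in meta and OPS[op](meta[field], value)


selected = [b["title"] for b in books if matches(b, where)]
print(selected)  # ['Book A']
```

The point of the self-query retriever is that the metadata filter is applied exactly, while the free-text `query` part is matched by embedding similarity.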

### 2. Construct LCEL Chains

In [8]:
json_parser = JsonOutputParser()
str_parser = StrOutputParser()

#### Language chain
Classify the language of the customer feedback, complaint, or query.
If the language is not English, translate the text to English.

In [9]:
language_classification_template = """Given an input text, classify the language. Return only a single word. \
Example output: Korean
Example output: English

Input: {input}"""

language_classification_prompt = PromptTemplate(
    template=language_classification_template,
    input_variables=["input"]
)

language_classification_chain = language_classification_prompt | llm | str_parser

In [10]:
language_translate_template = """Given an input text in {language}, translate to English. Keep the tone of the text the \
same in your translation.
Input: {input}"""

language_translate_prompt = PromptTemplate(
    template=language_translate_template,
    input_variables=["input", "language"]
)

language_translate_chain = language_translate_prompt | llm | str_parser

In [11]:
language_chain = (
    RunnablePassthrough.assign(
        # Classify the input text from the customer
        language = language_classification_chain
    ) 
    | RunnablePassthrough.assign(
        en_query = RunnableBranch(
            # if English, return the input text as is
            (lambda x: x['language'] == "English", lambda x: x['input']),
            # Else if not English, translate the input text to English
            language_translate_chain
        )
    )
)
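
The branch above compares the label to the exact string `"English"`, so a model reply such as `"English."` or `"english"` would be routed to translation. The sketch below shows one way to normalise the label before comparing. `normalise_language`, `to_english`, and `fake_translate` are hypothetical helpers written in plain Python for illustration; in the notebook the translation step is an LLM call.

```python
def normalise_language(label: str) -> str:
    """Trim whitespace and stray punctuation, then capitalise the label."""
    return label.strip().strip(".!\"'").strip().capitalize()


def to_english(text: str, label: str, translate) -> str:
    """Return the text unchanged for English input, otherwise translate it."""
    language = normalise_language(label)
    if language == "English":
        return text
    return translate(text, language)


def fake_translate(text: str, language: str) -> str:
    """Stub translator used only for this example."""
    return f"[{language} -> English] {text}"


print(to_english("Hello", " english.\n", fake_translate))
print(to_english("Ciao", "italian", fake_translate))
```

A function like `normalise_language` could be wrapped in a `RunnableLambda` after the classification chain if this turns out to matter in practice.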

#### Text Classification Chain
Perform classification tasks – sentiment analysis and subject classification

In [12]:
text_classification_template = """Given a customer query or feedback for an e-commerce book store, categorise the sentiment \
and subject of the customer input. Sentiment should be one of three categories: [positive, negative, neutral]. \
Subject should be one of the following categories: [billing, product, delivery, service, website, other]. \
Here is a description for categories:
Billing: Relates to invoicing and payments
Product: Relates to the books the e-commerce book store is selling
Delivery: Relates to shipping of the products to the customer
Service: Relates to the customer service experience.
Website: Relates to the website of the e-commerce book store
Use other if you are unable to classify the subject.

Return your answer in JSON format with keys "sentiment" and "subject". 
Example output:
{{"sentiment": "neutral", "subject": "product"}}

Input from customer: {en_query}
"""

text_classification_prompt = PromptTemplate(
    template=text_classification_template,
    input_variables=["en_query"]
)

text_classification_chain = text_classification_prompt | llm | json_parser
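
Downstream routing compares `subject` to the lowercase string `"product"`, while the prompt describes the categories with capitalised names, so the labels may come back in mixed case. The following sketch shows one possible validation step. `validate_classification` is a hypothetical helper, and the fallback values (`"neutral"`, `"other"`) are assumptions, not behaviour of `JsonOutputParser`.

```python
import json

SENTIMENTS = {"positive", "negative", "neutral"}
SUBJECTS = {"billing", "product", "delivery", "service", "website", "other"}


def validate_classification(raw: str) -> dict:
    """Parse the model's JSON text and coerce unexpected labels to defaults."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    sentiment = str(data.get("sentiment", "")).strip().lower()
    subject = str(data.get("subject", "")).strip().lower()
    return {
        "sentiment": sentiment if sentiment in SENTIMENTS else "neutral",
        "subject": subject if subject in SUBJECTS else "other",
    }


print(validate_classification('{"sentiment": "Neutral", "subject": "Product"}'))
print(validate_classification("not valid json"))
```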

#### Example retriever chain
Use the retriever tool to obtain few shot examples.

In [13]:
def dict_parser(documents: list[Document]) -> dict:
    doc = documents[0]
    return {"question": doc.page_content, "answer": doc.metadata['reply']}

example_retriever_chain = itemgetter("en_query") | fewshot_retriever | RunnableLambda(dict_parser)

#### Product chain
If the subject is product and product information is needed to answer the question, use the self-query retriever tool.

In [14]:
product_template = """You are a helpful customer service assistant for an e-commerce book store. Given a customer 
query or feedback that is on the subject of product (i.e. books), determine if the customer is asking for information 
about books or not. Return your answer in JSON output format with keys "query_product_info" and "query_to_knowledge_base".

If the customer is not asking for information about books, return "query_product_info" as false and 
"query_to_knowledge_base" as an empty string.

If the customer is asking for information about books, return "query_product_info" as true. Next, construct a query that 
will be used to search a knowledge base for the required information on books. The query needs to be specific in 
specifying the metadata of the knowledge base. Product information can be searched for using the following 
metadata: [title, author, genre, rating, price, year]. Output the query in "query_to_knowledge_base" key.

Example 1:

Input from customer: What books do you have on biographies out in 2023?

Answer in JSON:
{{"query_product_info": true, "query_to_knowledge_base": "What are books that have a genre 'biography' and published in the year '2023'?"}}

Example 2:

Input from customer: "I am happy with the book I bought last week. Finished reading it today and it was great!"

Answer in JSON:
{{"query_product_info": false, "query_to_knowledge_base": ""}}

Example 3:

Input from customer: "I am a big fan of [author_name]. What books do you have written by him?"

Answer in JSON:
{{"query_product_info": true, "query_to_knowledge_base": "What are books that are written by the author '[author_name]'?"}}

Input from customer: {en_query}

Answer in JSON:
"""

product_prompt = PromptTemplate(
    template=product_template,
    input_variables=["en_query"]
)

product_chain = (
    product_prompt 
    | llm 
    | json_parser
    | RunnableBranch(
        (lambda x: x["query_product_info"], itemgetter("query_to_knowledge_base") | selfqueryretriever), 
        lambda x: ""
    )
)
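
As written, `product_chain` returns the raw list of `Document` objects, so the full `repr` of each document ends up in the reply prompt. One option, sketched below, is to render the documents as compact text first. This is not what the notebook does: `Doc` is a stand-in for `Document`, `format_context` is a hypothetical helper, and the description text is made up (the title, author, rating, and price are taken from the few-shot example earlier).

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs: list, max_chars: int = 200) -> str:
    """Render each document as one line of metadata plus a trimmed description."""
    lines = []
    for doc in docs:
        meta = doc.metadata
        summary = doc.page_content[:max_chars].replace("\n", " ")
        lines.append(
            f"Title: {meta.get('title')} | Author: {meta.get('author')} | "
            f"Rating: {meta.get('rating')} | Price: ${meta.get('price')} | "
            f"Description: {summary}"
        )
    return "\n".join(lines)


sample = Doc(
    page_content="A guide to\nbuilding habits.",  # hypothetical description
    metadata={"title": "Atomic Habits", "author": "James Clear",
              "rating": 4.8, "price": 25.99},
)
print(format_context([sample]))
```

In the chain, a function like this could be appended after the retriever with `RunnableLambda(format_context)`; an empty result list then yields an empty string, matching the non-product branch.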

#### Reply constructor chain
Build the prompt from the few-shot example and the product context, then ask the LLM to draft a reply that answers the customer's question.

In [15]:
reply_constructor_template = """You are a helpful customer service assistant for an e-commerce book store. Given a customer \
query or feedback, draft a concise reply to the customer. You are provided with context and an example response to help you \
draft the reply. You may not need to rely on the example if you find it not relevant to the customer input.

Context: {context}

Example: {example}

Input from customer: {en_query}

Reply:
"""

reply_constructor_prompt = PromptTemplate(
    template=reply_constructor_template,
    input_variables=["en_query", "example", "context"]
)

reply_constructor_chain = reply_constructor_prompt | llm | str_parser

#### Translation chain
Translate back to customer's original language if necessary.

In [16]:
translation_template = """Translate the following text in English into {language}, keeping the tone the same.
Text: {en_reply}
"""

translation_prompt = PromptTemplate(
    template=translation_template,
    input_variables=["en_reply", "language"]
)

translation_chain = RunnableBranch(
        (lambda x: x["language"] == "English", itemgetter("en_reply")),
        translation_prompt | llm | str_parser
    )

#### Final chain

In [17]:
def flatten(inp: dict) -> dict:
    """Lift the keys of classification_response to the top level."""
    output = {}
    for key, value in inp.items():
        if key == "classification_response":
            for k, v in value.items():
                output[k] = v
        else:
            output[key] = value
    return output
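
To show what the flattening step does to the data passed between chains, here is a small self-contained sketch. `flatten_compact` is a hypothetical, shorter variant written so the snippet runs on its own; it is assumed to give the same result as `flatten` above, apart from key order.

```python
def flatten_compact(inp: dict) -> dict:
    """Merge the nested classification_response keys into the top-level dict."""
    rest = {k: v for k, v in inp.items() if k != "classification_response"}
    return {**rest, **inp.get("classification_response", {})}


nested = {
    "input": "What are your shipping options for international orders?",
    "language": "English",
    "classification_response": {"sentiment": "neutral", "subject": "delivery"},
}

flat = flatten_compact(nested)
print(flat)
```

After this step, later branches can read `x["subject"]` directly instead of `x["classification_response"]["subject"]`.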

In [18]:
final_chain = (
    
    # Classify and translate customer input
    language_chain 
    
    # Run both text_classification_chain and example_retriever_chain in parallel
    | RunnableParallel({
        "input": itemgetter("input"),
        "language": itemgetter("language"),
        "en_query": itemgetter("en_query"),
        "classification_response": text_classification_chain,
        "example": example_retriever_chain
    }) 
    
    # Flatten classification_response
    | RunnableLambda(flatten) 
    
    # Route to the appropriate chain based on subject
    | RunnablePassthrough.assign(
        context = RunnableBranch(
            (lambda x: x["subject"] == "product", product_chain),
            lambda x: ""
        )
    )
    
    # Construct the reply using few shot example and context
    | RunnablePassthrough.assign(
        en_reply = reply_constructor_chain
    )
    
    # Translate to original language as response to the customer
    | RunnablePassthrough.assign(
        output = translation_chain
    )
)
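
The final chain repeatedly uses the pattern "keep everything so far and add one new key". The plain-Python sketch below mimics that idea so the data flow is easier to follow. `assign` is a hypothetical helper that only approximates the behaviour of `RunnablePassthrough.assign`, and the lambdas are stubs standing in for the LLM-backed subchains.

```python
def assign(state: dict, **steps) -> dict:
    """Return a copy of state with one new key per step, computed from state."""
    return {**state, **{key: step(state) for key, step in steps.items()}}


state = {"input": "Ciao", "language": "Italian"}

# Each call sees all keys produced by the earlier calls.
state = assign(state, en_query=lambda s: "Hello")
state = assign(state, context=lambda s: "")
state = assign(state, en_reply=lambda s: f"Reply to: {s['en_query']}")

print(list(state))
print(state["en_reply"])
```

This is why the result of `final_chain.invoke` contains every intermediate value (`language`, `en_query`, `context`, `en_reply`) alongside the final `output`.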

### 3. Evaluation: Test cases

In [19]:
questions = [
    # English, not about product
    "What are your shipping options for international orders?",
    
    # non-English, about product, no product info needed
    "Il libro che ho comprato ieri è arrivato in perfette condizioni. Consigliato!",
    
    # English, about product, needs product info
    "What books do you have on self-help rated at least 4.7 stars?",
    
    # No few shot example available for this query (irrelevant example)
    "I am billed an incorrect amount for my purchase. Can I get help to have the bill corrected?",
    
    # Query about a product that does not exist and irrelevant documents from the retriever (irrelevant context)
    "What is the rating and selling price of the book ‘Holly’ by Stephen King?"
]
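
The chain returns every intermediate value, so the printed results can get long. A small helper like the one below can trim the output to the fields of interest when comparing test cases. `preview` is a hypothetical helper, and the sample dictionary is a stub rather than real chain output.

```python
def preview(result: dict,
            keys=("language", "sentiment", "subject", "output"),
            width: int = 60) -> dict:
    """Keep only selected keys and shorten long values for display."""
    out = {}
    for key in keys:
        value = str(result.get(key, ""))
        out[key] = value if len(value) <= width else value[: width - 3] + "..."
    return out


# Stub result for illustration; in the notebook this would come from
# final_chain.invoke({"input": question}).
stub = {
    "language": "English",
    "sentiment": "neutral",
    "subject": "delivery",
    "output": "x" * 100,
}
print(preview(stub, width=10))
```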

In [22]:
final_chain.invoke({'input': questions[0]})

{'input': 'What are your shipping options for international orders?',
 'language': 'English',
 'en_query': 'What are your shipping options for international orders?',
 'sentiment': 'neutral',
 'subject': 'delivery',
 'example': {'question': 'What are your shipping options for international orders?',
  'answer': 'We offer various shipping options for international orders to cater to our customers worldwide. You can select your preferred shipping method during the checkout process. Our system will provide you with estimated delivery times and costs based on your location.'},
 'context': '',
 'en_reply': 'We offer various shipping options for international orders to cater to our customers worldwide. You can select your preferred shipping method during the checkout process. Our system will provide you with estimated delivery times and costs based on your location.',
 'output': 'We offer various shipping options for international orders to cater to our customers worldwide. You can select yo

In [23]:
final_chain.invoke({'input': questions[1]})

{'input': 'Il libro che ho comprato ieri è arrivato in perfette condizioni. Consigliato!',
 'language': 'Italian',
 'en_query': 'Output: The book I bought yesterday arrived in perfect condition. Highly recommended!',
 'sentiment': 'positive',
 'subject': 'delivery',
 'example': {'question': "I recently received my order from your bookstore, and I'm very impressed with the packaging and the condition of the books. Thank you for the great service!",
  'answer': "Thank you so much for your kind words! We're thrilled to hear that you're satisfied with your recent order. Providing excellent service and ensuring your books arrive in perfect condition is always our priority."},
 'context': '',
 'en_reply': "Thank you for your feedback! We're delighted to hear that the book you purchased arrived in perfect condition. We appreciate your recommendation and hope you enjoy your new read!",
 'output': 'Grazie per il tuo feedback! Siamo felici di sapere che il libro che hai acquistato è arrivato in 

In [24]:
final_chain.invoke({'input': questions[2]})

{'input': 'What books do you have on self-help rated at least 4.7 stars?',
 'language': 'English',
 'en_query': 'What books do you have on self-help rated at least 4.7 stars?',
 'sentiment': 'neutral',
 'subject': 'product',
 'example': {'question': 'Do you have the book Atomic Habits by James Clear?',
  'answer': 'Thank you for your interest! Yes, we do have Atomic Habits by James Clear. It is rated 4.8 stars and is retailing for $25.99.'},
 'context': [Document(page_content="#1 NATIONAL BESTSELLER \n\n#1 INTERNATIONAL BESTSELLER\n\nWhat does everyone in the modern world need to know?\n\nRenowned psychologist Jordan B. Peterson's answer to this most difficult of questions uniquely combines the hard-won truths of ancient tradition with the stunning revelations of cutting-edge scientific research.\n\nHumorous, surprising and informative, Dr. Peterson tells us why skateboarding boys and girls must be left alone, what terrible fate awaits those who criticize too easily, and why you should

In [20]:
final_chain.invoke({'input': questions[3]})

{'input': 'I am billed an incorrect amount for my purchase. Can I get help to have the bill corrected?',
 'language': 'English',
 'en_query': 'I am billed an incorrect amount for my purchase. Can I get help to have the bill corrected?',
 'sentiment': 'negative',
 'subject': 'billing',
 'example': {'question': 'My order was damaged during shipping.',
  'answer': "We're sorry to hear that your order was damaged during shipping. We would be happy to send you a replacement order. Our customer service team will be in touch with you in 3 working days."},
 'context': '',
 'en_reply': 'We apologize for the billing error on your purchase. Please provide us with your order details and the correct amount you should have been billed. Our customer service team will investigate this issue and work to correct the billing error promptly. Thank you for bringing this to our attention.',
 'output': 'We apologize for the billing error on your purchase. Please provide us with your order details and the cor

In [21]:
final_chain.invoke({'input': questions[4]})

{'input': 'What is the rating and selling price of the book ‘Holly’ by Stephen King?',
 'language': 'English',
 'en_query': 'What is the rating and selling price of the book ‘Holly’ by Stephen King?',
 'sentiment': 'neutral',
 'subject': 'product',
 'example': {'question': "Hi, I'm interested in purchasing the latest bestseller by [Author's Name]. Can you confirm if it's available?",
  'answer': "Thank you for your interest in our bookstore. Yes, we do carry books by [Author's Name]. We have [Book Title] rating: 4.6 stars Price: $26.50 and [Book Title] rating: 4.4 stars Price: $18.40 "},
 'context': [Document(page_content='#1 New York Times bestseller\n\nFrom the author of Steve Jobs and other bestselling biographies, this is the astonishingly intimate story of the most fascinating and controversial innovator of our era—a rule-breaking visionary who helped to lead the world into the era of electric vehicles, private space exploration, and artificial intelligence. Oh, and took over Twit