Interested in developing a customized chatbot tailored to your business? This article walks through building one for a restaurant. A restaurant chatbot serves several purposes, enhancing the customer experience and streamlining operations. First, it facilitates reservations, allowing users to book tables conveniently. Second, it provides menu information, catering to dietary preferences and enabling informed ordering. It also answers frequently asked questions about operating hours, location, and special offers, reducing the burden on staff. Finally, it can handle feedback and complaints promptly, improving customer satisfaction and retention.

We will leverage the power of large language models (LLMs). Google recently launched Gemma, and we are going to use it. Gemma is a family of lightweight, open models developed by Google, built on the same research and technology as the Gemini models. They are decoder-only, text-to-text large language models, available in English, with open weights in both pre-trained and instruction-tuned variants. Gemma models are suited to text generation tasks such as question answering, summarization, and reasoning. Their compact size allows deployment in resource-constrained environments such as laptops, desktops, or personal cloud infrastructure, which widens access to state-of-the-art AI models.


Getting a machine to understand the many ways humans might ask about something, and to respond in natural language as a human would, is in my view one of the overarching goals of Natural Language Processing (NLP).

We will use LangChain as the orchestration tool and ChromaDB as the vector database.

LangChain is an open-source framework for building applications powered by large language models. It offers modular components such as prompt templates, document loaders, retrievers, memory, and chains, which developers can combine into custom pipelines for tasks such as question answering over documents and conversational assistants. It integrates with many model providers and vector stores, so the same pipeline can be adapted to different projects.

ChromaDB (Chroma) is an open-source embedding database, often called a vector database. It stores documents together with their embeddings and retrieves the entries most similar to a query embedding. Paired with an LLM, this enables retrieval-augmented generation (RAG): the relevant stored text is fetched at query time and passed to the model as context, so the model can answer from data it was not trained on. In this article, the stored data is the restaurant's menu.

Let's start with installing the necessary libraries.

In [None]:
!pip install langchain
!pip install chromadb


The data we are going to use is a restaurant menu in JSON format. We first load the JSON file, then extract the individual dishes and store each one in its own .txt file.

In [None]:
import json

json_file = "menu.json"
with open(json_file, "r") as f:
    json_data = json.load(f)

In [None]:
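import os

folder_path = "/content/Data"
os.makedirs(folder_path, exist_ok=True)  # create the output folder if it does not exist

# Write each dish to its own numbered file, one "key: value" pair per line
for count, dish in enumerate(json_data):
  file_path = os.path.join(folder_path, "{}.txt".format(count))
  with open(file_path, "w") as f:
    for key, value in dish.items():
      f.write(f"{key}: {value}\n")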
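
To make the file format concrete, here is a minimal sketch of how a single dish record turns into text. The dish and its field names (`item`, `cost`, `about`) are hypothetical, chosen only for illustration; the real `menu.json` may use different fields.

```python
# Hypothetical dish record; the real menu.json fields may differ.
dish = {
    "item": "Margherita Pizza",
    "cost": 250,
    "about": "Classic pizza with tomato, mozzarella and basil",
}

def dish_to_text(dish: dict) -> str:
    """Render one dish as 'key: value' lines, one field per line."""
    return "".join(f"{key}: {value}\n" for key, value in dish.items())

text = dish_to_text(dish)
print(text)
```

Keeping one dish per file means each loaded document later corresponds to exactly one menu entry.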

Once the .txt files are created, the next task is to load the data using LangChain document loaders. All the files are loaded and stored in *docs*.

In [None]:
from langchain_community.document_loaders import TextLoader

loaders = []

for i in range(len(json_data)):  # one .txt file was written per dish
  file_path = os.path.join(folder_path,"{}.txt".format(i))
  loaders.append(TextLoader(file_path))

docs = []
for loader in loaders:
    docs.extend(loader.load())

The next task is to create the vector database. We create the embeddings using the Hugging Face Inference API embeddings class and store them in the vector database.

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

inference_api_key = "WRITE_YOUR_API_KEY"

embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key, model_name="sentence-transformers/all-mpnet-base-v2"
)

vectordb = Chroma.from_documents(
    documents=docs,
    embedding=embeddings
)
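
As a rough intuition for what the vector database does at query time, the sketch below ranks a few toy vectors by cosine similarity to a query vector. This is an illustration only: the three-dimensional vectors and dish names are made up, and Chroma's actual distance metric and indexing are configurable and differ in detail.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k document names whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda name: cosine_similarity(query_vec, store[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy 3-dimensional "embeddings"; real embedding vectors have hundreds of dimensions.
store = {
    "spicy curry": [0.9, 0.1, 0.0],
    "chocolate cake": [0.0, 0.2, 0.9],
    "chilli noodles": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of "something spicy"
results = top_k(query, store, k=2)
print(results)
```

The two spicy dishes rank above the dessert because their vectors point in nearly the same direction as the query.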

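It's now time to set up the LLM. We use gemma-2b-it, the instruction-tuned 2B Gemma model released by Google. We set the temperature to 0.1 so that outputs are more direct, less creative, and more predictable; we don't want our chatbot to give unnecessary information. Setting `top_k` to 5 restricts sampling to the five most likely tokens at each generation step (it is a sampling parameter, not the number of retrieved documents). The menu entries retrieved for a given query are later combined with the prompt to produce the answer.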

In [None]:
from langchain_community.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="google/gemma-2b-it",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 5,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
    huggingfacehub_api_token = "WRITE_YOUR_API_KEY"
)
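
To illustrate what `temperature` and `top_k` do during generation, here is a minimal sketch that converts raw token scores into sampling probabilities. The scores and token names are hypothetical, and the function is a simplified stand-in, not the code the hosted model actually runs.

```python
import math

def next_token_probs(logits, temperature=1.0, top_k=None):
    """Turn raw scores into sampling probabilities with temperature and top-k filtering."""
    # Keep only the top_k highest-scoring tokens (all tokens if top_k is None).
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    kept = ranked[:top_k] if top_k else ranked
    # Divide by temperature: values below 1 sharpen the distribution.
    scaled = [(tok, score / temperature) for tok, score in kept]
    max_score = max(s for _, s in scaled)  # subtract max for numerical stability
    exps = [(tok, math.exp(s - max_score)) for tok, s in scaled]
    total = sum(e for _, e in exps)
    return {tok: e / total for tok, e in exps}

# Hypothetical scores for four candidate next tokens.
logits = {"pizza": 2.0, "pasta": 1.5, "salad": 0.5, "soup": 0.1}
warm = next_token_probs(logits, temperature=1.0, top_k=3)
cold = next_token_probs(logits, temperature=0.1, top_k=3)
print(warm)
print(cold)
```

With `top_k=3` the lowest-scoring token is never sampled, and at temperature 0.1 almost all of the probability mass moves to the single best token, which is why low temperatures give more predictable answers.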


In the prompt, we state that the chatbot is for a restaurant and give some information about our dataset.

In [None]:
from langchain.prompts import PromptTemplate

template = """You are a Chatbot at a Restaurant. Help the customer pick the right dish to order. The items in the context are dishes. The field below the item is the cost of the dish. About is the description of the dish. Use the context below to answer the questions.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)
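
Conceptually, filling the template works much like Python's `str.format`: the retrieved text replaces `{context}` and the user's message replaces `{question}`. The sketch below uses a shortened stand-in template and hypothetical values to show the final string the model would receive.

```python
# A shortened stand-in for the notebook's template, with the same two placeholders.
template = """Use the context below to answer the question.
{context}
Question: {question}
Helpful Answer:"""

# Hypothetical retrieved text and user question.
context = "item: Margherita Pizza\ncost: 250\n"
question = "What vegetarian options do you have?"

prompt = template.format(context=context, question=question)
print(prompt)
```

Ending the prompt with "Helpful Answer:" nudges the model to continue with the answer itself rather than restating the question.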


As this is a chatbot, we want it to remember the previous conversation. Thus, we initialize a memory object that stores the chat history.

In [None]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
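
The idea behind a buffer memory can be sketched in a few lines of plain Python: keep every turn in order and render it back as text when needed. The class below is a hypothetical stand-in for illustration, not LangChain's implementation.

```python
class SimpleBufferMemory:
    """Hypothetical stand-in that stores every turn of the conversation in order."""

    def __init__(self):
        self.messages = []  # list of (role, text) tuples

    def save_turn(self, user_text: str, ai_text: str) -> None:
        """Append one user message and the matching AI reply."""
        self.messages.append(("human", user_text))
        self.messages.append(("ai", ai_text))

    def as_text(self) -> str:
        """Render the history as plain text that could be prepended to a prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)

memory_sketch = SimpleBufferMemory()
memory_sketch.save_turn("Do you have pizza?", "Yes, we serve Margherita Pizza.")
memory_sketch.save_turn("How much is it?", "It costs 250.")
print(memory_sketch.as_text())
```

Because the buffer grows with every turn, long conversations eventually need trimming or summarizing to stay within the model's context window.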

We define the retriever to fetch data from the database. We also initialize the conversational RAG chain with the LLM, retriever, and memory as parameters.

In [None]:
from langchain.chains import ConversationalRetrievalChain

retriever = vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory,
)

We import the necessary modules and define a prompt for contextualizing user questions based on the chat history. It uses langchain_core components to reformulate a follow-up into a standalone question. Finally, the prompt is piped into the language model (llm) and a string output parser.

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)
contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()

The function `contextualized_question` checks whether `chat_history` is present and non-empty in the input dictionary. If it is, it returns `contextualize_q_chain`, which is then run on the same input; otherwise, it returns the value of the "question" key. The `rag_chain` pipes that result into the retriever to fetch context, then fills the prompt and calls the language model.

In [None]:
def contextualized_question(input: dict):
    if input.get("chat_history"):
        return contextualize_q_chain
    else:
        return input["question"]


rag_chain = (
    RunnablePassthrough.assign(
        context=contextualized_question | retriever
    )
    | QA_CHAIN_PROMPT
    | llm
)
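
The branching logic can be sketched without any LLM. In the example below, `rewrite_with_history` is a hypothetical stand-in for the model call that turns a follow-up into a standalone question; the point is only to show when rewriting happens.

```python
def rewrite_with_history(question: str, chat_history: list) -> str:
    """Hypothetical stand-in for the LLM call that makes a follow-up question standalone."""
    last_topic = chat_history[-1]
    return f"{question} (regarding: {last_topic})"

def question_for_retrieval(inputs: dict) -> str:
    """Use the raw question on the first turn, a rewritten one on later turns."""
    history = inputs.get("chat_history")
    if history:
        return rewrite_with_history(inputs["question"], history)
    return inputs["question"]

first = question_for_retrieval({"question": "Do you have pizza?", "chat_history": []})
later = question_for_retrieval(
    {"question": "How much is it?", "chat_history": ["Margherita Pizza"]}
)
print(first)
print(later)
```

Rewriting matters for retrieval: "How much is it?" on its own gives the retriever nothing to match against the menu, while the standalone version names the dish.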

Next, we log in to Weights & Biases for tracing and define a `predict` function that handles one turn of the conversation. Each question is processed by `rag_chain`, which combines the steps above to generate an AI response. The response is returned to the caller, and the conversation history is updated with the user question and the AI response.

In [None]:
import wandb
wandb.login(key='WRITE_YOUR_API_KEY')

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [None]:
from langchain_core.messages import AIMessage, HumanMessage

os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
os.environ["WANDB_PROJECT"] = "Restaurant_ChatBot"

print("Welcome to the Restaurant. How can I help you today?")
chat_history = []

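def predict(message, history):
  # Gradio passes its own `history`; we track LangChain messages in `chat_history` instead.
  ai_msg = rag_chain.invoke({"question": message, "chat_history": chat_history})
  # Keep only the text from "Answer" onward; find() returns -1 if the marker is absent.
  idx = max(ai_msg.find("Answer"), 0)
  answer = ai_msg[idx:]
  chat_history.extend([HumanMessage(content=message), AIMessage(content=answer)])

  return answer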

Welcome to the Restaurant. How can I help you today?
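
One subtle point in `predict` is the slicing on "Answer". Assuming the model output may include the prompt before the answer, the minimal sketch below keeps only the text from the marker onward and falls back to the full output when the marker is missing, since `str.find` returns -1 in that case. The helper name `trim_to_answer` and the sample strings are hypothetical.

```python
def trim_to_answer(raw_output: str, marker: str = "Answer") -> str:
    """Return the text from `marker` onward, or the whole output if the marker is absent."""
    idx = raw_output.find(marker)  # find() returns -1 when the marker is missing
    return raw_output[idx:] if idx != -1 else raw_output

echoed = "Question: Do you have pizza?\nHelpful Answer: Yes, we do."
plain = "Yes, we do."
print(trim_to_answer(echoed))
print(trim_to_answer(plain))
```

Without the fallback, slicing with an index of -1 would return only the last character of the response.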


And that’s how we build a simple LLM chatbot with a very limited amount of data!


In [None]:
import gradio as gr

gr.ChatInterface(predict).launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://a8b026c1b3c18915f2.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


