# Nestlé HR Policy Chatbot

This notebook demonstrates how to build a conversational chatbot that answers questions based on Nestlé’s HR policy document.  

The workflow consists of:
- Loading the PDF and splitting it into manageable chunks.  
- Embedding the text chunks into a vector space.  
- Storing and querying embeddings using a Chroma vector store.  
- Using OpenAI's `gpt-3.5-turbo` chat model for retrieval-augmented question answering.  
- Creating a user‑friendly interface with Gradio.


In [None]:
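# Install necessary packages (run once).
# tiktoken is needed by OpenAIEmbeddings to count tokens.
# Note: the imports in this notebook use the legacy `langchain.*` layout; recent
# LangChain releases move these classes to `langchain_community` and `langchain_openai`.
!pip install --quiet openai langchain chromadb pypdf gradio tiktoken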


In [None]:
import os
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Set your OpenAI API key
os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY'

# Path to the Nestlé HR policy PDF (update the path as needed)
pdf_path = 'the_nestle_hr_policy_pdf_2012.pdf'

# Load the PDF as one Document per page.
# (load_and_split() would also apply a default text splitter, so load() is used here.)
loader = PyPDFLoader(pdf_path)
pages = loader.load()

# Further split the pages into smaller chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(pages)
print(f"Loaded {len(docs)} document chunks.")
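
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification for illustration, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on paragraph, line, and word boundaries). The function name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Toy input (an assumption for illustration, not text from the policy PDF).
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.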


In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Create embeddings and build a Chroma vector database
embeddings = OpenAIEmbeddings()
vectorstore_path = './chroma_db'

# If a persistent database exists, load it; otherwise, create a new one
if os.path.exists(vectorstore_path):
    db = Chroma(persist_directory=vectorstore_path, embedding_function=embeddings)
else:
    db = Chroma.from_documents(docs, embeddings, persist_directory=vectorstore_path)
    db.persist()

retriever = db.as_retriever(search_kwargs={'k': 4})
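
Conceptually, `search_kwargs={'k': 4}` asks the retriever for the four chunks whose embeddings lie closest to the query embedding. The sketch below illustrates that idea with toy two-dimensional vectors and cosine similarity. It is an assumption-laden illustration: Chroma's actual index structure and distance metric may differ, and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy "embeddings" (made up for illustration; real embeddings have many more dimensions).
toy_index = [
    ("leave policy", [1.0, 0.0]),
    ("training policy", [0.0, 1.0]),
    ("parental leave", [0.9, 0.1]),
]
top_two = top_k([1.0, 0.0], toy_index, k=2)
print(top_two)
```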


In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Build the retrieval‑augmented question answering chain
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=retriever,
    return_source_documents=True
)

# Define a function to answer questions
def answer_question(query: str) -> str:
    '''Return an answer to the query based on the HR policy.'''
    result = qa_chain(query)
    return result['result']

# Test the function
# print(answer_question('What is the policy on parental leave?'))
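
With `chain_type='stuff'`, all retrieved chunks are placed into a single prompt together with the question. The sketch below shows the idea only. The template wording is invented for illustration and is not LangChain's actual prompt, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate every retrieved chunk into one context block, then append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder chunks (assumptions, not text from the policy document).
prompt = build_stuff_prompt(
    "How are employees assessed?",
    ["Chunk one text.", "Chunk two text."],
)
print(prompt)
```

Because every retrieved chunk goes into one prompt, the combined context must fit in the model's context window. With `k=4` and `chunk_size=1000`, that is up to roughly 4,000 characters of context per question.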


In [None]:
import gradio as gr

# Gradio interface: one question in, one answer out (no chat history is kept)
def chatbot_interface(question: str) -> str:
    return answer_question(question)

iface = gr.Interface(
    fn=chatbot_interface,
    inputs=gr.Textbox(lines=2, placeholder="Ask a question about Nestlé’s HR policy..."),
    outputs='text',
    title='Nestlé HR Policy Chatbot',
    description='Ask me about the Nestlé HR policy and I will answer your questions based on the official document.'
)

# To launch the Gradio app, uncomment the line below and run the cell
# iface.launch()
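
The chain is created with `return_source_documents=True`, yet the interface displays only `result['result']`. One possible way to surface the sources is sketched below. It assumes each source document carries a `metadata` dict with a zero-based `page` key, which is what `PyPDFLoader` typically provides. `FakeDoc`, `format_with_sources`, and `fake_result` are hypothetical stand-ins so the block runs without LangChain.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (assumption: same attribute names)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_with_sources(result: dict) -> str:
    """Append a sorted, de-duplicated list of one-based page numbers to the answer."""
    pages = sorted(
        {
            doc.metadata["page"] + 1  # convert zero-based index to a page number
            for doc in result.get("source_documents", [])
            if "page" in doc.metadata
        }
    )
    if not pages:
        return result["result"]
    page_list = ", ".join(str(p) for p in pages)
    return f"{result['result']}\n\nSource pages: {page_list}"

# Fabricated result dict shaped like the chain's output, for demonstration only.
fake_result = {
    "result": "An example answer.",
    "source_documents": [
        FakeDoc("...", {"page": 2}),
        FakeDoc("...", {"page": 0}),
        FakeDoc("...", {"page": 2}),
    ],
}
formatted = format_with_sources(fake_result)
print(formatted)
```

In the notebook, `answer_question` could return `format_with_sources(qa_chain(query))` instead of `result['result']`, provided the metadata assumption holds for the loaded PDF.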


## Notes

- Replace `YOUR_OPENAI_API_KEY` with your actual OpenAI API key before running the notebook.  
- Make sure the PDF file (`the_nestle_hr_policy_pdf_2012.pdf`) is in the same directory as this notebook or update the `pdf_path` variable accordingly.  
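- The vector database is saved in the `chroma_db` folder and reused on later runs. Because an existing folder is loaded as-is, delete it to rebuild the index after changing the PDF or the chunking settings.  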
- To start the Gradio interface, uncomment `iface.launch()` and run the cell. Passing `share=True` makes Gradio generate a temporary public link that others can use.  
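
As an alternative to pasting the key into the notebook, the key can be read from an environment variable that is set outside the notebook. The sketch below shows one way to check for it and to reject the unreplaced placeholder. `resolve_api_key` and `PLACEHOLDER` are hypothetical names, and the `sk-demo` value is a dummy used only to exercise the function.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "YOUR_OPENAI_API_KEY"

def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set OPENAI_API_KEY to a real key before running the notebook.")
    return key

# Demonstration with an explicit mapping instead of the real environment.
demo_key = resolve_api_key({"OPENAI_API_KEY": "sk-demo"})
print("Key found:", bool(demo_key))
```

Failing early with a clear message avoids a less obvious authentication error later, when the first embedding or chat request is sent.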
