Vector Database: FAISS, ChromaDB, or Pinecone?

- Step 1: Load the Sec1 and Sec2 PDFs
- Step 2: Split into chunks and tag each chunk with metadata (page number + Sec1/Sec2 label)
- Step 3: Combine everything into a single vector DB

Each query should return:
- Page number + Sec1/Sec2 source for every retrieved chunk
- Results that the UI can filter between Sec1 and Sec2 content
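
The filtering requirement above can be sketched without any vector store. This is only an illustration of the record shape and the filter: `Chunk`, `filter_by_source`, and the sample records are hypothetical names and placeholder data, not part of the pipeline below.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical record: one chunk of text plus the metadata we want back."""
    text: str
    page: int
    source: str  # "Sec1" or "Sec2"


def filter_by_source(chunks: Iterable[Chunk], source: Optional[str] = None) -> List[Chunk]:
    """Keep only chunks from the given source; None means no filtering."""
    if source is None:
        return list(chunks)
    return [c for c in chunks if c.source == source]


# Placeholder records for illustration only
sample = [
    Chunk("placeholder text A", page=3, source="Sec1"),
    Chunk("placeholder text B", page=12, source="Sec2"),
    Chunk("placeholder text C", page=7, source="Sec1"),
]

sec1_only = filter_by_source(sample, "Sec1")
print([(c.page, c.source) for c in sec1_only])
```

With a real index, the filter would more naturally be applied at retrieval time. LangChain's FAISS wrapper appears to accept a `filter` entry in `search_kwargs` for this, but check the installed version's documentation before relying on it.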

In [24]:

import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Check that a specific variable is loaded (without printing the secret itself)
# print(os.getenv("OPENAI_API_KEY") is not None)


True

In [9]:
OPENAI_API_KEY = ""
# Set manually because the key does not load properly from .env for some reason

In [10]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain.schema.runnable import RunnableLambda
from langchain.chat_models import ChatOpenAI
# from langchain.output_parsers import StrOutputParser

# Load PDFs
pdf_paths = {"Sec1": "sec1.pdf", "Sec2": "sec2.pdf"}
documents = []

for label, path in pdf_paths.items():
    loader = PyPDFLoader(path)
    pages = loader.load()
    for page in pages:
        documents.append({
            "text": page.page_content,
            # PyPDFLoader page indices are zero-based; store 1-based page numbers
            "metadata": {"page": page.metadata["page"] + 1, "source": label}
        })

# Chunk the documents
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
)
chunks = []

for doc in documents:
    splits = text_splitter.split_text(doc["text"])
    for split in splits:
        chunks.append({
            "text": split,
            # Copy the metadata so chunks from the same page do not share one dict
            "metadata": dict(doc["metadata"])
        })

# Generate embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=OPENAI_API_KEY)

faiss_index = FAISS.from_texts(
    [chunk["text"] for chunk in chunks], 
    embeddings, 
    metadatas=[chunk["metadata"] for chunk in chunks]
)

# Save VectorDB
faiss_index.save_local("faiss_index")

print("Vector database created successfully!")

Vector database created successfully!
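
To picture what `chunk_size=500` and `chunk_overlap=50` mean, here is a fixed-width sliding window over a string. This is a simplified stand-in and not what `RecursiveCharacterTextSplitter` actually does (it first tries to split on separators such as paragraph and line breaks); the function name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 1200-character string, for illustration only
demo_text = "".join(str(i % 10) for i in range(1200))
demo_chunks = sliding_window_chunks(demo_text)
print([len(c) for c in demo_chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.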


In [21]:
def answer_question_from_vector_store(vector_store, input_question):
    prompt = PromptTemplate.from_template(
        template="""
You are the Heritage Education Research Assistant, an AI-powered tool designed to help educators in Singapore create comprehensive and balanced lesson plans about Singapore's history and culture. Your task is to provide multiple perspectives on historical questions, with a focus on validated sources from the National Heritage Board (NHB) and other reputable institutions.

Given a user's question and any provided filters (student age group, historical timeframe, theme), please:

1. Generate 3-5 different perspectives on the question, each with a brief summary (2-3 sentences) explaining the reasoning behind that perspective.
2. Ensure that the language and content complexity is appropriate for the specified student age group (if provided).
3. If a specific historical timeframe or theme is specified, tailor your responses to fit within those parameters.
4. After presenting the perspectives, suggest 2-3 discussion questions that could encourage critical thinking among students about these different viewpoints.

Remember, your goal is to provide educators with balanced, well-sourced information that they can use to create engaging and thought-provoking lessons about Singapore's history and culture.

Context: {context}

Question: {question}
        """
    )

    def format_docs(docs):
        # Join the retrieved chunks into a single context string
        return "\n\n".join(doc.page_content for doc in docs)

    # Retrieve the 10 most similar chunks across both sources
    retriever = vector_store.as_retriever(search_kwargs={"k": 10})
    retrieved_docs = retriever.invoke(input_question)

    formatted_context = format_docs(retrieved_docs)

    # The prompt takes "context" and "question" directly, so it can be piped
    # straight into the chat model
    rag_chain_from_docs = (
        prompt
        | ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
    )

    result = rag_chain_from_docs.invoke({"context": formatted_context, "question": input_question})
    # Return the retrieved documents as well so callers can cite page and source
    return {"answer": result.content, "context": retrieved_docs}

# Load FAISS index (the saved docstore is pickled, so only load an index you created yourself)
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)


In [22]:
# Test query
question = "Who is the founder of Singapore?"
response = answer_question_from_vector_store(vectorstore, question)

print(response['answer'])
print()
print(f"Referenced sources: {[doc.metadata['source'] for doc in response['context']]}")


Perspective 1: Sir Stamford Raffles
Summary: Some argue that Raffles should be considered the founder of Singapore due to his role in signing the 1819 Treaty that allowed the British to establish a trading post in the region. His contributions to the early development of Singapore are significant and well-documented.

Perspective 2: William Farquhar
Summary: Others believe that Farquhar should be recognized as the founder of Singapore because of his efforts in building the settlement from scratch. Farquhar played a crucial role in the early development of Singapore alongside Raffles.

Perspective 3: John Crawfurd
Summary: Some may consider Crawfurd as the founder of Singapore because he signed the 1824 Treaty of Friendship and Alliance that gave the British control over the entire island. His diplomatic efforts and contributions to the British colonial presence in Singapore are noteworthy.

Discussion Questions:
1. How do the different perspectives on who founded Singapore reflect the 

In [23]:
print(response['answer'])
print()
print("Referenced sources:")
for doc in response['context']:
    print(f"Page {doc.metadata['page']} (Source: {doc.metadata['source']}):\n{doc.page_content}\n")


Perspective 1: Sir Stamford Raffles
Summary: Some argue that Raffles should be considered the founder of Singapore due to his role in signing the 1819 Treaty that allowed the British to establish a trading post in the region. His contributions to the early development of Singapore are significant and well-documented.

Perspective 2: William Farquhar
Summary: Others believe that Farquhar should be recognized as the founder of Singapore because of his efforts in building the settlement from scratch. Farquhar played a crucial role in the early development of Singapore alongside Raffles.

Perspective 3: John Crawfurd
Summary: Some may consider Crawfurd as the founder of Singapore because he signed the 1824 Treaty of Friendship and Alliance that gave the British control over the entire island. His diplomatic efforts and contributions to the British colonial presence in Singapore are noteworthy.

Discussion Questions:
1. How do the different perspectives on who founded Singapore reflect the 
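
With k=10 the per-chunk listing gets long, so a compact summary of which pages were cited per source may be easier to show in a UI. A minimal sketch, assuming each retrieved document carries the `page` and `source` metadata keys set during indexing; `summarize_sources` and the sample metadata are hypothetical.

```python
from typing import Dict, Iterable, List


def summarize_sources(metadatas: Iterable[dict]) -> Dict[str, List[int]]:
    """Group cited page numbers by source, dropping duplicates."""
    pages_by_source: Dict[str, set] = {}
    for meta in metadatas:
        pages_by_source.setdefault(meta["source"], set()).add(meta["page"])
    # Sort pages so the summary reads in document order
    return {source: sorted(pages) for source, pages in pages_by_source.items()}


# Made-up metadata for illustration only
sample_metadatas = [
    {"source": "Sec1", "page": 4},
    {"source": "Sec2", "page": 10},
    {"source": "Sec1", "page": 4},
    {"source": "Sec1", "page": 2},
]

print(summarize_sources(sample_metadatas))
```

In the notebook this would be called as `summarize_sources(doc.metadata for doc in response['context'])`.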