## Step 1: Download and Clean the Text

In this step, we download *The Art of War* by Sun Tzu from Project Gutenberg (ID: 132) and clean it for use in our RAG pipeline.  
We remove the legal preamble and footer added by Project Gutenberg and isolate the core content starting from "I. LAYING PLANS".

📄 Output: `art_of_war_cleaned.txt`
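
The trimming logic can be tried without a network connection. The sketch below is a minimal illustration on a made-up sample string; the helper name `trim_between` and the sample text are hypothetical and not part of the pipeline. It raises an error when the start marker is missing, rather than silently slicing from the wrong position.

```python
def trim_between(text: str, start_marker: str, end_marker: str) -> str:
    """Return the text from start_marker up to (not including) end_marker."""
    start = text.find(start_marker)
    if start == -1:
        # str.find returns -1 when the marker is absent
        raise ValueError(f"Start marker not found: {start_marker!r}")
    end = text.find(end_marker, start)
    if end == -1:
        # No footer marker: keep everything up to the end of the text
        end = len(text)
    return text[start:end].strip()


# Made-up sample standing in for the downloaded file
sample = (
    "License preamble\n"
    "I. LAYING PLANS\n"
    "Sun Tzu said: The art of war is of vital importance to the State.\n"
    "End of the Project Gutenberg EBook\n"
    "License footer\n"
)
trimmed = trim_between(sample, "I. LAYING PLANS", "End of the Project Gutenberg")
print(trimmed)
```

Note that `str.find` returns the first match, so if a marker also appears earlier in the file (for example in a table of contents), the slice would start there.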


In [None]:
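# Step 1: Download and clean 'The Art of War' text
import requests

url = "https://www.gutenberg.org/files/132/132-0.txt"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early instead of saving an HTTP error page
response.encoding = "utf-8"  # The file is expected to be UTF-8; avoid a wrong guess
raw_text = response.text

# Remove Gutenberg header/footer
start_marker = "I. LAYING PLANS"
end_marker = "End of the Project Gutenberg"
start = raw_text.find(start_marker)
end = raw_text.find(end_marker)

# str.find returns -1 when a marker is missing, which would silently corrupt the slice
if start == -1:
    raise ValueError(f"Start marker not found: {start_marker!r}")
if end == -1:
    end = len(raw_text)  # No footer marker found: keep everything to the end

cleaned_text = raw_text[start:end].strip()

# Save to file
with open("art_of_war_cleaned.txt", "w", encoding="utf-8") as f:
    f.write(cleaned_text)

print(f"Downloaded and saved cleaned text ({len(cleaned_text)} characters).")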


## Step 2: Chunk the Document

To prepare the document for vector storage, we split it into smaller overlapping text chunks.

We use `RecursiveCharacterTextSplitter` from LangChain, which tries a list of separators in order (by default paragraph breaks, then line breaks, then spaces, then individual characters), so chunks end at natural boundaries where possible.  
This tends to keep each chunk semantically coherent.

🔧 Parameters:
- `chunk_size = 500`: max characters per chunk
- `chunk_overlap = 100`: overlap between chunks to preserve context

📄 Input: `art_of_war_cleaned.txt`  
📄 Output: A list of LangChain `Document` objects
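
To make the two parameters concrete, here is a minimal sliding-window sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers separator boundaries); it only illustrates what `chunk_size` and `chunk_overlap` mean for a fixed-width split. The function name `fixed_window_chunks` is hypothetical.

```python
def fixed_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


demo_chunks = fixed_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

With these toy values, each window repeats the last two characters of the one before it.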


In [None]:
# Step 2: Split the document into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

# Load cleaned text
with open("art_of_war_cleaned.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Split using LangChain's text splitter
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_text(text)

# Wrap chunks into LangChain Document objects
documents = [Document(page_content=chunk) for chunk in chunks]

print(f"Total chunks created: {len(documents)}")


## Step 3: Inspect First Two Chunks

Before embedding the text, we inspect the first two chunks to understand how overlapping works.

We're using `chunk_size=500` and `chunk_overlap=100`, so each chunk can share up to 100 characters with the previous one.  
When present, this overlap helps preserve context across chunk boundaries during retrieval.

This step is useful for debugging and verifying the chunking logic.


In [None]:
# Step 3: Print first two chunks and highlight overlap
chunk1 = documents[0].page_content
chunk2 = documents[1].page_content

# Find the overlap manually (by matching last 100 chars of chunk1 with the start of chunk2)
overlap = ""
for i in range(100, 0, -1):
    if chunk1[-i:] == chunk2[:i]:
        overlap = chunk1[-i:]
        break

print("--- Chunk 1 ---")
print(chunk1)
print("\n--- Chunk 2 ---")
print(chunk2)

print("\n--- Overlap Detected ---")
print(overlap if overlap else "(No overlap found)")


## 🔍 Note: Why No Overlap Was Detected

Even though we set `chunk_overlap=100`, no overlap was detected between Chunk 1 and Chunk 2.  
This is because `RecursiveCharacterTextSplitter` builds chunks from whole pieces produced by its separators (paragraph breaks, line breaks, spaces) and only carries trailing pieces into the next chunk if they fit within `chunk_overlap`.

When the last piece of a chunk is longer than 100 characters (for example, a full paragraph), nothing is carried over and the next chunk starts fresh.

### 🧠 Takeaway:
- `chunk_overlap` is an **upper bound**, not a guarantee.
- Where the text splits cleanly into paragraph-sized pieces (as in the early chunks of *The Art of War*), adjacent chunks may share no text at all.
- Chunks that end at natural boundaries tend to be more coherent, which generally helps retrieval quality in RAG.

If you need character-level overlap for debugging, one option is `CharacterTextSplitter` with `separator=""`, which should split on individual characters before merging.
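
One way to see why overlap can be absent is a simplified model of piece-based merging. The sketch below is not LangChain's actual implementation; it is a hypothetical `merge_pieces` helper that packs whole pieces into chunks and carries over only the trailing pieces that fit within the overlap budget.

```python
def merge_pieces(pieces, chunk_size, chunk_overlap, sep=" "):
    """Greedily pack whole pieces into chunks, reusing trailing pieces as overlap."""
    chunks, current = [], []
    for piece in pieces:
        if current and len(sep.join(current + [piece])) > chunk_size:
            chunks.append(sep.join(current))
            # Drop leading pieces until what remains fits in the overlap budget
            while current and len(sep.join(current)) > chunk_overlap:
                current.pop(0)
            # Drop more if the carried-over pieces plus the new piece are too long
            while current and len(sep.join(current + [piece])) > chunk_size:
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(sep.join(current))
    return chunks


# Short pieces fit in the overlap budget, so neighbours share a piece
short = merge_pieces(["aaaa", "bbbb", "cccc", "dddd"], chunk_size=9, chunk_overlap=4)
# Pieces longer than the overlap budget cannot be carried over
long_ = merge_pieces(["aaaaaa", "bbbbbb", "cccccc"], chunk_size=9, chunk_overlap=4)
print(short)
print(long_)
```

In this toy model the second call produces chunks with no shared text, because every piece is longer than `chunk_overlap`.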


## Step 4: Embed Chunks and Store in FAISS (via .env-configured Model)

In this step, we embed each chunk into a vector using a local embedding model and store it in a FAISS index.

We dynamically read the embedding model name from a `.env` file for flexibility.  
This lets us switch between models (e.g., `nomic-embed-text`, `bge-base-en`) without changing code.

### 🔧 How it Works:
1. Load model name from `.env` (e.g., `EMBEDDING_MODEL=nomic-embed-text`)
2. Use `OllamaEmbeddings` to embed each document chunk
3. Store all vectors in FAISS
4. Save the FAISS index to disk in `faiss_index/`

📄 Input: `List[Document]`  
📦 Output: `faiss_index/` folder with vector DB
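
The model-name lookup can be sketched on its own. The helper below is hypothetical (the name `resolve_embedding_model` is not part of the pipeline) and takes any mapping, so it can be tried with plain dictionaries. It also treats an empty value as unset, since `os.getenv(name, default)` returns the default only when the variable is missing entirely.

```python
from typing import Mapping

DEFAULT_MODEL = "nomic-embed-text"


def resolve_embedding_model(env: Mapping[str, str]) -> str:
    """Return the configured embedding model name, falling back to the default."""
    value = env.get("EMBEDDING_MODEL", "").strip()
    # An empty or whitespace-only entry falls back to the default as well
    return value or DEFAULT_MODEL


print(resolve_embedding_model({}))
print(resolve_embedding_model({"EMBEDDING_MODEL": ""}))
print(resolve_embedding_model({"EMBEDDING_MODEL": "bge-base-en"}))
```

In the notebook, this could be called with `os.environ` after `load_dotenv()` has run.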


In [None]:
# Step 4: Embed and store in FAISS using Ollama + .env config (compatible with langchain 0.3.21)
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from dotenv import load_dotenv
import os

# Load embedding model name from .env
load_dotenv()
embedding_model_name = os.getenv("EMBEDDING_MODEL", "nomic-embed-text")

# Initialize embedding model from Ollama
embedding_model = OllamaEmbeddings(model=embedding_model_name)

# Embed and store in FAISS
vectorstore = FAISS.from_documents(documents, embedding_model)

# Save index to disk
index_dir = "faiss_index"
os.makedirs(index_dir, exist_ok=True)
vectorstore.save_local(index_dir)

print(f"FAISS index saved to '{index_dir}' using embedding model '{embedding_model_name}'")


## Step 5: Query the FAISS Index

In this step, we simulate a user query and search the FAISS index to retrieve the most relevant document chunks.

This is the **Retrieval (R)** part of RAG:
- The query is embedded using the same model as the chunks
- FAISS performs similarity search to find top-k matching chunks
- The retrieved text will later be passed to an LLM for answering

We’ll just print the top 3 chunks for now.
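
Conceptually, top-k retrieval is a nearest-neighbour search over the stored vectors. The sketch below does this by brute force on made-up 2-D vectors, assuming Euclidean distance as the metric (the metric a real FAISS index uses depends on how it was built). The function name and the toy vectors are hypothetical.

```python
import math


def top_k_nearest(query_vec, doc_vecs, k):
    """Return the indices of the k vectors closest to query_vec."""
    # Pair each distance with its index so sorting keeps track of positions
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    distances.sort()
    return [idx for _, idx in distances[:k]]


# Toy stand-ins for chunk embeddings and a query embedding
toy_vectors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.5, 0.5]]
toy_query = [1.0, 0.0]
nearest = top_k_nearest(toy_query, toy_vectors, k=3)
print(nearest)
```

A vector index exists to avoid this linear scan on large collections, but the ranking idea is the same.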


In [None]:
# Step 5: Query the FAISS index and print top-k results (with deserialization fix)
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
import os

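# Reload model and vector store
# (assumes load_dotenv() already ran in Step 4; otherwise the default model name is used)
embedding_model = OllamaEmbeddings(model=os.getenv("EMBEDDING_MODEL", "nomic-embed-text"))

# The saved index includes a pickle file, so loading must be explicitly allowed.
# Only enable this for index files you created yourself.
vectorstore = FAISS.load_local(
    "faiss_index",
    embeddings=embedding_model,
    allow_dangerous_deserialization=True
)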

# User query
query = "What does Sun Tzu say about deception?"

# Perform similarity search
top_k = 3
results = vectorstore.similarity_search(query, k=top_k)

# Display results
for i, doc in enumerate(results, 1):
    print(f"\n--- Retrieved Chunk {i} ---\n")
    print(doc.page_content)


## Step 6: Prompt Augmentation – A in RAG

Now that we’ve retrieved relevant context, we build a prompt that can be passed to a language model.

This is the **Augmentation (A)** step:
- Combine the retrieved chunks into a single context block
- Add the user’s question
- Format everything into a clear prompt

We will **not** generate a response yet — just prepare the input to be fed to the LLM in the next step.
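
The augmentation step can also be written as a small reusable function. The sketch below is one possible shape, not a required part of the pipeline: the name `build_prompt` and the `max_context_chars` budget are hypothetical additions that keep the context from growing without limit when more chunks are retrieved.

```python
def build_prompt(chunks: list[str], question: str, max_context_chars: int = 2000) -> str:
    """Join chunks into a context block (within a size budget) and add the question."""
    selected, used = [], 0
    for chunk in chunks:
        # Stop before the context exceeds the character budget
        if used + len(chunk) > max_context_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    context = "\n\n".join(selected)
    return (
        "You are a helpful assistant. Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Sample chunk used purely for illustration
demo_prompt = build_prompt(
    ["All warfare is based on deception."],
    "What is warfare based on?",
)
print(demo_prompt)
```

Chunks are added in retrieval order, so the most relevant ones are kept when the budget is tight.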


In [None]:
# Step 6: Build prompt from retrieved context and user question

# Concatenate the content of all retrieved chunks
context = "\n\n".join([doc.page_content for doc in results])

# Use the same query as in Step 5 so the question matches the retrieved context
query = "What does Sun Tzu say about deception?"

# Build the full prompt (can be tuned later)
prompt = f"""You are a helpful assistant. Use the context below to answer the question.

Context:
{context}

Question: {query}
Answer:"""

# Print the final prompt (for inspection only)
print(prompt)


## Step 7: Generate Answer – G in RAG

In this final step of the RAG pipeline, we use the augmented prompt from Step 6 and send it to a local language model.

This is the **Generation (G)** step:
- We use the `Ollama` class directly to interact with `orca-mini`
- The prompt includes retrieved context and the user’s question
- The model should respond with a natural-language answer grounded in the retrieved context

🧠 Model: `orca-mini` (running locally via Ollama)
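
To show how the three stages fit together, here is a minimal sketch that chains retrieval, augmentation, and generation through plain callables. Everything in it is hypothetical scaffolding: `rag_answer`, `fake_retrieve`, and `fake_generate` are stand-ins so the flow can be run without Ollama or FAISS.

```python
def rag_answer(question, retrieve, generate, k=3):
    """Run retrieve -> augment -> generate and return the model's answer."""
    chunks = retrieve(question, k)                      # R: fetch relevant text
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # A: build prompt
    return generate(prompt)                             # G: call the model


seen_prompts = []


def fake_retrieve(question, k):
    # Stand-in for a vector store lookup
    return ["All warfare is based on deception."][:k]


def fake_generate(prompt):
    # Stand-in for an LLM call; records the prompt it received
    seen_prompts.append(prompt)
    return "stub answer"


answer = rag_answer("What is warfare based on?", fake_retrieve, fake_generate)
print(answer)
```

In the notebook, `retrieve` could wrap `vectorstore.similarity_search` (returning each document's `page_content`) and `generate` could be `llm.invoke`.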


In [None]:
# Step 7: Invoke the model with the prompt
from langchain_community.llms import Ollama
llm = Ollama(model="orca-mini")

# Invoke the model with our prepared prompt
response = llm.invoke(prompt)

# Print the output
print("🧠 Generated Answer:")
print(response)