### 🧠 What this code does (Imports)

- Imports the LangChain modules needed to build a RAG (Retrieval-Augmented Generation) pipeline, plus `os` and `json` from the standard library:

| Module | Purpose |
|--------|---------|
| `PromptTemplate` | Creates structured prompts for LLMs. |
| `RetrievalQA` | Combines retriever + LLM for question answering. |
| `HuggingFaceEmbeddings` | Loads embedding models for vectorization. |
| `FAISS` | Vectorstore for fast similarity search. |
| `PyPDFLoader`, `DirectoryLoader` | Load PDF files and whole directories from disk (imported here, but this notebook loads JSON instead). |
| `RecursiveCharacterTextSplitter` | Splits documents into manageable text chunks. |
| `CTransformers` | Loads and runs local quantized LLMs (like LLaMA) on CPU. |
| `Document` | Wraps a piece of text and its metadata. |
| `os`, `json` | List files in a folder and parse JSON. |

✅ Together, these enable document ingestion, embedding, retrieval, and generation.


In [None]:
# Standard library
import json
import os

# LangChain
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import CTransformers
from langchain.schema import Document

### 📘 How to Load JSON Files

1. **Put JSON files** in a folder, e.g. `data/`.

2. **Set the folder path** in code:
```python
extracted_data = load_json("data/")
```


In [None]:

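def load_json(data_path):
    """Load every .json file in data_path as one LangChain Document."""
    documents = []
    # Sort filenames so documents are loaded in a stable order
    for filename in sorted(os.listdir(data_path)):
        if not filename.endswith(".json"):
            continue
        file_path = os.path.join(data_path, filename)
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)

        # Adjust this logic depending on your JSON structure.
        # Only dicts have .get(); lists and scalars are serialized whole.
        text = None
        if isinstance(data, dict):
            text = data.get("content") or data.get("text")
        if not isinstance(text, str):
            text = json.dumps(data)
        documents.append(Document(page_content=text, metadata={"source": filename}))

    return documents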

extracted_data = load_json("data/")
print(f"Loaded {len(extracted_data)} documents")
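
The fallback order used in `load_json` can be tried out in isolation. The sketch below uses a hypothetical helper, `extract_text` (not part of the notebook or of LangChain), that applies the same order: `content`, then `text`, then the whole object as a JSON string. It also tolerates files whose top level is a list rather than an object. The sample inputs are made up.

```python
import json


def extract_text(data):
    """Pick the text to index from one parsed JSON value (hypothetical helper)."""
    if isinstance(data, dict):
        # Same fallback order as load_json: "content" first, then "text".
        text = data.get("content") or data.get("text")
        if isinstance(text, str) and text:
            return text
    # Lists, scalars, and dicts without a usable field are serialized whole.
    return json.dumps(data)


print(extract_text({"content": "Allergies are immune reactions."}))
print(extract_text({"text": "Used when there is no content field."}))
print(extract_text({"title": "No text field here"}))
print(extract_text(["a", "b"]))
```

If your files nest the text deeper (for example under a list of records), this is the function to adapt.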


### 🔹 What this does

Splits long documents into smaller chunks of at most 500 characters, with 20 characters of overlap between consecutive chunks, so that each chunk can be embedded and retrieved on its own.

### 🔧 How to tweak

- `chunk_size=500`: Increase for fewer, larger chunks (less to embed and search, but each retrieved chunk is less focused).
- `chunk_overlap=20`: Increase to repeat more text across chunk boundaries (helps when an answer spans two chunks, at the cost of more chunks overall).

Example:
```python
RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
```


In [None]:
def text_split(extracted_data):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    text_chunks = text_splitter.split_documents(extracted_data)
    return text_chunks

text_chunks = text_split(extracted_data)
print(f"Number of chunks: {len(text_chunks)}")
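
To get a feel for how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-window sketch in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that class prefers to split on separators such as paragraph and line breaks), and `sliding_chunks` is a hypothetical name. It only shows how the two numbers determine the chunk count and the shared text between neighbours.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-width windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# 2000 characters of made-up text
sample = "".join(chr(97 + i % 26) for i in range(2000))

small = sliding_chunks(sample, chunk_size=500, chunk_overlap=20)
large = sliding_chunks(sample, chunk_size=800, chunk_overlap=100)
print(len(small), len(large))
print(small[0][-20:] == small[1][:20])  # neighbouring chunks share the overlap
```

On this sample the 500/20 setting yields more chunks than 800/100, which is the trade-off described above.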


### 🧠 What this code does

1. **Loads a sentence embedding model** from Hugging Face (`all-MiniLM-L6-v2`), a fast, lightweight transformer that encodes text into 384-dimensional vectors.

2. **Encodes the query** `"Hello world"` into a dense vector using:
```python
embeddings.embed_query("Hello world")
```

3. **Prints the vector length** as a quick check that the model loaded.


In [None]:
def download_hugging_face_embeddings():
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    return embeddings

embeddings = download_hugging_face_embeddings()
query_result = embeddings.embed_query("Hello world")
print(f"Query Embedding Length: {len(query_result)}")
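
Embedding vectors are useful because texts with similar meaning end up close together. One common way to measure closeness is cosine similarity. The sketch below uses tiny made-up 3-dimensional vectors in place of real model output, and `cosine_similarity` is a helper written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
query_vec = [0.9, 0.1, 0.0]
similar_vec = [0.8, 0.2, 0.1]
unrelated_vec = [0.0, 0.1, 0.9]

print(cosine_similarity(query_vec, similar_vec))    # close to 1
print(cosine_similarity(query_vec, unrelated_vec))  # close to 0
```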


### 🧠 What this code does (FAISS Vectorstore)

- Creates a FAISS index using the text chunks and embeddings.
- Searches for the top 3 most similar chunks to your query.
- Returns those relevant documents for downstream processing.

✅ FAISS is fast, scalable, and ideal for local vector search.
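
Conceptually, a top-`k` similarity search ranks every stored vector by its distance to the query vector and keeps the closest `k`. The brute-force sketch below illustrates that idea with made-up vectors and squared Euclidean distance. It is an assumption-laden stand-in, not how FAISS is implemented, and the distance metric a real index uses depends on how it is configured.

```python
def top_k(query_vec, indexed, k):
    """Return the k (text, distance) pairs closest to query_vec."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    scored = [(text, squared_distance(query_vec, vec)) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# (text, vector) pairs; the vectors are invented for illustration
indexed = [
    ("Allergies are immune reactions.", [0.9, 0.1, 0.0]),
    ("Pollen is a common allergen.", [0.8, 0.3, 0.1]),
    ("FAISS stores vectors.", [0.0, 0.2, 0.9]),
    ("Antihistamines relieve symptoms.", [0.7, 0.2, 0.2]),
]
query_vec = [1.0, 0.0, 0.0]

for text, distance in top_k(query_vec, indexed, k=3):
    print(f"{distance:.2f}  {text}")
```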


In [None]:
# from_documents keeps each chunk's metadata (e.g. "source") alongside its text
docsearch = FAISS.from_documents(text_chunks, embeddings)

query = "What are allergies?"
docs = docsearch.similarity_search(query, k=3)
print("Top 3 Documents for the Query:", docs)


### 🧠 What this code does (Prompt Template)

- Defines how the model should structure its answer.
- Injects retrieved text into the `context`, and appends the `question`.
- Instructs the model to say it doesn't know rather than invent an answer, which reduces (but does not rule out) hallucination.

✅ You can customize the tone, verbosity, or format by editing this template.
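
To see what the model actually receives, it helps to fill a template by hand. The sketch below uses Python's built-in `str.format` as a stand-in for `PromptTemplate`, with a shortened template and made-up chunks. Joining the chunks with a blank line is an assumption made for this example.

```python
template = (
    "Use the following pieces of information to answer the user's question.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Helpful answer:"
)

# Made-up retrieved chunks
retrieved_chunks = [
    "Allergies are immune reactions.",
    "Pollen is a common allergen.",
]

context = "\n\n".join(retrieved_chunks)
prompt_text = template.format(context=context, question="What are allergies?")
print(prompt_text)

# Leaving out a variable that the template names is an error
try:
    template.format(context=context)
except KeyError as missing:
    print(f"Missing variable: {missing}")
```

Every placeholder in the template needs a matching value, which is why `input_variables` lists both `context` and `question`.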


In [None]:
prompt_template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain_type_kwargs = {"prompt": PROMPT}


### 🧠 What this code does (LLM Loading with CTransformers)

- Loads a local quantized `.bin` model using `CTransformers`.
- Supports low-resource environments (runs on CPU).
- `max_new_tokens` controls response length.
- `temperature` controls sampling randomness: lower values give more deterministic, repeatable answers, higher values give more varied ones.

✅ Make sure your `.bin` model is compatible and placed in the right path.
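
As a rough illustration of what `temperature` does in general (not of how `CTransformers` implements sampling internally), the sketch below divides made-up token scores by the temperature before converting them to probabilities. `softmax_with_temperature` is a helper written for this example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for temperature in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperature almost all probability goes to the top-scoring token. At higher temperature the alternatives become more likely to be sampled.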


In [None]:
llm = CTransformers(
    model="model/llama-2-7b-chat.ggmlv3.q4_0.bin",
    model_type="llama",
    config={'max_new_tokens': 512, 'temperature': 0.8}
)


### 🧠 What this code does (QA Chain Setup)

- Creates a `RetrievalQA` chain with:
  - a retriever (`FAISS`)
  - a language model (`LLaMA`)
  - a custom prompt template.
- Retrieves top 2 matching chunks.
- Returns both the answer and source documents used.

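✅ Change `k` to pass more or fewer chunks to the model. `chain_type="stuff"` places all retrieved chunks into a single prompt; other chain types (such as `map_reduce` or `refine`) process chunks in separate steps.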
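
The flow of a "stuff"-style chain can be sketched without any libraries: retrieve `k` chunks, paste them into one prompt, call the model, and return the answer together with its sources. Everything below is a toy stand-in (`toy_retriever`, `toy_llm`, `run_qa`, and the corpus are hypothetical), meant only to show the order of the steps and the shape of the result.

```python
import re

TEMPLATE = "Context: {context}\nQuestion: {question}\nHelpful answer:"

CORPUS = [
    "Allergies are immune reactions.",
    "Pollen is a common allergen.",
    "Antihistamines relieve allergy symptoms.",
]


def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_retriever(query, k):
    """Rank CORPUS by words shared with the query (stand-in for vector search)."""
    ranked = sorted(CORPUS, key=lambda doc: len(words(query) & words(doc)), reverse=True)
    return ranked[:k]


def toy_llm(prompt):
    """Stand-in for a real model call."""
    return f"(model output for a {len(prompt)}-character prompt)"


def run_qa(query, k=2):
    docs = toy_retriever(query, k)                      # 1. retrieve
    prompt = TEMPLATE.format(context="\n\n".join(docs), question=query)  # 2. stuff
    return {"query": query, "result": toy_llm(prompt), "source_documents": docs}  # 3. generate


output = run_qa("What are allergies?")
print(output["result"])
print(output["source_documents"])
```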


In [None]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever(search_kwargs={'k': 2}),
    return_source_documents=True,
    chain_type_kwargs=chain_type_kwargs
)


### 🧠 What this code does (Interactive Q&A)

- Starts a loop to take user input as a query.
- Uses the QA chain to generate an answer with retrieved context.
- Displays the result and repeats until you type `'exit'`.

✅ Great for testing and running your RAG system in the terminal.


In [None]:
while True:
    try:
        user_input = input("\nInput your question (type 'exit' to stop): ").strip()
        if user_input.lower() == 'exit':
            print("\nExiting chat... Goodbye!")
            break

        result = qa({"query": user_input})
        response = result["result"]

        print(f"\nUser Question: {user_input}")
        print(f"Bot Response: {response}\n")

    except KeyboardInterrupt:
        print("\nExiting chat...")
        break
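
The loop above is hard to test because it reads from the keyboard. One possible variant, sketched here under the assumption that the answering step can be passed in as a function, takes its input and output functions as parameters so it can be driven by a script. `chat_loop` and its parameter names are hypothetical.

```python
def chat_loop(answer, read=input, write=print):
    """Ask for questions until the user types 'exit'; return how many were answered."""
    answered = 0
    while True:
        try:
            question = read("\nInput your question (type 'exit' to stop): ").strip()
        except (EOFError, KeyboardInterrupt):
            write("\nExiting chat...")
            break
        if question.lower() == "exit":
            write("\nExiting chat... Goodbye!")
            break
        if not question:
            continue  # skip empty input instead of querying the chain
        write(f"Bot Response: {answer(question)}\n")
        answered += 1
    return answered


# Scripted run: canned inputs replace the keyboard, a list captures the output.
scripted = iter(["What are allergies?", "", "exit"])
captured = []
count = chat_loop(
    answer=lambda q: f"(answer to: {q})",
    read=lambda prompt: next(scripted),
    write=captured.append,
)
print(count)
```

In the notebook, `answer` could be `lambda q: qa({"query": q})["result"]`, with `read` and `write` left at their defaults.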
