# RAG Investment Agent: Hybrid Search + Contextual Retrieval + Gradio UI

This notebook implements an advanced RAG (Retrieval-Augmented Generation) system optimized for **Google Colab (T4 GPU)** with **Google Drive** support:
- **Model:** [LiquidAI/LFM2.5-1.2B-Thinking](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking)
- **Method:** Contextual Retrieval (Anthropic) combined with Hybrid Search (FAISS + BM25)
- **Storage:** Load data from local Colab storage or Google Drive
- **UI:** Gradio

## 1. Install Dependencies

In [11]:
# Install core libraries
!pip install -q transformers torch faiss-cpu rank_bm25 langchain-text-splitters sentence-transformers gradio accelerate

## 2. Google Drive Mounting (Optional)
Run this cell if you want to load data from your Google Drive.

In [12]:
USE_DRIVE = True  # Set to False to load data from local Colab storage instead

if USE_DRIVE:
    from google.colab import drive
    drive.mount('/content/drive')
    print("Google Drive mounted.")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Google Drive mounted.


## 3. Configuration and Tokens

In [13]:
import os
import torch

# 1. Load HF_TOKEN
try:
    from google.colab import userdata
    hf_token = userdata.get('HF_TOKEN')
except Exception:  # Not running in Colab, or the secret is missing or inaccessible
    hf_token = os.environ.get('HF_TOKEN')

if hf_token:
    print("HF Token loaded successfully.")
else:
    print("HF Token not found. Please set HF_TOKEN environment variable.")

# 2. Device Configuration (Optimized for T4)
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    print(f"Found GPU: {gpu_name}")
    DEVICE = "cuda"
    TORCH_DTYPE = torch.float16
else:
    print("CUDA not available. Using CPU.")
    DEVICE = "cpu"
    TORCH_DTYPE = torch.float32

HF Token loaded successfully.
Found GPU: Tesla T4


## 4. Load Data
Set your data path below. If using Drive, it usually looks like `/content/drive/MyDrive/your_folder`.

In [14]:
import glob

# Update your folder path here
DATA_DIR = "data investment"
if USE_DRIVE:
    DATA_DIR = "/content/drive/MyDrive/Đầu tư/Investment-Management-Specialization/3. Portfolio and Risk Management/Module 4: Risk Management"

if not os.path.exists(DATA_DIR):
    print(f"⚠️ Directory {DATA_DIR} not found. Searching in current directory...")
    files = glob.glob("*.txt")
else:
    files = glob.glob(os.path.join(DATA_DIR, "*.txt"))

documents = []
for file_path in files:
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            content = f.read()
            documents.append({
                "filename": os.path.basename(file_path),
                "content": content
            })
    except Exception as e:
        print(f"Error loading {file_path}: {e}")

print(f"Loaded {len(documents)} documents.")

Loaded 11 documents.


## 5. Chunking Data

In [15]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=600,
    chunk_overlap=50
)

chunks = []
for doc in documents:
    doc_chunks = text_splitter.split_text(doc["content"])
    for i, chunk_text in enumerate(doc_chunks):
        chunks.append({
            "id": f"{doc['filename']}_{i}",
            "doc_content": doc["content"],
            "chunk_text": chunk_text,
            "metadata": {"filename": doc["filename"]}
        })

print(f"Created {len(chunks)} chunks.")

Created 99 chunks.
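
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks, so its chunks are usually shorter than the limit). `sliding_window_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_window_chunks(text, chunk_size=600, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; it ignores separators.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


sample = "x" * 1300
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])  # [600, 600, 200]
```

With these settings each window repeats the last 50 characters of the previous one, which lowers the chance that a sentence relevant to a query is cut in half at a chunk boundary.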


## 6. Load Model LiquidAI/LFM2.5-1.2B-Thinking

In [16]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LiquidAI/LFM2.5-1.2B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)

print(f"Loading {model_name} on {DEVICE}...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=TORCH_DTYPE,
    device_map="auto",
    token=hf_token
)

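def generate_text(prompt, max_new_tokens=150):
    """Generate a completion for prompt and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the continuation is decoded
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)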

Loading LiquidAI/LFM2.5-1.2B-Thinking on cuda...


## 7. Implement Contextual Retrieval

In [17]:
DOCUMENT_SUMMARY_PROMPT = """
<document>
{whole_document}
</document>

You are an expert at financial document analysis. Please provide a brief, high-level summary (1-2 sentences) of the overall context and main topic of this document.
This summary will be used to help a search engine understand the broad context for small chunks of this document.

Answer format:
<think>
[Your reasoning here]
</think>
[Broad context summary here]
"""

print("Generating document-level summaries...")
doc_summaries = {}
for doc in documents:
    filename = doc["filename"]
    print(f"Summarizing {filename}...")
    prompt = DOCUMENT_SUMMARY_PROMPT.format(whole_document=doc["content"])
    full_response = generate_text(prompt, max_new_tokens=250)

    if "</think>" in full_response:
        summary = full_response.split("</think>")[-1].strip()
    else:
        summary = full_response.strip()
    doc_summaries[filename] = summary

print("Applying contextualization to chunks...")
for chunk in chunks:
    filename = chunk["metadata"]["filename"]
    summary = doc_summaries.get(filename, "")
    # Prefix each chunk with its filename (metadata) and the document-level summary
    prefix = f"[File: {filename}] [Context: {summary}]"
    chunk["contextualized_text"] = f"{prefix}\n{chunk['chunk_text']}"


Generating document-level summaries...
Summarizing Defining forwards and options - Forwards.txt...
Summarizing Defining forwards and options - Options.txt...
Summarizing Risk as volatility?.txt...
Summarizing What about illiquidity? - UBS guest speaker.txt...
Summarizing Currency risk - Return.txt...
Summarizing Currency risk - Risk.txt...
Summarizing Defining the Value-at-Risk.txt...
Summarizing Computing the Value-at-Risk.txt...
Summarizing Defining the Expected Shortfall.txt...
Summarizing Computing the Expected Shortfall.txt...
Summarizing Risk management applied to portfolio allocation.txt...
Applying contextualization to chunks...
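
The summary prompt embeds the whole document, which may exceed the model's context window for long transcripts. One possible mitigation, sketched here under assumptions, is to keep only the beginning and end of a long document before formatting the prompt. `truncate_middle` and the `max_chars` budget of 6000 are hypothetical and not part of the original notebook; a token-based budget would be more precise.

```python
def truncate_middle(text, max_chars=6000, marker="\n[...]\n"):
    """Keep the head and tail of text so the result is at most max_chars long."""
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(marker)
    if keep <= 0:
        raise ValueError("max_chars must be larger than the marker length")
    head = keep - keep // 2  # head gets the extra character when keep is odd
    tail = keep // 2
    return text[:head] + marker + (text[-tail:] if tail else "")


long_text = "a" * 5000 + "b" * 5000
shortened = truncate_middle(long_text, max_chars=100)
print(len(shortened))  # 100
```

In the summarization loop this could be used as `DOCUMENT_SUMMARY_PROMPT.format(whole_document=truncate_middle(doc["content"]))`.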


## 8. Hybrid Search Setup

In [18]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
from rank_bm25 import BM25Okapi

embed_model = SentenceTransformer("BAAI/bge-small-en-v1.5", device=DEVICE)
texts_to_embed = [c["contextualized_text"] for c in chunks]
embeddings = embed_model.encode(texts_to_embed)

dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings).astype('float32'))

tokenized_corpus = [text.split(" ") for text in texts_to_embed]
bm25 = BM25Okapi(tokenized_corpus)

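def hybrid_search(query, k=5, rrf_k=60):
    # Dense retrieval: top 2k nearest chunks by L2 distance
    query_vec = embed_model.encode([query])
    _, indices = index.search(np.array(query_vec).astype('float32'), k * 2)
    dense_ranked = [i for i in indices[0].tolist() if i != -1]
    # Sparse retrieval: top 2k chunks by BM25 score
    tokenized_query = query.split(" ")
    bm25_scores = bm25.get_scores(tokenized_query)
    bm25_ranked = np.argsort(bm25_scores)[::-1][:k * 2].tolist()
    # Fuse both rankings with reciprocal rank fusion (RRF). A plain set union
    # discards the ranking, so slicing it to k returns an arbitrary subset.
    fused = {}
    for ranked in (dense_ranked, bm25_ranked):
        for rank, idx in enumerate(ranked):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (rrf_k + rank + 1)
    top = sorted(fused, key=fused.get, reverse=True)[:k]
    return [chunks[i] for i in top]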
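
Splitting on single spaces keeps case and punctuation, so `Value-at-Risk?` in a query and `value-at-risk` in a chunk become different BM25 tokens. A minimal normalizing tokenizer, as a sketch: `simple_tokenize` is a hypothetical helper, and the regular expression is one reasonable choice rather than the notebook's method. It only keeps ASCII letters and digits, which assumes English text.

```python
import re

# Runs of ASCII letters/digits, optionally joined by single inner hyphens
_TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def simple_tokenize(text):
    """Lowercase text and return alphanumeric tokens, keeping inner hyphens."""
    return _TOKEN_RE.findall(text.lower())


print(simple_tokenize("What is Value-at-Risk?"))  # ['what', 'is', 'value-at-risk']
```

For BM25 to benefit, the same function would have to be applied both when building `tokenized_corpus` and when tokenizing the query.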

## 9. RAG Agent Logic

In [19]:
def ask_agent(query):
    relevant_chunks = hybrid_search(query)
    context_text = "\n---\n".join([c["contextualized_text"] for c in relevant_chunks])

    prompt = f"""You are a professional Financial Investment Advisor. Answer the question based ONLY on the provided context.

### Context:
{context_text}

### Question:
{query}

### Instructions:
1. Use <think> tags to analyze the context, identify key entities, and formulate a logical plan for the answer.
2. Provide a clear, professional answer after the </think> tag.
3. If the context does not contain enough information, state that you don't have enough information instead of making things up.

Answer:"""

    # Increase max_new_tokens for reasoning + answer
    response = generate_text(prompt, max_new_tokens=600)
    return response
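
`ask_agent` returns the raw generation, which may include the `<think>` block that the prompt asks for. Below is a sketch of separating the reasoning from the final answer before showing it to a user. `split_reasoning` is a hypothetical helper, and it assumes the tags survive decoding as plain text (the summary step in this notebook makes the same assumption when it checks for `</think>`).

```python
def split_reasoning(response):
    """Return (reasoning, answer) from a response that may contain <think> tags."""
    head, sep, tail = response.rpartition("</think>")
    if not sep:
        # No closing tag: treat the whole response as the answer
        return "", response.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning(
    "<think>VaR is a quantile.</think>\nVaR is a loss threshold."
)
print(answer)  # VaR is a loss threshold.
```

Returning only `answer` from the chat callback would keep the reasoning out of the UI, while `reasoning` stays available for logging or debugging.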

## 10. Gradio UI

In [20]:
import gradio as gr

def chatbot_interface(message, history):
    return ask_agent(message)

demo = gr.ChatInterface(
    chatbot_interface,
    title="Investment RAG Agent (LiquidAI/LFM2.5-1.2B-Thinking)",
    description="Advanced RAG Agent with Google Drive support and the LiquidAI/LFM2.5-1.2B-Thinking model. Ask me about Currency Risk, Portfolio Allocation, or Derivatives.",
    examples=[
        "What is Value-at-Risk and how to compute it?",
        "Summarize Currency risk return.",
        "What is Expected Shortfall?"
    ]
)

if __name__ == "__main__":
    demo.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://69695b9f0243be2870.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
