<a href="https://colab.research.google.com/github/vandatthang/aws-cloud-for-beginner/blob/master/Maven_GuestLecture_ProblemFirstAICourse_JayeetaP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# Welcome to Building Financial AI Agents, Problem First 😀

**Please DUPLICATE the notebook before running!**

### Code for the Guest Lecture by: Jayeeta Putatunda
### Date: March 12, 2025





## Goal of this hands-on code section:


1. Build AI Solutions to handle various financial tasks

2. TWO main solutions:
  - Simple RAG based query for Doc QA
  - Deep Financial Research multi-AI Agents

## Open-Source Tech Stack:


- Fast open-source LLM inference [Groq]: https://groq.com/
- Agentic Framework [Agno]: https://www.agno.com/
- Vector Database [Weaviate]: https://weaviate.io/
- Embeddings [Sentence-transformers]: https://huggingface.co/sentence-transformers
- Web search [DuckDuckGo]: https://github.com/duckduckgo

## Install Required Packages

In [None]:
!pip install groq agno yfinance duckduckgo-search newspaper4k lxml_html_clean
!pip install pypdf sentence-transformers weaviate-client



In [None]:
!pip install groq agno sentence-transformers pypdf weaviate-client python-dotenv --upgrade --quiet

In [None]:
!python --version

Python 3.11.11


## Set Environment Variables

In [None]:
import os
from google.colab import userdata

os.environ['GROQ_API_KEY'] = userdata.get('GROQ_API_KEY')
os.environ['PHI_API_KEY'] = userdata.get('PHI_API_KEY')  # Input your Agno key (this variable name predates the rebrand)
WEAVIATE_URL = userdata.get('WEAVIATE_URL')
WEAVIATE_API_KEY = userdata.get('WEAVIATE_API_KEY')

print("API keys have been set!")


API keys have been set!
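
Before moving on, it can help to confirm that the keys really are present, since an empty key tends to surface later as a confusing authentication error. The sketch below is one possible check, not part of the original lecture code; `missing_keys` is a hypothetical helper name, and the list of required names simply mirrors the variables set above.

```python
import os
from typing import Iterable, List, Mapping


def missing_keys(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not env.get(name)]


# Names taken from the cell above; adjust to the services you actually use.
required = ["GROQ_API_KEY", "PHI_API_KEY"]
missing = missing_keys(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required keys are present.")
```

Passing the environment as a parameter keeps the helper easy to test without touching real secrets.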


### Loading Docs in Colab Folder
We can also upload PDFs directly to Colab and access them from the notebook.


In [None]:
import os
from typing import List, Tuple, Union

os.environ['data_path'] = "/content/"
os.environ['doc_name'] = "cpcdc-commercial-lending-program-loan-policy.pdf"

print("Document loaded!")

Document loaded!


## Application 1: Simple RAG-based query with one **VectorDB**

## Step 1: Creating the Embedding Model

### Understanding Embeddings
Text embeddings are numerical representations of text that capture semantic meaning. We use them to:
- Convert text into vectors for similarity search
- Enable efficient document retrieval
- Support semantic search capabilities
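
To make "similarity search" concrete, here is a minimal sketch of cosine similarity, the metric this notebook later selects for Weaviate. The three-dimensional vectors are made-up stand-ins for real embeddings, chosen only so the numbers are easy to follow.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
loan_limit = [0.9, 0.1, 0.0]
max_loan = [0.8, 0.2, 0.1]
weather = [0.0, 0.1, 0.9]

# Texts with similar meaning should score higher than unrelated ones.
print(cosine_similarity(loan_limit, max_loan) > cosine_similarity(loan_limit, weather))
```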

The `EmbeddingModel` class below implements this functionality using the SentenceTransformer library.


In [None]:
from sentence_transformers import SentenceTransformer

# Define the embedding model class
class EmbeddingModel:
    def __init__(self):
        self.model = SentenceTransformer('sentence-transformers/paraphrase-MiniLM-L6-v2')
        self.dimensions = 384

    def get_embedding_and_usage(self, text: Union[str, List[str]]) -> Tuple[Union[List[List[float]], List[float]], dict]:
        if isinstance(text, str):
            embedding = self.model.encode(text)
            embedding_list = embedding.tolist()
            usage = {"prompt_tokens": len(text.split()), "total_tokens": len(text.split())}
            return embedding_list, usage
        else:
            embeddings = self.model.encode(text)
            embedding_list = embeddings.tolist()
            total_tokens = sum(len(t.split()) for t in text)
            usage = {"prompt_tokens": total_tokens, "total_tokens": total_tokens}
            return embedding_list, usage

    def get_embedding(self, text: Union[str, List[str]]) -> Union[List[float], List[List[float]]]:
        # encode() accepts a single string or a list of strings, so no branching is needed
        return self.model.encode(text).tolist()

#### Key Components:
- We use the `paraphrase-MiniLM-L6-v2` model, which provides a good balance of speed and accuracy
- The model outputs 384-dimensional vectors
- We approximate token usage by counting whitespace-separated words, for monitoring purposes
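
The usage numbers reported by the class are word counts, not true tokenizer counts. The standalone sketch below mirrors that counting logic so it can be inspected without loading the model; `approximate_usage` is a hypothetical name used only for this illustration.

```python
from typing import Dict, List, Union


def approximate_usage(text: Union[str, List[str]]) -> Dict[str, int]:
    """Count whitespace-separated words, as the EmbeddingModel class above does."""
    texts = [text] if isinstance(text, str) else text
    total = sum(len(t.split()) for t in texts)
    return {"prompt_tokens": total, "total_tokens": total}


print(approximate_usage("What is the maximum loan amount?"))
# {'prompt_tokens': 6, 'total_tokens': 6}
```

A subword tokenizer would usually report a different (often larger) number, so treat these figures as a rough monitoring signal only.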


## Step 2: Initializing the Vector Database

A vector database is crucial for efficient storage and retrieval of document embeddings. We'll use Weaviate, a powerful vector search engine that enables semantic search capabilities.

### Understanding Vector Databases
- Vector databases store and index high-dimensional vectors (embeddings)
- Enable efficient similarity search
- Support complex queries and filtering
- Scale well with large document collections
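
As a mental model, a vector database does something like the toy store below, only with an index that avoids comparing the query against every stored vector. This is an illustrative brute-force sketch with made-up vectors and a hypothetical `ToyVectorStore` class; it is not how Weaviate is implemented.

```python
import math
from typing import Dict, List, Tuple


def _normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("Cannot normalize a zero vector")
    return [x / norm for x in vector]


class ToyVectorStore:
    """Exhaustive cosine search over an in-memory dictionary."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, vector: List[float]) -> None:
        self._vectors[doc_id] = _normalize(vector)

    def search(self, query: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
        q = _normalize(query)
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
        return scored[:top_k]


store = ToyVectorStore()
store.add("loan-limits", [0.9, 0.1, 0.0])
store.add("portfolio-management", [0.2, 0.9, 0.1])
store.add("exhibits", [0.0, 0.2, 0.9])

results = store.search([1.0, 0.0, 0.0], top_k=2)
print([doc_id for doc_id, _ in results])
```

The exhaustive scan costs time proportional to the number of stored vectors, which is exactly what approximate indexes are designed to avoid.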

### Weaviate Setup

In [None]:
from agno.vectordb.weaviate import Weaviate
from agno.vectordb.weaviate.index import Distance, VectorIndex

weaviate_vector_db = Weaviate(
    wcd_url=WEAVIATE_URL,
    wcd_api_key=WEAVIATE_API_KEY,
    # HNSW is a more complex index that is slower to build, but it scales well
    # to large datasets as queries have a logarithmic time complexity.
    vector_index=VectorIndex.HNSW,
    distance=Distance.COSINE,
    embedder=EmbeddingModel(),
)

#### Key Components:
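1. `wcd_url`: The URL of your Weaviate instance
2. `wcd_api_key`: Authentication key for accessing Weaviate
3. `vector_index`: The index type, here HNSW
4. `distance`: The similarity metric, here cosine distance
5. `embedder`: The embedding model we created earlier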

### Why Weaviate?
- Open-source and scalable
- Supports hybrid search (combination of vector and keyword search)
- Built-in filtering and aggregation capabilities
- Real-time updates and queries
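
Hybrid search blends a keyword score with a vector score. The sketch below shows the idea with a simple weighted sum, where `alpha = 1` means vector-only and `alpha = 0` means keyword-only. Both helper functions and the example similarity value are assumptions made for illustration; Weaviate's own keyword scoring and fusion algorithms are more sophisticated than this.

```python
def keyword_score(query: str, document: str) -> float:
    """Fraction of query terms that appear in the document (a crude keyword signal)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(document.lower().split())
    return sum(1 for term in terms if term in words) / len(terms)


def hybrid_score(keyword: float, vector: float, alpha: float = 0.5) -> float:
    """Weighted blend: alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector + (1.0 - alpha) * keyword


doc = "the maximum loan amount is set by the board"
kw = keyword_score("maximum loan", doc)
vec = 0.6  # hypothetical cosine similarity from an embedding model
print(hybrid_score(kw, vec, alpha=0.5))
```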


## Step 3: Initializing the Knowledge Base

The knowledge base is where we store and manage our document collection. It handles:
- Document loading and parsing
- Text chunking and preprocessing
- Integration with the vector database


### Setting Up the Knowledge Base

In [None]:
from pathlib import Path
from agno.knowledge.pdf import PDFKnowledgeBase

# Create a knowledge base from the local PDF
pdf_path = Path(os.environ['data_path']) / os.environ['doc_name']
print("The document used as knowledge base:", pdf_path)
knowledge_base = PDFKnowledgeBase(
    path=pdf_path,
    vector_db=weaviate_vector_db,
)

The document used as knowledge base: /content/cpcdc-commercial-lending-program-loan-policy.pdf


#### Components Explained:
1. `PDFKnowledgeBase`: Specialized class for handling PDF documents
2. `path`: Location of the PDF files to process
3. `vector_db`: The Weaviate instance we created earlier

### Knowledge Base Features:
- Automatic PDF parsing and text extraction
- Document chunking for optimal retrieval. By default, the document is split into fixed-size chunks of 3000 characters (see https://github.com/agno-agi/agno/blob/07e870a37803163af453c626bb5bce4f22b042c0/libs/agno/agno/document/chunking/fixed.py#L7)
- Metadata extraction and management
- Integration with vector search
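
Fixed-size chunking is easy to picture with a few lines of code. The sketch below slices a string into windows of `chunk_size` characters, using the 3000-character default mentioned above. The `overlap` parameter and the function name are assumptions for illustration; the library's real chunker may treat boundaries and whitespace differently.

```python
from typing import List


def chunk_fixed(text: str, chunk_size: int = 3000, overlap: int = 0) -> List[str]:
    """Split text into consecutive windows of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = chunk_fixed("x" * 7000)
print([len(chunk) for chunk in chunks])
# [3000, 3000, 1000]
```

Smaller chunks give more precise retrieval hits but less context per hit, so the chunk size is worth tuning per document type.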

### Best Practices:
1. Organize documents in a clear folder structure
2. Use consistent naming conventions
3. Monitor document processing for errors
4. Validate document content after loading
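
A lightweight way to act on the validation advice is to check the file before handing it to the knowledge base. The sketch below is one possible pre-flight check; `check_pdf` is a hypothetical helper, and it only inspects the path, the extension, and the `%PDF-` header bytes, not the document's actual text. The demo writes a tiny placeholder file to a temporary directory so it runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import Union


def check_pdf(path: Union[str, Path]) -> Path:
    """Raise early if the path is missing, not a .pdf, or lacks a PDF header."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {path.name}")
    with path.open("rb") as handle:
        if handle.read(5) != b"%PDF-":
            raise ValueError(f"File does not start with a PDF header: {path.name}")
    return path


# Demo with a placeholder file (only a header, not a complete PDF).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.4\n")
    print(check_pdf(sample).name)
```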


## Step 4: Document QA System Implementation

The `DocumentQA` class is our main implementation that combines all components into a working system. Let's understand its structure and functionality:

### Key Methods:
1. `initialize_agent_with_data`: Creates the agent and loads the PDF documents into the knowledge base
2. `show_sample_content`: Displays examples from the loaded documents
3. `ask`: Processes questions and generates answers using RAG

### Understanding the RAG Pipeline
1. Document ingestion and embedding
2. Vector storage in Weaviate
3. Similarity search for relevant content
4. LLM-powered response generation
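
The last two steps come down to placing the retrieved chunks in front of the question. The sketch below isolates that prompt-assembly step so it can be tested without an LLM or a vector database. `build_rag_prompt` is a hypothetical helper that follows the same wording as the prompt in the `DocumentQA` class, and the sample chunks are invented.

```python
from typing import List


def build_rag_prompt(question: str, chunks: List[str]) -> str:
    """Join retrieved chunks into a context block and append the question."""
    context = "\n\n".join(chunk.strip() for chunk in chunks if chunk.strip())
    return (
        "Based on the following content:\n\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "Please provide a detailed answer based ONLY on the information provided above."
    )


# Invented chunks standing in for vector search results.
retrieved = ["  Loans are reviewed by the Loan Committee.  ", "", "Exhibit A is the application form."]
print(build_rag_prompt("Who reviews loans?", retrieved))
```

Building the string from separate pieces avoids the stray indentation that a triple-quoted f-string inside a method would carry into the prompt.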


In [None]:
from agno.agent import Agent
from agno.models.groq import Groq
from agno.knowledge.agent import AgentKnowledge

class DocumentQA:
    def __init__(self, embedding_model: EmbeddingModel, knowledge_base: AgentKnowledge):
        # Initialize embedder
        self.embedder = embedding_model
        # Initialize Groq model
        self.chat_model = Groq(id="llama3-8b-8192")
        self.current_knowledge_base = knowledge_base
        self.agent = None


    def initialize_agent_with_data(self):
        """Create the agent and load the PDF knowledge base into the vector database"""
        # Initialize the Agent
        self.agent = Agent(
            knowledge=self.current_knowledge_base,
            search_knowledge=True,
            model=self.chat_model,
        )

        # Load knowledge base
        print("Loading knowledge base...")
        self.current_knowledge_base.load(recreate=True)
        print("✅ Knowledge base loaded successfully!")

        # Show sample content
        #self.show_sample_content()


    def show_sample_content(self, num_samples: int = 5):
        """Show sample content from the knowledge base"""

        docs = self.current_knowledge_base.search("")
        print("\nSample documents in knowledge base:")
        print("-" * 50)
        for i, doc in enumerate(docs[:num_samples], 1):
            print(f"\nDocument {i}:")
            if hasattr(doc, 'content'):
                print(doc.content[:200] + "..." if len(doc.content) > 200 else doc.content)
            elif hasattr(doc, 'text'):
                print(doc.text[:200] + "..." if len(doc.text) > 200 else doc.text)

    def ask(self, question: str):
        """Ask a question about the loaded document"""
        if not self.current_knowledge_base or not self.agent:
            print("Please load a document first!")
            return

        self.current_knowledge_base.vector_db.get_client().connect()
        print(f"\nQ: {question}")
        try:
            # Get relevant documents
            relevant_docs = self.current_knowledge_base.search(question)

            # Build context from relevant documents
            context = "\n".join([doc.content if hasattr(doc, 'content') else doc.text
                               for doc in relevant_docs])

            # Create a prompt that includes the context
            full_prompt = f"""Based on the following content:

            {context}

            Question: {question}

            Please provide a detailed answer based ONLY on the information provided above."""

            # Get response with context
            response = self.agent.run(full_prompt)
            #print(f"\nA: {response.content}")

            return response.content

            #You can choose to return the context here for Evaluation
            #return response.content, context

        except Exception as e:
            print(f"Error: {e}")
            import traceback
            print(traceback.format_exc())



## Step 5: Initialize the Agent and Query the Document

In [None]:
qa = DocumentQA(embedding_model=EmbeddingModel(), knowledge_base=knowledge_base)

In [None]:
qa.initialize_agent_with_data()

Loading knowledge base...


✅ Knowledge base loaded successfully!

Sample documents in knowledge base:
--------------------------------------------------

Document 1:
 6 2) Executive officers are defined as the executive director and the officers (if any) appointed by the executive director or the Board of Directors of the CDC. 3) Directors shall be defined as any ...

Document 2:
 16 XII. Portfolio Management The Portfolio Manager, with the Executive Director’s oversight, is responsible for tracking and reporting to the Board and Loan Committee the performance of the portfolio...

Document 3:
 4 I. Purpose of this policy The policies and procedures outlined in this document provide a framework within which the Citizen Potawatomi Community Development Corporation – Commercial Lending Progra...

Document 4:
 19 EXHIBITS Exhibit A. Application Form See attached Exhibit B. 1. Summary of Loan Rating System Loan Rating Total Points Loan Loss Reserve Acceptable for Approval? Monitoring Frequency Lo High A 23 ...

Docum

### Start asking questions from the document

In [None]:
qa.ask("What is the maximum loan amount?")


Q: What is the maximum loan amount?

A: According to the text, the maximum loan amount is $12,500,000, but loans above $300,000 must be reviewed and approved on a case-by-case basis by the full Board of Directors.



In [None]:
qa.ask("Summarize the key points in this document")


Q: Summarize the key points in this document

A: Based on the provided document, the key points are as follows:

1. **Portfolio Management**: The Portfolio Manager is responsible for tracking and reporting the performance of the portfolio to the Board and Loan Committee. The Portfolio Manager presents the Loan Report and Portfolio Report at each Board and Loan Committee meeting.

2. **Loan Fund Essentials**:
	* Loan Fund Equity: The Loan Fund maintains a net asset balance of at least 20% of the sum of all borrowed funds plus net assets dedicated or restricted to lending.
	* Loan Fund Investment Guidelines: Undisbursed loan funds are deposited in FDIC-insured institutions, and interest-bearing bank accounts are sought while preserving sufficient liquidity.

3. **Loan Rating and Loan Loss Reserve**:
	* Loan Rating: The Portfolio Manager determines the necessary change, if any, to the Loan Rating and Loan Loss Reserve to reflect the borrowers' delinquency status.
	* Loan Loss Reserve: Th

"Based on the provided document, the key points are as follows:\n\n1. **Portfolio Management**: The Portfolio Manager is responsible for tracking and reporting the performance of the portfolio to the Board and Loan Committee. The Portfolio Manager presents the Loan Report and Portfolio Report at each Board and Loan Committee meeting.\n\n2. **Loan Fund Essentials**:\n\t* Loan Fund Equity: The Loan Fund maintains a net asset balance of at least 20% of the sum of all borrowed funds plus net assets dedicated or restricted to lending.\n\t* Loan Fund Investment Guidelines: Undisbursed loan funds are deposited in FDIC-insured institutions, and interest-bearing bank accounts are sought while preserving sufficient liquidity.\n\n3. **Loan Rating and Loan Loss Reserve**:\n\t* Loan Rating: The Portfolio Manager determines the necessary change, if any, to the Loan Rating and Loan Loss Reserve to reflect the borrowers' delinquency status.\n\t* Loan Loss Reserve: The Loan Loss Reserve is the necessar

## Application 2: Deep Financial Research Agent

Agent Team - Your Professional News & Finance Squad!

This example shows how to create a powerful team of AI agents working together
to provide comprehensive financial analysis and news reporting. The team consists of:
1. Web Agent: Searches and analyzes latest news
2. Finance Agent: Analyzes financial data and market trends
3. Lead Editor: Coordinates and combines insights from both agents
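
The coordination pattern can be sketched without any LLM: a lead function sends the same query to each specialist and merges their answers into one report. Everything below is a stand-in written for illustration (`run_team` and the two stub functions are hypothetical). In the real team the lead model decides when and how to delegate, which this fixed fan-out does not capture.

```python
from typing import Callable, Dict

Specialist = Callable[[str], str]


def run_team(query: str, members: Dict[str, Specialist]) -> str:
    """Ask every specialist the same query and merge the answers into sections."""
    sections = [f"## {name}\n{member(query)}" for name, member in members.items()]
    return "\n\n".join(sections)


def stub_web_agent(query: str) -> str:
    return f"(news findings for: {query})"  # placeholder for a web search result


def stub_finance_agent(query: str) -> str:
    return f"(market data for: {query})"  # placeholder for a market data lookup


report = run_team(
    "NVDA",
    {"Web Agent": stub_web_agent, "Finance Agent": stub_finance_agent},
)
print(report)
```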

Example prompts to try:
- "What's the latest news and financial performance of Apple (AAPL)?"
- "Analyze the impact of AI developments on NVIDIA's stock (NVDA)"
- "How are EV manufacturers performing? Focus on Tesla (TSLA) and Rivian (RIVN)"
- "What's the market outlook for semiconductor companies like AMD and Intel?"
- "Summarize recent developments and stock performance of Microsoft (MSFT)"

In [None]:
#!pip install duckduckgo-search yfinance agno


In [None]:
from textwrap import dedent

from agno.agent import Agent
from agno.models.groq import Groq
from agno.tools.duckduckgo import DuckDuckGoTools
from agno.tools.yfinance import YFinanceTools

web_agent = Agent(
    name="Web Agent",
    role="Search the web for information",
    model=Groq(id="llama-3.1-8b-instant"),
    tools=[DuckDuckGoTools()],
    instructions=dedent("""\
        You are an experienced financial researcher and news analyst! 🔍

        Follow these steps when searching for information:
        1. Start with the most recent and relevant sources
        2. Cross-reference information from multiple sources
        3. Prioritize reputable news outlets and official sources
        4. Always cite your sources with links
        5. Focus on market-moving news and significant developments

        Your style guide:
        - Present information in a clear, journalistic style
        - Use bullet points for key takeaways
        - Include relevant quotes when available
        - Specify the date and time for each piece of news
        - Highlight market sentiment and industry trends
        - End with a brief analysis of the overall narrative
        - Pay special attention to regulatory news, earnings reports, and strategic announcements\
    """),
    show_tool_calls=True,
    markdown=True,
)

finance_agent = Agent(
    name="Finance Agent",
    role="Get financial data",
    model=Groq(id="llama-3.1-8b-instant"),
    tools=[
        YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True)
    ],
    instructions=dedent("""\
        You are a skilled financial analyst with expertise in market data! 📊

        Follow these steps when analyzing financial data:
        1. Start with the latest stock price, trading volume, and daily range
        2. Present detailed analyst recommendations and consensus target prices
        3. Include key metrics: P/E ratio, market cap, 52-week range
        4. Analyze trading patterns and volume trends
        5. Compare performance against relevant sector indices

        Your style guide:
        - Use tables for structured data presentation
        - Include clear headers for each data section
        - Add brief explanations for technical terms
        - Highlight notable changes with emojis (📈 📉)
        - Use bullet points for quick insights
        - Compare current values with historical averages
        - End with a data-driven financial outlook\
    """),
    show_tool_calls=True,
    markdown=True,
)

agent_team = Agent(
    team=[web_agent, finance_agent],
    model=Groq(id="llama-3.1-8b-instant"),
    instructions=dedent("""\
        You are the lead editor of a prestigious financial news desk! 📰

        Your role:
        1. Coordinate between the web researcher and financial analyst
        2. Combine their findings into a compelling narrative
        3. Ensure all information is properly sourced and verified
        4. Present a balanced view of both news and data
        5. Highlight key risks and opportunities

        Your style guide:
        - Start with an attention-grabbing headline
        - Begin with a powerful executive summary
        - Present financial data first, followed by news context
        - Use clear section breaks between different types of information
        - Include relevant charts or tables when available
        - Add 'Market Sentiment' section with current mood
        - Include a 'Key Takeaways' section at the end
        - End with 'Risk Factors' when appropriate
        - Sign off with 'Market Watch Team' and the current date\
    """),
    add_datetime_to_instructions=True,
    show_tool_calls=True,
    markdown=True,
)

# Example usage with diverse queries
agent_team.print_response(
    "Analyze the impact of AI developments on NVIDIA's stock (NVDA)", stream=True
)




More advanced example prompts to try:
1. "Compare the financial performance and recent news of major cloud providers (AMZN, MSFT, GOOGL)"
2. "What's the impact of recent Fed decisions on banking stocks? Focus on JPM and BAC"
3. "Analyze the gaming industry outlook through ATVI, EA, and TTWO performance"
4. "How are social media companies performing? Compare META and SNAP"
5. "What's the latest on AI chip manufacturers and their market position?"
