# Project Report Vincent Sebastian - Financial Assistant

---

## 🎯 Executive Summary

This capstone project presents an integrated AI-powered financial assistant application that combines conversational AI capabilities with document analysis features. The application leverages Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to provide intelligent financial insights and document Q&A capabilities through an intuitive Streamlit interface.

**Key Features:**
- 💬 **Financial Chatbot**: Intelligent conversational agent for financial queries
- 📄 **Document Assistant**: PDF analysis with summarization and Q&A capabilities

## 1. Problem Statement & Use Case

My project is about providing a reliable financial assistant that cuts the time needed to analyze a company. While the AI runs the analysis, you can rest or do something else. The assistant is also not limited to a single company: it can analyze several, which gives you a bigger picture of the sectors they belong to.

## 2. Objective of the Application
The primary objective is to let users hold a conversation with a reliable financial assistant that not only retrieves information about a company but also understands the financial file the user provides.

## 3. LLM Usage Strategy

For my project I'm using ChatGroq with the `llama-3.3-70b-versatile` model and a low temperature (`temperature=0`). Here is my integration code:

In [1]:
import json
import requests
from datetime import datetime
import streamlit as st

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_groq import ChatGroq
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import StreamlitChatMessageHistory

SECTORS_API_KEY = st.secrets["SECTORS_API_KEY"]
GROQ_API_KEY = st.secrets["GROQ_API_KEY"]


def retrieve_from_endpoint(url: str) -> dict:
    """
    A robust, reusable helper function to perform GET requests.
    """
    
    headers = {"Authorization": SECTORS_API_KEY}

    try:
        response = requests.get(url, headers=headers, timeout=30)  # avoid waiting forever on a stalled connection
        response.raise_for_status()

        data = response.json()

        return data

    except requests.exceptions.HTTPError as err:
        return {
            "error": f"HTTPError {err.response.status_code} - {err.response.reason}",
            "url": url,
            "detail": err.response.text
        }
    
    except Exception as e:
        return {
            "error": f"Unexpected error: {type(e).__name__} - {str(e)}",
            "url": url
        }


@tool
def get_company_overview(stock: str) -> dict:
    """
    Get company overview
    
    @param stock: The stock symbol of the company
    @return: The company overview
    """

    url = f"https://api.sectors.app/v1/company/report/{stock}/?sections=overview"

    return retrieve_from_endpoint(url)


@tool
def get_company_revenue_cost_segments(stock : str) -> dict :
    """
    Return revenue and cost segments of a given stock.

    @param stock: The stock symbol of the company
    @return: The company revenue and cost segments
    """

    url = f"https://api.sectors.app/v1/company/get-segments/{stock}/"

    return retrieve_from_endpoint(url)

@tool
def get_top_companies_by_tx_volume(start_date: str, end_date: str, top_n: int = 5) -> dict:
    """
    Get top companies by transaction volume

    @param start_date: The start date in YYYY-MM-DD format
    @param end_date: The end date in YYYY-MM-DD format
    @param top_n: Number of stocks to show
    @return: A list of most traded IDX stocks based on transaction volume for a certain interval
    """
    url = f"https://api.sectors.app/v1/most-traded/?start={start_date}&end={end_date}&n_stock={top_n}"

    return retrieve_from_endpoint(url)

@tool
def get_daily_tx(stock: str, start_date: str, end_date: str) -> list[dict]:
    """
    Get daily transaction for a stock

    @param stock: The stock 4 letter symbol of the company
    @param start_date: The start date in YYYY-MM-DD format
    @param end_date: The end date in YYYY-MM-DD format
    @return: Daily transaction data of a given ticker for a certain interval
    """
    url = f"https://api.sectors.app/v1/daily/{stock}/?start={start_date}&end={end_date}"

    return retrieve_from_endpoint(url)


@tool
def get_top_companies_ranked(dimension: str, top_n: int, year: int) -> list[dict]:
    """
    Return a list of top companies (symbol) based on certain dimension 
    (dividend yield, total dividend, revenue, earnings, market cap,...)

    @param dimension: The dimension to rank the companies by, one of: 
    "dividend_yield", "total_dividend", "revenue", "earnings", "market_cap", ...

    @param top_n: Number of stocks to show
    @param year: Year of ranking, always show the most recent full calendar year that has ended
    @return: A list of top tickers in a given year based on certain classification
    """

    url = f"https://api.sectors.app/v1/companies/top/?classifications={dimension}&n_stock={top_n}&year={year}"

    return retrieve_from_endpoint(url)

@tool
def get_top_companies_by_growth(dimension : str, sub_sectors : str) -> dict :
    """
    Return a list of top companies (symbol) based on certain dimension 
    (top_earnings_growth_gainers, top_earnings_growth_losers, top_revenue_growth_gainers, top_revenue_growth_losers,...)

    @param dimension : The dimension to rank the companies by, one of: 
    top_earnings_growth_gainers, top_earnings_growth_losers, top_revenue_growth_gainers, top_revenue_growth_losers.
    @param sub_sectors : use get_company_overview tools to get the subsectors of the company, if not provided just leave it blank
    """

    url = f"https://api.sectors.app/v1/companies/top-growth/?classifications={dimension}&n_stock=5&sub_sector={sub_sectors}"

    return retrieve_from_endpoint(url)

@tool
def get_top_companies_by_mover(dimension : str, period : str, sub_sectors : str) -> dict :
    """
    Return a list of top companies (symbol) based on certain dimension on certain period
    (top_gainers, top_losers,...)

    @param dimension : The dimension to rank the companies by, one of: 
    (top_gainers, top_losers)
    @param period : The certain period, one of:
    (1d, 7d, 14d, 30d, 365d)
    @param sub_sectors : use get_company_overview tools to get the subsectors of the company, if not provided just leave it blank
    """

    url = f"https://api.sectors.app/v1/companies/top-changes/?classifications={dimension}&n_stock=5&periods={period}&sub_sector={sub_sectors}"

    return retrieve_from_endpoint(url)

def get_finance_agent():

    # Defined Tools
    tools = [
        get_company_overview,
        get_top_companies_by_tx_volume,
        get_daily_tx,
        get_top_companies_ranked,
        get_top_companies_by_growth,
        get_top_companies_by_mover,
        get_company_revenue_cost_segments
    ]

    # Create the Prompt Template
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                f"""
                Answer the following queries, being as factual and analytical as you can. 
                If you need the start and end dates but they are not explicitly provided, 
                infer from the query. Whenever you return a list of names, return also the 
                corresponding values for each name. If the volume was about a single day, 
                the start and end parameter should be the same. Note that the endpoint for 
                performance since IPO has only one required parameter, which is the stock. 
                Today's date is {datetime.today().strftime("%Y-%m-%d")}
                """
            ),
            MessagesPlaceholder(variable_name="chat_history"),
            ("human", "{input}"),
            MessagesPlaceholder("agent_scratchpad"),
        ]
    )

    # Initializing the LLM
    llm = ChatGroq(
        temperature=0,
        model_name="llama-3.3-70b-versatile",
        groq_api_key=GROQ_API_KEY,
    )

    # Create the Agent and AgentExecutor
    agent = create_tool_calling_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

    # Add Memory to the AgentExecutor
    def get_session_history(session_id: str):

        return StreamlitChatMessageHistory(key=session_id)
    
    agent_with_memory = RunnableWithMessageHistory(
        agent_executor,
        get_session_history,
        input_messages_key="input",
        history_messages_key="chat_history",
    )

    return agent_with_memory

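The tools above build their URLs with f-strings, so a blank `sub_sectors` value ends up in the request as `sub_sector=`. As a minimal sketch of one alternative, the helper below assembles the query string with the standard library and drops empty parameters. `build_sectors_url` is a hypothetical name that is not part of the project, and whether the Sectors API treats an omitted parameter the same as a blank one is an assumption to verify against its documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.sectors.app/v1"  # base URL taken from the tools above


def build_sectors_url(path: str, **params) -> str:
    """Hypothetical helper: join a path and query parameters into one URL.

    Parameters whose value is None or an empty string are dropped, and the
    remaining values are percent-encoded by urlencode.
    """
    # Keep only the parameters that actually carry a value
    query = {key: value for key, value in params.items() if value not in (None, "")}
    url = f"{BASE_URL}/{path.strip('/')}/"
    return f"{url}?{urlencode(query)}" if query else url


# A blank sub_sector is left out instead of being sent as "sub_sector="
print(build_sectors_url(
    "companies/top-growth",
    classifications="top_earnings_growth_gainers",
    n_stock=5,
    sub_sector="",
))
```

A tool such as `get_top_companies_by_growth` could then pass its arguments to this helper and hand the result to `retrieve_from_endpoint`, which keeps the encoding logic in one place.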


In addition, I'm using RAG to understand the uploaded file, with HuggingFace embeddings from the `sentence-transformers/all-mpnet-base-v2` model. Here is my implementation code:

In [2]:
import tempfile
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import InMemoryVectorStore
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.tools import create_retriever_tool

MODEL = "llama-3.3-70b-versatile"
llm = ChatGroq(model=MODEL, temperature=0.0)

def create_pdf_agent(file) :

        # Save the PDF to a temporary file
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file :
            tmp_file.write(file.read())
            tmp_path = tmp_file.name
        
        loader = PyPDFLoader(tmp_path)
        docs = loader.load()

        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,  
            chunk_overlap=200   
        )

        all_splits = text_splitter.split_documents(docs)
        embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

        vector_store_pdf = InMemoryVectorStore(embedding)
        _ = vector_store_pdf.add_documents(documents=all_splits)

        tools = create_retriever_tool(vector_store_pdf.as_retriever(search_kwargs={'k': 5}),
                                        name = "pdf_document_retriever",
                                        description= "Retrieve PDF as context to accurately and concisely answer the user's question")

        prompt = ChatPromptTemplate.from_messages(
                [
                    (
                        "system",
                        '''
                        You are a helpful and detail-oriented assistant. 
                        You are provided with a tool to retrieve a PDF document from a vector store. 
                        Use the context to accurately and concisely answer the user's question. 

                        You need to follow these rules:
                        - Only use data from tools provided. Never guess or use outside data.  
                        - If data is not available, say so clearly and do not make it up. Suggest alternative sources if possible.  
                        - Add follow-up questions to help users dive deeper  
                        '''
                    ),
                    (
                        "human", "{input}"
                    ),
                    MessagesPlaceholder("agent_scratchpad")
                ]
            )

        # Create the Agent and AgentExecutor
        agent = create_tool_calling_agent(llm, [tools], prompt)
        
        agent_executor_pdf = AgentExecutor(agent=agent, tools=[tools], verbose=True)

        return agent_executor_pdf
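
To make the retrieval step in the code above more concrete, here is a minimal, dependency-free sketch of what chunking with overlap and top-k retrieval do. It is a simplification and not how `RecursiveCharacterTextSplitter` or `InMemoryVectorStore` work internally: the real splitter prefers natural separators, and the real vectors come from `all-mpnet-base-v2`. The names `chunk_text`, `embed`, `cosine`, and `top_k` are hypothetical, the "embedding" is a toy word-count vector, and the sample sentences are made up for illustration.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size character windows.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True)
    return ranked[:k]


# Made-up sample chunks, only to show the ranking
sample_chunks = [
    "Revenue grew 12 percent in 2023",
    "The board approved a dividend",
    "Net revenue by segment",
]
print(top_k("revenue growth", sample_chunks, k=2))
```

The overlap is what lets a sentence that straddles a chunk boundary still appear whole in at least one chunk, at the cost of storing some text twice.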

## 4. Data Flow & Processing Steps

Data flow: user input (text, or an uploaded file, PDF only) -> LLM processing (retrieve related data from the Sectors API, or read the uploaded document) -> generate response -> display response
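
The flow above can be sketched as a single dispatch function. This is only an illustration under stated assumptions: `route_request` and the two stub handlers are hypothetical names, and routing on the presence of an uploaded PDF is an assumed rule, since the report does not say how the app switches between the chatbot and the document assistant. The stubs stand in for the two agents so the sketch runs without API keys.

```python
from typing import Callable, Optional


def route_request(
    user_text: str,
    uploaded_pdf: Optional[bytes],
    chat_handler: Callable[[str], str],
    pdf_handler: Callable[[str, bytes], str],
) -> str:
    """Return the text to display for one user request."""
    if not user_text.strip():
        return "Please enter a question."
    if uploaded_pdf is not None:
        # A PDF is attached: answer from the document
        return pdf_handler(user_text, uploaded_pdf)
    # No upload: answer with the Sectors API tools
    return chat_handler(user_text)


def stub_chat(text: str) -> str:
    """Stand-in for the financial chatbot agent."""
    return f"[chatbot] {text}"


def stub_pdf(text: str, pdf: bytes) -> str:
    """Stand-in for the document assistant agent."""
    return f"[document, {len(pdf)} bytes] {text}"


print(route_request("Top 5 companies by market cap?", None, stub_chat, stub_pdf))
print(route_request("Summarize this report", b"%PDF-1.7", stub_chat, stub_pdf))
```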


## 5. UI/UX Considerations

I'm using a simple, clean UI to keep the web app easy to use, friendly, and free of visual noise.

## 6. Deployment Notes

This project will be deployed on Streamlit Cloud using Python 3.11, and the secret keys will be stored in Streamlit secrets.
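
Because the app cannot work without its two keys, a startup check that fails early with a clear message may be useful. The sketch below is one possible approach and not part of the project: `load_api_keys` is a hypothetical helper, the fallback to environment variables is an assumption, and a plain dictionary stands in for Streamlit's secrets mapping so the example runs without Streamlit installed.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("SECTORS_API_KEY", "GROQ_API_KEY")  # key names used in the code above


def load_api_keys(secrets: Mapping[str, str], environ: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical startup check.

    Read each required key from `secrets`, fall back to environment
    variables, and raise one error that lists every missing key.
    """
    keys = {}
    missing = []
    for name in REQUIRED_KEYS:
        value = secrets.get(name) or environ.get(name)
        if value:
            keys[name] = value
        else:
            missing.append(name)
    if missing:
        raise RuntimeError(f"Missing secrets: {', '.join(missing)}")
    return keys


# Placeholder values only; real keys belong in the deployment's secret store
demo_secrets = {"SECTORS_API_KEY": "sectors-placeholder"}
demo_environ = {"GROQ_API_KEY": "groq-placeholder"}
print(sorted(load_api_keys(demo_secrets, demo_environ)))
```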

## 7. Reflection & Next Steps

While making this project, I wanted to develop it with truly real-time data to improve accuracy, but that would cost a lot for me. I also think the assistant could combine Sectors with other APIs so the data can be compared across sources. This would make users wait longer, but I think a slightly longer wait is a fair trade for better data. Another next step is to improve the user system so that it supports CRUD operations for users, and to integrate predictive analysis, with a disclaimer of course.