# Project FinBot

### How does FinBot help?


Dozens of news articles about the stock market are published every day, and reading all of them is time consuming and can leave a reader overwhelmed. FinBot ingests the articles you want to know more about and lets you query them through retrieval-augmented generation (RAG) search.

#### STEP 1: Importing the required libraries

In [1]:
# Standard library
import os
import time

# LangChain
import langchain
from langchain import OpenAI
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chains.qa_with_sources.loading import load_qa_with_sources_chain
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

In [2]:
# Read the OpenAI API key from a local file and expose it to LangChain
with open("OPENAI_API_Key.txt", "r") as key_file:
    os.environ["OPENAI_API_KEY"] = key_file.read().strip()
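
Reading the key file directly fails with an unhelpful traceback when the file is missing, and silently sets an empty key when the file is blank. A minimal sketch of a more defensive loader is shown here. The helper name `load_api_key` and the fallback to an already-set environment variable are assumptions for illustration, not part of the original notebook.

```python
import os
from pathlib import Path


def load_api_key(path="OPENAI_API_Key.txt", env_var="OPENAI_API_KEY"):
    """Return the API key stored in `path`, falling back to the environment.

    Hypothetical helper: raises a clear error instead of returning an
    empty key.
    """
    key_file = Path(path)
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    # Fall back to a key that was exported before the notebook started.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key found in {path!r} or ${env_var}")
    return key
```

In the notebook this could be used as `os.environ["OPENAI_API_KEY"] = load_api_key()`.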

#### STEP 2: Load the data from the URLs

The article URLs from financial news websites are loaded with LangChain's `UnstructuredURLLoader`, which relies on the `unstructured` package to extract text from each page.

In [None]:
# !pip install unstructured



As of 28/02/2025, the following stock market articles are trending because of the market crash:
- https://indianexpress.com/article/business/market/stock-market-crash-today-sensex-nifty-live-updates-9860683/
- https://www.ndtvprofit.com/markets/february-market-fallout-niftys-longest-losing-streak-since-1996-and-fmcgs-worst-fall-in-18-years
- https://economictimes.indiatimes.com/markets/stocks/stock-watch/sensex-falls-but-these-stocks-gained-over-10-on-bse/articleshow/118628952.cms
- https://economictimes.indiatimes.com/markets/stocks/stock-watch/stock-market-update-stocks-that-hit-52-week-highs-on-nse-in-todays-trade/articleshow/118628856.cms
- https://www.thehindu.com/business/markets/rupee-falls-28-paise-to-close-at-8746-against-us-dollar/article69274471.ece

In [3]:
loader = UnstructuredURLLoader(
    urls=[
        " https://indianexpress.com/article/business/market/stock-market-crash-today-sensex-nifty-live-updates-9860683/",
        " https://www.ndtvprofit.com/markets/february-market-fallout-niftys-longest-losing-streak-since-1996-and-fmcgs-worst-fall-in-18-years",
        " https://economictimes.indiatimes.com/markets/stocks/stock-watch/sensex-falls-but-these-stocks-gained-over-10-on-bse/articleshow/118628952.cms",
        " https://economictimes.indiatimes.com/markets/stocks/stock-watch/stock-market-update-stocks-that-hit-52-week-highs-on-nse-in-todays-trade/articleshow/118628856.cms",
        " https://www.thehindu.com/business/markets/rupee-falls-28-paise-to-close-at-8746-against-us-dollar/article69274471.ece",
    ]
)

In [5]:
data=loader.load()
print("Number of documents loaded: ", len(data))

Number of documents loaded:  5


In [10]:
data[4].page_content

"February 28, 2025e-Paper\n\nSubscribe\n\nLive Now Agriculture\n\nBooks\n\nBooks\n\nHindi Belt\n\nThe Hindu On Books Books of the week, reviews, excerpts, new titles and features.\n\nSEE ALL NEWSLETTERS\n\nBusiness\n\nBusiness Agri-Business Economy Industry Markets Budget\n\nChildren\n\nCities\n\nCities Bengaluru Chennai Coimbatore Delhi Hyderabad Kochi Kolkata Kozhikode Madurai Mangaluru Mumbai Puducherry Thiruvananthapuram Tiruchirapalli Vijayawada Visakhapatnam\n\nData\n\nData\n\nData Point Podcast\n\nData Point Decoding the headlines with facts, figures, and numbers\n\nSEE ALL NEWSLETTERS\n\nEbook\n\nEducation\n\nEducation Careers Colleges Schools\n\nElections\n\nEntertainment\n\nEntertainment Art Dance Movies Music Reviews Theatre\n\nFirst Day First Show News and reviews from the world of cinema and streaming.\n\nSEE ALL NEWSLETTERS\n\nEnvironment\n\nFood\n\nFood Dining Features Guides Recipes\n\nGood Health Hunting\n\nHealth\n\nHealth\n\nMonkeypox\n\nHealth Matters Ramya Kannan w

In [11]:
data[4].metadata

{'source': ' https://www.thehindu.com/business/markets/rupee-falls-28-paise-to-close-at-8746-against-us-dollar/article69274471.ece'}
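
The `source` value above keeps the leading space that is present in the URL strings passed to the loader, so that stray whitespace ends up in every chunk's metadata. A small sketch of normalizing the list before loading is given here. The helper `clean_urls` is hypothetical and uses only the standard library.

```python
from urllib.parse import urlparse


def clean_urls(raw_urls):
    """Strip whitespace, drop non-HTTP(S) entries, and de-duplicate in order."""
    seen = set()
    cleaned = []
    for raw in raw_urls:
        url = raw.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # skip anything that is not a full web URL
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


urls = clean_urls([
    " https://www.thehindu.com/business/markets/rupee-falls-28-paise-to-close-at-8746-against-us-dollar/article69274471.ece",
    "https://www.thehindu.com/business/markets/rupee-falls-28-paise-to-close-at-8746-against-us-dollar/article69274471.ece",
    "not a url",
])
print(urls)  # a single, whitespace-free URL remains
```

The cleaned list could then be passed as `UnstructuredURLLoader(urls=urls)`.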

#### STEP 3: Splitting the data text into chunks

The text is split with LangChain's `RecursiveCharacterTextSplitter`, using chunks of up to 1000 characters with a 200-character overlap between neighboring chunks.

In [12]:
text_splitter= RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

chunked_data=text_splitter.split_documents(data)

In [14]:
print("Number of chunks created by splitting: ", len(chunked_data))

Number of chunks created by splitting:  82
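
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that cuts text into fixed windows. It is not the algorithm `RecursiveCharacterTextSplitter` uses: that splitter breaks on separators such as paragraphs and lines and merges the pieces up to the size limit, which is why real chunk lengths vary. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into windows of `chunk_size` that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic 2500-character text, used only for illustration.
text = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])            # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True: neighbors share 200 characters
```

The overlap means a sentence that straddles a chunk boundary is still seen whole in at least one chunk, at the cost of embedding some text twice.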


Let's print the number of characters in each chunk

In [19]:
for chunk in chunked_data:
    print(len(chunk.page_content))

956
621
923
812
783
517
839
794
919
935
911
976
998
987
957
995
938
332
973
705
714
919
899
977
527
917
726
955
966
995
939
733
994
999
995
832
928
580
482
987
814
980
337
996
352
983
821
875
817
936
995
982
955
733
994
999
995
832
928
580
482
987
702
994
464
996
352
983
821
990
916
942
826
896
804
802
839
975
995
969
768
481
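
A long column of lengths is hard to scan, so a short summary can be easier to read. The sketch below computes one for the first five lengths printed above. The helper `summarize_lengths` is hypothetical.

```python
from statistics import mean


def summarize_lengths(lengths, chunk_size=1000):
    """Summarize chunk lengths and count any that exceed the size limit."""
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": round(mean(lengths), 1),
        "over_limit": sum(1 for n in lengths if n > chunk_size),
    }


# First five chunk lengths from the output above.
sample = [956, 621, 923, 812, 783]
print(summarize_lengths(sample))
```

In the notebook the full list could be passed as `[len(c.page_content) for c in chunked_data]`.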


#### STEP 4: Create vector embeddings of these texts and store them in FAISS

In [36]:
# !pip install faiss-cpu



In [20]:
Vector_embeddings=OpenAIEmbeddings()

Faiss_vector_index=FAISS.from_documents(chunked_data,Vector_embeddings)


In [21]:
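# Save the index to disk
Faiss_vector_index.save_local("vector_index.faiss")

In [22]:
# Reload the saved index. The flag is required because the docstore is pickled; only load index files you created yourself.
vector_index_faiss = FAISS.load_local("vector_index.faiss", Vector_embeddings, allow_dangerous_deserialization=True)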
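
Conceptually, the index stores one vector per chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The sketch below shows that idea with a brute-force search over toy 3-dimensional vectors. It assumes Euclidean (L2) distance, which appears to be the default for this wrapper but is worth confirming for your installed version. The vectors and labels are made up for illustration, and real embeddings have far more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, index, k=2):
    """Return the k (label, distance) pairs closest to `query_vec`."""
    scored = sorted(index, key=lambda item: l2_distance(query_vec, item[1]))
    return [(label, round(l2_distance(query_vec, vec), 3)) for label, vec in scored[:k]]


# Toy index: (label, vector) pairs standing in for embedded chunks.
toy_index = [
    ("rupee chunk", [0.9, 0.1, 0.0]),
    ("52-week-high chunk", [0.1, 0.8, 0.1]),
    ("tariff chunk", [0.7, 0.2, 0.1]),
]
print(nearest([1.0, 0.0, 0.0], toy_index, k=2))
```

FAISS performs the same kind of lookup with optimized index structures, so it stays fast when the number of chunks grows well beyond the 82 used here.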

Configure the OpenAI LLM with a custom temperature and token limit:

In [23]:
llm= OpenAI(
    temperature=0.7,
    max_tokens=500
)
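
The `temperature` setting controls how random the model's word choices are. How the hosted model samples internally is not visible from this notebook, so the sketch below only shows the standard temperature-scaled softmax that is commonly used to explain the parameter. The logits are made-up numbers, and unlike the API this helper rejects a temperature of 0.

```python
import math


def softmax_with_temperature(logits, temperature=0.7):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top candidate, which usually suits factual question answering. Higher values spread it out and make answers more varied.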


#### STEP 5: Retrieve similar embeddings for a given question and call the LLM to answer it

Using `RetrievalQAWithSourcesChain`, the retrieval QA chain from LangChain

In [24]:
q_a_chain=RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vector_index_faiss.as_retriever())
q_a_chain
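
The debug trace in Step 6 shows the chain running a `MapReduceDocumentsChain`: the LLM is called once per retrieved chunk, and then once more to combine the partial results. The sketch below imitates that flow with a stand-in model. The prompts, the function `map_reduce_answer`, and `fake_llm` are all hypothetical and much simpler than LangChain's real prompts. One simplification to note: the sketch lists every source, whereas the real chain reports the sources the model cites.

```python
def map_reduce_answer(question, docs, llm):
    """docs: list of dicts with 'text' and 'source'; llm: callable str -> str."""
    # Map step: ask the model for the relevant part of each chunk.
    extracts = []
    for doc in docs:
        prompt = f"Question: {question}\nText: {doc['text']}\nRelevant excerpt:"
        extracts.append({"excerpt": llm(prompt), "source": doc["source"]})
    # Reduce step: combine all excerpts into one final prompt.
    combined = "\n".join(f"{e['excerpt']} (source: {e['source']})" for e in extracts)
    final_prompt = f"Question: {question}\nExcerpts:\n{combined}\nFinal answer:"
    return {
        "answer": llm(final_prompt),
        # dict.fromkeys removes duplicate sources while keeping their order.
        "sources": ", ".join(dict.fromkeys(e["source"] for e in extracts)),
    }


calls = []


def fake_llm(prompt):
    """Stand-in for a real model: records the prompt and returns a canned reply."""
    calls.append(prompt)
    return f"stub reply {len(calls)}"


docs = [
    {"text": "Rupee fell 28 paise.", "source": "https://example.com/a"},
    {"text": "Sensex ended in the red.", "source": "https://example.com/b"},
]
result = map_reduce_answer("Why did the rupee fall?", docs, fake_llm)
print(len(calls))         # 3: one call per chunk, plus one to combine
print(result["sources"])  # https://example.com/a, https://example.com/b
```

Because every retrieved chunk costs one LLM call, the number of chunks the retriever returns directly affects latency and API cost.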



#### STEP 6: Testing the RAG

In [26]:
query = "Can you tell me the reason for stock market crash?"
langchain.debug=True

q_a_chain({"question":query}, return_only_outputs=True)

[chain/start] [chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
{
  "question": "Can you tell me the reason for stock market crash?"
}
[chain/start] [chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "input_list": [
    {
      "context": "Written by Hitesh Vyas Mumbai | Updated: February 28, 2025 16:13 IST\n\n5 min read\n\nDecline in Indian equities mirrored losses across the wider Asian markets. (Express Photo: Sankhadeep Banerjee)\n\nIndian Stock Market Today: The Indian stock market has witnessed another bloodbath on Friday, with key indices crashing by 1.9 per cent and wiping out a staggering 18 per cent of investor wealth since September last year. The relentless sell-off has left investors reeling, wi

{'answer': ' The stock market has crashed due to the uncertainty and impact of tariffs by the US president, as well as large foreign portfolio investors pulling out from the market.\n',
 'sources': 'https://indianexpress.com/article/business/market/stock-market-crash-today-sensex-nifty-live-updates-9860683/'}

In [28]:
query = "What are the stocks that gained over 10% in BSE?"
#langchain.debug=True

q_a_chain({"question":query}, return_only_outputs=True)

[chain/start] [chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
{
  "question": "What are the stocks that gained over 10% in BSE?"
}
[chain/start] [chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "input_list": [
    {
      "context": "Follow us\n\nShare\n\nFont Size\n\nAbcSmall\n\nAbcMedium\n\nAbcLarge\n\nSave\n\nPrint\n\nComment\n\nSynopsis\n\nIn the Nifty 50 index, 5 stocks ended in the green, while 44 stocks closed in the red in today's trade.\n\nNEW DELHI: A number of stocks rose in excess of 10% on BSE as domestic equity indices, BSE Sensex and NSE Nifty, ended in the red on Friday. These high-performing stocks that rallied more than 10% during the session included, Medico Remedies(11.81%), Home Firs

{'answer': ' The high-performing stocks that gained over 10% during the session on BSE were Medico Remedies, Home First Finance Co, and Transpact Enterprise.\n',
 'sources': 'https://economictimes.indiatimes.com/markets/stocks/stock-watch/sensex-falls-but-these-stocks-gained-over-10-on-bse/articleshow/118628952.cms'}

In [29]:
query = "Why is the price of Rupee depreciating"
#langchain.debug=True

q_a_chain({"question":query}, return_only_outputs=True)

[chain/start] [chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
{
  "question": "Why is the price of Rupee depreciating"
}
[chain/start] [chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "input_list": [
    {
      "context": "Reddit\n\nRemove SEE ALL\n\nFIIs offloaded equities worth ₹556.56 crore in the capital markets on net basis on Thursday (February 27, 2025), according to exchange data. File | Photo Credit: Reuters\n\nThe depreciated 28 paise to close at 87.46 (provisional) against the U.S. dollar on Friday (February 28, 2025), as the strength of the American currency and a negative trend in domestic equities dented investor sentiments.\n\nForex traders said the ongoing uncertainty surrounding tariff 

{'answer': ' The price of Rupee may be depreciating due to the strength of the American currency, negative trends in domestic equities, uncertainty over U.S. trade tariffs, and sustained FII outflows.\n',
 'sources': 'https://www.thehindu.com/business/markets/rupee-falls-28-paise-to-close-at-8746-against-us-dollar/article69274471.ece'}

In [30]:
query = "Can you summarize on the stocks that are hitting 52 week high"
#langchain.debug=True

q_a_chain({"question":query}, return_only_outputs=True)

[chain/start] [chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
{
  "question": "Can you summarize on the stocks that are hitting 52 week high"
}
[chain/start] [chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQAWithSourcesChain > chain:MapReduceDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "input_list": [
    {
      "context": "Meanwhile, stocks such as Photon Capital, Hira Automobiles, Aris Intnl, Dhanlaxmi Cotex and Triumph International Finance India Ltd. hit their fresh 52-week high, while Cosyn, Capital Trust, Envair Electro, Kunststoffe Ind and Nova Iron Steel touched their new 52-week low in today's trade.\n\nStock Trading\n\nMaximise Returns by Investing in the Right Companies\n\nBy - The Economic Times, Get Certified By India's Top Business News Brand\n\nStock Trading\n\nRenko

{'answer': " Stocks hitting their fresh 52-week high in today's trade include Photon Capital, Hira Automobiles, Aris Intnl, Dhanlaxmi Cotex, Triumph International Finance India Ltd., Brookfield India REIT, Blue Coast Hotel, Norben Tea Exp, Capital Infra Trust, Laxmi Goldorna House, Medico Remedies, Home First Finance Co, and Transpact Enterprise.\n",
 'sources': 'https://economictimes.indiatimes.com/markets/stocks/stock-watch/sensex-falls-but-these-stocks-gained-over-10-on-bse/articleshow/118628952.cms, https://economictimes.indiatimes.com/markets/stocks/stock-watch/stock-market-update-stocks-that-hit-52-week-highs-on-nse-in-todays-trade/articleshow/118628856.cms'}
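
As the last result shows, `sources` comes back as a single comma-separated string when more than one article contributed to the answer. A small sketch of turning it into a list for display or further processing follows. The helper `split_sources` is hypothetical, the answer text is a placeholder, and the split assumes the URLs themselves contain no commas.

```python
def split_sources(result):
    """Split the comma-separated 'sources' string into a clean list of URLs."""
    raw = result.get("sources", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


# The two source URLs are copied from the output above.
result = {
    "answer": "(answer text)",
    "sources": (
        "https://economictimes.indiatimes.com/markets/stocks/stock-watch/"
        "sensex-falls-but-these-stocks-gained-over-10-on-bse/articleshow/118628952.cms, "
        "https://economictimes.indiatimes.com/markets/stocks/stock-watch/"
        "stock-market-update-stocks-that-hit-52-week-highs-on-nse-in-todays-trade/"
        "articleshow/118628856.cms"
    ),
}
for url in split_sources(result):
    print(url)
```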