## Webpage Loaders
- Load the webpage and extract the data using the `requests` and `BeautifulSoup` libraries.
- Use an LLM to extract meaningful data from the webpage.
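
As a rough illustration of the first step, the sketch below pulls the visible text out of an HTML string. It uses Python's standard-library `html.parser` as a stand-in for `BeautifulSoup` so that it runs without extra dependencies, and it parses an inline sample instead of fetching a page with `requests`. `VisibleTextParser`, `html_to_text`, and the sample HTML are hypothetical names and made-up data, not part of the notebook.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up sample page, for illustration only.
sample_html = """
<html><head><title>Markets</title><style>p {color: red}</style></head>
<body><h1>Top stories</h1><p>Sensex ends flat.</p>
<script>console.log("ignored")</script></body></html>
"""
print(html_to_text(sample_html))
```

The extracted text is what would then be passed to the LLM in the second step.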

### Project 1: Share Market Data Analysis Based on Global Cues
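- We will extract data from Indian financial news sites (`https://www.moneycontrol.com/`, `https://economictimes.indiatimes.com/markets/stocks/news`, and `https://www.livemint.com/latest-news`) and analyze it to understand the impact of global cues on the Indian share market.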

In [1]:
from dotenv import load_dotenv

load_dotenv('./../.env')

True

In [7]:
import bs4
from langchain_community.document_loaders import WebBaseLoader


urls = ["https://www.moneycontrol.com/",
        "https://economictimes.indiatimes.com/markets/stocks/news", 
        "https://www.livemint.com/latest-news"]

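loader = WebBaseLoader(web_paths=urls)
docs = []
# Top-level `async for` works in a notebook; in a plain script, wrap this loop
# in an `async def` function and run it with `asyncio.run()`.
async for doc in loader.alazy_load():
    docs.append(doc)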

def format_docs(docs):
    return "\n\n".join([x.page_content for x in docs])


context = format_docs(docs)

In [8]:
import re

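def text_clean(text):
    """Collapse runs of newlines and tabs, then trim both ends."""
    text = re.sub(r"\n\n+", "\n\n", text)  # 2+ newlines -> one blank line
    text = re.sub(r"\t+", "\t", text).strip()  # runs of tabs -> a single tab
    return text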
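
To make the effect of `text_clean` concrete, here is a small worked example on a made-up string that imitates the whitespace-heavy page content. The function body is copied from the cell above so the snippet runs on its own; the `raw` string is an assumption for illustration.

```python
import re


def text_clean(text):
    text = re.sub(r"\n\n+", "\n\n", text)
    text = re.sub(r"\t+", "\t", text).strip()
    return text


# Made-up input with runs of newlines and tabs, like a scraped navigation bar.
raw = "\n\n\nStocks\n\n\n\nMutual Funds\t\t\tNews\n \nHome\n\n"
cleaned = text_clean(raw)
print(repr(cleaned))
# → 'Stocks\n\nMutual Funds\tNews\n \nHome'
```

Note that the `"\n \n"` sequence survives: a line containing only spaces breaks the run of consecutive newlines, so the pattern does not collapse it.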

In [9]:
context = text_clean(context)
len(context)

102319

In [10]:
### QnA with LLM
from scripts import llm

In [12]:
doc

Document(metadata={'source': 'https://www.livemint.com/latest-news', 'title': 'Latest News Today: Latest News Headlines, Breaking News, Current News | Mint', 'description': 'Latest News Today: Read latest and breaking news from India and across the world. Explore more for current news, premium and WSJ news, business news on Mint.', 'language': 'en'}, page_content="\n\n\n          \n\n\n\n\n \nLatest News Today: Latest News Headlines, Breaking News, Current News | Mint\n\n\n\n\n\n\n\n\n\n\n\n\n \n\n\n\n \n\n\n\n\n\n\n\n          Explore          Sign in      e-paper Subscribe  Sign In     Wednesday, 30 October 2024              Stocks       Mutual Funds       News                     Home News Markets Premium Money Mutual Fund Personal Loan Companies Technology Web Stories In Charts Opinion Videos      All Companies  Technology Markets Money MyMint Mutual Funds Insurance Auto  Industry  Personal Finance        Hello User  Sign in      Sign Out     My Account My Account Subscribe  My Wat

In [13]:
question = """Extract stock market related news if present in the text.
Do not write a preamble or explanation. Extract all news in points."""

response = llm.ask_llama(context[:10_000], question)
print(response)

* Force Motors hits 20% upper circuit on healthy Q2FY25 results
* Godavari Biorefineries shares list at 12.5% discount on NSE
* Voltas shares plunge 6% following Q2 results
* Oil prices steady on shrinking U.S. crude inventories
* Bitcoin rises past $70,000 for the first time since June


In [14]:
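# Split the text into chunks of 10_000 characters with a 100-character overlap
def chunk_text(text, chunk_size=10_000, overlap=100):
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` characters after the previous one
    chunks = []
    for i in range(0, len(text), step):
        chunks.append(text[i:i+chunk_size])
    return chunks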

chunks = chunk_text(context)
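
The overlap is easier to see on a tiny input. The sketch below repeats the chunking logic so it runs standalone, applies it to a 26-character string with small, illustrative parameters, and then works out how many chunks the 102,319-character `context` produces with the default settings.

```python
def chunk_text(text, chunk_size=10_000, overlap=100):
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
small_chunks = chunk_text(text, chunk_size=10, overlap=3)
print(small_chunks)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']

# Chunks start every chunk_size - overlap = 9_900 characters, so a text of
# 102_319 characters yields this many chunks:
n_chunks = len(range(0, 102_319, 10_000 - 100))
print(n_chunks)
# → 11
```

Each chunk repeats the last three characters of the previous one (`hij`, `opq`, `vwx`), which is what keeps a sentence that straddles a boundary visible in both chunks.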



In [15]:
chunk_summary = []
for chunk in chunks:
    response = llm.ask_llama(chunk, question)
    chunk_summary.append(response)

In [16]:
# Print only the first chunk's summary as a quick sanity check
for summary in chunk_summary:
    print(summary)
    break

* Force Motors hits 20% upper circuit on healthy Q2FY25 results
* Godavari Biorefineries shares list at 12.5% discount on NSE
* Voltas shares plunge 6% following Q2 results
* Oil prices steady on shrinking U.S. crude inventories
* Bitcoin rises past $70,000 for the first time since June


In [28]:
summary = "\n\n".join(chunk_summary)
response = llm.ask_llama(
    context=summary,
    question="""Write a detailed market news report in markdown format. Think carefully then write the report.""",
)
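
The per-chunk extraction followed by a final report is a map-reduce pattern, and it can be wrapped in one helper. This is a minimal sketch, not the notebook's implementation: `map_reduce_report` is a hypothetical name, and `fake_ask` is a stand-in that only mimics the `(context, question)` call shape used with `llm.ask_llama` above, since that module's code is not shown.

```python
def map_reduce_report(chunks, ask, map_question, reduce_question):
    """Ask `map_question` of each chunk, then `reduce_question` of the joined answers."""
    partial_answers = [ask(chunk, map_question) for chunk in chunks]  # map step
    combined = "\n\n".join(partial_answers)
    return ask(combined, reduce_question)  # reduce step


def fake_ask(context, question):
    """Hypothetical stand-in for an LLM call: reports the context length."""
    return f"[{question}] {len(context)} chars"


report = map_reduce_report(["aaaa", "bb"], fake_ask, "extract", "report")
print(report)
# → [report] 36 chars
```

Passing the LLM call in as a parameter keeps the helper testable without a model; in the notebook, `ask` would be `llm.ask_llama`.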

In [29]:
import os

# print(response)

os.makedirs("data", exist_ok=True)
with open("data/market_report.md", "w", encoding="utf-8") as f:
    f.write(response)

In [27]:
print(summary)

* Force Motors hits 20% upper circuit on healthy Q2FY25 results
* Godavari Biorefineries shares list at 12.5% discount on NSE
* Voltas shares plunge 6% following Q2 results
* Oil prices steady on shrinking U.S. crude inventories
* Bitcoin rises past $70,000 for the first time since June

• Advance / Decline (NSE) 
• Global Markets
• STOCK ACTION
• NSE
• BSE
• Value 
(Rs Cr.)
• HDFC Bank
• Maruti Suzuki
• M&M
• Reliance
• ICICI Bank
• Tata Motors
• Wipro
• IndusInd Bank

Here are the extracted stock market related news:

* Maruti Suzuki shares up 2% despite brokerages flagging demand concerns for entry-level cars
* Maruti Suzuki’s first EV promises high-tech edge, set to electrify roads in Japan and Europe
* Northern Arc Capital Q2 profit rises 24% to Rs 98 cr

Here are the extracted stock market related news:

* Investors will remain positive post Budget on Indian fixed income assets
* Right time to invest in fixed income for stability, says LIC Mutual Fund's Marzban Irani
* How to get
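
The combined summary printed above still contains preambles and navigation fragments alongside the news items. One possible cleanup before the final report step, sketched here under the assumption that genuine items are the lines starting with `* `, is to keep only those lines and drop repeats. `extract_bullets` is a hypothetical helper, and the sample input is assembled from lines in the output above (with one item repeated on purpose to show deduplication).

```python
def extract_bullets(summaries):
    """Keep unique lines that start with '* ', preserving first-seen order."""
    seen = set()
    bullets = []
    for summary in summaries:
        for line in summary.splitlines():
            line = line.strip()
            if line.startswith("* ") and line not in seen:
                seen.add(line)
                bullets.append(line)
    return bullets


# Sample input for illustration only.
sample_summaries = [
    "* Voltas shares plunge 6% following Q2 results\n"
    "* Oil prices steady on shrinking U.S. crude inventories",
    "• Global Markets\n• NSE",
    "Here are the extracted stock market related news:\n\n"
    "* Northern Arc Capital Q2 profit rises 24% to Rs 98 cr\n"
    "* Voltas shares plunge 6% following Q2 results",
]
bullets = extract_bullets(sample_summaries)
print("\n".join(bullets))
```

This filter is deliberately strict: any item the model formats differently (for example with `•` or `-`) is dropped, so the prefix check would need adjusting to the model's actual output.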