# Portfolio Material News Updater  

Welcome to the **Material News Updater** tutorial. By the end of this notebook you will be able to:

1. **Collect** fresh articles from a curated list of financial-news RSS feeds.
2. **Embed** those stories in a Chroma vector database with OpenAI embeddings.
3. **Query** the database with GPT-4.1-mini to surface *material* news for a portfolio of stocks.
4. **Summarize** the individual stock news briefs into a single portfolio brief.
5. **Email** yourself a concise morning briefing—fully automated.

You’ll practise:

* Working with RSS feeds.
* Turning raw text into embeddings using LangChain + OpenAI.
* Building a Retrieval-Augmented Generation (RAG) chain.
* Packaging results for convenient distribution.

**Prerequisites**  
• Python ≥ 3.11 (Conda or venv).  
• OpenAI API key and Yahoo Mail credentials stored in a `.env` file (use the `.env.example` template and remove the `.example` suffix).

# OpenAI Account Setup
* Create an OpenAI Platform Account – Go to [platform.openai.com](https://platform.openai.com/docs/overview) and follow the prompts to sign up.
* Generate Your API Key – After signing in, navigate to the API keys page to create a new secret key.

# Yahoo Account Setup
This example uses **Yahoo Mail** to send the morning brief because it supports SMTP with `STARTTLS`, which makes it easy to integrate with Python's built-in `smtplib` library.

⚠️ **WARNING:** For security and privacy reasons, we strongly recommend using a **dedicated “junk” Yahoo account** for this project. Do **not** use your personal or sensitive email address as the sending address. `TO_EMAIL`, the address that receives the brief, can be any address because no credentials are needed for it.

To set it up:
- [Create a new Yahoo email account](https://login.yahoo.com/account/create)
- [Generate an app password](https://help.yahoo.com/kb/SLN15241.html?guccounter=1) to use instead of your actual login

You can also run this notebook in Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nEkt2ivAlU2XVU7AkyQG4Q-rMat7UcX8?usp=sharing)

---

Let’s start by loading the RSS catalogue.

## 1. Load RSS sources

`news_rss.json` contains a simple list of feed URLs plus some lightweight metadata.  
The next cell just deserialises that file into Python so we can iterate over it.

In [66]:
import json

with open('news_rss.json', 'r') as file:
    rss_json = json.load(file)
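For orientation, the loop further down reads three keys from each catalogue entry: `rss`, `source`, and `type`. The sketch below checks a catalogue for entries that lack one of them. The sample entries and the helper name `find_invalid_entries` are hypothetical and only illustrate the expected shape; the real `news_rss.json` may carry additional fields.

```python
import json

# Hypothetical sample mirroring the keys the notebook reads ('rss', 'source', 'type').
sample_catalogue = json.loads("""
[
  {"source": "Example Wire", "type": "markets", "rss": "https://example.com/markets.xml"},
  {"source": "Example Daily", "type": "companies", "rss": "https://example.com/companies.xml"}
]
""")

REQUIRED_KEYS = {"rss", "source", "type"}

def find_invalid_entries(catalogue):
    """Return (index, missing_keys) pairs for entries lacking a required key."""
    problems = []
    for i, entry in enumerate(catalogue):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append((i, sorted(missing)))
    return problems

print(find_invalid_entries(sample_catalogue))  # → []
```

Running a check like this on `rss_json` before fetching can surface a malformed entry early instead of as a `KeyError` in the middle of the loop.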

In [None]:
# Unhash the below line to install the feedparser package
# !pip install feedparser

In [67]:
import feedparser
import time

feeds = []

for rss_dict in rss_json:
    rss_url = rss_dict['rss']
    source = rss_dict['source']
    news_type = rss_dict['type']
    feed = feedparser.parse(rss_url)
    feed_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
    feed_dict = {
        'source':source,
        'type':news_type,
        'feed':feed,
        'time_pulled':feed_time
    }
    feeds.append(feed_dict)

## 2. Embed stories and build the vector store
  
Here we walk through each RSS entry, keep each entry intact (because entries are short), and prepare a list of `documents` that we’ll feed into Chroma. Each document carries metadata (`title`, `source`, the time we pulled it, and a high-level `news_type` tag) for easier filtering.

In [68]:
from dotenv import load_dotenv

load_dotenv()

# Create full documents from entries without splitting
documents = []
for rss_feed in feeds:
    for entry in rss_feed['feed']['entries']:
        try:
            documents.append({
                "content": str(entry),  # keep the full entry text; entries are short
                "metadata": {
                    "title": entry['title'],
                    "source": rss_feed["source"],
                    "time_pulled": rss_feed["time_pulled"],
                    "news_type": rss_feed['type'],
                },
            })
        except Exception as e:
            # Skip a malformed entry (e.g. one without a title) and keep going
            print(f'An error - {e} occurred for {entry}')


### Batch The Embeddings

OpenAI’s embedding endpoint caps *each request* at **300 k tokens**. The helper below measures every document, adds them to a running total, and starts a new batch whenever the next doc would tip us over the limit.

In [44]:
# Unhash and run the below if you need to install the tiktoken package
# !pip install tiktoken

In [69]:

def batch_docs(documents, model='text-embedding-3-small', token_limit=300000):
    import tiktoken
    encoding = tiktoken.encoding_for_model(model)
    
    batches = []
    current_tokens = 0
    current_docs = []

    for document in documents:
        tokens = len(encoding.encode(document["content"]))
        
        if tokens > token_limit:
            raise ValueError(f"Single document exceeds token limit: {tokens} tokens")

        if current_tokens + tokens > token_limit:
            batches.append(current_docs)
            current_docs = [document]
            current_tokens = tokens
        else:
            current_docs.append(document)
            current_tokens += tokens

    if current_docs:
        batches.append(current_docs)

    return batches
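To see the greedy batching rule in isolation, here is a minimal sketch that applies the same logic to plain strings. It swaps the tokenizer for a stand-in counter (one "token" per whitespace-separated word) and uses a tiny limit so the result is easy to follow by hand; `greedy_batches`, `word_count`, and `sample_texts` are hypothetical names used only for this illustration.

```python
def greedy_batches(items, count_tokens, token_limit):
    """Group items in order so each batch's total token count stays within token_limit."""
    batches, current, current_tokens = [], [], 0
    for item in items:
        tokens = count_tokens(item)
        if tokens > token_limit:
            raise ValueError(f"Single item exceeds token limit: {tokens} tokens")
        if current_tokens + tokens > token_limit:
            # Adding this item would exceed the limit: close the batch and start a new one
            batches.append(current)
            current, current_tokens = [item], tokens
        else:
            current.append(item)
            current_tokens += tokens
    if current:
        batches.append(current)
    return batches

# Stand-in counter (an assumption for illustration): one token per word.
def word_count(text):
    return len(text.split())

sample_texts = ["a b c", "d e", "f g h i", "j"]
print(greedy_batches(sample_texts, word_count, token_limit=5))
# → [['a b c', 'd e'], ['f g h i', 'j']]
```

The first two strings total exactly five "tokens", so they share a batch; the third would push the total to nine and therefore opens a new batch.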


In [70]:
batches = batch_docs(documents)

In [None]:
# Unhash the below to install chromadb
# !pip install chromadb

In [71]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Initialize the embedding model and vector store
embedding_model = OpenAIEmbeddings(model='text-embedding-3-small')
vector_store = Chroma(collection_name='rss_news_feeds', embedding_function=embedding_model)

In [72]:
for batch in batches:
    texts = [doc["content"] for doc in batch]
    metadatas = [doc["metadata"] for doc in batch]
    vector_store.add_texts(texts, metadatas=metadatas)


## 3. Craft the system prompt

This prompt underpins our retrieval chain. It must include the `{context}` variable, which the chain fills with the news articles retrieved for our query.

In [73]:
system_prompt = (
    "You are an assistant designed to search through news feeds and find the most relevant news based on a prompt. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't have enough information to infer an answer, say that you don't have enough information to answer the question. "
    "Each document contains metadata with the source, news type and pulled date. If you have two stories from different sources that"
    " appear to be about the same story, prioritize the one with the earlier published time. "
    "Keep the answer concise."
    "\n\n"
    "{context}"
)
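As a rough illustration of what the `{context}` placeholder does, the sketch below fills a shortened template with two made-up stories using plain string formatting. This only approximates what the chain does when it "stuffs" retrieved documents into the prompt; the template text, the stories, and the blank-line separator are assumptions for the example.

```python
# Hypothetical template and stories, shortened for illustration.
template = (
    "Use the following pieces of retrieved context to answer the question. "
    "Keep the answer concise."
    "\n\n"
    "{context}"
)
retrieved_stories = [
    "Story 1: Example Corp raises full-year guidance.",
    "Story 2: Regulator opens inquiry into Sample Inc.",
]

# Join the retrieved texts and substitute them for the placeholder
filled_prompt = template.format(context="\n\n".join(retrieved_stories))
print(filled_prompt)
```

If the placeholder were missing from the template, the retrieved stories would have nowhere to go and the model would answer without them.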

In [None]:
# Unhash the below line if you need to install yfinance
#!pip install yfinance # We will use yfinance to give some more context about portfolio companies

## 4. Create the RAG Chain

Here we combine the LLM, the prompt, and the retriever into a single RAG chain.

In [None]:
# Unhash the below to install langchain_community and langchain_openai
#!pip install langchain_community
#!pip install langchain_openai

In [74]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from datetime import datetime

llm = ChatOpenAI(model="gpt-4.1-mini")
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

retriever = vector_store.as_retriever(search_kwargs={"k": 20})
qa_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, qa_chain)
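Conceptually, the retriever ranks stored documents by how close their embeddings are to the query embedding and hands the top `k` to the prompt. The toy sketch below shows that idea with hand-made three-dimensional vectors and cosine similarity. The vectors, the texts, and the choice of cosine similarity are assumptions for illustration; real embeddings have far more dimensions, and the vector store's actual distance metric may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed_docs, k):
    """Return the k document texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_docs,
        key=lambda d: cosine_similarity(query_vector, d["vector"]),
        reverse=True,
    )
    return [d["text"] for d in ranked[:k]]

# Hypothetical miniature index standing in for the vector store
toy_index = [
    {"text": "Chipmaker beats earnings estimates", "vector": [0.9, 0.1, 0.0]},
    {"text": "Oil prices slide on supply news", "vector": [0.0, 0.2, 0.9]},
    {"text": "GPU demand lifts semiconductor stocks", "vector": [0.8, 0.3, 0.1]},
]
query_vector = [1.0, 0.2, 0.0]

print(top_k(query_vector, toy_index, k=2))
# → ['Chipmaker beats earnings estimates', 'GPU demand lifts semiconductor stocks']
```

With `k=20`, as configured above, the chain passes a fairly wide slice of the news store to the model and relies on the prompt to pick out what matters.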

## 5. Create material news summaries for each company

For each ticker we:
1. Pull a short business description from `yfinance` for extra context.  
2. Ask the RAG chain which news **published today or yesterday** might have a material effect on the stock price.  
3. Put each answer in a document for us to aggregate into our portfolio brief later.

In [75]:
from langchain_core.documents import Document
from typing import List
from openai import OpenAI
import yfinance as yf

client = OpenAI()

# 1. Function to summarize news for a given ticker
def summarize_ticker(ticker: str) -> str:
    today = datetime.today().strftime("%Y-%m-%d")
    info  = yf.Ticker(ticker).info
    company     = info.get("longName", ticker)
    description = info.get("longBusinessSummary", "")
    query = (
        f"Today's date is {today}. You MUST only reference news published either today or yesterday. News published any other day is unacceptable!\n\n"
        f"Here is a description of {company}: {description}\n\n"
        f"Which news stories published either today or yesterday are most likely to have a material impact on {company}'s stock? If there isn't anything material, say so.\n\n"
        "Include citations. Remember news published any other day is unacceptable!"
    )
    result = rag_chain.invoke({"input": query})
    # The chain returns a dict ("input", "context", "answer"); keep only the generated answer
    return f"{ticker} – {company}:\n{result['answer']}\n"

# 2. Generate per-ticker summaries
tickers = ["AAPL","MSFT","NVDA","JNJ","UNH","JPM","V","PG","KO","XOM"]
summaries = [summarize_ticker(t) for t in tickers]

# 3. Wrap into Documents for summarization chain
docs = [
    # Pair each summary with its ticker directly instead of parsing it back out of the text
    Document(page_content=s, metadata={"ticker": t})
    for t, s in zip(tickers, summaries)
]
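The query above relies on the model to respect the date window. A complementary approach, sketched here under stated assumptions, is to drop stale entries by timestamp before they are embedded. feedparser exposes an entry's publication time as `published_parsed` (a UTC `time.struct_time`) when the feed provides one; the helper name `is_recent`, the 48-hour window, and the sample entries are hypothetical, with plain dicts standing in for parsed entries.

```python
import calendar
import time

def is_recent(entry, now_epoch, max_age_hours=48):
    """True when the entry's publication time falls within the last max_age_hours."""
    published = entry.get("published_parsed")
    if published is None:
        return False  # no timestamp: treat as not verifiably recent
    age_seconds = now_epoch - calendar.timegm(published)
    return 0 <= age_seconds <= max_age_hours * 3600

FMT = "%Y-%m-%d %H:%M:%S"
# Fixed "now" so the example is reproducible; in practice use time.time()
now = calendar.timegm(time.strptime("2025-06-14 12:00:00", FMT))

sample_entries = [
    {"title": "Fresh story", "published_parsed": time.strptime("2025-06-14 08:00:00", FMT)},
    {"title": "Stale story", "published_parsed": time.strptime("2025-06-01 08:00:00", FMT)},
    {"title": "Undated story"},
]
recent_titles = [e["title"] for e in sample_entries if is_recent(e, now)]
print(recent_titles)  # → ['Fresh story']
```

Whether undated entries should be dropped or kept is a judgment call; this sketch drops them so that nothing without a verifiable date reaches the brief.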

## 6. Roll-up briefing

A second LLM pass condenses the individual blurbs into a single portfolio brief, formatted in Markdown, which we can easily convert to HTML for the email.

In [None]:

def create_portfolio_brief(summary_docs: List[Document], model='gpt-4o') -> str:
    # Join the briefs passed in, rather than relying on the global `docs`
    briefs = "\n\n".join(doc.page_content for doc in summary_docs)
    user_prompt = f"""You are a financial analyst and expert Markdown formatter.

        Your task is to synthesize the following stock-specific news briefs into a structured, markdown-formatted report. Focus only on **material information** relevant to the portfolio.

        **Instructions:**
        - Structure the output using **headings** for each portfolio stock (e.g., ## AAPL).
        - Under each stock, use **bullet points** for each insight.
        - Use your expert financial skill to determine what actually has a high probability to be material and only highlight those stories. Be selective!
        - Provide a clear explanation below the news on why the news could materially affect the stock - what are the potential outcomes? 
        - Add **in-context citations** (e.g., (Source: Bloomberg, June 14)) with links to back up each point.
        - Do **not** wrap the output in code blocks.
        - Do **not** include commentary or filler, just structured insights.

        Here are the news briefs: {briefs}"""
    
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are an expert financial analyst."},
            {"role": "user", "content": user_prompt}
        ],
    )
    return completion.choices[0].message.content

# Create the portfolio brief with a single LLM call over all ticker summaries
portfolio_brief = create_portfolio_brief(docs)

print(portfolio_brief)

## 7. Deliver the brief

We convert the Markdown to HTML and send it via Yahoo SMTP.  
Make sure you’ve set `YAHOO_EMAIL`, `YAHOO_APP_PASSWORD`, and `TO_EMAIL` in your environment *before* running the next cell.

In [None]:
# Unhash the line below if you need to install the markdown package
#!pip install markdown

In [79]:

import os
import smtplib
import markdown
from email.message import EmailMessage
from dotenv import load_dotenv

# Load environment variables (e.g. from a .env file)
load_dotenv()

YAHOO_EMAIL        = os.environ['YAHOO_EMAIL']
YAHOO_APP_PASSWORD = os.environ['YAHOO_APP_PASSWORD']
TO_EMAIL           = os.environ['TO_EMAIL']

def send_yahoo_email(subject: str, body_markdown: str):
    """Send an HTML email via Yahoo SMTP (STARTTLS on port 587)."""
    # Build the email
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From']    = YAHOO_EMAIL
    msg['To']      = TO_EMAIL

    # Convert Markdown to HTML
    html_body = markdown.markdown(body_markdown)
    msg.set_content("This email contains HTML only.", subtype="plain")
    msg.add_alternative(html_body, subtype="html")

    try:
        # Connect, secure with STARTTLS, login, send
        with smtplib.SMTP('smtp.mail.yahoo.com', 587, timeout=30) as server:
            server.starttls()
            server.login(YAHOO_EMAIL, YAHOO_APP_PASSWORD)
            server.send_message(msg)
        print(f"\n✅ Email sent successfully to {TO_EMAIL}")
    except Exception as e:
        print(f"\n❌ Failed to send: {e}")

# Example invocation using your portfolio brief
send_yahoo_email("Morning Brief", portfolio_brief)



✅ Email sent successfully to brian.pisaneschi@cfainstitute.org
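To inspect the message structure without contacting an SMTP server, the sketch below builds the same kind of two-part email (plain-text fallback plus HTML alternative) and prints its content types. The function name `build_brief_message`, the addresses, and the hand-written HTML are placeholders for illustration; in the notebook the HTML comes from `markdown.markdown(...)`.

```python
from email.message import EmailMessage

def build_brief_message(subject, sender, recipient, html_body):
    """Assemble a plain-text + HTML email without sending it."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("This email contains HTML only.")  # plain-text fallback
    msg.add_alternative(html_body, subtype="html")      # HTML version of the brief
    return msg

preview = build_brief_message(
    "Morning Brief",
    "sender@example.com",    # placeholder address
    "reader@example.com",    # placeholder address
    "<h2>AAPL</h2><ul><li>Placeholder insight</li></ul>",
)

print(preview.get_content_type())
# → multipart/alternative
print([part.get_content_type() for part in preview.iter_parts()])
# → ['text/plain', 'text/html']
```

Separating message construction from delivery in this way makes it possible to check the rendered brief before any credentials are involved.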


---

### Want to Fully Automate This Workflow?

If you're interested in automating this daily news briefing, complete with scheduled runs and email delivery via GitHub Actions, follow the instructions in the README file [here](https://github.com/CFA-Institute-RPC/The-Automation-Ahead/tree/main/Retrieval%20Augmented%20Generation/Portfolio%20News%20Updater/automation_scripts#readme). It walks you through setting up environment variables, customizing your portfolio, and deploying the automated workflow.