# Web Search Agents: High-Yield Savings Account (HYSA) Rate Finder

**Objective**: Build an agent that searches for current HYSA rates, filters the results to reputable sources, and returns a top-5 summary with APY, minimum deposits, and sources.

**Key Learning Points**:
- Real-time web search API integration
- Source credibility filtering 
- LLM-powered data extraction from unstructured text
- Financial data synthesis and professional formatting

**Time**: ~15-20 minutes

In [1]:
# Import required libraries
import os
import requests
import json
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import List, Optional, Dict, Any
from openai import OpenAI
from dotenv import load_dotenv
import re

# Optional: Import Tavily client (alternative to direct API calls)
try:
    from tavily import TavilyClient
    TAVILY_CLIENT_AVAILABLE = True
except ImportError:
    TAVILY_CLIENT_AVAILABLE = False
    print("💡 Tavily client not installed. Using direct API calls instead.")
    print("   To install: pip install tavily-python")

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(
    base_url="https://openai.vocareum.com/v1",
    api_key=os.getenv("OPENAI_API_KEY")
)


print("🔧 Environment Setup:")
print(f"   ✅ OpenAI API Key: {'✓ Configured' if os.getenv('OPENAI_API_KEY') else '❌ Missing'}")
print(f"   🔍 Tavily API Key: {'✓ Configured' if os.getenv('TAVILY_API_KEY') else '❌ Missing (will use fallback)'}")
print(f"   📦 Tavily Client: {'✓ Available' if TAVILY_CLIENT_AVAILABLE else '❌ Not installed'}")

🔧 Environment Setup:
   ✅ OpenAI API Key: ✓ Configured
   🔍 Tavily API Key: ✓ Configured
   📦 Tavily Client: ✓ Available


In [2]:
@dataclass
class SearchResult:
    """Represents a search result from web search API"""
    title: str
    url: str
    snippet: str
    published_date: Optional[str] = None
    domain: Optional[str] = None

@dataclass
class HYSARecord:
    """Represents a High-Yield Savings Account rate record"""
    bank_name: str
    apy: float
    minimum_deposit: Optional[str] = None
    as_of_date: Optional[str] = None
    source_url: str = ""
    source_title: str = ""
    
@dataclass
class HYSASummary:
    """Final summary of top HYSA rates"""
    intro: str
    top_rates: List[HYSARecord]
    takeaway: str
    sources: List[Dict[str, str]]
    disclaimer: str

## 🔑 Setup Requirements

**Environment Variables** (in your `.env` file):
```
OPENAI_API_KEY=your_openai_key_here
TAVILY_API_KEY=your_tavily_key_here  # Free at tavily.com (1,000 searches/month)
```

**Dependencies**:
```bash
pip install openai tavily-python requests python-dotenv
```

**Note**: If `TAVILY_API_KEY` is not set, or the Tavily request fails, the agent falls back to hard-coded mock results.
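
The fallback behaviour described in the note can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual method: `search_with_fallback`, `live_search`, and `mock_results` are hypothetical names introduced here, and the live search is passed in as a plain callable so the sketch runs without network access.

```python
from typing import Callable, Dict, List, Optional


def search_with_fallback(
    api_key: Optional[str],
    live_search: Callable[[str], List[Dict[str, str]]],
    mock_results: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Return live results when possible, otherwise the mock results."""
    if not api_key:
        # No key configured: skip the network call entirely.
        return mock_results
    try:
        return live_search(api_key)
    except Exception:
        # Any failure in the live path degrades to the mock data.
        return mock_results


def failing_search(key: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a live search that hits a network error."""
    raise ConnectionError("simulated network failure")


mock = [{"title": "mock result", "url": "https://example.com"}]
print(search_with_fallback(None, failing_search, mock) == mock)        # True
print(search_with_fallback("some-key", failing_search, mock) == mock)  # True
```

Keeping the live call behind a callable makes the fallback path easy to exercise in tests without real API keys.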

In [3]:
class WebSearchAgent:
    """Agent for searching and analyzing HYSA rates from web sources"""
    
    def __init__(self):
        self.reputable_domains = {
            'bankrate.com',
            'nerdwallet.com', 
            'investopedia.com',
            'forbes.com',
            'money.com',
            'creditkarma.com',
            'sofi.com',
            'ally.com',
            'marcus.com',
            'capitalone.com',
            'discover.com'
        }
        
    def search_web(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """
        Search the web using Tavily API with real-time results
        
        Args:
            query: Search query string
            num_results: Number of results to return
            
        Returns:
            List of SearchResult objects
        """
        tavily_api_key = os.getenv("TAVILY_API_KEY")
        
        if not tavily_api_key:
            print("⚠️ TAVILY_API_KEY not found. Using fallback mock results.")
            return self._get_fallback_results()[:num_results]
        
        try:
            # Tavily API endpoint
            url = "https://api.tavily.com/search"
            
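            # Add freshness constraint (last 30 days), computed at call time.
            # Assumes the search backend honors an `after:` operator in the query text.
            cutoff = (datetime.now() - timedelta(days=30)).strftime("%Y-%m-%d")
            fresh_query = f"{query} after:{cutoff}"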
            
            payload = {
                "api_key": tavily_api_key,
                "query": fresh_query,
                "search_depth": "basic",
                "include_answer": False,
                "include_images": False,
                "include_raw_content": False,
                "max_results": num_results,
                "include_domains": list(self.reputable_domains)  # Focus on reputable sources
            }
            
            response = requests.post(url, json=payload, timeout=10)
            response.raise_for_status()
            
            data = response.json()
            results = []
            
            for item in data.get('results', []):
                # Extract domain from URL
                domain = self._extract_domain(item.get('url', ''))
                
                result = SearchResult(
                    title=item.get('title', ''),
                    url=item.get('url', ''),
                    snippet=item.get('content', ''),
                    published_date=item.get('published_date'),
                    domain=domain
                )
                results.append(result)
            
            print(f"✅ Found {len(results)} results from Tavily API")
            return results
            
        except requests.exceptions.RequestException as e:
            print(f"🔌 Network error with Tavily API: {e}")
            return self._get_fallback_results()[:num_results]
        except Exception as e:
            print(f"❌ Error with Tavily API: {e}")
            return self._get_fallback_results()[:num_results]
    
    def _get_fallback_results(self) -> List[SearchResult]:
        """Fallback mock results when API is unavailable"""
        return [
            SearchResult(
                title="Best High-Yield Savings Accounts of 2025 - Bankrate",
                url="https://www.bankrate.com/banking/savings/best-high-yield-interest-savings-accounts/",
                snippet="Marcus by Goldman Sachs offers 4.50% APY with no minimum deposit. Ally Bank provides 4.25% APY with no monthly fees.",
                published_date="2025-08-20",
                domain="bankrate.com"
            ),
            SearchResult(
                title="Top High-Yield Savings Accounts - NerdWallet",
                url="https://www.nerdwallet.com/best/banking/high-yield-online-savings-accounts",
                snippet="SoFi Bank offers 4.60% APY with no account fees. Capital One 360 provides 4.30% APY with $0 minimum opening deposit.",
                published_date="2025-08-22",
                domain="nerdwallet.com"
            ),
            SearchResult(
                title="Best Savings Account Rates Today - Forbes Advisor",
                url="https://www.forbes.com/advisor/banking/best-high-yield-savings-accounts/",
                snippet="UFB Direct offers 4.57% APY with $0 minimum balance. Discover Bank provides 4.35% APY with no monthly maintenance fees.",
                published_date="2025-08-21",
                domain="forbes.com"
            ),
            SearchResult(
                title="High-Yield Savings Account Rates - Investopedia",
                url="https://www.investopedia.com/best-high-yield-savings-accounts-4770633",
                snippet="American Express Personal Savings offers 4.35% APY with no minimum deposit requirement. CIT Bank provides 4.55% APY.",
                published_date="2025-08-19",
                domain="investopedia.com"
            )
        ]
    
    def filter_reputable_sources(self, results: List[SearchResult]) -> List[SearchResult]:
        """
        Filter search results to only include reputable financial sources
        
        Args:
            results: List of search results
            
        Returns:
            Filtered list of reputable search results
        """
        filtered = []
        
        for result in results:
            domain = result.domain or self._extract_domain(result.url)
            
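            # Accept only an exact match or a subdomain of a reputable domain
            # (a plain substring test would also accept e.g. "notbankrate.com")
            if any(domain == rep or domain.endswith("." + rep) for rep in self.reputable_domains):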
                result.domain = domain
                filtered.append(result)
                
        return filtered
    
    def _extract_domain(self, url: str) -> str:
        """Extract domain from URL"""
        match = re.search(r'https?://(?:www\.)?([^/]+)', url)
        return match.group(1) if match else ""
    
    def extract_hysa_data(self, results: List[SearchResult]) -> List[HYSARecord]:
        """
        Extract HYSA rate data from search result snippets using LLM
        
        Args:
            results: List of filtered search results
            
        Returns:
            List of extracted HYSA records
        """
        extraction_prompt = """
You are a financial data extraction expert. Extract High-Yield Savings Account (HYSA) information from the provided search snippets.

For each snippet, extract:
- Bank name
- APY (Annual Percentage Yield) as a number (e.g., 4.50 for 4.50%)
- Minimum deposit (if mentioned, otherwise "No minimum" or "Not specified")
- Any date information (if available)

Return a JSON array of objects with fields: bank_name, apy, minimum_deposit, as_of_date.
Only include legitimate banks with clear APY information. Skip promotional or intro rates.

Search snippets:
"""
        
        # Combine all snippets for extraction
        snippets_text = "\n\n".join([
            f"Source: {result.title} ({result.domain})\n{result.snippet}"
            for result in results
        ])
        
        try:
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": extraction_prompt},
                    {"role": "user", "content": snippets_text}
                ],
                temperature=0.1
            )
            
            extracted_data = json.loads(response.choices[0].message.content)
            
            # Convert to HYSARecord objects and add source information
            records = []
            for item in extracted_data:
                # Validate APY is a valid number
                try:
                    apy_value = float(item['apy'])
                except (ValueError, TypeError):
                    print(f"⚠️ Skipping {item.get('bank_name', 'Unknown')}: invalid APY value '{item.get('apy')}'")
                    continue
                
                # Skip if APY is unreasonably low or high (sanity check)
                if apy_value < 0.1 or apy_value > 20.0:
                    print(f"⚠️ Skipping {item.get('bank_name', 'Unknown')}: APY {apy_value}% outside reasonable range")
                    continue
                
                # Find matching source for attribution
                source_result = next(
                    (r for r in results if item['bank_name'].lower() in r.snippet.lower()),
                    results[0]  # fallback to first result
                )
                
                record = HYSARecord(
                    bank_name=item['bank_name'],
                    apy=apy_value,
                    minimum_deposit=item.get('minimum_deposit', 'Not specified'),
                    as_of_date=item.get('as_of_date'),
                    source_url=source_result.url,
                    source_title=source_result.title
                )
                records.append(record)
            
            print(f"✅ Successfully extracted {len(records)} valid HYSA records")
            return records
            
        except json.JSONDecodeError as e:
            print(f"❌ Error parsing LLM response as JSON: {e}")
            print(f"   LLM Response: {response.choices[0].message.content[:200]}...")
            return []
        except Exception as e:
            print(f"❌ Error extracting data: {e}")
            return []
    
    def merge_and_dedupe(self, records: List[HYSARecord]) -> List[HYSARecord]:
        """
        Merge and deduplicate HYSA records by bank name
        
        Args:
            records: List of HYSA records
            
        Returns:
            Deduplicated list sorted by APY (highest first)
        """
        # Group by bank name (case-insensitive)
        bank_groups = {}
        
        for record in records:
            bank_key = record.bank_name.lower().strip()
            
            if bank_key not in bank_groups:
                bank_groups[bank_key] = []
            bank_groups[bank_key].append(record)
        
        # For each bank, keep the record with the highest APY;
        # ties are broken by source credibility
        merged = []
        for bank_records in bank_groups.values():
            best_record = max(bank_records, key=lambda r: (r.apy, self._source_credibility_score(r.source_url)))
            merged.append(best_record)
        
        # Sort by APY descending
        return sorted(merged, key=lambda r: r.apy, reverse=True)
    
    def _source_credibility_score(self, url: str) -> int:
        """Assign credibility scores to different sources"""
        domain = self._extract_domain(url)
        
        credibility_map = {
            'bankrate.com': 10,
            'nerdwallet.com': 9,
            'forbes.com': 8,
            'investopedia.com': 8,
            'money.com': 7
        }
        
        for source, score in credibility_map.items():
            if source in domain:
                return score
        return 5  # default score
    
    def synthesize_summary(self, records: List[HYSARecord], user_constraints: str = "no minimum deposit preferred") -> HYSASummary:
        """
        Use LLM to synthesize final HYSA summary
        
        Args:
            records: List of merged HYSA records
            user_constraints: User preferences (e.g., "no minimum deposit")
            
        Returns:
            Complete HYSA summary
        """
        # Take top 5 records
        top_5 = records[:5]
        
        # Prepare data for JSON serialization
        data_for_prompt = []
        for r in top_5:
            data_for_prompt.append({
                'bank': r.bank_name, 
                'apy': r.apy, 
                'minimum': r.minimum_deposit
            })
        
        synthesis_prompt = f"""
You are a financial advisor creating a concise summary of the best High-Yield Savings Account rates available today.

User constraint: {user_constraints}

Create a professional summary with:
1. A brief intro stating the date and scope
2. A top-5 list in format: "Bank Name - APY% - Minimum deposit"
3. A 1-2 sentence takeaway about the rate range and any important caveats
4. Keep it concise and user-ready

Available data:
{json.dumps(data_for_prompt, indent=2)}

Format your response as JSON with fields: intro, takeaway
"""
        
        try:
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": synthesis_prompt},
                    {"role": "user", "content": "Generate the summary now."}
                ],
                temperature=0.3
            )
            
            summary_data = json.loads(response.choices[0].message.content)
            
            # Compile sources
            sources = []
            seen_sources = set()
            
            for record in top_5:
                if record.source_url not in seen_sources:
                    sources.append({
                        'title': record.source_title,
                        'url': record.source_url,
                        'publisher': self._extract_domain(record.source_url),  # safe on empty URLs
                        'as_of': record.as_of_date or 'Recent'
                    })
                    seen_sources.add(record.source_url)
            
            return HYSASummary(
                intro=summary_data['intro'],
                top_rates=top_5,
                takeaway=summary_data['takeaway'],
                sources=sources,
                disclaimer="Rates change frequently and may vary by region; verify on the bank's site."
            )
            
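        except Exception as e:
            print(f"Error in synthesis: {e}")
            # Fallback summary: derive the rate range from the records, not hard-coded values
            if top_5:
                low = min(r.apy for r in top_5)
                high = max(r.apy for r in top_5)
                takeaway = f"Current HYSA rates in this list range from {low:.2f}% to {high:.2f}% APY."
            else:
                takeaway = "No HYSA records were extracted, so no rate range is available."
            return HYSASummary(
                intro=f"Top High-Yield Savings Account rates as of {datetime.now().strftime('%B %d, %Y')}",
                top_rates=top_5,
                takeaway=takeaway,
                sources=[],
                disclaimer="Rates change frequently and may vary by region; verify on the bank's site."
            )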
    
    def find_top_hysa_rates(self, user_query: str = "best high-yield savings rates no minimum") -> HYSASummary:
        """
        Main method to find and summarize top HYSA rates
        
        Args:
            user_query: User's search query
            
        Returns:
            Complete HYSA summary
        """
        print("🔍 Searching for current HYSA rates...")
        
        # Step 1: Search web
        search_queries = [
            f"{user_query} {datetime.now().year}",  # use the current year instead of a fixed one
            "high yield savings account rates today",
            "best HYSA APY site:bankrate.com OR site:nerdwallet.com"
        ]
        
        all_results = []
        for query in search_queries:
            results = self.search_web(query, num_results=5)
            all_results.extend(results)
        
        print(f"📊 Found {len(all_results)} search results")
        
        # Step 2: Filter to reputable sources
        filtered_results = self.filter_reputable_sources(all_results)
        print(f"✅ Filtered to {len(filtered_results)} reputable sources")
        
        # Step 3: Extract HYSA data
        extracted_records = self.extract_hysa_data(filtered_results)
        print(f"🏦 Extracted {len(extracted_records)} HYSA records")
        
        # Step 4: Merge and dedupe
        merged_records = self.merge_and_dedupe(extracted_records)
        print(f"🔄 Merged to {len(merged_records)} unique banks")
        
        # Step 5: Synthesize summary
        summary = self.synthesize_summary(merged_records, user_query)
        print("📝 Generated final summary")
        
        return summary

In [4]:
# Initialize the agent
agent = WebSearchAgent()

print("🤖 Web Search Agent initialized")
print(f"🏛️  Monitoring {len(agent.reputable_domains)} reputable financial sources")
print("🎯 Ready to find top HYSA rates!")

🤖 Web Search Agent initialized
🏛️  Monitoring 11 reputable financial sources
🎯 Ready to find top HYSA rates!


In [5]:
# Test the agent with a typical user query
user_query = "What are the top HYSA rates in the U.S. today for accounts with no minimum?"

print(f"User Query: {user_query}")
print("=" * 60)

# Run the search and analysis
summary = agent.find_top_hysa_rates(user_query)

print("\n" + "=" * 60)
print("✅ Search complete!")

User Query: What are the top HYSA rates in the U.S. today for accounts with no minimum?
🔍 Searching for current HYSA rates...
✅ Found 5 results from Tavily API
✅ Found 5 results from Tavily API
✅ Found 5 results from Tavily API
📊 Found 15 search results
✅ Filtered to 15 reputable sources
⚠️ Skipping Axos Bank: invalid APY value 'Not specified'
⚠️ Skipping Axos Bank: invalid APY value 'Not specified'
✅ Successfully extracted 16 valid HYSA records
🏦 Extracted 16 HYSA records
🔄 Merged to 6 unique banks
📝 Generated final summary

✅ Search complete!

In [6]:
def format_hysa_summary(summary: HYSASummary) -> str:
    """
    Format the HYSA summary for user-friendly display
    
    Args:
        summary: HYSA summary object
        
    Returns:
        Formatted string for display
    """
    output = []
    
    # Header and intro
    output.append("# 🏦 Top High-Yield Savings Account Rates")
    output.append("")
    output.append(summary.intro)
    output.append("")
    
    # Top 5 rates
    output.append("## 📈 Top 5 HYSA Rates")
    output.append("")
    
    for i, rate in enumerate(summary.top_rates, 1):
        minimum = rate.minimum_deposit or "Not specified"
        if str(minimum).lower() in ["no minimum", "$0", "0"]:
            minimum = "✅ No minimum"
        
        output.append(f"{i}. **{rate.bank_name}** - {rate.apy:.2f}% APY - {minimum}")
    
    output.append("")
    
    # Takeaway
    output.append("## 💡 Key Takeaway")
    output.append("")
    output.append(summary.takeaway)
    output.append("")
    
    # Sources
    if summary.sources:
        output.append("## 📚 Sources")
        output.append("")
        
        for source in summary.sources:
            publisher = source['publisher'].replace('.com', '').title()
            as_of = f" (as of {source['as_of']})" if source['as_of'] != 'Recent' else ""
            output.append(f"- [{publisher}]({source['url']}){as_of}")
        
        output.append("")
    
    # Disclaimer
    output.append("## ⚠️ Important Note")
    output.append("")
    output.append(summary.disclaimer)
    
    return "\n".join(output)

# Display the formatted summary
formatted_output = format_hysa_summary(summary)
print(formatted_output)

# 🏦 Top High-Yield Savings Account Rates

As of today, here are the top 5 High-Yield Savings Account rates in the U.S. for accounts with no minimum deposit:

## 📈 Top 5 HYSA Rates

1. **Not specified** - 4.88% APY - $10,000
2. **Axos Bank** - 4.51% APY - Not specified
3. **Peak Bank** - 4.44% APY - Not specified
4. **Openbank** - 4.20% APY - $500
5. **Marcus by Goldman Sachs** - 3.65% APY - ✅ No minimum

## 💡 Key Takeaway

The APY rates for these accounts range from 3.65% to 4.51%. It's important to note that 'Not specified' minimum could still mean there is a minimum deposit required, so it's recommended to check with the specific bank for details.

## 📚 Sources

- [Bankrate](https://www.bankrate.com/banking/savings/high-yield-savings-rates-today-august-11-2025/) (as of October 16, 2025)
- [Nerdwallet](https://www.nerdwallet.com/best/banking/high-yield-online-savings-accounts) (as of October 2025)
- [Bankrate](https://www.bankrate.com/banking/savings/

## 🎯 Key Takeaways

**What You Built:**
- ✅ **Web Search Integration**: Real-time search with Tavily API and fallback handling
- ✅ **Source Filtering**: Allow-list filtering of search results to reputable financial domains
- ✅ **LLM Data Extraction**: Parsed unstructured financial data into structured records
- ✅ **Smart Deduplication**: Kept one record per bank, choosing the highest APY and breaking ties by source credibility
- ✅ **Professional Output**: User-ready financial summaries with proper disclaimers
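
As a small illustration of the source-filtering idea, the sketch below checks a URL against an allow-list by exact host or subdomain rather than by substring. It is a standalone example under stated assumptions: `is_reputable` and `REPUTABLE_DOMAINS` are hypothetical names, and the set holds only a few of the domains used in the notebook.

```python
from urllib.parse import urlparse

# Hypothetical subset of the allow-list, for illustration only.
REPUTABLE_DOMAINS = {"bankrate.com", "nerdwallet.com", "forbes.com"}


def is_reputable(url: str, allowed=frozenset(REPUTABLE_DOMAINS)) -> bool:
    """True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


print(is_reputable("https://www.bankrate.com/banking/savings/"))  # True
print(is_reputable("https://bankrate.com.evil.example/rates"))    # False
print(is_reputable("https://notbankrate.com/rates"))              # False
```

Matching on the parsed hostname avoids accepting look-alike hosts that merely contain an allowed domain somewhere in the string.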

**Production Patterns Learned:**
- API integration with error handling and fallbacks
- Source credibility scoring for reliable information
- LLM-powered data extraction from web content
- Financial data normalization and presentation
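
One practical wrinkle with LLM-powered extraction is that a model may wrap its JSON in a Markdown code fence, which makes a direct `json.loads` call fail. The sketch below shows one way to tolerate that. It is an assumption-laden illustration, not part of the notebook: `parse_json_array` is a hypothetical helper, and it handles only the single-fence case.

```python
import json
from typing import Any, List

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_json_array(text: str) -> List[Any]:
    """Parse a JSON array from model output, tolerating one Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag such as "json").
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rstrip()
        if cleaned.endswith(FENCE):
            cleaned = cleaned[: -len(FENCE)]
    data = json.loads(cleaned)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    return data


raw = FENCE + 'json\n[{"bank_name": "Example Bank", "apy": 4.5}]\n' + FENCE
records = parse_json_array(raw)
print(records[0]["bank_name"])  # Example Bank
```

Raising `ValueError` for a non-array result lets the caller treat malformed shapes the same way as malformed JSON, since `json.JSONDecodeError` is itself a `ValueError` subclass.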

**Next Steps:**
- Try different search queries and constraints
- Extend to other financial products (CDs, money market accounts)
- Add rate change tracking over time
- Implement user preference learning
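
For the rate-tracking idea, a first step could be comparing two snapshots of bank-to-APY mappings. The sketch below is only a starting point under stated assumptions: `rate_changes` is a hypothetical helper, the snapshots are plain dictionaries, and the sample values are made up for illustration rather than taken from any source.

```python
from typing import Dict


def rate_changes(
    previous: Dict[str, float],
    current: Dict[str, float],
    threshold: float = 0.01,
) -> Dict[str, float]:
    """Return APY deltas (current minus previous) for banks present in both snapshots."""
    changes: Dict[str, float] = {}
    for bank, apy in current.items():
        if bank not in previous:
            continue  # new banks have no earlier rate to compare against
        delta = round(apy - previous[bank], 2)
        if abs(delta) >= threshold:
            changes[bank] = delta
    return changes


# Illustrative values only.
last_week = {"Bank A": 4.25, "Bank B": 4.60}
today = {"Bank A": 4.10, "Bank B": 4.60, "Bank C": 4.00}
print(rate_changes(last_week, today))  # {'Bank A': -0.15}
```

Persisting each snapshot with its retrieval date would be the natural next step, since the comparison is only meaningful when both snapshots are dated.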