# MCP Historical Weather Comparison

## Overview

This notebook demonstrates how to use **Model Context Protocol (MCP)** to extend Claude's capabilities with access to external data. We'll build a system that lets Claude retrieve historical weather data and compare annual weather statistics between two locations.

### What is MCP?

Model Context Protocol (MCP) is a standard that enables AI models to securely connect with external data sources and tools. Instead of being limited to training data, models can access live information, APIs, and services.
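
To make this more concrete: MCP messages follow JSON-RPC 2.0, and a client asks a server to run a tool with a `tools/call` request. The sketch below builds such a request as a plain Python dictionary. The tool name `get_weather_data` and its arguments are hypothetical, chosen to match this notebook's theme; they are not defined by the protocol.

```python
import json

# A sketch of an MCP tool invocation. MCP messages follow JSON-RPC 2.0;
# the tool name and arguments below are hypothetical examples.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather_data",  # hypothetical tool exposed by a server
        "arguments": {
            "location": "San Francisco",
            "start_date": "2000-01-01",
            "end_date": "2019-12-31",
        },
    },
}

# Serialize the request the way it would travel between client and server
print(json.dumps(tool_call_request, indent=2))
```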

### What You'll Learn

1. **Limitations of LLMs without external tools** - See how Claude responds to weather queries without access to real data
2. **MCP Integration** - Connect Claude to a historical weather API
3. **Data Visualization** - Create aesthetically pleasing charts comparing weather patterns
4. **Interactive Analysis** - Ask Claude to analyze and compare weather data between locations

### Goals

By the end of this notebook, you'll have a working system that can:
- Fetch historical weather data for any location
- Compare annual weather statistics between two cities
- Generate visualizations and insights about weather patterns
- Demonstrate the power of extending LLMs with external data sources

## Setup

### Environment Setup

This notebook is designed to work in both Google Colab and local Jupyter environments. The following cell detects the environment and, when running in Colab, installs the required dependencies. In a local environment, install the same packages yourself before running the notebook.

In [1]:
import sys

# Check if we're running in Google Colab
IN_COLAB = 'google.colab' in sys.modules

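# Install required packages (Colab only; install them yourself when running locally)
if IN_COLAB:
    import shlex

    packages = [
        "anthropic>=0.66.0",
        "altair>=5.5.0",
        "openmeteo-requests>=1.7.2",
        "pandas>=2.3.2",
        "requests>=2.32.5",
        "retry-requests>=2.0.0",
        "requests-cache==1.2.1",
    ]
    # Quote each requirement so the shell doesn't treat ">=" as a redirect
    pip_args = " ".join(shlex.quote(p) for p in packages)
    !pip install {pip_args}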
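
For a local environment, a quick check like the following can report which modules still need to be installed. This is a minimal sketch: the helper `find_missing_modules` is hypothetical, and the list of import names is assumed from the packages this notebook imports.

```python
import importlib.util

# Import names (not pip names) of the packages this notebook relies on.
REQUIRED_MODULES = [
    "anthropic", "altair", "numpy", "openmeteo_requests", "pandas",
    "pydantic", "requests", "requests_cache", "retry_requests",
]

def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```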

### Python Package Imports

Now let's import all the necessary packages for our weather comparison system.

In [2]:
import os
import json
import getpass
from typing import Optional, Dict, Any
from datetime import datetime, timedelta

# Data handling and validation
import requests
from pydantic import BaseModel, Field, validator

# Visualization
import altair as alt
alt.data_transformers.enable('json')

# For displaying rich notebook content
from IPython.display import Image, Markdown, display

# Claude API
from anthropic import Anthropic

import requests_cache  # for caching API responses
from retry_requests import retry  # for retrying API requests
import openmeteo_requests  # for making API requests to Open-Meteo

import numpy as np  # for numerical operations
import pandas as pd  # for data manipulation

print("All packages imported successfully!")

All packages imported successfully!


### Anthropic API Key Setup

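We need to obtain your Anthropic API key without hard-coding it in the notebook. The function below reads the key from Colab's user secrets (in Colab) or from the `ANTHROPIC_API_KEY` environment variable (locally), and falls back to an interactive prompt if neither is set.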

In [3]:
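def get_api_key() -> str:
    """Return the Anthropic API key, prompting for it if it isn't configured."""
    api_key = None
    try:
        if IN_COLAB:
            # Colab stores secrets in the notebook's user data
            from google.colab import userdata
            api_key = userdata.get('ANTHROPIC_API_KEY')
        else:
            api_key = os.environ.get("ANTHROPIC_API_KEY")
    except Exception:
        # Fall through to the interactive prompt below
        api_key = None

    if not api_key:
        # os.environ.get returns None instead of raising, so check explicitly
        api_key = getpass.getpass("Enter your Anthropic API key: ")
    return api_key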

## Motivation: Do we actually need to extend LLMs?

Before diving into MCP integration, let's first see what happens when we ask Claude to compare weather data between two locations **without giving it access to any external tools or data sources**.

This will demonstrate the fundamental limitation of LLMs: they can only work with information from their training data, which has a knowledge cutoff and may not include specific, current, or detailed data.

In [4]:
def ask_claude_without_tools(question: str) -> str:
    """
    Send a question to Claude without any external tools or data access.
    
    Args:
        question: The question to ask Claude
    
    Returns:
        Claude's response as a string
    """
    try:
        client = Anthropic(api_key=get_api_key())
        message = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1000,
            messages=[
                {
                    "role": "user", 
                    "content": question
                }
            ]
        )
        return message.content[0].text
    except Exception as e:
        return f"Error: {e}"

# Test Claude's response without external data
weather_question = """
Please compare the annual weather statistics for San Francisco, California
and Redwood City, California for the years 2000-2023. 
I'd like to see:

1. Average temperatures by year
2. Total precipitation by year  
3. Number of sunny days by year
4. Humidity levels by year
5. Any notable weather patterns or extremes

Please provide specific data and create a comparison showing which city had more favorable weather conditions.
"""

print("🤖 Asking Claude about weather data WITHOUT external tools...")
print("="*70)
print(f"Question: {weather_question}")
print("="*70)
print("Claude's Response:")
print()

response = ask_claude_without_tools(weather_question)
Markdown(response)

🤖 Asking Claude about weather data WITHOUT external tools...
Question: 
Please compare the annual weather statistics for San Francisco, California
and Redwood City, California for the years 2000-2023. 
I'd like to see:

1. Average temperatures by year
2. Total precipitation by year  
3. Number of sunny days by year
4. Humidity levels by year
5. Any notable weather patterns or extremes

Please provide specific data and create a comparison showing which city had more favorable weather conditions.

Claude's Response:



I don't have access to specific historical weather data for San Francisco and Redwood City from 2000-2023. However, I can provide you with general information about how these two cities typically compare and suggest where to find the detailed data you're looking for.

## General Climate Comparison

**San Francisco vs. Redwood City (typical patterns):**

### Temperature
- **San Francisco**: More moderate year-round due to ocean influence
  - Summer highs: ~65-70°F
  - Winter lows: ~45-50°F
- **Redwood City**: More continental climate with greater temperature swings
  - Summer highs: ~75-85°F
  - Winter lows: ~40-45°F

### Precipitation
- Both cities have Mediterranean climates with wet winters/dry summers
- **San Francisco**: ~23 inches annually
- **Redwood City**: ~20 inches annually

### Sunshine
- **Redwood City**: Typically sunnier, especially in summer
- **San Francisco**: More fog, particularly summer mornings

### Humidity
- **San Francisco**: Higher humidity due to ocean proximity
- **Redwood City**: Lower humidity, more inland location

## Where to Find Specific Data

For the detailed yearly statistics you requested (2000-2023), I recommend:

1. **NOAA Climate Data Online** (climate.gov)
2. **Weather Underground Historical Weather**
3. **National Weather Service Climate Data**
4. **California Climate Data Archive**

These sources can provide the specific annual breakdowns you're looking for, including temperature averages, precipitation totals, sunny days, and humidity levels.

Would you like me to help you interpret the data once you gather it, or provide more specific guidance on accessing these weather databases?

### Key Observations

As you can see from Claude's response above, without access to external data sources, the model has several limitations:

1. **No real-time data**: Claude can't access current or specific historical weather data
2. **General knowledge only**: Responses are based on general patterns from training data
3. **No specific metrics**: Can't provide exact precipitation amounts, temperatures, or day counts
4. **No visualizations**: Can't create charts or graphs from actual data

This demonstrates why **Model Context Protocol (MCP)** is valuable - it bridges the gap between the model's reasoning capabilities and real-world data access.

## Analysis

*This section will contain analysis of weather data patterns and comparisons between locations once we implement the MCP integration.*

```mermaid
graph TD
    A[Weather API] --> B[MCP Server]
    B --> C[Claude Client]
    C --> D[Data Analysis]
    D --> E[Visualization]
    E --> F[User Insights]
```

In this section we will create some Python functions that:
- retrieve daily data from the Open-Meteo Weather API for a list of locations
- aggregate daily data to annual summaries
- create a time-series chart, plotting the annual data and a trend

In [5]:
class Location(BaseModel):
    name: str
    longitude: float
    latitude: float

## Get weather data

In [6]:
api_url = "https://archive-api.open-meteo.com/v1/archive"

weather_variables = [
    "temperature_2m_max",
    "temperature_2m_mean",
    "temperature_2m_min",
    "rain_sum",
    "snowfall_sum",
    "precipitation_hours",
    "sunshine_duration"
  ]

# Setup an Open-Meteo API client with a cache and retry mechanism
cache_session = requests_cache.CachedSession('.cache', expire_after=3600)
retry_session = retry(cache_session, retries=5, backoff_factor=0.2)
openmeteo = openmeteo_requests.Client(session=retry_session)

In [7]:
def get_weather_data(
      locations: list[Location],
      start_date: str,
      end_date: str,
      variables: list[str]
    ):
    """
    Get weather data for one or more locations.
    
    Args:
        locations: List of Location objects
        start_date: Start date in YYYY-MM-DD format
        end_date: End date in YYYY-MM-DD format
        variables: List of weather variables to retrieve
    
    Returns:
        Pandas DataFrame with weather data
    """
    
    def parse_response(variables, response, location_name):
        daily = response.Daily()
        # Rebuild the daily timestamps from the start, end, and interval (all in seconds)
        daily_data_dict = {
            "date": pd.date_range(
                start=pd.to_datetime(daily.Time(), unit="s", utc=True),
                end=pd.to_datetime(daily.TimeEnd(), unit="s", utc=True),
                freq=pd.Timedelta(seconds=daily.Interval()),
                inclusive="left",
            )
        }

        # Add the variable data.
        for i, variable in enumerate(variables):
            daily_data_dict[variable] = daily.Variables(i).ValuesAsNumpy()

        # Add a column for the location name.
        daily_data_dict["location_name"] = location_name

        return pd.DataFrame(daily_data_dict)

  
    params = {
        "latitude": [x.latitude for x in locations],
        "longitude": [x.longitude for x in locations],
        "start_date": start_date,
        "end_date": end_date,
        "daily": variables,
    }

    # Query for weather data and get one response per location.
    responses = openmeteo.weather_api(api_url, params=params)

    # Concatenate all of the responses into a single dataframe
    daily_df_list = [parse_response(variables, response, locations[i].name) for i, response in enumerate(responses)]
    daily_df = pd.concat(daily_df_list, axis=0)

    return daily_df
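
The `date` column in `parse_response` is rebuilt from three values in the response: a start time, an end time, and an interval, all expressed in seconds. The sketch below reproduces that step in isolation with hand-picked stand-in values (three days starting 2000-01-01 UTC at a one-day interval), so it runs without calling the API.

```python
import pandas as pd

# Hand-picked stand-ins for daily.Time(), daily.TimeEnd(), and daily.Interval()
start_seconds = 946684800            # 2000-01-01 00:00:00 UTC
interval_seconds = 86400             # one day
end_seconds = start_seconds + 3 * interval_seconds

dates = pd.date_range(
    start=pd.to_datetime(start_seconds, unit="s", utc=True),
    end=pd.to_datetime(end_seconds, unit="s", utc=True),
    freq=pd.Timedelta(seconds=interval_seconds),
    inclusive="left",  # keep the start timestamp, drop the end timestamp
)
print(dates)
```

With `inclusive="left"`, the end timestamp itself is excluded, so the number of dates matches the number of daily values returned for each variable.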

### Try it out

In [8]:
test_locations = [
  Location(
    name='San Francisco',
    latitude=37.7749,
    longitude=-122.4194,
  ),
  Location(
    name='Redwood City',
    latitude=37.4848,
    longitude=-122.2281,
  )
]

# Fetch 20 years of daily data for both locations
daily_data = get_weather_data(
  locations=test_locations,
  start_date="2000-01-01",
  end_date="2019-12-31",
  variables=['temperature_2m_mean', 'temperature_2m_max', 'rain_sum', 'sunshine_duration']
)
daily_data

| | date | temperature_2m_mean | temperature_2m_max | rain_sum | sunshine_duration | location_name |
|---|---|---|---|---|---|---|
| 0 | 2000-01-01 00:00:00+00:00 | 7.517750 | 11.574000 | 0.0 | 29323.564453 | San Francisco |
| 1 | 2000-01-02 00:00:00+00:00 | 8.274000 | 12.074000 | 0.0 | 30698.628906 | San Francisco |
| 2 | 2000-01-03 00:00:00+00:00 | 7.753168 | 12.524000 | 0.0 | 30748.666016 | San Francisco |
| 3 | 2000-01-04 00:00:00+00:00 | 8.513582 | 11.974000 | 0.6 | 7515.774902 | San Francisco |
| 4 | 2000-01-05 00:00:00+00:00 | 10.132333 | 15.374000 | 0.0 | 25200.000000 | San Francisco |
| ... | ... | ... | ... | ... | ... | ... |
| 7300 | 2019-12-27 00:00:00+00:00 | 9.049335 | 12.978499 | 0.0 | 30588.695312 | Redwood City |
| 7301 | 2019-12-28 00:00:00+00:00 | 8.543084 | 13.328500 | 0.0 | 30485.902344 | Redwood City |
| 7302 | 2019-12-29 00:00:00+00:00 | 10.351417 | 13.728499 | 3.6 | 12621.796875 | Redwood City |
| 7303 | 2019-12-30 00:00:00+00:00 | 10.066000 | 13.928500 | 5.9 | 25244.691406 | Redwood City |
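
The `sunshine_duration` values look large because Open-Meteo reports this variable in seconds per day. A small helper makes the numbers easier to read; `seconds_to_hours` is a hypothetical name used for illustration and is not part of the notebook's functions.

```python
SECONDS_PER_HOUR = 3600

def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / SECONDS_PER_HOUR

# 25200 seconds is the San Francisco value for 2000-01-05 in the table above
print(seconds_to_hours(25200.0))  # 7.0
```

The same conversion could be applied to the whole column, for example with `daily_data["sunshine_duration"] / 3600`.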


## Calculate Annual Statistics

To enable better long-term comparisons, we can write a function that counts, for each year, the days on which a variable falls within a threshold range: at or above a minimum, at or below a maximum, or both.

Examples:
- Days when the maximum temperature reaches 30 degrees C or more
- Days when the mean temperature is between 20 and 25 degrees C
- Days when rainfall reaches 2 mm or more
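
The counting rule can be illustrated on a handful of values before moving to pandas. The sketch below uses plain Python with inclusive bounds; the temperatures are made up for illustration and do not come from the API.

```python
from collections import Counter
from datetime import date

# Made-up daily maximum temperatures (degrees C), for illustration only
daily_max = [
    (date(2000, 7, 1), 31.0),
    (date(2000, 7, 2), 29.5),
    (date(2000, 7, 3), 30.0),
    (date(2001, 7, 1), 33.2),
    (date(2001, 7, 2), 28.0),
]

def count_days_in_range(rows, threshold_min=None, threshold_max=None):
    """Count days per year whose value lies within the inclusive bounds."""
    counts = Counter()
    for day, value in rows:
        above_min = threshold_min is None or value >= threshold_min
        below_max = threshold_max is None or value <= threshold_max
        # Adding 0 still registers the year, so years with no matches appear as 0
        counts[day.year] += int(above_min and below_max)
    return dict(counts)

print(count_days_in_range(daily_max, threshold_min=30))  # {2000: 2, 2001: 1}
```

Because the bounds are inclusive, the day with exactly 30.0 degrees counts toward the 2000 total.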

In [9]:
def calculate_annual_stats(
    daily_data: pd.DataFrame,
    variable: str,
    threshold_min: Optional[float] = None,
    threshold_max: Optional[float] = None
    ) -> pd.DataFrame:
  """
  Calculate annual statistics for the given daily data.
  
  Args:
    daily_data: DataFrame containing weather data
    variable: Name of the variable column to analyze
    threshold_min: Optional minimum threshold (inclusive)
    threshold_max: Optional maximum threshold (inclusive)
  
  Returns:
    DataFrame with columns: year, count, location_name
  """
  # Validate threshold parameters
  if threshold_min is None and threshold_max is None:
    raise ValueError("At least one of threshold_min or threshold_max must be provided")
  
  # Get unique location names from the daily data
  locations = daily_data['location_name'].unique()
  
  # Initialize list to store results
  results = []
  
  # Process each location
  for location in locations:
    # Filter data for this location and make an explicit copy to avoid SettingWithCopyWarning
    location_data = daily_data[daily_data['location_name'] == location].copy()
    
    # Extract year from date column
    location_data['year'] = pd.to_datetime(location_data['date']).dt.year
    
    # Apply threshold filters
    def count_days_in_range(x):
      mask = pd.Series(True, index=x.index)  # Start with all True
      
      if threshold_min is not None:
        mask = mask & (x >= threshold_min)
      
      if threshold_max is not None:
        mask = mask & (x <= threshold_max)
      
      return mask.sum()
    
    yearly_counts = location_data.groupby('year')[variable].apply(count_days_in_range)
    
    # Convert to dataframe
    yearly_df = pd.DataFrame({
      'year': yearly_counts.index,
      'count': yearly_counts.values,
      'location_name': location
    })
    
    results.append(yearly_df)
    
  # Combine all results
  return pd.concat(results, axis=0).reset_index(drop=True)

### Try it out

In [10]:
annual_stats = calculate_annual_stats(
    daily_data,
    variable='temperature_2m_mean',
    threshold_min=20,
    threshold_max=25
)
annual_stats

| | year | count | location_name |
|---|---|---|---|
| 0 | 2000 | 9 | San Francisco |
| 1 | 2001 | 5 | San Francisco |
| 2 | 2002 | 7 | San Francisco |
| 3 | 2003 | 13 | San Francisco |
| 4 | 2004 | 13 | San Francisco |
| 5 | 2005 | 0 | San Francisco |
| 6 | 2006 | 5 | San Francisco |
| 7 | 2007 | 7 | San Francisco |
| 8 | 2008 | 15 | San Francisco |
| 9 | 2009 | 9 | San Francisco |


## Create a Timeseries Chart

Even tables of annual statistics can get pretty long, so we create a function for charting the data.

In [11]:
def create_annual_stats_chart(
    annual_stats: pd.DataFrame,
    title: str
  ) -> alt.Chart:
  """
  Create a chart showing the annual statistics.
  """
  # Base chart with data points
  base = alt.Chart(annual_stats).encode(
    x='year:O',
    y='count:Q',
    color='location_name:N'
  )

  # Create line chart
  lines = base.mark_line()

  # Add trend lines
  trend_lines = base.transform_regression(
    'year', 'count', 
    groupby=['location_name']
  ).mark_line(
    strokeDash=[5,5]
  ).encode(
    color='location_name:N'
  )

  # Combine the line and trend lines
  return (lines + trend_lines).properties(
    title=title
  )

### Try it out

In [12]:
create_annual_stats_chart(
    annual_stats,
    title='Temperature Mean between 20 and 25 degrees C'
)
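
The dashed trend lines come from `transform_regression`, which fits a linear least-squares line by default. As a rough cross-check, the slope can be computed by hand. The sketch below uses only the ten San Francisco rows displayed in the annual statistics table above (2000 to 2009), so its result will differ from the trend fitted over the full date range; `linear_trend` is a hypothetical helper, not part of the notebook.

```python
# San Francisco counts for 2000-2009, copied from the annual statistics table
years = list(range(2000, 2010))
counts = [9, 5, 7, 13, 13, 0, 5, 7, 15, 9]

def linear_trend(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = linear_trend(years, counts)
print(f"Trend: {slope:+.2f} days per year")
```

A positive slope means the number of qualifying days tended to increase over the period, though with year-to-year swings this large, a ten-year trend should be read with caution.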