# Data Pull Notebook

This notebook demonstrates how to fetch and process energy market data.

## Test Individual Data Source Fetchers

The following cells test the data fetching function for each individual data source (ENTSO-E, Elexon, EIA, Nord Pool, and Mock).
Testing each source in isolation makes it easier to debug problems with a specific API or configuration.

**Common reasons for failures include:**
- Invalid, expired, or incorrectly configured API keys.
- API endpoint changes or deprecation.
- Data not yet published for the requested date/time.
- Incorrect request parameters (e.g., date format, region codes).
- Network issues or API provider downtime.
- Rate limiting by the API provider.

Each fetcher prints the exact URL it requests to help with debugging. These URLs can include the full API key, so clear the cell outputs before sharing the notebook.
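
Because printed request URLs can carry credentials, it may help to mask them before display. The following is a minimal sketch using only the standard library; `mask_url` and the `SECRET_PARAMS` set are hypothetical names chosen for this example and are not part of the project's fetchers.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of secret-bearing query parameters; extend as needed.
SECRET_PARAMS = {"securityToken", "api_key"}


def mask_url(url: str, visible: int = 5) -> str:
    """Return the URL with secret query values cut down to a short prefix."""
    parts = urlsplit(url)
    masked = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in SECRET_PARAMS:
            value = value[:visible] + "..."
        masked.append((name, value))
    # safe=":" keeps timestamps such as 00:00:00Z readable in the printout.
    return urlunsplit(parts._replace(query=urlencode(masked, safe=":")))


print(mask_url("https://example.com/api?series_id=ABC&api_key=0123456789abcdef"))
```

Masking at print time keeps the real request untouched while the logged URL stays safe to paste into an issue or chat.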

In [1]:
import pandas as pd
from datetime import datetime, timedelta, timezone
import sys
import os

# Adjust path if running notebook from a different directory relative to src
module_path = os.path.abspath(os.path.join(os.getcwd(), '..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from src.fetching.price_fetchers import (
    fetch_day_ahead_prices,  # ENTSO-E
    fetch_elexon_prices,
    fetch_eia_prices,
    fetch_nord_pool_prices,
    get_day_ahead_prices # Smart wrapper
)
from src.utils.mock_data_generator import generate_mock_price_data
from src import config

# Use a recent date for testing, computed relative to the current time in UTC.
# "Yesterday" is usually enough for day-ahead data; going back 4 days leaves
# extra margin for sources that publish late.
test_date = datetime.now(timezone.utc) - timedelta(days=4)
# You can also test with a specific past date known to have data:
# test_date = datetime(2024, 5, 20, tzinfo=timezone.utc) 

print("--- Initializing Data Fetch Test ---")
print(f"Attempting to fetch data for date (UTC): {test_date.strftime('%Y-%m-%d')}")
print(f"Current DATA_SOURCE in .env: {config.DATA_SOURCE}")
print(f"Preferred data sources in .env: {config.PREFERRED_DATA_SOURCES}")
print("-" * 70 + "\n")

--- Initializing Data Fetch Test ---
Attempting to fetch data for date (UTC): 2025-05-23
Current DATA_SOURCE in .env: elexon" # Options: "auto", "mock", "entsoe", "elexon", "eia", "nordpool", "all
Preferred data sources in .env: ['entsoe', 'elexon', 'eia', 'nordpool']
----------------------------------------------------------------------



## ENTSO-E Data Source

The ENTSO-E Transparency Platform provides market data for the European electricity market. The `fetch_day_ahead_prices` function connects to ENTSO-E's API to retrieve day-ahead electricity prices.

**Key Requirements:**
- Valid ENTSO-E API key configured in the `.env` file
- Correct domain configuration (the default here is Great Britain, `10YGB----------A`)
- Data availability for the requested date

**Common Issues:**
- API key expiration or validation issues
- Region/domain code misconfiguration
- Data not yet published for the requested period
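
Two of the request parameters are easy to get wrong: the period strings and the domain codes. The sketch below shows one way to build the `periodStart`/`periodEnd` pair for a single UTC day and to run a basic shape check on a domain code. Both helper names are hypothetical, and the shape check only tests for 16 characters from the EIC character set; it does not validate the check character or confirm that the area exists.

```python
import re
from datetime import datetime, timezone


def entsoe_period(date: datetime) -> tuple[str, str]:
    """Build periodStart/periodEnd strings (yyyyMMddHHmm) covering one UTC day.

    Expects a timezone-aware datetime; a naive one would be read as local time.
    """
    day = date.astimezone(timezone.utc).strftime("%Y%m%d")
    return day + "0000", day + "2359"


def looks_like_eic(code: str) -> bool:
    """Shape check only: 16 characters drawn from digits, A-Z and hyphen."""
    return re.fullmatch(r"[0-9A-Z-]{16}", code) is not None


start, end = entsoe_period(datetime(2025, 5, 23, tzinfo=timezone.utc))
print(start, end)
print(looks_like_eic("10YGB----------A"))
```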

In [2]:
# --- Test ENTSO-E ---
print("### Testing ENTSO-E (fetch_day_ahead_prices) ###")
if config.ENTSOE_API_KEY:
    print(f"ENTSOE_API_KEY found: {config.ENTSOE_API_KEY[:5]}... (partially hidden)")
    
    # Print the parameters and URL that will be constructed
    print("ENTSO-E request parameters:")
    day = test_date.strftime("%Y%m%d")
    params = {
        "documentType": config.ENTSOE_DOC_TYPE,
        "processType": config.ENTSOE_PROCESS_TYPE,
        "in_Domain": config.ENTSOE_IN_DOMAIN,
        "out_Domain": config.ENTSOE_OUT_DOMAIN,
        "periodStart": day + "0000",
        "periodEnd": day + "2359",
        "securityToken": config.ENTSOE_API_KEY[:5] + "..." # Show only start of key
    }
    print(params)
    
    # Build a preview URL for display; the key in params is already masked above.
    import urllib.parse
    base_url = config.ENTSOE_API_URL
    url_preview = f"{base_url}?{urllib.parse.urlencode(params)}"
    print(f"URL preview: {url_preview}")
    print("Note: The actual URL will be properly encoded by the requests library")
    
    try:
        df_entsoe = fetch_day_ahead_prices(date=test_date)
        if not df_entsoe.empty:
            print("ENTSO-E data fetched successfully:")
            print(df_entsoe.head())
        else:
            print("No data returned from ENTSO-E. This could mean the API call was successful but yielded no data for the period, or an error occurred silently within the fetcher (check console for detailed error messages from the fetcher).")
    except Exception as e:
        print(f"An unexpected error occurred while calling fetch_day_ahead_prices (ENTSO-E): {e}")
        print("This might indicate a problem with the request construction or response handling if not caught by the inner try-except.")
else:
    print("ENTSOE_API_KEY not configured in .env. Skipping ENTSO-E test.")
print("Common ENTSO-E Issues: Invalid/expired API key (403 Forbidden), incorrect domain codes, or data not available for the specific country/period.")
print("-" * 70 + "\n")

### Testing ENTSO-E (fetch_day_ahead_prices) ###
ENTSOE_API_KEY found: 3da45... (partially hidden)
ENTSO-E request parameters:
{'documentType': 'A44', 'processType': 'A01', 'in_Domain': '10YGB----------A', 'out_Domain': '10YGB----------A', 'periodStart': '202505230000', 'periodEnd': '202505232359', 'securityToken': '3da45...'}
URL preview: https://web-api.tp.entsoe.eu/api?documentType=A44&processType=A01&in_Domain=10YGB----------A&out_Domain=10YGB----------A&periodStart=202505230000&periodEnd=202505232359&securityToken=3da45...
Note: The actual URL will be properly encoded by the requests library
ENTSO-E Request URL: https://web-api.tp.entsoe.eu/api?documentType=A44&processType=A01&in_Domain=10YGB----------A&out_Domain=10YGB----------A&periodStart=202505230000&periodEnd=202505232359&securityToken=3da45...
Error processing ENTSO-E data: xpath does not return any nodes or attributes. Be sure to specify in `xpath` the parent nodes of children and attributes to 

## Elexon Data Source

Elexon operates the Balancing Mechanism Reporting Service (BMRS), which provides key data on the GB electricity balancing and settlement arrangements. The `fetch_elexon_prices` function can access several datasets:

- **MID**: Market Index Data - provides reference prices for trading
- **BOAL**: Bid-Offer Acceptances - records of balancing actions
- **BOD**: Bid-Offer Data - detailed pricing information for specific balancing mechanism units

**Key Requirements:**
- Valid Elexon API key configured in the `.env` file
- Relevant BM unit IDs when requesting the BOD dataset
- Correct dataset name (`MID`, `BOAL`, or `BOD`)
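
Elexon data is indexed by settlement date and half-hourly settlement period, both defined in UK local time, so a UTC timestamp late in the evening can belong to the next settlement date during British Summer Time. The sketch below illustrates that mapping. The function name is hypothetical, and `zoneinfo` needs time zone data to be available (on platforms that lack it, the `tzdata` package supplies it).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def settlement_date_and_period(ts: datetime) -> tuple[str, int]:
    """Map a timezone-aware timestamp to (settlement date, period number)."""
    ts_utc = ts.astimezone(timezone.utc)
    local = ts_utc.astimezone(LONDON)
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # Subtracting across different tzinfo objects measures real elapsed time,
    # which keeps clock-change days (46 or 50 periods) consistent.
    elapsed = ts_utc - midnight
    period = int(elapsed.total_seconds() // 1800) + 1
    return local.date().isoformat(), period


print(settlement_date_and_period(datetime(2025, 5, 23, 22, 30, tzinfo=timezone.utc)))
print(settlement_date_and_period(datetime(2025, 5, 23, 23, 0, tzinfo=timezone.utc)))
```

For the test date used in this notebook, 22:30 UTC maps to period 48 of 2025-05-23 and 23:00 UTC to period 1 of 2025-05-24, which agrees with the `settlementDate` and `settlementPeriod` columns in the MID output.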

In [3]:
# --- Test Elexon ---
print("### Testing Elexon Price Datasets ###")
if config.ELEXON_API_KEY:
    print(f"ELEXON_API_KEY found: {config.ELEXON_API_KEY[:5]}... (partially hidden)")
    
    # Dictionary to store results from each dataset
    elexon_datasets = {}
    
    # For MID and BOAL datasets (no BM units needed)
    for dataset in ["MID", "BOAL"]:
        print(f"\nFetching {dataset} dataset...")
        try:
            df = fetch_elexon_prices(date=test_date, dataset=dataset)
            if not df.empty:
                print(f"{dataset} data fetched successfully:")
                print(df.head())
                elexon_datasets[dataset] = df
            else:
                print(f"No data returned for {dataset}. Check if data is available for the date.")
        except Exception as e:
            print(f"Error fetching {dataset} dataset: {e}")
    
    # For BOD dataset (requires BM units)
    print("\nFetching BOD dataset...")
    try:
        # Replace with relevant BM units for your analysis
        bm_units = ["T_DRAXX-1", "T_DIDCB-1"]  # Example BM units 
        df = fetch_elexon_prices(date=test_date, dataset="BOD", bm_units=bm_units)
        if not df.empty:
            print("BOD data fetched successfully:")
            print(df.head())
            elexon_datasets["BOD"] = df
        else:
            print("No data returned for BOD. Check if data is available for the date and BM units.")
    except Exception as e:
        print(f"Error fetching BOD dataset: {e}")
    
    print("\nSummary of fetched datasets:")
    for dataset, df in elexon_datasets.items():
        print(f"- {dataset}: {len(df)} rows")
    
    if not elexon_datasets:
        print("No Elexon datasets were successfully fetched.")
else:
    print("ELEXON_API_KEY not configured in .env. Skipping Elexon test.")
print("-" * 70 + "\n")

### Testing Elexon Price Datasets ###
ELEXON_API_KEY found: 0r60y... (partially hidden)

Fetching MID dataset...
Requesting URL: https://data.elexon.co.uk/bmrs/api/v1/datasets/MID?from=2025-05-23T00:00:00Z&to=2025-05-23T23:59:59Z&format=json


MID data fetched successfully:
  dataset                 startTime dataProvider settlementDate  \
0     MID 2025-05-23 23:30:00+00:00      APXMIDP     2025-05-24   
1     MID 2025-05-23 23:30:00+00:00     N2EXMIDP     2025-05-24   
2     MID 2025-05-23 23:00:00+00:00      APXMIDP     2025-05-24   
3     MID 2025-05-23 23:00:00+00:00     N2EXMIDP     2025-05-24   
4     MID 2025-05-23 22:30:00+00:00      APXMIDP     2025-05-23   

   settlementPeriod  price  volume  
0                 2  65.48  1890.3  
1                 2   0.00     0.0  
2                 1  68.73  1882.1  
3                 1   0.00     0.0  
4                48  69.50  2744.7  

Fetching BOAL dataset...
Requesting URL for period 1: https://data.elexon.co.uk/bmrs/api/v1/balancing/acceptances/all?settlementDate=2025-05-23&settlementPeriod=1&format=json
Requesting URL for period 2: https://data.elexon.co.uk/bmrs/api/v1/balancing/acceptances/all?settlementDate=2025-05-23&settlementPeriod=2&format=json
Requesting URL for

## EIA Data Source

The U.S. Energy Information Administration (EIA) provides data on the U.S. electricity market. The `fetch_eia_prices` function retrieves data from EIA's API for specific time series.

**Key Requirements:**
- Valid EIA API key configured in the `.env` file
- Correct Series ID configuration (determining which market/region data is retrieved)
- Data availability for the requested date

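**Common Issues:**
- API key validation problems
- Incorrect Series ID configuration
- Data not available for the requested time period
- A retired endpoint: EIA has moved to API v2 (paths under `/v2/`), so requests to the older `/series/` path can return 404 even with a valid key and series ID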

In [4]:
# --- Test EIA ---
print("### Testing EIA (fetch_eia_prices) ###")
if config.EIA_API_KEY:
    print(f"EIA_API_KEY found: {config.EIA_API_KEY[:5]}... (partially hidden)")
    try:
        df_eia = fetch_eia_prices(date=test_date)
        if not df_eia.empty:
            print("EIA data fetched successfully:")
            print(df_eia.head())
        else:
            print("No data returned from EIA. A 404 error (seen in console from fetcher) usually means the series ID is incorrect or data for that series/date doesn't exist. A 403 could mean an API key issue.")
    except Exception as e:
        print(f"An unexpected error occurred while calling fetch_eia_prices: {e}")
else:
    print("EIA_API_KEY not configured in .env. Skipping EIA test.")
print(f"Common EIA Issues: Invalid API key, incorrect Series ID (current: {config.EIA_SERIES_ID}), or the series may not have data for the requested period. Check the EIA website to validate the series ID and data availability.")
print("-" * 70 + "\n")

### Testing EIA (fetch_eia_prices) ###
EIA_API_KEY found: 8QelB... (partially hidden)
EIA Request URL: https://api.eia.gov/series/?api_key=8QelB...&series_id=EBA.PJM-ALL.DF.H&start=20250523T00&end=20250523T23
Error fetching EIA data: 404 Client Error: Not Found for url: https://api.eia.gov/series/?api_key=8QelB...&series_id=EBA.PJM-ALL.DF.H&start=20250523T00&end=20250523T23
No data returned from EIA. A 404 error (seen in console from fetcher) usually means the series ID is incorrect or data for that series/date doesn't exist. A 403 could mean an API key issue.
Common EIA Issues: Invalid API key, incorrect Series ID (current: EBA.PJM-ALL.DF.H), or the series may not have data for the requested period. Check the EIA website to validate the series ID and data availability.
----------------------------------------------------------------------



## Nord Pool Data Source

Nord Pool operates the Nordic and Baltic power market. The `fetch_nord_pool_prices` function retrieves day-ahead market prices from the Nord Pool platform.

**Key Requirements:**
- Nord Pool API key configured in the `.env` file (only needed for restricted data)
- Correct area configuration (e.g., "Oslo" for Oslo price area)
- Proper currency specification

**Notes:**
- Some Nord Pool data may be accessible without an API key via their public endpoints
- The API structure and access methods may change over time

In [5]:
# --- Test Nord Pool ---
print("### Testing Nord Pool (fetch_nord_pool_prices) ###")
# The fetch_nord_pool_prices function itself prints a message if the key is missing and it skips or attempts public access.
if config.NORD_POOL_API_KEY:
    print(f"NORD_POOL_API_KEY found: {config.NORD_POOL_API_KEY[:5]}... (partially hidden)")
else:
    print("NORD_POOL_API_KEY is not configured in .env. The fetcher might attempt public access if the endpoint allows.")
try:
    df_nordpool = fetch_nord_pool_prices(date=test_date)
    if not df_nordpool.empty:
        print("Nord Pool data fetched successfully:")
        print(df_nordpool.head())
    else:
        print("No data returned from Nord Pool. This could be due to the public API endpoint changing, data not being available for the area/date, or if an API key is now strictly required and not provided. Check console output from the fetch_nord_pool_prices function for details.")
except Exception as e:
    print(f"An unexpected error occurred while calling fetch_nord_pool_prices: {e}")
print(f"Common Nord Pool Issues: Public API endpoints can change. Ensure the configured area (current: {config.NORDPOOL_AREA}) and currency are correct. Some data might require specific authentication not handled by the public fetcher.")
print("-" * 70 + "\n")

### Testing Nord Pool (fetch_nord_pool_prices) ###
NORD_POOL_API_KEY is not configured in .env. The fetcher might attempt public access if the endpoint allows.
NORD_POOL_API_KEY not configured (or not required by public endpoint). Attempting fetch if endpoint is public, otherwise skipping.
Nord Pool Request URL: https://www.nordpoolgroup.com/api/marketdata/page/10?currency=EUR&endDate=23-05-2025
Error fetching Nord Pool data: 404 Client Error: Not Found for url: https://www.nordpoolgroup.com/api/marketdata/page/10?currency=EUR&endDate=23-05-2025
No data returned from Nord Pool. This could be due to the public API endpoint changing, data not being available for the area/date, or if an API key is now strictly required and not provided. Check console output from the fetch_nord_pool_prices function for details.
Common Nord Pool Issues: Public API endpoints can change. Ensure the configured area (current: Oslo) and currency are correct. Some data might require specific authentication not ha

## Mock Data Generator

The mock data generator creates synthetic price data for testing when real API data is not available. This is useful for:
- Testing downstream analysis without requiring API access
- Development work when offline
- Creating predictable data patterns for unit testing

The mock generator creates realistic but fictional price data following typical day-ahead market patterns.
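
For unit tests, a mock series is most useful when it is reproducible. The following is a minimal sketch of a seeded generator built on the standard library only. It is not the project's `generate_mock_price_data`; the function name, the price level, and the daily shape are all assumptions made for illustration.

```python
import math
import random
from datetime import datetime, timedelta, timezone


def sketch_mock_prices(date: datetime, seed: int = 0) -> list[tuple[datetime, float]]:
    """Return 24 hourly (timestamp, price) pairs with a simple daily shape."""
    rng = random.Random(seed)  # fixed seed gives identical output on every call
    start = date.replace(hour=0, minute=0, second=0, microsecond=0)
    rows = []
    for hour in range(24):
        # Illustrative shape: base level, a sine peaking at midday, and noise.
        shape = 15.0 * math.sin((hour - 6) / 24 * 2 * math.pi)
        price = 50.0 + shape + rng.gauss(0.0, 5.0)
        rows.append((start + timedelta(hours=hour), round(price, 2)))
    return rows


rows = sketch_mock_prices(datetime(2025, 5, 23, tzinfo=timezone.utc), seed=42)
print(len(rows), rows[0][0].isoformat())
```

Passing the same `seed` twice yields the same series, which lets a test assert on exact values instead of only on shape.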

In [6]:
# --- Test Mock Data Generator ---
print("### Testing Mock Data (generate_mock_price_data) ###")
try:
    df_mock = generate_mock_price_data(date=test_date)
    if not df_mock.empty:
        print("Mock data generated successfully:")
        print(df_mock.head())
    else:
        print("Failed to generate mock data (should not happen with current implementation).")
except Exception as e:
    print(f"An unexpected error occurred while calling generate_mock_price_data: {e}")
print("-" * 70 + "\n")

### Testing Mock Data (generate_mock_price_data) ###
Mock data generated successfully:
                 date  price_€/MWh
0 2025-05-23 00:00:00    50.288543
1 2025-05-23 01:00:00    58.205023
2 2025-05-23 02:00:00    61.058580
3 2025-05-23 03:00:00    68.518309
4 2025-05-23 04:00:00    31.066489
----------------------------------------------------------------------



## Smart Wrapper

The `get_day_ahead_prices` function is a smart wrapper that attempts to fetch data from multiple sources based on configuration. It will:

1. Try the primary source specified in the `.env` file's `DATA_SOURCE` variable
2. If the primary source fails, try alternative sources in the order specified in `PREFERRED_DATA_SOURCES`
3. Fall back to mock data generation if all real data sources fail

This provides resilience and ensures that analysis pipelines can continue even when individual data sources are unavailable.
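
The three steps above can be sketched as a simple fallback chain. This is an illustration of the idea under stated assumptions, not the actual implementation of `get_day_ahead_prices`: the function name, its parameters, and the use of plain lists in place of DataFrames are all hypothetical.

```python
from typing import Callable, Optional

Rows = list  # stand-in for a DataFrame in this sketch


def fetch_with_fallback(
    primary: str,
    preferred: list[str],
    fetchers: dict[str, Callable[[], Optional[Rows]]],
    mock: Callable[[], Rows],
) -> tuple[Rows, str]:
    """Try the primary source, then preferred sources in order, then mock."""
    order = [primary] + [name for name in preferred if name != primary]
    for name in order:
        fetch = fetchers.get(name)
        if fetch is None:
            continue  # unknown or unconfigured source
        try:
            rows = fetch()
        except Exception as exc:  # one failing source must not stop the chain
            print(f"{name} failed: {exc}")
            continue
        if rows:
            return rows, name
    return mock(), "mock"


def broken() -> Rows:
    raise RuntimeError("404 Not Found")


rows, source = fetch_with_fallback(
    "elexon",
    ["entsoe", "eia"],
    {"elexon": broken, "entsoe": lambda: [], "eia": lambda: [("00:00", 42.0)]},
    mock=lambda: [("00:00", 50.0)],
)
print(source)  # eia
```

Returning the source name alongside the data lets downstream code tell real prices from mock ones, which matters when a fallback happens silently.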

In [7]:
# --- Test Smart Wrapper get_day_ahead_prices ---
# This will try sources based on DATA_SOURCE and PREFERRED_DATA_SOURCES from .env
print(f"### Testing Smart Wrapper (get_day_ahead_prices) with DATA_SOURCE='{config.DATA_SOURCE}' ###")
try:
    df_wrapper, source_name_wrapper = get_day_ahead_prices(date=test_date)
    if not df_wrapper.empty:
        print(f"Smart wrapper fetched data successfully from source: {source_name_wrapper}")
        print(df_wrapper.head())
    else:
        print(f"Smart wrapper did not return data. Fallback source reported by wrapper: {source_name_wrapper}")
        print("This means all configured/preferred sources failed or returned no data. Check console for detailed logs from the wrapper about each attempt.")
except Exception as e:
    print(f"An unexpected error occurred while calling get_day_ahead_prices (smart wrapper): {e}")
print("-" * 70 + "\n")

print("--- Individual Fetch Tests Complete ---")
print("Review the console output above each section for detailed messages from the fetcher functions, including requested URLs and specific error messages from APIs.")

### Testing Smart Wrapper (get_day_ahead_prices) with DATA_SOURCE='elexon" # Options: "auto", "mock", "entsoe", "elexon", "eia", "nordpool", "all' ###
No valid data sources to attempt based on configuration (e.g., missing API key for specified source or all preferred sources). Using mock data.
Smart wrapper fetched data successfully from source: mock
                 date  price_€/MWh
0 2025-05-23 00:00:00    48.436003
1 2025-05-23 01:00:00    47.849686
2 2025-05-23 02:00:00    34.113320
3 2025-05-23 03:00:00    51.631607
4 2025-05-23 04:00:00    66.152238
----------------------------------------------------------------------

--- Individual Fetch Tests Complete ---
Review the console output above each section for detailed messages from the fetcher functions, including requested URLs and specific error messages from APIs.


## General Debugging Advice

If APIs are consistently failing:

1.  **Verify API Keys**:
    *   **ENTSO-E**: Log in to the ENTSO-E Transparency Platform and check if your API key (security token) is active and has the correct permissions.
    *   **EIA**: Ensure your EIA API key is active. Keys can be obtained from the EIA website.
    *   **Elexon**: Check your Elexon API key on the Elexon portal.
    *   Ensure keys are correctly copied into your `.env` file.

2.  **Check API Documentation**:
    *   API endpoints, parameters, or authentication methods can change. Refer to the official documentation for each service:
        *   ENTSO-E: [ENTSO-E Transparency Platform RESTful API](https://transparency.entsoe.eu/content/static_content/Static%20content/web%20api/Guide.html)
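        *   Elexon: [Elexon BMRS API](https://www.elexon.co.uk/guidance-note/bmrs-api-data-push-user-guide/) (the fetcher in this notebook requests `data.elexon.co.uk/bmrs/api/v1`, so also consult the Insights API documentation for that endpoint)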
        *   EIA: [EIA Open Data API](https://www.eia.gov/opendata/documentation/)
        *   Nord Pool: [Nord Pool Market Data API](https://www.nordpoolgroup.com/services/market-data-services/) (check for public vs. commercial API details)

3.  **Test API Calls Externally**:
    *   Use tools like `curl` in your terminal or Postman to make direct API calls with the printed URLs and your keys. This helps isolate whether the issue is with the API itself, your key, or the Python code.
    *   Example `curl` for ENTSO-E, using the same base URL as the request printed by the fetcher (replace the placeholders):
        `curl -G "https://web-api.tp.entsoe.eu/api" --data-urlencode "documentType=A44" --data-urlencode "processType=A01" --data-urlencode "in_Domain=10YGB----------A" --data-urlencode "out_Domain=10YGB----------A" --data-urlencode "periodStart=YYYYMMDD0000" --data-urlencode "periodEnd=YYYYMMDD2359" --data-urlencode "securityToken=YOUR_ENTSOE_KEY"`

4.  **Check Data Availability**:
    *   For the specific dates and regions/series IDs you are requesting, verify on the provider's public data portal that data actually exists and is published. Day-ahead data often has a publication schedule.

5.  **Review Request Parameters**:
    *   Double-check that all required parameters (like `in_Domain`/`out_Domain` for ENTSO-E, `series_id` for EIA, `area` for Nord Pool) are correct for the data you're trying to fetch.

6.  **Network and Firewalls**:
    *   Ensure there are no local network issues, firewalls, or proxies blocking outbound requests to these APIs.

7.  **API Provider Status**:
    *   Check if the API provider has a status page indicating any ongoing outages or maintenance.
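
Configuration values deserve the same scrutiny as API keys. In the run recorded in this notebook, `DATA_SOURCE` printed as `elexon" # Options: ...`, which suggests that an inline comment from the `.env` line ended up inside the value, after which the smart wrapper found no valid source and used mock data. The sketch below shows one way to clean and validate such a value. The function name and the default are hypothetical; the set of options is taken from the printed comment.

```python
VALID_SOURCES = {"auto", "mock", "entsoe", "elexon", "eia", "nordpool", "all"}


def clean_data_source(raw: str, default: str = "auto") -> str:
    """Strip an inline comment and stray quotes, then validate the value."""
    value = raw.split("#", 1)[0].strip().strip("\"'").strip().lower()
    return value if value in VALID_SOURCES else default


print(clean_data_source('elexon" # Options: "auto", "mock", "all'))
print(clean_data_source("unknown-source"))
```

Validating against a fixed set turns a silent fallback into an explicit, testable decision about what happens when the setting is wrong.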

## Summary and Next Steps

This notebook tests each data source in isolation and then the smart wrapper, which falls back to mock data when no real source succeeds. Based on the results above:

1. **Working Data Sources**: Note which data sources successfully returned data
2. **Failed Data Sources**: Investigate any failures using the debugging advice
3. **Data Quality**: Examine the returned data to ensure it meets your analysis requirements
4. **Configuration Updates**: Update your `.env` file to prioritize the most reliable sources

Typical next steps are to use the fetched data in the analysis notebooks, confirm that the automated data pipeline runs correctly, and resolve any persistent issues with individual data sources.