# Locational Marginal Price (LMP) Query

In this notebook, we access the CAISO OASIS API to pull electricity price data. Specifically, we pull hourly real-time locational marginal price (LMP) data for each default load aggregation point (DLAP) for 2023 and 2024. Locational marginal pricing is used to set real-time power costs at different nodes on the grid, accounting for location-specific differences in load, generation, and transmission. DLAPs are nodes where regional demand is aggregated and priced (rather than pricing each meter separately). OASIS can be accessed here: https://oasis.caiso.com/mrioasis/logon.do.

In [1]:
## Import required libraries
import requests
import pandas as pd
from io import BytesIO
from zipfile import ZipFile
import time

First, we define the start and end timestamps.

In [2]:
## Define date range covering 2023 and 2024 (Pacific time, converted to UTC)
start = pd.Timestamp('2023-01-01 00:00:00', tz='America/Los_Angeles').tz_convert('UTC')
end = pd.Timestamp('2025-01-01 00:00:00', tz='America/Los_Angeles').tz_convert('UTC')
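
As a quick check of what these timestamps look like, the sketch below prints the UTC value of the start timestamp and the string produced by the `'%Y%m%dT%H:%M-0000'` pattern used in the request loop. The variable name `start_fmt` is only for illustration.

```python
import pandas as pd

# Midnight Pacific on 1 January is 08:00 UTC, because Pacific Standard Time is UTC-8.
start = pd.Timestamp('2023-01-01 00:00:00', tz='America/Los_Angeles').tz_convert('UTC')
start_fmt = start.strftime('%Y%m%dT%H:%M-0000')  # Same pattern the request loop uses

print(start)      # 2023-01-01 08:00:00+00:00
print(start_fmt)  # 20230101T08:00-0000
```

Note that the pattern drops seconds, so a timestamp ending in `:59` seconds is sent with only its hour and minute.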

Next, we pull the pricing data from the OASIS API one week at a time, continuously checking whether we've reached the end of the date range. This cell will take several minutes to run.

In [3]:
## Loop through requests, one week at a time
frames = [] # Collect each week's DataFrame; concatenate once after the loop
current = start
while current < end:
    # Cap the final window at the end of the date range
    current_end = min(current + pd.Timedelta(days=7) - pd.Timedelta(seconds=1), end)

    current_fmt = current.strftime('%Y%m%dT%H:%M-0000') # Convert timestamp to required format
    current_end_fmt = current_end.strftime('%Y%m%dT%H:%M-0000') # Convert timestamp to required format

    url = f'https://oasis.caiso.com/oasisapi/SingleZip?resultformat=6&queryname=PRC_RTM_LAP&version=6&startdatetime={current_fmt}&enddatetime={current_end_fmt}&market_run_id=RTM&node=DLAP_PGAE-APND,DLAP_SCE-APND,DLAP_SDGE-APND,DLAP_VEA-APND'

    response = requests.get(url, timeout=60) # Give up on a request that hangs
    response.raise_for_status() # Stop on an HTTP error instead of failing inside ZipFile

    with ZipFile(BytesIO(response.content)) as z:
        csv_filename = z.namelist()[0] # Use the first file in the archive

        with z.open(csv_filename) as f: # Open file
            frames.append(pd.read_csv(f)) # Read as a CSV and store this week's data

    current = current_end + pd.Timedelta(seconds=1) # Set up next request
    time.sleep(5) # Don't overload server with too many requests

LMP_df = pd.concat(frames) # Combine all weekly results into one DataFrame
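
To see how the date range is split without contacting the server, the weekly windowing can be isolated in a small generator. This is a minimal sketch: `weekly_windows`, `demo_start`, and `demo_end` are hypothetical names introduced here, and the function uses a strict `<` comparison so that a window is never started exactly at the end of the range.

```python
import pandas as pd

def weekly_windows(start, end):
    """Yield (window_start, window_end) pairs covering start..end in 7-day steps."""
    step = pd.Timedelta(days=7)
    one_second = pd.Timedelta(seconds=1)
    current = start
    while current < end:
        window_end = min(current + step - one_second, end)  # Cap the last window
        yield current, window_end
        current = window_end + one_second  # Next window starts one second later

demo_start = pd.Timestamp('2023-01-01 00:00:00', tz='America/Los_Angeles').tz_convert('UTC')
demo_end = pd.Timestamp('2025-01-01 00:00:00', tz='America/Los_Angeles').tz_convert('UTC')
windows = list(weekly_windows(demo_start, demo_end))

print(len(windows))                # 105
print(windows[-1][1] == demo_end)  # True
```

With 105 requests and a 5-second pause after each one, the pauses alone add up to just under nine minutes, which is consistent with the cell taking several minutes to run.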

In [4]:
## Preview data
LMP_df.head()

,INTERVAL_START_GMT,INTERVAL_END_GMT,OPR_DATE,INTERVAL_NUM,RESOURCE_NAME,MKT_TYPE,DATA_ITEM,VALUE,GROUP
0,2023-01-01T22:00:00-00:00,2023-01-01T23:00:00-00:00,2023-01-01,15,DLAP_PGAE-APND,RTM,LMP_CONG_PRC,0.0,1
1,2023-01-01T16:00:00-00:00,2023-01-01T17:00:00-00:00,2023-01-01,9,DLAP_PGAE-APND,RTM,LMP_CONG_PRC,0.0,1
2,2023-01-02T06:00:00-00:00,2023-01-02T07:00:00-00:00,2023-01-01,23,DLAP_PGAE-APND,RTM,LMP_CONG_PRC,6.57912,1
3,2023-01-01T13:00:00-00:00,2023-01-01T14:00:00-00:00,2023-01-01,6,DLAP_PGAE-APND,RTM,LMP_CONG_PRC,0.0,1
4,2023-01-01T18:00:00-00:00,2023-01-01T19:00:00-00:00,2023-01-01,11,DLAP_PGAE-APND,RTM,LMP_CONG_PRC,0.0,1


Our existing DataFrame includes total LMP as well as its individual components (e.g., congestion LMP, the price component reflecting congestion) per DLAP. We will filter to retain only the total LMP data, and pivot the DataFrame so that the end timestamp of each interval becomes the index and each DLAP becomes a column. We will retain the DLAP information, which will be used to calculate an overall system load-weighted LMP in the final_data_query.ipynb notebook. We will also convert our timestamps from UTC to Pacific time to make DataFrame merging simpler in the final_data_query.ipynb notebook. Finally, we will save our data to a Parquet file.

In [5]:
## Transform overall LMP pricing data
LMP_only_df = LMP_df[LMP_df['DATA_ITEM'] == 'LMP_PRC'] # Filter for total LMP
LMP_only_df = LMP_only_df.pivot(index='INTERVAL_END_GMT', columns='RESOURCE_NAME', values='VALUE') # Time as index, DLAPs as columns
LMP_only_df.reset_index(inplace=True)
LMP_only_df['INTERVAL_END_GMT'] = pd.to_datetime(LMP_only_df['INTERVAL_END_GMT'], utc=True)
LMP_only_df['timestamp'] = LMP_only_df['INTERVAL_END_GMT'].dt.tz_convert('America/Los_Angeles') # Same zone name as the date range above

LMP_only_df.set_index('timestamp', inplace=True)
LMP_only_df.to_parquet('LMP_data.parquet', index=True) 
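
The filter-then-pivot step can be illustrated on a tiny, made-up table. This is a sketch only: the column names match the OASIS output shown earlier, but the rows and values are invented for illustration, and `long_df`, `total_only`, and `wide_df` are hypothetical names.

```python
import pandas as pd

# Hypothetical long-format rows shaped like the OASIS output (values are made up)
long_df = pd.DataFrame({
    'INTERVAL_END_GMT': ['2023-01-01T09:00:00-00:00'] * 4,
    'RESOURCE_NAME': ['DLAP_PGAE-APND', 'DLAP_PGAE-APND', 'DLAP_SCE-APND', 'DLAP_SCE-APND'],
    'DATA_ITEM': ['LMP_PRC', 'LMP_CONG_PRC', 'LMP_PRC', 'LMP_CONG_PRC'],
    'VALUE': [100.0, 1.5, 101.0, 2.5],
})

# Keep only total LMP, then spread the DLAPs across columns (long to wide)
total_only = long_df[long_df['DATA_ITEM'] == 'LMP_PRC']
wide_df = total_only.pivot(index='INTERVAL_END_GMT', columns='RESOURCE_NAME', values='VALUE')

print(wide_df.shape)  # (1, 2)
print(wide_df.loc['2023-01-01T09:00:00-00:00', 'DLAP_SCE-APND'])  # 101.0
```

Because `pivot` raises a `ValueError` when the same index and column pair appears more than once, a successful pivot also indicates that no DLAP has two total-LMP rows for the same interval.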

In [6]:
LMP_only_df.head()

timestamp,INTERVAL_END_GMT,DLAP_PGAE-APND,DLAP_SCE-APND,DLAP_SDGE-APND,DLAP_VEA-APND
2023-01-01 01:00:00-08:00,2023-01-01 09:00:00+00:00,114.6323,114.83441,114.29102,114.31216
2023-01-01 02:00:00-08:00,2023-01-01 10:00:00+00:00,109.63738,107.76053,108.96042,108.23643
2023-01-01 03:00:00-08:00,2023-01-01 11:00:00+00:00,103.42981,102.00251,103.04744,103.63352
2023-01-01 04:00:00-08:00,2023-01-01 12:00:00+00:00,107.15742,105.95133,107.54311,107.83482
2023-01-01 05:00:00-08:00,2023-01-01 13:00:00+00:00,110.11199,110.69619,110.08753,109.26576
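
Keeping one column per DLAP is what makes a load-weighted system price possible later. That calculation lives in final_data_query.ipynb and is not shown here; as a rough sketch, assuming the usual definition of a load-weighted average (the sum of price times load, divided by total load), it might look like the following. Both the prices and the loads below are made-up placeholders, not real data.

```python
import pandas as pd

# Made-up prices ($/MWh) and loads (MW) for two DLAPs in a single hour
lmp = pd.Series({'DLAP_PGAE-APND': 100.0, 'DLAP_SCE-APND': 110.0})
load = pd.Series({'DLAP_PGAE-APND': 1.0, 'DLAP_SCE-APND': 3.0})

# Load-weighted average: each DLAP's price counts in proportion to its load
weighted_lmp = (lmp * load).sum() / load.sum()

print(weighted_lmp)  # 107.5
```

The result sits closer to the price of the DLAP with the larger load, which is the point of weighting by load rather than taking a simple mean.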


In [7]:
## Check for missing data
print('Missing values by column: ')
print(LMP_only_df.isna().sum())

Missing values by column: 
RESOURCE_NAME
INTERVAL_END_GMT    0
DLAP_PGAE-APND      0
DLAP_SCE-APND       0
DLAP_SDGE-APND      0
DLAP_VEA-APND       0
dtype: int64


In [8]:
## Check for duplicate timestamps
print('Total duplicate timestamps: ')
print(LMP_only_df.index.duplicated().sum())

Total duplicate timestamps: 
0


No duplicate timestamps or missing values were found.
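
The two checks above would not flag an hour that is absent from the index altogether. As an optional extra check, the sketch below compares an index against the full hourly range it should cover. `missing_hours` is a hypothetical helper, and the toy index is invented for illustration.

```python
import pandas as pd

def missing_hours(index, first, last):
    """Return the hourly timestamps from first to last (inclusive) that are absent from index."""
    expected = pd.date_range(first, last, freq=pd.Timedelta(hours=1))
    return expected.difference(index)

# Toy index: interval-ending hours 01:00 to 05:00 with 03:00 deliberately left out
first = pd.Timestamp('2023-01-01 01:00', tz='America/Los_Angeles')
last = pd.Timestamp('2023-01-01 05:00', tz='America/Los_Angeles')
toy_index = pd.DatetimeIndex([
    '2023-01-01 01:00', '2023-01-01 02:00', '2023-01-01 04:00', '2023-01-01 05:00',
]).tz_localize('America/Los_Angeles')

gaps = missing_hours(toy_index, first, last)
print(list(gaps))  # One entry: the missing 03:00 timestamp
```

On the real data, this could be called as `missing_hours(LMP_only_df.index, LMP_only_df.index.min(), LMP_only_df.index.max())`; an empty result would mean every hour between the first and last timestamp is present. For reference, a complete hourly series spanning the 731 days of 2023 and 2024 would contain 17,544 rows.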