Question about using PEMS datasets #343

khaled-alkilane · 2024-02-28T08:18:09Z

First, I would like to express my gratitude for your remarkable work in consolidating various models and datasets related to time series analysis in one comprehensive platform. It's an invaluable resource.

I am currently running experiments on the PEMS datasets (PEMS03, PEMS04, PEMS07, PEMS08) used in the iTransformer paper. I have a few questions about the custom data_loader implementation and would greatly appreciate your insights:

Given that PEMS data are aggregated in 5-minute intervals (12 timesteps per hour), should the multiplication factor be 12 rather than the 4 used in the custom data class?
Since our objective covers all variables (i.e., all sensors), how should the data loaders be structured for both input and target?
The iTransformer paper mentions the horizons {12, 24, 48, 96}. For data sampled every 5 minutes, do these correspond to durations of 1 hour, 2 hours, 4 hours, and 8 hours, respectively?
How does 'label_len' affect the data partitioning in this code snippet?

    def __getitem__(self, index):
        s_begin = index
        s_end = s_begin + self.seq_len
        r_begin = s_end - self.label_len
        r_end = r_begin + self.label_len + self.pred_len

        seq_x = self.data_x[s_begin:s_end]
        seq_y = self.data_y[r_begin:r_end]
        seq_x_mark = self.data_stamp[s_begin:s_end]
        seq_y_mark = self.data_stamp[r_begin:r_end]

        return seq_x, seq_y, seq_x_mark, seq_y_mark
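
To make the 'label_len' question concrete, here is a minimal sketch of the index arithmetic from the snippet above, pulled out into a standalone function. The helper name window_bounds and the values seq_len=96, label_len=48, pred_len=12 are assumptions chosen only for illustration; they are not taken from any script in the repository.

```python
# Hypothetical settings, chosen only to illustrate the slicing.
seq_len, label_len, pred_len = 96, 48, 12


def window_bounds(index, seq_len, label_len, pred_len):
    """Return the (start, end) slice bounds used for seq_x and seq_y."""
    s_begin = index
    s_end = s_begin + seq_len
    r_begin = s_end - label_len
    r_end = r_begin + label_len + pred_len
    return (s_begin, s_end), (r_begin, r_end)


x_bounds, y_bounds = window_bounds(0, seq_len, label_len, pred_len)

# Number of timesteps shared by the input window and the target window
overlap = x_bounds[1] - y_bounds[0]

print("seq_x bounds:", x_bounds)
print("seq_y bounds:", y_bounds)
print("overlap:", overlap)
```

Under these assumed values, seq_y starts label_len steps before the end of seq_x and extends pred_len steps past it, so the two windows share exactly label_len timesteps. With label_len=0 the target window would contain only the pred_len future steps.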

From my observation, the results in various papers, such as iTransformer, appear to be reported on the normalized data, without inverse-transforming the predictions (i.e., '--inverse' is not applied). Is this understanding correct?
Lastly, if you could provide the script files and code for processing the PEMS datasets as done in iTransformer, it would be immensely helpful.
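
For the horizon question above, the step-to-duration conversion can be checked with a few lines of arithmetic. This sketch only verifies the numbers under the stated 5-minute sampling interval; whether the paper intends these horizons to be read as durations is the open question. The variable names are illustrative.

```python
STEP_MINUTES = 5  # PEMS sampling interval stated above
horizons = [12, 24, 48, 96]  # horizons quoted from the iTransformer paper

# Convert each horizon from timesteps to hours
hours = {h: h * STEP_MINUTES / 60 for h in horizons}

# Number of timesteps in one hour at this sampling interval
steps_per_hour = 60 // STEP_MINUTES

for h, dur in hours.items():
    print(f"{h} steps = {dur} hours")
print("steps per hour:", steps_per_hour)
```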

For reference, I have already converted the NPZ files to CSV format and added a 'date' column using the following code:

import numpy as np
import pandas as pd

def convert_npz_to_csv_with_datetime_index(npz_file_path, data_key, start_date, timestep_minutes, csv_file_path):
    # Load the NPZ file
    npz_file = np.load(npz_file_path)

    # Extract the data array
    data = npz_file[data_key]

    # If the array has more than 2 dimensions, keep only index 0 along the
    # third axis (the first feature channel)
    if data.ndim > 2 and data.shape[2] > 0:
        data = data[:, :, 0]

    # Number of rows in the data
    num_rows = data.shape[0]

    # Generate a date range with the specified start date and timestep
    # ('min' is used instead of 'T', which is deprecated in pandas >= 2.2)
    timestamps = pd.date_range(start=start_date, periods=num_rows, freq=f'{timestep_minutes}min')

    # Create a DataFrame with the timestamps as index
    df = pd.DataFrame(data, index=timestamps)

    # Print the DataFrame shape and index details
    print(f"DataFrame shape: {df.shape}")
    print(f"Index range: {df.index.min()} to {df.index.max()}")
    print(f"Index type: {type(df.index)}")

    # Reset the index to make the datetime a column, and rename it to 'date'
    df.reset_index(inplace=True)
    df.rename(columns={'index': 'date'}, inplace=True)

    # Convert 'date' column to string (object type)
    df['date'] = df['date'].astype(str)

    # Save to CSV
    df.to_csv(csv_file_path, index=False)

    print(f"File saved as '{csv_file_path}'.")

# Example usage
# convert_npz_to_csv_with_datetime_index('./dataset/PEMS/PEMS03.npz', 'data', '2012-01-05', 5, './dataset/PEMS/PEMS03.csv')
# convert_npz_to_csv_with_datetime_index('./dataset/PEMS/PEMS04.npz', 'data', '2017-01-07', 5, './dataset/PEMS/PEMS04.csv')
# convert_npz_to_csv_with_datetime_index('./dataset/PEMS/PEMS07.npz', 'data', '2017-01-05', 5, './dataset/PEMS/PEMS07.csv')
# convert_npz_to_csv_with_datetime_index('./dataset/PEMS/PEMS08.npz', 'data', '2012-01-03', 5, './dataset/PEMS/PEMS08.csv')
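
As a quick sanity check of the conversion logic above, the sketch below runs the same steps on a small synthetic array instead of a real NPZ file. The array shape (timesteps, sensors, features) and its values are assumptions for illustration only, and no file is read or written.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a PEMS array: 6 timesteps, 3 sensors, 2 features.
# The shape and values are assumptions, not real PEMS data.
data = np.arange(6 * 3 * 2, dtype=float).reshape(6, 3, 2)

# Keep only the first feature channel, as the function above does
flat = data[:, :, 0] if data.ndim > 2 else data

# Build a 5-minute datetime index and move it into a 'date' column
timestamps = pd.date_range(start="2012-01-05", periods=flat.shape[0], freq="5min")
df = pd.DataFrame(flat, index=timestamps).reset_index().rename(columns={"index": "date"})

print(df.shape)
print(df["date"].iloc[-1])
```

The result has one row per timestep and one column per sensor plus the 'date' column, which is the layout the CSV written by the function above would have.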
