Question about using PEMS datasets #343

Open
khaled-alkilane opened this issue Feb 28, 2024 · 0 comments

First, I would like to express my gratitude for your remarkable work in consolidating various models and datasets related to time series analysis in one comprehensive platform. It's an invaluable resource.

I am currently running experiments on the PEMS datasets (03, 04, 07, and 08), as used in the iTransformer paper. I have a few questions about the custom data_loader implementation and would greatly appreciate your insights:

  1. Given that PEMS data are aggregated in 5-minute intervals, should the multiplication factor be 12 (timesteps) rather than the 4 used in the custom data class?
  2. Considering our objective encompasses all variables (i.e., sensors), how should the data loaders be structured for both input and target?
  3. In the iTransformer research, the time intervals {12, 24, 48, 96} are mentioned. In terms of data organized every 5 minutes, does this equate to durations of 1 hour, 2 hours, 4 hours, and 8 hours, respectively?
  4. How does 'label_len' affect the data partitioning in this code snippet?
    def __getitem__(self, index):
        s_begin = index
        s_end = s_begin + self.seq_len
        r_begin = s_end - self.label_len
        r_end = r_begin + self.label_len + self.pred_len

        seq_x = self.data_x[s_begin:s_end]
        seq_y = self.data_y[r_begin:r_end]
        seq_x_mark = self.data_stamp[s_begin:s_end]
        seq_y_mark = self.data_stamp[r_begin:r_end]

        return seq_x, seq_y, seq_x_mark, seq_y_mark
  5. From my observation, it appears that the results in various papers, such as iTransformer, are reported without inverting the data scaling (i.e., '--inverse' is not applied). Is this understanding correct?
  6. Lastly, if you could provide script files and code showing how to process the PEMS datasets, similar to iTransformer, it would be immensely helpful.
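
To make the 'label_len' question concrete, here is a minimal standalone sketch of how I read the index arithmetic in the snippet above. The helper name `window_bounds`, the toy `series`, and the values 96/48/12 are my own illustrative assumptions, not settings taken from any official script:

```python
def window_bounds(index, seq_len, label_len, pred_len):
    """Mirror the index arithmetic of the __getitem__ snippet quoted above."""
    s_begin = index
    s_end = s_begin + seq_len
    r_begin = s_end - label_len
    r_end = r_begin + label_len + pred_len
    return s_begin, s_end, r_begin, r_end


# Illustrative values only (assumed, not taken from any official script).
seq_len, label_len, pred_len = 96, 48, 12
series = list(range(200))  # stand-in for one sensor's readings

s_begin, s_end, r_begin, r_end = window_bounds(0, seq_len, label_len, pred_len)
seq_x = series[s_begin:s_end]  # model input window
seq_y = series[r_begin:r_end]  # label_len overlap followed by pred_len future steps

overlap = seq_y[:label_len]  # the same values as the tail of seq_x
future = seq_y[label_len:]   # the steps that lie beyond the input window

print(len(seq_x), len(seq_y))         # 96 60
print(overlap == seq_x[-label_len:])  # True
print(future[0], future[-1])          # 96 107
```

If this reading is right, seq_y always starts 'label_len' steps before the end of the input, so only its last 'pred_len' entries are genuinely unseen values.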

Just for info: I have already converted the NPZ files into CSV format and added a 'date' column using the following code:

import numpy as np
import pandas as pd

def convert_npz_to_csv_with_datetime_index(npz_file_path, data_key, start_date, timestep_minutes, csv_file_path):
    # Load the NPZ file
    npz_file = np.load(npz_file_path)

    # Extract the data array
    data = npz_file[data_key]

    # If the array has more than 2 dimensions, keep only the first feature channel so the result is 2D
    if data.ndim > 2 and data.shape[2] > 0:
        data = data[:, :, 0]

    # Number of rows in the data
    num_rows = data.shape[0]

    # Generate a date range with the specified start date and timestep
    # 'min' is the minute alias; the older 'T' alias is deprecated in recent pandas versions
    timestamps = pd.date_range(start=start_date, periods=num_rows, freq=f'{timestep_minutes}min')

    # Create a DataFrame with the timestamps as index
    df = pd.DataFrame(data, index=timestamps)

    # Print the DataFrame shape and index details
    print(f"DataFrame shape: {df.shape}")
    print(f"Index range: {df.index.min()} to {df.index.max()}")
    print(f"Index type: {type(df.index)}")

    # Reset the index to make the datetime a column, and rename it to 'date'
    df.reset_index(inplace=True)
    df.rename(columns={'index': 'date'}, inplace=True)

    # Convert 'date' column to string (object type)
    df['date'] = df['date'].astype(str)

    # Save to CSV
    df.to_csv(csv_file_path, index=False)

    print(f"File saved as '{csv_file_path}'.")

# Example usage
# convert_npz_to_csv_with_datetime_index('./dataset/PEMS/PEMS03.npz', 'data', '2012-01-05', 5, './dataset/PEMS/PEMS03.csv')
# convert_npz_to_csv_with_datetime_index('./dataset/PEMS/PEMS04.npz', 'data', '2017-01-07', 5, './dataset/PEMS/PEMS04.csv')
# convert_npz_to_csv_with_datetime_index('./dataset/PEMS/PEMS07.npz', 'data', '2017-01-05', 5, './dataset/PEMS/PEMS07.csv')
# convert_npz_to_csv_with_datetime_index('./dataset/PEMS/PEMS08.npz', 'data', '2012-01-03', 5, './dataset/PEMS/PEMS08.csv')
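
For reference, this is the arithmetic behind questions 1 and 3, assuming the 5-minute timestep passed in the example calls above. The helper names `steps_per_hour` and `horizon_minutes` are hypothetical and only illustrate the conversion; they say nothing about how the library's loader is actually written:

```python
STEP_MINUTES = 5  # assumed PEMS aggregation interval, as in the calls above


def steps_per_hour(step_minutes=STEP_MINUTES):
    """Number of samples that cover one hour at the given interval."""
    return 60 // step_minutes


def horizon_minutes(steps, step_minutes=STEP_MINUTES):
    """Wall-clock duration, in minutes, covered by a prediction horizon."""
    return steps * step_minutes


print(steps_per_hour())  # 12
for steps in (12, 24, 48, 96):
    hours = horizon_minutes(steps) / 60
    print(f"{steps} steps -> {hours:g} h")
```

Under that assumption, the horizons {12, 24, 48, 96} correspond to 1, 2, 4, and 8 hours, and one hour spans 12 timesteps.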