# The `spudtr` epochs data format

There are three key elements:

 1. `epoch_id`: an index-like column in which each value designates a unique epoch
 2. `time`: an index-like column of sequential timestamps, identical in every epoch
 3. the remaining data columns
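
The three elements above can be illustrated with plain pandas. This is a minimal sketch, not the `spudtr` implementation: the helper name `describe_epochs_format` and the toy column values are hypothetical, and it covers only two of the properties listed (each `epoch_id`/`time` pair occurs once, and every epoch carries the same timestamps).

```python
import pandas as pd


def describe_epochs_format(df, epoch_id="epoch_id", time="time"):
    """Return a list of problems found; an empty list means the layout looks right.

    Hypothetical helper for illustration only, not part of spudtr.
    """
    problems = []
    for col in (epoch_id, time):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems

    # Each (epoch_id, time) pair should occur exactly once.
    if df.duplicated(subset=[epoch_id, time]).any():
        problems.append("duplicate (epoch_id, time) rows")

    # Every epoch should carry the same sequence of timestamps.
    stamps = df.groupby(epoch_id)[time].apply(tuple)
    if stamps.nunique() > 1:
        problems.append("timestamps differ across epochs")

    return problems


# Two epochs, three samples each, one data column.
good = pd.DataFrame(
    {
        "epoch_id": [0, 0, 0, 1, 1, 1],
        "time": [0, 1, 2, 0, 1, 2],
        "channel0": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    }
)

# Same data, but the second epoch is missing its last sample.
bad = good.iloc[:-1]

good_problems = describe_epochs_format(good)
bad_problems = describe_epochs_format(bad)
print(good_problems)
print(bad_problems)
```

Returning a list of problems rather than raising keeps the sketch easy to inspect; a real checker may well prefer to raise on the first failure.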
 
 

In [1]:
import pandas as pd
from matplotlib import pyplot as plt
import spudtr.fake_epochs_data as fake_data
import spudtr.epf as epf

# Example: simulated categorical and continuous data

In [2]:
n_epochs_per_category = 2
epochs_df, channels = fake_data._generate(
    n_epochs=n_epochs_per_category,
    n_samples=32,
    n_categories=2,
    n_channels=4,
    time="days",
    epoch_id="epoch_id",
    seed=10,
)
display(epochs_df)

| | epoch_id | days | categorical | continuous | channel0 | channel1 | channel2 | channel3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | cat0 | 0.771321 | -13.170787 | -30.197057 | 19.609869 | 43.177612 |
| 1 | 0 | 1 | cat0 | 0.020752 | 4.233125 | -7.726009 | -65.298259 | 41.464399 |
| 2 | 0 | 2 | cat0 | 0.633648 | 8.191480 | 21.915223 | 18.568468 | 27.639613 |
| 3 | 0 | 3 | cat0 | 0.748804 | -48.557122 | -50.952045 | 14.317029 | -17.186617 |
| 4 | 0 | 4 | cat0 | 0.498507 | -17.193401 | 50.222266 | 0.782896 | 38.251473 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 123 | 3 | 27 | cat1 | 0.744603 | 33.167254 | -7.658414 | 14.630878 | 14.329468 |
| 124 | 3 | 28 | cat1 | 0.469785 | -60.531560 | 0.774228 | 1.689442 | 0.882024 |
| 125 | 3 | 29 | cat1 | 0.598256 | 16.216221 | 66.028993 | 16.373534 | 4.854384 |
| 126 | 3 | 30 | cat1 | 0.147620 | -43.268966 | 26.531028 | -20.493672 | -12.327708 |


# Verify the format

When the epochs data are well formed, the check succeeds quietly.

When they are not, the check raises an exception, and the reason appears on the last line of the error message.

Example: This check succeeds.

In [3]:
epf.check_epochs(epochs_df, ['channel0', 'channel1'], epoch_id="epoch_id", time="days")

Example: this check fails because one of the requested data columns doesn't exist.

In [4]:
epf.check_epochs(epochs_df, ['bogus_channel0', 'channel1'], epoch_id="epoch_id", time="days")

ValueError: data_streams should all be present in the epochs dataframe, the following are missing: ['bogus_channel0']

Example: this check fails because the named time column doesn't exist.

In [5]:
epf.check_epochs(epochs_df, ['channel0', 'channel1'], epoch_id="epoch_id", time="hours")

ValueError: time column not found: hours
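
Both failures above come down to a column name that is absent from the dataframe. As a rough sketch in plain pandas, the same condition can be inspected before calling the checker. The helper `missing_columns` and the toy dataframe are hypothetical and are not part of `spudtr`.

```python
import pandas as pd


def missing_columns(df, required):
    """Return the names in `required` that are not columns of `df`, in order.

    Hypothetical helper for illustration only, not part of spudtr.
    """
    return [name for name in required if name not in df.columns]


# A toy dataframe with the same column naming as the example above.
df = pd.DataFrame(
    {
        "epoch_id": [0, 0],
        "days": [0, 1],
        "channel0": [1.0, 2.0],
        "channel1": [3.0, 4.0],
    }
)

absent_streams = missing_columns(df, ["bogus_channel0", "channel1"])
absent_time = missing_columns(df, ["hours"])
print(absent_streams)
print(absent_time)
```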