[![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vanderschaarlab/temporai/blob/main/tutorials/data/tutorial03_dataloaders.ipynb)

# Data Tutorial 03: Data loaders

This tutorial introduces TemporAI `Dataloader`s: classes that load a ready-to-use TemporAI dataset.

## `Dataloader` class

A TemporAI `Dataloader` implements a `load()` method, which returns a TemporAI dataset.

`Dataloader`s are useful for loading custom datasets after any necessary preprocessing has been applied;
that preprocessing may be user-configured.
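The pattern described above (configure in the constructor, build the dataset in `load()`) can be sketched without TemporAI itself. The `ToyDataset` and `ToySineLoader` names below are hypothetical and exist only for illustration; they are not part of the TemporAI API, and a real `Dataloader` returns a TemporAI dataset object rather than this stand-in.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a TemporAI dataset (not part of TemporAI)."""

    time_series: List[List[List[float]]]  # indexed as [sample][time step][feature]
    targets: List[int]  # one binary label per sample


class ToySineLoader:
    """Hypothetical loader: options are set in __init__, data is built in load()."""

    def __init__(self, no: int = 100, seq_len: int = 10, n_features: int = 5, seed: int = 0) -> None:
        self.no = no  # number of samples
        self.seq_len = seq_len  # time steps per sample
        self.n_features = n_features
        self.seed = seed

    def load(self) -> ToyDataset:
        rng = random.Random(self.seed)  # seeded so repeated loads agree
        samples = []
        targets = []
        for _ in range(self.no):
            # Each feature is a sine wave with its own random frequency and phase.
            freqs = [rng.uniform(0.1, 1.0) for _ in range(self.n_features)]
            phases = [rng.uniform(0.0, math.pi) for _ in range(self.n_features)]
            samples.append(
                [[math.sin(f * t + p) for f, p in zip(freqs, phases)] for t in range(self.seq_len)]
            )
            targets.append(rng.randint(0, 1))
        return ToyDataset(time_series=samples, targets=targets)


toy_data = ToySineLoader(no=80, seq_len=5).load()
# Number of samples, time steps per sample, features per time step.
print(len(toy_data.time_series), len(toy_data.time_series[0]), len(toy_data.time_series[0][0]))
```

Keeping all configuration in the constructor means that `load()` takes no arguments, so any code that consumes a loader can call it in the same way regardless of the dataset behind it.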

The example below uses `SineDataLoader`.

In [2]:
from tempor.utils.dataloaders import SineDataLoader

# The DataLoader class:
SineDataLoader

tempor.utils.dataloaders.sine.SineDataLoader

The constructor of a `Dataloader` can take various keyword arguments; this is where the user can customize
data preprocessing and similar options.

In [3]:
# Initialize.

sine_dataloader = SineDataLoader(
    no=80,  # Here, number of samples.
    seq_len=5,  # Here, time series sequence length.
    # ...
)

sine_dataloader

<tempor.utils.dataloaders.sine.SineDataLoader at 0x7f8155017fa0>

In [4]:
# Load the Dataset:
data = sine_dataloader.load()

print(type(data))

data

<class 'tempor.data.dataset.OneOffPredictionDataset'>


OneOffPredictionDataset(
    time_series=TimeSeriesSamples([80, *, 5]),
    static=StaticSamples([80, 4]),
    predictive=OneOffPredictionTaskData(targets=StaticSamples([80, 1]))
)

In [5]:
data.time_series

| sample_idx | time_idx | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | -0.151203 | 0.206110 | 0.783078 | 0.768667 | 0.957344 |
| 0 | 1 | 0.679518 | 0.785370 | 0.913243 | 0.999923 | 0.973799 |
| 0 | 2 | 0.997174 | 0.999603 | 0.985913 | 0.784278 | 0.730349 |
| 0 | 3 | 0.561921 | 0.749235 | 0.996514 | 0.218111 | 0.291970 |
| 0 | 4 | -0.297606 | 0.150635 | 0.944377 | -0.445537 | -0.224335 |
| ... | ... | ... | ... | ... | ... | ... |
| 79 | 0 | 0.999730 | 0.101680 | -0.976039 | -0.999547 | -0.715265 |
| 79 | 1 | 0.803590 | 0.577241 | -0.696389 | -0.897416 | -0.312411 |
| 79 | 2 | 0.269220 | 0.903914 | -0.188132 | -0.586595 | 0.160840 |
| 79 | 3 | -0.378464 | 0.997443 | 0.381883 | -0.139366 | 0.597849 |


In [6]:
data.static

| sample_idx | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| 0 | 0.374540 | 0.950714 | 0.731994 | 0.598658 |
| 1 | 0.156019 | 0.155995 | 0.058084 | 0.866176 |
| 2 | 0.601115 | 0.708073 | 0.020584 | 0.969910 |
| 3 | 0.832443 | 0.212339 | 0.181825 | 0.183405 |
| 4 | 0.304242 | 0.524756 | 0.431945 | 0.291229 |
| ... | ... | ... | ... | ... |
| 75 | 0.051682 | 0.531355 | 0.540635 | 0.637430 |
| 76 | 0.726091 | 0.975852 | 0.516300 | 0.322956 |
| 77 | 0.795186 | 0.270832 | 0.438971 | 0.078456 |
| 78 | 0.025351 | 0.962648 | 0.835980 | 0.695974 |


In [7]:
data.predictive.targets

| sample_idx | 0 |
| --- | --- |
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1 |
| ... | ... |
| 75 | 1 |
| 76 | 0 |
| 77 | 1 |
| 78 | 1 |


## Provided `Dataloader`s

TemporAI comes with a number of dataloaders, shown below.

In [21]:
# Display information about each dataloader's default loaded dataset.

from tempor.utils.dataloaders import all_dataloaders

from IPython.display import display

for dataloader_cls in all_dataloaders:
    print(f"\n{'-' * 80}\n")

    print(f"{dataloader_cls.__name__} loads the following dataset:\n")
    data = dataloader_cls().load()
    print(data)

    print("This contains:", end="\n\n")

    print("time_series:")
    display(data.time_series)
    if data.static is not None:
        print("static:")
        display(data.static)
    if data.predictive.targets is not None:
        print("predictive.targets:")
        display(data.predictive.targets)
    if data.predictive.treatments is not None:
        print("predictive.treatments:")
        display(data.predictive.treatments)


--------------------------------------------------------------------------------

DummyTemporalPredictionDataLoader loads the following dataset:

TemporalPredictionDataset(
    time_series=TimeSeriesSamples([100, *, 5]),
    static=StaticSamples([100, 3]),
    predictive=TemporalPredictionTaskData(
        targets=TimeSeriesSamples([100, *, 2])
    )
)
This contains:

time_series:


| sample_idx | time_idx | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | NaN | 0.893763 | NaN | NaN | 1.047522 |
| 0 | 1 | 1.257931 | 2.172271 | 2.226089 | 2.360713 | 1.981578 |
| 0 | 2 | 2.247657 | 0.853397 | 2.525946 | 3.213647 | 2.897191 |
| 0 | 3 | 3.396456 | 5.386071 | 3.721545 | 2.503248 | 3.517212 |
| 0 | 4 | 4.387812 | 3.365264 | 5.612532 | 5.573375 | 4.767746 |
| ... | ... | ... | ... | ... | ... | ... |
| 99 | 12 | 12.654769 | 14.810888 | 12.914859 | NaN | 12.818675 |
| 99 | 13 | 13.418815 | 12.135655 | 12.481295 | 13.336797 | 13.696168 |
| 99 | 14 | 13.785503 | 14.431228 | 15.193174 | 17.551818 | 14.464249 |
| 99 | 15 | 15.344934 | 15.916966 | 14.368132 | 15.965113 | 15.419334 |


static:


| sample_idx | 0 | 1 | 2 |
| --- | --- | --- | --- |
| 0 | 0.753423 | 3.239284 | 0.995587 |
| 1 | 0.829240 | 3.175298 | 0.770566 |
| 2 | 0.674581 | 3.229741 | 1.302317 |
| 3 | 0.584040 | 3.234011 | 1.594861 |
| 4 | 0.501552 | 3.211027 | 0.639503 |
| ... | ... | ... | ... |
| 95 | 0.680235 | 3.287749 | 0.705369 |
| 96 | 0.788814 | 3.313229 | 1.318394 |
| 97 | 0.589116 | 3.268607 | 1.646737 |
| 98 | 0.551060 | 3.268599 | 0.998024 |


predictive.targets:


| sample_idx | time_idx | 0 | 1 |
| --- | --- | --- | --- |
| 0 | 0 | -1.433570 | 0.714861 |
| 0 | 1 | -0.600733 | 2.744446 |
| 0 | 2 | 0.622874 | 1.816995 |
| 0 | 3 | 1.879785 | 4.981217 |
| 0 | 4 | 2.477957 | 5.932101 |
| ... | ... | ... | ... |
| 99 | 12 | 10.736462 | 13.415872 |
| 99 | 13 | 11.617465 | 15.103293 |
| 99 | 14 | 12.858327 | 16.105966 |
| 99 | 15 | 13.652358 | 16.148926 |



--------------------------------------------------------------------------------

DummyTemporalTreatmentEffectsDataLoader loads the following dataset:

TemporalTreatmentEffectsDataset(
    time_series=TimeSeriesSamples([100, *, 5]),
    static=StaticSamples([100, 3]),
    predictive=TemporalTreatmentEffectsTaskData(
        targets=TimeSeriesSamples([100, *, 2]),
        treatments=TimeSeriesSamples([100, *, 2])
    )
)
This contains:

time_series:


| sample_idx | time_idx | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | NaN | 0.893763 | NaN | NaN | 1.047522 |
| 0 | 1 | 1.257931 | 2.172271 | 2.226089 | 2.360713 | 1.981578 |
| 0 | 2 | 2.247657 | 0.853397 | 2.525946 | 3.213647 | 2.897191 |
| 0 | 3 | 3.396456 | 5.386071 | 3.721545 | 2.503248 | 3.517212 |
| 0 | 4 | 4.387812 | 3.365264 | 5.612532 | 5.573375 | 4.767746 |
| ... | ... | ... | ... | ... | ... | ... |
| 99 | 12 | 12.654769 | 14.810888 | 12.914859 | NaN | 12.818675 |
| 99 | 13 | 13.418815 | 12.135655 | 12.481295 | 13.336797 | 13.696168 |
| 99 | 14 | 13.785503 | 14.431228 | 15.193174 | 17.551818 | 14.464249 |
| 99 | 15 | 15.344934 | 15.916966 | 14.368132 | 15.965113 | 15.419334 |


static:


| sample_idx | 0 | 1 | 2 |
| --- | --- | --- | --- |
| 0 | 0.753423 | 3.239284 | 0.995587 |
| 1 | 0.829240 | 3.175298 | 0.770566 |
| 2 | 0.674581 | 3.229741 | 1.302317 |
| 3 | 0.584040 | 3.234011 | 1.594861 |
| 4 | 0.501552 | 3.211027 | 0.639503 |
| ... | ... | ... | ... |
| 95 | 0.680235 | 3.287749 | 0.705369 |
| 96 | 0.788814 | 3.313229 | 1.318394 |
| 97 | 0.589116 | 3.268607 | 1.646737 |
| 98 | 0.551060 | 3.268599 | 0.998024 |


predictive.targets:


| sample_idx | time_idx | 0 | 1 |
| --- | --- | --- | --- |
| 0 | 0 | -1.433570 | 0.714861 |
| 0 | 1 | -0.600733 | 2.744446 |
| 0 | 2 | 0.622874 | 1.816995 |
| 0 | 3 | 1.879785 | 4.981217 |
| 0 | 4 | 2.477957 | 5.932101 |
| ... | ... | ... | ... |
| 99 | 12 | 10.736462 | 13.415872 |
| 99 | 13 | 11.617465 | 15.103293 |
| 99 | 14 | 12.858327 | 16.105966 |
| 99 | 15 | 13.652358 | 16.148926 |


predictive.treatments:


| sample_idx | time_idx | 0 | 1 |
| --- | --- | --- | --- |
| 0 | 0 | -1.433570 | 0.714861 |
| 0 | 1 | -0.600733 | 2.744446 |
| 0 | 2 | 0.622874 | 1.816995 |
| 0 | 3 | 1.879785 | 4.981217 |
| 0 | 4 | 2.477957 | 5.932101 |
| ... | ... | ... | ... |
| 99 | 12 | 10.736462 | 13.415872 |
| 99 | 13 | 11.617465 | 15.103293 |
| 99 | 14 | 12.858327 | 16.105966 |
| 99 | 15 | 13.652358 | 16.148926 |



--------------------------------------------------------------------------------

GoogleStocksDataLoader loads the following dataset:

OneOffPredictionDataset(
    time_series=TimeSeriesSamples([50, *, 5]),
    predictive=OneOffPredictionTaskData(targets=StaticSamples([50, 1]))
)
This contains:

time_series:


| sample_idx | time_idx | Open | High | Low | Close | Volume |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.875000 | 0.661264 | 0.652789 | 0.677836 | 0.696887 | 0.185147 |
| 0 | 0.886364 | 0.667446 | 0.716935 | 0.731552 | 0.748318 | 0.150912 |
| 0 | 0.897727 | 0.751374 | 0.784055 | 0.800261 | 0.791407 | 0.140203 |
| 0 | 0.909091 | 0.785577 | 0.838572 | 0.831813 | 0.832628 | 0.244291 |
| 0 | 0.920455 | 0.885578 | 0.879778 | 0.900782 | 0.889539 | 0.413625 |
| ... | ... | ... | ... | ... | ... | ... |
| 9 | 0.806818 | 0.642857 | 0.647974 | 0.649153 | 0.639975 | 0.625178 |
| 9 | 0.818182 | 0.687362 | 0.757221 | 0.741200 | 0.789788 | 0.333141 |
| 9 | 0.829545 | 0.756044 | 0.732512 | 0.772230 | 0.732379 | 0.120629 |
| 9 | 0.840909 | 0.710852 | 0.687907 | 0.721525 | 0.713076 | 0.101900 |


predictive.targets:


| sample_idx | out |
| --- | --- |
| 0 | 0.710852 |
| 1 | 0.756044 |
| 10 | 0.564835 |
| 11 | 0.557005 |
| 12 | 0.552061 |
| 13 | 0.510852 |
| 14 | 0.451786 |
| 15 | 0.421704 |
| 16 | 0.387225 |
| 17 | 0.345879 |



--------------------------------------------------------------------------------

PBCDataLoader loads the following dataset:

TimeToEventAnalysisDataset(
    time_series=TimeSeriesSamples([312, *, 14]),
    static=StaticSamples([312, 1]),
    predictive=TimeToEventAnalysisTaskData(targets=EventSamples([312, 1]))
)
This contains:

time_series:


| sample_idx | time_idx | drug | ascites | hepatomegaly | spiders | edema | histologic | serBilir | serChol | albumin | alkaline | SGOT | platelets | prothrombin | age |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.569489 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 3.281890 | 0.000000 | -0.894575 | 0.195532 | -1.485263 | -0.529101 | 0.136768 | 0.248058 |
| 1 | 1.095170 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 2.015877 | -0.469461 | -1.570646 | 0.285613 | 0.195488 | -0.456022 | 0.813132 | 0.248058 |
| 2 | 5.319790 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0.172710 | -0.658914 | -1.431455 | -0.605844 | -0.442126 | -1.395605 | 0.339677 | 1.292856 |
| 2 | 6.261636 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | -0.013468 | -0.603657 | -1.172958 | -0.512364 | -0.046806 | -1.259888 | 0.339677 | 1.292856 |
| 2 | 7.266455 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0.098239 | 0.000000 | -1.312149 | -0.443529 | 0.293680 | -1.364286 | 0.339677 | 1.292856 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 312 | 1.045888 | 1.0 | 0.0 | 0.0 | 1.0 | 2.0 | 2.0 | 3.672865 | 3.319599 | 0.059878 | 1.385274 | 0.986129 | -1.103291 | 1.624769 | -1.962482 |
| 312 | 1.867265 | 1.0 | 0.0 | 0.0 | 1.0 | 2.0 | 1.0 | 2.350998 | 2.901224 | -0.099197 | 0.916176 | 0.641817 | -0.998892 | 1.354223 | -1.962482 |
| 312 | 2.921367 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.694010 | -0.066873 | 0.338261 | 0.327254 | 0.552551 | -0.894494 | 0.474950 | -1.962482 |
| 312 | 3.425145 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.340271 | 0.000000 | -0.377580 | 0.251620 | 0.016956 | -0.466462 | -0.066141 | -1.962482 |


static:


| sample_idx | sex |
| --- | --- |
| 1 | 0.0 |
| 2 | 0.0 |
| 3 | 1.0 |
| 4 | 0.0 |
| 5 | 0.0 |
| ... | ... |
| 308 | 0.0 |
| 309 | 0.0 |
| 310 | 0.0 |
| 311 | 0.0 |


predictive.targets:


| sample_idx | status |
| --- | --- |
| 1 | (0.569488555470374, True) |
| 2 | (14.1523381885883, False) |
| 3 | (0.7365020260650499, True) |
| 4 | (0.27653050049282957, True) |
| 5 | (4.12057824991786, False) |
| ... | ... |
| 308 | (4.98850071186069, False) |
| 309 | (4.55317051801555, False) |
| 310 | (4.4025846019056, False) |
| 311 | (4.12879202716022, False) |



--------------------------------------------------------------------------------

PKPDDataLoader loads the following dataset:

Generating simple PKPD dataset with random seed 100...
OneOffTreatmentEffectsDataset(
    time_series=TimeSeriesSamples([40, *, 2]),
    predictive=OneOffTreatmentEffectsTaskData(
        targets=TimeSeriesSamples([40, *, 1]),
        treatments=EventSamples([40, 1])
    )
)
This contains:

time_series:


| sample_idx | time_idx | k_in | p |
| --- | --- | --- | --- |
| 0 | 0 | -0.781441 | -0.245827 |
| 0 | 1 | -1.001889 | -0.541524 |
| 0 | 2 | -1.070862 | -0.589326 |
| 0 | 3 | -1.425115 | -1.065485 |
| 0 | 4 | -1.841006 | -1.542429 |
| ... | ... | ... | ... |
| 39 | 5 | 0.959902 | -0.690057 |
| 39 | 6 | 1.683426 | -0.128967 |
| 39 | 7 | 2.233045 | 0.637906 |
| 39 | 8 | 1.645018 | 1.056963 |


predictive.targets:


| sample_idx | time_idx | y |
| --- | --- | --- |
| 0 | 0 | -0.197049 |
| 0 | 1 | 0.020346 |
| 0 | 2 | -0.281120 |
| 0 | 3 | -0.483934 |
| 0 | 4 | -0.947253 |
| ... | ... | ... |
| 39 | 5 | -1.418583 |
| 39 | 6 | -1.495843 |
| 39 | 7 | -1.193632 |
| 39 | 8 | -0.850845 |


predictive.treatments:


| sample_idx | a |
| --- | --- |
| 0 | (7, False) |
| 1 | (7, False) |
| 2 | (7, False) |
| 3 | (7, False) |
| 4 | (7, False) |
| 5 | (7, False) |
| 6 | (7, False) |
| 7 | (7, False) |
| 8 | (7, False) |
| 9 | (7, False) |



--------------------------------------------------------------------------------

SineDataLoader loads the following dataset:

OneOffPredictionDataset(
    time_series=TimeSeriesSamples([100, *, 5]),
    static=StaticSamples([100, 4]),
    predictive=OneOffPredictionTaskData(targets=StaticSamples([100, 1]))
)
This contains:

time_series:


| sample_idx | time_idx | 0 | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | -0.019015 | -0.048177 | -0.108546 | 0.441865 | 0.024508 |
| 0 | 1 | 0.300030 | 0.364550 | 0.576590 | 0.890053 | 0.534722 |
| 0 | 2 | 0.587904 | 0.713509 | 0.972993 | 0.986179 | 0.892946 |
| 0 | 3 | 0.814697 | 0.937661 | 0.882158 | 0.692221 | 0.997357 |
| 0 | 4 | 0.956846 | 0.997797 | 0.349572 | 0.124454 | 0.818278 |
| ... | ... | ... | ... | ... | ... | ... |
| 99 | 5 | 0.967121 | 0.126890 | 0.926979 | 0.982022 | 0.963113 |
| 99 | 6 | 0.706533 | 0.413329 | 0.569034 | 0.656214 | 0.989224 |
| 99 | 7 | 0.252748 | 0.663121 | 0.024381 | 0.050668 | 0.999771 |
| 99 | 8 | -0.270150 | 0.854119 | -0.528273 | -0.576478 | 0.994586 |


static:


| sample_idx | 0 | 1 | 2 | 3 |
| --- | --- | --- | --- | --- |
| 0 | 0.374540 | 0.950714 | 0.731994 | 0.598658 |
| 1 | 0.156019 | 0.155995 | 0.058084 | 0.866176 |
| 2 | 0.601115 | 0.708073 | 0.020584 | 0.969910 |
| 3 | 0.832443 | 0.212339 | 0.181825 | 0.183405 |
| 4 | 0.304242 | 0.524756 | 0.431945 | 0.291229 |
| ... | ... | ... | ... | ... |
| 95 | 0.118165 | 0.696737 | 0.628943 | 0.877472 |
| 96 | 0.735071 | 0.803481 | 0.282035 | 0.177440 |
| 97 | 0.750615 | 0.806835 | 0.990505 | 0.412618 |
| 98 | 0.372018 | 0.776413 | 0.340804 | 0.930757 |


predictive.targets:


| sample_idx | 0 |
| --- | --- |
| 0 | 0 |
| 1 | 1 |
| 2 | 0 |
| 3 | 0 |
| 4 | 1 |
| ... | ... |
| 95 | 1 |
| 96 | 1 |
| 97 | 1 |
| 98 | 0 |
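
The `is not None` checks in the loop that produced the output above can be collected into a small helper. The sketch below assumes only the attribute names that appear in this tutorial (`time_series`, `static`, `predictive.targets`, `predictive.treatments`). The `present_components` function is hypothetical and not part of TemporAI, and it is exercised here on a plain stand-in object rather than a real dataset.

```python
from types import SimpleNamespace
from typing import Any, Dict


def present_components(data: Any) -> Dict[str, Any]:
    """Return the dataset components that are present (not None), keyed by name."""
    candidates = {
        "time_series": getattr(data, "time_series", None),
        "static": getattr(data, "static", None),
    }
    # `predictive` itself may be missing; getattr on None falls back to the default.
    predictive = getattr(data, "predictive", None)
    for name in ("targets", "treatments"):
        candidates[f"predictive.{name}"] = getattr(predictive, name, None)
    return {name: value for name, value in candidates.items() if value is not None}


# Stand-in object mimicking a dataset with no static data and no treatments.
fake = SimpleNamespace(
    time_series=[[0.1, 0.2]],
    static=None,
    predictive=SimpleNamespace(targets=[1], treatments=None),
)
components = present_components(fake)
print(list(components))  # ['time_series', 'predictive.targets']
```

Using `getattr` with a default keeps the helper from raising if an object lacks one of these attributes, at the cost of silently skipping a misspelled name.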
