Skip to content

Latest commit

 

History

History
63 lines (42 loc) · 1.82 KB

datasets.rst

File metadata and controls

63 lines (42 loc) · 1.82 KB

Datasets

Dataset declarations allow specifications to reference external data at scale, without having to replicate it manually in the specification.

Basics

Datasets are declared as follows using the @dataset annotation:

@dataset
myDataset : Tensor Rat [28, 28]

Datasets can be any type t that can be constructed from the following grammar:

t ::= List t | Vector t n | Tensor t ns | s
s ::= Index n | Nat | Int | Rat

where n is a known constant and ns is a list of known constants.

Once declared, datasets can be used as any other named List, Vector or Tensor would be, e.g.

forall x in myDataset . robustAround x

Supported formats

The actual implementation of the dataset is provided when using the specification during training or verification.

At the moment only the IDX format is supported. A full description of this format can be found at the bottom of this page.

There are numerous libraries for converting datasets into this format:

If you would be interested in other formats being supported, please get in touch.

Notice that the type restrictions on datasets means that you can't currently have heterogeneously-typed data within a single dataset. However, a common use case for heterogenous datasets is to import a training dataset with both training examples and training labels.

An explanation of how to achieve this with inferrable parameters is given in the parameters documentation.