We support multiple common data schemas. Below are a few examples with their corresponding configuration files; you can start from the one that most closely matches your data.

Note: across all examples, the number of iterations is set to a small value so that an end-to-end (E2E) test runs quickly. To generate high-quality synthetic data, we recommend increasing the number of iterations according to your experience and computational resources.

Prerequisites

We support four different field types:

  1. Bit field (encoded as bit strings), e.g.,

    {
        "column": "srcip",
        "type": "integer",
        "encoding": "bit",
        "n_bits": 32
    }

    This field has an optional boolean property, truncate, which defaults to false. If truncate is set to true, integers that need more than n_bits bits are truncated and only their most significant n_bits bits are kept.

  2. Word2Vec field (encoded as Word2Vec vectors), e.g.,

    {
        "column": "srcport",
        "type": "integer",
        "encoding": "word2vec_port"
    }
  3. Categorical field (encoded as one-hot encoding), e.g.,

    {
        "column": "type",
        "type": "string",
        "encoding": "categorical"
    }
  4. Continuous field, e.g.,

    {
        "column": "pkt",
        "type": "float",
        "normalization": "ZERO_ONE",
        "log1p_norm": true
    }
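To make the encodings above more concrete, the following Python sketch shows one plausible way that bit, categorical, and continuous fields could be turned into numeric vectors. It is an illustration only, not this library's implementation: the helper names (`encode_bits`, `encode_one_hot`, `normalize_zero_one`) are hypothetical, the category list and sample values are made up, and applying `log1p` before the zero-one scaling is an assumption about ordering. Word2Vec fields are left out because they need a trained embedding model.

```python
import ipaddress
import math


def encode_bits(value, n_bits, truncate=False):
    """Encode a non-negative integer as a list of n_bits bits, MSB first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bits = bin(value)[2:]  # binary string without the '0b' prefix
    if len(bits) > n_bits:
        if not truncate:
            raise ValueError(f"{value} does not fit in {n_bits} bits")
        bits = bits[:n_bits]  # keep only the most significant n_bits bits
    return [int(b) for b in bits.zfill(n_bits)]


def encode_one_hot(value, categories):
    """Encode a categorical value as a one-hot vector over known categories."""
    return [1 if value == c else 0 for c in categories]


def normalize_zero_one(values, log1p_norm=False):
    """Optionally apply log(1 + x), then min-max scale the values into [0, 1]."""
    if log1p_norm:
        values = [math.log1p(v) for v in values]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values, one per field type.
srcip = int(ipaddress.IPv4Address("10.0.0.1"))
srcip_bits = encode_bits(srcip, n_bits=32)
type_vec = encode_one_hot("tcp", ["icmp", "tcp", "udp"])
pkt_norm = normalize_zero_one([0.0, 9.0, 99.0], log1p_norm=True)
```

In this sketch, an integer that does not fit in `n_bits` raises an error unless `truncate=True`, which mirrors the opt-in nature of the truncate property described above.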

Dataset type 1: single-event

A single-event schema contains one timeseries per row.

Data schema

| Timestamp (optional) | Metadata 1 | Metadata 2 | ... | Timeseries 1 | Timeseries 2 | ... |
|---|---|---|---|---|---|---|
| t1 | | | | | | |
| t2 | | | | | | |
| ... | | | | | | |

Examples

  1. PCAP

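    | Timestamp | Srcip | Dstip | Srcport | Dstport | Proto | Pkt_size | ... |
    |---|---|---|---|---|---|---|---|
    | t1 | | | | | | | |
    | t2 | | | | | | | |
    | ... | | | | | | | |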
  2. NetFlow (configuration_file)
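
As a rough illustration of the single-event layout above, the Python sketch below splits one row into its optional timestamp, its metadata columns, and its timeseries columns. The function `split_single_event_row` is hypothetical, the sample values are made up, and treating the five-tuple as metadata with the packet size as the timeseries is an assumption for this example, not something the schema prescribes.

```python
def split_single_event_row(row, metadata_cols, timeseries_cols, timestamp_col=None):
    """Split one single-event row (a dict) into timestamp, metadata, and timeseries."""
    # The timestamp is optional, so it is None when no timestamp column is named.
    timestamp = row[timestamp_col] if timestamp_col is not None else None
    metadata = {col: row[col] for col in metadata_cols}
    timeseries = {col: row[col] for col in timeseries_cols}
    return timestamp, metadata, timeseries


# One PCAP-like row with made-up values.
pcap_row = {
    "timestamp": 1.0,
    "srcip": "10.0.0.1",
    "dstip": "10.0.0.2",
    "srcport": 443,
    "dstport": 51234,
    "proto": "TCP",
    "pkt_size": 1500,
}

timestamp, metadata, timeseries = split_single_event_row(
    pcap_row,
    metadata_cols=["srcip", "dstip", "srcport", "dstport", "proto"],  # assumed split
    timeseries_cols=["pkt_size"],
    timestamp_col="timestamp",
)
```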

Dataset type 2: multi-event

A multi-event schema contains multiple timeseries per row.

Data schema

Metadata 1 Metadata 2 ... {Timestamp (optional), Timeseries 1, Timeseries 2, ...} {Timestamp (optional), Timeseries 1, Timeseries 2, ...} ...

Examples

  1. Wikipedia dataset (configuration_file)
    Domain Access type Agent {Date 1, page view} {Date 2, page view} ...
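
To illustrate the multi-event layout, the Python sketch below splits one flat row into its leading metadata values and a list of fixed-width events. The function `split_multi_event_row` is hypothetical and the row values are made up for illustration; they are not taken from the Wikipedia dataset, and the real loader may represent rows differently.

```python
def split_multi_event_row(row, n_metadata, event_width):
    """Split a flat multi-event row into metadata and a list of event tuples."""
    metadata = row[:n_metadata]
    rest = row[n_metadata:]
    if len(rest) % event_width != 0:
        raise ValueError("row does not contain a whole number of events")
    # Each event is a group of event_width consecutive values.
    events = [
        tuple(rest[i:i + event_width])
        for i in range(0, len(rest), event_width)
    ]
    return metadata, events


# Domain, access type, agent, then {date, page view} pairs (made-up values).
wiki_row = [
    "en.wikipedia.org", "all-access", "spider",
    "2015-07-01", 18,
    "2015-07-02", 11,
]

wiki_metadata, wiki_events = split_multi_event_row(wiki_row, n_metadata=3, event_width=2)
```

Here each event carries its own date, which corresponds to the optional timestamp inside every `{...}` group of the schema.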