# Datasets

Topics:
* How numerical data are returned (and expected) in **DynamicalSystems.jl**.
* Basic `Dataset` handling.

Much of the functionality of **DynamicalSystems.jl** uses numerical data. We therefore needed a `struct` that unifies behavior across all functions that either return or require numerical data.

To this end, we use a struct called simply `Dataset`, which is nothing more than a wrapper of a `Vector{SVector}`:

In [1]:
using DynamicalSystems

In [2]:
?Dataset

search: [1mD[22m[1ma[22m[1mt[22m[1ma[22m[1ms[22m[1me[22m[1mt[22m Abstract[1mD[22m[1ma[22m[1mt[22m[1ma[22m[1ms[22m[1me[22m[1mt[22m [1mD[22myn[1ma[22mmicalSys[1mt[22memsB[1ma[22m[1ms[22m[1me[22m @[1md[22m[1ma[22m[1mt[22meform[1ma[22mt_[1ms[22mtr



```
Dataset{D, T} <: AbstractDataset{D,T}
```

A dedicated interface for datasets, i.e. vectors of vectors. It contains *equally-sized datapoints* of length `D`, represented by `SVector{D, T}`.

It can be used exactly like a matrix that has each of the columns be the timeseries of each of the dynamic variables. [`trajectory`](@ref) always returns a `Dataset`. For example,

```julia
ds = Systems.towel()
data = trajectory(ds, 1000) #this returns a dataset
data[:, 2] # this is the second variable timeseries
data[1] == data[1, :] # this is the first datapoint (D-dimensional)
data[5, 3] # value of the third variable, at the 5th timepoint
```

Use `Matrix(dataset)` or `reinterpret(Matrix, dataset)` and `Dataset(matrix)` or `reinterpret(Dataset, matrix)` to convert. The `reinterpret` methods are cheaper but assume that each variable/timeseries is a *row* and not column of the `matrix`.

If you have various timeseries vectors `x, y, z, ...` pass them like `Dataset(x, y, z, ...)`. You can use `columns(dataset)` to obtain the reverse, i.e. all columns of the dataset in a tuple.
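
The conversions described in the documentation string can be tried out directly. The following is a minimal sketch that uses only the calls named there (`Dataset(x, y, z)`, `Matrix(dataset)`, `Dataset(matrix)` and `columns`), assuming the version of **DynamicalSystems.jl** this tutorial was written for. The variable names `data3`, `m` and `back` are illustrative only.

```julia
using DynamicalSystems

# Three timeseries of equal length (random numbers as stand-in data)
x, y, z = rand(100), rand(100), rand(100)

data3 = Dataset(x, y, z)      # each vector becomes one variable (column)
m     = Matrix(data3)         # plain matrix, one column per variable
back  = Dataset(m)            # and back to a Dataset

xs, ys, zs = columns(data3)   # recover the individual timeseries
xs == x                       # the first column is the original x
```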


---
We chose a vector of `SVector`s because it yields significant performance benefits in many functions of the library.

Special attention has been given so that subtypes of `AbstractDataset` behave exactly like a `Matrix` where each *row* is a *point* of the dataset, while each *column* is *one dynamic variable*. We chose this approach because this is traditionally how scientific data are recorded, exported and passed around.
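
To make the "vector of points that indexes like a matrix" idea concrete, here is a small sketch in plain Julia. `MiniDataset` is a hypothetical stand-in written for illustration, not the library's `Dataset` type, and it uses tuples instead of `SVector`s so that it needs no packages.

```julia
# Hypothetical stand-in for Dataset: a vector of fixed-length points.
struct MiniDataset{D,T}
    data::Vector{NTuple{D,T}}
end

# d[i] returns the i-th point (a "row")
Base.getindex(d::MiniDataset, i::Int) = d.data[i]
# d[i, j] returns the j-th variable of the i-th point
Base.getindex(d::MiniDataset, i::Int, j::Int) = d.data[i][j]
# d[:, j] returns the timeseries of the j-th variable (a "column")
Base.getindex(d::MiniDataset, ::Colon, j::Int) = [p[j] for p in d.data]
# size mimics a matrix: (number of points, dimension)
Base.size(d::MiniDataset{D}) where {D} = (length(d.data), D)

pts = MiniDataset([(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)])
pts[2]      # (2.0, 20.0)
pts[3, 2]   # 30.0
pts[:, 1]   # [1.0, 2.0, 3.0]
size(pts)   # (3, 2)
```

The storage is point-by-point, yet every access pattern a user would expect from a matrix with one point per row is still available.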

**Datasets in DynamicalSystems.jl don't contain any time coordinate**.
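
Since no time coordinate is stored, one option is to keep a separate time vector alongside the data. The sketch below uses plain Julia and assumes a uniform sampling time `dt`. Both `dt` and the sine timeseries are made up for illustration and are not part of the library.

```julia
# Assumed sampling time; in practice this comes from how the data were produced.
dt = 0.01

# Stand-in for one column of a dataset
x = [sin(2π * dt * n) for n in 0:999]

# Matching time vector, kept separately from the data
t = (0:length(x) - 1) .* dt

length(t) == length(x)   # one time value per datapoint
```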

---
## Handling a Dataset

The documentation string already described how one handles a `Dataset`, which is identical to handling a `Matrix`. Besides that, a `Dataset` has many extra goodies defined on top of it:

In [3]:
data = Dataset(rand(1000, 2))

2-dimensional Dataset{Float64} with 1000 points:
 0.85245    0.907641 
 0.764112   0.64804  
 0.750546   0.083024 
 0.324464   0.823665 
 0.318449   0.173401 
 0.123943   0.972591 
 0.857732   0.837367 
 0.64909    0.756128 
 0.280373   0.740378 
 0.270682   0.630117 
 0.0364393  0.849739 
 0.820646   0.125905 
 0.0147168  0.513893 
 ⋮                   
 0.546555   0.56561  
 0.998632   0.704277 
 0.804399   0.446502 
 0.0994692  0.28497  
 0.600024   0.0581172
 0.976807   0.712164 
 0.36504    0.391619 
 0.22077    0.719023 
 0.362273   0.369374 
 0.153819   0.0491542
 0.440674   0.976756 
 0.449408   0.0406894


In [4]:
# iteration:
f = 0.0
for point in data
    f += mean(point)
end

# get columns
x, y = columns(data)
mean(x) ≈ mean(y)

# minima, maxima, etc...
mini = minima(data)
maxi = maxima(data)
mini, maxi = minmaxima(data)

([0.000237359, 0.00238046], [0.999345, 0.999775])

In [5]:
# Calling minmaxima once is faster than calling minima and maxima separately
using BenchmarkTools
@btime maxima($data)
@btime minima($data)
@btime minmaxima($data)

  2.370 μs (1 allocation: 96 bytes)
  2.418 μs (1 allocation: 96 bytes)
  3.840 μs (2 allocations: 192 bytes)


([0.000237359, 0.00238046], [0.999345, 0.999775])
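
One plausible reason a combined function can beat two separate calls is that it only needs to traverse the points once. The following plain-Julia sketch illustrates that idea. `minmax_single_pass` is a hypothetical helper written for this example and is not the library's implementation of `minmaxima`.

```julia
# Hypothetical helper: per-variable minima and maxima in one pass over the points.
function minmax_single_pass(points::Vector{NTuple{D,T}}) where {D,T}
    mins = collect(points[1])   # running minima, one entry per variable
    maxs = collect(points[1])   # running maxima, one entry per variable
    for p in points
        for j in 1:D
            if p[j] < mins[j]
                mins[j] = p[j]
            elseif p[j] > maxs[j]
                maxs[j] = p[j]
            end
        end
    end
    return mins, maxs
end

pts = [(0.3, 5.0), (0.1, 7.0), (0.9, 6.0)]
lo, hi = minmax_single_pass(pts)   # ([0.1, 5.0], [0.9, 7.0])
```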

## I/O
Input/output functionality for an `AbstractDataset` is already provided by base Julia, specifically `writedlm` and `readdlm`.

Note that all data of an `AbstractDataset` are contained in its field `data`.

To write and read a dataset, simply do:

In [6]:
# Write and read using ',' as the delimiter
writedlm("data.txt", data.data, ',')

# Don't forget to convert the matrix to a Dataset when reading
data = Dataset(readdlm("data.txt", ',', Float64))

2-dimensional Dataset{Float64} with 1000 points:
 0.85245    0.907641 
 0.764112   0.64804  
 0.750546   0.083024 
 0.324464   0.823665 
 0.318449   0.173401 
 0.123943   0.972591 
 0.857732   0.837367 
 0.64909    0.756128 
 0.280373   0.740378 
 0.270682   0.630117 
 0.0364393  0.849739 
 0.820646   0.125905 
 0.0147168  0.513893 
 ⋮                   
 0.546555   0.56561  
 0.998632   0.704277 
 0.804399   0.446502 
 0.0994692  0.28497  
 0.600024   0.0581172
 0.976807   0.712164 
 0.36504    0.391619 
 0.22077    0.719023 
 0.362273   0.369374 
 0.153819   0.0491542
 0.440674   0.976756 
 0.449408   0.0406894
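
The same round trip can be checked on a plain matrix without any packages. This is a small sketch that assumes Julia 0.7 or later, where `writedlm` and `readdlm` live in the `DelimitedFiles` standard library and have to be loaded explicitly. The temporary file path is illustrative only.

```julia
using DelimitedFiles  # standard library on Julia 0.7 and later

# 3 points (rows), 2 variables (columns)
m = [0.1 0.2; 0.3 0.4; 0.5 0.6]

# Write to a temporary file using ',' as the delimiter
path = joinpath(mktempdir(), "data.txt")
writedlm(path, m, ',')

# Read back as Float64 and compare with the original
m2 = readdlm(path, ',', Float64)
m2 == m
```

With a `Dataset`, the matrix read back this way would then be wrapped again with `Dataset(m2)`, as in the cell above.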
