In [1]:
)clear
⎕PP←4

In [2]:
]link.import # .

# `data` namespace

**DISCLAIMER** This is a proof-of-concept. Use at your own risk. Send comments to jgl@dyalog.com

## Classes

<div style="background-color:#ffffee;">

### `data.Series` class

An instance of the `data.Series` class contains a labelled 1D array.

- **`label`** label of the series.

- **`values`** values of the series. It must be a 1D array.

Values of the array can be accessed by bracket indexing (e.g. `s[2]`).

The **`loc`** property *locates* values.
It returns the indices (the *location*) of the values given in brackets (e.g. `s.loc['A']`).
It can also be used to assign to the located values.
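
The idea behind locating values can be illustrated with plain Dyalog primitives. This is only a sketch on an ordinary vector, not the implementation of `loc`; the names `vals` and `where` are made up for the illustration, and `⎕IO←1` is assumed. Whether `loc` reports the first match or every match is not stated above, so both are shown.

```apl
vals ← 'BACA'          ⍝ a plain character vector standing in for the series values
vals⍳'A'               ⍝ index of the first 'A': 2
where ← ⍸vals='A'      ⍝ indices of every 'A': 2 4
vals[where] ← 'Z'      ⍝ assign to the located values
vals                   ⍝ BZCZ
```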

The **`frames`** property gives access to the instances of `data.Frame` which contain this series.

</div>
<div style="background-color:#ffffee;">

### `data.Frame` class

An instance of the `data.Frame` class contains a list of series, all of them containing arrays of the same length.

- **`series`** list of series in the frame.

The **`labels`** property gives access to the array of labels of the series, while **`values`** gives access to the values as a list of nested arrays.

The series can be accessed by rank-1 bracket indexing (e.g. `f[⊂'label']`). Rank-2 bracket indexing gives access to the values in the frame as a 2D array (e.g. `f[2 3;'col1' 'col2']←2 2⍴⍳4`).

The **`loc`** property *locates* columns and values.
Rank-1 bracket indexing returns the indices of the corresponding columns (e.g. `f.loc['col1' 'col2']`).
Rank-2 indexing locates values; the corresponding indices are returned as a 2D array.
It can also be used to assign to the located values.
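
The relationship between labelled columns and the 2D view can be sketched with plain arrays. This is an illustration only, assuming `⎕IO←1`; `labels`, `values`, `m` and `cols` are ordinary variables invented for the example, not fields of a `data.Frame` instance.

```apl
labels ← 'col1' 'col2' 'col3'
values ← (1 2 3)(4 5 6)(7 8 9)   ⍝ one vector per column, all of the same length
m ← ⍉↑values                     ⍝ 3×3 matrix with one column per series
cols ← labels⍳'col1' 'col3'      ⍝ column positions by label: 1 3
m[2 3;cols]                      ⍝ rows 2 and 3 of those columns: 2 2⍴2 8 3 9
```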

Frames are displayed with shading at fixed row intervals. The size of the interval, in rows, is controlled by the **SHADE** field. The maximum number of lines displayed is controlled by **MAXLINES**.

</div>

## Functions and operators

<div style="background-color:#eeffee;">

### `data.series` function

This function returns an instance or a list of instances of the `data.Series` class.

- `⍺ data.series ⍵` creates an instance of `data.Series` with label `⍺` and values `⍵`. If `⍺` is a series, the label is taken from it.
- `data.series ⍵` creates an instance of `data.Series` for each of the series in `⍵` and each of the series contained in each frame in `⍵`. If `⍵` is a 2D array, it must contain series with the same label in each column, and their values will be concatenated.

</div>
<div style="background-color:#eeffee;">

### `data.frame` function

This function returns an instance of the `data.Frame` class.

- `⍺ data.frame ⍵` creates an instance of `data.Frame` with labels `⍺` (or the labels of the series list or frame `⍺`) and values `⍵`. If `⍵` is a string, the result of `⍺ data.(frame csv) ⍵` is returned.
- `data.frame ⍵` creates an instance of `data.Frame` with each of the series returned by `data.series ⍵`. If `⍵` is a string, the result of `data.(frame csv) ⍵` is returned.

</div>
<div style="background-color:#eeffee;">

### `data.csv` function

This function reads/writes frames from/to csv files.

- `⍺ data.csv ⍵` writes the frame `⍺` to the CSV file `⍵`; if `⍺` is a list of labels instead, it reads the CSV file `⍵` without a header and returns a frame with labels `⍺`.
- `data.csv ⍵` reads the file `⍵` as CSV and returns a frame.

</div>
<div style="background-color:#ffeeff;">

### `data.sort` operator

This operator sorts data according to the left function.

- `⍺ (⍺⍺ data.sort) ⍵` returns the data in `⍵` (a frame or list of series) sorted according to the result of `⍺⍺ ⍺` (where `⍺⍺` is typically `⍋` or `⍒`).
- `(⍺⍺ data.sort) ⍵` is equivalent to `(⍺⍺ data.sort)⍨⍵`.
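
Sorting "according to the result of `⍺⍺ ⍺`" corresponds to the usual APL idiom of indexing with a grade vector. A minimal sketch on plain vectors, assuming `⎕IO←1`; `keys` and `vals` are invented names and no `data` function is involved:

```apl
keys ← 3 1 2
vals ← 'cab'
⍋keys          ⍝ grade up: 2 3 1
vals[⍋keys]    ⍝ vals reordered by ascending keys: abc
vals[⍒keys]    ⍝ vals reordered by descending keys: cba
```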

</div>
<div style="background-color:#ffeeff;">

### `data.by` operator

This operator groups data by the right operand and applies the left function.

- `⍺ (⍺⍺ data.by ⍵⍵) ⍵` groups the data in `⍵` (a frame or list of series) according to `⍵⍵` (also a frame or list of series) and applies `⍺⍺` to each group. A new frame is returned with the labels given in `⍺` (or `⍺.labels`). If `≢⍺` is less than the number of series, it must contain a label for each of the additional series or a label for each of the series not in `⍵⍵`.
- `(⍺⍺ data.by ⍵⍵) ⍵` is equivalent to `(⍺⍺ data.by ⍵⍵)⍨⍵`.
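
Grouping and applying a function to each group resembles what the primitive Key operator (`⌸`) does on plain arrays. The sketch below is an analogy only and makes no claim about how `data.by` is implemented; `keys` and `vals` are invented names.

```apl
keys ← 'ABAB'
vals ← 1 2 3 4
⍝ for each unique key: the key, the group size, and the group sum
keys {⍺,(≢⍵),+/⍵}⌸ vals    ⍝ 2 3⍴'A' 2 4 'B' 2 6
```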

</div>
<div style="background-color:#ffeeff;">

### `data.where` operator

This operator applies the left function to data that fulfills the condition given as right operand.

- `⍺ (⍺⍺ data.where ⍵⍵) ⍵` returns the data in `⍵` (a frame or list of series) after applying the function `⍺⍺` to the values which fulfill the condition `⍵⍵ ⍺`.
- `(⍺⍺ data.where ⍵⍵) ⍵` is equivalent to `(⍺⍺ data.where ⍵⍵)⍨⍵`.
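
One possible reading of this description, in which values that do not fulfil the condition are left in place, can be sketched with the primitive At operator (`@`). This is an assumption about the semantics rather than a description of `data.where`; `flags`, `vals`, `cond` and `mask` are invented names, and `⎕IO←1` is assumed.

```apl
flags ← 'YNYN'         ⍝ plays the role of ⍺
vals ← 1 2 3 4         ⍝ plays the role of ⍵
cond ← 'Y'∘=           ⍝ plays the role of ⍵⍵
mask ← cond flags      ⍝ 1 0 1 0
-@(⍸mask)⊢vals         ⍝ negate only the selected values: ¯1 2 ¯3 4
```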

</div>
<div style="background-color:#ffeeff;">

### `data.join` operator

This operator merges two frames (or lists of series).

- `⍺ (⍺⍺ data.join ⍵⍵) ⍵` returns a frame with series labelled `⍺.labels ⍵⍵ ⍵.labels`. If a series on the left and a series on the right have the same label, their values are combined as `⍺.values ⍺⍺ ⍵.values`.
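
As a rough illustration of these two roles, take `⍺⍺` to be `,` and `⍵⍵` to be `⊣`, the combination used in the example below. The sketch works on plain label and value lists with invented names (`la`, `va`, `lb`, `vb`) and assumes both sides carry the same labels in the same order; it is not the operator's implementation.

```apl
la ← 'id' 'qty' ⋄ va ← (1 2)(3 4)     ⍝ left labels and column values
lb ← 'id' 'qty' ⋄ vb ← (5 6)(7 8)     ⍝ right labels and column values
labels ← la ⊣ lb                      ⍝ ⍵⍵ is ⊣: keep the left labels
values ← va ,¨ vb                     ⍝ ⍺⍺ is , : (1 2 5 6)(3 4 7 8)
```

With these operands the right-hand rows are appended below the left-hand ones.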

</div>

## Example

In [3]:
f ←   data.frame'berkeley.csv'
a ←   data.('Applicants' 'Accepted'{(≢⍵),('A'+.=⊃¨)⍵}by'Major' 'Gender'⊢)f[]~f[⊂'Year']
g ← a data.(⍋sort⊣,join⊣(⊂'Gender')(+⌿,'T'⍨)by(⊂'Major')⊢)a[]~a[⊂'Gender']
m ← g data.(⍋sort⊣,join⊣(⊂'Major')(+⌿,(⊂'Total')⍨)by(⊂'Gender')⊢)g[]~g[⊂'Major']
r ← m data.(frame⊣,'%Accepted'series⊢)100×÷/m[;'Accepted' 'Applicants']
    r data.(frame⊣,'%Applicants'series⊢)(100×⊢÷≢⍴('T'=⊃¨r[;⊂'Major'])⌿⊢)⊢r[;⊂'Applicants']