# Toki - Data expression library

`Toki` aims to be a data expression library written in `Rust`. Its main objective is to offer a simple and intuitive API for building data expressions that can be evaluated on different backends.

In the context of this document, a data expression is a way to describe a computation on data as a graph that is evaluated later, on demand.

For example, in `Python`, there are some libraries that work with this concept, such as [dask](https://dask.org/), [ibis-framework](https://ibis-project.org/), [sqlalchemy](https://www.sqlalchemy.org/) and [metadsl](https://metadsl.readthedocs.io/en/latest/).

To illustrate this concept, mathematical expressions are very useful:

```python
x = 0
y = x + 1
```

In a common programming language, these lines are treated as statements and are evaluated immediately. If they were treated as expressions instead, the value of `y` would remain unknown until the user explicitly requested its evaluation.
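
The same idea can be sketched in `Rust`. The `Expr` enum and the `eval` function below are hypothetical names used only for illustration (they are not part of `Toki`): building `y` merely records a graph, and the value is computed when `eval` is called.

```rust
// A minimal sketch of deferred evaluation; `Expr` and `eval` are
// hypothetical names, not part of any existing library.
#[derive(Debug)]
enum Expr {
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Walk the graph and compute the value on demand.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Literal(v) => *v,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
    }
}

fn main() {
    // x = 0; y = x + 1, recorded as a graph instead of being computed.
    let x = Expr::Literal(0);
    let y = Expr::Add(Box::new(x), Box::new(Expr::Literal(1)));

    // Nothing has been computed so far; `y` is only a description.
    println!("{:?}", y);

    // The value is produced only when evaluation is requested.
    println!("{}", eval(&y));
}
```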

Consider the following example using `dask`:

```python
>>> import dask
>>> import dask.array as da
>>> x = da.arange(10, chunks=2).sum()
>>> y = da.arange(10, chunks=2).mean()
>>> x2, y2 = dask.optimize(x, y)

>>> x2.compute() == x.compute()
True
>>> y2.compute() == y.compute()
True
```

As can be observed, the values of `x2` and `y2` are unknown at run time until the user calls the `compute` method.

Some similar libraries already exist in `Rust`, such as [Diesel](https://docs.diesel.rs/).

Consider the following code using `Diesel`:

```rust
let data = animals
    .select(species)
    .filter(name.is_null())
    .first::<String>(&connection)?;
```

`Toki`'s goal is to allow the same operation with a simpler syntax:

```rust
let data = animals[animals[species].name.is_null()].head(1);
```

This document explores how `Rust` can be used to achieve this goal.

## Data Expression Code Design

Some common elements that a data expression can have:

- Data Type expression (literal types, such as Integer32, String)
- Table expression (such as table, columns, etc)
- Operation expression
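
One possible way to model these three kinds of elements is a single enum, shown below as a rough sketch. All type, variant and function names (`DataType`, `Expr`, `column`, `node_count`) are assumptions made for illustration and do not describe an existing `Toki` API.

```rust
// Hypothetical sketch: the three kinds of expression elements as one enum.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Integer32,
    String,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // Data type expression: a typed literal value.
    Literal { dtype: DataType, repr: String },
    // Table expression: a named column of a named table.
    Column { table: String, name: String, dtype: DataType },
    // Operation expression: an operator applied to other expressions.
    Operation { op: String, args: Vec<Expr> },
}

// Small helper to build a column expression.
fn column(table: &str, name: &str, dtype: DataType) -> Expr {
    Expr::Column { table: table.to_string(), name: name.to_string(), dtype }
}

// Count the nodes of an expression graph without evaluating it.
fn node_count(expr: &Expr) -> usize {
    match expr {
        Expr::Operation { args, .. } => 1 + args.iter().map(node_count).sum::<usize>(),
        _ => 1,
    }
}

fn main() {
    // `name IS NULL AND legs = 4`, recorded as a graph.
    let name = column("animals", "name", DataType::String);
    let legs = column("animals", "legs", DataType::Integer32);
    let four = Expr::Literal { dtype: DataType::Integer32, repr: "4".to_string() };
    let expr = Expr::Operation {
        op: "and".to_string(),
        args: vec![
            Expr::Operation { op: "is_null".to_string(), args: vec![name] },
            Expr::Operation { op: "eq".to_string(), args: vec![legs, four] },
        ],
    };
    println!("{} nodes", node_count(&expr));
}
```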

## Rust language structure

Compared with other languages, `Rust` can be quite challenging. Some of its characteristics that affect the design:

- `Rust` doesn't have classes; structs and traits are used instead.
- Building a native `Rust` dictionary (`HashMap`) is verbose because there is no literal syntax, but the `hashmap!` macro from [maplit](https://docs.rs/maplit/1.0.2/maplit/) can be used instead.
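
Both points can be illustrated with a short sketch. The `Describe` trait, the `Column` struct and the `schema` function are hypothetical names. Instead of `maplit`, the example uses `HashMap::from` on an array of pairs, which is available in the standard library since Rust 1.56 and is offered here only as one alternative.

```rust
use std::collections::HashMap;

// A trait plays the role of an abstract base class or interface.
trait Describe {
    fn describe(&self) -> String;
}

// A struct holds the data; behavior is attached through `impl` blocks.
struct Column {
    name: String,
    dtype: String,
}

impl Column {
    // Associated function used as a constructor.
    fn new(name: &str, dtype: &str) -> Self {
        Column { name: name.to_string(), dtype: dtype.to_string() }
    }
}

impl Describe for Column {
    fn describe(&self) -> String {
        format!("{}: {}", self.name, self.dtype)
    }
}

// Build a map without a macro, from an array of (key, value) pairs.
fn schema() -> HashMap<&'static str, Column> {
    HashMap::from([
        ("x", Column::new("x", "Integer32")),
        ("y", Column::new("y", "Integer64")),
    ])
}

fn main() {
    let columns = schema();
    println!("{}", columns["x"].describe());
}
```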

The following sections contain some proofs of concept that check the viability of creating a data expression library in `Rust` with a user experience similar to that of libraries such as `dask` or `ibis-framework`.

```rust
use std::fmt;
use std::ops;
use std::collections::HashMap;
```

### Toki - Proof of Concept

```rust
trait Expression {
    fn __str__(&self) -> String;
}

trait DataType {}

trait NumericType {}

impl Expression for dyn DataType {
    fn __str__(&self) -> String {
        "DataType".to_string()
    }
}

impl Expression for dyn NumericType {
    fn __str__(&self) -> String {
        "NumericType".to_string()
    }
}

// Print any expression by its name, so that structs holding
// `Box<dyn Expression>` fields can derive `Debug`.
impl fmt::Debug for dyn Expression {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct(&self.__str__()).finish()
    }
}

impl fmt::Display for dyn Expression {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        formatter.write_str(&self.__str__())
    }
}

#[derive(Debug)]
struct Integer32 {
    parent: Option<Box<dyn Expression>>,
    value: i32,
}

impl Expression for Integer32 {
    fn __str__(&self) -> String {
        "Integer32".to_string()
    }
}

impl DataType for Integer32 {}
impl NumericType for Integer32 {}

trait Integer32Type {
    fn new(value: i32) -> Integer32;
}

impl Integer32Type for Integer32 {
    fn new(value: i32) -> Integer32 {
        Integer32 { value, parent: None }
    }

    // fn new_with_parent(value: i32, parent: Box<dyn Expression>) -> Integer32 {
    //     Integer32 { value, parent: Some(parent) }
    // }
}

#[derive(Debug)]
struct Integer64 {
    parent: Option<Box<dyn Expression>>,
    value: i64,
}

trait Integer64Type {
    fn new(value: i64) -> Integer64;
}

impl Integer64Type for Integer64 {
    fn new(value: i64) -> Integer64 {
        Integer64 { value, parent: None }
    }

    // fn new_with_parent(value: i64, parent: Box<dyn Expression>) -> Integer64 {
    //     Integer64 { value, parent: Some(parent) }
    // }
}

impl Expression for Integer64 {
    fn __str__(&self) -> String {
        "Integer64".to_string()
    }
}

impl DataType for Integer64 {}
impl NumericType for Integer64 {}

let obj_i32: Integer32 = Integer32::new(1);
println!("{:?}", obj_i32);

let obj_i64 = Integer64::new(2);
println!("{:?}", obj_i64);

// OPERATION

trait BinaryOp {
    fn resolve_expression();
}

// Node of the expression graph: it stores both operands
// instead of computing their sum.
#[derive(Debug)]
struct Add {
    left: Box<dyn Expression>,
    right: Box<dyn Expression>,
}

// impl BinaryOp for Add {}

// One `ops::Add` implementation per pair of operand types.
impl ops::Add<Integer32> for Integer32 {
    type Output = Add;

    fn add(self, rhs: Integer32) -> Add {
        Add {
            left: Box::new(self),
            right: Box::new(rhs),
        }
    }
}

impl ops::Add<Integer64> for Integer64 {
    type Output = Add;

    fn add(self, rhs: Integer64) -> Add {
        Add {
            left: Box::new(self),
            right: Box::new(rhs),
        }
    }
}

impl ops::Add<Integer64> for Integer32 {
    type Output = Add;

    fn add(self, rhs: Integer64) -> Add {
        Add {
            left: Box::new(self),
            right: Box::new(rhs),
        }
    }
}

impl ops::Add<Integer32> for Integer64 {
    type Output = Add;

    fn add(self, rhs: Integer32) -> Add {
        Add {
            left: Box::new(self),
            right: Box::new(rhs),
        }
    }
}

let x: Integer32 = Integer32::new(1);
let y: Integer32 = Integer32::new(2);
println!("{:?}", x + y);

let x = Integer64::new(1);
let y = Integer64::new(2);
println!("{:?}", x + y);

let x = Integer32::new(1);
let y = Integer64::new(2);
println!("{:?}", x + y);

let x = Integer64::new(1);
let y = Integer32::new(2);
println!("{:?}", x + y);
```
```text
Integer32 { parent: None, value: 1 }
Integer64 { parent: None, value: 2 }
Add { left: Integer32, right: Integer32 }
Add { left: Integer64, right: Integer64 }
Add { left: Integer32, right: Integer64 }
Add { left: Integer64, right: Integer32 }
```
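
The proof of concept writes one `ops::Add` implementation per pair of operand types. As a possible simplification, the sketch below makes the right-hand side generic over any expression and uses a small macro (`impl_add!`, a hypothetical name) to repeat the implementation for each left-hand type. It also treats `Add` itself as an expression so that additions can be chained. The tuple structs and the `repr` method are simplified stand-ins for the types above, not the original definitions.

```rust
use std::fmt;
use std::ops;

trait Expression {
    fn repr(&self) -> String;
}

// Print a boxed expression through its `repr` text.
impl fmt::Debug for dyn Expression {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.repr())
    }
}

struct Integer32(i32);
struct Integer64(i64);

#[derive(Debug)]
struct Add {
    left: Box<dyn Expression>,
    right: Box<dyn Expression>,
}

impl Expression for Integer32 {
    fn repr(&self) -> String {
        format!("Integer32({})", self.0)
    }
}

impl Expression for Integer64 {
    fn repr(&self) -> String {
        format!("Integer64({})", self.0)
    }
}

// An `Add` node is itself an expression, so additions can be nested.
impl Expression for Add {
    fn repr(&self) -> String {
        format!("({} + {})", self.left.repr(), self.right.repr())
    }
}

// Generate `lhs + rhs` for each listed left-hand type and for any
// right-hand type that implements `Expression`.
macro_rules! impl_add {
    ($($lhs:ty),*) => {$(
        impl<R: Expression + 'static> ops::Add<R> for $lhs {
            type Output = Add;

            fn add(self, rhs: R) -> Add {
                Add { left: Box::new(self), right: Box::new(rhs) }
            }
        }
    )*};
}

impl_add!(Integer32, Integer64, Add);

fn main() {
    // Mixed operand types and chained additions share the same code path.
    let expr = Integer32(1) + Integer64(2) + Integer32(3);
    println!("{:?}", expr);
}
```

With this layout, supporting a new numeric type would only require an `Expression` implementation and one more entry in the `impl_add!` call.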


```rust
#[derive(Debug)]
struct A {}

#[derive(Debug)]
struct B {}

trait TraitAB {
    fn __str__(&self) -> &str;
}

impl fmt::Debug for dyn TraitAB {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct(self.__str__()).finish()
    }
}

// A trait object has no size known at compile time, so the field
// stores it behind a `Box` instead of holding `dyn TraitAB` directly.
#[derive(Debug)]
struct C {
    a_or_b: Box<dyn TraitAB>,
}

impl TraitAB for A {
    fn __str__(&self) -> &str {
        "A"
    }
}

impl TraitAB for B {
    fn __str__(&self) -> &str {
        "B"
    }
}

let c = C {
    a_or_b: Box::new(B {}),
};
```

```rust
#[derive(Debug)]
struct ColumnInteger32 {
    name: String,
}

#[derive(Debug)]
struct ColumnInteger64 {
    name: String,
}

#[derive(Debug)]
struct ColumnString {
    name: String,
}

trait ColumnType {
    fn __str__(&self) -> String {
        "ColumnType".to_string()
    }
}

impl fmt::Debug for dyn ColumnType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct(&self.__str__()).finish()
    }
}

impl ColumnType for ColumnInteger32 {
    fn __str__(&self) -> String {
        format!("{}: ColumnInteger32", self.name)
    }
}

impl ColumnType for ColumnInteger64 {
    fn __str__(&self) -> String {
        format!("{}: ColumnInteger64", self.name)
    }
}

// impl Integer32Type for ColumnInteger32 {}
// impl Integer64Type for ColumnInteger64 {}

#[derive(Debug)]
struct Schema {
    name: String,
    fields: Vec<Box<dyn ColumnType>>,
}

impl Schema {
    fn new(name: String, fields: Vec<Box<dyn ColumnType>>) -> Schema {
        Schema { name, fields }
    }
}

trait TableType {}

impl Expression for dyn TableType {
    fn __str__(&self) -> String {
        "TableType".to_string()
    }
}

impl fmt::Debug for dyn TableType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct(&self.__str__()).finish()
    }
}

#[derive(Debug)]
struct Table {
    name: String,
    schema: Schema,
}

#[derive(Debug)]
struct TableProjection {
    parent: Box<dyn TableType>,
    projection: Vec<String>,
}

impl TableProjection {
    fn new(parent: Box<dyn TableType>, projection: Vec<String>) -> TableProjection {
        TableProjection { parent, projection }
    }
}

// Draft of indexing support, left disabled: `Index::index` must return
// a reference (`&Self::Output`), so it cannot return a new
// `TableProjection` by value as written here.
//
// impl<Idx> std::ops::Index<Idx> for Table
// where
//     Idx: std::slice::SliceIndex<[String]>,
// {
//     type Output = Idx::Output;
//
//     fn index(&self, index: Idx) -> TableProjection {
//         TableProjection::new(self, index)
//     }
// }

let schema = Schema::new(
    "table".to_string(),
    vec![
        Box::new(ColumnInteger32 { name: "x".to_string() }),
        Box::new(ColumnInteger32 { name: "y".to_string() }),
    ],
);
println!("{:?}", schema);
```

```text
Schema { name: "table", fields: [x: ColumnInteger32, y: ColumnInteger32] }
```
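
The indexing syntax is harder to reproduce, because `std::ops::Index::index` has to return a reference rather than a freshly built value. One possible compromise, sketched below under that constraint, is to let `table["name"]` return a reference to a stored column and to build the filter and the row limit through methods. The names `Column`, `Predicate`, `Query`, `filter` and `head` are assumptions for illustration only, and nothing is evaluated against a backend.

```rust
use std::collections::HashMap;
use std::ops::Index;

#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
}

// A predicate expression built from a column; nothing is evaluated here.
#[derive(Debug, PartialEq)]
enum Predicate {
    IsNull(Column),
}

impl Column {
    fn is_null(&self) -> Predicate {
        Predicate::IsNull(self.clone())
    }
}

#[derive(Debug)]
struct Table {
    name: String,
    columns: HashMap<String, Column>,
}

// A table expression that records a filter and a row limit.
#[derive(Debug)]
struct Query<'a> {
    table: &'a Table,
    filter: Option<Predicate>,
    limit: Option<usize>,
}

impl Table {
    fn new(name: &str, column_names: &[&str]) -> Table {
        let columns = column_names
            .iter()
            .map(|c| (c.to_string(), Column { name: c.to_string() }))
            .collect();
        Table { name: name.to_string(), columns }
    }

    fn filter(&self, predicate: Predicate) -> Query<'_> {
        Query { table: self, filter: Some(predicate), limit: None }
    }
}

impl<'a> Query<'a> {
    fn head(mut self, n: usize) -> Query<'a> {
        self.limit = Some(n);
        self
    }
}

// `table["x"]` hands back a reference to a stored column.
// It panics if the column does not exist, like `HashMap` indexing.
impl Index<&str> for Table {
    type Output = Column;

    fn index(&self, name: &str) -> &Column {
        &self.columns[name]
    }
}

fn main() {
    let animals = Table::new("animals", &["species", "name"]);
    let query = animals.filter(animals["name"].is_null()).head(1);
    println!("{:?}", query);
}
```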


## Conclusions

The code above indicates that it is possible to create a data expression library in `Rust`.

## References

- Rust
  - https://doc.rust-lang.org/std/fmt/trait.Debug.html
  - https://doc.rust-lang.org/stable/rust-by-example/std/hash.html
  - https://doc.rust-lang.org/rust-by-example/macros/variadics.html
  - https://stackoverflow.com/questions/24512356/how-to-use-variadic-macros-to-call-nested-constructors
  - https://stackoverflow.com/questions/53688202/does-rust-have-an-equivalent-to-pythons-dictionary-comprehension-syntax
  - https://play.rust-lang.org/?gist=3dad589a10c43a66ad08ab051c668e58&version=stable&backtrace=0
  - https://docs.rs/maplit/1.0.2/maplit/