## Prepare environment

IElixir ships with Boyle, a tool that creates an environment and installs dependencies for JupyterLab. See https://hexdocs.pm/ielixir/Boyle.html for details. To install IElixir, follow this guide: https://github.com/pprzetacznik/IElixir#install-kernel.

In [55]:
Boyle.mk("polars")
Boyle.activate("polars")
Boyle.install({:ex_polars, "~> 0.3.6-dev"})

All dependencies are up to date
Resolving Hex dependencies...
Dependency resolution completed:
Unchanged:
  deneb 0.2.2
  elixir_uuid 1.2.1
  ex_polars 0.3.6-dev
  jason 1.2.2
  rustler 0.22.0-rc.0
  toml 0.5.2
  typed_struct 0.2.1
All dependencies are up to date
==> ex_polars
Compiling NIF crate :expolars (native/expolars)...
    Finished release [optimized] target(s) in 0.27s


  mix.exs:1



:ok

## Import ExPolars DataFrame and Series

The two major data structures in ExPolars are DataFrame and Series. They behave much like their pandas counterparts. However, Elixir has no per-type operator overloading,
so to use operators such as `+` or `<` on a Series you have to exclude them from `Kernel` and import the overloaded versions from `ExPolars.Series`.

In [56]:
alias ExPolars.Series, as: S
alias ExPolars.DataFrame, as: DF
alias ExPolars.Datasets
import Kernel, except: [+: 2, -: 2, *: 2, /: 2, ==: 2, <>: 2, >: 2, >=: 2, <: 2, <=: 2]
import S, only: [+: 2, -: 2, *: 2, /: 2, ==: 2, <>: 2, >: 2, >=: 2, <: 2, <=: 2]

ExPolars.Series
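
To make the two `import` lines less mysterious, here is a minimal sketch of the same technique in plain Elixir. `Vec` and `Demo` are hypothetical modules made up for this illustration; they are not part of ExPolars, whose real operators are backed by a native NIF and certainly look different. Run it as a standalone script rather than in the notebook session above, where the `ExPolars.Series` operators are already imported and would clash with the local definitions.

```elixir
defmodule Vec do
  # Hypothetical stand-in for a series: a struct wrapping a plain list.
  defstruct data: []

  # Hide Kernel's operators inside this module so we can define our own.
  import Kernel, except: [+: 2, <: 2]

  # Element-wise addition of two vectors.
  def %Vec{data: a} + %Vec{data: b} do
    %Vec{data: Enum.map(Enum.zip(a, b), fn {x, y} -> Kernel.+(x, y) end)}
  end

  # Element-wise comparison against a number, producing a boolean mask.
  def %Vec{data: a} < n when is_number(n) do
    %Vec{data: Enum.map(a, fn x -> Kernel.<(x, n) end)}
  end
end

defmodule Demo do
  # Same pattern as the notebook cell: drop Kernel's versions, import ours.
  import Kernel, except: [+: 2, <: 2]
  import Vec, only: [+: 2, <: 2]

  def run do
    v = %Vec{data: [1, 2, 3]}
    # Both operators now resolve to the functions defined in Vec.
    {v + v, v < 3}
  end
end

IO.inspect(Demo.run())
```

One consequence of this pattern: inside `Demo`, an expression such as `1 + 2` no longer reaches `Kernel.+/2`. In this sketch it would call `Vec.+/2` and fail because no clause matches, so plain numbers need an explicit `Kernel.+(1, 2)`.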

## Loading a DataFrame from file

ExPolars can load a DataFrame from JSON, CSV, and Parquet files. Below is an example of loading a CSV file.

Once the file is loaded, you can inspect it with `head`, `tail`, `get_columns`, etc. The full list of functions is at https://hexdocs.pm/ex_polars/ExPolars.DataFrame.html. Since I don't have enough time (a good excuse for being lazy), that page currently contains only typespecs, without written documentation.

In [76]:
{:ok, df} = DF.read_csv("./priv/datasets/airports.csv")

{:ok, shape: (3376, 7)
╭──────┬───────────────────────────┬──────────────────┬───────┬─────────┬──────────┬────────────╮
│ iata ┆ name                      ┆ city             ┆ state ┆ country ┆ latitude ┆ longitude  │
│ ---  ┆ ---                       ┆ ---              ┆ ---   ┆ ---     ┆ ---      ┆ ---        │
│ str  ┆ str                       ┆ str              ┆ str   ┆ str     ┆ f64      ┆ f64        │
╞══════╪═══════════════════════════╪══════════════════╪═══════╪═════════╪══════════╪════════════╡
│ 00M  ┆ Thigpen                   ┆ Bay Springs      ┆ MS    ┆ USA     ┆ 31.954   ┆ -8.9235e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 00R  ┆ Livingston Municipal      ┆ Livingston       ┆ TX    ┆ USA     ┆ 30.686   ┆ -9.5018e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 00V  ┆ Meadow Lake               ┆ Colorado Springs ┆ CO    ┆ USA     ┆ 38.946   ┆ -1.0457e2 

In [77]:
DF.tail(df)

{:ok, shape: (5, 7)
╭──────┬───────────────────────────┬─────────────┬───────┬─────────┬──────────┬────────────╮
│ iata ┆ name                      ┆ city        ┆ state ┆ country ┆ latitude ┆ longitude  │
│ ---  ┆ ---                       ┆ ---         ┆ ---   ┆ ---     ┆ ---      ┆ ---        │
│ str  ┆ str                       ┆ str         ┆ str   ┆ str     ┆ f64      ┆ f64        │
╞══════╪═══════════════════════════╪═════════════╪═══════╪═════════╪══════════╪════════════╡
│ ZEF  ┆ Elkin Municipal           ┆ Elkin       ┆ NC    ┆ USA     ┆ 36.28    ┆ -8.0786e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ZER  ┆ Schuylkill Cty/Joe Zerbey ┆ Pottsville  ┆ PA    ┆ USA     ┆ 40.706   ┆ -7.6373e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ZPH  ┆ Zephyrhills Municipal     ┆ Zephyrhills ┆ FL    ┆ USA     ┆ 28.228   ┆ -8.2156e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼

## Interact with predefined datasets

`priv/datasets` contains many different datasets for you to use. They are a good starting point if you'd like to explore what ExPolars can do.

In [78]:
weather = ExPolars.Datasets.seattle_weather

shape: (1461, 6)
╭───────────────┬──────────┬──────────┬──────┬─────────┬──────────────╮
│ precipitation ┆ temp_max ┆ temp_min ┆ wind ┆ weather ┆ date         │
│ ---           ┆ ---      ┆ ---      ┆ ---  ┆ ---     ┆ ---          │
│ f64           ┆ f64      ┆ f64      ┆ f64  ┆ str     ┆ date32(days) │
╞═══════════════╪══════════╪══════════╪══════╪═════════╪══════════════╡
│ 0.0           ┆ 12.8     ┆ 5        ┆ 4.7  ┆ drizzle ┆ 2012-01-01   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 10.9          ┆ 10.6     ┆ 2.8      ┆ 4.5  ┆ rain    ┆ 2012-01-02   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0.8           ┆ 11.7     ┆ 7.2      ┆ 2.3  ┆ rain    ┆ 2012-01-03   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 20.3          ┆ 12.2     ┆ 5.6      ┆ 4.7  ┆ rain    ┆ 2012-01-04   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ...           ┆ ...      ┆ ...      ┆ ...  ┆ 

## Complicated Operations

You can do complicated operations with ExPolars.

Operations include:

- window, e.g. rolling_min, rolling_max
- aggregate, e.g. min, max, mean, std, var
- comparison, e.g. ==, <>, <, >, <=, >=
- shift
- filter
- ...
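
To give a feel for what the window and shift entries compute, here is a conceptual sketch on plain Elixir lists. `WindowSketch` is a hypothetical module written for this illustration, not the ExPolars API, and details such as how ExPolars treats the incomplete leading windows are assumptions (this version simply drops them). The sample values are the first four `temp_min` readings from the weather table above.

```elixir
defmodule WindowSketch do
  # Minimum over each run of `size` consecutive values.
  # Windows that would be incomplete are dropped.
  def rolling_min(values, size) do
    values
    |> Enum.chunk_every(size, 1, :discard)
    |> Enum.map(&Enum.min/1)
  end

  # Move every value `n` positions later, filling the gap with nil
  # and keeping the original length.
  def shift(values, n) do
    Enum.take(List.duplicate(nil, n) ++ values, length(values))
  end
end

temp_min = [5.0, 2.8, 7.2, 5.6]

IO.inspect(WindowSketch.rolling_min(temp_min, 2))
IO.inspect(WindowSketch.shift(temp_min, 1))
```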

For example, the filter operation below is equivalent to `df[df["temp_min"] < -5.0]` in pandas. Here you can see how Python's syntax (and magic methods) makes such expressions more concise.

In [79]:
{:ok, df} = DF.filter(weather, DF.column(weather, "temp_min") < -5.0)

{:ok, shape: (4, 6)
╭───────────────┬──────────┬──────────┬──────┬─────────┬──────────────╮
│ precipitation ┆ temp_max ┆ temp_min ┆ wind ┆ weather ┆ date         │
│ ---           ┆ ---      ┆ ---      ┆ ---  ┆ ---     ┆ ---          │
│ f64           ┆ f64      ┆ f64      ┆ f64  ┆ str     ┆ date32(days) │
╞═══════════════╪══════════╪══════════╪══════╪═════════╪══════════════╡
│ 0.0           ┆ 0.0      ┆ -7.1e0   ┆ 3.1  ┆ sun     ┆ 2013-12-07   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0.0           ┆ 2.2      ┆ -6.6e0   ┆ 2.2  ┆ sun     ┆ 2013-12-08   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0.0           ┆ -5e-1    ┆ -5.5e0   ┆ 6.6  ┆ sun     ┆ 2014-02-05   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0.0           ┆ -1.6e0   ┆ -6e0     ┆ 4.5  ┆ sun     ┆ 2014-02-06   │
╰───────────────┴──────────┴──────────┴──────┴─────────┴──────────────╯}
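
Conceptually, the comparison presumably produces a boolean mask with one entry per row, and the filter keeps the rows whose entry is true. The sketch below mimics those two steps on plain Elixir maps. It is only an illustration and says nothing about how ExPolars implements filtering; the four rows are copied from the tables shown earlier. `Kernel.<` is called explicitly so the snippet behaves the same even where `<` has been re-imported from another module.

```elixir
rows = [
  %{date: "2012-01-01", temp_min: 5.0},
  %{date: "2012-01-02", temp_min: 2.8},
  %{date: "2013-12-07", temp_min: -7.1},
  %{date: "2013-12-08", temp_min: -6.6}
]

# Step 1: the comparison yields a boolean mask, one entry per row.
mask = Enum.map(rows, fn row -> Kernel.<(row.temp_min, -5.0) end)

# Step 2: keep only the rows whose mask entry is true.
kept = for {row, true} <- Enum.zip(rows, mask), do: row

IO.inspect(mask)
IO.inspect(kept)
```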

## DataFrame Persistence

Right now ExPolars can only persist a DataFrame as a CSV file. Writing JSON or Parquet is not supported because the underlying Arrow library does not support it yet; see https://github.com/apache/arrow/blob/master/rust/parquet/README.md.

In [61]:
{:ok, csv} = DF.to_csv(df)
csv |> IO.puts

precipitation,temp_max,temp_min,wind,weather,date
0,0,-7.1,3.1,sun,2013-12-07
0,2.2,-6.6,2.2,sun,2013-12-08
0,-0.5,-5.5,6.6,sun,2014-02-05
0,-1.6,-6,4.5,sun,2014-02-06



:ok

In [62]:
DF.to_csv_file(df, "/tmp/weather.csv")

{:ok, {}}

In [63]:
DF.read_csv("/tmp/weather.csv")

{:ok, shape: (4, 6)
╭───────────────┬──────────┬──────────┬──────┬─────────┬────────────╮
│ precipitation ┆ temp_max ┆ temp_min ┆ wind ┆ weather ┆ date       │
│ ---           ┆ ---      ┆ ---      ┆ ---  ┆ ---     ┆ ---        │
│ i64           ┆ f64      ┆ f64      ┆ f64  ┆ str     ┆ str        │
╞═══════════════╪══════════╪══════════╪══════╪═════════╪════════════╡
│ 0             ┆ 0.0      ┆ -7.1e0   ┆ 3.1  ┆ sun     ┆ 2013-12-07 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0             ┆ 2.2      ┆ -6.6e0   ┆ 2.2  ┆ sun     ┆ 2013-12-08 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0             ┆ -5e-1    ┆ -5.5e0   ┆ 6.6  ┆ sun     ┆ 2014-02-05 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0             ┆ -1.6e0   ┆ -6e0     ┆ 4.5  ┆ sun     ┆ 2014-02-06 │
╰───────────────┴──────────┴──────────┴──────┴─────────┴────────────╯}
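
Note that the reloaded frame is not identical to the one that was written: `precipitation` is now inferred as `i64` and `date` comes back as `str` instead of `date32(days)`, because a CSV file carries no type information. If you need real dates on the Elixir side, one option is to parse the strings yourself. A minimal sketch using only the standard library follows; the list is copied from the CSV output above, and how to pull a column out of an ExPolars DataFrame as a list is not covered here.

```elixir
# Date strings as they appear in the CSV written above.
dates = ["2013-12-07", "2013-12-08", "2014-02-05", "2014-02-06"]

# Date.from_iso8601!/1 raises on malformed input, which is usually
# what you want for data you have just written yourself.
parsed = Enum.map(dates, &Date.from_iso8601!/1)

IO.inspect(parsed)
```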

## Plotting

Showing data as a table won't give you much insight, so we need a better visual encoding of the data. Normally this means using proper markers (point, line, bar, area, etc.) with proper x/y axes and colors to visualize the data. Pandas has built-in support for matplotlib, and there's a big ecosystem around it (plotly, seaborn, altair, etc.). Personally I really like [vega-lite](https://vega.github.io/vega-lite/), the tool behind [altair](https://altair-viz.github.io/). Thus I built a very simple library called [deneb](https://github.com/tyrchen/deneb), a thin Elixir wrapper around vega-lite. ExPolars integrates with deneb and provides simple, easy-to-use `plot_*` functions for DataFrames:

- `plot_by_type`: special purpose plots, e.g. candlestick plot
- `plot_single`: plot a single chart with minimum inputs
- `plot_repeat`: plot a grid of charts for comparison

In [80]:
DF.plot_single(weather, :tick, "date", "temp_max", color: "weather", width: 800, height: 400) |> Deneb.Viewer.display
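
For readers who have not used vega-lite, the call above describes roughly the kind of chart specification sketched below, written here as a plain Elixir map. This is an assumption for illustration only: the spec that deneb actually generates (field types, how the data is embedded, defaults) may well differ.

```elixir
# Hypothetical vega-lite style spec for a tick chart of temp_max over date,
# colored by weather. Not taken from deneb's output.
spec = %{
  "mark" => "tick",
  "width" => 800,
  "height" => 400,
  "encoding" => %{
    "x" => %{"field" => "date", "type" => "temporal"},
    "y" => %{"field" => "temp_max", "type" => "quantitative"},
    "color" => %{"field" => "weather", "type" => "nominal"}
  }
}

IO.inspect(spec)
```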

In [74]:
DF.plot_repeat(weather, :circle, "date", ["temp_max", "temp_min", "wind"], color: "weather", width: 800, height: 500, columns: 1) |> Deneb.Viewer.display