## Prepare environment

IElixir provides Boyle to build an environment and install dependencies for JupyterLab. See https://hexdocs.pm/ielixir/Boyle.html for details. To install IElixir, follow this guide: https://github.com/pprzetacznik/IElixir#install-kernel.

In [1]:
Boyle.mk("polars")
Boyle.activate("polars")
Boyle.install({:ex_polars, "~> 0.3.2-dev"})

## Import ExPolars DataFrame and Series

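The two major data structures in ExPolars are DataFrame and Series. They behave much like their pandas counterparts. However, Elixir's operators are ordinary functions imported from `Kernel` and cannot be overloaded per type,
so you have to exclude them from `Kernel` and import the overloaded versions from `ExPolars.Series` to write Series operations concisely.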

In [6]:
alias ExPolars.Series, as: S
alias ExPolars.DataFrame, as: DF
alias ExPolars.Datasets
import Kernel, except: [+: 2, -: 2, *: 2, /: 2, ==: 2, <>: 2, >: 2, >=: 2, <: 2, <=: 2]
import S, only: [+: 2, -: 2, *: 2, /: 2, ==: 2, <>: 2, >: 2, >=: 2, <: 2, <=: 2]

ExPolars.Series
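
To illustrate why the two `import` lines are needed, here is a minimal sketch of the same technique on plain lists. `ListOps` and `Demo` are hypothetical modules made up for this example and are not part of ExPolars; the real `ExPolars.Series` operators work on Series, and their implementation may differ.

```elixir
defmodule ListOps do
  # Hypothetical stand-in for ExPolars.Series: element-wise operators on plain lists.
  # Kernel's versions must be excluded here, otherwise our definitions conflict with them.
  import Kernel, except: [+: 2, <: 2]

  # Element-wise addition of two lists.
  def a + b when is_list(a) and is_list(b) do
    a |> Enum.zip(b) |> Enum.map(fn {x, y} -> Kernel.+(x, y) end)
  end

  # Compare every element with a number, returning a list of booleans.
  def a < n when is_list(a) and is_number(n) do
    Enum.map(a, fn x -> Kernel.<(x, n) end)
  end
end

defmodule Demo do
  # Same pattern as the notebook cell: drop Kernel's operators, import ours.
  import Kernel, except: [+: 2, <: 2]
  import ListOps, only: [+: 2, <: 2]

  def sum, do: [1, 2, 3] + [10, 20, 30]
  def mask, do: [5.0, -7.1, 2.8] < 0.0
end

IO.inspect(Demo.sum())   # => [11, 22, 33]
IO.inspect(Demo.mask())  # => [false, true, false]
```

Without the `except:` list, the compiler would still see `Kernel.+/2` and `Kernel.</2`, and the overloaded versions could not be used under the same names.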

## Loading a DataFrame from file

ExPolars can load a DataFrame from JSON, CSV, and Parquet files. Below is an example of loading a CSV file.

Once the file is loaded, you can inspect it with `head`, `tail`, `get_columns`, etc. See the full list of functions at https://hexdocs.pm/ex_polars/ExPolars.DataFrame.html. Since I don't have enough time (a good excuse for being lazy), the documentation currently contains only the typespecs, without actual descriptions.

In [2]:
{:ok, df} = DF.read_csv("./priv/datasets/airports.csv")

{:ok, shape: (3376, 7)
╭──────┬───────────────────────────┬──────────────────┬───────┬─────────┬──────────┬────────────╮
│ iata ┆ name                      ┆ city             ┆ state ┆ country ┆ latitude ┆ longitude  │
│ ---  ┆ ---                       ┆ ---              ┆ ---   ┆ ---     ┆ ---      ┆ ---        │
│ str  ┆ str                       ┆ str              ┆ str   ┆ str     ┆ f64      ┆ f64        │
╞══════╪═══════════════════════════╪══════════════════╪═══════╪═════════╪══════════╪════════════╡
│ 00M  ┆ Thigpen                   ┆ Bay Springs      ┆ MS    ┆ USA     ┆ 31.954   ┆ -8.9235e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 00R  ┆ Livingston Municipal      ┆ Livingston       ┆ TX    ┆ USA     ┆ 30.686   ┆ -9.5018e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 00V  ┆ Meadow Lake               ┆ Colorado Springs ┆ CO    ┆ USA     ┆ 38.946   ┆ -1.0457e2 

In [3]:
DF.tail(df)

{:ok, shape: (5, 7)
╭──────┬───────────────────────────┬─────────────┬───────┬─────────┬──────────┬────────────╮
│ iata ┆ name                      ┆ city        ┆ state ┆ country ┆ latitude ┆ longitude  │
│ ---  ┆ ---                       ┆ ---         ┆ ---   ┆ ---     ┆ ---      ┆ ---        │
│ str  ┆ str                       ┆ str         ┆ str   ┆ str     ┆ f64      ┆ f64        │
╞══════╪═══════════════════════════╪═════════════╪═══════╪═════════╪══════════╪════════════╡
│ ZEF  ┆ Elkin Municipal           ┆ Elkin       ┆ NC    ┆ USA     ┆ 36.28    ┆ -8.0786e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ZER  ┆ Schuylkill Cty/Joe Zerbey ┆ Pottsville  ┆ PA    ┆ USA     ┆ 40.706   ┆ -7.6373e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ZPH  ┆ Zephyrhills Municipal     ┆ Zephyrhills ┆ FL    ┆ USA     ┆ 28.228   ┆ -8.2156e1  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼

## Interact with predefined datasets

In `priv/datasets` there are many different datasets for you to use. They are a good starting point if you'd like to explore the functionality of ExPolars.

In [5]:
weather = ExPolars.Datasets.seattle_weather

shape: (1461, 6)
╭────────────┬───────────────┬──────────┬──────────┬──────┬─────────╮
│ date       ┆ precipitation ┆ temp_max ┆ temp_min ┆ wind ┆ weather │
│ ---        ┆ ---           ┆ ---      ┆ ---      ┆ ---  ┆ ---     │
│ str        ┆ f64           ┆ f64      ┆ f64      ┆ f64  ┆ str     │
╞════════════╪═══════════════╪══════════╪══════════╪══════╪═════════╡
│ 2012/01/01 ┆ 0.0           ┆ 12.8     ┆ 5        ┆ 4.7  ┆ drizzle │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2012/01/02 ┆ 10.9          ┆ 10.6     ┆ 2.8      ┆ 4.5  ┆ rain    │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2012/01/03 ┆ 0.8           ┆ 11.7     ┆ 7.2      ┆ 2.3  ┆ rain    │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2012/01/04 ┆ 20.3          ┆ 12.2     ┆ 5.6      ┆ 4.7  ┆ rain    │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ ...        ┆ ...           ┆ ...      ┆ ...      ┆ ...  ┆ ...     │
├╌╌

## Complicated Operations

You can do complicated operations with ExPolars.

Operations include:

- window, e.g. rolling_min, rolling_max
- aggregate, e.g. min, max, mean, std, var
- comparison, e.g. ==, <>, <, >, <=, >=
- shift
- filter
- ...
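
As a rough illustration of what the window, shift, and aggregate operations compute, here is a conceptual sketch on a plain Elixir list. `WindowSketch` is a hypothetical module written for this example; it does not call ExPolars, and details such as how ExPolars treats the first incomplete windows are assumptions not covered here.

```elixir
defmodule WindowSketch do
  # Hypothetical helpers on plain lists, not ExPolars functions.

  # Minimum over each full window of `size` consecutive values.
  def rolling_min(values, size) do
    values
    |> Enum.chunk_every(size, 1, :discard)
    |> Enum.map(&Enum.min/1)
  end

  # Move every value `n` positions later, padding the front with nil.
  def shift(values, n) when n >= 0 do
    kept = Enum.take(values, max(length(values) - n, 0))
    List.duplicate(nil, min(n, length(values))) ++ kept
  end
end

# The first four temp_min values of the Seattle weather dataset shown above.
temp_min = [5.0, 2.8, 7.2, 5.6]

IO.inspect(WindowSketch.rolling_min(temp_min, 2))  # => [2.8, 2.8, 5.6]
IO.inspect(WindowSketch.shift(temp_min, 1))        # => [nil, 5.0, 2.8, 7.2]
IO.inspect(Enum.min(temp_min))                     # => 2.8
```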

For example, the filter operation below is equivalent to `df[df["temp_min"] < -5.0]` in pandas. Here you can see how Python's syntax (and magic methods) makes such expressions much more concise.

In [17]:
df = DF.filter(weather, (DF.column(weather, "temp_min") < -5.0) )

{:ok, shape: (4, 6)
╭────────────┬───────────────┬──────────┬──────────┬──────┬─────────╮
│ date       ┆ precipitation ┆ temp_max ┆ temp_min ┆ wind ┆ weather │
│ ---        ┆ ---           ┆ ---      ┆ ---      ┆ ---  ┆ ---     │
│ str        ┆ f64           ┆ f64      ┆ f64      ┆ f64  ┆ str     │
╞════════════╪═══════════════╪══════════╪══════════╪══════╪═════════╡
│ 2013/12/07 ┆ 0.0           ┆ 0.0      ┆ -7.1e0   ┆ 3.1  ┆ sun     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2013/12/08 ┆ 0.0           ┆ 2.2      ┆ -6.6e0   ┆ 2.2  ┆ sun     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2014/02/05 ┆ 0.0           ┆ -5e-1    ┆ -5.5e0   ┆ 6.6  ┆ sun     │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 2014/02/06 ┆ 0.0           ┆ -1.6e0   ┆ -6e0     ┆ 4.5  ┆ sun     │
╰────────────┴───────────────┴──────────┴──────────┴──────┴─────────╯}
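
The filter above can be read as two steps: the comparison produces one boolean per row (a mask), and the filter keeps the rows whose mask entry is true. Below is a minimal sketch of that idea on plain Elixir data. `MaskFilter` is a hypothetical helper, not an ExPolars function, and the snippet assumes a fresh session in which `<` is still the ordinary `Kernel.</2`.

```elixir
defmodule MaskFilter do
  # Hypothetical helper, not part of ExPolars: keep the rows whose mask entry is true.
  def filter(rows, mask) when length(rows) == length(mask) do
    rows
    |> Enum.zip(mask)
    |> Enum.filter(fn {_row, keep?} -> keep? end)
    |> Enum.map(fn {row, _keep?} -> row end)
  end
end

# A few rows taken from the Seattle weather output shown in this notebook.
rows = [
  %{date: "2012/01/01", temp_min: 5.0},
  %{date: "2013/12/07", temp_min: -7.1},
  %{date: "2013/12/08", temp_min: -6.6},
  %{date: "2012/01/02", temp_min: 2.8}
]

# Step 1: the comparison yields one boolean per row (the mask).
mask = Enum.map(rows, fn row -> row.temp_min < -5.0 end)

# Step 2: the filter keeps only the rows where the mask is true.
cold = MaskFilter.filter(rows, mask)

IO.inspect(mask)                           # => [false, true, true, false]
IO.inspect(Enum.map(cold, & &1.date))      # => ["2013/12/07", "2013/12/08"]
```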

## DataFrame Persistence

Right now ExPolars can only persist a DataFrame to a CSV file. Writing JSON/Parquet is not supported because the underlying Arrow library doesn't support it yet; see https://github.com/apache/arrow/blob/master/rust/parquet/README.md.

In [19]:
{:ok, csv} = DF.to_csv(df)


ArgumentError: 1