<center>

# Polars - Introduction
Polars is an open-source library for data manipulation, known for being one of the fastest data processing solutions on a single machine. It features a well-structured, typed API that is both expressive and easy to use.

</center>

## Key Features
- **Fast:** Written from scratch in Rust, designed close to the machine and without external dependencies.
- **I/O:** First class support for all common data storage layers: local, cloud storage & databases.
- **Intuitive API:** Write your queries the way they were intended. Internally, Polars determines the most efficient way to execute them using its query optimizer.
- **Out of Core:** The streaming API allows you to process your results without requiring all your data to be in memory at the same time.
- **Parallel:** Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- **Vectorized Query Engine:** Uses Apache Arrow, a columnar data format, to process your queries in a vectorized manner, and SIMD to optimize CPU usage.
- **GPU Support:** Optionally run queries on NVIDIA GPUs for maximum performance for in-memory workloads.

----

### Polars User Guide - [User-guide](https://docs.pola.rs/)

### Polars API Reference - [API-Reference](https://docs.pola.rs/api/python/stable/reference/index.html)

----

### All Required Imports Go in This Cell

In [47]:
import polars as pl
from polars import DataFrame
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from datetime import datetime as dt

### Reading the input dataset `prices-split-adjusted.csv` [ [API Ref - I/O](https://docs.pola.rs/api/python/stable/reference/api/polars.read_csv.html) ]

In [48]:
df: DataFrame = pl.read_csv("prices-split-adjusted.csv")
df.head()

date,symbol,open,close,low,high,volume
str,str,f64,f64,f64,f64,f64
"""2016-01-05""","""WLTW""",123.43,125.839996,122.309998,126.25,2163600.0
"""2016-01-06""","""WLTW""",125.239998,119.980003,119.940002,125.540001,2386400.0
"""2016-01-07""","""WLTW""",116.379997,114.949997,114.93,119.739998,2489500.0
"""2016-01-08""","""WLTW""",115.480003,116.620003,113.5,117.440002,2006300.0
"""2016-01-11""","""WLTW""",117.010002,114.970001,114.089996,117.330002,1408600.0


----
### Converting the `date` column from `str` to the `Date` type using the expression string-namespace function `str.to_date` [ [API Ref - Expression](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.str.to_date.html) ]

In [49]:
df = df.with_columns([
    pl.col("date").str.to_date().alias("date")
]).sort(["symbol","date"])
df.head()

date,symbol,open,close,low,high,volume
date,str,f64,f64,f64,f64,f64
2010-01-04,"""A""",22.453504,22.389128,22.267525,22.62518,3815500.0
2010-01-05,"""A""",22.324749,22.145923,22.002861,22.331903,4186000.0
2010-01-06,"""A""",22.06724,22.06724,22.002861,22.174536,3243700.0
2010-01-07,"""A""",22.017168,22.038626,21.816881,22.04578,3095100.0
2010-01-08,"""A""",21.917024,22.031474,21.74535,22.06724,3733900.0


----
### Adding new column(s) using the DataFrame manipulation function `with_columns` [ [API Ref - Manipulation](https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.with_columns.html) ]

In [50]:
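df = df.with_columns([
    # Midpoint of the day's low and high.
    ((pl.col("low") + pl.col("high"))/2).alias("avg"),
    # Open minus close: positive when the price fell during the day.
    (pl.col("open") - pl.col("close")).alias("open_close_diff"),
    # The same difference as a percentage of the opening price.
    ((pl.col("open") - pl.col("close")) / pl.col("open") * 100).alias("percentage_change")
])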
df.head()

date,symbol,open,close,low,high,volume,avg,open_close_diff,percentage_change
date,str,f64,f64,f64,f64,f64,f64,f64,f64
2010-01-04,"""A""",22.453504,22.389128,22.267525,22.62518,3815500.0,22.446352,0.064376,0.286709
2010-01-05,"""A""",22.324749,22.145923,22.002861,22.331903,4186000.0,22.167382,0.178825,0.801019
2010-01-06,"""A""",22.06724,22.06724,22.002861,22.174536,3243700.0,22.088698,0.0,0.0
2010-01-07,"""A""",22.017168,22.038626,21.816881,22.04578,3095100.0,21.931331,-0.021458,-0.097459
2010-01-08,"""A""",21.917024,22.031474,21.74535,22.06724,3733900.0,21.906295,-0.114449,-0.522193


----
### Generating missing date records using the DataFrame `upsample()` function and filling the resulting null values using the expression `forward_fill()` function - [ [API Ref - upsample](https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.upsample.html) ] [ [API Ref - forward_fill()](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.forward_fill.html) ]

In [51]:
df = df.upsample(time_column="date",every="1d",group_by=["symbol"]).select(pl.all().forward_fill()).sort(["symbol","date"])
df.head()

date,symbol,open,close,low,high,volume,avg,open_close_diff,percentage_change
date,str,f64,f64,f64,f64,f64,f64,f64,f64
2010-01-04,"""A""",22.453504,22.389128,22.267525,22.62518,3815500.0,22.446352,0.064376,0.286709
2010-01-05,"""A""",22.324749,22.145923,22.002861,22.331903,4186000.0,22.167382,0.178825,0.801019
2010-01-06,"""A""",22.06724,22.06724,22.002861,22.174536,3243700.0,22.088698,0.0,0.0
2010-01-07,"""A""",22.017168,22.038626,21.816881,22.04578,3095100.0,21.931331,-0.021458,-0.097459
2010-01-08,"""A""",21.917024,22.031474,21.74535,22.06724,3733900.0,21.906295,-0.114449,-0.522193
