## Reading CSV into a DataFrame

In [1]:
import polars as pl

# Read the CSV
df = pl.read_csv("sales_data.csv")

# Print the DataFrame
print(df)

shape: (6, 5)
┌────────────┬──────────┬──────────┬───────┬────────┐
│ date       ┆ product  ┆ quantity ┆ price ┆ region │
│ ---        ┆ ---      ┆ ---      ┆ ---   ┆ ---    │
│ str        ┆ str      ┆ i64      ┆ f64   ┆ str    │
╞════════════╪══════════╪══════════╪═══════╪════════╡
│ 2023-01-01 ┆ Widget A ┆ 10       ┆ 25.5  ┆ North  │
│ 2023-01-02 ┆ Widget B ┆ 15       ┆ 30.0  ┆ South  │
│ 2023-01-03 ┆ Widget A ┆ 20       ┆ 25.5  ┆ North  │
│ 2023-01-04 ┆ Widget C ┆ 5        ┆ 40.0  ┆ East   │
│ 2023-01-05 ┆ Widget B ┆ 10       ┆ 30.0  ┆ South  │
│ 2023-01-06 ┆ Widget A ┆ 25       ┆ 25.5  ┆ North  │
└────────────┴──────────┴──────────┴───────┴────────┘


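* Polars infers data types automatically (e.g., `quantity` as `i64`, `price` as `f64`). The `date` column stays a string because `read_csv` does not parse dates by default.
* Use `pl.read_csv('file.csv', infer_schema_length=0)` to skip inference and read every column as a string.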

## Basic Operations: Select, Filter, Sort

Polars queries are built from expressions such as `pl.col("quantity")`, which you pass to DataFrame methods like `select`, `filter`, and `sort`.

### Example: Select Columns and Add a New One

In [2]:
# Select specific columns and create a new 'revenue' column
df_selected = df.select(
    pl.col("date"),
    pl.col("product"),
    pl.col("quantity"),
    pl.col("price"),
    (pl.col("quantity") * pl.col("price")).alias(
        "revenue"
    ),  # New column: revenue = quantity * price
)

print(df_selected)

shape: (6, 5)
┌────────────┬──────────┬──────────┬───────┬─────────┐
│ date       ┆ product  ┆ quantity ┆ price ┆ revenue │
│ ---        ┆ ---      ┆ ---      ┆ ---   ┆ ---     │
│ str        ┆ str      ┆ i64      ┆ f64   ┆ f64     │
╞════════════╪══════════╪══════════╪═══════╪═════════╡
│ 2023-01-01 ┆ Widget A ┆ 10       ┆ 25.5  ┆ 255.0   │
│ 2023-01-02 ┆ Widget B ┆ 15       ┆ 30.0  ┆ 450.0   │
│ 2023-01-03 ┆ Widget A ┆ 20       ┆ 25.5  ┆ 510.0   │
│ 2023-01-04 ┆ Widget C ┆ 5        ┆ 40.0  ┆ 200.0   │
│ 2023-01-05 ┆ Widget B ┆ 10       ┆ 30.0  ┆ 300.0   │
│ 2023-01-06 ┆ Widget A ┆ 25       ┆ 25.5  ┆ 637.5   │
└────────────┴──────────┴──────────┴───────┴─────────┘


### Example: Filter Rows

In [3]:
# Filter rows where quantity > 10 and region is 'North'
df_filtered = df.filter((pl.col("quantity") > 10) & (pl.col("region") == "North"))

print(df_filtered)

shape: (2, 5)
┌────────────┬──────────┬──────────┬───────┬────────┐
│ date       ┆ product  ┆ quantity ┆ price ┆ region │
│ ---        ┆ ---      ┆ ---      ┆ ---   ┆ ---    │
│ str        ┆ str      ┆ i64      ┆ f64   ┆ str    │
╞════════════╪══════════╪══════════╪═══════╪════════╡
│ 2023-01-03 ┆ Widget A ┆ 20       ┆ 25.5  ┆ North  │
│ 2023-01-06 ┆ Widget A ┆ 25       ┆ 25.5  ┆ North  │
└────────────┴──────────┴──────────┴───────┴────────┘


### Example: Sort Data

In [4]:
# Sort by quantity descending
df_sorted = df.sort("quantity", descending=True)

print(df_sorted)

shape: (6, 5)
┌────────────┬──────────┬──────────┬───────┬────────┐
│ date       ┆ product  ┆ quantity ┆ price ┆ region │
│ ---        ┆ ---      ┆ ---      ┆ ---   ┆ ---    │
│ str        ┆ str      ┆ i64      ┆ f64   ┆ str    │
╞════════════╪══════════╪══════════╪═══════╪════════╡
│ 2023-01-06 ┆ Widget A ┆ 25       ┆ 25.5  ┆ North  │
│ 2023-01-03 ┆ Widget A ┆ 20       ┆ 25.5  ┆ North  │
│ 2023-01-02 ┆ Widget B ┆ 15       ┆ 30.0  ┆ South  │
│ 2023-01-01 ┆ Widget A ┆ 10       ┆ 25.5  ┆ North  │
│ 2023-01-05 ┆ Widget B ┆ 10       ┆ 30.0  ┆ South  │
│ 2023-01-04 ┆ Widget C ┆ 5        ┆ 40.0  ┆ East   │
└────────────┴──────────┴──────────┴───────┴────────┘


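## Group By and Aggregate

Use `group_by` to split rows into groups and `agg` to compute one or more aggregate expressions per group. The row order of the result is not guaranteed unless you pass `maintain_order=True`.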

### Example: Group by Region and Calculate Totals

In [5]:
# Group by region, sum quantity and revenue
df_grouped = df.group_by("region").agg(
    pl.col("quantity").sum().alias("total_quantity"),
    (pl.col("quantity") * pl.col("price")).sum().alias("total_revenue"),
)

print(df_grouped)

shape: (3, 3)
┌────────┬────────────────┬───────────────┐
│ region ┆ total_quantity ┆ total_revenue │
│ ---    ┆ ---            ┆ ---           │
│ str    ┆ i64            ┆ f64           │
╞════════╪════════════════╪═══════════════╡
│ North  ┆ 55             ┆ 1402.5        │
│ East   ┆ 5              ┆ 200.0         │
│ South  ┆ 25             ┆ 750.0         │
└────────┴────────────────┴───────────────┘


## Joins

Join DataFrames on a shared key column, as you would join tables in SQL.

### Example: Inner Join with Products Data

In [6]:
# Load the second CSV
df_products = pl.read_csv("products.csv")

# Inner join on 'product'
df_joined = df.join(df_products, on="product", how="inner")

print(df_joined)

shape: (6, 6)
┌────────────┬──────────┬──────────┬───────┬────────┬─────────────┐
│ date       ┆ product  ┆ quantity ┆ price ┆ region ┆ category    │
│ ---        ┆ ---      ┆ ---      ┆ ---   ┆ ---    ┆ ---         │
│ str        ┆ str      ┆ i64      ┆ f64   ┆ str    ┆ str         │
╞════════════╪══════════╪══════════╪═══════╪════════╪═════════════╡
│ 2023-01-01 ┆ Widget A ┆ 10       ┆ 25.5  ┆ North  ┆ Electronics │
│ 2023-01-02 ┆ Widget B ┆ 15       ┆ 30.0  ┆ South  ┆ Home Goods  │
│ 2023-01-03 ┆ Widget A ┆ 20       ┆ 25.5  ┆ North  ┆ Electronics │
│ 2023-01-04 ┆ Widget C ┆ 5        ┆ 40.0  ┆ East   ┆ Electronics │
│ 2023-01-05 ┆ Widget B ┆ 10       ┆ 30.0  ┆ South  ┆ Home Goods  │
│ 2023-01-06 ┆ Widget A ┆ 25       ┆ 25.5  ┆ North  ┆ Electronics │
└────────────┴──────────┴──────────┴───────┴────────┴─────────────┘


- Note: 'Widget D' appears in `products.csv` but not in `sales_data.csv`, so the inner join excludes it.

## Writing to CSV

Save a DataFrame back to disk with `write_csv`.

In [7]:
# Write the joined DataFrame to a new CSV
df_joined.write_csv("joined_data.csv")
print("Written to joined_data.csv")

Written to joined_data.csv


## Advanced: Lazy Evaluation

Polars supports a lazy mode that optimizes the whole query before running it, which helps most on large datasets. Use `pl.scan_csv` instead of `pl.read_csv` to get a `LazyFrame`.

### Example: Lazy Query

In [8]:
# Build a lazy query, then execute it
df_lazy_result = (
    pl.scan_csv("sales_data.csv")  # returns a LazyFrame; nothing is read yet
    .filter(pl.col("quantity") > 10)
    .collect()  # .collect() executes the query and returns a DataFrame
)

print(df_lazy_result)

shape: (3, 5)
┌────────────┬──────────┬──────────┬───────┬────────┐
│ date       ┆ product  ┆ quantity ┆ price ┆ region │
│ ---        ┆ ---      ┆ ---      ┆ ---   ┆ ---    │
│ str        ┆ str      ┆ i64      ┆ f64   ┆ str    │
╞════════════╪══════════╪══════════╪═══════╪════════╡
│ 2023-01-02 ┆ Widget B ┆ 15       ┆ 30.0  ┆ South  │
│ 2023-01-03 ┆ Widget A ┆ 20       ┆ 25.5  ┆ North  │
│ 2023-01-06 ┆ Widget A ┆ 25       ┆ 25.5  ┆ North  │
└────────────┴──────────┴──────────┴───────┴────────┘


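Because the query is lazy, Polars optimizes the plan before `.collect()` runs it, for example by applying the `quantity > 10` filter while scanning the CSV rather than after loading every row.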

## Next Steps
- Explore more: see the Polars docs (https://pola.rs/) for window functions, pivots, and Arrow/Parquet integration.
- Performance: Polars is multi-threaded, so try it on larger datasets.
- Common pitfalls: DataFrame operations do not modify data in place; they return new DataFrames, so assign the result to a variable. Selections and transformations are written as expressions (e.g., `pl.col`).

&mdash; *Grok-4*