In [1]:
%run "Ch0 - setup.ipynb"

Polars provides expressions/methods for horizontal aggregations such as sum, min, and mean (older releases select them by setting the argument axis=1; recent releases expose dedicated functions such as pl.sum_horizontal). When you need a more complex aggregation, however, these built-in methods may not be sufficient. That's when folds come in handy.

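The Polars fold expression operates on whole columns rather than on individual rows: the function receives the accumulator and the next column, and returns the new accumulator. Because each step works on an entire column, a fold uses the columnar data layout efficiently and often executes vectorized.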

Let's start with an example by implementing the sum operation ourselves, with a fold.

In [11]:
df = pl.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [10, 20, 30],
    }
)
print(df)

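out = df.select(
    # Fold over every column: start from 0 and add each column to the accumulator.
    pl.fold(acc=pl.lit(0), function=lambda acc, x: acc + x, exprs=pl.col("*")).alias("sum"),
    # The same sum written out explicitly, for comparison.
    (pl.col("a") + pl.col("b")).alias("sum1"),
)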
print(out)

shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 10  │
│ 2   ┆ 20  │
│ 3   ┆ 30  │
└─────┴─────┘
shape: (3, 2)
┌─────┬──────┐
│ sum ┆ sum1 │
│ --- ┆ ---  │
│ i64 ┆ i64  │
╞═════╪══════╡
│ 11  ┆ 11   │
│ 22  ┆ 22   │
│ 33  ┆ 33   │
└─────┴──────┘


Conditional
When you want to apply a condition/predicate to all columns in a DataFrame, a fold can be a very concise way to express it.

In [36]:
df = pl.DataFrame(
    {
        "a": [1, 2, 3, 2],
        "b": [0, 1, 2, 0],
    }
)

print(df)

# For comparison: a plain filter on a single column.
print(
    df.filter(
        pl.col("a") > 1
        # & (pl.col("b") > 1)  # uncommenting this matches the fold result
    )
)

out = df.filter(
    pl.fold(  # evaluated row-wise across the columns
        acc=pl.lit(True),
        function=lambda acc, x: acc & x,  # AND the per-column results for each row
        exprs=pl.col("*") > 1,  # apply the predicate to every column
    )
)
print(out)

shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 0   │
│ 2   ┆ 1   │
│ 3   ┆ 2   │
│ 2   ┆ 0   │
└─────┴─────┘
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2   ┆ 1   │
│ 3   ┆ 2   │
│ 2   ┆ 0   │
└─────┴─────┘
shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 3   ┆ 2   │
└─────┴─────┘


In [40]:
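# Concatenate columns into a single string column.
# concat_str casts non-string columns (here the integer column "b") to strings.
df = pl.DataFrame(
    {
        "a": ["a", "b", "c"],
        "b": [1, 2, 3],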
    }
)

out = df.select(
    [
        pl.concat_str(["a", "b"]),
    ]
)
print(out)


shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ str │
╞═════╡
│ a1  │
│ b2  │
│ c3  │
└─────┘
