# Pandas' Function application

See [Pandas' User Guide > Essential basic functionality > Function application](https://pandas.pydata.org/docs/user_guide/basics.html#function-application)

In [None]:
import pandas as pd
import numpy as np

Functions can be applied:

- to each row or column (row/column-wise)
- to each individual element (element-wise)

Or a function may be applied to the data structure as a whole (table-wise), whether `DataFrame` or `Series`.

### 1. Tablewise Function Application: [`pipe()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pipe.html#pandas.DataFrame.pipe)

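A pipeline is a sequence of data processing steps. In Pandas, `obj.pipe(func, *args, **kwargs)` calls `func(obj, *args, **kwargs)`, so steps chain cleanly when each one returns the same kind of object it receives:

- input: `DataFrame`, output: `DataFrame`
- input: `Series`, output: `Series`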

Example:

In [None]:
def extract_city_name(df):
    """
    Chicago, IL -> Chicago for city_name column
    """
    df["city_name"] = df["city_and_code"].str.split(",").str.get(0)
    return df

def add_country_name(df, country_name=None):
    """
    Chicago -> Chicago-US for city_name column
    """
    col = "city_name"
    df["city_and_country"] = df[col] + country_name
    return df

df_p = pd.DataFrame({"city_and_code": ["Chicago, IL"]})

Regular function calling works, but it can get messy:

In [None]:
add_country_name(extract_city_name(df_p), country_name="US")

| | city_and_code | city_name | city_and_country |
|---|---|---|---|
| 0 | Chicago, IL | Chicago | ChicagoUS |


The issue with nested function calls is that they can be hard to read. For example, consider the following code (5 steps):

```python
result = format_currency(
    calculate_tax(
        add_country_name(
            extract_city_name(
                clean_date_format(df_p)
            ), 
            country_name="US"
        ), 
        tax_rate=0.08
    )
)
```

With this **inside-out** style, ask yourself:

- Can you easily see the order of operations? (`clean_date_format -> extract_city_name -> ...`)
- Can you easily add or remove steps?
- Can you easily disable just one step for debugging?
- Can you easily tell which parameters go with which function?

Using `pipe()`, you can keep the logic linear (top-to-bottom) and improve readability:

```python
result = (df_p
    .pipe(clean_date_format)
    .pipe(extract_city_name)
    .pipe(add_country_name, country_name="US")
    .pipe(calculate_tax, tax_rate=0.08)
    .pipe(format_currency)
)
```
> Note: The parentheses around the entire expression are necessary to allow line breaks.

Advantages of using `pipe()`:

- The order is clear (top-to-bottom).
- You can easily add/remove steps.
- You can easily disable just one step for debugging.
- You can easily tell which parameters go with which function.

Let's apply that to our example functions:

In [None]:
(
    df_p
    .pipe(extract_city_name)
    .pipe(add_country_name, country_name="US")
)

| | city_and_code | city_name | city_and_country |
|---|---|---|---|
| 0 | Chicago, IL | Chicago | ChicagoUS |
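Note that the example functions above add columns to the `DataFrame` they receive, so `df_p` itself changes after the first call. The following is a minimal sketch of the same two steps written with `assign()`, which returns a new `DataFrame` and leaves the input untouched. The variable names `raw`, `piped`, and `nested` are illustrative and not part of the original example; the sketch also checks that `pipe()` and the nested call produce the same result.

```python
import pandas as pd


def extract_city_name(df: pd.DataFrame) -> pd.DataFrame:
    # assign() returns a new DataFrame, so the caller's frame is not modified.
    return df.assign(city_name=df["city_and_code"].str.split(",").str.get(0))


def add_country_name(df: pd.DataFrame, country_name: str) -> pd.DataFrame:
    # Appends the country name to the city name in a new column.
    return df.assign(city_and_country=df["city_name"] + country_name)


raw = pd.DataFrame({"city_and_code": ["Chicago, IL"]})

# Each .pipe(func, **kwargs) step calls func(previous_result, **kwargs).
piped = (
    raw
    .pipe(extract_city_name)
    .pipe(add_country_name, country_name="US")
)
nested = add_country_name(extract_city_name(raw), country_name="US")

print(piped.equals(nested))  # True
print(list(raw.columns))     # ['city_and_code']
```

Because no step mutates its input, commenting out a single `.pipe(...)` line while debugging does not leave stale columns behind in `raw`.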


### 2. Row or Column-wise Function Application: [`apply(func, axis=0)`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply)

`axis`: Axis along which the function is applied:

- `0` or `'index'`: apply function to each column.
- `1` or `'columns'`: apply function to each row.
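To make the two `axis` values concrete, here is a small sketch on a hand-written `DataFrame` (the frame `small` and its values are made up for illustration). With `axis=0` the function receives one column at a time, so the result is indexed by column name; with `axis=1` it receives one row at a time, so the result is indexed by row label.

```python
import pandas as pd

small = pd.DataFrame(
    {"A": [1, 2, 3], "B": [10, 20, 30]},
    index=["x", "y", "z"],
)

# axis=0: the lambda is called once per column (A, then B).
col_totals = small.apply(lambda col: col.sum(), axis=0)

# axis=1: the lambda is called once per row (x, then y, then z).
row_totals = small.apply(lambda row: row.sum(), axis=1)

print(col_totals)  # one total per column: A is 6, B is 60
print(row_totals)  # one total per row: x is 11, y is 22, z is 33
```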

Let's create some example objects:

In [None]:
index = pd.date_range("1/1/2000", periods=8)
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=["A", "B", "C"])

In [None]:
def custom_mean(x: pd.Series):
    return np.mean(x)

In [None]:
df.apply(custom_mean, axis=0) # column-wise mean

A   -0.310340
B   -0.472755
C    0.325519
dtype: float64

In [None]:
df.apply(custom_mean, axis=1) # row-wise mean

2000-01-01   -0.239737
2000-01-02   -0.155064
2000-01-03   -0.175493
2000-01-04    0.373051
2000-01-05   -1.122354
2000-01-06    0.447404
2000-01-07   -0.635230
2000-01-08    0.287220
Freq: D, dtype: float64

When the function is element-wise (addition, multiplication, squaring, etc.) and returns a `Series` of the same length as its input, the `axis` parameter doesn't change the result:

- Example: `x ** 2` gives the same `DataFrame` with `axis=0` (column-wise) and `axis=1` (row-wise)

In [None]:
def my_square(x: pd.Series):
    return x ** 2

# Equal results whether applied column-wise or row-wise
df.apply(my_square, axis=0) == df.apply(my_square, axis=1)

| | A | B | C |
|---|---|---|---|
| 2000-01-01 | True | True | True |
| 2000-01-02 | True | True | True |
| 2000-01-03 | True | True | True |
| 2000-01-04 | True | True | True |
| 2000-01-05 | True | True | True |
| 2000-01-06 | True | True | True |
| 2000-01-07 | True | True | True |
| 2000-01-08 | True | True | True |
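For an element-wise operation like this, the same result can usually be obtained without `apply()` at all by operating on the `DataFrame` directly. The sketch below uses a small made-up frame (`frame` is a hypothetical name) to compare the three forms.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1.0, -2.0], "B": [3.0, 4.0]})

by_column = frame.apply(lambda col: col ** 2, axis=0)  # one call per column
by_row = frame.apply(lambda row: row ** 2, axis=1)     # one call per row
vectorized = frame ** 2                                # no apply() needed

print(by_column.equals(by_row))      # True
print(by_column.equals(vectorized))  # True
```

The direct form is typically the simplest to read; `apply()` becomes necessary mainly when the function needs a whole row or column at once, as with the mean above.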


In [None]:
# Using lambda functions
df.apply(lambda x: np.mean(x), axis=1) # row-wise mean

2000-01-01   -0.239737
2000-01-02   -0.155064
2000-01-03   -0.175493
2000-01-04    0.373051
2000-01-05   -1.122354
2000-01-06    0.447404
2000-01-07   -0.635230
2000-01-08    0.287220
Freq: D, dtype: float64

### 3. Applying Elementwise Functions: [`map()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.map.html#pandas.DataFrame.map)

The method `map()`:

- Accepts any Python function taking a single value and returning a single value
- Works for both `Series` and `DataFrame` (`DataFrame.map()` requires pandas 2.1 or later; earlier versions call it `applymap()`)

For example:

In [None]:
def f(x: float) -> float:
    # map() calls f once per element, so x is a single scalar value
    return x + 10

In [None]:
df.map(f)

| | A | B | C |
|---|---|---|---|
| 2000-01-01 | 9.728402 | 8.549829 | 9.816187 |
| 2000-01-02 | 10.260756 | 9.561032 | 9.648915 |
| 2000-01-03 | 9.320493 | 10.089545 | 11.456899 |
| 2000-01-04 | 9.237168 | 9.664241 | 10.353561 |
| 2000-01-05 | 9.387044 | 8.610235 | 10.95588 |
| 2000-01-06 | 9.158378 | 9.305512 | 10.49878 |
| 2000-01-07 | 9.265393 | 9.797601 | 8.91278 |
| 2000-01-08 | 8.430285 | 10.316812 | 8.396696 |


In [None]:
df['A'].map(f)

2000-01-01     9.728402
2000-01-02    10.260756
2000-01-03     9.320493
2000-01-04     9.237168
2000-01-05     9.387044
2000-01-06     9.158378
2000-01-07     9.265393
2000-01-08     8.430285
Freq: D, Name: A, dtype: float64
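Since `map()` calls the function once per element, it also suits functions that cannot operate on a whole `Series`, such as string formatting. The following sketch uses made-up data and a hypothetical helper `as_label`; it falls back to `applymap()` on pandas versions that predate `DataFrame.map()`.

```python
import pandas as pd

prices = pd.DataFrame({"A": [1.5, 2.25], "B": [10.0, 0.5]})


def as_label(value: float) -> str:
    # Receives one scalar at a time and returns one scalar.
    return f"${value:.2f}"


# DataFrame.map was added in pandas 2.1; older releases name it applymap.
elementwise = prices.map if hasattr(prices, "map") else prices.applymap

labels = elementwise(as_label)              # every cell of the DataFrame
column_labels = prices["A"].map(as_label)   # every element of one Series

print(labels.loc[0, "A"])      # $1.50
print(column_labels.tolist())  # ['$1.50', '$2.25']
```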