
# Function Application

In [4]:
import pandas as pd
import numpy as np

When working with pandas objects you often need to apply your own or another library's functions to them. pandas offers three main methods for this: `pipe()`, `apply()`, and `applymap()`. Which one to use depends on whether the function expects to operate on the whole table, on rows or columns, or on individual elements.

- Tablewise Function Application: `pipe()`
- Row or Column-wise Function Application: `apply()`
- Applying Elementwise Functions: `applymap()`
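As a rough orientation, the sketch below contrasts what each kind of method hands to your function. The frame `df` and the helper `add_total` are made up for illustration, and `Series.map` stands in for the element-wise case.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# pipe: the function receives the whole DataFrame once.
def add_total(frame):
    # assign returns a new DataFrame and leaves the input unchanged
    return frame.assign(total=frame["a"] + frame["b"])

with_total = df.pipe(add_total)

# apply: by default the function receives one column (a Series) at a time.
col_range = df.apply(lambda col: col.max() - col.min())

# Element-wise: Series.map calls the function once per scalar value.
labels = df["a"].map(lambda v: f"id-{v}")

print(with_total)
print(col_range)
print(labels)
```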

<a class="anchor" id="tablewise_function_application"></a>
## Tablewise function application

`pipe()` calls a function with the entire DataFrame or Series as its first argument, followed by any additional positional and keyword arguments you supply. When you want to apply one function to a Series or DataFrame, then another, then another, nested calls quickly become messy and error-prone. This is where `pipe()` is useful.

Syntax of `pipe`:
```python
DataFrame.pipe(func, *args, **kwargs)    
```

Example on a Series:

In [5]:
def adder(ele1,ele2):
    return ele1+ele2

In [7]:
data_s1 = pd.Series([11,21,31,41,51])
data_s1

0    11
1    21
2    31
3    41
4    51
dtype: int64

In [8]:
data_s1.pipe(adder,3)

0    14
1    24
2    34
3    44
4    54
dtype: int64

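Example on a DataFrame (the values are random, and the two outputs shown come from separate runs, so they do not differ by exactly 4):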

In [14]:
data_df1 = pd.DataFrame(6 * np.random.randn(6, 3), columns=['c1', 'c2', 'c3'])
data_df1

|  | c1 | c2 | c3 |
|---|---|---|---|
| 0 | -0.87662 | 9.783652 | -9.054336 |
| 1 | 0.822029 | -7.189354 | -3.101616 |
| 2 | -4.417143 | -12.818045 | 0.686851 |
| 3 | -6.362847 | -4.155875 | 2.411202 |
| 4 | 4.957938 | -0.663725 | -11.147275 |
| 5 | 7.06422 | -10.636215 | 8.474016 |


In [12]:
data_df1.pipe(adder,4)

|  | c1 | c2 | c3 |
|---|---|---|---|
| 0 | -0.985653 | 2.294141 | 11.061124 |
| 1 | 9.677193 | 0.287188 | -6.673374 |
| 2 | -0.542047 | 9.885663 | 9.60251 |
| 3 | 6.182124 | 1.844314 | -3.915804 |
| 4 | 5.015571 | 7.221821 | 12.54351 |
| 5 | 6.705111 | 2.682463 | 3.789743 |


Another example of `pipe`:

In [None]:
def extract_city_name(df):
    """Add a city_name column: "Chicago, IL" -> "Chicago"."""
    # Note: the column is added to df in place and the same object is returned
    df["city_name"] = df["city_and_code"].str.split(",").str.get(0)
    return df

In [None]:
def add_country_name(df, country_name=None):
    """Add a city_and_country column: "Chicago" -> "Chicago - US"."""
    # country_name must be a string for the concatenation below to work
    col = "city_name"
    df["city_and_country"] = df[col] + ' - ' + country_name
    return df

In [None]:
df_p = pd.DataFrame({"city_and_code": ["Chicago, IL"]})

In [None]:
df_p

|  | city_and_code |
|---|---|
| 0 | Chicago, IL |


`extract_city_name` and `add_country_name` are functions taking and returning `DataFrames`.

Now compare the following nested call:

In [None]:
add_country_name(extract_city_name(df_p), country_name="US")

|  | city_and_code | city_name | city_and_country |
|---|---|---|---|
| 0 | Chicago, IL | Chicago | Chicago - US |


which is equivalent to:

In [None]:
df_p.pipe(extract_city_name).pipe(add_country_name, country_name="US")

|  | city_and_code | city_name | city_and_country |
|---|---|---|---|
| 0 | Chicago, IL | Chicago | Chicago - US |


pandas encourages the second style, which is known as method chaining.
`pipe` makes it easy to use your own or another library’s functions
in method chains, alongside pandas’ methods.
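To illustrate mixing a user-defined step with built-in pandas methods in one chain, here is a minimal sketch. The helper `add_city_name` and the sample rows are hypothetical; unlike the functions above, this helper uses `assign` so that the input frame is not modified.

```python
import pandas as pd

def add_city_name(df):
    # assign returns a new DataFrame, leaving the input untouched
    return df.assign(city_name=df["city_and_code"].str.split(",").str.get(0))

cities = pd.DataFrame({"city_and_code": ["Chicago, IL", "Austin, TX", "Boston, MA"]})

result = (
    cities
    .pipe(add_city_name)          # user-defined step
    .sort_values("city_name")     # built-in pandas method
    .reset_index(drop=True)       # built-in pandas method
)
print(result)
```

Whether helpers should mutate their input or return a copy is a design choice; returning a new object tends to make chains easier to reason about.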

In the example above, the functions `extract_city_name` and `add_country_name` each expected a `DataFrame` as the first positional argument.
What if the function you wish to apply takes its data as, say, the second argument?
In this case, provide `pipe` with a tuple of `(callable, data_keyword)`.
`.pipe` will route the `DataFrame` to the argument specified in the tuple.
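The tuple form can be sketched as follows. The function `scale_columns` and its parameter name `frame` are invented for this example; the only point is that the data is not the first parameter.

```python
import pandas as pd

# Hypothetical helper: the data is the *second* parameter, named `frame`.
def scale_columns(factor, frame):
    return frame * factor

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# (callable, data_keyword): pipe passes df as the keyword argument `frame`,
# and forwards the remaining arguments (here the positional 10) unchanged.
scaled = df.pipe((scale_columns, "frame"), 10)

# Equivalent direct call, for comparison.
direct = scale_columns(10, frame=df)

print(scaled)
```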

The pipe method is inspired by Unix pipes and, more recently, [dplyr](https://github.com/tidyverse/dplyr) and [magrittr](https://github.com/tidyverse/magrittr), which
introduced the popular `%>%` (read "pipe") operator for [R](https://www.r-project.org).
The implementation of `pipe` here is quite clean and feels right at home in Python.
We encourage you to view the source code of [`pipe()`](../reference/api/pandas.DataFrame.pipe.html#pandas.DataFrame.pipe "pandas.DataFrame.pipe").

<a class="anchor" id="row_or_column-wise_function_application"></a>
## Row or column-wise function application

You can apply arbitrary functions along the axes of a DataFrame with the `apply()` method, which is also available on Series. It takes an optional `axis` argument. By default (`axis=0`) the function is called once per column and receives that column as a Series; with `axis=1` it is called once per row. On a Series, `apply()` calls the function on each value. This makes `apply()` a flexible tool for custom aggregations and transformations.

Syntax:
```python
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)
```

Example:

In [34]:
def sq(ele1):
    return ele1*ele1

In [35]:
data_s1.apply(sq)

0     121
1     441
2     961
3    1681
4    2601
dtype: int64

Example 2:

In [38]:
df = pd.DataFrame(
    {
        "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
        "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
        "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
    }
)

In [39]:
df

|  | one | two | three |
|---|---|---|---|
| a | 0.710409 | 0.41092 | NaN |
| b | 0.875225 | -0.604144 | 0.337241 |
| c | -2.2759 | -2.534002 | -0.28352 |
| d | NaN | -0.761122 | -0.222561 |


In [40]:
df.apply(np.mean)

one     -0.230089
two     -0.872087
three   -0.056280
dtype: float64

In [41]:
df.apply(np.mean, axis=1)

a    0.560664
b    0.202774
c   -1.697807
d   -0.491841
dtype: float64

In [42]:
df.apply(lambda x: x.max() - x.min())

one      3.151125
two      2.944922
three    0.620761
dtype: float64

In [43]:
df.apply(np.cumsum)

|  | one | two | three |
|---|---|---|---|
| a | 0.710409 | 0.41092 | NaN |
| b | 1.585634 | -0.193224 | 0.337241 |
| c | -0.690266 | -2.727226 | 0.053721 |
| d | NaN | -3.488349 | -0.16884 |


In [44]:
df.apply(np.exp)

|  | one | two | three |
|---|---|---|---|
| a | 2.034823 | 1.508205 | NaN |
| b | 2.399416 | 0.546542 | 1.401076 |
| c | 0.102704 | 0.079341 | 0.753128 |
| d | NaN | 0.467142 | 0.800466 |


The [`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") method will also dispatch on a string method name.

In [15]:
df.apply("mean")

one      0.046925
two     -0.068739
three    0.626751
dtype: float64

In [16]:
df.apply("mean", axis=1)

a   -0.007249
b    0.283281
c    0.118333
d    0.277863
dtype: float64

The return type of the function passed to [`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") affects the
type of the final output from `DataFrame.apply` for the default behaviour:

- If the applied function returns a Series, the final output is a DataFrame. With `axis=1`, the columns match the index of the Series returned by the applied function; with the default `axis=0`, that index becomes the row index instead.
- If the applied function returns any other type, the final output is a Series.
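The two cases can be sketched on a small, made-up frame. The labels `lo` and `hi` are arbitrary names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

# Scalar per column -> the result is a Series indexed by column name.
as_series = df.apply(lambda col: col.max())

# Series per column (axis=0) -> DataFrame; "lo"/"hi" become the row index.
as_frame = df.apply(lambda col: pd.Series({"lo": col.min(), "hi": col.max()}))

# Series per row (axis=1) -> DataFrame; "lo"/"hi" become the columns.
by_row = df.apply(
    lambda row: pd.Series({"lo": row.min(), "hi": row.max()}), axis=1
)

print(as_series)
print(as_frame)
print(by_row)
```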

This default behaviour can be overridden using the `result_type` argument, which
accepts three options: `reduce`, `broadcast`, and `expand`.
These determine how list-like return values expand (or not) to a `DataFrame`, and they only act when `axis=1`.
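A small sketch of the options, using a hypothetical row function `sum_and_product` that returns a two-element list. It is applied with `axis=1`, since that is where `result_type` has an effect.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

def sum_and_product(row):
    # Returns a plain Python list (a list-like), one per row
    return [int(row["a"] + row["b"]), int(row["a"] * row["b"])]

# Default: each list is kept as one object -> Series of lists.
default = df.apply(sum_and_product, axis=1)

# expand: list elements are spread across new columns 0, 1 -> DataFrame.
expanded = df.apply(sum_and_product, axis=1, result_type="expand")

# broadcast: results are written back into the original shape and labels.
broadcast = df.apply(sum_and_product, axis=1, result_type="broadcast")

# reduce: return a Series if possible instead of expanding list-likes.
reduced = df.apply(sum_and_product, axis=1, result_type="reduce")

print(expanded)
print(broadcast)
```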

[`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") combined with some cleverness can be used to answer many questions
about a data set. For example, suppose we wanted to extract the date where the
maximum value for each column occurred:

In [46]:
tsdf = pd.DataFrame(
    np.random.randn(1000, 3),
    columns=["A", "B", "C"],
    index=pd.date_range("1/1/2000", periods=1000),
)

In [47]:
tsdf

|  | A | B | C |
|---|---|---|---|
| 2000-01-01 | -1.471362 | -0.194772 | 0.733680 |
| 2000-01-02 | 0.080383 | 0.174907 | 0.356573 |
| 2000-01-03 | 0.715979 | -2.229359 | 0.023912 |
| 2000-01-04 | -0.254742 | -0.789298 | 0.948314 |
| 2000-01-05 | -0.744724 | -0.096009 | 0.544551 |
| ... | ... | ... | ... |
| 2002-09-22 | -0.653686 | 1.062378 | -1.025522 |
| 2002-09-23 | -0.395147 | -0.634123 | -1.503995 |
| 2002-09-24 | -1.418340 | 0.349916 | 1.751640 |
| 2002-09-25 | 1.074147 | -0.425855 | -0.537850 |


In [48]:
tsdf.apply(lambda x: x.idxmax())

A   2001-09-19
B   2000-10-19
C   2001-12-10
dtype: datetime64[ns]

You may also pass additional arguments and keyword arguments to the [`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply")
method. For instance, consider the following function you would like to apply:

In [20]:
def subtract_and_divide(x, sub, divide=1):
    return (x - sub) / divide

You may then apply this function as follows:

In [21]:
df.apply(subtract_and_divide, args=(5, 3))

|  | one | two | three |
|---|---|---|---|
| a | -1.831129 | -1.507037 | NaN |
| b | -1.698957 | -1.658219 | -1.359542 |
| c | -1.422989 | -1.925227 | -1.533451 |
| d | NaN | -1.667835 | -1.480257 |


In [22]:
df.apply(subtract_and_divide, args=(5,), divide=3)

|  | one | two | three |
|---|---|---|---|
| a | -1.831129 | -1.507037 | NaN |
| b | -1.698957 | -1.658219 | -1.359542 |
| c | -1.422989 | -1.925227 | -1.533451 |
| d | NaN | -1.667835 | -1.480257 |


In [23]:
df.apply(subtract_and_divide, sub=5, divide=3)

|  | one | two | three |
|---|---|---|---|
| a | -1.831129 | -1.507037 | NaN |
| b | -1.698957 | -1.658219 | -1.359542 |
| c | -1.422989 | -1.925227 | -1.533451 |
| d | NaN | -1.667835 | -1.480257 |


Another useful feature is the ability to pass Series methods to carry out some
Series operation on each column or row:

In [24]:
s = pd.Series([0, 2, np.nan, 8])
s.interpolate(method='polynomial', order=2)

0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64

Let's apply interpolation on tsdf:

In [25]:
# First let's add some null values
tsdf.iloc[3] = np.nan

In [26]:
tsdf.head()

|  | A | B | C |
|---|---|---|---|
| 2000-01-01 | 1.171402 | 0.061935 | -1.178171 |
| 2000-01-02 | -0.160439 | 0.235054 | 0.774846 |
| 2000-01-03 | -0.651207 | 1.354219 | -1.026847 |
| 2000-01-04 | NaN | NaN | NaN |
| 2000-01-05 | -0.08739 | 1.231878 | -0.568446 |


In [27]:
tsdf.apply(pd.Series.interpolate, method='linear').head()

|  | A | B | C |
|---|---|---|---|
| 2000-01-01 | 1.171402 | 0.061935 | -1.178171 |
| 2000-01-02 | -0.160439 | 0.235054 | 0.774846 |
| 2000-01-03 | -0.651207 | 1.354219 | -1.026847 |
| 2000-01-04 | -0.369299 | 1.293049 | -0.797647 |
| 2000-01-05 | -0.08739 | 1.231878 | -0.568446 |


Finally, `apply()` takes an argument `raw`, which is `False` by default. In that
case each row or column is converted into a Series before the function is
applied. When `raw=True`, the passed function instead receives an ndarray,
which can improve performance if you do not need the indexing
functionality.
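The difference in what the function receives can be checked with a short sketch. The helper `peak_to_peak` and the list `seen` are illustrative names, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), columns=["A", "B", "C"])

seen = []  # records the type name of every object passed to the function

def peak_to_peak(values):
    seen.append(type(values).__name__)
    return values.max() - values.min()  # works on both Series and ndarray

with_series = df.apply(peak_to_peak)            # raw=False: receives Series
series_types = set(seen)

seen.clear()
with_arrays = df.apply(peak_to_peak, raw=True)  # raw=True: receives ndarray
array_types = set(seen)

print(series_types, array_types)
print(with_arrays)
```

The results are the same either way; only the object handed to the function changes, so `raw=True` suits functions that rely solely on NumPy operations.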

<a class="anchor" id="applying_elementwise_functions"></a>
## Applying elementwise functions

Since not all functions can be vectorized (accept NumPy arrays and return
another array or value), the methods [`applymap()`](../reference/api/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap "pandas.DataFrame.applymap") on DataFrame
and analogously [`map()`](../reference/api/pandas.Series.map.html#pandas.Series.map "pandas.Series.map") on Series accept any Python function taking
a single value and returning a single value. In pandas 2.1 and later, `applymap()` is deprecated in favor of the equivalent `DataFrame.map()`. For example:

In [66]:
df

|  | one | two | three |
|---|---|---|---|
| a | -0.493386 | 0.478888 | NaN |
| b | -0.096872 | 0.025342 | 0.921374 |
| c | 0.731032 | -0.775681 | 0.399647 |
| d | NaN | -0.003504 | 0.55923 |


In [68]:
def f(x):
    return len(str(x))

In [57]:
df["len_one"] = df["one"].map(f)

In [58]:
df

|  | one | two | three | len_one |
|---|---|---|---|---|
| a | 0.710409 | 0.41092 | NaN | 18 |
| b | 0.875225 | -0.604144 | 0.337241 | 17 |
| c | -2.2759 | -2.534002 | -0.28352 | 18 |
| d | NaN | -0.761122 | -0.222561 | 3 |


In [69]:
df.applymap(f)

|  | one | two | three | len_one |
|---|---|---|---|---|
| a | 18 | 18 | 3 | 2 |
| b | 17 | 19 | 19 | 2 |
| c | 18 | 17 | 19 | 2 |
| d | 3 | 18 | 19 | 1 |
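Because newer pandas releases (2.1 and later) provide `DataFrame.map()` and deprecate `applymap()`, code that has to run on several versions can pick whichever method is available. The following is one possible sketch; the frame and the helper `text_length` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"one": [0.5, 1.25], "two": [10.0, 200.0]})

def text_length(x):
    return len(str(x))

# Assumption: pandas 2.1+ has DataFrame.map; older versions only have applymap.
# getattr falls back to None when `map` is missing, so applymap is only
# touched on versions that lack DataFrame.map.
elementwise = getattr(df, "map", None) or df.applymap
lengths = elementwise(text_length)

print(lengths)
```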


[`Series.map()`](../reference/api/pandas.Series.map.html#pandas.Series.map "pandas.Series.map") has an additional feature; it can be used to easily
“link” or “map” values defined by a secondary series:

In [70]:
s = pd.Series(
    ["six", "seven", "six", "seven", "six"], index=["a", "b", "c", "d", "e"]
)

In [71]:
t = pd.Series({"six": 6.0, "seven": 7.0})

In [72]:
s

a      six
b    seven
c      six
d    seven
e      six
dtype: object

In [73]:
s.map(t)

a    6.0
b    7.0
c    6.0
d    7.0
e    6.0
dtype: float64
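`Series.map()` also accepts a plain dictionary. The sketch below, on made-up data, shows that values without a matching key become `NaN`, which you can then handle explicitly, for instance with `fillna`.

```python
import pandas as pd

s = pd.Series(["six", "seven", "eight"], index=["a", "b", "c"])
lookup = {"six": 6.0, "seven": 7.0}  # note: no entry for "eight"

mapped = s.map(lookup)        # "eight" has no match and becomes NaN
filled = mapped.fillna(-1.0)  # replace unmatched values with a sentinel

print(mapped)
print(filled)
```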


## Vectorized string methods: don't use apply for string methods

Series is equipped with a set of string processing methods that make it easy to
operate on each element of the array. Perhaps most importantly, these methods
exclude missing/NA values automatically. These are accessed via the Series’s
`str` attribute and generally have names matching the equivalent (scalar)
built-in string methods. For example:

In [21]:
s = pd.Series(
    ["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"],
    dtype="string",
)

In [22]:
s

0       A
1       B
2       C
3    Aaba
4    Baca
5    <NA>
6    CABA
7     dog
8     cat
dtype: string

In [23]:
s.str.lower()

0       a
1       b
2       c
3    aaba
4    baca
5    <NA>
6    caba
7     dog
8     cat
dtype: string

In [24]:
s.str.upper()

0       A
1       B
2       C
3    AABA
4    BACA
5    <NA>
6    CABA
7     DOG
8     CAT
dtype: string
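To see why the `str` accessor is preferable to `apply` for string work, consider a small object-dtype Series with a missing value. This is only an illustrative sketch on made-up data.

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "Baca", np.nan, "dog"])  # object dtype with a missing value

# The str accessor skips the missing value and keeps it as NaN.
lowered = s.str.lower()

# apply calls the Python method on every element, including the float NaN,
# which has no .lower() method.
try:
    s.apply(lambda v: v.lower())
    apply_failed = False
except AttributeError:
    apply_failed = True

print(lowered)
print(apply_failed)
```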

In [28]:
s.str.len()

0       1
1       1
2       1
3       4
4       4
5    <NA>
6       4
7       3
8       3
dtype: Int64

In [31]:
s = pd.Series(
    ['Nika Shakarami', 'Sarina EsmaeilZadeh', 'Mahsa Amini']
)

In [32]:
s.str.split(' ')

0         [Nika, Shakarami]
1    [Sarina, EsmaeilZadeh]
2            [Mahsa, Amini]
dtype: object

In [33]:
s.str.split(' ').str.get(0)

0      Nika
1    Sarina
2     Mahsa
dtype: object

In [34]:
s.str.split(' ', expand=True)

|  | 0 | 1 |
|---|---|---|
| 0 | Nika | Shakarami |
| 1 | Sarina | EsmaeilZadeh |
| 2 | Mahsa | Amini |


Powerful pattern-matching methods are provided as well, but note that
pattern-matching generally uses [regular expressions](https://docs.python.org/3/library/re.html) by default (and in some cases
always uses them).
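The effect of the regex default can be sketched with `str.contains` and `str.extract` on a few made-up strings.

```python
import pandas as pd

s = pd.Series(["a.b", "axb", "a1b2"])

# str.contains treats the pattern as a regex by default: "." matches any character.
as_regex = s.str.contains("a.b")

# With regex=False the pattern is matched literally.
as_literal = s.str.contains("a.b", regex=False)

# str.extract always uses a regex; one capture group gives one column (named 0).
digits = s.str.extract(r"(\d+)")

print(as_regex.tolist())
print(as_literal.tolist())
print(digits)
```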

> **Note:**
>
> Prior to pandas 1.0, string methods were only available on `object`-dtype
> `Series`. pandas 1.0 added the [`StringDtype`](../reference/api/pandas.StringDtype.html#pandas.StringDtype "pandas.StringDtype") which is dedicated
> to strings. See [Text data types](text.html#text-types) for more.

Please see [Vectorized String Methods](https://pandas.pydata.org/docs/user_guide/text.html#text-string-methods) for a complete
description.