# Basics - Apply, Map and Vectorised Functions

In [28]:
import pandas as pd
import numpy as np

data = np.round(np.random.normal(size=(4, 3)), 2)
df = pd.DataFrame(data, columns=["A", "B", "C"])
df.head()

Unnamed: 0,A,B,C
0,-0.24,2.08,-1.1
1,0.81,-0.74,-0.58
2,-0.2,-1.74,-0.41
3,-0.63,1.79,-0.7


## Apply

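Used to execute an arbitrary function against an entire DataFrame, or a subsection of it. On a DataFrame the function receives one whole column at a time (or one whole row with axis=1), so it can use vectorised operations on that column.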

In [47]:
df.apply(lambda x: 1 + np.abs(x))

Unnamed: 0,A,B,C
0,1.24,3.08,2.1
1,1.81,1.74,1.58
2,1.2,2.74,1.41
3,1.63,2.79,1.7


In [3]:
df.A.apply(np.abs)

0    0.55
1    0.27
2    0.28
3    0.22
Name: A, dtype: float64
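
To make the difference between the two calls above concrete, here is a minimal sketch that records what the function is handed in each case. The small demo frame is made up for illustration and is not the df from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data, chosen so the results are easy to check by hand
demo = pd.DataFrame({"A": [1.0, -2.0], "B": [3.0, -4.0]})

# DataFrame.apply, default axis=0: the function sees one whole column (a Series)
col_types = demo.apply(lambda col: type(col).__name__)

# DataFrame.apply with axis=1: the function sees one whole row (also a Series)
row_sums = demo.apply(lambda row: row.sum(), axis=1)

# Series.apply: the function sees one scalar value at a time
got_scalars = demo["A"].apply(np.isscalar)

print(col_types.tolist())
print(row_sums.tolist())
print(got_scalars.tolist())
```

So the same function can behave very differently depending on whether it is applied to a DataFrame or to a single column.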

In [4]:
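# This scalar version fails with df.apply: x is a whole column (a Series),
# and "if x > 0" raises a ValueError because the truth value of a Series
# is ambiguous.
#def double_if_positive(x):
#    if x > 0:
#        return 2 * x
#    return x
#
#df.apply(double_if_positive)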
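
A minimal sketch of why the commented-out version is disabled, and of one vectorised alternative. The demo frame and the name double_if_positive_scalar are made up for illustration, and DataFrame.where is only one of several ways to do this.

```python
import pandas as pd

# Hypothetical demo data, not the df from this notebook
demo = pd.DataFrame({"A": [1.0, -2.0], "B": [-3.0, 4.0]})

def double_if_positive_scalar(x):
    # Written for a single number; with DataFrame.apply, x is a whole column
    if x > 0:
        return 2 * x
    return x

try:
    demo.apply(double_if_positive_scalar)
    failed = False
except ValueError:
    # "The truth value of a Series is ambiguous"
    failed = True

# Vectorised alternative: keep values where demo <= 0, otherwise use demo * 2
doubled = demo.where(demo <= 0, demo * 2)

print(failed)
print(doubled)
```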

In [5]:
def double_if_positive(x):
    # Modifies the column it is given in place, so df itself can change too
    x[x > 0] *= 2
    return x

df.apply(double_if_positive)

Unnamed: 0,A,B,C
0,1.1,-0.26,0.34
1,0.54,1.16,-0.54
2,0.56,-0.79,-0.04
3,-0.22,0.66,-0.38


In [6]:
df

Unnamed: 0,A,B,C
0,1.1,-0.26,0.34
1,0.54,1.16,-0.54
2,0.56,-0.79,-0.04
3,-0.22,0.66,-0.38


In [7]:
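def double_if_positive(x):
    # Work on a copy so the original df is left untouched
    x = x.copy()
    x[x > 0] *= 2
    return x

# raw=True passes each column as a NumPy array rather than a Series
df.apply(double_if_positive, raw=True)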

Unnamed: 0,A,B,C
0,2.2,-0.26,0.68
1,1.08,2.32,-0.54
2,1.12,-0.79,-0.04
3,-0.22,1.32,-0.38
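
As a quick check of the copy-based version, the sketch below runs it on a small made-up frame and confirms that the input is unchanged afterwards. The demo data and the seen list are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical demo data, not the df from this notebook
demo = pd.DataFrame({"A": [1.0, -2.0], "B": [-3.0, 4.0]})
before = demo.copy()

seen = []  # records the type of object handed to the function

def double_if_positive(x):
    seen.append(type(x).__name__)
    x = x.copy()      # never touch the data we were handed
    x[x > 0] *= 2
    return x

out = demo.apply(double_if_positive, raw=True)

print(set(seen))            # with raw=True the function receives NumPy arrays
print(demo.equals(before))  # the input frame is unchanged
print(out)
```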


## Map

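Similar to apply, but operates on a Series. It works element by element and accepts a dictionary (or another Series) of replacements as well as a function.

The thing that distinguishes map from apply: you'll remember that you can run apply both on a DataFrame and on a Series. Series.map can only be run on a Series. (Newer pandas versions also have a DataFrame.map, which applies a function element by element but does not take a dictionary.)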


In [8]:
series = pd.Series(["Steve", "Alex", "Jess", "Mark"])

In [9]:
series.map({"Steve": "Stephen"})

0    Stephen
1        NaN
2        NaN
3        NaN
dtype: object
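
Names that are missing from the dictionary come back as NaN, as the output above shows. If the intent is to keep the original value for anything not in the dictionary, two common options are sketched below. The mapping variable name is an assumption added for illustration.

```python
import pandas as pd

series = pd.Series(["Steve", "Alex", "Jess", "Mark"])
mapping = {"Steve": "Stephen"}

# Option 1: map, then fill the gaps with the original values
kept = series.map(mapping).fillna(series)

# Option 2: replace only touches values that appear in the dictionary
replaced = series.replace(mapping)

print(kept.tolist())
print(replaced.tolist())
```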

In [10]:
series.map(lambda d: f"I am {d}")

0    I am Steve
1     I am Alex
2     I am Jess
3     I am Mark
dtype: object

## Vectorised functions

Pandas and NumPy have plenty of these. Here are some examples:

In [11]:
display(df, df.abs())

Unnamed: 0,A,B,C
0,1.1,-0.26,0.34
1,0.54,1.16,-0.54
2,0.56,-0.79,-0.04
3,-0.22,0.66,-0.38


Unnamed: 0,A,B,C
0,1.1,0.26,0.34
1,0.54,1.16,0.54
2,0.56,0.79,0.04
3,0.22,0.66,0.38


In [12]:
series = pd.Series(["Obi-Wan Kenobi", "Luke Skywalker", "Han Solo", "Leia Organa"])

In [13]:
"Luke Skywalker".split()

['Luke', 'Skywalker']

In [51]:
series.str.split(expand=True)

Unnamed: 0,0,1
0,Obi-Wan,Kenobi
1,Luke,Skywalker
2,Han,Solo
3,Leia,Organa


In [15]:
series.str.contains("Skywalker")

0    False
1     True
2    False
3    False
dtype: bool

In [16]:
series.str.upper().str.split()

0    [OBI-WAN, KENOBI]
1    [LUKE, SKYWALKER]
2          [HAN, SOLO]
3       [LEIA, ORGANA]
dtype: object
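
The .str accessor can be chained and indexed as well, which is handy after a split. A small sketch using the same names as above; the variable names first, last and lengths are made up for illustration.

```python
import pandas as pd

series = pd.Series(["Obi-Wan Kenobi", "Luke Skywalker", "Han Solo", "Leia Organa"])

# .str[i] indexes into each element, here the list produced by split()
first = series.str.split().str[0]
last = series.str.split().str[-1]

# .str.len() gives the length of each string
lengths = series.str.len()

print(first.tolist())
print(last.tolist())
print(lengths.tolist())
```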

## User defined functions

Let's investigate a simple example: finding the hypotenuse given x and y distances.


In [67]:
data2 = np.random.normal(10, 2, size=(100000, 2))
df2 = pd.DataFrame(data2, columns=["x", "y"])
df2

Unnamed: 0,x,y
0,9.262179,7.475419
1,10.777538,10.816566
2,12.914274,11.716087
3,12.676722,6.256489
4,10.699370,11.373505
...,...,...
99995,12.702670,8.271747
99996,6.785199,12.436718
99997,8.333232,12.790576
99998,9.629266,10.756780


In [18]:
hypot = (df2.x**2 + df2.y**2)**0.5
print(hypot[0])

17.98377922628161


In [54]:
def hypot1(x, y):
    return np.sqrt(x**2 + y**2)

h1 = []
for index, (x, y) in df2.iterrows():
    h1.append(hypot1(x, y))
print(h1[0])

17.98377922628161


In [55]:
def hypot2(row):
    return np.sqrt(row.x**2 + row.y**2)

h2 = df2.apply(hypot2, axis=1)
print(h2[0])

17.98377922628161


In [21]:
def hypot3(xs, ys):
    return np.sqrt(xs**2 + ys**2)
h3 = hypot3(df2.x, df2.y)
print(h3[0])

17.98377922628161


Vectorising everything you can is the key to speeding up your code. Once you've done that, use a profiler to find the remaining slow spots. PyCharm Professional has a profiler built in, and in Jupyter you can use the %lprun magic from the line_profiler package, which you can find here: https://github.com/rkern/line_profiler
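
The cells above only show that the three approaches agree, not how long each takes. Below is a rough timing sketch under stated assumptions: a smaller made-up frame (2,000 rows) so the loop finishes quickly, a hypothetical timed helper, and a single run with time.perf_counter rather than a proper benchmark. Exact numbers will vary by machine, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical, smaller stand-in for df2 so the slow versions finish quickly
rng = np.random.default_rng(0)
small = pd.DataFrame(rng.normal(10, 2, size=(2000, 2)), columns=["x", "y"])

def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

# 1. Python loop over rows
loop_res, loop_t = timed(
    lambda: [np.sqrt(x**2 + y**2) for _, (x, y) in small.iterrows()]
)
# 2. apply row by row
apply_res, apply_t = timed(
    lambda: small.apply(lambda row: np.sqrt(row.x**2 + row.y**2), axis=1)
)
# 3. fully vectorised on whole columns
vec_res, vec_t = timed(lambda: np.sqrt(small.x**2 + small.y**2))

# NumPy also ships a ready-made vectorised hypotenuse function
builtin_res = np.hypot(small.x, small.y)

print(f"iterrows: {loop_t:.4f}s  apply: {apply_t:.4f}s  vectorised: {vec_t:.4f}s")
```

For more trustworthy numbers, repeat each measurement several times (for example with %timeit in Jupyter) instead of relying on one run.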

### Recap

* apply: runs on a DataFrame or a Series. On a DataFrame it hands your function a whole column or a whole row at a time, depending on what you tell it.
* map: runs on a Series and does everything element by element.
* .str & similar: vectorised functions that work on a whole Series at once.