
# Applying functions to the contents of a Series/DataFrame

In [6]:
import numpy as np
import pandas as pd

## .map()

https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html

In [2]:
names = ['John', 'Tom', 'Matt']

In [3]:
s = pd.Series(names)

In [4]:
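# replace only substitutes the values it finds in the mapping; all other values are left unchanged.
# map turns every value that is not found in the mapping into NaN.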
s.map({'John':'Josh', 'Tom':'Tim'})

0    Josh
1     Tim
2     NaN
dtype: object

In [5]:
s.map(lambda x: f"He is {x}")

0    He is John
1     He is Tom
2    He is Matt
dtype: object
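
The difference between replace and map noted in the comments above can be checked side by side. This is a minimal sketch: the names `mapping`, `replaced`, `mapped`, and `mapped_keep` are illustrative, and chaining `.fillna(s)` is only one possible way to keep unmatched values when using map.

```python
import pandas as pd

s = pd.Series(['John', 'Tom', 'Matt'])
mapping = {'John': 'Josh', 'Tom': 'Tim'}

# replace leaves values without a match untouched ('Matt' stays 'Matt').
replaced = s.replace(mapping)

# map turns values without a match into NaN ('Matt' becomes NaN).
mapped = s.map(mapping)

# One way to keep the original value where map found no match.
mapped_keep = s.map(mapping).fillna(s)
```

If only a few values need to change and the rest should stay as they are, replace is usually the more direct choice.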

## .apply()

Applies the given function to each column or each row.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

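An analogy: just as NumPy has ufuncs that operate on an ndarray, every row and every column of a DataFrame is a Series, and apply runs a given function on one whole Series at a time. By default (axis=0) the function receives each column; with axis=1 it receives each row.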
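
To make the analogy concrete, the sketch below runs the same ufunc on an ndarray and on a DataFrame, then checks what a plain Python function receives from apply. The 2x2 data and the names `arr`, `demo`, `roots_arr`, `roots_df`, and `kinds` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 data, chosen so the square roots are whole numbers.
arr = np.array([[1, 4], [9, 16]])
demo = pd.DataFrame(arr, columns=['a', 'b'])

# A ufunc works on the whole ndarray at once.
roots_arr = np.sqrt(arr)

# apply with the same function gives the matching result for the DataFrame.
roots_df = demo.apply(np.sqrt)

# With a plain Python function, apply passes in one Series per column.
kinds = demo.apply(lambda col: type(col).__name__)
```

Here `kinds` holds the string 'Series' for both columns, which is why the function given to apply has to be able to handle a whole Series.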

In [40]:
np.random.seed(987)
data = np.random.randint(1, 6, (4, 4))

In [41]:
df = pd.DataFrame(data)

In [42]:
df.columns = ['col1', 'col2', 'col3', 'col4']
df.index = ['row1', 'row2', 'row3', 'row4']

In [16]:
df

,col1,col2,col3,col4
row1,4,2,4,3
row2,3,4,3,5
row3,3,5,4,2
row4,4,3,1,3


In [17]:
df.apply(lambda x:x**2)

,col1,col2,col3,col4
row1,16,4,16,9
row2,9,16,9,25
row3,9,25,16,4
row4,16,9,1,9


In [18]:
df.apply(sum)

col1    14
col2    14
col3    12
col4    13
dtype: int64

In [19]:
df.apply(sum, axis=1)

row1    13
row2    15
row3    14
row4    11
dtype: int64
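
The axis argument also works with user-defined functions. The sketch below rebuilds the same table by hand (so it does not depend on the random seed) and applies a hypothetical helper, `value_range`, along both axes.

```python
import pandas as pd

# The same values as the df shown above, typed in by hand.
df = pd.DataFrame(
    [[4, 2, 4, 3], [3, 4, 3, 5], [3, 5, 4, 2], [4, 3, 1, 3]],
    index=['row1', 'row2', 'row3', 'row4'],
    columns=['col1', 'col2', 'col3', 'col4'],
)

def value_range(series):
    """Hypothetical helper: spread between the largest and smallest value."""
    return series.max() - series.min()

col_ranges = df.apply(value_range)          # axis=0 (default): one result per column
row_ranges = df.apply(value_range, axis=1)  # axis=1: one result per row
```

Because the function returns a single number per Series, each call collapses the DataFrame into a Series, just like `sum` does above.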

## .applymap()

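Applies the given function to each individual element. In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map.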

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.applymap.html

In [20]:
import math

In [21]:
df.applymap(lambda x: math.sin(x))

,col1,col2,col3,col4
row1,-0.756802,0.909297,-0.756802,0.14112
row2,0.14112,-0.756802,0.14112,-0.958924
row3,0.14112,-0.958924,-0.756802,0.909297
row4,-0.756802,0.14112,0.841471,0.14112
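
On newer pandas versions the same element-wise call is spelled DataFrame.map. The following sketch, which assumes nothing beyond the table values shown above, picks whichever method the installed version provides; the name `sines` is illustrative.

```python
import math
import pandas as pd

# The same values as the df shown above, typed in by hand.
df = pd.DataFrame(
    [[4, 2, 4, 3], [3, 4, 3, 5], [3, 5, 4, 2], [4, 3, 1, 3]],
    index=['row1', 'row2', 'row3', 'row4'],
    columns=['col1', 'col2', 'col3', 'col4'],
)

# DataFrame.map exists from pandas 2.1 on; fall back to applymap on older versions.
if hasattr(pd.DataFrame, 'map'):
    sines = df.map(math.sin)
else:
    sines = df.applymap(math.sin)
```

Either way the function is called once per element, so a scalar-only function such as `math.sin` is fine here.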


In [22]:
# math.sin only accepts a scalar, so handing it a whole column (a Series) raises TypeError.
df.apply(lambda x: math.sin(x))

TypeError: ignored

In [24]:
# np.sin is a ufunc, so it can process a whole Series at once.
df.apply(lambda x: np.sin(x))

,col1,col2,col3,col4
row1,-0.756802,0.909297,-0.756802,0.14112
row2,0.14112,-0.756802,0.14112,-0.958924
row3,0.14112,-0.958924,-0.756802,0.909297
row4,-0.756802,0.14112,0.841471,0.14112
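
The contrast between `math.sin` and `np.sin` comes down to what each function accepts. The sketch below, using the table values shown above and illustrative variable names, records the failure and shows two ways that give the same result.

```python
import math
import numpy as np
import pandas as pd

# The same values as the df shown above, typed in by hand.
df = pd.DataFrame(
    [[4, 2, 4, 3], [3, 4, 3, 5], [3, 5, 4, 2], [4, 3, 1, 3]],
    index=['row1', 'row2', 'row3', 'row4'],
    columns=['col1', 'col2', 'col3', 'col4'],
)

# math.sin wants a single number, but apply passes a whole column.
try:
    df.apply(math.sin)
    scalar_func_failed = False
except TypeError:
    scalar_func_failed = True

# Option 1: a vectorized function that understands a Series.
vectorized = df.apply(np.sin)

# Option 2: keep the scalar function, but call it per element inside each column.
per_element = df.apply(lambda col: col.map(math.sin))
```

The vectorized form is usually preferable on larger data, since it avoids one Python-level call per element.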


In [43]:
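# A dict assigns functions per column; a list of functions gives one output column per function.
df.apply({'col1':[np.sin, np.cos], 'col2':np.cos, 'col3':lambda x:x**2, 'col4':lambda x:np.sqrt(x)})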

,col1,col1,col2,col3,col4
,sin,cos,cos,<lambda>,<lambda>
row1,-0.756802,-0.653644,-0.416147,16,1.732051
row2,0.14112,-0.989992,-0.653644,9,2.236068
row3,0.14112,-0.989992,0.283662,16,1.414214
row4,-0.756802,-0.653644,-0.989992,1,1.732051


In [27]:
df.apply([np.sum, np.mean])

,col1,col2,col3,col4
sum,14.0,14.0,12.0,13.0
mean,3.5,3.5,3.0,3.25


## .agg()

Performs several aggregations at once, and can also aggregate different columns with different methods.

.agg() is an alias of .aggregate().

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.agg.html

In [28]:
df.agg([np.sum, np.mean])

,col1,col2,col3,col4
sum,14.0,14.0,12.0,13.0
mean,3.5,3.5,3.0,3.25
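
Aggregations can also be named with strings instead of NumPy functions, which some recent pandas versions prefer (they may warn when `np.sum` or `np.mean` is passed to agg). The sketch below uses the table values shown earlier; `summary` and `per_column` are illustrative names.

```python
import pandas as pd

# The same values as the 4x4 df shown earlier, typed in by hand.
df = pd.DataFrame(
    [[4, 2, 4, 3], [3, 4, 3, 5], [3, 5, 4, 2], [4, 3, 1, 3]],
    index=['row1', 'row2', 'row3', 'row4'],
    columns=['col1', 'col2', 'col3', 'col4'],
)

# Several aggregations at once: one output row per aggregation.
summary = df.agg(['sum', 'mean'])

# A different aggregation per column: the result is a Series keyed by column.
per_column = df.agg({'col1': 'sum', 'col2': 'mean'})
```

Columns that are left out of the dict, such as col3 and col4 here, do not appear in the result.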


In [45]:
np.random.seed(987)
names = np.random.choice(['A','B','C'], 20)
score1 = np.random.randint(1, 11, 20)
score2 = np.random.randint(1, 11, 20)
score3 = np.random.randint(1, 11, 20)

In [46]:
df = pd.DataFrame({'names':names, 'score1':score1, 'score2':score2, 'score3':score3})

In [47]:
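# groupby(...).apply() does not accept a dict of per-column functions, so this raises TypeError; use .agg() instead.
df.groupby('names').apply({'score1':np.sum, 'score2':np.mean, 'score3':np.std})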

TypeError: ignored
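
While a dict is rejected here, apply on a groupby does accept a function, which then receives each group as a DataFrame. That is useful when a calculation needs several columns of the same group at once. The sketch below uses a small hand-made table and a hypothetical helper `weighted_total`; neither comes from the notebook's random data.

```python
import pandas as pd

# Small hand-made table (hypothetical data, not the random scores above).
scores = pd.DataFrame({
    'names': ['A', 'A', 'B', 'B'],
    'score1': [1, 2, 3, 4],
    'score2': [2, 4, 6, 8],
})

def weighted_total(group):
    """Hypothetical helper that needs two columns of the same group at once."""
    return (group['score1'] * group['score2']).sum()

# Select the value columns first so the function only sees score1 and score2.
totals = scores.groupby('names')[['score1', 'score2']].apply(weighted_total)
```

The result is a Series indexed by group name, with one value per group.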

In [48]:
df.groupby('names').agg({'score1':np.sum, 'score2':np.mean, 'score3':np.std})

names,score1,score2,score3
A,15,3.666667,2.645751
B,33,7.333333,3.311596
C,66,5.454545,2.467977
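
When the output columns should carry descriptive names, agg on a groupby also supports named aggregation, where each keyword is the output name and each value is a (column, aggregation) pair. The sketch below assumes a small hand-made table; the output names `total1`, `mean2`, and `std3` are made up for illustration.

```python
import pandas as pd

# Small hand-made table (hypothetical data, not the random scores above).
scores = pd.DataFrame({
    'names': ['A', 'A', 'B', 'B'],
    'score1': [1, 2, 3, 4],
    'score2': [2, 4, 6, 8],
    'score3': [1, 3, 5, 9],
})

by_name = scores.groupby('names').agg(
    total1=('score1', 'sum'),    # sum of score1 per group
    mean2=('score2', 'mean'),    # mean of score2 per group
    std3=('score3', 'std'),      # sample standard deviation (ddof=1) of score3
)
```

Compared with the dict form, this keeps the result's column index flat and makes it clear which aggregation produced each column.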
