# The Apply Family of Functions in R





![apply](http://blog.fens.me/wp-content/uploads/2016/04/apply.png)

The `apply` family of functions is a powerful replacement for `for` and `while` loops in R and plays an important role in batch computations.

The family includes `apply`, `sapply`, `tapply`, `mapply`, `lapply`, `rapply`, `vapply`, `eapply`, and others.
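```r
# Usage:
apply(X, MARGIN, FUN, ...)

# Arguments:
# X:      an array, including a matrix; a data frame is coerced to a matrix first
# MARGIN: 1 to loop over rows, 2 to loop over columns, c(1, 2) for both;
#         if X has named dimnames, a character vector of dimension names also works
# FUN:    the function to apply
# ...:    optional extra arguments passed on to FUN
```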

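`apply` is the most commonly used replacement for a `for` loop. It iterates over a matrix, array, or data frame (**X**) by row or by column (**MARGIN**), passes each slice as an argument to the function **FUN**, and returns the collected results.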
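
As a minimal sketch of the row and column iteration described here, the following uses a small made-up matrix `m` (illustrative data, not from the original post):

```r
# A small 2 x 3 matrix (illustrative data):
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# MARGIN = 1: apply FUN to each row
row_sums <- apply(m, 1, sum)
row_sums    # 9 12

# MARGIN = 2: apply FUN to each column
col_max <- apply(m, 2, max)
col_max     # 2 4 6

# Arguments after FUN are passed on to FUN through `...`
col_means <- apply(m, 2, mean, na.rm = TRUE)
col_means   # 1.5 3.5 5.5
```

For plain row or column sums and means, base R also provides `rowSums()`, `colSums()`, `rowMeans()`, and `colMeans()`; `apply` is the general tool when `FUN` is arbitrary.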

## lapply

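`lapply` is one of the most basic looping functions. It loops over a `list`, a vector, or the columns of a `data.frame`, and returns a **list** of the same length as X. The leading 'l' in the name is a reminder of the return type.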

```r
lapply(X, FUN, ...)
```
If `FUN` requires additional arguments, pass them **after** `X` and `FUN`, in the position of the ellipsis (`...`).

```r
# Example 01
# Define a list
nyc <- list(pop = 8405837,
            boroughs = c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island"),
            capital = FALSE)

# With a for loop
for (info in nyc) {
    print(class(info))
}

# With lapply: X = nyc, FUN = class
lapply(nyc, class)

# The result is a list; unlist() flattens it into a named character vector
unlist(lapply(nyc, class))
```

Output of the `for` loop:

```
[1] "numeric"
[1] "character"
[1] "logical"
```


```r
# Example 02
# Create a character vector
cities <- c("New york", "paris", "london", "tokyo", "rio de janeiro", "cape town")

# With a for loop
num_chars <- c()  # start with an empty vector
for (i in seq_along(cities)) {
    num_chars[i] <- nchar(cities[i])  # store each result in its slot
}
num_chars

# With lapply
lapply(cities, nchar)  # returns a list

unlist(lapply(cities, nchar))  # returns a vector
```

```r
# Example 03
oil_prices <- list(2.37, 2.49, 2.18, 2.22, 2.47, 2.32)
triple <- function(x) {
    3 * x
}
result <- lapply(oil_prices, triple)
str(result)
unlist(result)
#========================================================
multiply <- function(x, times) {
    x * times
}
# In lapply(X, FUN, ...), the ellipsis carries extra arguments for FUN.
# Here it sets the times argument of multiply().
triple33 <- lapply(oil_prices, multiply, times = 3)
unlist(triple33)
fivetimes <- lapply(oil_prices, multiply, times = 5)
unlist(fivetimes)
```

Output of `str(result)`:

```
List of 6
 $ : num 7.11
 $ : num 7.47
 $ : num 6.54
 $ : num 6.66
 $ : num 7.41
 $ : num 6.96
```


```r
##===================
#     Exercise
##===================
# The vector pioneers has already been created for you
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

# Split names from birth year
split_math <- strsplit(pioneers, split = ":")

# Convert to lowercase strings: split_low
split_low <- lapply(split_math, tolower)

# Take a look at the structure of split_low
str(split_low)
```

Write two functions that extract the first and the second element of each vector in `split_low`.

```r
#============================================
# Code from previous exercise:
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)

# Write function select_first()
select_first <- function(x) {
  x[1]
}

# Apply select_first() over split_low: names
names <- lapply(split_low, select_first)

# Write function select_second()
select_second <- function(x) {
  x[2]
}

# Apply select_second() over split_low: years
years <- lapply(split_low, select_second)
```

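Writing a separate function for every position is inefficient. As with the step from `triple` to `multiply` above, the fix is to **turn the constant into a parameter**: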

```r
# ===========================================
# Code from previous exercise:
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split_low <- lapply(split, tolower)

# Generic select function
select_el <- function(x, index) {  # index is a parameter, so one function replaces the repeated definitions
  x[index]
}

# Use lapply() twice on split_low: names and years
names <- lapply(split_low, select_el, index = 1)
years <- lapply(split_low, select_el, index = 2)
names
years
```


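##### Note
`strsplit()` splits strings on a given pattern:

`strsplit(x, split, fixed = FALSE, perl = FALSE, useBytes = FALSE)`

 - x: the object to split; it must be a character vector
 - split: the pattern to split on; if it is `""`, the string is split into single characters. The examples above split on `:`.

`tolower()` and `toupper()` change the case of letters:

 - tolower(x): convert to lower case

 - toupper(x): convert to upper case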
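
A short sketch of these helpers in action. The `dates` vector is made-up illustrative input, not taken from the exercises above:

```r
# Illustrative input (an assumption for this sketch)
dates <- c("2016-04-01", "2017-12-31")

# strsplit() always returns a list, one element per input string
parts <- strsplit(dates, split = "-")
parts[[1]]                       # "2016" "04" "01"

# split = "" breaks a string into single characters
chars <- strsplit("apply", split = "")[[1]]
chars                            # "a" "p" "p" "l" "y"

# Case conversion is vectorized over the input
lower <- tolower("GAUSS")
lower                            # "gauss"
upper <- toupper(c("bayes", "pascal"))
upper                            # "BAYES" "PASCAL"
```

Because the result of `strsplit()` is a list even for a single string, `[[1]]` is needed to get at the character vector itself.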

## sapply

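`sapply` is a user-friendly wrapper around `lapply`.

`sapply` adds two arguments, `simplify` and `USE.NAMES`, mainly to make the output friendlier: when possible it **returns a vector or matrix** instead of a list.
```r
sapply(X, FUN, ..., simplify=TRUE, USE.NAMES = TRUE)
```

   - X: a vector or list (a data frame is treated as a list of columns)
   - FUN: the function to apply
   - …: optional extra arguments passed on to FUN
   - simplify: whether to simplify the result; with `TRUE` the result becomes a vector or matrix when possible, and with `"array"` it may become a higher-dimensional array
   - USE.NAMES: if X is a character vector, `TRUE` uses its strings as the names of the result, `FALSE` does not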


```r
# Again using cities as the example:
cities <- c("New York", "Paris", "London", "Tokyo",
            "Rio de Janeiro", "Cape Town")
unlist(lapply(cities, nchar))

sapply(cities, nchar)  # USE.NAMES defaults to TRUE, so the strings become the names
sapply(cities, nchar, USE.NAMES = FALSE)
```


```r
# Example 02
# Find the smallest and largest letter (in sort order) of each city name

first_and_last <- function(name) {
    name <- gsub(" ", "", name)
    letters <- strsplit(name, split = "")[[1]]  # strsplit() returns a list, so extract its first element
    c(first = min(letters), last = max(letters))  # a named vector with two elements
}

first_and_last("New York")

# Use sapply to run the function on every element of cities
sapply(cities, first_and_last, USE.NAMES = TRUE)
sapply(cities, first_and_last, USE.NAMES = FALSE)
```

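Result with `USE.NAMES = TRUE`:

| | New York | Paris | London | Tokyo | Rio de Janeiro | Cape Town |
|---|---|---|---|---|---|---|
| first | e | a | d | k | a | a |
| last | Y | s | o | y | R | w |

Result with `USE.NAMES = FALSE` (columns are unnamed):

| | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| first | e | a | d | k | a | a |
| last | Y | s | o | y | R | w |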


About `gsub()`, which replaces every match of `pattern` in `x` with `replacement`:
```r
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)
```
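
A few hedged one-liners showing how the arguments behave. The input strings are illustrative, apart from the space-stripping call that mirrors `first_and_last()`:

```r
# Remove all spaces, as first_and_last() does
no_spaces <- gsub(" ", "", "Rio de Janeiro")
no_spaces                                   # "RiodeJaneiro"

# gsub() replaces every match; sub() replaces only the first one
all_a   <- gsub("a", "A", "banana")         # "bAnAnA"
first_a <- sub("a", "A", "banana")          # "bAnana"

# fixed = TRUE treats the pattern as a literal string, not a regular expression
dashed <- gsub(".", "-", "a.b.c", fixed = TRUE)
dashed                                      # "a-b-c"
```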

With `simplify = FALSE` and `USE.NAMES = FALSE`, **`sapply` behaves exactly like `lapply`**.
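
This equivalence can be checked directly. The sketch below uses a shortened `cities` vector as illustrative input:

```r
cities <- c("New York", "Paris", "London")   # illustrative subset

as_list  <- lapply(cities, nchar)
no_simpl <- sapply(cities, nchar, simplify = FALSE, USE.NAMES = FALSE)

same <- identical(as_list, no_simpl)
same                                         # TRUE

# With the defaults, sapply() simplifies to a named integer vector instead
simplified <- sapply(cities, nchar)
simplified
```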

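```r
# simplify: sapply tries to combine the results into a vector or matrix
# Some results cannot be simplified by sapply
unique_letters <- function(name) {
    name <- gsub(" ", "", name)
    letters <- strsplit(name, split = "")[[1]]
    unique(letters)
}
unique_letters("London")

# With lapply()
lapply(cities, unique_letters)

# With sapply(): the results have different lengths, so a list is returned here too
sapply(cities, unique_letters)

# FALSE: both are lists, but sapply() adds the city names (USE.NAMES = TRUE);
# with USE.NAMES = FALSE the two results are identical
identical(lapply(cities, unique_letters), sapply(cities, unique_letters))
```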

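## vapply

`vapply` is similar to `sapply`, but takes a `FUN.VALUE` argument: a template that fixes the **type and length of each return value**.
```r
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
```
   - X: a vector or list
   - FUN: the function to apply
   - FUN.VALUE: a template for the value returned by each call of FUN, for example `numeric(1)` or `character(3)`; a result of a different type or length raises an error
   - …: optional extra arguments passed on to FUN
   - USE.NAMES: if X is a character vector, `TRUE` uses its strings as the names of the result, `FALSE` does not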
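
To make the role of `FUN.VALUE` concrete, here is a small sketch with a shortened `cities` vector (illustrative input). The message returned by the error handler is made up for this example and is not R's own error text:

```r
cities <- c("New York", "Paris", "London")   # illustrative subset

# Each call of nchar() returns one integer, so the template is integer(1)
lens <- vapply(cities, nchar, FUN.VALUE = integer(1))
lens

# A template that does not match the actual results makes vapply() stop
# with an error; sapply() performs no such check
msg <- tryCatch(
  vapply(cities, nchar, FUN.VALUE = character(1)),
  error = function(e) "vapply stopped: result does not match FUN.VALUE"
)
msg
```

The explicit template is the reason `vapply` is often preferred over `sapply` inside functions and scripts: the shape of the result is known in advance instead of depending on the data.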


#### sapply() vs. vapply()

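```r
# Example 01
cities <- c("New York", "Paris", "London", "Tokyo",
            "Rio de Janeiro", "Cape Town")
# With sapply
sapply(cities, nchar)

# With vapply()
# nchar() returns one integer per city; integer results are accepted by a numeric(1) template
vapply(cities, nchar, FUN.VALUE = numeric(1))
```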

```r
# Reusing first_and_last from above, with a third element added
first_and_last <- function(name) {
    name <- gsub(" ", "", name)
    letters <- strsplit(name, split = "")[[1]]
    return(c(first = min(letters), last = max(letters), say = "yes"))
}
sapply(cities, first_and_last)

# vapply
vapply(cities, first_and_last, FUN.VALUE = character(3))
# Each call returns three character values (first, last, say), so the template is character(3).
# vapply() checks every result against this template and stops with an error on a mismatch.
```

Both calls return the same matrix:

| | New York | Paris | London | Tokyo | Rio de Janeiro | Cape Town |
|---|---|---|---|---|---|---|
| first | e | a | d | k | a | a |
| last | Y | s | o | y | R | w |
| say | yes | yes | yes | yes | yes | yes |


```r
# Open the help pages for more detail
?vapply
?numeric
```