Learning from subsamples is one way to test the robustness of a machine learning algorithm and to improve its accuracy. It is a simple approach to accuracy estimation that also provides the bias or variance of the estimator. One such approach is bootstrapping: the practice of repeatedly learning from, and estimating properties on, random subsamples of the learning data, so that performance can be assessed and improved iteratively. This section explains bootstrapping in R.
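
As a minimal sketch of how resampling yields the bias and variance of an estimator, the base-R example below bootstraps the sample median of `iris$Sepal.Length`. The choice of statistic, the seed, and the number of replicates `B` are assumptions made for illustration only; they do not come from the tutorial. Unlike the SVM example later in this section, each replicate here is drawn with replacement.

```r
# Minimal sketch: bootstrap estimates of bias and standard error for the median.
data(iris)
set.seed(42)            # arbitrary seed, used only to make the run reproducible
x <- iris$Sepal.Length
B <- 2000               # number of bootstrap replicates (illustrative choice)

# Each replicate resamples x with replacement and recomputes the statistic
bootMedians <- replicate(B, median(sample(x, length(x), replace = TRUE)))

biasEst <- mean(bootMedians) - median(x)  # bootstrap estimate of the bias
seEst <- sd(bootMedians)                  # bootstrap estimate of the standard error
print(c(bias = biasEst, se = seEst))
```

The same pattern applies to any statistic: replace `median` with the quantity of interest and read its variability from the spread of the replicates.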

1. First, we need a dataset. As mentioned earlier, we use the iris data:

In [1]:
data(iris)
myData = iris[c(1:150),]

2. For bootstrapping, create a function that will be run on each subsample of the dataset. The function trains an SVM classifier on the sampled rows and returns the number of correctly classified test rows, which the code calls true positives (TP) and which serves as the performance value. This measure was chosen for simplicity; the more complete performance measures discussed in the upcoming recipes can be used as well:

In [9]:
#install.packages("e1071")
library(e1071)

In [10]:
corPred <- function(data, label, indices) {
    train <- data[indices, ]    # rows sampled for training
    test <- data[-indices, ]    # remaining rows form the test set
    testClass <- test[, label]  # true class labels (species) of the test rows
    colnames(train)[colnames(train) == label] <- "Class"  # rename the label column for the formula
    mySVM <- svm(Class ~ ., data = train, cost = 100, gamma = 1)  # fit the SVM model
    myPred <- predict(mySVM, test)  # predict on the test set
    TP <- sum(myPred == testClass)  # number of correct predictions
    return(TP)
}
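
The count returned by `corPred` collapses all classes into one number. As a hedged sketch of a more detailed measure, the example below fits the same kind of SVM on a single random split and tabulates predictions against the true species. The seed and the variable names (`trainIdx`, `confMat`) are assumptions introduced here, and this is not necessarily the measure used in the later recipes.

```r
library(e1071)
data(iris)
set.seed(1)  # arbitrary seed for a reproducible split

# Two-thirds of the rows for training, the rest for testing
trainIdx <- sample(seq_len(nrow(iris)), floor(2 * nrow(iris) / 3))
train <- iris[trainIdx, ]
test <- iris[-trainIdx, ]

fit <- svm(Species ~ ., data = train, cost = 100, gamma = 1)  # same settings as corPred
pred <- predict(fit, test)

# Rows are predicted classes, columns are true classes
confMat <- table(predicted = pred, actual = test$Species)
accuracy <- sum(diag(confMat)) / sum(confMat)  # fraction classified correctly
print(confMat)
print(accuracy)
```

The diagonal of `confMat` holds the correct predictions per class, so their sum equals the value that `corPred` would return for the same split.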

3. Now, write another function that runs the bootstrap on the data. It takes the number of bootstrap iterations as input and returns the per-iteration results together with their mean and standard deviation, as follows:

In [14]:
myboot <- function(d, label, iter) {
    bootres <- numeric(iter)  # one result per iteration
    for (i in seq_len(iter)) {
        # sample two-thirds of the row indexes (without replacement)
        indices <- sample(1:nrow(d), floor((2 * nrow(d)) / 3))
        bootres[i] <- corPred(d, label, indices)  # run corPred and store the result
    }
    return(list(BOOT.RES = bootres, BOOT.MEAN = mean(bootres), BOOT.SD = sd(bootres)))
}
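
`myboot` draws two-thirds of the rows without replacement on every iteration. The classical bootstrap instead draws as many rows as the dataset has, with replacement, and can test on the rows that were never drawn (the out-of-bag rows). The sketch below shows that variant under stated assumptions: the function name `oobBoot`, the returned field names, the seed, and the small iteration count are all hypothetical and are not part of the tutorial.

```r
library(e1071)

# Hypothetical variant of myboot: resample rows WITH replacement and
# evaluate on the out-of-bag rows that were not drawn.
oobBoot <- function(d, label, iter) {
    acc <- numeric(iter)
    modelFormula <- as.formula(paste(label, "~ ."))
    for (i in seq_len(iter)) {
        idx <- sample(seq_len(nrow(d)), nrow(d), replace = TRUE)
        train <- d[idx, ]           # bootstrap sample, may contain repeated rows
        test <- d[-unique(idx), ]   # rows never drawn in this iteration
        fit <- svm(modelFormula, data = train, cost = 100, gamma = 1)
        acc[i] <- mean(predict(fit, test) == test[, label])  # out-of-bag accuracy
    }
    list(OOB.ACC = acc, OOB.MEAN = mean(acc), OOB.SD = sd(acc))
}

data(iris)
set.seed(7)  # arbitrary seed for reproducibility
res.oob <- oobBoot(iris, "Species", iter = 20)
print(res.oob$OOB.MEAN)
```

Because the size of the out-of-bag set changes between iterations, this sketch reports an accuracy rather than a raw count of correct predictions.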

4. Finally, run the bootstrap function with 10,000 iterations and inspect the result:

In [12]:
res.bs <- myboot(d=myData,label="Species",iter=10000)

In [13]:
res.bs
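
The values in the result are counts of correct predictions out of the 50 rows held out in each iteration (150 rows minus the 100 used for training). A possible way to make them easier to read is to convert them to accuracies and report a percentile interval. The helper `summarizeBoot` and its field names are hypothetical, and the counts in the demonstration call are made-up placeholders that only show the shape of the input; they are not results from this tutorial.

```r
# Hypothetical helper: turn correct-prediction counts into accuracies
# and summarise them with a percentile interval.
summarizeBoot <- function(bootres, nTest, level = 0.95) {
    acc <- bootres / nTest       # fraction of test rows classified correctly
    alpha <- (1 - level) / 2
    list(
        MEAN.ACC = mean(acc),
        SD.ACC = sd(acc),
        INTERVAL = quantile(acc, probs = c(alpha, 1 - alpha), names = FALSE)
    )
}

nTest <- 150 - floor((2 * 150) / 3)     # 50 held-out rows per iteration for iris
toyCounts <- c(47, 48, 46, 49, 48, 47)  # made-up placeholder counts, NOT real results
print(summarizeBoot(toyCounts, nTest))

# With the objects from this section, the call would be:
# summarizeBoot(res.bs$BOOT.RES, nTest)
```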