### Meta-model Stacking
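In this approach, we add a meta-model on top of the averaged base models and train the meta-model on the out-of-fold predictions of those base models.  
The training procedure is as follows:  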
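1. Split the whole training set into two disjoint parts (here, `train` and `holdout`).
2. Train several base models on the first part (`train`).
3. Use these base models to predict the second part (`holdout`).
4. Train a higher-level learner, called the meta-model, using the predictions from step 3 (the out-of-fold predictions) as inputs and the true labels (the target variable) as outputs.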
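The four steps can be sketched with a single holdout split. This is a minimal illustration, not the author's code: `load_diabetes` stands in for the document's dataset, and using `Ridge` and `RandomForestRegressor` as base models with `LinearRegression` as the meta-model is an assumption.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes is only a stand-in dataset for this sketch
X, y = load_diabetes(return_X_y=True)

# Step 1: split the training data into two disjoint parts, train and holdout
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.5, random_state=0)

# Step 2: train several base models on the first part (model choice is an assumption)
base_models = [Ridge(alpha=1.0), RandomForestRegressor(n_estimators=50, random_state=0)]
for model in base_models:
    model.fit(X_tr, y_tr)

# Step 3: predict the holdout part with each base model, one column per base model
holdout_preds = np.column_stack([m.predict(X_ho) for m in base_models])

# Step 4: train the meta-model on those predictions, with the true holdout labels as targets
meta_model = LinearRegression().fit(holdout_preds, y_ho)
print(holdout_preds.shape, meta_model.coef_)
```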
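The first three steps are repeated iteratively. For example, with 5-fold cross-validation we first split the training data into 5 folds and then run 5 iterations. In each iteration, every base model is trained on 4 folds and predicts the remaining fold (the holdout fold). After the 5 iterations, every training sample has exactly one out-of-fold prediction per base model, and these predictions become the meta-model's training features. Each base model thus yields 5 fitted fold-models, whose predictions are averaged (or, for classification, majority-voted) at prediction time.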
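The K-fold loop can be sketched for a single base model. Assuming scikit-learn and `load_diabetes` as a stand-in dataset, the snippet checks that every training sample is predicted exactly once, and that scikit-learn's `cross_val_predict` returns the same out-of-fold vector when given the same `KFold` splitter.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

X, y = load_diabetes(return_X_y=True)  # stand-in dataset (assumption)
kfold = KFold(n_splits=5, shuffle=True, random_state=156)
model = Ridge(alpha=1.0)

oof = np.zeros(len(y))
times_predicted = np.zeros(len(y), dtype=int)
for train_idx, holdout_idx in kfold.split(X):
    # Train a fresh copy on 4 folds, then predict the remaining holdout fold
    fold_model = clone(model).fit(X[train_idx], y[train_idx])
    oof[holdout_idx] = fold_model.predict(X[holdout_idx])
    times_predicted[holdout_idx] += 1

# Every sample falls into exactly one holdout fold
print(np.all(times_predicted == 1))
# cross_val_predict builds the same out-of-fold vector from the same splits
print(np.allclose(oof, cross_val_predict(model, X, y, cv=kfold)))
```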

We build a `StackingAveragedModels` class:

In [1]:
import numpy as np
np.random.seed(7)
from sklearn.base import BaseEstimator,RegressorMixin,clone,ClassifierMixin
from sklearn.model_selection import KFold

class StackingAveragedModels:
    def __init__(self, base_models, meta_model, n_folds=5):
        self.base_models = base_models
        self.meta_model = meta_model
        self.n_folds = n_folds
   
    # Fit clones of every base model, one clone per fold
    def fit(self, X, y):
        self.base_models_ = [list() for x in self.base_models]
        self.meta_model_ = clone(self.meta_model)
        kfold = KFold(n_splits=self.n_folds, shuffle=True, random_state=156)
        
        # Train the base models and predict each holdout fold, building the training data for the second (meta) level
        out_of_fold_predictions = np.zeros((X.shape[0], len(self.base_models)))
        for i, model in enumerate(self.base_models):
            for train_index, holdout_index in kfold.split(X, y):
                instance = clone(model)
                self.base_models_[i].append(instance)
                instance.fit(X[train_index], y[train_index])
                y_pred = instance.predict(X[holdout_index])
                out_of_fold_predictions[holdout_index, i] = y_pred
                
        self.meta_model_.fit(out_of_fold_predictions, y)
        
        return self
   
    # Make stacked predictions
    def predict(self, X):
        # Regression: average the k fold-models of each base model
        if isinstance(self.meta_model, RegressorMixin):
            meta_features = np.column_stack([
                np.column_stack([model.predict(X) for model in base_models]).mean(axis=1)
                for base_models in self.base_models_])
            return self.meta_model_.predict(meta_features)
        # Classification: majority vote over the k fold-models of each base model
        # (np.bincount requires non-negative integer class labels)
        elif isinstance(self.meta_model, ClassifierMixin):
            meta_features = []
            for fold_models in self.base_models_:
                fold_preds = np.column_stack([m.predict(X) for m in fold_models])
                votes = np.array([np.argmax(np.bincount(row)) for row in fold_preds])
                meta_features.append(votes)
            return self.meta_model_.predict(np.column_stack(meta_features))
        else:
            raise TypeError('meta_model must be a scikit-learn regressor or classifier')
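The two branches of `predict` aggregate the k fold-models of each base model differently. The toy arrays below are hypothetical, chosen only to show the arithmetic: rows are samples and columns are fold-models, the layout `np.column_stack` produces inside `predict`.

```python
import numpy as np

# Hypothetical regression outputs of 3 fold-models for 4 samples
reg_preds = np.array([[2.0, 2.2, 1.8],
                      [5.0, 4.0, 6.0],
                      [0.5, 0.7, 0.6],
                      [3.0, 3.0, 3.3]])
# Regression branch: average across fold-models, one meta-feature per sample
reg_feature = reg_preds.mean(axis=1)

# Hypothetical class labels from 3 fold-models for 4 samples
clf_preds = np.array([[0, 0, 1],
                      [2, 1, 2],
                      [1, 1, 1],
                      [0, 2, 2]])
# Classification branch: np.bincount counts each label per row, np.argmax picks the most frequent
clf_feature = np.array([np.argmax(np.bincount(row)) for row in clf_preds])
print(reg_feature, clf_feature)
```

On a tie, `np.argmax` returns the smallest of the most frequent labels. Because `np.bincount` accepts only non-negative integers, the classification branch assumes labels encoded as 0, 1, 2, ...; string labels would need encoding first (for example with `LabelEncoder`).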

### Model Ensembling
- Regression

In [2]:
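# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older scikit-learn version
from sklearn.datasets import load_boston,load_iris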
from sklearn.ensemble import RandomForestRegressor,RandomForestClassifier
from sklearn.svm import SVR,SVC
from sklearn.linear_model import LinearRegression,LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error,accuracy_score
import warnings
warnings.filterwarnings('ignore')

X,y=load_boston(return_X_y=True)
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)

svm=SVR()
lr=LinearRegression()
randomForest=RandomForestRegressor()


Models=StackingAveragedModels(base_models=(svm,lr,randomForest),
                              meta_model=lr)
Models.fit(X_train,y_train)
y_pre=Models.predict(X_test)

mean_squared_error(y_test,y_pre)

15.892954700262408
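For comparison, scikit-learn (0.22 and later) ships `StackingRegressor`. The sketch below assumes `load_diabetes` as a substitute for `load_boston`, so its MSE is not comparable with the figure above; the `random_state` values are arbitrary choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split
from sklearn.svm import SVR

# Assumption: load_diabetes replaces load_boston, which recent scikit-learn no longer ships
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

stack = StackingRegressor(
    estimators=[("svm", SVR()), ("lr", LinearRegression()),
                ("rf", RandomForestRegressor(random_state=7))],
    final_estimator=LinearRegression(),
    cv=KFold(n_splits=5, shuffle=True, random_state=156),  # same fold setup as the class above
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(mse)
```

One design difference: `StackingRegressor` trains the meta-model on cross-validated predictions but then refits each base model on the full training set for prediction, whereas `StackingAveragedModels` keeps all k fold-models and averages them.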

- Classification

In [3]:


X,y=load_iris(return_X_y=True)
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)

# No need to fit here: StackingAveragedModels.fit clones and trains each base model on every fold
svm=SVC()
lr=LogisticRegression()
randomForest=RandomForestClassifier()



Models=StackingAveragedModels(base_models=(svm,lr,randomForest),
                              meta_model=LogisticRegression())
Models.fit(X_train,y_train)
y_pre=Models.predict(X_test)
accuracy_score(y_test,y_pre)

0.9555555555555556
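The classification counterpart in scikit-learn is `StackingClassifier`. Setting `stack_method="predict"` (an assumption made here to mirror the class above) feeds hard labels to the meta-model. The split below uses its own `random_state`, so its accuracy need not match the figure above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

stack = StackingClassifier(
    estimators=[("svm", SVC()), ("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=7))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=KFold(n_splits=5, shuffle=True, random_state=156),
    stack_method="predict",  # hard labels as meta-features, like StackingAveragedModels
)
stack.fit(X_train, y_train)
acc = accuracy_score(y_test, stack.predict(X_test))
print(acc)
```

With the default `stack_method="auto"`, scikit-learn uses `predict_proba` when a base estimator provides it (falling back to `decision_function`, then `predict`), which usually gives the meta-model richer input than hard labels.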