# Troubleshooting: the Pipeline error in sklearn

With scikit-learn 0.20.3, the following error is raised on Python 3.5 and later:

<div class="alert alert-danger alertdanger" style="margin-top: 20px">

TypeError: 'Pipeline' object is not subscriptable

</div>


![img_0201](.\img\0101.png)

Cause: `Pipeline` did not support indexing until scikit-learn 0.21. See the changelog:

[changelog](https://scikit-learn.org/0.21/whats_new.html#sklearn-pipeline)

![img_0201](.\img\0102.png)

Solution: upgrade scikit-learn to a version that supports indexing.


```
!pip install --upgrade scikit-learn

```
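
Before upgrading, it can help to confirm that the installed version is actually the problem. The sketch below assumes the 0.21 threshold taken from the changelog linked above. `supports_pipeline_indexing` is a hypothetical helper name used only for illustration; it is not part of scikit-learn.

```python
import re


def supports_pipeline_indexing(version: str) -> bool:
    """Return True if this scikit-learn version allows pipeline[...] indexing."""
    # Keep only the leading "major.minor" digits, ignoring suffixes such as "rc1".
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption: indexing support arrived in 0.21, per the changelog.
    return (major, minor) >= (0, 21)


print(supports_pipeline_indexing("0.20.3"))  # False
print(supports_pipeline_indexing("0.22.1"))  # True
```

In a real session you would pass `sklearn.__version__` instead of a literal string.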



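#### Version information

|Name|Version|Description|
|----|----|----|
|python|3.7.3|Programming language|
|scikit-learn|0.22.1|Machine learning|
|numpy|1.18.1|Array computation|
|scipy|1.4.1|Scientific computation|
|joblib|0.14.1|Parallel computing and object persistence|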

## Example

Import the required libraries.

In [1]:
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline

In [2]:
# generate some data to play with
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
X, y

(array([[ 0.61118028,  0.07396296, -0.49596905, ..., -0.51753365,
         -0.37339927, -0.70521074],
        [-0.55470506, -1.26634051, -1.03437283, ..., -0.05798395,
          0.07377011,  0.60247721],
        [ 0.72456704, -0.22624522,  1.28626861, ...,  1.06456868,
         -0.45374431,  0.44663973],
        ...,
        [ 1.25561121,  0.40561759,  1.5316888 , ...,  0.71500701,
          0.48056211,  0.40041203],
        [ 1.72707396, -0.00827807,  1.20562808, ...,  0.69476103,
          1.3238748 ,  0.93299664],
        [-0.47240735, -0.03014427,  1.7691167 , ..., -0.56770578,
          0.28012139,  0.3905229 ]]),
 array([0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,
        1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
        0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1,
        1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1,
        0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]))

Create the two estimator instances.

In [3]:
# ANOVA SVM-C
anova_filter = SelectKBest(f_regression, k=5)
clf = svm.SVC(kernel='linear')

Build the Pipeline.

In [4]:
anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])

Set parameters by name.

In [5]:
# You can set the parameters using the names issued
# For instance, fit using a k of 10 in the SelectKBest
# and a parameter 'C' of the svm
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)

Pipeline(memory=None,
         steps=[('anova',
                 SelectKBest(k=10,
                             score_func=<function f_regression at 0x000002451623F1E0>)),
                ('svc',
                 SVC(C=0.1, break_ties=False, cache_size=200, class_weight=None,
                     coef0=0.0, decision_function_shape='ovr', degree=3,
                     gamma='scale', kernel='linear', max_iter=-1,
                     probability=False, random_state=None, shrinking=True,
                     tol=0.001, verbose=False))],
         verbose=False)
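
The names accepted by `set_params` follow the pattern `<step name>__<parameter name>`. As a minimal sketch, `get_params()` can be used to look up those names and their current values. The pipeline below is rebuilt from scratch so the snippet runs on its own; the variable name `pipe` is only illustrative.

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([("anova", SelectKBest(f_regression, k=5)),
                 ("svc", SVC(kernel="linear"))])

# Nested parameter names follow the pattern <step name>__<parameter name>.
params = pipe.get_params()
print(params["anova__k"])  # 5, the value passed to SelectKBest above

# set_params returns the pipeline itself, which is why .fit can be chained.
pipe.set_params(anova__k=10, svc__C=0.1)
print(pipe.get_params()["anova__k"], pipe.get_params()["svc__C"])
```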

In [6]:
prediction = anova_svm.predict(X)
anova_svm.score(X, y)

0.83

In [7]:
anova_svm

Pipeline(memory=None,
         steps=[('anova',
                 SelectKBest(k=10,
                             score_func=<function f_regression at 0x000002451623F1E0>)),
                ('svc',
                 SVC(C=0.1, break_ties=False, cache_size=200, class_weight=None,
                     coef0=0.0, decision_function_shape='ovr', degree=3,
                     gamma='scale', kernel='linear', max_iter=-1,
                     probability=False, random_state=None, shrinking=True,
                     tol=0.001, verbose=False))],
         verbose=False)

In [8]:
type(anova_svm['anova'])

sklearn.feature_selection._univariate_selection.SelectKBest

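Before the upgrade, indexing the pipeline like this raised the error. The limitation was in `Pipeline` itself, not in the SVM. With the upgraded version it works: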

In [11]:
# getting the selected features chosen by anova_filter
anova_svm['anova'].get_support()

array([False, False,  True,  True, False, False,  True,  True, False,
        True, False,  True,  True, False,  True, False,  True,  True,
       False, False])

In [12]:
# Another way to get selected features chosen by anova_filter
anova_svm.named_steps.anova.get_support()

array([False, False,  True,  True, False, False,  True,  True, False,
        True, False,  True,  True, False,  True, False,  True,  True,
       False, False])
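
If upgrading is not an option, the steps can also be reached without `[]` indexing on the pipeline. The sketch below uses the `named_steps` and `steps` attributes. It is expected to work on versions before 0.21 as well, but that was not re-tested here, so treat it as an assumption.

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([("anova", SelectKBest(f_regression, k=5)),
                 ("svc", SVC(kernel="linear"))])

# named_steps maps step names to estimators.
selector = pipe.named_steps["anova"]
# steps is a plain list of (name, estimator) tuples, so list indexing applies.
last_name, last_estimator = pipe.steps[-1]

print(type(selector).__name__)  # SelectKBest
print(last_name)                # svc
```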

In [13]:
# Indexing can also be used to extract a sub-pipeline.
sub_pipeline = anova_svm[:1]
sub_pipeline

Pipeline(memory=None,
         steps=[('anova',
                 SelectKBest(k=10,
                             score_func=<function f_regression at 0x000002451623F1E0>))],
         verbose=False)
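
A slice returns a new `Pipeline`, but the copy is shallow: the sub-pipeline holds the same estimator objects as the original. The following sketch illustrates this on an unfitted pipeline rebuilt for the purpose; `pipe` and `sub` are illustrative names.

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([("anova", SelectKBest(f_regression, k=5)),
                 ("svc", SVC(kernel="linear"))])

sub = pipe[:1]  # requires scikit-learn 0.21 or later

print(isinstance(sub, Pipeline))  # True: a slice is itself a Pipeline
print(len(sub.steps))             # 1
print(sub[0] is pipe[0])          # True: the estimator object is shared
```

Because the estimators are shared, fitting or modifying a step through the sub-pipeline also affects the full pipeline.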

In [14]:
coef = anova_svm[-1].coef_

In [15]:
anova_svm['svc'] is anova_svm[-1]

True

In [16]:
coef.shape

(1, 10)

In [17]:
sub_pipeline.inverse_transform(coef).shape

(1, 20)
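
The shape grows from `(1, 10)` to `(1, 20)` because the selector's `inverse_transform` puts each coefficient back in the column of its selected feature and fills the removed columns with zeros. The NumPy sketch below mimics that behavior on made-up toy values (5 features, 3 selected). It illustrates the idea and is not scikit-learn's actual implementation.

```python
import numpy as np

# Toy support mask: True marks a feature the selector kept (hypothetical values).
support = np.array([False, True, True, False, True])
# Toy coefficients, one per kept feature (hypothetical values).
coef = np.array([[0.5, -1.2, 0.3]])

# Start from zeros and write the coefficients into the kept columns.
full = np.zeros((coef.shape[0], support.size))
full[:, support] = coef

print(full.shape)  # (1, 5)
print(full)
```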

In [18]:
import sys 
sys.version

'3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)]'

### Complete code

In [None]:
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline
# Generate some data to play with
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
# ANOVA SVM-C
anova_filter = SelectKBest(f_regression, k=5)
clf = svm.SVC(kernel='linear')
anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
# You can set the parameters using the names issued
# For instance, fit using a k of 10 in the SelectKBest
# and a parameter 'C' of the svm
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)
# Pipeline(steps=[('anova', SelectKBest(...)), ('svc', SVC(...))])
prediction = anova_svm.predict(X)
print(anova_svm.score(X, y))
# Getting the selected features chosen by anova_filter
print(anova_svm['anova'].get_support())
# Another way to get selected features chosen by anova_filter
print(anova_svm.named_steps.anova.get_support())
# Indexing can also be used to extract a sub-pipeline.
sub_pipeline = anova_svm[:1]
print(sub_pipeline)
# Pipeline(steps=[('anova', SelectKBest(...))])
coef = anova_svm[-1].coef_
print(anova_svm['svc'] is anova_svm[-1])
print(coef.shape)
print(sub_pipeline.inverse_transform(coef).shape)

