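### Ensemble - Voting method
- An approach that trains in parallel, using either several different models or the same model on sampled datasets
- Voting method/technique
    * Composition: the same dataset + models with different learning algorithms
    * Deriving the result: Hard (direct: majority vote over the predicted labels), Soft (indirect: average of the predicted class probabilities)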
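
The difference between Hard and Soft voting can be sketched with a few lines of NumPy. This is only an illustration: the probability values below are made up, and the three "models" are hypothetical stand-ins rather than the classifiers trained later in this notebook.

```python
import numpy as np

# Hypothetical predicted probabilities from three classifiers for ONE sample
# (columns: class 0, class 1). The numbers are invented for illustration.
probas = np.array([
    [0.45, 0.55],  # model A leans slightly toward class 1
    [0.40, 0.60],  # model B leans slightly toward class 1
    [0.90, 0.10],  # model C is very confident in class 0
])

# Hard voting: every model casts one vote for its most likely class,
# and the class with the most votes wins.
votes = probas.argmax(axis=1)
hard_pred = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the class probabilities across models,
# then pick the class with the highest mean probability.
mean_proba = probas.mean(axis=0)
soft_pred = mean_proba.argmax()

print("hard:", hard_pred, "soft:", soft_pred)
```

With these made-up numbers the two schemes disagree: two of three models vote for class 1, but the one confident model pulls the averaged probability toward class 0.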
    

In [9]:
# Load modules
import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Prepare the data
cancer = load_breast_cancer(as_frame=True)
print(cancer.keys())

print(cancer['target_names'])
print(f'feature names: {cancer["feature_names"]}')
print(f'{cancer["DESCR"]}')

featuredf = cancer['data']
target = cancer['target']
print(featuredf.shape, target.shape)



dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])
['malignant' 'benign']
feature names: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radi

- Build a breast cancer classification model --> features: 30, target: 2 classes (binary classification)


In [30]:
from sklearn.tree import DecisionTreeClassifier
# Split the data into train and test sets (stratified on the target)
X_train, X_test, y_train, y_test = train_test_split(featuredf, target, random_state=10, stratify=target)

# Training uses the ensemble Voting method
# The dataset is the same for every model
# Algorithms: KNeighborsClassifier, LogisticRegression, DecisionTreeClassifier

# Create the model instances
kmodel = KNeighborsClassifier()
Lmodel = LogisticRegression(solver='liblinear')
DTmodel = DecisionTreeClassifier(random_state=10)

# Create the voting instances
Vmodel = VotingClassifier(estimators=[('kmodel', kmodel), ('DTmodel', DTmodel), ('Lmodel', Lmodel)], voting='hard')

V_smodel = VotingClassifier(estimators=[('kmodel', kmodel), ('DTmodel', DTmodel), ('Lmodel', Lmodel)], voting='soft')

# Train
Vmodel.fit(X_train.values, y_train.values) # Hard (direct): majority vote over each model's predicted label
V_smodel.fit(X_train.values, y_train.values) # Soft (indirect): combines each model's predicted probabilities

# Inspect the fitted model's attributes
print(f'[Vmodel.classes_] : {Vmodel.classes_}')
print(f'[Vmodel.estimators_] : {Vmodel.estimators_}')
print(f'[Vmodel.named_estimators_] : {Vmodel.named_estimators_}')
print('-'*100)
print(f'[Vmodel.n_features_in_] : {Vmodel.n_features_in_}')
print('-'*100)


# Check performance --> there is no separate validation set, so score on train and test
train_score = Vmodel.score(X_train.values, y_train.values)
test_score = Vmodel.score(X_test.values, y_test.values)

vstrain_score = V_smodel.score(X_train.values, y_train.values)
vstest_score = V_smodel.score(X_test.values, y_test.values)

print('Hard')
print(train_score)
print(test_score)
print('Soft')
print(vstrain_score)
print(vstest_score)


[Vmodel.classes_] : [0 1]
[Vmodel.estimators_] : [KNeighborsClassifier(), DecisionTreeClassifier(random_state=10), LogisticRegression(solver='liblinear')]
[Vmodel.named_estimators_] : {'kmodel': KNeighborsClassifier(), 'DTmodel': DecisionTreeClassifier(random_state=10), 'Lmodel': LogisticRegression(solver='liblinear')}
----------------------------------------------------------------------------------------------------
[Vmodel.n_features_in_] : 30
----------------------------------------------------------------------------------------------------
Hard
0.9741784037558685
0.951048951048951
Soft
0.9882629107981221
0.951048951048951
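
To see how each member model compares with the ensemble, the fitted estimators exposed through `named_estimators_` can be scored one by one. The following is a minimal, self-contained sketch that assumes the same dataset, split, and model settings as the cell above; the short names `knn`, `tree`, and `logreg` and the variables `voter` and `member_scores` are chosen here for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Load the features and target as plain NumPy arrays
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=10, stratify=y
)

# Hard-voting ensemble over three different algorithms
voter = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=10)),
        ("logreg", LogisticRegression(solver="liblinear")),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

# Score every fitted member on the test set.
# The labels here are already 0/1, so the members' predictions
# can be compared with y_test directly.
member_scores = {
    name: accuracy_score(y_test, est.predict(X_test))
    for name, est in voter.named_estimators_.items()
}
ensemble_score = voter.score(X_test, y_test)

for name, score in member_scores.items():
    print(f"{name}: {score:.4f}")
print(f"ensemble (hard): {ensemble_score:.4f}")
```

An ensemble is not guaranteed to beat its best member, so comparing these numbers side by side is a quick sanity check on whether voting is helping for this split.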


In [None]:
#### ERROR MESSAGE: AttributeError: 'Flags' object has no attribute 'c_contiguous'
### -> An ndarray-related error. Convert the DataFrame to an array (e.g. with .values) before calling fit.
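
A small sketch of why converting to an array avoids this error. A pandas `DataFrame` has its own `flags` attribute, which is a different object from NumPy's `ndarray.flags` and has no `c_contiguous` field. Assuming the error arose because some code path read `.flags.c_contiguous` on a DataFrame (an assumption, since the traceback is not shown), passing a NumPy array sidesteps it. The tiny DataFrame below is hypothetical.

```python
import numpy as np
import pandas as pd

# A tiny hypothetical DataFrame standing in for the feature table
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# pandas' Flags object does not expose NumPy's memory-layout flags
df_has_flag = hasattr(df.flags, "c_contiguous")

# Converting to an ndarray (df.values works as well) restores them
arr = df.to_numpy()
arr_has_flag = hasattr(arr.flags, "c_contiguous")

print(type(df.flags).__name__, df_has_flag)
print(type(arr).__name__, arr_has_flag)
```

This matches the notebook's approach of calling `fit` and `score` with `X_train.values` and `y_train.values` instead of the DataFrame and Series themselves.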


