### sklearn.ensemble.GradientBoostingClassifier
* _class_ sklearn.ensemble.GradientBoostingClassifier(_*_, _loss='log_loss'_, _learning_rate=0.1_, _n_estimators=100_, _subsample=1.0_, _criterion='friedman_mse'_, _min_samples_split=2_, _min_samples_leaf=1_, _min_weight_fraction_leaf=0.0_, _max_depth=3_, _min_impurity_decrease=0.0_, _init=None_, _random_state=None_, _max_features=None_, _verbose=0_, _max_leaf_nodes=None_, _warm_start=False_, _validation_fraction=0.1_, _n_iter_no_change=None_, _tol=0.0001_, _ccp_alpha=0.0_) [[source]](https://github.com/scikit-learn/scikit-learn/blob/7db5b6a98/sklearn/ensemble/_gb.py#L851)
Gradient Boosting for classification.

This algorithm builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage, `n_classes_` regression trees are fit on the negative gradient of the loss function, e.g. binary or multiclass log loss. Binary classification is a special case where only a single regression tree is induced.
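To make the stage-wise idea concrete, here is a minimal pure-Python sketch for the binary case on a single feature. It is a simplification, not scikit-learn's implementation: the helper names (`fit_stump`, `boost`) and the toy data are hypothetical, each stage fits a depth-1 regression tree to the negative gradient of the log loss, and leaf values are plain residual means rather than the per-leaf line-search updates a full implementation would use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(x, residuals):
    """Fit a depth-1 regression tree on one feature by minimising squared error."""
    best = None
    # Drop the largest value so the right-hand side of a split is never empty
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_estimators=50, learning_rate=0.1):
    # Initial raw score: log-odds of the positive class
    p = sum(y) / len(y)
    init = math.log(p / (1 - p))
    stumps = []
    scores = [init] * len(x)
    for _ in range(n_estimators):
        # Negative gradient of the log loss w.r.t. the raw score is y - sigmoid(score)
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Additive update, shrunk by the learning rate
        scores = [s + learning_rate * stump(xi) for s, xi in zip(scores, x)]

    def predict_proba(xi):
        return sigmoid(init + learning_rate * sum(st(xi) for st in stumps))

    return predict_proba

# Hypothetical toy data: the class flips between x = 2 and x = 3
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 0, 1, 1, 1]
predict_proba = boost(x, y)
print(round(predict_proba(0.0), 3), round(predict_proba(5.0), 3))
```

Each stage moves the raw scores a small step in the direction that reduces the loss, which is why `learning_rate` and `n_estimators` trade off against each other.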

[`sklearn.ensemble.HistGradientBoostingClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html#sklearn.ensemble.HistGradientBoostingClassifier "sklearn.ensemble.HistGradientBoostingClassifier") is a much faster variant of this algorithm for intermediate datasets (`n_samples >= 10_000`).

Read more in the [User Guide](https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting).

In [7]:
import pandas as pd

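# features.txt is whitespace-separated; a raw string keeps '\s' from being read as a string escape
feature_name_df = pd.read_csv("./datasets/features.txt", sep=r'\s+',
                              header=None, names=['column_index', 'column_name'])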

feature_name = feature_name_df.iloc[:, 1].values.tolist()
print(feature_name[:10])

['tBodyAcc-mean()-X', 'tBodyAcc-mean()-Y', 'tBodyAcc-mean()-Z', 'tBodyAcc-std()-X', 'tBodyAcc-std()-Y', 'tBodyAcc-std()-Z', 'tBodyAcc-mad()-X', 'tBodyAcc-mad()-Y', 'tBodyAcc-mad()-Z', 'tBodyAcc-max()-X']


In [8]:
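def get_new_feature_name_df(old_feature_name_df):
    """Return the feature table with repeated names made unique via a _1, _2, ... suffix."""
    # cumcount() numbers each occurrence of a name within its group: 0 for the first, 1 for the next, ...
    feature_dup_df = pd.DataFrame(data=old_feature_name_df.groupby('column_name').cumcount(),
                                  columns=['dup_cnt'])
    feature_dup_df = feature_dup_df.reset_index()
    # Both frames now share an 'index' column, which pd.merge uses as the join key
    new_feature_name_df = pd.merge(old_feature_name_df.reset_index(), feature_dup_df, how='outer')
    # Use label-based access: integer keys such as x[0] on a labelled row are deprecated in pandas
    new_feature_name_df['column_name'] = new_feature_name_df[['column_name', 'dup_cnt']].apply(
        lambda x: x['column_name'] + '_' + str(x['dup_cnt']) if x['dup_cnt'] > 0 else x['column_name'],
        axis=1)
    new_feature_name_df = new_feature_name_df.drop(['index'], axis=1)

    return new_feature_name_df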
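
The core of the function above is `groupby(...).cumcount()`. A small sketch on a toy table shows what it produces; the names `feat_a` and `feat_b` are hypothetical and stand in for the real feature names.

```python
import pandas as pd

# Hypothetical feature table in which 'feat_a' appears three times
toy = pd.DataFrame({'column_index': [1, 2, 3, 4],
                    'column_name': ['feat_a', 'feat_b', 'feat_a', 'feat_a']})

# 0 for the first occurrence of a name, 1 for the second, and so on
dup_cnt = toy.groupby('column_name').cumcount()

# Append the counter only to repeated occurrences
unique_names = [name if cnt == 0 else f'{name}_{cnt}'
                for name, cnt in zip(toy['column_name'], dup_cnt)]
print(unique_names)  # ['feat_a', 'feat_b', 'feat_a_1', 'feat_a_2']
```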

In [9]:
import pandas as pd

def get_human_dataset():
    # Feature names are whitespace-separated; r'\s+' is a raw-string regex separator
    feature_name_df = pd.read_csv('./datasets/features.txt', sep=r'\s+',
                                  header=None, names=['column_index', 'column_name'])
    # read_csv rejects duplicate column names, so make the feature names unique first
    new_feature_name_df = get_new_feature_name_df(feature_name_df)
    feature_name = new_feature_name_df.iloc[:, 1].values.tolist()

    # Feature matrices: one row per sample, one column per feature name
    X_train = pd.read_csv('./datasets/train/X_train.txt', sep=r'\s+', names=feature_name)
    X_test = pd.read_csv('./datasets/test/X_test.txt', sep=r'\s+', names=feature_name)

    # Labels: a single 'action' column
    y_train = pd.read_csv('./datasets/train/y_train.txt', sep=r'\s+', header=None, names=['action'])
    y_test = pd.read_csv('./datasets/test/y_test.txt', sep=r'\s+', header=None, names=['action'])

    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = get_human_dataset()
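
For readers without the dataset files at hand, the following sketch shows how `sep=r'\s+'` with `names=` parses a headerless, whitespace-aligned file. The two-row text and the column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical excerpt: leading spaces and uneven runs of spaces between values
raw = "  0.28  -0.02   -0.13\n  0.27   -0.01  -0.12\n"
toy_names = ['feat_x', 'feat_y', 'feat_z']  # hypothetical column names

# r'\s+' treats any run of whitespace as one separator
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None, names=toy_names)
print(df.shape)  # (2, 3)
```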

In [4]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = get_human_dataset()

gb_clf = GradientBoostingClassifier(random_state=0)
# y_train is a one-column DataFrame; pass the 1-D 'action' column to avoid a DataConversionWarning
gb_clf.fit(X_train, y_train['action'])
gb_pred = gb_clf.predict(X_test)
accuracy = accuracy_score(y_test, gb_pred)

print('GBM accuracy:', round(accuracy, 4))

GBM accuracy: 0.9389
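
The constructor also exposes `subsample`, `validation_fraction`, `n_iter_no_change` and `tol`, which together allow early stopping. The sketch below tries them on synthetic data; the dataset and the specific values are assumptions for illustration, not tuned settings for the notebook's data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration only)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

es_clf = GradientBoostingClassifier(
    n_estimators=500,         # upper bound on the number of boosting stages
    learning_rate=0.1,
    subsample=0.8,            # each tree is fit on a random 80% of the training rows
    validation_fraction=0.1,  # share of the training data held out to monitor the loss
    n_iter_no_change=10,      # stop once 10 consecutive stages fail to improve by tol
    tol=1e-4,
    random_state=0,
)
es_clf.fit(X_tr, y_tr)

# n_estimators_ reports how many stages were kept after early stopping
print('stages fitted:', es_clf.n_estimators_)
print('test accuracy:', round(es_clf.score(X_te, y_te), 4))
```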


In [5]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = get_human_dataset()

ab_clf = AdaBoostClassifier(random_state=0)
# y_train is a one-column DataFrame; pass the 1-D 'action' column to avoid a DataConversionWarning
ab_clf.fit(X_train, y_train['action'])
ab_pred = ab_clf.predict(X_test)
accuracy = accuracy_score(y_test, ab_pred)

print('adaboost accuracy:', round(accuracy, 4))

adaboost accuracy: 0.531
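
By default `AdaBoostClassifier` boosts depth-1 decision trees, which may be one factor when it scores poorly on a multiclass problem. The sketch below compares the default with a deeper base tree on synthetic six-class data. Both the data and the choice of `max_depth=3` are assumptions, and no claim is made about how either setting would score on the notebook's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic six-class stand-in data (an assumption for illustration only)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Default base learner: a decision stump (max_depth=1)
stump_clf = AdaBoostClassifier(random_state=0).fit(X_tr, y_tr)

# Base learner passed positionally, since its keyword name differs across scikit-learn versions
deeper_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                                random_state=0).fit(X_tr, y_tr)

stump_acc = stump_clf.score(X_te, y_te)
deeper_acc = deeper_clf.score(X_te, y_te)
print('stumps:', round(stump_acc, 4), '| depth-3 trees:', round(deeper_acc, 4))
```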


In [15]:
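import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

X_train, X_test, y_train, y_test = get_human_dataset()

# XGBClassifier works with class labels 0..n_classes-1, so encode the 1-based activity codes
le = LabelEncoder()
y_train_1 = le.fit_transform(y_train['action'])
y_test_1 = le.transform(y_test['action'])

model = xgb.XGBClassifier()
model.fit(X_train, y_train_1)
y_pred = model.predict(X_test)
# Score against the encoded test labels. Comparing the 0-based predictions with the raw
# 1-based y_test gives a meaningless score (0.0271 was recorded for this cell that way).
accuracy = accuracy_score(y_test_1, y_pred)

print('xgb accuracy:', round(accuracy, 4))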


In [14]:
from sklearn.preprocessing import LabelEncoder
# Encode the labels as 0-based integers; passing the 1-D 'action' column avoids a DataConversionWarning
y_train_1 = LabelEncoder().fit_transform(y_train['action'])
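
A model trained on encoded labels predicts in the encoded space, so predictions must either be compared with encoded test labels or be mapped back with `inverse_transform`. The sketch below shows the round trip on hypothetical 1-based activity codes.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

raw_labels = np.array([5, 5, 4, 1, 6, 2, 3])  # hypothetical 1-based activity codes

le = LabelEncoder()
encoded = le.fit_transform(raw_labels)   # sorted classes 1..6 map to 0..5
decoded = le.inverse_transform(encoded)  # back to the original codes

print(le.classes_)  # [1 2 3 4 5 6]
print(encoded)      # [4 4 3 0 5 1 2]
print(decoded)      # [5 5 4 1 6 2 3]
```

Keeping the fitted encoder around (rather than discarding it after `fit_transform`) is what makes it possible to apply the same mapping to the test labels or to decode predictions later.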
