# **Motion Classification from Smartphone Sensor Data**
# Step 2: Baseline Modeling


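## 0. Mission

* Data preprocessing
    * Perform whatever preprocessing is needed: dummy-variable encoding, data splitting, NaN checks and handling, scaling, and so on.
* Build classification models with a variety of algorithms
    * Apply at least four algorithms.
    * Compare performance
        * Keep a separate Excel file that tracks each model's performance.
        * Performance target: accuracy of 0.900 or higher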

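## 1. Environment Setup

* Detailed requirements
    - Path setup: run locally (Anaconda)
        * Download the provided archive and extract it.
        * Create a project3_1 folder in Anaconda's root directory (usually C:/Users/< ID >) and copy the files into it.
    - The code is already written to import the basic libraries.
        * Add any other libraries you consider necessary.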


In [1]:
import sklearn
print(sklearn.__version__)

1.4.2


### (1) Loading Libraries

In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

import joblib

# Load required libraries and functions ------------------
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
from lightgbm import LGBMClassifier
from lightgbm import LGBMRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

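* Provided helper function
    * A function for visualizing feature importance is provided.
    * Inputs:
        * importance : feature importances of a tree model (e.g. model.feature_importances_)
        * names : list of feature names (e.g. x_train.columns)
        * result_only : whether to return only the DataFrame sorted by importance or to draw the plot as well. If False, returns the DataFrame and draws the plot.
        * topn : show only the top n features by importance. 'all' shows every feature.
    * Outputs:
        * Importance plot : sorted by importance, descending
        * Importance DataFrame : sorted by importance, descending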

In [None]:
# Compute (and optionally plot) feature importance
def plot_feature_importance(importance, names, result_only = False, topn = 'all'):
    feature_importance = np.array(importance)
    feature_name = np.array(names)

    data={'feature_name':feature_name,'feature_importance':feature_importance}
    fi_temp = pd.DataFrame(data)

    # Sort by feature importance, descending
    fi_temp.sort_values(by=['feature_importance'], ascending=False,inplace=True)
    fi_temp.reset_index(drop=True, inplace = True)

    if topn == 'all' :
        fi_df = fi_temp.copy()
    else :
        fi_df = fi_temp.iloc[:topn]

    # Plot the feature importances as a horizontal bar chart
    if not result_only :
        plt.figure(figsize=(10,20))
        sns.barplot(x='feature_importance', y='feature_name', data = fi_df)

        plt.xlabel('importance')
        plt.ylabel('feature name')
        plt.grid()

    return fi_df
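
A minimal usage sketch for the ranking part of the helper, assuming a small synthetic dataset from `make_classification` instead of the sensor data. To stay self-contained it rebuilds only the sorted table (no plot); the name `rank_features` is hypothetical and not part of the provided code.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rank_features(importance, names, topn="all"):
    """Return features sorted by importance (descending), optionally truncated."""
    table = pd.DataFrame({
        "feature_name": np.array(names),
        "feature_importance": np.array(importance),
    })
    table = table.sort_values("feature_importance", ascending=False)
    table = table.reset_index(drop=True)
    return table if topn == "all" else table.iloc[:topn]


# Synthetic stand-in for x_train / y_train (assumption, not the sensor data)
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           random_state=1)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(8)])

forest = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)
top3 = rank_features(forest.feature_importances_, X.columns, topn=3)
print(top3)
```

With the notebook's own objects, the equivalent call would be `plot_feature_importance(model3.feature_importances_, list(x_train), topn=20)` once `model3` has been fitted.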

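### (2) Loading the Data

* Provided datasets
    * data01_train.csv : for training and validation
    * data01_test.csv : for testing

* Detailed requirements
    * Column removal : the 'subject' column in data01_train.csv and data01_test.csv is not needed, so drop it.

#### 1) Loading the data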

In [8]:
data01 = pd.read_csv('data01_train.csv')

In [10]:
data02 = pd.read_csv('data01_test.csv')

In [12]:
# Drop the unneeded 'subject' column from both files
drop_cols = ['subject']
data01.drop(columns = drop_cols, inplace = True)
data02.drop(columns = drop_cols, inplace = True)

#### 2) Basic information

In [14]:
data01.info()
data02.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5881 entries, 0 to 5880
Columns: 562 entries, tBodyAcc-mean()-X to Activity
dtypes: float64(561), object(1)
memory usage: 25.2+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1471 entries, 0 to 1470
Columns: 562 entries, tBodyAcc-mean()-X to Activity
dtypes: float64(561), object(1)
memory usage: 6.3+ MB


In [16]:
data01.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| tBodyAcc-mean()-X | 5881.0 | 0.274811 | 0.067614 | -0.503823 | 0.262919 | 0.277154 | 0.288526 | 1.000000 |
| tBodyAcc-mean()-Y | 5881.0 | -0.017799 | 0.039422 | -0.684893 | -0.024877 | -0.017221 | -0.010920 | 1.000000 |
| tBodyAcc-mean()-Z | 5881.0 | -0.109396 | 0.058373 | -1.000000 | -0.121051 | -0.108781 | -0.098163 | 1.000000 |
| tBodyAcc-std()-X | 5881.0 | -0.603138 | 0.448807 | -1.000000 | -0.992774 | -0.943933 | -0.242130 | 1.000000 |
| tBodyAcc-std()-Y | 5881.0 | -0.509815 | 0.501815 | -0.999844 | -0.977680 | -0.844575 | -0.034499 | 0.916238 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| angle(tBodyGyroMean,gravityMean) | 5881.0 | 0.009340 | 0.608190 | -1.000000 | -0.481718 | 0.011448 | 0.499857 | 0.998702 |
| angle(tBodyGyroJerkMean,gravityMean) | 5881.0 | -0.007099 | 0.476738 | -1.000000 | -0.373345 | -0.000847 | 0.356236 | 0.996078 |
| angle(X,gravityMean) | 5881.0 | -0.491501 | 0.509069 | -1.000000 | -0.811397 | -0.709441 | -0.511330 | 0.977344 |
| angle(Y,gravityMean) | 5881.0 | 0.059299 | 0.297340 | -1.000000 | -0.018203 | 0.182893 | 0.248435 | 0.478157 |


In [14]:
data02.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| tBodyAcc-mean()-X | 1471.0 | 0.273198 | 0.079989 | -1.000000 | 0.263787 | 0.277322 | 0.288058 | 0.631510 |
| tBodyAcc-mean()-Y | 1471.0 | -0.017281 | 0.045957 | -1.000000 | -0.024792 | -0.017187 | -0.010238 | 0.359587 |
| tBodyAcc-mean()-Z | 1471.0 | -0.108123 | 0.049082 | -0.418354 | -0.120733 | -0.108124 | -0.096606 | 0.543939 |
| tBodyAcc-std()-X | 1471.0 | -0.614634 | 0.448480 | -0.999717 | -0.992669 | -0.952426 | -0.245405 | 0.899922 |
| tBodyAcc-std()-Y | 1471.0 | -0.515427 | 0.506094 | -0.999873 | -0.979082 | -0.867309 | -0.030639 | 0.782590 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| angle(tBodyGyroMean,gravityMean) | 1471.0 | 0.006272 | 0.608954 | -0.995222 | -0.485998 | -0.005036 | 0.518184 | 0.994366 |
| angle(tBodyGyroJerkMean,gravityMean) | 1471.0 | -0.001510 | 0.483028 | -0.969066 | -0.380300 | 0.002408 | 0.374583 | 0.979522 |
| angle(X,gravityMean) | 1471.0 | -0.481737 | 0.522714 | -0.999380 | -0.814060 | -0.708911 | -0.486534 | 1.000000 |
| angle(Y,gravityMean) | 1471.0 | 0.055771 | 0.298124 | -0.995073 | -0.017413 | 0.178814 | 0.248126 | 0.432496 |


In [15]:
data01.shape

(5881, 562)

In [17]:
data02.shape

(1471, 562)

## **2. Data Preprocessing**

* Perform the necessary preprocessing: dummy-variable encoding, data splitting, NaN checks and handling, scaling, and so on.


In [22]:
data01.isna().sum()

tBodyAcc-mean()-X                       0
tBodyAcc-mean()-Y                       0
tBodyAcc-mean()-Z                       0
tBodyAcc-std()-X                        0
tBodyAcc-std()-Y                        0
                                       ..
angle(tBodyGyroJerkMean,gravityMean)    0
angle(X,gravityMean)                    0
angle(Y,gravityMean)                    0
angle(Z,gravityMean)                    0
Activity                                0
Length: 562, dtype: int64

In [24]:
data02.isna().sum()

tBodyAcc-mean()-X                       0
tBodyAcc-mean()-Y                       0
tBodyAcc-mean()-Z                       0
tBodyAcc-std()-X                        0
tBodyAcc-std()-Y                        0
                                       ..
angle(tBodyGyroJerkMean,gravityMean)    0
angle(X,gravityMean)                    0
angle(Y,gravityMean)                    0
angle(Z,gravityMean)                    0
Activity                                0
Length: 562, dtype: int64
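
The per-column listings above are truncated by the display (`..`), so they cannot confirm by eye that all 562 columns are free of NaN. A short sketch of a whole-frame check, assuming a toy DataFrame in place of `data01`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for data01 (assumption); one value is missing on purpose
df = pd.DataFrame({"a": [0.1, np.nan, 0.3], "b": [1.0, 2.0, 3.0]})

# A single number for the whole frame
total_missing = int(df.isna().sum().sum())

# Only the columns that actually contain a NaN
cols_with_nan = df.columns[df.isna().any()].tolist()

print(total_missing, cols_with_nan)
```

Applied to the notebook's data, `data01.isna().sum().sum()` and `data02.isna().sum().sum()` would each give one total to inspect.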

In [26]:
data01.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| tBodyAcc-mean()-X | 0.288508 | 0.265757 | 0.278709 | 0.289795 | 0.394807 |
| tBodyAcc-mean()-Y | -0.009196 | -0.016576 | -0.014511 | -0.035536 | 0.034098 |
| tBodyAcc-mean()-Z | -0.103362 | -0.098163 | -0.108717 | -0.150354 | 0.091229 |
| tBodyAcc-std()-X | -0.988986 | -0.989551 | -0.99772 | -0.231727 | 0.088489 |
| tBodyAcc-std()-Y | -0.962797 | -0.994636 | -0.981088 | -0.006412 | -0.106636 |
| ... | ... | ... | ... | ... | ... |
| angle(tBodyGyroJerkMean,gravityMean) | 0.07279 | 0.771524 | 0.021528 | -0.072944 | -0.887846 |
| angle(X,gravityMean) | -0.60112 | 0.345205 | -0.833564 | -0.695819 | -0.705029 |
| angle(Y,gravityMean) | 0.331298 | -0.769186 | 0.202434 | 0.287154 | 0.264952 |
| angle(Z,gravityMean) | 0.165163 | -0.147944 | -0.032755 | 0.111388 | 0.137758 |


### (1) Data split 1: x, y

* Detailed requirements
    - Split the data into x and y.

In [20]:
target = 'Activity'
x = data01.drop(target, axis = 1)
y = data01.loc[:, target]

x1 = data02.drop(target, axis = 1)
y1 = data02.loc[:, target]

### (2) Data split 2: train, validation

* Detailed requirements
    - train : val = 8 : 2 or 7 : 3
    - Set the random_state option so that results are reproducible and can be compared with other models.

In [22]:
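from sklearn.model_selection import train_test_split

# Train/validation split of the training file (8:2)
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size = 0.2, random_state = 1)
# The test file is split 8:2 as well; section 4 fits on x_test and scores on x_val2
x_test, x_val2, y_test, y_val2 = train_test_split(x1, y1, test_size = 0.2, random_state = 1)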
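
With six activity classes of unequal size, a plain random split can shift the class ratio slightly between train and validation. A sketch of a stratified split, assuming toy labels in place of the `Activity` column; `stratify` is a standard `train_test_split` parameter, but using it here is a suggestion and not part of the original requirements.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy features and labels standing in for x and y (assumption): 3 imbalanced classes
rng = np.random.default_rng(1)
x_demo = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
y_demo = pd.Series(["LAYING"] * 50 + ["SITTING"] * 30 + ["WALKING"] * 20)

x_tr, x_va, y_tr, y_va = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=1, stratify=y_demo
)

# With stratify, the 20-row validation set keeps the 50/30/20 class ratio
print(y_va.value_counts().to_dict())
```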

### (3) Scaling


* Detailed requirements
    - Run this step so that algorithms requiring scaled inputs can be used.
    - Use either min-max or standard scaling.

In [24]:
# Import the scalers
from sklearn.preprocessing import MinMaxScaler,StandardScaler
# Min-max normalization: fit on the training set only, then apply to validation
scaler = MinMaxScaler()
col_names = list(x_train)
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)

x_train = pd.DataFrame(x_train, columns = col_names)
x_val = pd.DataFrame(x_val, columns = col_names)
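
The same fit-on-train, transform-elsewhere rule can be bundled with the model by using `make_pipeline`, which the notebook imports. A minimal sketch, assuming synthetic data from `make_classification` and a `max_iter=1000` setting that is not in the notebook:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the sensor features (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=1)

# Inside pipe.fit the scaler is fit on the training rows only
pipe = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

# score() reuses the training min/max; nothing is refit on X_va
val_acc = pipe.score(X_va, y_va)
scaled_tr = pipe.named_steps["minmaxscaler"].transform(X_tr)
print(round(val_acc, 3), scaled_tr.min(), scaled_tr.max())
```

A pipeline like this can also be passed directly to `GridSearchCV`, so the scaler is refit within each cross-validation fold.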

## **3. Baseline Modeling**



* Detailed requirements
    - Apply at least five algorithms.
    - For each algorithm, try some of the following and compare performance.

### (1) Model 1

In [34]:
# KNN
model1 = KNeighborsClassifier()
model1.fit(x_train, y_train)
ypred = model1.predict(x_val)
print(accuracy_score(y_val, ypred))
print(confusion_matrix(y_val, ypred))
print(classification_report(y_val, ypred))

0.9549702633814783
[[220   1   0   0   1   0]
 [  1 166  31   0   0   0]
 [  0  14 221   0   0   0]
 [  0   0   0 192   0   0]
 [  0   0   0   2 152   2]
 [  0   0   0   1   0 173]]
                    precision    recall  f1-score   support

            LAYING       1.00      0.99      0.99       222
           SITTING       0.92      0.84      0.88       198
          STANDING       0.88      0.94      0.91       235
           WALKING       0.98      1.00      0.99       192
WALKING_DOWNSTAIRS       0.99      0.97      0.98       156
  WALKING_UPSTAIRS       0.99      0.99      0.99       174

          accuracy                           0.95      1177
         macro avg       0.96      0.96      0.96      1177
      weighted avg       0.96      0.95      0.95      1177



### (2) Model 2

In [36]:
# Logistic regression
model2 = LogisticRegression()
model2.fit(x_train,y_train)
ypred2 = model2.predict(x_val)
print(accuracy_score(y_val, ypred2))
print(confusion_matrix(y_val, ypred2))
print(classification_report(y_val, ypred2))

0.9830076465590484
[[222   0   0   0   0   0]
 [  0 191   7   0   0   0]
 [  0  11 224   0   0   0]
 [  0   0   0 192   0   0]
 [  0   0   0   2 154   0]
 [  0   0   0   0   0 174]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       222
           SITTING       0.95      0.96      0.95       198
          STANDING       0.97      0.95      0.96       235
           WALKING       0.99      1.00      0.99       192
WALKING_DOWNSTAIRS       1.00      0.99      0.99       156
  WALKING_UPSTAIRS       1.00      1.00      1.00       174

          accuracy                           0.98      1177
         macro avg       0.98      0.98      0.98      1177
      weighted avg       0.98      0.98      0.98      1177



### (3) Model 3

In [39]:
# Random forest
model3 = RandomForestClassifier(max_depth=5, n_estimators=100, random_state=1)
model3.fit(x_train, y_train)
ypred3 = model3.predict(x_val)
print(confusion_matrix(y_val, ypred3))
print(classification_report(y_val, ypred3))

[[220   0   0   0   0   2]
 [  0 186  12   0   0   0]
 [  0  19 216   0   0   0]
 [  0   0   0 182   5   5]
 [  0   0   0  10 136  10]
 [  0   0   0   4   3 167]]
                    precision    recall  f1-score   support

            LAYING       1.00      0.99      1.00       222
           SITTING       0.91      0.94      0.92       198
          STANDING       0.95      0.92      0.93       235
           WALKING       0.93      0.95      0.94       192
WALKING_DOWNSTAIRS       0.94      0.87      0.91       156
  WALKING_UPSTAIRS       0.91      0.96      0.93       174

          accuracy                           0.94      1177
         macro avg       0.94      0.94      0.94      1177
      weighted avg       0.94      0.94      0.94      1177



### (4) Model 4

In [40]:
# Decision tree
model4 = DecisionTreeClassifier(max_depth=5)
model4.fit(x_train, y_train)
ypred4 = model4.predict(x_val)
print(confusion_matrix(y_val, ypred4))
print(classification_report(y_val, ypred4))

[[222   0   0   0   0   0]
 [  0 182  16   0   0   0]
 [  0  25 210   0   0   0]
 [  0   0   0 179   2  11]
 [  0   0   0  15 127  14]
 [  0   0   0  15   6 153]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       222
           SITTING       0.88      0.92      0.90       198
          STANDING       0.93      0.89      0.91       235
           WALKING       0.86      0.93      0.89       192
WALKING_DOWNSTAIRS       0.94      0.81      0.87       156
  WALKING_UPSTAIRS       0.86      0.88      0.87       174

          accuracy                           0.91      1177
         macro avg       0.91      0.91      0.91      1177
      weighted avg       0.91      0.91      0.91      1177



### (5) Model 5

In [41]:
#SVM(Support Vector Machine)
model5 = SVC(kernel='linear', C=3)
model5.fit(x_train, y_train)
ypred5 = model5.predict(x_val)
print(accuracy_score(y_val, ypred5))
print(confusion_matrix(y_val, ypred5))
print(classification_report(y_val, ypred5))

0.9830076465590484
[[222   0   0   0   0   0]
 [  0 193   5   0   0   0]
 [  0  15 220   0   0   0]
 [  0   0   0 192   0   0]
 [  0   0   0   0 156   0]
 [  0   0   0   0   0 174]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       222
           SITTING       0.93      0.97      0.95       198
          STANDING       0.98      0.94      0.96       235
           WALKING       1.00      1.00      1.00       192
WALKING_DOWNSTAIRS       1.00      1.00      1.00       156
  WALKING_UPSTAIRS       1.00      1.00      1.00       174

          accuracy                           0.98      1177
         macro avg       0.98      0.99      0.98      1177
      weighted avg       0.98      0.98      0.98      1177
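
The five cells above repeat the same fit / predict / score pattern. A compact sketch of that pattern as a loop, assuming synthetic data from `make_classification` in place of the scaled sensor features. `max_iter=1000` on the logistic regression and `random_state=1` on the decision tree are additions for this sketch, not settings taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the scaled x_train / x_val (assumption)
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=3, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=1)

# The same five algorithms as in the cells above
models = {
    "knn": KNeighborsClassifier(),
    "logistic": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(max_depth=5, n_estimators=100,
                                            random_state=1),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "svm": SVC(kernel="linear", C=3),
}

# Fit each model on the training split and score it on the validation split
val_scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    val_scores[name] = accuracy_score(y_va, model.predict(X_va))

for name, score in val_scores.items():
    print(f"{name:15s} {score:.3f}")
```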



## 4. Performance Comparison

* Detailed requirements
    - Measure each model's performance on the test data and compare.
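
The cells in this section fit fresh models on an 80% slice of the test file and score them on the remaining 20%, without scaling. Another reading of the requirement is to keep the models trained in section 3 and score them on the held-out test rows, reusing the scaler fitted on the training split. A minimal sketch of that reading, assuming synthetic data; it is one possible interpretation, not the notebook's own procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-ins (assumption): X_all plays the train file, X_test the test file
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_all, X_test, y_all, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X_all, y_all, test_size=0.2, random_state=1)

# Scaling statistics come from the training rows only
scaler = MinMaxScaler().fit(X_tr)
model = SVC(kernel="linear", C=3).fit(scaler.transform(X_tr), y_tr)

# The already-fitted model is scored on both sets; it is never refit on test rows
val_acc = accuracy_score(y_va, model.predict(scaler.transform(X_va)))
test_acc = accuracy_score(y_test, model.predict(scaler.transform(X_test)))
print(f"val={val_acc:.3f} test={test_acc:.3f}")
```

In the notebook's terms this would correspond to `model5.predict(scaler.transform(x1))` compared against `y1`.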
    

In [53]:
#knn
model6 = KNeighborsClassifier()
model6.fit(x_test, y_test)
ypred6 = model6.predict(x_val2)
print(accuracy_score(y_val2, ypred6))
print(confusion_matrix(y_val2, ypred6))
print(classification_report(y_val2, ypred6))


0.9457627118644067
[[53  0  0  0  0  0]
 [ 0 40  9  0  0  0]
 [ 0  3 51  0  0  0]
 [ 0  0  0 52  0  0]
 [ 0  0  0  2 38  0]
 [ 0  0  0  1  1 45]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00        53
           SITTING       0.93      0.82      0.87        49
          STANDING       0.85      0.94      0.89        54
           WALKING       0.95      1.00      0.97        52
WALKING_DOWNSTAIRS       0.97      0.95      0.96        40
  WALKING_UPSTAIRS       1.00      0.96      0.98        47

          accuracy                           0.95       295
         macro avg       0.95      0.94      0.95       295
      weighted avg       0.95      0.95      0.95       295



In [57]:
#Logistic Regression
model7 = LogisticRegression()
model7.fit(x_test,y_test)
ypred7 = model7.predict(x_val2)
print(accuracy_score(y_val2, ypred7))
print(confusion_matrix(y_val2, ypred7))
print(classification_report(y_val2, ypred7))

0.9661016949152542
[[53  0  0  0  0  0]
 [ 0 43  6  0  0  0]
 [ 0  2 51  1  0  0]
 [ 0  0  0 52  0  0]
 [ 0  0  0  0 40  0]
 [ 0  0  0  1  0 46]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00        53
           SITTING       0.96      0.88      0.91        49
          STANDING       0.89      0.94      0.92        54
           WALKING       0.96      1.00      0.98        52
WALKING_DOWNSTAIRS       1.00      1.00      1.00        40
  WALKING_UPSTAIRS       1.00      0.98      0.99        47

          accuracy                           0.97       295
         macro avg       0.97      0.97      0.97       295
      weighted avg       0.97      0.97      0.97       295



In [59]:
# Random forest
model8 = RandomForestClassifier(max_depth=5, n_estimators=100, random_state=1)
model8.fit(x_test, y_test)
ypred8 = model8.predict(x_val2)
print(accuracy_score(y_val2, ypred8))
print(confusion_matrix(y_val2, ypred8))
print(classification_report(y_val2, ypred8))

0.9389830508474576
[[53  0  0  0  0  0]
 [ 0 40  9  0  0  0]
 [ 0  1 53  0  0  0]
 [ 0  0  0 51  0  1]
 [ 0  0  0  3 36  1]
 [ 0  0  0  1  2 44]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00        53
           SITTING       0.98      0.82      0.89        49
          STANDING       0.85      0.98      0.91        54
           WALKING       0.93      0.98      0.95        52
WALKING_DOWNSTAIRS       0.95      0.90      0.92        40
  WALKING_UPSTAIRS       0.96      0.94      0.95        47

          accuracy                           0.94       295
         macro avg       0.94      0.94      0.94       295
      weighted avg       0.94      0.94      0.94       295



In [61]:
# Decision tree
model9 = DecisionTreeClassifier(max_depth=5)
model9.fit(x_test, y_test)
ypred9 = model9.predict(x_val2)
print(accuracy_score(y_val2, ypred9))
print(confusion_matrix(y_val2, ypred9))
print(classification_report(y_val2, ypred9))

0.911864406779661
[[53  0  0  0  0  0]
 [ 0 44  5  0  0  0]
 [ 0  3 51  0  0  0]
 [ 0  0  0 47  1  4]
 [ 0  0  0  2 32  6]
 [ 0  0  0  2  3 42]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00        53
           SITTING       0.94      0.90      0.92        49
          STANDING       0.91      0.94      0.93        54
           WALKING       0.92      0.90      0.91        52
WALKING_DOWNSTAIRS       0.89      0.80      0.84        40
  WALKING_UPSTAIRS       0.81      0.89      0.85        47

          accuracy                           0.91       295
         macro avg       0.91      0.91      0.91       295
      weighted avg       0.91      0.91      0.91       295



In [63]:
#SVM(Support Vector Machine)
model10 = SVC(kernel='linear', C=3)
model10.fit(x_test, y_test)
ypred10 = model10.predict(x_val2)
print(accuracy_score(y_val2, ypred10))
print(confusion_matrix(y_val2, ypred10))
print(classification_report(y_val2, ypred10))

0.976271186440678
[[53  0  0  0  0  0]
 [ 0 45  4  0  0  0]
 [ 0  2 52  0  0  0]
 [ 0  0  0 52  0  0]
 [ 0  0  0  0 40  0]
 [ 0  0  0  0  1 46]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00        53
           SITTING       0.96      0.92      0.94        49
          STANDING       0.93      0.96      0.95        54
           WALKING       1.00      1.00      1.00        52
WALKING_DOWNSTAIRS       0.98      1.00      0.99        40
  WALKING_UPSTAIRS       1.00      0.98      0.99        47

          accuracy                           0.98       295
         macro avg       0.98      0.98      0.98       295
      weighted avg       0.98      0.98      0.98       295



In [49]:
# Validation accuracy (x_val) of the five models from section 3
train01  = {'knn':accuracy_score(y_val, ypred),
             'Logistic':accuracy_score(y_val, ypred2),
             'rdf':accuracy_score(y_val, ypred3),
             'decision tree':accuracy_score(y_val, ypred4),
             'SVM':accuracy_score(y_val, ypred5)}
train = pd.DataFrame(train01,index=[0])
train

| | knn | Logistic | rdf | decision tree | SVM |
|---|---|---|---|---|---|
| 0 | 0.95497 | 0.983008 | 0.940527 | 0.91164 | 0.983008 |


In [65]:
# Accuracy on x_val2 of the five models fit on x_test in this section
test  = {'knn':accuracy_score(y_val2, ypred6),
             'Logistic':accuracy_score(y_val2, ypred7),
             'rdf':accuracy_score(y_val2, ypred8),
             'decision tree':accuracy_score(y_val2, ypred9),
             'SVM':accuracy_score(y_val2, ypred10)}

test01 = pd.DataFrame(test,index=[0])
test01

| | knn | Logistic | rdf | decision tree | SVM |
|---|---|---|---|---|---|
| 0 | 0.945763 | 0.966102 | 0.938983 | 0.911864 | 0.976271 |
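
The mission asks for the scores to be tracked in a separate Excel file. A sketch of one way to do that, using the accuracies shown in the two tables above. The file name `model_scores.xlsx` is hypothetical, and writing `.xlsx` with pandas needs an Excel engine such as openpyxl, so the save step is skipped when that package is not installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Accuracies copied from the two tables above (as displayed)
val_acc = {"knn": 0.95497, "Logistic": 0.983008, "rdf": 0.940527,
           "decision tree": 0.91164, "SVM": 0.983008}
test_acc = {"knn": 0.945763, "Logistic": 0.966102, "rdf": 0.938983,
            "decision tree": 0.911864, "SVM": 0.976271}

# One row per model, one column per evaluation set
scores = pd.DataFrame({"validation": val_acc, "test": test_acc})
scores.index.name = "model"
scores = scores.sort_values("validation", ascending=False)
print(scores)

# Save only if an Excel writer engine is available (assumption: openpyxl)
if importlib.util.find_spec("openpyxl") is not None:
    out_path = os.path.join(tempfile.gettempdir(), "model_scores.xlsx")
    scores.to_excel(out_path)
```

Keeping one row per model makes it easy to append tuned variants later, for example a `knn (grid search)` row.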


## 5. Grid Search

In [67]:
# KNN 
params = {'n_neighbors':range(2, 30), 'metric':['euclidean', 'manhattan']}
m5_2 = GridSearchCV(KNeighborsClassifier(), params, cv = 5)
m5_2.fit(x_train, y_train)
p5_2 = m5_2.predict(x_val)

In [68]:
m5_2.best_params_

{'metric': 'manhattan', 'n_neighbors': 4}

In [71]:
print('accuracy :',accuracy_score(y_val, p5_2))
print('='*60)
print(confusion_matrix(y_val, p5_2))
print('='*60)
print(classification_report(y_val, p5_2))

accuracy : 0.9770603228547153
[[222   0   0   0   0   0]
 [  0 187  11   0   0   0]
 [  0  14 221   0   0   0]
 [  0   0   0 192   0   0]
 [  0   0   0   1 155   0]
 [  0   0   0   1   0 173]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       222
           SITTING       0.93      0.94      0.94       198
          STANDING       0.95      0.94      0.95       235
           WALKING       0.99      1.00      0.99       192
WALKING_DOWNSTAIRS       1.00      0.99      1.00       156
  WALKING_UPSTAIRS       1.00      0.99      1.00       174

          accuracy                           0.98      1177
         macro avg       0.98      0.98      0.98      1177
      weighted avg       0.98      0.98      0.98      1177
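
The accuracy printed above is measured on `x_val`, while `best_score_` on a fitted `GridSearchCV` is the mean cross-validated accuracy over the training folds; the two are different numbers. A sketch of how the full cross-validation table can be inspected, assuming synthetic data and a smaller grid than the notebook's so that it runs quickly:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for x_train / y_train (assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=1)

# Reduced grid for the sketch; the notebook searches n_neighbors 2..29
params = {"n_neighbors": [3, 5, 7], "metric": ["euclidean", "manhattan"]}
search = GridSearchCV(KNeighborsClassifier(), params, cv=5)
search.fit(X, y)

# One row per parameter combination, best mean CV accuracy first
cv_table = (
    pd.DataFrame(search.cv_results_)
    [["param_metric", "param_n_neighbors", "mean_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)
print(cv_table)
print("best CV accuracy:", search.best_score_)
```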



In [73]:
#decision tree
params = {'max_depth':range(2, 15)}
m4_3 = GridSearchCV(DecisionTreeClassifier(), params, cv = 5)
m4_3.fit(x_train, y_train)
p4_3 = m4_3.predict(x_val)
m4_3.best_params_

{'max_depth': 9}

In [75]:
print('accuracy :',accuracy_score(y_val, p4_3))
print('='*60)
print(confusion_matrix(y_val, p4_3))
print('='*60)
print(classification_report(y_val, p4_3))

accuracy : 0.9422259983007647
[[222   0   0   0   0   0]
 [  0 179  19   0   0   0]
 [  0  16 219   0   0   0]
 [  0   0   1 177   6   8]
 [  0   0   0   3 153   0]
 [  0   0   0   8   7 159]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       222
           SITTING       0.92      0.90      0.91       198
          STANDING       0.92      0.93      0.92       235
           WALKING       0.94      0.92      0.93       192
WALKING_DOWNSTAIRS       0.92      0.98      0.95       156
  WALKING_UPSTAIRS       0.95      0.91      0.93       174

          accuracy                           0.94      1177
         macro avg       0.94      0.94      0.94      1177
      weighted avg       0.94      0.94      0.94      1177



In [86]:
# Random forest
param = {'max_depth': range(1, 21)}
m4_4 = GridSearchCV(RandomForestClassifier(), param, cv=5)
m4_4.fit(x_train, y_train)
p4_4 = m4_4.predict(x_val)
print('* Best parameters:', m4_4.best_params_)
print('* Best CV score:', m4_4.best_score_)

## 6. Saving Models
* Save the best-performing model for each algorithm
    * Only save models built with the full set of features (joblib.dump).
    * For tuned models, save model.best_estimator_.
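
A sketch of the tuned-model case, assuming synthetic data and a hypothetical file name `dt_best_v1.pkl`: only `best_estimator_` is dumped, and the file is loaded back to confirm that the restored model predicts the same labels.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for x_train / y_train (assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=1)

search = GridSearchCV(DecisionTreeClassifier(random_state=1),
                      {"max_depth": range(2, 6)}, cv=5)
search.fit(X, y)

# Save only the refitted best model, not the whole search object
path = os.path.join(tempfile.gettempdir(), "dt_best_v1.pkl")  # hypothetical name
joblib.dump(search.best_estimator_, path)

# Reload and confirm the restored model predicts identically
restored = joblib.load(path)
same = bool((restored.predict(X) == search.predict(X)).all())
print(type(restored).__name__, same)
```

For the searches in this notebook the analogous call would be `joblib.dump(m5_2.best_estimator_, ...)` for KNN and `joblib.dump(m4_3.best_estimator_, ...)` for the decision tree.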

In [80]:
joblib.dump(model5,'model5 SVM_v1.pkl')

['model5 SVM_v1.pkl']

In [82]:
joblib.dump(model10,'model10 SVM test_v1.pkl')

['model10 SVM test_v1.pkl']