# **Motion Classification Based on Smartphone Sensor Data**
# Step 2: Basic Modeling


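## 0. Mission

* Data preprocessing
    * Perform the required preprocessing: dummy-variable encoding, data splitting, checking for and handling NaN values, scaling, and so on.
* Build classification models with a variety of algorithms
    * Apply at least 4 algorithms
    * Compare performance
        * Create a separate Excel file to keep track of each model's performance.
        * Performance guide: Accuracy of 0.900 or higher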

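## 1. Environment Setup

* Detailed requirements
    - Path setup: run locally (Anaconda)
        * Download the provided archive and extract it.
        * Create a project3_1 folder in the Anaconda root directory (usually C:/Users/< ID >) and copy the files into it.
    - The code already imports the basic libraries that are needed.
        * Add any other libraries you think are necessary.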


In [12]:
import sklearn
print(sklearn.__version__)

1.4.2


### (1) Loading Libraries

In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

import joblib

# Load the required libraries and functions ------------------
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

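* Provided helper function
    * A function for visualizing feature importance is provided.
    * Inputs:
        * importance: feature importances of a tree model (e.g. model.feature_importances_)
        * names: list of feature names (e.g. x_train.columns)
        * result_only: whether to return only the DataFrame sorted by importance or to draw the plot as well. If False, returns the DataFrame and draws the plot.
        * topn: show only the top n features by importance. Use 'all' for all features.
    * Outputs:
        * Importance plot: sorted by importance in descending order
        * Importance DataFrame: sorted by importance in descending order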

In [16]:
# Compute (and optionally plot) feature importances
def plot_feature_importance(importance, names, result_only = False, topn = 'all'):
    feature_importance = np.array(importance)
    feature_name = np.array(names)

    data={'feature_name':feature_name,'feature_importance':feature_importance}
    fi_temp = pd.DataFrame(data)

    # Sort features by importance in descending order
    fi_temp.sort_values(by=['feature_importance'], ascending=False,inplace=True)
    fi_temp.reset_index(drop=True, inplace = True)

    if topn == 'all' :
        fi_df = fi_temp.copy()
    else :
        fi_df = fi_temp.iloc[:topn]

    # Plot the feature importances
    if not result_only :
        plt.figure(figsize=(10,20))
        sns.barplot(x='feature_importance', y='feature_name', data = fi_df)

        plt.xlabel('importance')
        plt.ylabel('feature name')
        plt.grid()

    return fi_df
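
As a rough illustration of what the helper returns when `result_only=True`, the sketch below rebuilds the same sorted importance table for a small tree fitted on scikit-learn's iris data. The iris dataset, the `tree` model, and the `ranking` and `top2` names are assumptions for illustration only and are not part of the project data.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for x_train / y_train (assumption: iris, not the sensor data)
iris = load_iris(as_frame=True)
x_demo, y_demo = iris.data, iris.target

tree = DecisionTreeClassifier(max_depth=3, random_state=10).fit(x_demo, y_demo)

# Same steps as the helper: build a frame, sort by importance, reset the index
ranking = (
    pd.DataFrame({
        'feature_name': x_demo.columns,
        'feature_importance': tree.feature_importances_,
    })
    .sort_values(by='feature_importance', ascending=False)
    .reset_index(drop=True)
)

# Equivalent of topn=2: keep only the two most important features
top2 = ranking.iloc[:2]
print(top2)
```

With the project data, the same table would come from passing `model.feature_importances_` and `x_train.columns` of a fitted tree-based model to the helper.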

### (2) Loading the Data

* Provided datasets
    * data01_train.csv: for training and validation
    * data01_test.csv: for testing

* Detailed requirements
    * Column removal: the 'subject' column in data01_train.csv and data01_test.csv is not needed, so drop it.

#### 1) Loading the data

In [219]:
data = pd.read_csv('data01_train.csv')

In [21]:
data

Unnamed: 0,tBodyAcc-mean()-X,tBodyAcc-mean()-Y,tBodyAcc-mean()-Z,tBodyAcc-std()-X,tBodyAcc-std()-Y,tBodyAcc-std()-Z,tBodyAcc-mad()-X,tBodyAcc-mad()-Y,tBodyAcc-mad()-Z,tBodyAcc-max()-X,...,fBodyBodyGyroJerkMag-kurtosis(),"angle(tBodyAccMean,gravity)","angle(tBodyAccJerkMean),gravityMean)","angle(tBodyGyroMean,gravityMean)","angle(tBodyGyroJerkMean,gravityMean)","angle(X,gravityMean)","angle(Y,gravityMean)","angle(Z,gravityMean)",subject,Activity
0,0.288508,-0.009196,-0.103362,-0.988986,-0.962797,-0.967422,-0.989000,-0.962596,-0.965650,-0.929747,...,-0.816696,-0.042494,-0.044218,0.307873,0.072790,-0.601120,0.331298,0.165163,21,STANDING
1,0.265757,-0.016576,-0.098163,-0.989551,-0.994636,-0.987435,-0.990189,-0.993870,-0.987558,-0.937337,...,-0.693515,-0.062899,0.388459,-0.765014,0.771524,0.345205,-0.769186,-0.147944,15,LAYING
2,0.278709,-0.014511,-0.108717,-0.997720,-0.981088,-0.994008,-0.997934,-0.982187,-0.995017,-0.942584,...,-0.829311,0.000265,-0.525022,-0.891875,0.021528,-0.833564,0.202434,-0.032755,11,STANDING
3,0.289795,-0.035536,-0.150354,-0.231727,-0.006412,-0.338117,-0.273557,0.014245,-0.347916,0.008288,...,-0.408956,-0.255125,0.612804,0.747381,-0.072944,-0.695819,0.287154,0.111388,17,WALKING
4,0.394807,0.034098,0.091229,0.088489,-0.106636,-0.388502,-0.010469,-0.109680,-0.346372,0.584131,...,-0.563437,-0.044344,-0.845268,-0.974650,-0.887846,-0.705029,0.264952,0.137758,17,WALKING_DOWNSTAIRS
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5876,0.277194,-0.012389,-0.131974,-0.994046,-0.940578,-0.917337,-0.994261,-0.932830,-0.908088,-0.936219,...,-0.690363,-0.034888,-0.261437,-0.391477,-0.877612,-0.912365,0.114009,0.080146,21,SITTING
5877,0.191568,0.013328,-0.105174,-0.126969,-0.121729,-0.327480,-0.192523,-0.109923,-0.295286,0.078644,...,-0.879215,0.721718,0.623151,0.866858,-0.445660,-0.690278,0.303194,-0.044188,15,WALKING_UPSTAIRS
5878,0.267981,-0.018348,-0.107440,-0.991303,-0.989881,-0.990313,-0.992386,-0.988852,-0.991237,-0.936099,...,-0.886851,0.060173,0.228739,0.684400,-0.216665,0.620363,-0.437247,-0.571840,19,LAYING
5879,0.212787,-0.048130,-0.121001,-0.041373,0.052449,-0.585361,-0.100714,0.023353,-0.554707,0.219814,...,-0.053556,0.260880,0.551742,-0.943773,-0.862899,-0.718009,0.292856,0.024920,6,WALKING_UPSTAIRS


In [215]:
test = pd.read_csv('data01_test.csv')

In [23]:
test

Unnamed: 0,tBodyAcc-mean()-X,tBodyAcc-mean()-Y,tBodyAcc-mean()-Z,tBodyAcc-std()-X,tBodyAcc-std()-Y,tBodyAcc-std()-Z,tBodyAcc-mad()-X,tBodyAcc-mad()-Y,tBodyAcc-mad()-Z,tBodyAcc-max()-X,...,fBodyBodyGyroJerkMag-kurtosis(),"angle(tBodyAccMean,gravity)","angle(tBodyAccJerkMean),gravityMean)","angle(tBodyGyroMean,gravityMean)","angle(tBodyGyroJerkMean,gravityMean)","angle(X,gravityMean)","angle(Y,gravityMean)","angle(Z,gravityMean)",subject,Activity
0,0.284379,-0.021981,-0.116683,-0.992490,-0.979640,-0.963321,-0.992563,-0.977304,-0.958142,-0.938850,...,-0.850065,-0.018043,0.092304,0.074220,-0.714534,-0.671943,-0.018351,-0.185733,22,SITTING
1,0.277440,-0.028086,-0.118412,-0.996620,-0.927676,-0.972294,-0.997346,-0.931405,-0.971788,-0.939837,...,-0.613367,-0.022456,-0.155414,0.247498,-0.112257,-0.826816,0.184489,-0.068699,15,STANDING
2,0.305833,-0.041023,-0.087303,0.006880,0.182800,-0.237984,0.005642,0.028616,-0.236474,0.016311,...,0.394388,-0.362616,0.171069,0.576349,-0.688314,-0.743234,0.272186,0.053101,22,WALKING
3,0.276053,-0.016487,-0.108381,-0.995379,-0.983978,-0.975854,-0.995877,-0.985280,-0.974907,-0.941425,...,-0.841455,0.289548,0.079801,-0.020033,0.291898,-0.639435,-0.111998,-0.123298,8,SITTING
4,0.271998,0.016904,-0.078856,-0.973468,-0.702462,-0.869450,-0.979810,-0.711601,-0.856807,-0.920760,...,0.214219,0.010111,0.114179,-0.830776,-0.325098,-0.840817,0.116237,-0.096615,5,STANDING
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1466,0.278725,-0.015262,-0.106398,-0.993625,-0.982845,-0.974745,-0.993963,-0.981100,-0.974596,-0.939303,...,-0.874066,-0.054788,0.712975,0.300318,-0.319188,-0.857336,0.120184,0.119276,14,SITTING
1467,0.275803,-0.019257,-0.109078,-0.998614,-0.991621,-0.987403,-0.998813,-0.991503,-0.986802,-0.945442,...,-0.721050,0.076333,-0.021599,-0.277268,0.754011,-0.764185,0.212111,0.138595,16,STANDING
1468,0.240402,0.006361,-0.121377,-0.045805,0.189930,0.332664,-0.114706,0.157771,0.195271,0.210139,...,-0.615554,0.330378,-0.667635,0.806563,-0.850113,-0.639564,0.185363,0.260201,8,WALKING_DOWNSTAIRS
1469,0.135873,-0.020675,-0.116644,-0.960526,-0.955134,-0.985818,-0.963115,-0.971338,-0.988261,-0.946289,...,-0.422383,-0.048474,0.236761,-0.186581,0.396648,0.790877,-0.474618,-0.505953,19,LAYING


In [221]:
drop_cols = ['subject']

data.drop(columns=drop_cols, inplace=True)
test.drop(columns=drop_cols, inplace=True)

#### 2) Basic information

In [26]:
data.shape

(5881, 562)

In [27]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5881 entries, 0 to 5880
Columns: 562 entries, tBodyAcc-mean()-X to Activity
dtypes: float64(561), object(1)
memory usage: 25.2+ MB


In [28]:
data.describe()

Unnamed: 0,tBodyAcc-mean()-X,tBodyAcc-mean()-Y,tBodyAcc-mean()-Z,tBodyAcc-std()-X,tBodyAcc-std()-Y,tBodyAcc-std()-Z,tBodyAcc-mad()-X,tBodyAcc-mad()-Y,tBodyAcc-mad()-Z,tBodyAcc-max()-X,...,fBodyBodyGyroJerkMag-meanFreq(),fBodyBodyGyroJerkMag-skewness(),fBodyBodyGyroJerkMag-kurtosis(),"angle(tBodyAccMean,gravity)","angle(tBodyAccJerkMean),gravityMean)","angle(tBodyGyroMean,gravityMean)","angle(tBodyGyroJerkMean,gravityMean)","angle(X,gravityMean)","angle(Y,gravityMean)","angle(Z,gravityMean)"
count,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,...,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0
mean,0.274811,-0.017799,-0.109396,-0.603138,-0.509815,-0.604058,-0.628151,-0.525944,-0.605374,-0.46549,...,0.126955,-0.305883,-0.623548,0.008524,-0.001185,0.00934,-0.007099,-0.491501,0.059299,-0.054594
std,0.067614,0.039422,0.058373,0.448807,0.501815,0.417319,0.424345,0.485115,0.413043,0.544995,...,0.249176,0.322808,0.310371,0.33973,0.447197,0.60819,0.476738,0.509069,0.29734,0.278479
min,-0.503823,-0.684893,-1.0,-1.0,-0.999844,-0.999667,-1.0,-0.999419,-1.0,-1.0,...,-0.965725,-0.979261,-0.999765,-0.97658,-1.0,-1.0,-1.0,-1.0,-1.0,-0.980143
25%,0.262919,-0.024877,-0.121051,-0.992774,-0.97768,-0.980127,-0.993602,-0.977865,-0.980112,-0.936067,...,-0.02161,-0.541969,-0.845985,-0.122361,-0.294369,-0.481718,-0.373345,-0.811397,-0.018203,-0.141555
50%,0.277154,-0.017221,-0.108781,-0.943933,-0.844575,-0.856352,-0.948501,-0.849266,-0.849896,-0.878729,...,0.133887,-0.342923,-0.712677,0.010278,0.005146,0.011448,-0.000847,-0.709441,0.182893,0.003951
75%,0.288526,-0.01092,-0.098163,-0.24213,-0.034499,-0.26269,-0.291138,-0.068857,-0.268539,-0.01369,...,0.288944,-0.127371,-0.501158,0.154985,0.28503,0.499857,0.356236,-0.51133,0.248435,0.111932
max,1.0,1.0,1.0,1.0,0.916238,1.0,1.0,0.967664,1.0,1.0,...,0.9467,0.989538,0.956845,1.0,1.0,0.998702,0.996078,0.977344,0.478157,1.0


In [29]:
data.corr(numeric_only=True)

Unnamed: 0,tBodyAcc-mean()-X,tBodyAcc-mean()-Y,tBodyAcc-mean()-Z,tBodyAcc-std()-X,tBodyAcc-std()-Y,tBodyAcc-std()-Z,tBodyAcc-mad()-X,tBodyAcc-mad()-Y,tBodyAcc-mad()-Z,tBodyAcc-max()-X,...,fBodyBodyGyroJerkMag-meanFreq(),fBodyBodyGyroJerkMag-skewness(),fBodyBodyGyroJerkMag-kurtosis(),"angle(tBodyAccMean,gravity)","angle(tBodyAccJerkMean),gravityMean)","angle(tBodyGyroMean,gravityMean)","angle(tBodyGyroJerkMean,gravityMean)","angle(X,gravityMean)","angle(Y,gravityMean)","angle(Z,gravityMean)"
tBodyAcc-mean()-X,1.000000,0.203297,-0.266383,0.012067,-0.018489,-0.040405,0.018367,-0.019752,-0.043303,0.048725,...,0.023137,-0.012101,-0.011490,-0.567455,0.019251,0.034065,0.030968,-0.027010,0.028177,0.019025
tBodyAcc-mean()-Y,0.203297,1.000000,-0.145709,-0.049467,-0.052489,-0.058499,-0.048703,-0.053018,-0.058559,-0.038544,...,-0.015872,-0.003101,-0.003467,0.076995,-0.013001,0.019670,0.074955,0.000039,0.002376,-0.022329
tBodyAcc-mean()-Z,-0.266383,-0.145709,1.000000,-0.024839,-0.017613,-0.016924,-0.023323,-0.015794,-0.012575,-0.040397,...,-0.017359,0.018155,0.018234,0.056806,-0.036749,-0.054069,-0.035593,0.007045,-0.017900,-0.019169
tBodyAcc-std()-X,0.012067,-0.049467,-0.024839,1.000000,0.927809,0.851841,0.998656,0.921154,0.846308,0.981190,...,-0.070634,0.151555,0.116425,-0.043069,-0.032145,0.016542,-0.024749,-0.373500,0.470834,0.392843
tBodyAcc-std()-Y,-0.018489,-0.052489,-0.017613,0.927809,1.000000,0.893995,0.923386,0.997320,0.892843,0.916853,...,-0.107573,0.209119,0.177232,-0.027671,-0.022181,-0.012927,-0.015237,-0.380258,0.521249,0.429141
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"angle(tBodyGyroMean,gravityMean)",0.034065,0.019670,-0.054069,0.016542,-0.012927,-0.016674,0.016846,-0.015394,-0.021836,0.021599,...,-0.027836,0.029841,0.019651,-0.003279,0.000576,1.000000,-0.113677,-0.001992,-0.013863,-0.021702
"angle(tBodyGyroJerkMean,gravityMean)",0.030968,0.074955,-0.035593,-0.024749,-0.015237,-0.004077,-0.025205,-0.013152,-0.004784,-0.028239,...,-0.004927,-0.018015,-0.015179,-0.027075,0.029856,-0.113677,1.000000,0.012598,0.005132,0.001242
"angle(X,gravityMean)",-0.027010,0.000039,0.007045,-0.373500,-0.380258,-0.346237,-0.370024,-0.376665,-0.348279,-0.385300,...,0.077532,-0.088810,-0.080113,0.012382,0.020700,-0.001992,0.012598,1.000000,-0.783107,-0.639201
"angle(Y,gravityMean)",0.028177,0.002376,-0.017900,0.470834,0.521249,0.473572,0.466287,0.522401,0.474880,0.478832,...,-0.097644,0.092383,0.080726,-0.000955,-0.013508,-0.013863,0.005132,-0.783107,1.000000,0.590883


In [30]:
# Check for missing values
data.isna().sum()

tBodyAcc-mean()-X                       0
tBodyAcc-mean()-Y                       0
tBodyAcc-mean()-Z                       0
tBodyAcc-std()-X                        0
tBodyAcc-std()-Y                        0
                                       ..
angle(tBodyGyroJerkMean,gravityMean)    0
angle(X,gravityMean)                    0
angle(Y,gravityMean)                    0
angle(Z,gravityMean)                    0
Activity                                0
Length: 562, dtype: int64
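
Because the per-column listing above is truncated, a single total can be easier to read. The following is a minimal sketch on a toy frame; the `demo` frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately missing value (illustrative only)
demo = pd.DataFrame({
    'a': [0.1, np.nan, 0.3],
    'b': [1.0, 2.0, 3.0],
    'Activity': ['WALKING', 'LAYING', 'WALKING'],
})

# Total number of missing cells across the whole frame
total_missing = int(demo.isna().sum().sum())

# Names of the columns that contain at least one missing value
cols_with_missing = demo.columns[demo.isna().any()].tolist()

print(total_missing, cols_with_missing)
```

Applied to `data`, a total of 0 would confirm that no NaN handling is required.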

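## **2. Data Preprocessing**

* Perform the required preprocessing: dummy-variable encoding, data splitting, checking for and handling NaN values, scaling, and so on.


### (1) Data split 1: x, y

* Detailed requirements
    - Split the data into x and y.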

In [34]:
target = 'Activity'

x = data.drop(columns=target)
y = data.loc[:,target]

### (2) Data split 2: train, validation

* Detailed requirements
    - train : val = 8 : 2 or 7 : 3
    - Use the random_state option so that results are reproducible and can be compared with other models.

In [37]:
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=10)
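
The split above does not pass `stratify`. If the class ratios should be kept the same in both parts, one option is the `stratify` argument of `train_test_split`. The sketch below uses synthetic labels; `x_demo` and `y_demo` are assumptions, not the project data.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples with a 60/40 class ratio (illustrative only)
x_demo = [[i] for i in range(100)]
y_demo = ['WALKING'] * 60 + ['LAYING'] * 40

# stratify=y_demo keeps the 60/40 ratio in both the train and validation parts
x_tr, x_va, y_tr, y_va = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=10, stratify=y_demo
)

val_counts = Counter(y_va)
print(val_counts)
```

Whether stratification changes the scores on this dataset is not something the notebook measures; the sketch only shows the mechanism.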

### (3) Scaling


* Detailed requirements
    - Run this step so that algorithms that require scaling can be used.
    - Use either min-max or standard scaling.

In [40]:
minmax = MinMaxScaler()
x_train_s = minmax.fit_transform(x_train)
x_val_s = minmax.transform(x_val)
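
The scaler is fitted on the training split only and then reused for the validation split. A small sketch of the consequence, with made-up numbers: validation values outside the training range map outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up one-feature data (illustrative only)
train = np.array([[0.0], [5.0], [10.0]])
val = np.array([[2.5], [12.0]])

scaler = MinMaxScaler()
train_s = scaler.fit_transform(train)  # learns min=0, max=10 from train only
val_s = scaler.transform(val)          # reuses the train min/max

print(train_s.ravel())
print(val_s.ravel())
```

Fitting on the training split alone keeps information from the validation or test data out of the preprocessing step.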

## **3. Basic Modeling**



* Detailed requirements
    - Apply at least 5 algorithms.
    - For each algorithm, try several of the following and compare performance.

### (1) Model 1: KNN

In [60]:
# Declare the model
model_knn = KNeighborsClassifier(n_neighbors=5)

In [64]:
# Train
model_knn.fit(x_train_s, y_train)

In [71]:
# Predict
y_pred_1 = model_knn.predict(x_val_s)

In [73]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred_1))
print(confusion_matrix(y_val, y_pred_1))
print(classification_report(y_val, y_pred_1))

accuracy : 0.9643160577740016
[[221   0   0   0   0   0]
 [  0 159  25   0   0   0]
 [  0  14 198   0   0   0]
 [  0   0   0 219   0   1]
 [  0   0   0   0 167   2]
 [  0   0   0   0   0 171]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.92      0.86      0.89       184
          STANDING       0.89      0.93      0.91       212
           WALKING       1.00      1.00      1.00       220
WALKING_DOWNSTAIRS       1.00      0.99      0.99       169
  WALKING_UPSTAIRS       0.98      1.00      0.99       171

          accuracy                           0.96      1177
         macro avg       0.96      0.96      0.96      1177
      weighted avg       0.96      0.96      0.96      1177
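
The numbers in the report can be traced back to the confusion matrix by hand. The sketch below copies the KNN matrix printed above (rows are true classes, columns are predicted classes, in the label order of the report) and recomputes the accuracy and the SITTING precision and recall. The variable names are chosen for illustration.

```python
# KNN confusion matrix copied from the output above
cm = [
    [221,   0,   0,   0,   0,   0],
    [  0, 159,  25,   0,   0,   0],
    [  0,  14, 198,   0,   0,   0],
    [  0,   0,   0, 219,   0,   1],
    [  0,   0,   0,   0, 167,   2],
    [  0,   0,   0,   0,   0, 171],
]
labels = ['LAYING', 'SITTING', 'STANDING', 'WALKING',
          'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS']

total = sum(sum(row) for row in cm)              # all validation samples
correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal = correct predictions
accuracy = correct / total

i = labels.index('SITTING')
recall_sitting = cm[i][i] / sum(cm[i])                    # correct / true SITTING
precision_sitting = cm[i][i] / sum(row[i] for row in cm)  # correct / predicted SITTING

print(total, correct, accuracy)
print(round(precision_sitting, 2), round(recall_sitting, 2))
```

All off-diagonal mass for KNN sits in the SITTING/STANDING block, which is why those two classes have the lowest f1-scores in the report.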



### (2) Model 2: Decision Tree

In [75]:
# Declare the model
model_dt = DecisionTreeClassifier(max_depth=5, random_state=10)

In [77]:
# Train
model_dt.fit(x_train, y_train)

In [79]:
# Predict
y_pred_2 = model_dt.predict(x_val)

In [81]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred_2))
print(confusion_matrix(y_val, y_pred_2))
print(classification_report(y_val, y_pred_2))

accuracy : 0.9056924384027187
[[221   0   0   0   0   0]
 [  0 169  15   0   0   0]
 [  0  23 188   0   0   1]
 [  0   0   0 198  12  10]
 [  0   0   0  10 151   8]
 [  0   0   0  18  14 139]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.88      0.92      0.90       184
          STANDING       0.93      0.89      0.91       212
           WALKING       0.88      0.90      0.89       220
WALKING_DOWNSTAIRS       0.85      0.89      0.87       169
  WALKING_UPSTAIRS       0.88      0.81      0.84       171

          accuracy                           0.91      1177
         macro avg       0.90      0.90      0.90      1177
      weighted avg       0.91      0.91      0.91      1177



### (3) Model 3: Logistic Regression

In [83]:
# Declare the model
model_lr = LogisticRegression(random_state=10)

In [85]:
# Train
model_lr.fit(x_train, y_train)

In [87]:
# Predict
y_pred_3 = model_lr.predict(x_val)

In [89]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred_3))
print(confusion_matrix(y_val, y_pred_3))
print(classification_report(y_val, y_pred_3))

accuracy : 0.9864061172472387
[[221   0   0   0   0   0]
 [  0 180   4   0   0   0]
 [  0  11 201   0   0   0]
 [  0   0   0 219   0   1]
 [  0   0   0   0 169   0]
 [  0   0   0   0   0 171]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.94      0.98      0.96       184
          STANDING       0.98      0.95      0.96       212
           WALKING       1.00      1.00      1.00       220
WALKING_DOWNSTAIRS       1.00      1.00      1.00       169
  WALKING_UPSTAIRS       0.99      1.00      1.00       171

          accuracy                           0.99      1177
         macro avg       0.99      0.99      0.99      1177
      weighted avg       0.99      0.99      0.99      1177



### (4) Model 4: SVM

In [91]:
# Declare the model
model_svm = SVC(kernel='linear', C=1, random_state=10)

In [93]:
# Train
model_svm.fit(x_train_s, y_train)

In [97]:
# Predict
y_pred_4 = model_svm.predict(x_val_s)

In [99]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred_4))
print(confusion_matrix(y_val, y_pred_4))
print(classification_report(y_val, y_pred_4))

accuracy : 0.9872557349192863
[[221   0   0   0   0   0]
 [  0 180   4   0   0   0]
 [  0  11 201   0   0   0]
 [  0   0   0 220   0   0]
 [  0   0   0   0 169   0]
 [  0   0   0   0   0 171]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.94      0.98      0.96       184
          STANDING       0.98      0.95      0.96       212
           WALKING       1.00      1.00      1.00       220
WALKING_DOWNSTAIRS       1.00      1.00      1.00       169
  WALKING_UPSTAIRS       1.00      1.00      1.00       171

          accuracy                           0.99      1177
         macro avg       0.99      0.99      0.99      1177
      weighted avg       0.99      0.99      0.99      1177



### (5) Model 5: Random Forest

In [101]:
# Declare the model
model_rf = RandomForestClassifier(max_depth=5, random_state=10)

In [103]:
# Train
model_rf.fit(x_train, y_train)

In [105]:
# Predict
y_pred_5 = model_rf.predict(x_val)

In [107]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred_5))
print(confusion_matrix(y_val, y_pred_5))
print(classification_report(y_val, y_pred_5))

accuracy : 0.9464740866610025
[[221   0   0   0   0   0]
 [  0 167  17   0   0   0]
 [  0  10 202   0   0   0]
 [  0   0   0 205   5  10]
 [  0   0   0   7 152  10]
 [  0   0   0   2   2 167]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.94      0.91      0.93       184
          STANDING       0.92      0.95      0.94       212
           WALKING       0.96      0.93      0.94       220
WALKING_DOWNSTAIRS       0.96      0.90      0.93       169
  WALKING_UPSTAIRS       0.89      0.98      0.93       171

          accuracy                           0.95      1177
         macro avg       0.95      0.94      0.94      1177
      weighted avg       0.95      0.95      0.95      1177



## 4. Performance Comparison

* Detailed requirements
    - Measure each model's performance on the test data and compare.


### (1) Data split 1: x, y

In [224]:
target = 'Activity'

x = test.drop(columns=target)
y = test.loc[:,target]

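### (2) Scaling

In [227]:
# Note: x_val is transformed here, so the scores below repeat the validation results.
# Scoring on the test set would use the test features instead: minmax.transform(x)
x_val_s2 = minmax.transform(x_val)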
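
For reference, a test-set evaluation generally follows the pattern below: fit the scaler and the models on the training data, then transform and score the held-out test features against the test labels. This is a sketch on synthetic data from `make_classification`; the data, the two-model dictionary, `max_iter=1000`, and all `*_demo` names are assumptions, and it is not the notebook's actual run.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the train and test files (illustrative only)
x_all, y_all = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=10
)
x_fit, x_test_demo, y_fit, y_test_demo = train_test_split(
    x_all, y_all, test_size=0.25, random_state=10
)

# Fit the scaler on training data only, then apply it to the test features
scaler = MinMaxScaler().fit(x_fit)
x_fit_s = scaler.transform(x_fit)
x_test_s = scaler.transform(x_test_demo)

models = {
    'knn': KNeighborsClassifier(n_neighbors=5),
    'lr': LogisticRegression(max_iter=1000, random_state=10),
}

# Score every model against the test labels, not the validation labels
test_scores = {}
for name, model in models.items():
    model.fit(x_fit_s, y_fit)
    test_scores[name] = accuracy_score(y_test_demo, model.predict(x_test_s))

print(test_scores)
```

In this notebook the corresponding inputs would be the `x` and `y` built from `test` in the previous cell.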

### (3) Performance of each model

#### 1. KNN

In [229]:
# Declare the model
model_knn = KNeighborsClassifier(n_neighbors=5)

In [231]:
# Train
model_knn.fit(x_train_s, y_train)

In [233]:
# Predict
y_pred = model_knn.predict(x_val_s2)

In [237]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred))
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))

accuracy : 0.9643160577740016
[[221   0   0   0   0   0]
 [  0 159  25   0   0   0]
 [  0  14 198   0   0   0]
 [  0   0   0 219   0   1]
 [  0   0   0   0 167   2]
 [  0   0   0   0   0 171]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.92      0.86      0.89       184
          STANDING       0.89      0.93      0.91       212
           WALKING       1.00      1.00      1.00       220
WALKING_DOWNSTAIRS       1.00      0.99      0.99       169
  WALKING_UPSTAIRS       0.98      1.00      0.99       171

          accuracy                           0.96      1177
         macro avg       0.96      0.96      0.96      1177
      weighted avg       0.96      0.96      0.96      1177



#### 2. Decision Tree

In [239]:
# Declare the model
model_dt = DecisionTreeClassifier(max_depth=5, random_state=10)

In [241]:
# Train
model_dt.fit(x_train, y_train)

In [247]:
# Predict
y_pred = model_dt.predict(x_val)

In [249]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred))
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))

accuracy : 0.9056924384027187
[[221   0   0   0   0   0]
 [  0 169  15   0   0   0]
 [  0  23 188   0   0   1]
 [  0   0   0 198  12  10]
 [  0   0   0  10 151   8]
 [  0   0   0  18  14 139]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.88      0.92      0.90       184
          STANDING       0.93      0.89      0.91       212
           WALKING       0.88      0.90      0.89       220
WALKING_DOWNSTAIRS       0.85      0.89      0.87       169
  WALKING_UPSTAIRS       0.88      0.81      0.84       171

          accuracy                           0.91      1177
         macro avg       0.90      0.90      0.90      1177
      weighted avg       0.91      0.91      0.91      1177



#### 3. Logistic Regression

In [251]:
# Declare the model
model_lr = LogisticRegression(random_state=10)

In [253]:
# Train
model_lr.fit(x_train, y_train)

In [255]:
# Predict
y_pred = model_lr.predict(x_val)

In [257]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred))
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))

accuracy : 0.9864061172472387
[[221   0   0   0   0   0]
 [  0 180   4   0   0   0]
 [  0  11 201   0   0   0]
 [  0   0   0 219   0   1]
 [  0   0   0   0 169   0]
 [  0   0   0   0   0 171]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.94      0.98      0.96       184
          STANDING       0.98      0.95      0.96       212
           WALKING       1.00      1.00      1.00       220
WALKING_DOWNSTAIRS       1.00      1.00      1.00       169
  WALKING_UPSTAIRS       0.99      1.00      1.00       171

          accuracy                           0.99      1177
         macro avg       0.99      0.99      0.99      1177
      weighted avg       0.99      0.99      0.99      1177



#### 4. SVM

In [259]:
# Declare the model
model_svm = SVC(kernel='linear', C=1, random_state=10)

In [261]:
# Train
model_svm.fit(x_train_s, y_train)

In [263]:
# Predict
y_pred = model_svm.predict(x_val_s2)

In [265]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred))
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))

accuracy : 0.9872557349192863
[[221   0   0   0   0   0]
 [  0 180   4   0   0   0]
 [  0  11 201   0   0   0]
 [  0   0   0 220   0   0]
 [  0   0   0   0 169   0]
 [  0   0   0   0   0 171]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.94      0.98      0.96       184
          STANDING       0.98      0.95      0.96       212
           WALKING       1.00      1.00      1.00       220
WALKING_DOWNSTAIRS       1.00      1.00      1.00       169
  WALKING_UPSTAIRS       1.00      1.00      1.00       171

          accuracy                           0.99      1177
         macro avg       0.99      0.99      0.99      1177
      weighted avg       0.99      0.99      0.99      1177



#### 5. Random Forest

In [267]:
# Declare the model
model_rf = RandomForestClassifier(max_depth=5, random_state=10)

In [269]:
# Train
model_rf.fit(x_train, y_train)

In [271]:
# Predict
y_pred = model_rf.predict(x_val)

In [273]:
# Evaluate
print('accuracy :',accuracy_score(y_val, y_pred))
print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))

accuracy : 0.9464740866610025
[[221   0   0   0   0   0]
 [  0 167  17   0   0   0]
 [  0  10 202   0   0   0]
 [  0   0   0 205   5  10]
 [  0   0   0   7 152  10]
 [  0   0   0   2   2 167]]
                    precision    recall  f1-score   support

            LAYING       1.00      1.00      1.00       221
           SITTING       0.94      0.91      0.93       184
          STANDING       0.92      0.95      0.94       212
           WALKING       0.96      0.93      0.94       220
WALKING_DOWNSTAIRS       0.96      0.90      0.93       169
  WALKING_UPSTAIRS       0.89      0.98      0.93       171

          accuracy                           0.95      1177
         macro avg       0.95      0.94      0.94      1177
      weighted avg       0.95      0.95      0.95      1177
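
The mission asks for a separate spreadsheet of model performance. One simple way, sketched below, is to collect the accuracies printed in this notebook and write them to a CSV file that Excel can open. The file name `model_scores.csv`, the temporary-directory location, and the four-decimal formatting are assumptions; writing a native `.xlsx` file would need an extra library.

```python
import csv
import os
import tempfile

# Accuracies copied from the outputs printed in this notebook
scores = {
    'KNN': 0.9643160577740016,
    'Decision Tree': 0.9056924384027187,
    'Logistic Regression': 0.9864061172472387,
    'SVM': 0.9872557349192863,
    'Random Forest': 0.9464740866610025,
}

# Rank models from best to worst accuracy
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical output location; adjust to the project folder as needed
out_path = os.path.join(tempfile.gettempdir(), 'model_scores.csv')
with open(out_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['model', 'accuracy'])
    for name, acc in ranked:
        writer.writerow([name, f'{acc:.4f}'])

best_model = ranked[0][0]
print(best_model)
```

All five recorded accuracies clear the 0.900 performance guide, with the linear SVM and logistic regression at the top.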



## 5. Saving the Models
* Save the best-performing model for each algorithm
    * Only save models built with all features (joblib.dump).
    * Save tuned models via model.best_estimator_.

In [275]:
joblib.dump(model_knn, 'model_knn.pkl')

['model_knn.pkl']

In [277]:
joblib.dump(model_dt, 'model_dt.pkl')

['model_dt.pkl']

In [279]:
joblib.dump(model_lr, 'model_lr.pkl')

['model_lr.pkl']

In [281]:
joblib.dump(model_svm, 'model_svm.pkl')

['model_svm.pkl']

In [283]:
joblib.dump(model_rf, 'model_rf.pkl')

['model_rf.pkl']
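
A saved model can be restored with `joblib.load` and should give the same predictions as the original object. The round trip below is a sketch on scikit-learn's iris data; the dataset, the file name `demo_model_knn.pkl`, and the temporary-directory path are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Toy model standing in for model_knn (assumption: iris, not the sensor data)
x_demo, y_demo = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5).fit(x_demo, y_demo)

# Save, then load back from disk
path = os.path.join(tempfile.gettempdir(), 'demo_model_knn.pkl')
joblib.dump(model, path)
restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly
same_predictions = bool((restored.predict(x_demo) == model.predict(x_demo)).all())
print(same_predictions)
```

Since the KNN and SVM models here were trained on min-max scaled inputs, it may also be worth saving the fitted `minmax` scaler the same way so that new data can be transformed consistently.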