ベース・モデルは LightGBM を用いて，モデル選択は層化K分割交差検証 (K=5) を用いたプログラムに変更した<br>
追加１・使用する特徴量を相関係数から選択した。<br>
追加２・学習データを分け、モデルを評価できるようにした

ハイパーパラメータのぐりっとサーチを実装したプログラム

In [1]:
!pip install lightgbm



In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV
 # データの読み込み
train = pd.read_csv("./original/train.csv")
test = pd.read_csv("./original/test.csv")
 # 相関が高い特徴量に絞り込む
columns = ['battery_power', 'ram', 'int_memory', 'three_g', 'touch_screen', 'wifi']
 # 特徴量と目的変数に分ける
X_train = train[columns]
y_train = train["price_range"]
X_test = test[columns]
 # train_test_splitを使用して学習データを分割する
from sklearn.model_selection import train_test_split
X_train_, X_val, y_train_, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
 # 特徴量のスケーリング
scaler = StandardScaler()
X_train_ = scaler.fit_transform(X_train_)
X_test = scaler.transform(X_test)
 # ハイパーパラメータの設定
params = {
    'objective': 'multiclass',
    'num_class': 4,
    'metric': 'multi_error',
    'boosting_type': 'gbdt',
    'n_jobs': -1,
    'num_leaves': 31,
    'learning_rate': 0.05,
    'max_depth': -1,
    'min_child_samples': 20,
    'subsample_freq': 1,
    'subsample': 0.8,
    'colsample_bytree': 0.6,
    'reg_alpha': 0.1,
    'reg_lambda': 0.1,
    'verbosity': -1,
    'seed': 42
}
 # グリッドサーチで最適なハイパーパラメータを探索する
param_dist = {
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [5, 10, -1],
    'num_leaves': [31, 50, 100],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'reg_alpha': [0.01, 0.1, 1.0],
    'reg_lambda': [0.01, 0.1, 1.0],
}
lgbm = LGBMClassifier(**params)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid_search = GridSearchCV(lgbm, param_grid=param_dist, cv=skf, n_jobs=-1)
grid_search.fit(X_train_, y_train_)
best_params = grid_search.best_params_
print("Best hyperparameters: ", best_params)
 # 最適なハイパーパラメータを使用してモデルを再学習する
model = LGBMClassifier(**best_params)
model.fit(X_train_, y_train_, eval_set=[(X_val, y_val)], early_stopping_rounds=100, verbose=False)
 # モデルの評価
y_train_pred = model.predict(X_train_)
y_val_pred = model.predict(X_val)
print("Training Accuracy: ", accuracy_score(y_train_, y_train_pred))
print("Validation Accuracy: ", accuracy_score(y_val, y_val_pred))
 # 予測結果の出力
y_pred = model.predict(X_test)
output = pd.DataFrame({"id": test["id"], "price_range": y_pred})
output.to_csv("./submission/submission_lgbm_v7.csv", index=False, header=False)

Best hyperparameters:  {'colsample_bytree': 0.6, 'learning_rate': 0.05, 'max_depth': -1, 'num_leaves': 31, 'reg_alpha': 1.0, 'reg_lambda': 0.01, 'subsample': 0.6}
Training Accuracy:  0.40729166666666666
Validation Accuracy:  0.35




In [2]:
output

Unnamed: 0,index,price_range
0,1,1
1,2,0
2,6,3
3,10,1
4,12,1
...,...,...
795,1978,3
796,1980,1
797,1982,3
798,1988,2
