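The base model is LightGBM, and the program was changed to use stratified K-fold cross-validation (K=5) for model selection (the cell below currently sets `n_splits = 6`).<br>
Addition 1: the features to use were selected based on their correlation coefficients.<br>
Addition 2: the training data was split so that the model can be evaluated on held-out data.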
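
The cells below do not show how the correlation-based selection in Addition 1 was carried out. As a minimal sketch of one way to do it, assuming a plain Pearson correlation against the target and a cutoff on its absolute value: the helper name `select_by_correlation`, the `threshold` values, and the synthetic columns `ram` and `noise` are all illustrative assumptions, not taken from the original notebook.

```python
import numpy as np
import pandas as pd


def select_by_correlation(df, target, threshold=0.05):
    """Return the numeric feature names whose |Pearson r| with `target` is >= threshold.

    Hypothetical helper: the threshold is an assumption, not the notebook's value.
    """
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()


# Synthetic stand-in for train.csv (column names are illustrative assumptions)
rng = np.random.default_rng(0)
n = 500
ram = rng.normal(size=n)
noise = rng.normal(size=n)
demo = pd.DataFrame({
    "ram": ram,        # informative feature: drives the target
    "noise": noise,    # uninformative feature: independent of the target
    "price_range": pd.cut(ram + 0.1 * rng.normal(size=n), bins=4, labels=False),
})

selected = select_by_correlation(demo, "price_range", threshold=0.3)
print(selected)
```

With the real data this would be called as `select_by_correlation(train, "price_range")`, and the returned list used to subset both `train` and `test`. Pearson correlation against an ordinal class label is only a rough screen, so the cutoff is best checked against cross-validated accuracy.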

In [2]:
!pip install lightgbm



In [12]:
import pandas as pd 
import numpy as np 
from sklearn.preprocessing import StandardScaler 
from sklearn.model_selection import StratifiedKFold 
from lightgbm import LGBMClassifier, early_stopping 
from sklearn.metrics import accuracy_score, confusion_matrix 
# Load the data
train = pd.read_csv("./original/train.csv") 
test = pd.read_csv("./original/test.csv") 
# Split into features and target variable
X_train = train.drop("price_range", axis=1) 
y_train = train["price_range"] 
X_test = test.copy() 
# Use train_test_split to hold out 20% of the training data for validation
from sklearn.model_selection import train_test_split 
X_train_, X_valid, y_train_, y_valid = train_test_split(X_train, y_train, test_size=0.2, random_state=42) 
# Scale the features (the scaler is fitted on the training split only)
scaler = StandardScaler() 
X_train_ = scaler.fit_transform(X_train_) 
X_valid = scaler.transform(X_valid) 
X_test = scaler.transform(X_test) 
# Hyperparameter settings
params = { 
    'objective': 'multiclass', 
    'num_class': 4, 
    'metric': 'multi_error', 
    'boosting_type': 'gbdt', 
    'n_jobs': -1, 
    'num_leaves': 31, 
    'learning_rate': 0.05, 
    'max_depth': -1, 
    'min_child_samples': 20, 
    'subsample_freq': 1, 
    'subsample': 0.8, 
    'colsample_bytree': 0.6, 
    'reg_alpha': 0.1, 
    'reg_lambda': 0.1, 
    'verbosity': -1, 
    'seed': 42 
} 
# Build and train one model per fold
n_splits = 6
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42) 
oof = np.zeros((len(X_train_), 4)) 
y_pred = np.zeros((len(X_test), 4)) 
models = []
for fold, (train_index, valid_index) in enumerate(skf.split(X_train_, y_train_)): 
    X_tr = X_train_[train_index] 
    y_tr = y_train_.iloc[train_index] 
    X_val = X_train_[valid_index] 
    y_val = y_train_.iloc[valid_index] 
    model = LGBMClassifier(**params) 
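    # Stop when the validation metric has not improved for 100 rounds
    es = early_stopping(stopping_rounds=100, verbose=False)
    # The `verbose` argument of fit() was removed in LightGBM 4.0; logging is controlled by callbacks and `verbosity`
    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], callbacks=[es])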
    oof[valid_index] = model.predict_proba(X_val) 
    y_pred += model.predict_proba(X_test) / n_splits 
    models.append(model)
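# Evaluate the model
# "Training Accuracy" is the out-of-fold accuracy: each row is scored by the fold model that did not train on it
y_train_pred = np.argmax(oof, axis=1)
# Note: `model` is the last fold's model only, whereas the test predictions average all folds
y_val_pred = np.argmax(model.predict_proba(X_valid), axis=1)
print("Training Accuracy: ", accuracy_score(y_train_, y_train_pred))
print("Validation Accuracy: ", accuracy_score(y_valid, y_val_pred))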
# Write out the predictions
y_pred = np.argmax(y_pred, axis=1) 
output = pd.DataFrame({"id": test["id"], "price_range": y_pred}) 
output.to_csv("./submission/submission_lgbm_v9.csv", index=False, header=False)



Training Accuracy:  0.5052083333333334
Validation Accuracy:  0.4583333333333333
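
The validation accuracy above comes from the last fold's model alone, while the test predictions average all fold models. A minimal sketch of scoring the hold-out set the same way as the test set, and of using the already imported `confusion_matrix` to see which classes get confused, could look like the following. `average_fold_proba` and `FixedProbaModel` are hypothetical names; the stub class merely stands in for a fitted `LGBMClassifier` so that the example runs without training.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def average_fold_proba(models, X):
    """Average predict_proba over all fold models (soft voting)."""
    return np.mean([m.predict_proba(X) for m in models], axis=0)


class FixedProbaModel:
    """Hypothetical stand-in for a fitted classifier; returns fixed probabilities."""

    def __init__(self, proba):
        self.proba = np.asarray(proba, dtype=float)

    def predict_proba(self, X):
        return self.proba[: len(X)]


# Two toy "fold models" over 3 samples and 4 classes (made-up numbers)
m1 = FixedProbaModel([[0.7, 0.1, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1], [0.1, 0.1, 0.2, 0.6]])
m2 = FixedProbaModel([[0.5, 0.3, 0.1, 0.1], [0.1, 0.3, 0.5, 0.1], [0.1, 0.1, 0.5, 0.3]])
X_demo = np.zeros((3, 2))        # feature values are ignored by the stub models
y_demo = np.array([0, 2, 3])     # true classes for the toy samples

proba = average_fold_proba([m1, m2], X_demo)
y_hat = np.argmax(proba, axis=1)
cm = confusion_matrix(y_demo, y_hat, labels=[0, 1, 2, 3])  # rows: true, columns: predicted

print(accuracy_score(y_demo, y_hat))
print(cm)
```

In the notebook this would correspond to `average_fold_proba(models, X_valid)` compared against `y_valid`.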


| Label | Training Accuracy | Validation Accuracy |
|---|---|---|
| (unlabeled) | 0.9083333333333333 | 0.5474452554744526 |
| 5 | 0.5052083333333334 | 0.5208333333333334 |
| 6 | 0.5052083333333334 | 0.4583333333333333 |
| 7 | 0.5041666666666667 | 0.48333333333333334 |
| 8 | 0.509375 | 0.49583333333333335 |
| 9 | (not recorded) | (not recorded) |





In [3]:
output

Unnamed: 0,id,price_range
0,1,3
1,2,0
2,6,3
3,10,0
4,12,2
...,...,...
795,1978,3
796,1980,1
797,1982,3
798,1988,2
