样例地址：https://github.com/Microsoft/LightGBM/blob/master/examples/python-guide/simple_example.py
本文稍有改动，将其`objective`改为`binary`

## 导入包

In [42]:
import lightgbm as lgb
import pandas as pd
from sklearn.metrics import mean_squared_error,roc_auc_score

## 加载数据

In [43]:
print("Loading data...")
df_train=pd.read_csv('files/data/python83/regression.train.txt',header=None,sep='\t')
df_test=pd.read_csv('files/data/python83/regression.test.txt',header=None,sep='\t')
print(df_train.shape)
print(df_test.shape)
df_train.head()

Loading data...
(7000, 29)
(500, 29)


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,19,20,21,22,23,24,25,26,27,28
0,1,0.869,-0.635,0.226,0.327,-0.69,0.754,-0.249,-1.092,0.0,...,-0.01,-0.046,3.102,1.354,0.98,0.978,0.92,0.722,0.989,0.877
1,1,0.908,0.329,0.359,1.498,-0.313,1.096,-0.558,-1.588,2.173,...,-1.139,-0.001,0.0,0.302,0.833,0.986,0.978,0.78,0.992,0.798
2,1,0.799,1.471,-1.636,0.454,0.426,1.105,1.282,1.382,0.0,...,1.129,0.9,0.0,0.91,1.108,0.986,0.951,0.803,0.866,0.78
3,0,1.344,-0.877,0.936,1.992,0.882,1.786,-1.647,-0.942,0.0,...,-0.678,-1.36,0.0,0.947,1.029,0.999,0.728,0.869,1.027,0.958
4,1,1.105,0.321,1.522,0.883,-1.205,0.681,-1.07,-0.922,0.0,...,-0.374,0.113,0.0,0.756,1.361,0.987,0.838,1.133,0.872,0.808


数据格式为：第一列为1,0标签，作为y；其余列为特征，为X

In [44]:
y_train=df_train[0].values
y_test=df_test[0].values
X_train=df_train.drop(0,axis=1).values
X_test=df_test.drop(0,axis=1).values

## 创建lightgbm格式的数据

In [45]:
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

## 配置参数
参数以字典的形式

In [46]:
params={
    'task':'train',
    'boosting_type':'gbdt',
    'objective':'binary',
    'metric':{'l2','auc'},
    'num_leaves':31,
    'learing_rate':0.05,
    'feature_fraction':0.9,
    'bagging_fraction':0.8,
    'bagging_freq':5,
    'verbose':0
}

In [47]:
## 训练

In [48]:
print('Start training...')
gbm=lgb.train(params,
             lgb_train,
             num_boost_round=20,
             valid_sets=lgb_eval,
             early_stopping_rounds=5)

Start training...
[1]	valid_0's l2: 0.239847	valid_0's auc: 0.721096
Training until validation scores don't improve for 5 rounds.
[2]	valid_0's l2: 0.233245	valid_0's auc: 0.744114
[3]	valid_0's l2: 0.226011	valid_0's auc: 0.784386
[4]	valid_0's l2: 0.220255	valid_0's auc: 0.782484
[5]	valid_0's l2: 0.215338	valid_0's auc: 0.783604
[6]	valid_0's l2: 0.211726	valid_0's auc: 0.785902
[7]	valid_0's l2: 0.207911	valid_0's auc: 0.794988
[8]	valid_0's l2: 0.204013	valid_0's auc: 0.798834
[9]	valid_0's l2: 0.201921	valid_0's auc: 0.793223
[10]	valid_0's l2: 0.199101	valid_0's auc: 0.795287
[11]	valid_0's l2: 0.197132	valid_0's auc: 0.797294
[12]	valid_0's l2: 0.195174	valid_0's auc: 0.799318
[13]	valid_0's l2: 0.193806	valid_0's auc: 0.800721
[14]	valid_0's l2: 0.192564	valid_0's auc: 0.801374
[15]	valid_0's l2: 0.190438	valid_0's auc: 0.805131
[16]	valid_0's l2: 0.189536	valid_0's auc: 0.805131
[17]	valid_0's l2: 0.18758	valid_0's auc: 0.807356
[18]	valid_0's l2: 0.186451	valid_0's auc: 0.81

## 保存模型

In [49]:
print('Save model...')
gbm.save_model('files/data/python83/model.pkl')

Save model...


<lightgbm.basic.Booster at 0x18237fa0978>

## 预测

In [50]:
print('Start predicting...')
y_pred=gbm.predict(X_test,num_iteration=gbm.best_iteration)
y_pred

Start predicting...


array([0.705176  , 0.44674959, 0.26879015, 0.56545158, 0.28760497,
       0.26702311, 0.35263565, 0.38950501, 0.69540525, 0.35595029,
       0.69969559, 0.76458749, 0.74614799, 0.70594182, 0.42606952,
       0.62763372, 0.42512869, 0.59642855, 0.57876619, 0.62400946,
       0.7391993 , 0.66023603, 0.57991074, 0.45893718, 0.37315465,
       0.61269029, 0.62532473, 0.72349027, 0.46891632, 0.71793287,
       0.59860109, 0.53960073, 0.4390982 , 0.54885602, 0.57878077,
       0.37460297, 0.30708314, 0.39402885, 0.41004637, 0.70949937,
       0.14217593, 0.61979219, 0.41327597, 0.5292721 , 0.32940959,
       0.16854021, 0.52222192, 0.43971427, 0.81403718, 0.44781021,
       0.85730483, 0.39736082, 0.18926589, 0.75257127, 0.68990718,
       0.44828279, 0.30294129, 0.36063242, 0.62071732, 0.66884483,
       0.47527581, 0.78761978, 0.78361861, 0.40588454, 0.78609673,
       0.3708207 , 0.52029288, 0.20626834, 0.74898117, 0.74590996,
       0.37771121, 0.76909093, 0.8097293 , 0.33431068, 0.82349

## 评价

In [52]:
# eval
print('The rmse of prediction is:', roc_auc_score(y_test, y_pred) ** 0.5)

The rmse of prediction is: 0.9017187034918585


## 相关资料
LightGBM 如何调参：https://blog.csdn.net/aliceyangxi1987/article/details/80711014