# 🌳 LGBM

LGBM (Light Gradient Boosting Machine) is a lightweight gradient-boosted decision tree framework developed by Microsoft.

Official documentation: https://lightgbm.readthedocs.io/en/stable/

Installation: `pip install lightgbm`

In [1]:
# Load the California housing dataset
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split as TTS

data = fetch_california_housing()
print(data.keys())

# Split into training and test sets
x = data['data']
y = data['target']
train_x, test_x, train_y, test_y = TTS(x, y, test_size=0.3, random_state=22)

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])


## 1 | Basic usage

In [6]:
import lightgbm as lgb

# The data must first be converted to LightGBM's Dataset format
train_data = lgb.Dataset(train_x, train_y)

param = {'seed': 22}
reg = lgb.train(param, train_data)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000553 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 14448, number of used features: 8
[LightGBM] [Info] Start training from score 2.070666


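LightGBM finds splits with histograms, which it can build either row-wise or column-wise. When neither `force_row_wise` nor `force_col_wise` is set, it times both approaches and picks the faster one, which is the source of the "overhead of testing" message above.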
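To make the idea of histogram-based split finding concrete, here is a minimal pure-Python sketch for a single feature. It is an illustration under simplifying assumptions (squared-error loss, so every row has a hessian of 1, and no regularization), not LightGBM's actual implementation. The function name `best_histogram_split` and its arguments are hypothetical.

```python
def best_histogram_split(bin_ids, gradients, n_bins):
    """Find the best split (as a bin index) for one pre-binned feature.

    Simplified sketch: squared-error loss (hessian = 1 per row),
    no regularization. Not LightGBM's real code.
    """
    # Build the histogram: per-bin gradient sum and row count.
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for b, g in zip(bin_ids, gradients):
        grad_sum[b] += g
        count[b] += 1

    total_grad = sum(grad_sum)
    total_count = sum(count)
    parent_score = total_grad ** 2 / total_count

    best_bin, best_gain = None, 0.0
    left_grad, left_count = 0.0, 0
    # Scan bin boundaries: rows with bin <= b go left, the rest go right.
    for b in range(n_bins - 1):
        left_grad += grad_sum[b]
        left_count += count[b]
        right_count = total_count - left_count
        if left_count == 0 or right_count == 0:
            continue
        right_grad = total_grad - left_grad
        gain = (left_grad ** 2 / left_count
                + right_grad ** 2 / right_count
                - parent_score)
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# Toy input: four bins, negative gradients in bins 0-1, positive in bins 2-3.
toy_bins = [0, 0, 1, 1, 2, 2, 3, 3]
toy_grads = [-1, -1, -1, -1, 1, 1, 1, 1]
print(best_histogram_split(toy_bins, toy_grads, n_bins=4))  # → (1, 8.0)
```

The scan only visits bin boundaries rather than every distinct feature value, which is why histogram methods are cheap. The row-wise and column-wise modes differ in how the histogram-building loop is organized over the data, not in the split that is found.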

In [7]:
# Set the histogram-building mode explicitly
param = {'seed': 22, 'force_col_wise': True}
reg = lgb.train(param, train_data, num_boost_round=10)

[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 14448, number of used features: 8
[LightGBM] [Info] Start training from score 2.070666


In [8]:
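# Predict on the test set and report RMSE
from sklearn.metrics import mean_squared_error as mse

pred = reg.predict(test_x)
# `squared=False` was deprecated in scikit-learn 1.4 and later removed,
# so take the square root of the MSE explicitly
mse(test_y, pred) ** 0.5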

0.6860233557436403
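The score above is a root mean squared error (RMSE): the square root of the average squared difference between predictions and targets. As a minimal sketch of what is being computed, the hypothetical helper below does the same arithmetic with the standard library only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because RMSE is in the same units as the target, it is usually easier to interpret than the plain MSE.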

In [9]:
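# Cross-validation
# `stratified` balances the class distribution across folds in classification tasks,
# so it is turned off for this regression problem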
param = {'seed': 22, 'metric': 'rmse', 'force_col_wise': True}
result = lgb.cv(param, train_data, nfold=5, num_boost_round=10, seed=22, stratified=False)

[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8
[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8
[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8
[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8
[LightGBM] [Info] Total Bins 1838
[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8
[LightGBM] [Info] Start training from score 2.070080
[LightGBM] [Info] Start training from score 2.067682
[LightGBM] [Info] Start training from score 2.067000
[LightGBM] [Info] Start training from score 2.071840
[LightGBM] [Info] Start training from score 2.075814
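`lgb.cv` returns a dict of per-round lists with the mean and standard deviation of the metric across folds. The exact key names depend on the LightGBM version (for example `rmse-mean` in 3.x versus `valid rmse-mean` in 4.x), so the sketch below looks keys up by suffix instead of hard-coding them. The helper `summarize_cv` is hypothetical, and it assumes a lower-is-better metric such as RMSE.

```python
def summarize_cv(cv_result):
    """Return (best_round, best_mean, stdv_at_best) from an lgb.cv-style dict.

    Assumes a lower-is-better metric and keys ending in '-mean' / '-stdv'.
    """
    # Skip training-set metrics, which appear only if they were requested.
    mean_keys = [k for k in cv_result
                 if k.endswith('-mean') and not k.startswith('train')]
    if not mean_keys:
        raise KeyError("no '-mean' entry found in the cv result")
    mean_key = mean_keys[0]
    stdv_key = mean_key[:-len('-mean')] + '-stdv'

    means = cv_result[mean_key]
    best_index = min(range(len(means)), key=means.__getitem__)
    # Boosting rounds are 1-based, list indices are 0-based.
    return best_index + 1, means[best_index], cv_result[stdv_key][best_index]


# Usage with the `result` returned by the lgb.cv call above:
# best_round, best_rmse, best_stdv = summarize_cv(result)
```

The best round found this way is a reasonable candidate for `num_boost_round` when retraining on the full training set.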
