# LightGBM training speed
- CPU
```
[LightGBM] [Info] Number of positive: 35018, number of negative: 34982
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.366908 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 255000
[LightGBM] [Info] Number of data points in the train set: 70000, number of used features: 1000
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500257 -> initscore=0.001029
[LightGBM] [Info] Start training from score 0.001029
INFO:__main__:Elapsed Time: 351.84 sec
INFO:__main__:Validation Metric: 0.11437848050873242
```

- GPU (Intel UHD Graphics 770)
```
[LightGBM] [Info] Number of positive: 35018, number of negative: 34982
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 255000
[LightGBM] [Info] Number of data points in the train set: 70000, number of used features: 1000
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: Intel(R) UHD Graphics 770, Vendor: Intel(R) Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 1000 dense feature groups (66.76 MB) transferred to GPU in 0.020182 secs. 0 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500257 -> initscore=0.001029
[LightGBM] [Info] Start training from score 0.001029
INFO:__main__:Elapsed Time: 407.22 sec
INFO:__main__:Validation Metric: 0.11342810052390678
```

- GPU (NVIDIA GeForce RTX 3090)
```
[LightGBM] [Info] Number of positive: 35018, number of negative: 34982
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 255000
[LightGBM] [Info] Number of data points in the train set: 70000, number of used features: 1000
[LightGBM] [Info] Using requested OpenCL platform 1 device 0
[LightGBM] [Info] Using GPU Device: NVIDIA GeForce RTX 3090, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 1000 dense feature groups (66.76 MB) transferred to GPU in 0.024623 secs. 0 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500257 -> initscore=0.001029
[LightGBM] [Info] Start training from score 0.001029
INFO:__main__:Elapsed Time: 250.90 sec
INFO:__main__:Validation Metric: 0.11437778313272819
```

- CatBoost (GPU)
```
INFO:__main__:Elapsed Time: 57.76 sec
4999:	learn: 0.0580361	test: 0.0991410	best: 0.0991401 (4998)	total: 55.2s	remaining: 0us
bestTest = 0.09914007161
bestIteration = 4998
INFO:__main__:Validation Metric: 0.09914105790309861
```
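To make the four timings above easier to compare, the following sketch computes each run's speedup relative to the LightGBM CPU run. The numbers are copied from the logs; the names `elapsed_sec`, `baseline`, and `speedup` are introduced here for illustration only. Note that the CatBoost run uses different hyperparameters (for example `depth` and `subsample`), so its ratio is not a like-for-like comparison.

```python
# Elapsed times (seconds) copied from the logs above.
elapsed_sec = {
    "LightGBM CPU": 351.84,
    "LightGBM GPU (Intel UHD Graphics 770)": 407.22,
    "LightGBM GPU (NVIDIA GeForce RTX 3090)": 250.90,
    "CatBoost GPU": 57.76,
}

# Use the LightGBM CPU run as the reference point.
baseline = elapsed_sec["LightGBM CPU"]

# A ratio above 1 means faster than the CPU run, below 1 means slower.
speedup = {name: baseline / sec for name, sec in elapsed_sec.items()}

for name, ratio in speedup.items():
    print(f"{name}: {elapsed_sec[name]:.2f} sec, {ratio:.2f}x vs. CPU")
```

On these numbers the integrated Intel GPU is slower than the CPU, the RTX 3090 is roughly 1.4 times faster, and CatBoost on the GPU is roughly 6 times faster.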

In [10]:
import sys
import time
import logging
from contextlib import contextmanager

import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

from catboost import CatBoostClassifier, Pool



In [2]:
## Running on CPU

In [3]:
LOGGER = logging.getLogger(__name__)


@contextmanager
def timeit():
    """Context manager that measures elapsed time and logs it."""
    start = time.time()
    yield
    end = time.time()
    elapsed = end - start
    LOGGER.info(f'Elapsed Time: {elapsed:.2f} sec')


def main():
    logging.basicConfig(level=logging.INFO,
                        stream=sys.stderr,
                        )

    # Parameters for generating a synthetic labeled dataset
    dist_args = {
        # Number of samples
        'n_samples': 100_000,
        # Number of features
        'n_features': 1_000,
        # Number of informative features among them
        'n_informative': 100,
        # No redundant or repeated features
        'n_redundant': 0,
        'n_repeated': 0,
        # Task difficulty (class separation)
        'class_sep': 0.65,
        # Binary classification
        'n_classes': 2,
        # Random seed used for generation
        'random_state': 42,
        # Do not shuffle the feature order (the leading columns are the informative ones)
        'shuffle': False,
    }
    # Generate the labeled data
    train_x, train_y = make_classification(**dist_args)
    # Split the dataset into training and validation sets
    x_tr, x_val, y_tr, y_val = train_test_split(train_x, train_y,
                                                test_size=0.3,
                                                shuffle=True,
                                                random_state=42,
                                                stratify=train_y)
    # Convert to the Dataset format that LightGBM uses
    train_pool = lgb.Dataset(x_tr, label=y_tr)
    valid_pool = lgb.Dataset(x_val, label=y_val)
    # Training parameters
    params = {
        # Task and loss function
        'objective': 'binary',
        # Learning rate
        'learning_rate': 0.02,
        # Number of boosting rounds
        'num_boost_round': 5_000,
        # Stop training if the validation loss has not improved for the given number of rounds
        # NOTE: not used here so that the number of rounds stays the same across runs
        # 'early_stopping_rounds': 100,
        # Random seed
        'random_state': 42,
        # To train on a GPU
        # 'device': 'gpu',
    }
    # Train the model
    with timeit():
        model = lgb.train(params,
                          train_pool,
                          valid_sets=[valid_pool]
                          )

    # Predict on the validation data
    y_pred = model.predict(x_val)
    # Check the logistic loss
    metric = log_loss(y_val, y_pred)
    LOGGER.info(f'Validation Metric: {metric}')


main()

[LightGBM] [Info] Number of positive: 35018, number of negative: 34982
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.366908 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 255000
[LightGBM] [Info] Number of data points in the train set: 70000, number of used features: 1000
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500257 -> initscore=0.001029
[LightGBM] [Info] Start training from score 0.001029


INFO:__main__:Elapsed Time: 351.84 sec
INFO:__main__:Validation Metric: 0.11437848050873242
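The CPU log above suggests setting `force_col_wise=true` to skip the automatic row-wise/col-wise test. A minimal sketch of how that could be added to the parameter dict from the CPU cell; `base_params` and `cpu_params` are names introduced here for illustration. Since the reported overhead was about 0.37 sec out of roughly 352 sec, this should not be expected to change the benchmark result noticeably.

```python
# Base parameters copied from the CPU cell above.
base_params = {
    'objective': 'binary',
    'learning_rate': 0.02,
    'num_boost_round': 5_000,
    'random_state': 42,
}

# Pin the histogram-building strategy to col-wise so that LightGBM
# does not spend time testing which strategy is faster at startup.
cpu_params = {**base_params, 'force_col_wise': True}

print(cpu_params)
```

The resulting dict can be passed to `lgb.train` in place of `params`.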


In [4]:
## Running on GPU

In [5]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

LOGGER = logging.getLogger(__name__)


@contextmanager
def timeit():
    """Context manager that measures elapsed time and logs it."""
    start = time.time()
    yield
    end = time.time()
    elapsed = end - start
    LOGGER.info(f'Elapsed Time: {elapsed:.2f} sec')


def main():
    logging.basicConfig(level=logging.INFO,
                        stream=sys.stderr,
                        )

    # Parameters for generating a synthetic labeled dataset
    dist_args = {
        # Number of samples
        'n_samples': 100_000,
        # Number of features
        'n_features': 1_000,
        # Number of informative features among them
        'n_informative': 100,
        # No redundant or repeated features
        'n_redundant': 0,
        'n_repeated': 0,
        # Task difficulty (class separation)
        'class_sep': 0.65,
        # Binary classification
        'n_classes': 2,
        # Random seed used for generation
        'random_state': 42,
        # Do not shuffle the feature order (the leading columns are the informative ones)
        'shuffle': False,
    }
    # Generate the labeled data
    train_x, train_y = make_classification(**dist_args)
    # Split the dataset into training and validation sets
    x_tr, x_val, y_tr, y_val = train_test_split(train_x, train_y,
                                                test_size=0.3,
                                                shuffle=True,
                                                random_state=42,
                                                stratify=train_y)
    # Convert to the Dataset format that LightGBM uses
    train_pool = lgb.Dataset(x_tr, label=y_tr)
    valid_pool = lgb.Dataset(x_val, label=y_val)
    # Training parameters
    params = {
        # Task and loss function
        'objective': 'binary',
        # Learning rate
        'learning_rate': 0.02,
        # Number of boosting rounds
        'num_boost_round': 5_000,
        # Stop training if the validation loss has not improved for the given number of rounds
        # NOTE: not used here so that the number of rounds stays the same across runs
        # 'early_stopping_rounds': 100,
        # Random seed
        'random_state': 42,
        # Train on a GPU through OpenCL
        'device': 'gpu',
        # OpenCL platform 0, device 0 (Intel UHD Graphics 770 on this machine)
        'gpu_platform_id': 0,
        'gpu_device_id': 0,
    }
    # Train the model
    with timeit():
        model = lgb.train(params,
                          train_pool,
                          valid_sets=[valid_pool],
                          )
    # Save the trained model
    model.save_model('model_gpu.txt')
    # Predict on the validation data
    y_pred = model.predict(x_val)
    # Check the logistic loss
    metric = log_loss(y_val, y_pred)
    LOGGER.info(f'Validation Metric: {metric}')



main()


[LightGBM] [Info] Number of positive: 35018, number of negative: 34982
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 255000
[LightGBM] [Info] Number of data points in the train set: 70000, number of used features: 1000
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: Intel(R) UHD Graphics 770, Vendor: Intel(R) Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 1000 dense feature groups (66.76 MB) transferred to GPU in 0.020182 secs. 0 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500257 -> initscore=0.001029
[LightGBM] [Info] Start training from score 0.001029


INFO:__main__:Elapsed Time: 407.22 sec
INFO:__main__:Validation Metric: 0.11342810052390678


In [6]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

LOGGER = logging.getLogger(__name__)


@contextmanager
def timeit():
    """Context manager that measures elapsed time and logs it."""
    start = time.time()
    yield
    end = time.time()
    elapsed = end - start
    LOGGER.info(f'Elapsed Time: {elapsed:.2f} sec')


def main():
    logging.basicConfig(level=logging.INFO,
                        stream=sys.stderr,
                        )

    # Parameters for generating a synthetic labeled dataset
    dist_args = {
        # Number of samples
        'n_samples': 100_000,
        # Number of features
        'n_features': 1_000,
        # Number of informative features among them
        'n_informative': 100,
        # No redundant or repeated features
        'n_redundant': 0,
        'n_repeated': 0,
        # Task difficulty (class separation)
        'class_sep': 0.65,
        # Binary classification
        'n_classes': 2,
        # Random seed used for generation
        'random_state': 42,
        # Do not shuffle the feature order (the leading columns are the informative ones)
        'shuffle': False,
    }
    # Generate the labeled data
    train_x, train_y = make_classification(**dist_args)
    # Split the dataset into training and validation sets
    x_tr, x_val, y_tr, y_val = train_test_split(train_x, train_y,
                                                test_size=0.3,
                                                shuffle=True,
                                                random_state=42,
                                                stratify=train_y)
    # Convert to the Dataset format that LightGBM uses
    train_pool = lgb.Dataset(x_tr, label=y_tr)
    valid_pool = lgb.Dataset(x_val, label=y_val)
    # Training parameters
    params = {
        # Task and loss function
        'objective': 'binary',
        # Learning rate
        'learning_rate': 0.02,
        # Number of boosting rounds
        'num_boost_round': 5_000,
        # Stop training if the validation loss has not improved for the given number of rounds
        # NOTE: not used here so that the number of rounds stays the same across runs
        # 'early_stopping_rounds': 100,
        # Random seed
        'random_state': 42,
        # Train on a GPU through OpenCL
        'device': 'gpu',
        # OpenCL platform 1, device 0 (NVIDIA GeForce RTX 3090 on this machine)
        'gpu_platform_id': 1,
        'gpu_device_id': 0,
    }
    # Train the model
    with timeit():
        model = lgb.train(params,
                          train_pool,
                          valid_sets=[valid_pool],
                          )
    # Save the trained model
    model.save_model('model_gpu.txt')
    # Predict on the validation data
    y_pred = model.predict(x_val)
    # Check the logistic loss
    metric = log_loss(y_val, y_pred)
    LOGGER.info(f'Validation Metric: {metric}')



main()


[LightGBM] [Info] Number of positive: 35018, number of negative: 34982
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 255000
[LightGBM] [Info] Number of data points in the train set: 70000, number of used features: 1000
[LightGBM] [Info] Using requested OpenCL platform 1 device 0
[LightGBM] [Info] Using GPU Device: NVIDIA GeForce RTX 3090, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 1000 dense feature groups (66.76 MB) transferred to GPU in 0.024623 secs. 0 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500257 -> initscore=0.001029
[LightGBM] [Info] Start training from score 0.001029


INFO:__main__:Elapsed Time: 250.90 sec
INFO:__main__:Validation Metric: 0.11437778313272819
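The elapsed times in the cells above come from the `timeit` helper, which wraps the whole `lgb.train` call and logs only if the block finishes without an exception. Below is a sketch of a slightly more defensive variant, assuming one wants the time logged even when training fails; `timeit_safe` and its `label` argument are hypothetical names, not part of the original notebook.

```python
import logging
import time
from contextlib import contextmanager

LOGGER = logging.getLogger(__name__)


@contextmanager
def timeit_safe(label='Elapsed Time'):
    """Log the wall-clock time of the wrapped block, even if it raises."""
    # perf_counter() is a monotonic clock intended for measuring intervals.
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs on both normal exit and exceptions; the exception still propagates.
        elapsed = time.perf_counter() - start
        LOGGER.info(f'{label}: {elapsed:.2f} sec')


logging.basicConfig(level=logging.INFO)
with timeit_safe('Dummy workload'):
    sum(range(1_000_000))
```

It can be used as a drop-in replacement: `with timeit_safe(): model = lgb.train(...)`.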


## CatBoost

In [11]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

LOGGER = logging.getLogger(__name__)

@contextmanager
def timeit():
    """Context manager that measures elapsed time and logs it."""
    start = time.time()
    yield
    end = time.time()
    elapsed = end - start
    LOGGER.info(f'Elapsed Time: {elapsed:.2f} sec')

def main():
    logging.basicConfig(level=logging.INFO,
                      stream=sys.stderr,
                      )

    # Parameters for generating a synthetic labeled dataset
    dist_args = {
        # Number of samples
        'n_samples': 100_000,
        # Number of features
        'n_features': 1_000,
        # Number of informative features among them
        'n_informative': 100,
        # No redundant or repeated features
        'n_redundant': 0,
        'n_repeated': 0,
        # Task difficulty (class separation)
        'class_sep': 0.65,
        # Binary classification
        'n_classes': 2,
        # Random seed used for generation
        'random_state': 42,
        # Do not shuffle the feature order (the leading columns are the informative ones)
        'shuffle': False,
    }
    # Generate the labeled data
    train_x, train_y = make_classification(**dist_args)
    # Split the dataset into training and validation sets
    x_tr, x_val, y_tr, y_val = train_test_split(train_x, train_y,
                                                test_size=0.3,
                                                shuffle=True,
                                                random_state=42,
                                                stratify=train_y)

    # Build the CatBoost Pool objects
    train_pool = Pool(x_tr, label=y_tr)
    valid_pool = Pool(x_val, label=y_val)

    # Training parameters
    params = {
        # Task and loss function
        'loss_function': 'Logloss',
        # Learning rate
        'learning_rate': 0.02,
        # Number of boosting rounds
        'iterations': 5_000,
        # Random seed
        'random_seed': 42,
        # Evaluation metric
        'eval_metric': 'Logloss',
        # GPU settings
        'task_type': 'GPU',
        # CUDA device IDs separated by ':' (devices 0 and 1);
        # this is not an OpenCL platform:device pair
        'devices': '0:1',
        # Other training settings
        'bootstrap_type': 'Bernoulli',
        'subsample': 0.8,
        'depth': 6,
        'l2_leaf_reg': 3.0,
        'verbose': 100  # Print progress every 100 iterations
    }

    # Train the model
    with timeit():
        model = CatBoostClassifier(**params)
        model.fit(train_pool,
                  eval_set=valid_pool,
                  use_best_model=False)  # Keep all iterations instead of truncating to the best one

    # Save the model
    model.save_model('model_gpu.cbm')

    # Predict on the validation data
    y_pred = model.predict_proba(x_val)[:, 1]  # Probability of class 1
    # Check the logistic loss
    metric = log_loss(y_val, y_pred)
    LOGGER.info(f'Validation Metric: {metric}')

main()

0:	learn: 0.6917632	test: 0.6918459	best: 0.6918459 (0)	total: 25.7ms	remaining: 2m 8s
100:	learn: 0.5876281	test: 0.5928320	best: 0.5928320 (100)	total: 1.19s	remaining: 57.7s
200:	learn: 0.5150117	test: 0.5240579	best: 0.5240579 (200)	total: 2.33s	remaining: 55.5s
300:	learn: 0.4590706	test: 0.4709485	best: 0.4709485 (300)	total: 3.48s	remaining: 54.4s
400:	learn: 0.4139224	test: 0.4279887	best: 0.4279887 (400)	total: 4.62s	remaining: 53s
500:	learn: 0.3767624	test: 0.3929598	best: 0.3929598 (500)	total: 5.76s	remaining: 51.7s
600:	learn: 0.3452963	test: 0.3632068	best: 0.3632068 (600)	total: 6.94s	remaining: 50.8s
700:	learn: 0.3171706	test: 0.3365992	best: 0.3365992 (700)	total: 8.13s	remaining: 49.8s
800:	learn: 0.2929540	test: 0.3137095	best: 0.3137095 (800)	total: 9.26s	remaining: 48.6s
900:	learn: 0.2717450	test: 0.2937264	best: 0.2937264 (900)	total: 10.4s	remaining: 47.4s
1000:	learn: 0.2528601	test: 0.2759136	best: 0.2759136 (1000)	total: 11.6s	remaining: 46.2s
1100:	learn: 

INFO:__main__:Elapsed Time: 57.76 sec


4999:	learn: 0.0580361	test: 0.0991410	best: 0.0991401 (4998)	total: 55.2s	remaining: 0us
bestTest = 0.09914007161
bestIteration = 4998


INFO:__main__:Validation Metric: 0.09914105790309861
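The validation metric logged above (0.09914105790309861) agrees with CatBoost's final `test` value (0.0991410) because both are the mean binary cross-entropy over the same validation set. As a rough illustration of what that metric computes, here is a plain-Python sketch; `binary_log_loss` and the toy inputs are made up for this example and are not taken from the notebook.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred holds predicted P(class == 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Toy labels and predicted probabilities (illustrative values only).
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.1, 0.8, 0.3]
loss = binary_log_loss(y_true, p_pred)
print(f'log loss: {loss:.6f}')
```

Confident, correct predictions contribute values near zero, while confident mistakes are penalized heavily, which is why the metric keeps decreasing over the 5,000 iterations in the log.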


## Checking OpenCL

In [7]:
import pyopencl as cl

# Get all OpenCL platforms
platforms = cl.get_platforms()

# Print the details of each platform
for platform_id, platform in enumerate(platforms):
    print(f"Platform ID: {platform_id}, Name: {platform.name}")
    # Get the devices within each platform
    devices = platform.get_devices()
    for device_id, device in enumerate(devices):
        print(f"  Device ID: {device_id}, Name: {device.name}, Type: {cl.device_type.to_string(device.type)}")

Platform ID: 0, Name: Intel(R) OpenCL Graphics
  Device ID: 0, Name: Intel(R) UHD Graphics 770, Type: ALL | GPU
Platform ID: 1, Name: NVIDIA CUDA
  Device ID: 0, Name: NVIDIA GeForce RTX 3090, Type: ALL | GPU
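The platform and device IDs printed above are the values passed to LightGBM as `gpu_platform_id` and `gpu_device_id` in the GPU cells. The sketch below looks them up by device name instead of hard-coding them. It assumes LightGBM enumerates OpenCL platforms in the same order as pyopencl, which the logs in this notebook are consistent with; `find_lightgbm_gpu_ids` is a hypothetical helper, and the `platforms` list simply restates the output above so that the example runs without pyopencl.

```python
def find_lightgbm_gpu_ids(platforms, name_fragment):
    """Return LightGBM GPU params for the first device whose name matches.

    platforms: list of (platform_name, [device_name, ...]) in enumeration order.
    """
    for platform_id, (_, device_names) in enumerate(platforms):
        for device_id, device_name in enumerate(device_names):
            if name_fragment.lower() in device_name.lower():
                return {'gpu_platform_id': platform_id,
                        'gpu_device_id': device_id}
    raise LookupError(f'No OpenCL device matching {name_fragment!r}')


# Platform/device listing restated from the pyopencl output above.
platforms = [
    ('Intel(R) OpenCL Graphics', ['Intel(R) UHD Graphics 770']),
    ('NVIDIA CUDA', ['NVIDIA GeForce RTX 3090']),
]

gpu_ids = find_lightgbm_gpu_ids(platforms, 'RTX 3090')
print(gpu_ids)  # {'gpu_platform_id': 1, 'gpu_device_id': 0}
```

The returned dict could be merged into the training parameters, for example `params = {**params, 'device': 'gpu', **gpu_ids}`.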


In [8]:
device

<pyopencl.Device 'NVIDIA GeForce RTX 3090' on 'NVIDIA CUDA' at 0x226a9ab0280>