## <strong>Submit your first Numerai predictions with CatBoost</strong>

This notebook uses CatBoost, one of the major gradient boosting libraries on Kaggle,
to make a first set of Numerai predictions and walk through submitting them.

In [1]:
# Install the required libraries
!pip install numerapi
!pip install catboost

Collecting numerapi
  Downloading numerapi-2.9.2-py3-none-any.whl (25 kB)
Installing collected packages: numerapi
Successfully installed numerapi-2.9.2
Collecting catboost
  Downloading catboost-1.0.0-cp37-none-manylinux1_x86_64.whl (76.4 MB)
     |████████████████████████████████| 76.4 MB 45 kB/s
Installing collected packages: catboost
Successfully installed catboost-1.0.0


## Importing the required libraries

In [2]:
import os
import numpy as np

# Module that simplifies exchanging data with Numerai
import numerapi
import pandas as pd

# This is a regression task, so import CatBoostRegressor
from catboost import CatBoostRegressor

## Downloading the dataset with numerapi

In [3]:
# numerapi makes downloading the dataset straightforward
# Create the client instance (setup needed before using numerapi)
napi = numerapi.NumerAPI(verbosity="info")

# Download the dataset for the current round and unzip it.
napi.download_current_dataset(unzip=True)

2021-10-09 18:00:29,754 INFO numerapi.utils: starting download
./numerai_dataset_285.zip: 425MB [00:11, 38.1MB/s]                           
2021-10-09 18:00:40,917 INFO numerapi.base_api: unzipping file...


'./numerai_dataset_285.zip'

## Preparation: getting the current tournament round number

In [4]:
# numerai_dataset_<round>/numerai_training_data.csv is the training data file
# numerai_dataset_<round>/numerai_tournament_data.csv is the tournament data file

# First, get the current tournament round number (an int)
current_ds = napi.get_current_round()
print(current_ds)

# Build the directory name, e.g. numerai_dataset_285
latest_round = f"numerai_dataset_{current_ds}"
print(latest_round)

285
numerai_dataset_285


## Loading the training and tournament data

In [5]:
print("# Loading data...")
# Read the training data from CSV; set_index("id") makes the id column the index.
training_data = pd.read_csv(os.path.join(latest_round, "numerai_training_data.csv")).set_index("id")

# Read the tournament data from CSV.
tournament_data = pd.read_csv(os.path.join(latest_round, "numerai_tournament_data.csv")).set_index("id")

# Every column whose name contains "feature" is used as a model input.
feature_names = [f for f in training_data.columns if "feature" in f]

# Loading data...


## Inspecting the data

In [6]:
training_data.head()
training_data.shape

(501808, 313)

In [7]:
tournament_data.tail()
tournament_data

id,era,data_type,feature_intelligence1,feature_intelligence2,feature_intelligence3,feature_intelligence4,feature_intelligence5,feature_intelligence6,feature_intelligence7,feature_intelligence8,feature_intelligence9,feature_intelligence10,feature_intelligence11,feature_intelligence12,feature_charisma1,feature_charisma2,feature_charisma3,feature_charisma4,feature_charisma5,feature_charisma6,feature_charisma7,feature_charisma8,feature_charisma9,feature_charisma10,feature_charisma11,feature_charisma12,feature_charisma13,feature_charisma14,feature_charisma15,feature_charisma16,feature_charisma17,feature_charisma18,feature_charisma19,feature_charisma20,feature_charisma21,feature_charisma22,feature_charisma23,feature_charisma24,feature_charisma25,feature_charisma26,...,feature_wisdom8,feature_wisdom9,feature_wisdom10,feature_wisdom11,feature_wisdom12,feature_wisdom13,feature_wisdom14,feature_wisdom15,feature_wisdom16,feature_wisdom17,feature_wisdom18,feature_wisdom19,feature_wisdom20,feature_wisdom21,feature_wisdom22,feature_wisdom23,feature_wisdom24,feature_wisdom25,feature_wisdom26,feature_wisdom27,feature_wisdom28,feature_wisdom29,feature_wisdom30,feature_wisdom31,feature_wisdom32,feature_wisdom33,feature_wisdom34,feature_wisdom35,feature_wisdom36,feature_wisdom37,feature_wisdom38,feature_wisdom39,feature_wisdom40,feature_wisdom41,feature_wisdom42,feature_wisdom43,feature_wisdom44,feature_wisdom45,feature_wisdom46,target
n0003aa52cab36c2,era121,validation,0.25,0.75,0.50,0.50,0.00,0.75,0.50,0.25,0.50,0.50,0.25,0.00,0.25,0.50,0.25,0.00,0.25,1.00,1.00,0.25,1.00,1.00,0.25,0.25,0.00,0.50,0.25,0.75,0.00,0.50,0.25,0.25,0.25,0.50,0.00,0.50,1.00,0.25,...,0.00,0.00,0.25,0.50,0.25,0.25,0.00,0.25,0.00,0.25,0.50,0.50,0.50,0.50,0.00,0.25,0.75,0.25,0.25,0.50,0.25,0.00,0.25,0.50,0.25,0.50,0.25,0.25,1.00,0.75,0.75,0.75,1.00,0.75,0.50,0.50,1.00,0.00,0.00,0.25
n000920ed083903f,era121,validation,0.75,0.50,0.75,1.00,0.50,0.00,0.00,0.75,0.25,0.00,0.75,0.50,0.00,0.25,0.50,0.00,1.00,0.25,0.25,1.00,1.00,0.25,0.75,0.00,0.00,0.75,1.00,1.00,0.00,0.25,0.00,0.00,0.25,0.25,0.25,0.00,1.00,0.25,...,0.50,0.50,0.25,1.00,0.50,0.25,0.00,0.25,0.50,0.25,1.00,0.25,0.00,0.50,0.75,0.75,0.50,1.00,1.00,0.25,0.50,0.25,0.50,0.50,0.50,0.50,0.25,0.25,0.75,0.50,0.50,0.50,0.75,1.00,0.75,0.50,0.50,0.50,0.50,0.50
n0038e640522c4a6,era121,validation,1.00,0.00,0.00,1.00,1.00,1.00,1.00,1.00,0.50,0.50,1.00,1.00,1.00,0.75,0.50,0.50,1.00,1.00,0.50,0.50,0.00,1.00,0.50,1.00,0.50,1.00,0.50,1.00,0.25,1.00,1.00,1.00,0.50,1.00,1.00,0.75,1.00,1.00,...,0.25,0.50,0.00,0.00,0.00,0.25,0.25,0.00,0.50,0.00,0.00,0.00,0.25,0.00,0.25,0.50,0.00,0.00,0.00,0.00,0.00,0.00,0.50,0.00,0.75,0.00,0.00,0.25,0.00,0.00,0.00,0.00,0.50,0.25,0.00,0.00,0.50,0.50,0.00,1.00
n004ac94a87dc54b,era121,validation,0.75,1.00,1.00,0.50,0.00,0.00,0.00,0.50,0.75,1.00,0.75,0.00,0.50,0.00,0.50,0.75,0.50,0.75,0.25,0.75,0.25,0.75,0.25,0.75,1.00,0.50,0.50,0.75,0.50,1.00,0.50,0.25,0.75,0.25,0.75,0.25,0.75,0.75,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.25,0.00,0.25,0.00,0.00,0.25,0.00,0.00,0.00,0.00,0.75,0.00,0.00,0.25,0.25,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.25,0.00,0.00,0.00,0.25,0.25,0.50
n0052fe97ea0c05f,era121,validation,0.25,0.50,0.50,0.25,1.00,0.50,0.50,0.25,0.25,0.50,0.50,1.00,1.00,1.00,1.00,0.75,0.50,0.50,0.50,0.75,0.00,0.00,0.00,0.25,0.00,0.00,0.75,0.25,1.00,0.25,1.00,0.75,0.00,1.00,0.75,0.75,0.75,0.25,...,0.00,0.50,0.50,0.00,0.75,0.50,0.75,0.25,0.25,0.25,0.00,0.25,0.50,0.25,1.00,1.00,1.00,0.00,0.25,0.00,0.00,0.25,0.25,0.75,1.00,1.00,0.75,0.75,0.50,0.50,0.50,0.75,0.00,0.00,0.75,1.00,0.00,0.25,1.00,0.75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
nffe489171b01328,eraX,live,0.50,1.00,0.75,0.25,0.00,1.00,1.00,0.00,0.75,0.75,0.25,0.00,0.75,0.50,0.50,0.25,0.50,1.00,0.00,0.50,0.50,1.00,0.00,0.50,0.75,0.75,0.50,0.75,1.00,0.00,0.50,0.25,0.00,0.75,0.50,0.25,0.75,0.50,...,1.00,0.75,1.00,0.50,1.00,0.25,1.00,1.00,0.00,0.00,0.00,1.00,0.75,0.00,1.00,1.00,0.75,0.00,1.00,1.00,0.00,0.00,0.00,1.00,1.00,1.00,0.75,0.75,1.00,1.00,1.00,1.00,0.00,0.00,1.00,1.00,0.00,0.00,1.00,
nffec0c25d39b4a1,eraX,live,0.00,0.25,0.25,0.00,0.25,0.00,0.00,0.00,0.50,0.25,0.25,0.25,0.25,1.00,0.25,0.75,0.50,0.25,0.00,1.00,0.25,0.50,1.00,1.00,0.75,1.00,1.00,0.75,0.50,0.50,0.50,0.25,0.75,0.75,1.00,1.00,0.25,0.50,...,1.00,0.75,1.00,0.50,1.00,1.00,1.00,0.75,1.00,1.00,1.00,1.00,1.00,1.00,0.50,0.00,0.25,0.75,0.75,1.00,0.75,1.00,1.00,0.75,0.00,0.00,1.00,1.00,0.75,0.75,0.75,0.75,1.00,1.00,0.75,0.25,0.75,1.00,1.00,
nfff0d461c3b7bac,eraX,live,0.75,0.00,0.00,1.00,0.75,0.75,0.75,1.00,0.00,0.00,0.75,0.75,0.00,0.00,0.25,0.00,0.00,0.00,0.50,0.50,1.00,0.25,0.25,0.25,1.00,0.25,0.50,0.00,0.25,0.00,0.25,0.00,0.00,0.00,0.00,0.00,0.25,0.00,...,0.50,0.00,0.00,1.00,0.00,0.00,0.00,0.50,0.00,0.25,0.75,0.25,0.00,1.00,0.00,0.25,1.00,0.75,0.75,0.50,1.00,0.25,0.00,0.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,0.00,0.50,1.00,0.00,1.00,0.25,0.00,0.00,
nfff12c576ab5df9,eraX,live,0.25,0.25,0.25,0.50,0.50,0.00,0.00,0.50,0.25,0.50,0.50,0.25,0.50,0.50,0.50,1.00,0.75,0.00,0.50,0.50,0.50,0.75,0.25,0.75,1.00,0.50,0.50,0.50,1.00,0.50,0.75,0.75,0.50,0.75,0.50,0.50,0.50,0.50,...,1.00,1.00,1.00,1.00,0.25,1.00,1.00,1.00,0.50,1.00,1.00,1.00,1.00,1.00,1.00,0.50,1.00,1.00,1.00,0.75,1.00,1.00,0.50,1.00,0.50,1.00,1.00,1.00,1.00,1.00,1.00,1.00,0.50,1.00,1.00,1.00,0.75,0.50,0.50,


In [8]:
feature_names
len(feature_names)

310
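The feature list is built by filtering column names, which leaves out `era`, `data_type`, and `target`. As a minimal sketch, the toy frame below (its ids and values are made up for illustration) applies the same kind of filter; `str.startswith` is used here as a slightly stricter variant of the substring test, which is an assumption on my part rather than something the notebook requires.

```python
import pandas as pd

# Toy frame with the same kinds of columns as the Numerai data (values are made up).
toy = pd.DataFrame(
    {
        "era": ["era1", "era1"],
        "data_type": ["train", "train"],
        "feature_intelligence1": [0.25, 0.75],
        "feature_wisdom46": [0.50, 1.00],
        "target": [0.50, 0.25],
    },
    index=pd.Index(["n001", "n002"], name="id"),
)

# Keep only the columns whose name starts with "feature".
toy_feature_names = [c for c in toy.columns if c.startswith("feature")]
print(toy_feature_names)
```

On the real data, 313 columns minus `era`, `data_type`, and `target` gives the 310 features reported above.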

## Training the model

In [9]:
# Train on the GPU
params = {
    'task_type': 'GPU'
    }
# Instantiate the model
model = CatBoostRegressor(**params)

# Train the model: model.fit(X, y), where X holds the features and y is the column to predict
model.fit(training_data[feature_names], training_data["target"])

Learning rate set to 0.091218
0:	learn: 0.2232516	total: 5.42ms	remaining: 5.42s
1:	learn: 0.2232357	total: 10.1ms	remaining: 5.03s
2:	learn: 0.2232236	total: 14.8ms	remaining: 4.92s
3:	learn: 0.2232105	total: 19.6ms	remaining: 4.89s
4:	learn: 0.2231978	total: 24.5ms	remaining: 4.88s
5:	learn: 0.2231838	total: 29.3ms	remaining: 4.85s
6:	learn: 0.2231688	total: 34.1ms	remaining: 4.83s
7:	learn: 0.2231556	total: 38.9ms	remaining: 4.82s
8:	learn: 0.2231430	total: 43.7ms	remaining: 4.81s
9:	learn: 0.2231310	total: 48.4ms	remaining: 4.79s
10:	learn: 0.2231234	total: 53.2ms	remaining: 4.78s
11:	learn: 0.2231117	total: 57.9ms	remaining: 4.77s
12:	learn: 0.2231014	total: 62.5ms	remaining: 4.75s
13:	learn: 0.2230909	total: 67.4ms	remaining: 4.74s
14:	learn: 0.2230810	total: 72ms	remaining: 4.73s
15:	learn: 0.2230719	total: 76.8ms	remaining: 4.72s
16:	learn: 0.2230619	total: 81.5ms	remaining: 4.71s
17:	learn: 0.2230523	total: 86.4ms	remaining: 4.71s
18:	learn: 0.2230432	total: 91ms	remaining: 4.

<catboost.core.CatBoostRegressor at 0x7fabff7fbcd0>
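The notebook goes straight from training to submission without checking the model. The tournament file shown earlier contains `validation` rows that carry a target, so one possible check is a rank correlation computed separately for each era. The sketch below runs on a made-up toy frame; the helper name `per_era_spearman` is hypothetical, and the result should be read as a rough sanity check rather than Numerai's official score.

```python
import pandas as pd


def per_era_spearman(df, pred_col="prediction", target_col="target", era_col="era"):
    """Return one Spearman correlation per era (hypothetical helper)."""
    scores = {}
    for era, group in df.groupby(era_col):
        # Spearman correlation is the Pearson correlation of the ranks.
        scores[era] = group[pred_col].rank().corr(group[target_col].rank())
    return pd.Series(scores)


# Toy data: predictions follow the target in era1 and are reversed in era2.
toy = pd.DataFrame(
    {
        "era": ["era1"] * 4 + ["era2"] * 4,
        "target": [0.0, 0.25, 0.75, 1.0, 0.0, 0.25, 0.75, 1.0],
        "prediction": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
    }
)

era_scores = per_era_spearman(toy)
print(era_scores)
print("mean:", era_scores.mean())
```

Once `tournament_data` has a `prediction` column, the same helper could be applied to `tournament_data[tournament_data["data_type"] == "validation"]`.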

## Making predictions

In [10]:
predictions = model.predict(tournament_data[feature_names])

In [11]:

# Store the predictions in a 'prediction' column of the DataFrame.
tournament_data['prediction'] = predictions

In [12]:
# Tournament name
TOURNAMENT_NAME = "nomi"

In [13]:
tournament_data['prediction'].to_csv(f"{TOURNAMENT_NAME}_{current_ds}_submission.csv")
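Before uploading, it can help to confirm what the CSV actually contains. The sketch below writes a toy `prediction` Series (made-up ids and values) to an in-memory buffer the same way the cell above writes to disk, then reads it back and checks it. The specific checks (an `id` and a `prediction` column, unique ids, no missing values, values between 0 and 1) are my assumptions about a sensible submission, not rules quoted from Numerai.

```python
import io

import pandas as pd

# Toy predictions indexed by id, mirroring tournament_data['prediction'].
toy_predictions = pd.Series(
    [0.48, 0.51, 0.50],
    index=pd.Index(["n001", "n002", "n003"], name="id"),
    name="prediction",
)

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
toy_predictions.to_csv(buffer)
buffer.seek(0)
check = pd.read_csv(buffer)

# Basic sanity checks on the round-tripped file.
assert list(check.columns) == ["id", "prediction"]
assert check["id"].is_unique
assert check["prediction"].notna().all()
assert check["prediction"].between(0, 1).all()
print(check)
```

To run the same checks on the real file, replace the buffer with `pd.read_csv` on the submission file name used above.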

## Submitting the predictions with your API keys

In [14]:
# Set the API keys (replace the placeholders with your own values)
public_id = "YOUR_PUBLIC_ID"
secret_key = "YOUR_SECRET_KEY"
model_id = "YOUR_MODEL_ID"
napi = numerapi.NumerAPI(public_id=public_id, secret_key=secret_key)
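Keys written directly into a notebook are easy to leak when the notebook is shared. One alternative is to read them from environment variables. The sketch below is a minimal example of that idea; the helper name and the three variable names are chosen for illustration and are assumptions, not part of the original notebook.

```python
import os


def load_numerai_credentials(env=os.environ):
    """Return (public_id, secret_key, model_id) read from a mapping.

    The variable names below are hypothetical; use whatever names you export.
    """
    names = ("NUMERAI_PUBLIC_ID", "NUMERAI_SECRET_KEY", "NUMERAI_MODEL_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    return tuple(env[name] for name in names)


# Demonstration with a plain dict standing in for the real environment.
demo_env = {
    "NUMERAI_PUBLIC_ID": "public-placeholder",
    "NUMERAI_SECRET_KEY": "secret-placeholder",
    "NUMERAI_MODEL_ID": "model-placeholder",
}
public_id_demo, secret_key_demo, model_id_demo = load_numerai_credentials(demo_env)
print(public_id_demo, model_id_demo)
```

In the notebook, calling `load_numerai_credentials()` with no argument would read the real environment, and the returned values could be passed to `numerapi.NumerAPI(public_id=..., secret_key=...)` as in the cell above.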

In [15]:
# Submit the predictions
submission_id = napi.upload_predictions(f"{TOURNAMENT_NAME}_{current_ds}_submission.csv", model_id=model_id)

2021-10-09 18:02:27,296 INFO numerapi.base_api: uploading predictions...
