#### Category Boosting (CatBoost)
##### Another fast, high-performing GBM derivative that can handle categorical variables automatically.
###### Categorical variable support
###### Fast and scalable GPU support
###### More accurate predictions
###### Fast training and fast prediction
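The categorical variable support listed above is usually attributed to ordered target statistics: each category value is replaced by a smoothed mean of the target, computed only from rows that come earlier in a random permutation. The following is a minimal sketch of that idea in plain Python, not CatBoost's actual implementation. The function name `ordered_target_stats`, the `prior_weight` parameter, the use of the global target mean as the prior, and the toy data are all assumptions made for illustration.

```python
import random


def ordered_target_stats(categories, targets, prior_weight=1.0, seed=0):
    """Encode each row's category using only rows that precede it in a random order.

    Simplified illustration (assumption): the prior is the global target mean.
    """
    n = len(categories)
    prior = sum(targets) / n             # global target mean used as the prior
    order = list(range(n))
    random.Random(seed).shuffle(order)   # random permutation of row indices

    sums, counts = {}, {}                # running target sum and count per category
    encoded = [0.0] * n
    for idx in order:
        cat = categories[idx]
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Smoothed mean of the targets seen so far for this category.
        encoded[idx] = (s + prior_weight * prior) / (c + prior_weight)
        # Only now add the current row, so a row never sees its own target.
        sums[cat] = s + targets[idx]
        counts[cat] = c + 1
    return encoded


# Toy data (hypothetical): a two-level categorical feature and a numeric target.
leagues = ["A", "N", "A", "N", "A"]
salaries = [100.0, 200.0, 300.0, 400.0, 500.0]
encoded = ordered_target_stats(leagues, salaries)
print(encoded)
```

The first row seen for each category receives the prior alone, which is why rare categories are pulled toward the global mean. In the notebook below the categorical columns are one-hot encoded with `pd.get_dummies` instead, so this mechanism is not exercised there.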

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale
from sklearn.preprocessing import StandardScaler
from sklearn import model_selection
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import neighbors
from sklearn.svm import SVR

In [2]:
import warnings 
warnings.filterwarnings('ignore')

In [3]:
df = pd.read_csv("Hitters.csv")
df = df.dropna()
dms = pd.get_dummies(df[["League", "Division", "NewLeague"]])
y = df["Salary"]
X_ = df.drop(["Salary", "League", "Division", "NewLeague"], axis=1).astype("float64")
X = pd.concat([X_, dms[["League_N", "Division_W", "NewLeague_N"]]], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.25,
                                                    random_state=42)

In [4]:
!pip install catboost


In [5]:
from catboost import CatBoostRegressor

In [6]:
catb_model = CatBoostRegressor().fit(X_train, y_train)

Learning rate set to 0.031674
0:	learn: 437.6430699	total: 152ms	remaining: 2m 31s
1:	learn: 431.3923642	total: 157ms	remaining: 1m 18s
2:	learn: 424.8820360	total: 162ms	remaining: 53.9s
3:	learn: 418.2514904	total: 165ms	remaining: 41s
4:	learn: 412.6394021	total: 167ms	remaining: 33.2s
5:	learn: 406.6247020	total: 169ms	remaining: 28s
6:	learn: 400.5321206	total: 171ms	remaining: 24.3s
7:	learn: 394.6683437	total: 173ms	remaining: 21.5s
8:	learn: 388.2496484	total: 176ms	remaining: 19.4s
9:	learn: 382.9448842	total: 178ms	remaining: 17.6s
10:	learn: 377.2600080	total: 180ms	remaining: 16.2s
11:	learn: 372.4829606	total: 182ms	remaining: 15s
12:	learn: 366.6823437	total: 185ms	remaining: 14s
13:	learn: 362.6076230	total: 187ms	remaining: 13.1s
14:	learn: 358.0107745	total: 189ms	remaining: 12.4s
15:	learn: 353.2802665	total: 191ms	remaining: 11.7s
16:	learn: 348.5646265	total: 192ms	remaining: 11.1s
17:	learn: 343.6407912	total: 194ms	remaining: 10.6s
18:	learn: 339.2363847	[output truncated]

In [8]:
y_pred = catb_model.predict(X_test)

In [9]:
np.sqrt(mean_squared_error(y_test, y_pred))

351.194631344607
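The value above is the root mean squared error (RMSE) on the test set, expressed in the same units as `Salary`. As a sketch of what `np.sqrt(mean_squared_error(...))` computes, the same quantity can be written in plain Python. The helper name `rmse` and the four toy values are illustrative assumptions, not data from Hitters.csv.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values (hypothetical): residuals are 1, -1, 3, -3.
actual = [500.0, 750.0, 300.0, 1000.0]
predicted = [499.0, 751.0, 297.0, 1003.0]
value = rmse(actual, predicted)
print(value)
```

Because the residuals are squared before averaging, a few large misses dominate the score, which matters for a skewed target such as salary.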

#### Model Tuning

In [11]:
catb_params = {"iterations": [200, 500, 1000],
               "learning_rate": [0.01, 0.1],
               "depth": [3, 6, 8]}

In [12]:
catb_model = CatBoostRegressor()

In [13]:
catb_cv_model = GridSearchCV(catb_model, catb_params, cv=10, n_jobs=-1, verbose=2).fit(X_train, y_train)

Fitting 10 folds for each of 18 candidates, totalling 180 fits
0:	learn: 425.7900818	total: 579us	remaining: 115ms
1:	learn: 404.8723520	total: 1.23ms	remaining: 122ms
2:	learn: 387.4057666	total: 1.66ms	remaining: 109ms
3:	learn: 372.2801584	total: 2.1ms	remaining: 103ms
4:	learn: 358.9204229	total: 2.52ms	remaining: 98.5ms
5:	learn: 347.0083933	total: 2.93ms	remaining: 94.8ms
6:	learn: 336.0130818	total: 3.35ms	remaining: 92.3ms
7:	learn: 324.3923300	total: 3.76ms	remaining: 90.3ms
8:	learn: 314.8690957	total: 4.18ms	remaining: 88.7ms
9:	learn: 308.5075563	total: 4.59ms	remaining: 87.3ms
10:	learn: 298.8587285	total: 5ms	remaining: 85.9ms
11:	learn: 294.7655438	total: 5.42ms	remaining: 85ms
12:	learn: 288.0697862	total: 5.84ms	remaining: 84ms
13:	learn: 282.6697154	total: 6.25ms	remaining: 83.1ms
14:	learn: 277.6121667	total: 6.66ms	remaining: 82.1ms
15:	learn: 273.4383979	total: 7.09ms	remaining: 81.5ms
16:	learn: 269.1556201	total: 7.5ms	remaining: 80.8ms
17:	learn: 264.8098704	[output truncated]

In [14]:
catb_cv_model.best_params_

{'depth': 3, 'iterations': 200, 'learning_rate': 0.1}
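The search reported 18 candidates and 180 fits. Those counts follow directly from the grid: every combination of the three parameter lists is tried, and each combination is fitted once per fold. The sketch below reproduces the enumeration with `itertools.product`; it only mirrors the counting, not the internals of `GridSearchCV`, and the names `candidates`, `n_folds`, and `total_fits` are illustrative.

```python
from itertools import product

# Same grid as in the notebook.
catb_params = {"iterations": [200, 500, 1000],
               "learning_rate": [0.01, 0.1],
               "depth": [3, 6, 8]}

# Cartesian product of all parameter values: one dict per candidate.
candidates = [dict(zip(catb_params, combo))
              for combo in product(*catb_params.values())]

n_folds = 10                              # cv=10 in the GridSearchCV call
total_fits = len(candidates) * n_folds    # one fit per candidate per fold

print(len(candidates), total_fits)
```

The grid grows multiplicatively, so adding one more value to a single list raises the number of fits by a full factor, which is worth keeping in mind before widening the search.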

In [19]:
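# Note: the grid search above selected iterations=200, but this cell fits with
# iterations=500, so the score below is not for the exact best_params_ configuration.
catb_tuned = CatBoostRegressor(depth=3, iterations=500, learning_rate=0.1).fit(X_train, y_train)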

0:	learn: 425.7900818	total: 663us	remaining: 331ms
1:	learn: 404.8723520	total: 1.35ms	remaining: 338ms
2:	learn: 387.4057666	total: 1.73ms	remaining: 286ms
3:	learn: 372.2801584	total: 2.19ms	remaining: 272ms
4:	learn: 358.9204229	total: 2.71ms	remaining: 268ms
5:	learn: 347.0083933	total: 3.15ms	remaining: 259ms
6:	learn: 336.0130818	total: 3.87ms	remaining: 272ms
7:	learn: 324.3923300	total: 4.56ms	remaining: 280ms
8:	learn: 314.8690957	total: 5.83ms	remaining: 318ms
9:	learn: 308.5075563	total: 6.68ms	remaining: 327ms
10:	learn: 298.8587285	total: 7.26ms	remaining: 323ms
11:	learn: 294.7655438	total: 7.89ms	remaining: 321ms
12:	learn: 288.0697862	total: 8.4ms	remaining: 315ms
13:	learn: 282.6697154	total: 8.89ms	remaining: 309ms
14:	learn: 277.6121667	total: 9.43ms	remaining: 305ms
15:	learn: 273.4383979	total: 9.87ms	remaining: 298ms
16:	learn: 269.1556201	total: 10.3ms	remaining: 294ms
17:	learn: 264.8098704	total: 10.8ms	remaining: 290ms
18:	learn: 261.6700768	total: 11.2ms	[output truncated]

In [21]:
y_pred = catb_tuned.predict(X_test)

In [22]:
np.sqrt(mean_squared_error(y_test, y_pred))

336.40041748521486
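To put the two test scores side by side, the short calculation below compares the RMSE of the default model with that of the tuned model, using the two values printed in this notebook. The variable names are illustrative.

```python
# Test RMSE values copied from the notebook outputs.
rmse_default = 351.194631344607
rmse_tuned = 336.40041748521486

absolute_gain = rmse_default - rmse_tuned        # reduction in RMSE
relative_gain = absolute_gain / rmse_default     # reduction as a fraction

print(f"Absolute reduction: {absolute_gain:.2f}")
print(f"Relative reduction: {relative_gain:.1%}")
```

The reduction is roughly 4% on a single train/test split with `random_state=42`. A difference of this size may not hold on another split, so it is best read as an indication rather than a firm result.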