TL;DR

* For a binary (two-class) classifier (`objective='binary'`), the raw score is confirmed to be the log odds.
* For a `multiclassova` classifier, each class's raw score is also confirmed to be log odds, since one-vs-all classification is effectively a set of binary classifiers, one per class.
* For a `multiclass` classifier, the raw scores are confirmed to be logits (equivalent to log odds in the binary case), which are converted to probabilities via softmax.

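Therefore, raw scores are logits in all three cases. What differs is the function that maps them to probabilities: the sigmoid for `binary` and `multiclassova`, and softmax for `multiclass`.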
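To make the "equivalent to log odds in the binary case" claim concrete, here is a minimal NumPy sketch. It does not use LightGBM. The helper names `sigmoid` and `softmax` and the values in `z` are illustrative assumptions, not taken from any trained model. It checks that a two-class softmax with one logit pinned at 0 gives the same probability as the sigmoid of the other logit.

```python
import numpy as np

def sigmoid(z):
    """Map log odds to probabilities: p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical log odds, chosen only for illustration.
z = np.array([-2.0, 0.0, 0.5, 3.0])

# Two-class logits with the reference class pinned at 0.
two_class_logits = np.column_stack([np.zeros_like(z), z])

p_sigmoid = sigmoid(z)
p_softmax = softmax(two_class_logits)[:, 1]

# Both routes give the same probability for the positive class.
np.testing.assert_allclose(p_sigmoid, p_softmax)
```

The rest of the notebook runs the same kind of check against actual LightGBM predictions.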

In [1]:
import itertools
import pickle
import math

import lightgbm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn.datasets

In [2]:
# Load the iris dataset as a pandas DataFrame
data = sklearn.datasets.load_iris(as_frame=True)

In [3]:
# Iris targets are 0, 1 and 2, so the filter keeps every row; then sample 100 rows at random
df_data = data["frame"].loc[lambda df: df.target < 3].sample(100).reset_index(drop=True)

In [4]:
df_data.head(2)

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.7               2.8                4.5               1.3       1
1                5.5               3.5                1.3               0.2       0


In [5]:
X, y = df_data.drop(columns='target'), df_data['target']

In [6]:
y.unique()

array([1, 0, 2])

### Confirm binary raw scores are log odds

Using a binary objective on the iris dataset is a bad idea, since `y` takes three values. We do it here purely for pedagogical purposes.

In [7]:
model = lightgbm.LGBMModel(n_estimators=2, objective='binary', max_depth=2).fit(X, y)

In [8]:
predictions = model.predict(X[:5])
predictions

array([0.72293792, 0.53674649, 0.53674649, 0.72293792, 0.72293792])

In [9]:
# Raw scores, to be interpreted as log odds
log_odds = model.predict(X[:5], raw_score=True)
odds = np.exp(log_odds)
# odds / (1 + odds) is the sigmoid of the log odds
probs = odds / (1 + odds)

In [10]:
np.testing.assert_allclose(predictions, probs)
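The check above goes from raw scores to probabilities. As a hedged sketch of the opposite direction, the snippet below applies the logit function, `log(p / (1 - p))`, to the probabilities printed by cell `In [8]` and confirms that the sigmoid maps the result back to those probabilities. The variable names `recovered_log_odds` and `roundtrip` are introduced here for illustration only. The values are copied from the output above and will differ on another run, because the sample is random.

```python
import numpy as np

# Probabilities copied from the output of cell In [8] above.
predictions = np.array([0.72293792, 0.53674649, 0.53674649, 0.72293792, 0.72293792])

# Logit: the inverse of the sigmoid, i.e. log(p / (1 - p)).
recovered_log_odds = np.log(predictions / (1 - predictions))

# Applying the sigmoid again should return the original probabilities.
roundtrip = 1 / (1 + np.exp(-recovered_log_odds))
np.testing.assert_allclose(roundtrip, predictions)
```

If the raw scores are log odds, `recovered_log_odds` should match `model.predict(X[:5], raw_score=True)` for the same model, up to the rounding of the printed values.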

### Confirm multiclassova raw scores are log odds

In [11]:
model = lightgbm.LGBMModel(n_estimators=2, objective='multiclassova', max_depth=2, num_classes=3).fit(X, y)

In [12]:
predictions = model.predict(X[:5])
predictions

array([[0.27706208, 0.45267105, 0.26609845],
       [0.46325351, 0.28484568, 0.25262994],
       [0.46325351, 0.36019921, 0.25262994],
       [0.27706208, 0.45267105, 0.28066774],
       [0.27706208, 0.45267105, 0.26672981]])

In [13]:
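# One raw score per class; each column is treated as its own binary problem
log_odds = model.predict(X[:5], raw_score=True)
odds = np.exp(log_odds)
# Element-wise sigmoid, applied to every class column independently
probs = odds / (1 + odds)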

In [14]:
np.testing.assert_allclose(predictions, probs)
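One consequence of applying an independent sigmoid per class is that the resulting rows are not guaranteed to sum to 1. The sketch below illustrates this using the probabilities printed by cell `In [12]`. The renormalization step at the end is an optional post-processing choice assumed here for illustration, not something the notebook or LightGBM is shown to do.

```python
import numpy as np

# Probabilities copied from the output of cell In [12] above.
ova_probs = np.array([
    [0.27706208, 0.45267105, 0.26609845],
    [0.46325351, 0.28484568, 0.25262994],
    [0.46325351, 0.36019921, 0.25262994],
    [0.27706208, 0.45267105, 0.28066774],
    [0.27706208, 0.45267105, 0.26672981],
])

# Each column comes from its own sigmoid, so rows need not sum to 1.
row_sums = ova_probs.sum(axis=1)

# Optional (assumed) post-processing: rescale each row to sum to 1.
normalized = ova_probs / row_sums[:, np.newaxis]
```

Rescaling leaves the predicted class (the per-row argmax) unchanged, because every entry in a row is divided by the same positive number.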

### Confirm multiclass raw scores are logits

The logits are converted to probabilities via softmax: each logit is exponentiated and divided by the sum of the exponentials across classes.

In [15]:
model = lightgbm.LGBMModel(n_estimators=2, objective='multiclass', max_depth=2, num_classes=3).fit(X, y)

In [16]:
predictions = model.predict(X[:5])
predictions

array([[0.27545769, 0.45940386, 0.26513845],
       [0.46919102, 0.28005215, 0.25075684],
       [0.43757432, 0.32856622, 0.23385945],
       [0.27187761, 0.45343305, 0.27468934],
       [0.27527264, 0.45923192, 0.26549544]])

In [17]:
# Raw scores, a.k.a. logits: analogous to log odds in binary classification,
# but less directly interpretable as odds.
logits = model.predict(X[:5], raw_score=True)

In [18]:
# Softmax: exponentiate, then normalize each row so it sums to 1
exp_logits = np.exp(logits)
probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)

In [19]:
np.testing.assert_allclose(predictions, probs)
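A short sketch of why multiclass logits are less directly interpretable as odds: softmax is unchanged when the same constant is added to every logit in a row, so only differences between logits carry meaning. The difference between two logits equals the log of the ratio of the corresponding class probabilities. The `softmax` helper and the logit values below are illustrative assumptions, not output from the model above.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for two samples and three classes.
logits = np.array([[0.2, 1.1, -0.4],
                   [1.5, 0.3, 0.3]])
probs = softmax(logits)

# Shift invariance: adding a constant to every logit changes nothing.
shifted_probs = softmax(logits + 10.0)
np.testing.assert_allclose(shifted_probs, probs)

# Pairwise log odds: log(p_1 / p_0) equals logit_1 - logit_0.
pairwise_log_odds = np.log(probs[:, 1] / probs[:, 0])
np.testing.assert_allclose(pairwise_log_odds, logits[:, 1] - logits[:, 0])
```

In the two-class case, pinning one logit at 0 makes the other logit exactly the log odds, which matches the binary result earlier.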