In [1]:
!pip install river



In [2]:
!pip install -U numpy



* Use UCBRegressor to select the best learning rate for a linear regression model.
* The same pattern can be used more generally to select between any set of (online) regression models.

In [3]:
from river import compose
from river import linear_model
from river import optim
from river import preprocessing

models = [
    compose.Pipeline(
        preprocessing.StandardScaler(),
        linear_model.LinearRegression(optimizer=optim.SGD(lr=lr)),
    )
    for lr in [1e-4, 1e-3, 1e-2, 1e-1]
]
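
* Why the learning rate matters can be seen in a toy, library-free sketch. The helper sgd_fit, the noise-free target y = 3x + 1, and the four input values are all assumptions made for illustration; they are not part of river or of the TrumpApproval data. The sketch runs plain SGD on squared error with each candidate learning rate and reports how far the fitted parameters end up from the true ones after a fixed number of steps.

```python
def sgd_fit(lr, n_steps=200):
    """Fit y = w * x + b with plain SGD on squared error; return (w, b)."""
    xs = [-1.0, -0.5, 0.5, 1.0]  # toy inputs, roughly centred like scaled features
    w, b = 0.0, 0.0
    for step in range(n_steps):
        x = xs[step % len(xs)]
        y = 3.0 * x + 1.0  # hypothetical noise-free target
        error = (w * x + b) - y
        # Gradient step on 0.5 * error ** 2 with respect to w and b.
        w -= lr * error * x
        b -= lr * error
    return w, b


final_errors = {}
for lr in [1e-4, 1e-3, 1e-2, 1e-1]:
    w, b = sgd_fit(lr)
    # Distance of the fitted parameters from the true ones (3 and 1).
    final_errors[lr] = abs(w - 3.0) + abs(b - 1.0)
    print(f"lr={lr:.0e}  parameter error={final_errors[lr]:.4f}")
```

* On this toy problem the largest rate converges fastest. On real, noisy data a rate that is too large can overshoot instead, which is why the choice is left to the bandit rather than fixed in advance.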

* Load the TrumpApproval dataset, on which the models will be trained and evaluated.

In [4]:
from river import datasets
dataset = datasets.TrumpApproval()

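* Wrap the candidate models in a UCBRegressor bandit, which treats each model as an arm and computes a reward from its regression performance. The seed makes the run reproducible.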

In [5]:
from river.expert import UCBRegressor
bandit = UCBRegressor(models=models, seed=1)
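
* The idea behind an upper confidence bound policy can be sketched without the library. The function ucb_select below is hypothetical and implements the classical UCB1 rule (average reward plus an exploration bonus that shrinks as an arm is pulled more often). The exact bonus and reward scaling used by UCBRegressor are not shown in this notebook and may differ.

```python
import math


def ucb_select(counts, avg_rewards):
    """Return the index of the arm with the highest UCB1 score."""
    # Pull every arm once before any bounds are compared.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        avg + math.sqrt(2 * math.log(total) / n)  # mean + exploration bonus
        for avg, n in zip(avg_rewards, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


print(ucb_select([0, 3], [0.0, 0.9]))      # 0: the unpulled arm goes first
print(ucb_select([50, 50], [0.1, 0.7]))    # 1: equal bonus, higher mean wins
print(ucb_select([500, 2], [0.6, 0.5]))    # 1: rarely pulled arm gets a large bonus
```

* The last call shows the exploration side of the trade-off: an arm with a slightly lower average is still chosen when it has been tried only a few times.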

* The bandit provides methods to train its models in an online fashion.

In [6]:
for x, y in dataset:
  bandit.learn_one(x=x, y=y)  # updates the bandit in place

* Inspect how often each arm was pulled, as a percentage of all pulls, for each of the four models.

In [7]:
for model, pct in zip(bandit.models, bandit.percentage_pulled):
  lr = model["LinearRegression"].optimizer.learning_rate
  print(f"{lr:.1e} — {pct:.2%}")

1.0e-04 — 2.45%
1.0e-03 — 2.45%
1.0e-02 — 92.25%
1.0e-01 — 2.85%


* Look at the average reward earned by each model.

In [8]:
for model, avg in zip(bandit.models, bandit.average_reward):
  lr = model["LinearRegression"].optimizer.learning_rate
  print(f"{lr:.1e} — {avg:.2f}")

1.0e-04 — 0.00
1.0e-03 — 0.00
1.0e-02 — 0.74
1.0e-01 — 0.05
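
* Per-arm statistics like the two printed above can be maintained incrementally, without storing past rewards. The class ArmStats is a hypothetical sketch and not river's implementation; only its attribute names mirror percentage_pulled and average_reward, and the rewards fed to it are made up.

```python
class ArmStats:
    """Track pull counts and a running mean reward per arm."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.average_reward = [0.0] * n_arms

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.average_reward[arm] += (reward - self.average_reward[arm]) / n

    @property
    def percentage_pulled(self):
        total = sum(self.counts)
        return [n / total if total else 0.0 for n in self.counts]


stats = ArmStats(n_arms=2)
for arm, reward in [(0, 0.0), (1, 1.0), (1, 0.5), (1, 0.75)]:
    stats.update(arm, reward)

print(stats.percentage_pulled)  # [0.25, 0.75]
print(stats.average_reward)     # [0.0, 0.75]
```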


* Select the best model (the one with the highest average reward).

In [9]:
best_model = bandit.best_model

* Retrieve the learning rate of the model chosen by the bandit.

In [10]:
best_model["LinearRegression"].optimizer.learning_rate

0.01