# Hyperparameter Optimization

In this tutorial, we optimize `P3alphaRecommender`'s performance using our [optuna](https://github.com/optuna/optuna)-backed `P3alphaOptimizer`.

In [1]:
import os

import numpy as np
import scipy.sparse as sps
from sklearn.model_selection import train_test_split

from irspack.dataset.movielens import MovieLens1MDataManager
from irspack.recommenders import P3alphaRecommender
from irspack.optimizers import P3alphaOptimizer
from irspack.split import rowwise_train_test_split
from irspack.evaluator import Evaluator

The computation might be heavy, so we use multiple threads to speed up training and evaluation.

You can tell our algorithms to use multiple threads whenever possible by setting the ``IRSPACK_NUM_THREADS_DEFAULT`` environment variable:

In [2]:
os.environ["IRSPACK_NUM_THREADS_DEFAULT"] = "8"

## Read the ML1M dataset again.

We again prepare the sparse matrix `X`.

In [3]:
loader = MovieLens1MDataManager()

df = loader.read_interaction()

movies = loader.read_item_info()
movies.head()

unique_user_ids, user_index = np.unique(df.userId, return_inverse=True)
unique_movie_ids, movie_index = np.unique(df.movieId, return_inverse=True)

movie_id_vs_movie_index = { mid: i for i, mid in enumerate(unique_movie_ids)}

X = sps.csr_matrix(
    (
        np.ones(df.shape[0]),
        ( user_index, movie_index)
    )
)

## Split scheme 2. Hold-out for partial users.

To perform hyperparameter optimization, we have to repeatedly measure accuracy metrics on the validation set. As mentioned in the previous tutorial, doing this for all users is time-consuming (often heavier than the recommender's learning process), so we restrict the evaluation to a subset of users as follows:

1. First, split **users** into "train", "validation" (and "test") groups.
1. For train users, feed all their interactions into the recommender. For validation (test) users, hold out part of their interactions for validation (the "prediction" part), and feed the rest (the "learning" part) into the recommender.
1. After fitting, ask the recommender to output scores only for the validation (test) users, and see how it ranks their held-out interactions.

![Perform hold out for part of users.](./split2.png "split1")

Although we have prepared another function to do this procedure, let us first do this manually.

In [4]:
# Split users into train and validation users.

X_train_user, X_valid_user = train_test_split(X, test_size=.4, random_state=0)

# Split the validation users' interactions into learning (50%) and prediction (50%) parts.

X_valid_learn, X_valid_predict = rowwise_train_test_split(
    X_valid_user, test_ratio=.5, random_seed=0
)
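To make the row-wise hold-out concrete, here is a minimal sketch of the idea using only NumPy and SciPy. The helper `rowwise_holdout_sketch` is hypothetical and written for illustration; it is not irspack's implementation of `rowwise_train_test_split`, which may differ in details such as rounding and random number handling.

```python
import numpy as np
import scipy.sparse as sps


def rowwise_holdout_sketch(X, test_ratio=0.5, seed=0):
    """Hypothetical helper: split each user's interactions into two parts."""
    rng = np.random.default_rng(seed)
    X = sps.csr_matrix(X)
    to_predict = np.zeros(X.nnz, dtype=bool)
    for row in range(X.shape[0]):
        start, end = X.indptr[row], X.indptr[row + 1]
        # Number of this user's interactions to hold out (rounded down).
        n_held_out = int((end - start) * test_ratio)
        held_out = rng.permutation(np.arange(start, end))[:n_held_out]
        to_predict[held_out] = True
    # Row index of every stored entry, in CSR storage order.
    rows = np.repeat(np.arange(X.shape[0]), np.diff(X.indptr))

    def build(mask):
        return sps.csr_matrix(
            (X.data[mask], (rows[mask], X.indices[mask])), shape=X.shape
        )

    return build(~to_predict), build(to_predict)


X_demo = sps.csr_matrix(
    np.array([[1, 1, 1, 1], [0, 1, 1, 0], [1, 0, 0, 0]], dtype=float)
)
learn_demo, predict_demo = rowwise_holdout_sketch(X_demo, test_ratio=0.5)
```

Every interaction ends up in exactly one of the two matrices, so `learn_demo + predict_demo` reproduces `X_demo`.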

## Define the evaluator and optimize the validation metric

As illustrated above, we will use

 * All interactions of the train users (``X_train_user``)
 * 50% of the validation users' interactions (``X_valid_learn``)

as the recommender's training data, and the validation users' remaining interactions (``X_valid_predict``) as the held-out ground truth:

In [5]:
X_train_val_learn = sps.vstack([X_train_user, X_valid_learn])
evaluator = Evaluator(X_valid_predict, offset=X_train_user.shape[0], cutoff=20)

The ``offset`` parameter specifies where the validation user block begins (where the train user block ends).
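The row bookkeeping behind ``offset`` can be illustrated with toy matrices. This sketch only shows how the rows line up after stacking; how `Evaluator` uses the offset internally is not shown here, and all `*_toy` names are made up for the example.

```python
import numpy as np
import scipy.sparse as sps

# Toy blocks: 3 train users and 2 validation users over 4 items.
X_train_toy = sps.csr_matrix(
    np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]], dtype=float)
)
X_valid_learn_toy = sps.csr_matrix(
    np.array([[1, 0, 0, 0], [0, 0, 1, 0]], dtype=float)
)

# Train users come first, validation users are appended below them.
X_stacked = sps.vstack([X_train_toy, X_valid_learn_toy]).tocsr()
offset = X_train_toy.shape[0]

# Row i of the validation block is row (offset + i) of the stacked matrix.
valid_rows = X_stacked[offset:]
```

Here `offset` is 3, and `valid_rows` contains exactly the two validation users' learning interactions.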

Now let us start the optimization.

In [None]:
optimizer = P3alphaOptimizer(X_train_val_learn, evaluator, metric="ndcg")
best_params, validation_results = optimizer.optimize(random_seed=0, n_trials=20)
# stdout has been truncated as it's too lengthy to show

So here are the results of the optimization. 

The best `ndcg@20` value was obtained with these hyperparameters:

In [7]:
best_params

{'alpha': 5.063313369473015e-07, 'top_k': 161, 'normalize_weight': False}

and the resulting `ndcg@20` value is

In [8]:
validation_results.ndcg.max()

0.5190536602989886

Meanwhile, ``P3alphaRecommender`` with its default arguments (which we have used so far)
attains `ndcg@20` = 0.404, so this is indeed a significant improvement:

In [9]:
rec_default = P3alphaRecommender(X_train_val_learn).learn()
evaluator.get_score(rec_default)['ndcg']

0.4041442008349765
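For readers unfamiliar with the metric, the following is a minimal sketch of NDCG@k for a single user with binary relevance. The helper `ndcg_at_k_sketch` is hypothetical; irspack's exact definition (for example, how scores are averaged over users) may differ in details.

```python
import numpy as np


def ndcg_at_k_sketch(scores, held_out_items, k):
    """Hypothetical helper: NDCG@k for one user with binary relevance."""
    # Indices of the k highest-scored items, best first.
    top_k = np.argsort(scores)[::-1][:k]
    # Position discount 1 / log2(rank + 1) for ranks 1..k.
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    hits = np.isin(top_k, list(held_out_items)).astype(float)
    dcg = float((hits * discounts[: len(top_k)]).sum())
    # Ideal DCG: all held-out items (at most k) ranked at the top.
    n_ideal = min(len(held_out_items), k)
    idcg = float(discounts[:n_ideal].sum())
    return dcg / idcg if idcg > 0 else 0.0


scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])
perfect = ndcg_at_k_sketch(scores, {0, 2}, k=3)  # both held-out items on top
partial = ndcg_at_k_sketch(scores, {0, 3}, k=3)  # only item 0 is in the top 3
missed = ndcg_at_k_sketch(scores, {1}, k=3)      # held-out item not retrieved
```

The value is 1 when all held-out items occupy the top ranks and decreases as they are ranked lower or missed.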

## Check the recommender's output again

Let us finally check how our recommender has evolved from the first tutorial.

We consider the same setting (a new user has watched "Toy Story"), but fit the 
recommender using the obtained parameters.

In [10]:
rec_tuned = P3alphaRecommender(X, **best_params).learn()

In [11]:
toystory_id = 1
toystory_watcher_matrix = sps.csr_matrix(
    ([1], ([0], [movie_id_vs_movie_index[toystory_id]])),
    shape=(1, len(unique_movie_ids)) # this time
)

score = rec_tuned.get_score_cold_user_remove_seen(
    toystory_watcher_matrix
)

recommended_movie_index = score[0].argsort()[::-1][:10]
recommended_movie_ids = unique_movie_ids[recommended_movie_index]

# Top-10 recommendations
movies.reindex(recommended_movie_ids)

movieId,title,genres,release_year
1265,Groundhog Day (1993),Comedy|Romance,1993
2396,Shakespeare in Love (1998),Comedy|Romance,1998
3114,Toy Story 2 (1999),Animation|Children's|Comedy,1999
1270,Back to the Future (1985),Comedy|Sci-Fi,1985
2028,Saving Private Ryan (1998),Action|Drama|War,1998
34,Babe (1995),Children's|Comedy|Drama,1995
356,Forrest Gump (1994),Comedy|Romance|War,1994
2355,"Bug's Life, A (1998)",Animation|Children's|Comedy,1998
1197,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,1987
588,Aladdin (1992),Animation|Children's|Comedy|Musical,1992


Note how drastically the recommendations have changed (e.g., the increased prominence of the "Children's" genre and the disappearance of the "Star Wars" series)!

To be complete, we should

* include other algorithms
* measure the final score against the **test** dataset, not the validation dataset

but these are now straightforward. See our `examples/movielens/` directories for these complete examples.
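As a rough illustration of the test-set step, the user split can be extended to three groups. The helper `split_users_three_way` and its 20%/20% ratios are assumptions made for this sketch, not part of irspack. The test users' interactions would then be divided into learning and prediction parts in the same way as for the validation users.

```python
import numpy as np
import scipy.sparse as sps


def split_users_three_way(X, valid_ratio=0.2, test_ratio=0.2, seed=0):
    """Hypothetical helper: split rows (users) into train / validation / test."""
    rng = np.random.default_rng(seed)
    X = sps.csr_matrix(X)
    n_users = X.shape[0]
    shuffled = rng.permutation(n_users)
    n_test = int(n_users * test_ratio)
    n_valid = int(n_users * valid_ratio)
    test_users = shuffled[:n_test]
    valid_users = shuffled[n_test : n_test + n_valid]
    train_users = shuffled[n_test + n_valid :]
    return X[train_users], X[valid_users], X[test_users]


# Toy data: 10 users, each with a single interaction.
X_toy = sps.csr_matrix(np.eye(10))
X_tr, X_va, X_te = split_users_three_way(X_toy)
```

Hyperparameters would be tuned against the validation users only, and the test users would be scored once at the very end.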