In [0]:
from fastai import *
from fastai.collab import *

In [18]:
path = untar_data(URLs.ML_SAMPLE)
path.ls()

[PosixPath('/root/.fastai/data/movie_lens_sample/ratings.csv')]

In [19]:
help(series2cat)

Help on function series2cat in module fastai.core:

series2cat(df:pandas.core.frame.DataFrame, *col_names)
    Categorifies the columns `col_names` in `df`.



In [20]:
df = pd.read_csv(path/'ratings.csv')
series2cat(df, 'userId', 'movieId')
df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,73,1097,4.0,1255504951
1,561,924,3.5,1172695223
2,157,260,3.5,1291598691
3,358,1210,5.0,957481884
4,130,316,2.0,1138999234
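
To get a feel for what "categorifying" a column means, here is a minimal sketch in plain pandas. It assumes that `series2cat` amounts to converting each named column to the pandas `category` dtype; that is our reading of the help text, not something checked against the fastai source. The `toy` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame shaped like ratings.csv (values are illustrative only).
toy = pd.DataFrame({
    "userId": [73, 561, 73],
    "movieId": [1097, 924, 260],
    "rating": [4.0, 3.5, 3.5],
})

# Assumption: categorifying a column means casting it to the 'category' dtype.
for col in ("userId", "movieId"):
    toy[col] = toy[col].astype("category")

# Each distinct id now maps to a small integer code, which is the kind of
# index an embedding lookup needs.
codes = toy["userId"].cat.codes.tolist()
print(toy.dtypes)
print(codes)
```

The raw ids (73, 561, ...) are sparse and arbitrary, while the category codes are contiguous integers starting at 0, so they can index rows of an embedding matrix directly.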


In [0]:
data = CollabDataBunch.from_df(df)
learn = collab_learner(data, n_factors=50, y_range=[0, 5.5])  # We will see n_factors later

In [22]:
learn.fit_one_cycle(10, 1e-3)

epoch,train_loss,valid_loss
1,2.009654,1.979538
2,1.968699,1.856500
3,1.710164,1.360017
4,1.159017,0.822418
5,0.817045,0.701169
6,0.698311,0.676826
7,0.657248,0.667983
8,0.628036,0.665257
9,0.629572,0.664528
10,0.617122,0.664293


So what `collab_learner` really did here can be traced as follows (for reference, [this commit](https://github.com/fastai/fastai/commit/926548cd2460cd79b86ad225cc795e8496f71d2a) was used for the code below).
* [The code for `collab_learner`](https://github.com/fastai/fastai/blob/master/fastai/collab.py#L98) shows that in our case, it created an instance of the `EmbeddingDotBias` class.
* [The code for the `EmbeddingDotBias` class](https://github.com/fastai/fastai/blob/master/fastai/collab.py#L37) shows that it calls an `embedding` function.
* [The code for the `embedding` function](https://github.com/fastai/fastai/blob/master/fastai/layers.py#L285) simply creates a `torch.nn.Embedding` layer and initializes its weights from a truncated normal distribution, following a suggestion from [this paper](https://arxiv.org/abs/1711.09160).
* An embedding can be thought of as a matrix of weights in which you can look up an item (like a movie or a user) and grab its vector (see the Excel sheet to get a visual understanding of this). So in the spreadsheet, we have one embedding matrix for users and one for movies, and so far we've been taking the dot product of a user vector and a movie vector. But the Excel sheet (so far) does only this multiplication; we would additionally like to add a *bias* term for each user and each movie.
* The `EmbeddingDotBias` class, in its `forward` method, simply takes the dot product of the user and the item weights (`u_weight` and `i_weight`), that is, it multiplies them elementwise and sums over the factors, and then adds the user and item biases.
* There is one additional tweak, though. Simple linear models like this tend to work well for collaborative filtering, but their output is unbounded. In our code above we passed `y_range=[0, 5.5]`, i.e. a min score of 0 and a max score of 5.5. So we force our output to be in this range by applying a sigmoid activation and scaling it to these limits (see [this line](https://github.com/fastai/fastai/blob/master/fastai/collab.py#L50)).
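
The steps traced above can be put together in a few lines of PyTorch. This is only a sketch of the idea, not fastai's actual code: the class name `DotBiasSketch`, the plain normal initialization, and the toy sizes are all our own assumptions. The attribute names `u_weight` and `i_weight` follow the ones mentioned above; `u_bias` and `i_bias` are hypothetical names for the bias tables.

```python
import torch
from torch import nn


class DotBiasSketch(nn.Module):
    """Hypothetical stand-in for a dot-product-plus-bias collaborative filtering model."""

    def __init__(self, n_users, n_items, n_factors, y_range=(0.0, 5.5)):
        super().__init__()
        # One embedding matrix per entity, plus a one-number bias per entity.
        self.u_weight = nn.Embedding(n_users, n_factors)
        self.i_weight = nn.Embedding(n_items, n_factors)
        self.u_bias = nn.Embedding(n_users, 1)
        self.i_bias = nn.Embedding(n_items, 1)
        self.y_range = y_range
        # Assumption: a small random init; fastai's exact scheme may differ.
        for emb in (self.u_weight, self.i_weight, self.u_bias, self.i_bias):
            nn.init.normal_(emb.weight, std=0.01)

    def forward(self, users, items):
        # Look up one vector per user and per item, multiply elementwise,
        # and sum over the factors: a dot product for each (user, item) pair.
        dot = (self.u_weight(users) * self.i_weight(items)).sum(dim=1)
        # Add the per-user and per-item biases.
        res = dot + self.u_bias(users).squeeze(1) + self.i_bias(items).squeeze(1)
        # Squash into (low, high) with a scaled sigmoid.
        low, high = self.y_range
        return torch.sigmoid(res) * (high - low) + low


torch.manual_seed(0)
model = DotBiasSketch(n_users=10, n_items=20, n_factors=5)
users = torch.tensor([0, 3, 7])    # category codes, not raw ids
items = torch.tensor([1, 4, 19])
preds = model(users, items)
print(preds.shape)
```

Because of the scaled sigmoid, every prediction lies strictly between 0 and 5.5 whatever the weights are, so the model only has to learn where in that range each rating falls.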

Taken together, all of this seems to give state-of-the-art results on MovieLens!