A Python implementation of LightFM, a hybrid recommendation algorithm.
Switch branches/tags
Clone or download
Pull request Compare This branch is 297 commits behind lyst:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
examples/movielens
lightfm
tests
.flake8rc
.gitignore
LICENSE
README.md
changelog.md
lightfm.png
setup.cfg
setup.py

README.md

LightFM

LightFM logo

A Python implementation of LightFM, a hybrid recommendation algorithm.

The LightFM model incorporates both item and user metadata into the traditional matrix factorization algorithm. It represents each user and item as the sum of the latent representations of their features, thus allowing recommendations to generalise to new items (via item features) and to new users (via user features).

The details of the approach are described in the LightFM paper, available on arXiv.

The model can be trained using four methods:

  • logistic loss: useful when both positive (1) and negative (-1) interactions are present.
  • BPR: Bayesian Personalised Ranking [1] pairwise loss. Maximises the prediction difference between a positive example and a randomly chosen negative example. Useful when only positive interactions are present and optimising ROC AUC is desired.
  • WARP: Weighted Approximate-Rank Pairwise [2] loss. Maximises the rank of positive examples by repeatedly sampling negative examples until a rank violating one is found. Useful when only positive interactions are present and optimising the top of the recommendation list (precision@k) is desired.
  • k-OS WARP: k-th order statistic loss [3]. A modification of WARP that uses the k-th positive example for any given user as a basis for pairwise updates.

Two learning rate schedules are implemented:

  • adagrad: [4]
  • adadelta: [5]

Installation

Install from pypi using pip: pip install lightfm.

Note for OSX users: due to its use of OpenMP, lightfm does not compile under Clang. To install it, you will need a reasonably recent version of gcc (from Homebrew for instance). This should be picked up by setup.py; if it is not, please open an issue.

Usage

Model fitting is very straightforward.

Create a model instance with the desired latent dimensionality

from lightfm import LightFM

model = LightFM(no_components=30)

Assuming train is a (no_users, no_items) sparse matrix (with 1s denoting positive, and -1s negative interactions), you can fit a traditional matrix factorization model by calling

model.fit(train, epochs=20)

This will train a traditional MF model, as no user or item features have been supplied.

To get predictions, call model.predict:

predictions = model.predict(test_user_ids, test_item_ids)

User and item features can be incorporated into training by passing them into the fit method. Assuming user_features is a (no_users, no_user_features) sparse matrix (and similarly for item_features), you can call

model.fit(train,
          user_features=user_features,
          item_features=item_features,
          epochs=20)
predictions = model.predict(test_user_ids,
                            test_item_ids,
                            user_features=user_features,
                            item_features=item_features)

to train the model and obtain predictions.

Both training and prediction can employ multiple cores for speed:

model.fit(train, epochs=20, num_threads=4)
predictions = model.predict(test_user_ids, test_item_ids, num_threads=4)

This implementation uses asynchronous stochastic gradient descent [6] for training. This can lead to lower accuracy when the interaction matrix (or the feature matrices) are very dense and a large number of threads is used. In practice, however, training on a sparse dataset with 20 threads does not lead to a measurable loss of accuracy.

In an implicit feedback setting, the BPR, WARP, or k-OS WARP loss functions can be used. If train is a sparse matrix with positive entries representing positive interactions, the model can be trained as follows:

model = LightFM(no_components=30, loss='warp')
model.fit(train, epochs=20)

Examples

Check the examples directory for more examples.

The Movielens example shows how to use lightfm on the Movielens dataset, both with and without using movie metadata. Another example compares the performance of the adagrad and adadelta learning schedules.

Development

Pull requests are welcome. To install for development:

  1. Clone the repository: git clone git@github.com:lyst/lightfm.git
  2. Install it for development using pip: cd lightfm && pip install -e .
  3. You can run tests by running python setupy.py test.

When making changes to the .pyx extension files, you'll need to run python setup.py cythonize in order to produce the extension .c files before running pip install -e ..

References

[1] Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit feedback." Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.

[2] Weston, Jason, Samy Bengio, and Nicolas Usunier. "Wsabie: Scaling up to large vocabulary image annotation." IJCAI. Vol. 11. 2011.

[3] Weston, Jason, Hector Yee, and Ron J. Weiss. "Learning to rank recommendations with the k-order statistic loss." Proceedings of the 7th ACM conference on Recommender systems. ACM, 2013.

[4] Duchi, John, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." The Journal of Machine Learning Research 12 (2011): 2121-2159.

[5] Zeiler, Matthew D. "ADADELTA: An adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).

[6] Recht, Benjamin, et al. "Hogwild: A lock-free approach to parallelizing stochastic gradient descent." Advances in Neural Information Processing Systems. 2011.