docs: add models
zhenghaoz committed May 24, 2019
1 parent 59debb3 commit 64e5a4d
Showing 7 changed files with 256 additions and 194 deletions.
4 changes: 2 additions & 2 deletions docs/introduction/index.rst
@@ -6,7 +6,7 @@ Introduction
:caption: Table of Contents
:maxdepth: 1

task
data
model
task
model/index
benchmark
192 changes: 0 additions & 192 deletions docs/introduction/model.rst

This file was deleted.

18 changes: 18 additions & 0 deletions docs/introduction/model/coclustering.rst
@@ -0,0 +1,18 @@
============
CoClustering
============

Co-Clustering [#COC]_ predicts ratings by simultaneously clustering users and items. A rating is predicted as the co-cluster average plus the deviations of the user's and item's average ratings from their respective cluster averages:

.. math::

   \hat r_{ij}=A^{COC}_{gh}+(A^R_i-A^{RC}_g)+(A^C_j-A^{CC}_h)

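
The prediction rule can be sketched as follows (a minimal illustration; all names here are hypothetical, and the cluster assignments and averages are assumed to be the outputs of a finished training run):

```python
def predict_coclustering(i, j, user_cluster, item_cluster,
                         A_COC, A_R, A_C, A_RC, A_CC):
    """Predict r_ij as the co-cluster average plus user/item deviations."""
    g, h = user_cluster[i], item_cluster[j]
    return A_COC[g][h] + (A_R[i] - A_RC[g]) + (A_C[j] - A_CC[h])

# toy example: two users and two items, one cluster each
user_cluster = [0, 0]
item_cluster = [0, 0]
A_COC = [[3.0]]       # average rating of co-cluster (g, h)
A_R = [3.5, 2.5]      # per-user average ratings
A_C = [4.0, 2.0]      # per-item average ratings
A_RC = [3.0]          # user-cluster average rating
A_CC = [3.0]          # item-cluster average rating

print(predict_coclustering(0, 0, user_cluster, item_cluster,
                           A_COC, A_R, A_C, A_RC, A_CC))  # 3.0 + 0.5 + 1.0 = 4.5
```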
Training
========

References
==========

.. [#COC] George, Thomas, and Srujana Merugu. "A scalable collaborative filtering framework based on co-clustering." Data Mining, Fifth IEEE international conference on. IEEE, 2005.
70 changes: 70 additions & 0 deletions docs/introduction/model/index.rst
@@ -0,0 +1,70 @@
======
Models
======


+----------------------+------------------------------+------------------+
| Model                | Data                         | Task             |
+----------------------+----------+----------+--------+--------+---------+
|                      | explicit | implicit | weight | rating | ranking |
+======================+==========+==========+========+========+=========+
| BaseLine             | ✓        |          |        | ✓      | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| NMF [#NMF]_          | ✓        |          |        | ✓      | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| SVD                  | ✓        |          |        | ✓      | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| SVD++ [#SVDPP]_      | ✓        |          |        | ✓      | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| KNN [#KNN]_          | ✓        |          |        | ✓      | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| CoClustering [#COC]_ | ✓        |          |        | ✓      | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| SlopeOne [#SO]_      | ✓        |          |        | ✓      | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| ItemPop              | ✓        | ✓        |        |        | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| WRMF [#WRMF]_        | ✓        | ✓        | ✓      |        | ✓       |
+----------------------+----------+----------+--------+--------+---------+
| BPR [#BPR]_          | ✓        | ✓        |        |        | ✓       |
+----------------------+----------+----------+--------+--------+---------+

Models that accept implicit feedback are more general, since explicit feedback can be converted to implicit feedback, and item ranking can be performed via rating prediction.
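
For instance, an explicit-feedback dataset can be reduced to implicit feedback by treating observed interactions (or only the sufficiently high ratings) as positive signals. A minimal sketch, with an illustrative threshold:

```python
def to_implicit(ratings, threshold=None):
    """Convert (user, item, rating) triples to implicit (user, item) pairs.

    With no threshold, every observed rating counts as positive feedback;
    with a threshold, only ratings at or above it do.
    """
    return [(u, i) for u, i, r in ratings if threshold is None or r >= threshold]

ratings = [(0, 1, 5.0), (0, 2, 2.0), (1, 1, 4.0)]
print(to_implicit(ratings))               # every interaction is positive
print(to_implicit(ratings, threshold=4))  # only ratings >= 4 are positive
```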


Non-Personalized Models
=======================

Personalized Models
===================

.. toctree::
:caption: Personalized Models
:maxdepth: 1

matrix_factorization
knn
coclustering
slopeone


References
==========

.. [#Surprise] Hug, Nicolas. Surprise, a Python library for recommender systems. http://surpriselib.com, 2017.
.. [#LibRec] G. Guo, J. Zhang, Z. Sun and N. Yorke-Smith, LibRec: A Java Library for Recommender Systems, in Posters, Demos, Late-breaking Results and Workshop Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP), 2015.
.. [#NMF] Luo, Xin, et al. "An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems." IEEE Transactions on Industrial Informatics 10.2 (2014): 1273-1284.
.. [#SO] Lemire, Daniel, and Anna Maclachlan. "Slope one predictors for online rating-based collaborative filtering." Proceedings of the 2005 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2005.
.. [#COC] George, Thomas, and Srujana Merugu. "A scalable collaborative filtering framework based on co-clustering." Data Mining, Fifth IEEE international conference on. IEEE, 2005.
.. [#WRMF] Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on. IEEE, 2008.
.. [#KNN] Desrosiers, Christian, and George Karypis. "A comprehensive survey of neighborhood-based recommendation methods." Recommender systems handbook. Springer, Boston, MA, 2011. 107-144.
.. [#SVDPP] Koren, Yehuda. "Factorization meets the neighborhood: a multifaceted collaborative filtering model." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008.
.. [#BPR] Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit feedback." Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. AUAI Press, 2009.
103 changes: 103 additions & 0 deletions docs/introduction/model/knn.rst
@@ -0,0 +1,103 @@
===
KNN
===

KNN
---

Neighborhood-based models [#KNN]_ predict ratings from similar users or similar items. There are two kinds: user-based models, which predict from similar users, and item-based models, which predict from similar items. In general, item-based models outperform user-based models, since item characteristics are more stable than user preferences.

.. _similarity:

Similarity
^^^^^^^^^^

Similarity metrics define which neighbors count as nearest. Let :math:`I_u` and :math:`I_v` denote the sets of items rated by users :math:`u` and :math:`v`, and let :math:`U_i` and :math:`U_j` denote the sets of users who rated items :math:`i` and :math:`j`. The most commonly used similarity functions are:

Cosine
""""""

.. math::

   \cos(u,v)=\frac{\sum\limits_{k\in I_u\cap I_v}r_{uk}\cdot r_{vk}}{\sqrt{\sum\limits_{k\in I_u\cap I_v}r_{uk}^2}\cdot\sqrt{\sum\limits_{k\in I_u\cap I_v}r_{vk}^2}}

Pearson
"""""""

Pearson similarity is similar to cosine similarity, except that ratings are centered by each user's mean first:

.. math::

   \text{pearson}(a,b)=\frac{\sum\limits_{k\in I_a\cap I_b}(r_{ak}-\tilde r_a)\cdot (r_{bk}-\tilde r_b)}{\sqrt{\sum\limits_{k\in I_a\cap I_b}(r_{ak}-\tilde r_a)^2}\cdot\sqrt{\sum\limits_{k\in I_a\cap I_b}(r_{bk}-\tilde r_b)^2}}

where :math:`\tilde r_a` is the mean of the ratings given by user :math:`a`:

.. math::

   \tilde r_a = \frac{1}{|I_a|}\sum_{k\in I_a} r_{ak}

Mean Square Distance
""""""""""""""""""""


The *Mean Square Distance* is

.. math::

   \text{msd}(a,b)=\frac{1}{|I_a\cap I_b|}\sum_{k\in I_a\cap I_b}(r_{ak}-r_{bk})^2

Then, the *Mean Square Distance Similarity* is

.. math::

   \text{msd\_sim}(u, v) = \frac{1}{\text{msd}(u, v) + 1}

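
The three similarity functions above can be sketched over per-user rating dictionaries as follows (a minimal illustration; real implementations operate on sparse matrices):

```python
from math import sqrt

def cosine(ra, rb):
    """Cosine similarity over co-rated items (ra, rb map item -> rating)."""
    common = set(ra) & set(rb)
    num = sum(ra[k] * rb[k] for k in common)
    den = sqrt(sum(ra[k] ** 2 for k in common)) * sqrt(sum(rb[k] ** 2 for k in common))
    return num / den if den else 0.0

def pearson(ra, rb):
    """Cosine similarity after subtracting each user's mean rating."""
    mean_a = sum(ra.values()) / len(ra)
    mean_b = sum(rb.values()) / len(rb)
    return cosine({k: v - mean_a for k, v in ra.items()},
                  {k: v - mean_b for k, v in rb.items()})

def msd_sim(ra, rb):
    """Inverse of the mean square distance over co-rated items."""
    common = set(ra) & set(rb)
    msd = sum((ra[k] - rb[k]) ** 2 for k in common) / len(common)
    return 1.0 / (msd + 1.0)

a = {1: 4.0, 2: 2.0, 3: 5.0}
b = {1: 4.0, 2: 2.0, 4: 1.0}
print(cosine(a, b))   # 1.0: identical on co-rated items
print(msd_sim(a, b))  # 1.0: zero distance on co-rated items
```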
Predict
^^^^^^^

A rating can be predicted from the k nearest neighbors :math:`\mathcal N_k(u)` (the k users or items with the highest similarity):

.. math::

   \hat r_{ui}=\frac{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)r_{vi}}{\sum_{v\in \mathcal N_k(u)}\text{sim}(u,v)}

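
A user-based sketch of this weighted average, assuming similarities have already been computed (all names are illustrative):

```python
def predict_knn(u, i, ratings, sim, k):
    """Predict r_ui as a similarity-weighted average over the k most
    similar users who rated item i."""
    # candidate neighbors: users v != u who rated item i
    candidates = [v for v in ratings if v != u and i in ratings[v]]
    neighbors = sorted(candidates, key=lambda v: sim[u][v], reverse=True)[:k]
    num = sum(sim[u][v] * ratings[v][i] for v in neighbors)
    den = sum(sim[u][v] for v in neighbors)
    return num / den if den else 0.0

ratings = {0: {1: 4.0}, 1: {1: 5.0, 2: 3.0}, 2: {2: 1.0}}
sim = {2: {0: 0.0, 1: 1.0}}  # user 2 is similar to user 1 only
print(predict_knn(2, 1, ratings, sim, k=2))  # 5.0: dominated by user 1
```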
The basic KNN prediction has limited accuracy; more advanced variants, described below, achieve better results.

KNN with Mean
"""""""""""""

:math:`\tilde r_l` is the mean of the :math:`l`-th user's (or item's) ratings:

.. math::

   \hat r_{ij}=\tilde r_i+\frac{\sum_{l\in \mathcal N_k(i)}\text{sim}(i,l)(r_{lj}-\tilde r_l)}{\sum_{l\in \mathcal N_k(i)}\text{sim}(i,l)}

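
The mean-centered variant subtracts each neighbor's mean before averaging, then adds the target's mean back (a user-based sketch with illustrative names):

```python
def predict_knn_mean(u, i, ratings, sim, k):
    """KNN with mean: average each neighbor's deviation from its own mean
    rating, then add the deviation back to the target's mean."""
    mean = {v: sum(r.values()) / len(r) for v, r in ratings.items()}
    candidates = [v for v in ratings if v != u and i in ratings[v]]
    neighbors = sorted(candidates, key=lambda v: sim[u][v], reverse=True)[:k]
    num = sum(sim[u][v] * (ratings[v][i] - mean[v]) for v in neighbors)
    den = sum(sim[u][v] for v in neighbors)
    return mean[u] + (num / den if den else 0.0)

ratings = {0: {1: 4.0, 2: 2.0}, 1: {1: 5.0, 2: 3.0}}
sim = {0: {1: 1.0}}
print(predict_knn_mean(0, 1, ratings, sim, k=1))  # 3.0 + (5.0 - 4.0) = 4.0
```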
KNN with Z-score
""""""""""""""""

:math:`\sigma(r_i)` is the standard deviation of the :math:`i`-th user's (or item's) ratings, and :math:`\sigma(r_l)` that of the :math:`l`-th:

.. math::

   \hat r_{ij}=\tilde r_i+\sigma(r_i)\frac{\sum_{l\in \mathcal N_k(i)}\text{sim}(i,l)\frac{r_{lj}-\tilde r_l}{\sigma(r_l)}}{\sum_{l\in \mathcal N_k(i)}\text{sim}(i,l)}

KNN with Baseline
"""""""""""""""""

:math:`b_l` is the bias of the :math:`l`-th user (or item) from the baseline model :math:`\hat r_{ij}=b+b_i+b_j`:

.. math::

   \hat r_{ij}=b_i+\frac{\sum_{l\in \mathcal N_k(i)}\text{sim}(i,l)(r_{lj}- b_l)}{\sum_{l\in \mathcal N_k(i)}\text{sim}(i,l)}

The KNN model with baseline usually performs best among these variants, since it accounts for user and item biases.
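
This variant follows the same shape as the mean-centered one, with the precomputed baseline bias taking the place of the mean (a user-based sketch; the `baseline` mapping and all names are illustrative):

```python
def predict_knn_baseline(i, j, ratings, sim, baseline, k):
    """KNN with baseline: average each neighbor's deviation from its
    baseline estimate b_l, then add the deviation to the target's baseline."""
    candidates = [l for l in ratings if l != i and j in ratings[l]]
    neighbors = sorted(candidates, key=lambda l: sim[i][l], reverse=True)[:k]
    num = sum(sim[i][l] * (ratings[l][j] - baseline[l]) for l in neighbors)
    den = sum(sim[i][l] for l in neighbors)
    return baseline[i] + (num / den if den else 0.0)

ratings = {0: {}, 1: {7: 4.0}}
sim = {0: {1: 1.0}}
baseline = {0: 3.0, 1: 3.5}  # precomputed biases, e.g. from the baseline model
print(predict_knn_baseline(0, 7, ratings, sim, baseline, k=1))  # 3.0 + 0.5 = 3.5
```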



References
==========

.. [#KNN] Desrosiers, Christian, and George Karypis. "A comprehensive survey of neighborhood-based recommendation methods." Recommender systems handbook. Springer, Boston, MA, 2011. 107-144.
