#### CSCE 670 :: Information Storage and Retrieval :: Texas A&M University :: Spring 2020

## Spotlight - Crab
#### Mahin Ramezani

<br><br>
### 1- Introduction

Crab, also known as scikits.recommender, is a Python framework for building recommender engines. It integrates with the scientific Python stack (numpy, scipy, matplotlib), provides a rich set of components and algorithms from which users can assemble a customized recommender system, and aims to be usable in contexts such as science and engineering.

website: http://muricoca.github.io/crab/

#### 1-1- What are recommender systems
In a very general way, recommender systems are algorithms that suggest relevant items to users (movies to watch, text to read, products to buy, or anything else, depending on the industry). There are two major categories of recommender systems: collaborative filtering methods and content-based methods. The Crab recommender framework focuses on collaborative filtering.

#### 1-2- Collaborative Filtering 
Collaborative filtering produces recommendations based on knowledge of users’ relationships to items. These methods rely solely on past interactions recorded between users and items to produce new recommendations. The interactions are stored in the so-called “user-item interactions matrix”.
There are several strategies for creating recommendations. One is to look at what people with similar tastes seem to like. Another is to figure out which items are like the ones we already like. These strategies describe the two best-known categories of recommender techniques: user-based and item-based recommenders.
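
To make the idea of a user-item interactions matrix concrete, here is a minimal sketch in plain Python. It does not use Crab; the interaction triples and the helper names `build_matrix` and `transpose` are made up for illustration. The row view is what a user-based method compares, and the column view is what an item-based method compares.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) interactions; not from any real dataset.
interactions = [
    ("alice", "movie_a", 5.0),
    ("alice", "movie_b", 3.0),
    ("bob", "movie_a", 4.0),
    ("bob", "movie_c", 2.0),
    ("carol", "movie_b", 4.5),
]


def build_matrix(triples):
    """Group ratings by user: {user: {item: rating}}."""
    matrix = defaultdict(dict)
    for user, item, rating in triples:
        matrix[user][item] = rating
    return dict(matrix)


def transpose(matrix):
    """Flip the matrix to {item: {user: rating}}."""
    flipped = defaultdict(dict)
    for user, ratings in matrix.items():
        for item, rating in ratings.items():
            flipped[item][user] = rating
    return dict(flipped)


by_user = build_matrix(interactions)  # rows: one entry per user
by_item = transpose(by_user)          # columns: one entry per item

print(by_user["alice"])
print(by_item["movie_a"])
```

Missing entries are simply absent from the dictionaries, which mirrors how sparse a real interactions matrix usually is.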

<br> 
### 2- Installation

From a terminal window:
* Update the package index and install the dependencies:
> sudo apt update <br>
> sudo apt-get install python-dev python-numpy python-numpy-dev python-setuptools python-scipy libatlas-base-dev g++ <br>
> sudo apt install python-pip <br>
> pip install scikit-learn <br>
> pip install numpy <br>
> pip install base <br>
* Install Crab from the source code:
> git clone https://github.com/muricoca/crab.git <br>
> cd crab/ <br>
> python setup.py install <br>

<br> 
### 3- Building a Recommender System with Crab
Crab implements the conventional user-based and item-based recommender algorithms. It also comes with a few standard datasets, for instance the songs dataset and the movies dataset.

#### 3-1- Load the dataset

In [0]:
# load dataset
from scikits.crab import datasets

data = datasets.load_sample_movies()
data

{'DESCR': 'sample_movies data set was collected by the book called \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n-----\nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.',
 'data': {1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
  2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
  3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
  4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
  5: {2: 4.5, 3: 1.0, 4: 4.0},
  6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
  7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}},
 'item_ids': {1: 'Lady in the Water',
  2: 'Snakes on a Planet',
  3: 'You, Me and Dupree',
  4: 'Superman Returns',
  5: 'The Night Listener',
  6: 'Just My Luck'},
 'user_ids': {1: 'Jack Matthews',
  2: 'Mick LaSalle',
  3: 'Claudia Puig',
  4: 'Lisa Rose',
  5: 'Toby',
  6: 'Gene Seymour',
  7: 'Michael Phillips'}}

This dataset is a dictionary with 4 main parts:
1. DESCR: a short description of the dataset.
2. data: the ratings, in the format {user_id: {item_id: preference, item_id2: preference, ...}, user_id2: {...}, ...}. For instance, in the case of the movies dataset, data.data gives access to the users and their preferences for the movies.
3. item_ids: maps each item ID to the movie title.
4. user_ids: maps each user ID to the reviewer's name.
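
The four parts fit together through the integer IDs. The sketch below shows one way to turn a user's raw ratings into readable names. It uses a plain dictionary holding a hand-copied subset of the output above rather than the object returned by `load_sample_movies()`, and the helper `named_ratings` is a hypothetical name, not a Crab function.

```python
# A subset of the sample_movies output shown above, copied by hand.
sample = {
    "data": {5: {2: 4.5, 3: 1.0, 4: 4.0}},
    "item_ids": {
        1: "Lady in the Water",
        2: "Snakes on a Planet",
        3: "You, Me and Dupree",
        4: "Superman Returns",
        5: "The Night Listener",
        6: "Just My Luck",
    },
    "user_ids": {5: "Toby"},
}


def named_ratings(dataset, user_id):
    """Return (user name, {movie title: rating}) for one user."""
    titles = dataset["item_ids"]
    ratings = dataset["data"][user_id]
    by_title = {titles[item]: rating for item, rating in ratings.items()}
    return dataset["user_ids"][user_id], by_title


name, rated = named_ratings(sample, 5)
print(name, rated)
```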

In [0]:
data.DESCR

'sample_movies data set was collected by the book called \nProgramming the Collective Intelligence by Toby Segaran \n\nNotes\n-----\nThis data set consists of\n\t* n ratings with (1-5) from n users to n movies.'

In [0]:
data.data

{1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
 2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
 3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
 4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
 5: {2: 4.5, 3: 1.0, 4: 4.0},
 6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
 7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}}

In [0]:
data.item_ids

{1: 'Lady in the Water',
 2: 'Snakes on a Planet',
 3: 'You, Me and Dupree',
 4: 'Superman Returns',
 5: 'The Night Listener',
 6: 'Just My Luck'}

In [0]:
data.user_ids

{1: 'Jack Matthews',
 2: 'Mick LaSalle',
 3: 'Claudia Puig',
 4: 'Lisa Rose',
 5: 'Toby',
 6: 'Gene Seymour',
 7: 'Michael Phillips'}

<br>
Crab also offers the possibility to use external datasets coming from simple comma-separated-value format files (.csv).
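
Crab's own CSV loader is not shown in this spotlight, so the sketch below does not use it. Instead it parses a CSV with Python's standard `csv` module into the same nested-dictionary layout as `data.data`. The column order (user ID, item ID, rating) and the name `load_ratings` are assumptions made for this example.

```python
import csv
import io

# Hypothetical CSV content, one "user_id,item_id,rating" row per line.
# The column order is an assumption for this sketch.
csv_text = """1,1,3.0
1,2,4.0
5,2,4.5
5,3,1.0
"""


def load_ratings(handle):
    """Parse rows into {user_id: {item_id: rating}}."""
    ratings = {}
    for user_id, item_id, rating in csv.reader(handle):
        ratings.setdefault(int(user_id), {})[int(item_id)] = float(rating)
    return ratings


# io.StringIO stands in for an opened file so the example is self-contained.
ratings = load_ratings(io.StringIO(csv_text))
print(ratings)
```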

#### 3-2- Build model

In [0]:
from scikits.crab import models

m = models.MatrixPreferenceDataModel(data.data)
print(m)

MatrixPreferenceDataModel (7 by 6)
         1          2          3          4          5        ...
1        3.000000   4.000000   3.500000   5.000000   3.000000
2        3.000000   4.000000   2.000000   3.000000   3.000000
3           ---     3.500000   2.500000   4.000000   4.500000
4        2.500000   3.500000   2.500000   3.500000   3.000000
5           ---     4.500000   1.000000   4.000000      ---
6        3.000000   3.500000   3.500000   5.000000   3.000000
7        2.500000   3.000000      ---     3.500000   4.000000


Let's look at the user-item interactions matrix printed in the previous cell (the printout truncates the sixth item column). Rows represent users and columns represent items. Users 1 and 6 seem to have similar tastes: their ratings differ on only one of the first five movies.

On the other hand, users 1 and 7 rate quite differently.

Here's how we can get the similarity between users:

In [0]:
from scikits.crab.metrics.pairwise import euclidean_distances
from scikits.crab.similarities import  UserSimilarity

similarity = UserSimilarity(m, euclidean_distances)
similarity[1]

[(1, 1.0),
 (6, 0.66666666666666663),
 (4, 0.34054242658316669),
 (3, 0.32037724101704074),
 (7, 0.32037724101704074),
 (2, 0.2857142857142857),
 (5, 0.2674788903885893)]
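
A minimal sketch of how such a score can be computed by hand: take the Euclidean distance over the movies both users rated and map it to a similarity with 1 / (1 + distance). This is not Crab's source code, and the function name `euclidean_similarity` is hypothetical, but for the sample data this formula gives the same value for users 1 and 6 as the output above.

```python
import math

# Ratings of users 1 and 6 from the sample_movies data shown earlier.
user_1 = {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0}
user_6 = {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5}


def euclidean_similarity(a, b):
    """Map the Euclidean distance over co-rated items into (0, 1]."""
    common = set(a) & set(b)
    if not common:
        return 0.0  # assumption: no overlap means no measurable similarity
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)


print(euclidean_similarity(user_1, user_6))
```

The two users disagree on a single movie by half a point, so the distance is 0.5 and the similarity is 1 / 1.5.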

Crab provides several other metrics that can be used in the same way, for example cosine_distances, jaccard_coefficient, loglikehood_coefficient, and pearson_correlation.

In [0]:
from scikits.crab.metrics.pairwise import cosine_distances

similarity = UserSimilarity(m, cosine_distances)
similarity[1]

[(1, 0.99999999999999978),
 (6, 0.99856297007158268),
 (4, 0.99127582693458016),
 (2, 0.96771319700291469),
 (7, 0.96722039536025972),
 (3, 0.96352011827457695),
 (5, 0.93180524803215414)]
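
Despite the name `cosine_distances`, the values above behave like similarities: a user compared with itself scores close to 1. As a hedged illustration, the sketch below computes the cosine of the angle between two users' rating vectors, restricted to the movies both rated. The function name `cosine_similarity` is hypothetical and this is not Crab's implementation, although for users 1 and 5 it agrees with the output above.

```python
import math

# Ratings of users 1 and 5 from the sample_movies data shown earlier.
user_1 = {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0}
user_5 = {2: 4.5, 3: 1.0, 4: 4.0}


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors over co-rated items."""
    common = sorted(set(a) & set(b))
    if not common:
        return 0.0  # assumption: no overlap means no measurable similarity
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity(user_1, user_5))
```

Because all ratings are positive, every pair of users ends up with a cosine close to 1, which is why this metric separates users less sharply than the Euclidean one on this dataset.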

In [0]:
from scikits.crab.metrics.pairwise import pearson_correlation

similarity = UserSimilarity(m, pearson_correlation)
similarity[5]

[(5, 1.0),
 (4, 0.99124070716193025),
 (2, 0.92447345164190498),
 (3, 0.8934051474415643),
 (1, 0.66284898035987005),
 (6, 0.38124642583151169),
 (7, -1.0)]
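
The Pearson correlation can also be worked out by hand over the co-rated movies. The following sketch is an illustration under that assumption, with a hypothetical function name `pearson_similarity`; it is not Crab's code, but it matches the output above for user 5 against users 4 and 7.

```python
import math

# Ratings of users 5, 4 and 7 from the sample_movies data shown earlier.
user_5 = {2: 4.5, 3: 1.0, 4: 4.0}
user_4 = {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0}
user_7 = {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0}


def pearson_similarity(a, b):
    """Pearson correlation of two users' ratings over co-rated items."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0  # assumption: correlation is undefined for fewer than 2 points
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / math.sqrt(var_a * var_b)


print(pearson_similarity(user_5, user_4))
print(pearson_similarity(user_5, user_7))
```

Note that users 5 and 7 share only two movies, so the correlation of -1.0 rests on two data points and should be read with caution.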

Using ItemSimilarity, we can get the similarity between different items in the same way.

In [0]:
from scikits.crab.similarities import ItemSimilarity

item_similarity = ItemSimilarity(m, euclidean_distances)
item_similarity[4]

[(4, 1.0),
 (2, 0.3090169943749474),
 (5, 0.25265030858707199),
 (1, 0.2402530733520421),
 (6, 0.20799159651347807),
 (3, 0.19182536636347339)]
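
Item similarity is the same computation applied to the columns of the matrix instead of the rows. The sketch below transposes the ratings and compares items 4 and 2 over the users who rated both. The helper names are hypothetical and the code is not Crab's, but the result agrees with the second entry of the output above.

```python
import math

# data.data from the sample_movies output above: {user_id: {item_id: rating}}.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def ratings_by_item(by_user):
    """Transpose {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, user_ratings in by_user.items():
        for item, rating in user_ratings.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def euclidean_similarity(a, b):
    """1 / (1 + Euclidean distance) over the keys both vectors share."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    distance = math.sqrt(sum((a[k] - b[k]) ** 2 for k in common))
    return 1.0 / (1.0 + distance)


columns = ratings_by_item(ratings)
print(euclidean_similarity(columns[4], columns[2]))
```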

Now we want to recommend a movie to user5. This user has already rated movie2, movie3, and movie4, so items 1, 5, and 6 remain as possible recommendations.

##### 3-2-1- Build user-based recommender
As the Pearson similarities above show, the user most similar to user5 is user4, followed by user2. So we want our user-based recommender to suggest movies that these users liked. Note that the similarity object passed below is the Pearson-based one, since that was the last assignment to it.

In [0]:
#build user Based recommender

from scikits.crab.recommenders.knn import UserBasedRecommender

recsys_userbased = UserBasedRecommender(model= m, similarity= similarity, capper= True, with_preference=True)


In [0]:
#recommend item for the user 5

recsys_userbased.recommend(5)

[(5, 3.3477895267131013), (1, 2.1757957879546104), (6, 2.0955889324773884)]
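
One common way to produce such an estimate is a similarity-weighted average of the neighbors' ratings. The sketch below applies it to item 5 for user 5, using the Pearson similarities from the earlier output (rounded). It is a textbook formulation, not Crab's implementation; skipping neighbors with non-positive similarity is an assumption, and `weighted_average` is a hypothetical name.

```python
# Pearson similarities of the other users to user 5, rounded from the
# similarity[5] output shown earlier.
similarity_to_user_5 = {4: 0.9912, 2: 0.9245, 3: 0.8934, 1: 0.6628, 6: 0.3812, 7: -1.0}

# Ratings the other users gave to item 5 in the sample data.
ratings_for_item_5 = {1: 3.0, 2: 3.0, 3: 4.5, 4: 3.0, 6: 3.0, 7: 4.0}


def weighted_average(similarities, ratings):
    """Similarity-weighted mean rating over neighbors that rated the item.

    Neighbors with non-positive similarity are skipped (an assumption).
    Returns None when no neighbor contributes.
    """
    numerator = 0.0
    denominator = 0.0
    for user, rating in ratings.items():
        weight = similarities.get(user, 0.0)
        if weight > 0:
            numerator += weight * rating
            denominator += weight
    return numerator / denominator if denominator else None


estimate = weighted_average(similarity_to_user_5, ratings_for_item_5)
print(round(estimate, 2))
```

For item 5 this lands at roughly 3.35, in line with the first entry of the output above. The same formula does not reproduce the reported estimates for items 1 and 6, which not every neighbor has rated, so Crab's normalization evidently differs in detail.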

The recommended_because function shows why the recommender suggested an item. For the user-based recommender, the output below lists the users whose ratings were most influential in recommending a given item to a given user, together with the rating each one gave.

For example, in the following code we ask the recommender for 2 reasons for recommending item1 to user5.
The recommender answers:
1. Because user2 gave this movie a rating of 3
2. Because user1 gave this movie a rating of 3

In [0]:
recsys_userbased.recommended_because(user_id = 5, item_id = 1, how_many = 2)

[(2, 3.0), (1, 3.0)]

##### 3-2-2- Build item-based recommender

The item-based approach would figure out what items are like the ones we already like. 

In [0]:
from scikits.crab.recommenders.knn import ItemBasedRecommender
from scikits.crab.recommenders.knn.item_strategies import ItemsNeighborhoodStrategy

items_strategy = ItemsNeighborhoodStrategy()
recsys_itembased = ItemBasedRecommender(model = m, similarity = item_similarity, items_selection_strategy = items_strategy)

In [0]:
#recommend item for the user 5

recsys_itembased.recommend(5)

[5.0, 6.0, 1.0]

In [0]:
recsys_itembased.recommended_because(user_id = 5, item_id = 5, how_many = 2)

[2, 4]

So the recommender suggests movie5 mainly because of the ratings that user5 gave to item2 and item4.
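
The item-based idea can be sketched as follows: score each candidate item by averaging the user's own ratings, weighted by how similar each rated item is to the candidate. This is a textbook formulation with hypothetical helper names, not Crab's code. On the sample data it happens to rank the candidates in the same order as the output above.

```python
import math

# data.data from the sample_movies output: {user_id: {item_id: rating}}.
ratings = {
    1: {1: 3.0, 2: 4.0, 3: 3.5, 4: 5.0, 5: 3.0},
    2: {1: 3.0, 2: 4.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 2.0},
    3: {2: 3.5, 3: 2.5, 4: 4.0, 5: 4.5, 6: 3.0},
    4: {1: 2.5, 2: 3.5, 3: 2.5, 4: 3.5, 5: 3.0, 6: 3.0},
    5: {2: 4.5, 3: 1.0, 4: 4.0},
    6: {1: 3.0, 2: 3.5, 3: 3.5, 4: 5.0, 5: 3.0, 6: 1.5},
    7: {1: 2.5, 2: 3.0, 4: 3.5, 5: 4.0},
}


def ratings_by_item(by_user):
    """Transpose {user: {item: rating}} into {item: {user: rating}}."""
    by_item = {}
    for user, user_ratings in by_user.items():
        for item, rating in user_ratings.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def euclidean_similarity(a, b):
    """1 / (1 + Euclidean distance) over the keys both vectors share."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    return 1.0 / (1.0 + math.sqrt(sum((a[k] - b[k]) ** 2 for k in common)))


def estimate(by_user, user, item):
    """Average the user's own ratings, weighted by item-to-item similarity."""
    by_item = ratings_by_item(by_user)
    numerator = 0.0
    denominator = 0.0
    for rated_item, rating in by_user[user].items():
        weight = euclidean_similarity(by_item[item], by_item[rated_item])
        numerator += weight * rating
        denominator += weight
    return numerator / denominator if denominator else None


candidates = [1, 5, 6]  # the items user 5 has not rated
ranked = sorted(candidates, key=lambda item: estimate(ratings, 5, item), reverse=True)
print(ranked)
```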

##### 3-2-3- Other functions

* estimate_preference: Return an estimated preference if the user has not expressed a preference for the item, or else the user's actual preference for the item. If a preference cannot be estimated, returns None.

In [0]:
recsys_userbased.estimate_preference(user_id = 5, item_id = 5)

3.3477895267131013

* all_other_items: return the items in the model for which the user has not expressed a preference and which could therefore be recommended to the user.

In [0]:
recsys_userbased.all_other_items(5)

array([1, 5, 6])
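
Conceptually, this candidate set is just the difference between all item IDs and the IDs the user has already rated. A minimal sketch, with the hypothetical helper name `unrated_items`:

```python
# Item IDs in the sample data and the ratings of user 5.
all_items = [1, 2, 3, 4, 5, 6]
user_5_ratings = {2: 4.5, 3: 1.0, 4: 4.0}


def unrated_items(items, user_ratings):
    """Return the items the user has not rated, in ascending order."""
    return sorted(set(items) - set(user_ratings))


print(unrated_items(all_items, user_5_ratings))
```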

* most_similar_items: return the most similar items to the given item, ordered from most similar to least.

In [0]:
recsys_itembased.most_similar_items(2)

array([1, 5, 4, 6, 3])

* most_similar_users: return the most similar users to the given user, ordered from most similar to least.

In [0]:
recsys_userbased.most_similar_users(5)

array([4, 2, 3, 1, 6, 7])

### 4- Evaluating the Recommender System
#### 4-1- Evaluate user-based recommender

In [0]:
from scikits.crab.metrics.classes import CfEvaluator

evaluator = CfEvaluator()
all_scores = evaluator.evaluate(recsys_userbased)
all_scores

{'f1score': 1.0,
 'mae': 1.7860894669610905,
 'nmae': 0.44652236674027262,
 'precision': 1.0,
 'recall': 1.0,
 'rmse': 2.0103584098247276}
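
For reference, the error metrics can be sketched in a few lines. The held-out ratings and predictions below are made up, and the functions are generic definitions rather than Crab's code. In the output above, nmae equals mae divided by 4, which is consistent with normalizing by the width of the 1-5 rating scale; the sketch adopts that as an assumption.

```python
import math

# Hypothetical held-out ratings and the predictions made for them.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nmae(y_true, y_pred, min_rating=1.0, max_rating=5.0):
    """MAE divided by the width of the rating scale (assumed to be 1-5)."""
    return mae(y_true, y_pred) / (max_rating - min_rating)


print(mae(actual, predicted), rmse(actual, predicted), nmae(actual, predicted))
```

Lower is better for all three; precision, recall, and f1score instead measure how well the top of the recommendation list matches what the user actually liked.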

In [0]:
evaluator.evaluate(recommender= recsys_userbased, metric='rmse')

{'rmse': 1.4754076942044934}

In [0]:
evaluator.evaluate_on_split(recommender= recsys_userbased, at= 2)

({'error': [{'mae': nan, 'nmae': nan, 'rmse': nan},
   {'mae': nan, 'nmae': nan, 'rmse': nan},
   {'mae': nan, 'nmae': nan, 'rmse': nan}],
  'ir': [{'f1score': 1.0, 'precision': 1.0, 'recall': 1.0},
   {'f1score': 1.0, 'precision': 1.0, 'recall': 1.0},
   {'f1score': 1.0, 'precision': 1.0, 'recall': 1.0}]},
 {'final_error': {'avg': {'f1score': 1.0,
    'mae': nan,
    'nmae': nan,
    'precision': 1.0,
    'recall': 1.0,
    'rmse': nan},
   'stdev': {'f1score': 0.0,
    'mae': nan,
    'nmae': nan,
    'precision': 0.0,
    'recall': 0.0,
    'rmse': nan}}})
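
The 'ir' entries are information-retrieval metrics. As a rough sketch, assuming that `at` is the cutoff of the recommendation list (as in "precision at 2"), they can be computed as below. The recommendation list, the set of relevant items, and the function name are hypothetical.

```python
def precision_recall_f1(recommended, relevant, k):
    """Precision, recall and F1 of the top-k recommended items."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(len(top_k)) if top_k else 0.0
    recall = hits / float(len(relevant)) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ranked recommendations and the items the user actually liked.
recommended = [5, 1, 6]
relevant = [5, 6]

print(precision_recall_f1(recommended, relevant, k=2))
```

Here only one of the top two recommendations is relevant, and only one of the two relevant items is retrieved, so all three scores are 0.5.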

#### 4-2- Evaluate item-based recommender

In [0]:
evaluator.evaluate(recommender= recsys_itembased, metric='rmse')

{'rmse': 0.63400051445360961}

In [0]:
evaluator.evaluate_on_split(recommender= recsys_itembased, at= 2)

({'error': [{'mae': 0.98666907449075536,
    'nmae': 0.24666726862268884,
    'rmse': 1.0750986755668408},
   {'mae': 0.78056954900170283,
    'nmae': 0.19514238725042571,
    'rmse': 0.91436181061894917},
   {'mae': 0.70879418397610139,
    'nmae': 0.17719854599402535,
    'rmse': 0.85877323034529329}],
  'ir': [{'f1score': 1.0, 'precision': 1.0, 'recall': 1.0},
   {'f1score': 0.875, 'precision': 0.875, 'recall': 0.875},
   {'f1score': 0.875, 'precision': 0.875, 'recall': 0.875}]},
 {'final_error': {'avg': {'f1score': 0.91666666666666663,
    'mae': 0.82534426915618653,
    'nmae': 0.20633606728904663,
    'precision': 0.91666666666666663,
    'recall': 0.91666666666666663,
    'rmse': 0.94941123884369449},
   'stdev': {'f1score': 0.05892556509887896,
    'mae': 0.11777717766561563,
    'nmae': 0.029444294416403907,
    'precision': 0.05892556509887896,
    'recall': 0.05892556509887896,
    'rmse': 0.091726119060081476}}})
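
The 'final_error' summary can be checked against the per-split values with Python's `statistics` module. The numbers below are copied from the output above; the mean and the population standard deviation (`pstdev`) of the three splits reproduce the reported 'avg' and 'stdev' entries.

```python
import statistics

# Per-split values copied from the evaluate_on_split output above.
fold_f1 = [1.0, 0.875, 0.875]
fold_rmse = [1.0750986755668408, 0.91436181061894917, 0.85877323034529329]

avg_f1 = statistics.mean(fold_f1)
std_f1 = statistics.pstdev(fold_f1)      # population standard deviation
avg_rmse = statistics.mean(fold_rmse)
std_rmse = statistics.pstdev(fold_rmse)

print(avg_f1, std_f1)
print(avg_rmse, std_rmse)
```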

### 5- Conclusion

Crab is a Python framework for building recommender systems. In this spotlight, I have shown how to install Crab and use it to build and evaluate user-based and item-based recommenders.

### 6- References
* http://muricoca.github.io/crab/
* https://github.com/muricoca/crab.git
* https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada
