ycc1107/CollaborativeFilteringSystem

CollaborativeFilteringSystem

Build status:


### Instructions

Modeling: We are going to use a technique called collaborative filtering [2]. Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue x than to have the opinion on x of a person chosen randomly.

Data preparation: Split the ratingsRDD dataset into three pieces:

- A training set (RDD), which we will use to train models
- A validation set (RDD), which we will use to choose the best model
- A test set (RDD), which we will use for our experiments

For example (in Python): trainingRDD, validationRDD, testRDD = ratingsRDD.randomSplit([6, 2, 2], seed=0)

Model training & model selection: Based on the training and validation sets, use the Root Mean Square Error (RMSE) [3] to compute the error of each model, then select the best model. (Hint: the most important parameter to ALS.train() is the rank. You could use a fixed value of 0.1 for the regularizationParameter.)
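The selection step can be sketched in plain Python, without Spark, to make the logic concrete. This is a minimal sketch under stated assumptions, not the project's implementation: `rmse`, `select_best_rank`, and the `train_fn` callback are hypothetical names, ratings are held in dicts keyed by (user, movie), and `train_fn` stands in for whatever trains a model at a given rank (for instance a wrapper around ALS.train()).

```python
import math


def rmse(predicted, actual):
    """Root mean square error over the (user, movie) keys present in both dicts."""
    common = predicted.keys() & actual.keys()
    if not common:
        raise ValueError("no overlapping (user, movie) pairs to compare")
    squared_error = sum((predicted[key] - actual[key]) ** 2 for key in common)
    return math.sqrt(squared_error / len(common))


def select_best_rank(ranks, train_fn, training, validation):
    """Train one model per candidate rank; return (best_rank, best_error).

    train_fn is a hypothetical callback: train_fn(training, rank) must return
    a function predict(user, movie) -> predicted rating.
    """
    best_rank, best_error = None, float("inf")
    for rank in ranks:
        predict = train_fn(training, rank)
        # Score the model only on pairs held out in the validation set.
        predicted = {key: predict(*key) for key in validation}
        error = rmse(predicted, validation)
        if error < best_error:
            best_rank, best_error = rank, error
    return best_rank, best_error
```

The model is scored on the validation set rather than the training set so that the chosen rank reflects performance on ratings the model has not seen. The test set stays untouched until the final evaluation.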

Model evaluation: Apply the best model to the testRDD dataset to decide how good our model is.

Data:

Each line in the ratings dataset (ratings.dat.gz) is formatted as: UserID::MovieID::Rating::Timestamp

Create the ratingsRDD: For each line in the ratings dataset, we create a tuple of (UserID, MovieID, Rating). We drop the timestamp because we do not need it here.
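The parsing step described above might look like the following. This is a minimal sketch: `parse_rating` and `load_ratings` are hypothetical helper names, and it assumes the IDs are integers and the rating is numeric.

```python
import gzip


def parse_rating(line):
    """Turn 'UserID::MovieID::Rating::Timestamp' into (user_id, movie_id, rating)."""
    user_id, movie_id, rating, _timestamp = line.strip().split("::")
    # The timestamp is parsed only to validate the field count, then dropped.
    return int(user_id), int(movie_id), float(rating)


def load_ratings(path):
    """Read a gzip-compressed ratings file into a list of rating tuples."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [parse_rating(line) for line in handle if line.strip()]
```

With a SparkContext available (assumed here to be named sc), the same function could be applied per line, for example ratingsRDD = sc.textFile("ratings.dat.gz").map(parse_rating).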

Project Processing: https://trello.com/b/ylJY2Fan/collaborativefilteringsystem
