This project is a recommender system web app. It recommends items a user may like and, for a given item, finds similar items.
I build a movie recommender application on the Movielens 1M dataset. For more information about the dataset, please check here. For user recommendations I use Alternating Least Squares (ALS), as implemented in the implicit library; for related-item recommendations I use a content-based module. I also implement a simple web app that serves recommendations for a user or an item (model serving).
This project covers four steps of an ML pipeline:
- ETL: Clean the data and save the cleaned data to a file and a database.
- Feature engineering: Transform the features into the form the models need for fitting.
- Modelling: Build a machine learning pipeline that runs feature engineering and trains the ML model.
- Model serving: Build a Flask web app that returns recommendations for the user's input query.
The code is implemented in Python 3.9. All required packages are listed in the requirements.txt file.
For quick installation:
pip install -r requirements.txt
The Movielens 1M dataset contains users' rating histories and movie profiles. To see the EDA of Movielens 1M, please go to this notebook
- Alternating Least Squares (ALS): A matrix factorization approach. The model decomposes the rating matrix into two factor matrices, one for users and one for items.
- Content-based (CB): A content-based approach that uses cosine similarity to find the most similar items.
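To make the factorization idea concrete, here is a minimal NumPy sketch of ALS on a tiny explicit rating matrix. It is an illustration only and is not the implicit library's implementation (which targets implicit feedback and is far more optimized). The function name `als_factorize`, the toy ratings, and the hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def als_factorize(ratings, observed, factors=2, reg=0.1, iterations=20, seed=0):
    """Approximate `ratings` (users x items) as U @ V.T using observed entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, factors))
    V = rng.normal(scale=0.1, size=(n_items, factors))
    penalty = reg * np.eye(factors)
    for _ in range(iterations):
        # Hold item factors fixed: each user row is a small ridge regression.
        for u in range(n_users):
            Vo = V[observed[u]]
            U[u] = np.linalg.solve(Vo.T @ Vo + penalty, Vo.T @ ratings[u, observed[u]])
        # Hold user factors fixed: solve the same kind of problem for each item.
        for i in range(n_items):
            Uo = U[observed[:, i]]
            V[i] = np.linalg.solve(Uo.T @ Uo + penalty, Uo.T @ ratings[observed[:, i], i])
    return U, V

# Toy data (hypothetical): two groups of users with opposite tastes.
ratings = np.array([
    [5.0, 5.0, 1.0, 0.0],   # 0.0 marks an unobserved rating here
    [5.0, 5.0, 1.0, 1.0],
    [1.0, 1.0, 5.0, 5.0],
    [0.0, 1.0, 5.0, 5.0],
])
observed = np.ones_like(ratings, dtype=bool)
observed[0, 3] = False
observed[3, 0] = False

U, V = als_factorize(ratings, observed)
predicted = U @ V.T  # includes predictions for the two unobserved cells
```

Each half-step has a closed-form solution because fixing one factor matrix turns the problem into independent regularized least-squares problems, which is where the "alternating" in the name comes from.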
For the ALS model, I use the implementation from implicit for better performance. For the CB model, I wrote my own implementation, which can combine several content features of different data types: list, category, or text. The final similarity of a pair of items is the average of the similarities computed from each input feature. You can find the code of this implementation here
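The averaging of per-feature similarities can be sketched as follows. This is not the project's actual CB code; the helper names (`multi_hot`, `cosine_similarity_matrix`, `most_similar`), the binary encoding of every feature, and the toy catalogue are assumptions made for illustration.

```python
import numpy as np

def multi_hot(values_per_item):
    """Encode one token list per item as a binary item x vocabulary matrix."""
    vocab = sorted({v for values in values_per_item for v in values})
    index = {v: j for j, v in enumerate(vocab)}
    X = np.zeros((len(values_per_item), len(vocab)))
    for i, values in enumerate(values_per_item):
        for v in values:
            X[i, index[v]] = 1.0
    return X

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between rows; all-zero rows get similarity 0."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    return Xn @ Xn.T

# Hypothetical toy catalogue with a list, a category, and a text feature.
genres = [["Animation", "Comedy"], ["Animation", "Children's"], ["Thriller"]]
decade = [["1990s"], ["1990s"], ["1980s"]]  # a category is a one-element list
titles = ["Toy Story", "Toy Soldiers Story", "Night Hunter"]
title_words = [t.lower().split() for t in titles]

# One similarity matrix per feature, then a plain average across features.
feature_sims = [cosine_similarity_matrix(multi_hot(f))
                for f in (genres, decade, title_words)]
similarity = np.mean(feature_sims, axis=0)

def most_similar(item, k=1):
    """Return the indices of the k items most similar to `item`, excluding itself."""
    order = np.argsort(-similarity[item])
    return [int(j) for j in order if j != item][:k]
```

Averaging gives every feature the same weight; a weighted average would be a natural variation if some features prove more informative than others.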
In this project, I only implement evaluation for ALS. To evaluate ALS, I use three metrics: RMSE, MAP@k and P@k.
- RMSE (Root Mean Squared Error): the square root of the mean squared difference between the predicted rating and the true rating.
- P@k (Precision at k): the fraction of the top k recommended items that are relevant to the user.
- MAP@k (Mean Average Precision at k): the mean, over all users, of the average precision at k.
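The three metrics can be written out in a few lines of plain Python. This is a sketch for clarity, not the evaluation code used to produce the results in this project; normalization conventions for average precision vary between libraries, and this version assumes division by `min(k, number of relevant items)`.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def average_precision_at_k(recommended, relevant, k):
    """Average of the precision values at each rank where a relevant item appears."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)  # assumed normalization, see note above
    return score / denom if denom else 0.0

def map_at_k(recs_by_user, relevant_by_user, k):
    """Mean of average_precision_at_k over all users."""
    scores = [average_precision_at_k(recs, relevant_by_user[user], k)
              for user, recs in recs_by_user.items()]
    return sum(scores) / len(scores)

# Hypothetical example: user "u1" gets hits at ranks 1 and 3, user "u2" gets none.
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
liked = {"u1": {"a", "c"}, "u2": {"q"}}
example_map = map_at_k(recs, liked, k=3)
```

Unlike P@k, average precision rewards placing relevant items near the top of the list, so two recommenders with the same P@k can have different MAP@k.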
The results of ALS on Movielens 1M are shown below. You can view the details here
Factors | RMSE | MAP@k | P@k |
---|---|---|---|
10 | 3.21 | 0.10 | 0.206 |
30 | 3.18 | 0.122 | 0.244 |
50 | 3.18 | 0.128 | 0.246 |
100 | 3.214 | 0.121 | 0.234 |
300 | 3.42 | 0.087 | 0.171 |
1000 | 3.67 | 0.0364 | 0.0733 |
- ETL pipeline
Both the user dataset and the item dataset need pre-processing. To run the ETL pipeline that cleans the user dataset, run the command below:
python movielens_rating_etl.py
To run the ETL pipeline that cleans the item dataset:
python movielens_meta_etl.py
- Build and train the models
There are two models: ALS and ContentBased. To run the ALS model:
python als.py
To run the ContentBased model:
python cb.py
- Run the web app
Run the command below to start the web app on localhost:
python run.py
Then go to http://localhost:3000 to see the web app.