This is the official implementation of our paper *Breaking the Curse of Quality Saturation with User-Centric Ranking*. The code in this repository relies heavily on DeepCTR-Torch; we thank its authors for their contributions to the open-source community.
The MovieLens data used in this experiment, as well as the pre-processed user-centric features, can be found here.
Use the included `setup_env.sh` to set up your environment by running

```bash
sh setup_env.sh
```

As mentioned above, the pre-processed user-centric features can be downloaded from here. To generate the features yourself, run `process_features.py` with the following options (a toy sketch of the underlying idea follows the option list):
- `--data_type`: Choose between '1M', '10M', or '20M', each corresponding to one variant of the MovieLens datasets. '25M' (MovieLens 25M) is unfortunately not supported at the moment.
- `--input_dir`: Root directory that contains the data, e.g. `./data/ml-1m`.
- `--output_dir`: Directory where the generated features will be saved.
- `--num_process`: Number of processes to use (for multi-processing).
- `-v`, `--verbose`: Verbosity.
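If it helps to see the two feature directions side by side, below is a rough, hypothetical sketch, not the repo's preprocessing code: it assumes the standard `UserID::MovieID::Rating::Timestamp` layout of the MovieLens `ratings.dat` file, and reads the item-centric view as a per-user history of movies and the user-centric view as a per-movie history of users. The column names, the path, and the actual output format of `process_features.py` are assumptions here.

```python
# Toy illustration only: not the repo's process_features.py.
# Item-centric (IC): each user is described by the movies they rated.
# User-centric (UC): each movie is described by the users who rated it.
import pandas as pd

ratings = pd.read_csv(
    "./data/ml-1m/ratings.dat",          # path assumed; adjust to your --input_dir
    sep="::",
    engine="python",
    names=["user_id", "movie_id", "rating", "timestamp"],
)
ratings = ratings.sort_values("timestamp")  # keep histories in chronological order

# IC histories: user_id -> [movie_id, ...]
ic_history = ratings.groupby("user_id")["movie_id"].apply(list)

# UC histories: movie_id -> [user_id, ...]
uc_history = ratings.groupby("movie_id")["user_id"].apply(list)

print(ic_history.head())
print(uc_history.head())
```

The real script also produces the split train/test feature directories referenced below (see `--train_dir` and `--test_dir`) and parallelizes the work across `--num_process` processes.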
An example command for generating user-centric features:

```bash
python process_features.py --data_type 10M --input_dir ./data/ml-1m/ --output_dir ./data/ --num_process 10 -v
```

To run the experiment, simply run the main file `main.py` with the following options:
- `--mode`: Choose either 'train' or 'test'.
- `--model_name`: Name of the model; choose between 'DIN', 'DIEN', or 'DIFM'.
- `--model_type`: Type of pooling operation; choose between 'sum' and 'attention'. Check Table 3 in our paper for more details (a toy sketch of the two modes follows this list).
- `--data_type`: Same as before; choose between '1M', '10M', or '20M'.
- `--feature_type`: Choose between 'IC' (item-centric) and 'UC' (user-centric).
- `--train_dir`: Directory of the training features (required in 'train' mode). Running `process_features.py` or downloading the pre-processed features should have this ready, e.g. `./data/splitted_features_10m/train`.
- `--val_dir`: Directory of the validation features (required in 'train' mode), e.g. `./data/splitted_features_10m/test`.
- `--test_dir`: Directory of the test features (required in 'test' mode), e.g. `./data/splitted_features_10m/test`.
- `--output_model_dir`: Directory for trained model checkpoints.
- `--input_model_path`: Path to an existing model; required when continuing training or running a test.
- `--output_hist_dir`: Directory for the training history.
- `--num_epoch`: Number of training epochs.
- `--batch_size`: Batch size.
- `--save_freq`: Interval, in epochs, at which a model checkpoint is saved.
- `-v`, `--verbose`: Verbosity.
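To give a feel for the `--model_type` choice, here is a toy, self-contained sketch of 'sum' versus 'attention' pooling over a behavior sequence. It is not the repo's implementation (Table 3 in the paper has the exact definitions); the tensor shapes and the dot-product attention below are illustrative assumptions.

```python
# Toy sketch of the two pooling modes for a behavior history (illustration only).
# history: (batch, seq_len, dim) embeddings of past interactions;
# target:  (batch, dim) embedding of the candidate item/user being scored.
import torch
import torch.nn.functional as F

def sum_pooling(history: torch.Tensor) -> torch.Tensor:
    # 'sum': collapse the history by plain summation over the sequence axis.
    return history.sum(dim=1)

def attention_pooling(history: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # 'attention': weight each history element by its similarity to the target,
    # then take the weighted sum (DIN-style target attention, in spirit).
    scores = torch.einsum("bld,bd->bl", history, target)    # (batch, seq_len)
    weights = F.softmax(scores, dim=1)
    return torch.einsum("bl,bld->bd", weights, history)     # (batch, dim)

history = torch.randn(2, 5, 8)
target = torch.randn(2, 8)
print(sum_pooling(history).shape)                # torch.Size([2, 8])
print(attention_pooling(history, target).shape)  # torch.Size([2, 8])
```

Sum pooling ignores the target entirely, while attention pooling re-weights the history by its relevance to the current target before collapsing it.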
An example training command might look like:

```bash
python main.py --mode train --model_name DIN --model_type sum --data_type 10M --feature_type UC --train_dir ./data/splitted_features_10M/train --val_dir ./data/splitted_features_10M/test --output_model_dir ./models --output_hist_dir ./history --num_epoch 100 --batch_size 64 --save_freq 10 -v
```

An example test command might look like:

```bash
python main.py --mode test --model_name DIN --model_type sum --data_type 10M --feature_type UC --test_dir ./data/splitted_features_10M/test --input_model_path ./models/trained_model.pt --batch_size 128 -v
```

If this code helps your research, please kindly cite our paper. Thank you!
```bibtex
@article{zhao2023breaking,
  title={Breaking the Curse of Quality Saturation with User-Centric Ranking},
  author={Zhao, Zhuokai and Yang, Yang and Wang, Wenyu and Liu, Chihuang and Shi, Yu and Hu, Wenjie and Zhang, Haotian and Yang, Shuang},
  journal={arXiv preprint arXiv:2305.15333},
  year={2023}
}
```