This repository contains the code for the ACL 2026 main-conference paper "What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context".
Paper: https://arxiv.org/abs/2506.02261
Repository Collaborator: Qianlong Wen
The code supports supervised fine-tuning and preference-optimization experiments for sequential recommendation, including SFT, RecPO (ours), SimPO, DPO, and S-DPO.
Create the conda environment from environment.yml:
```bash
conda env create -f environment.yml
conda activate rec
```

Log in to Hugging Face if your base model or dataset access requires authentication:

```bash
huggingface-cli login
```

Download the dataset from Hugging Face:
```bash
huggingface-cli download zyouyang/recpo-data \
    --repo-type dataset \
    --local-dir recpo-data
```

The expected downloaded layout is:
```text
recpo-data/
  raw_data/
    amazon-books/
    beeradvocate/
    lastfm/
    movielens/
    steam/
  processed_data/
    amazon-books/
    beeradvocate/
    lastfm/
    movielens/
    steam/
```
The raw-data loaders in `data/*_data.py` read the raw files from `recpo-data/raw_data/<dataset>`. The training and inference scripts read processed JSON files directly from `recpo-data/processed_data/<dataset>` by default. To use another location, pass `--processed_data_dir /path/to/processed_data` to the relevant Python entry point.
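As a sketch, overriding the data location for SFT might look like this (the directory path is illustrative; the remaining flags mirror the demo defaults shown further below):

```bash
# Hypothetical data location; point --processed_data_dir at your own copy.
torchrun --nproc_per_node 8 sft.py \
    --model_name Qwen/Qwen2.5-7B \
    --train_dataset lastfm_10000 \
    --prompt_path ./prompt/music_rating.txt \
    --processed_data_dir /mnt/shared/recpo-data/processed_data
```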
The main demo script is exp.sh. It performs the steps documented in the script:
- Prepare LastFM SFT/RecPO JSON data.
- Run SFT to create an adapter checkpoint.
- Run RecDPO, SimPO, DPO, and SoftmaxDPO from the SFT adapter.
- Run inference for one selected preference-optimized checkpoint.
Run:
```bash
bash exp.sh
```

By default, exp.sh uses:
```bash
MODEL_NAME=Qwen/Qwen2.5-7B
DATASET=lastfm_10000
PROMPT_PATH=./prompt/music_rating.txt
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC_PER_NODE=8
```

You can override the model and GPU settings from the shell:

```bash
CUDA_VISIBLE_DEVICES=0,1 NPROC_PER_NODE=2 MODEL_NAME=Qwen/Qwen2.5-7B bash exp.sh
```

Outputs are written under:
```text
output/
log/
```
Prepare data:
```bash
cd data
python prepare_sft_data.py
cd ..
```

Run SFT:
```bash
torchrun --nproc_per_node 8 --master_port=25641 sft.py \
    --model_name Qwen/Qwen2.5-7B \
    --train_dataset lastfm_10000 \
    --batch_size 2 \
    --gradient_accumulation_steps 8 \
    --prompt_path ./prompt/music_rating.txt \
    --logging_dir log \
    --output_dir output \
    --learning_rate 1e-5 \
    --num_train_epochs 5 \
    --eval_step 0.2 \
    --cutoff_len 768 \
    --report_to none \
    --wandb_project RecPO \
    --wandb_name Demo-qwen-7B-SFT-lastfm
```

Run inference:
```bash
torchrun --nproc_per_node 8 --master_port=29611 inference.py \
    --dataset lastfm_10000 \
    --model_name Qwen/Qwen2.5-7B \
    --prompt_path ./prompt/music_rating.txt \
    --batch_size 10 \
    --resume_from_checkpoint output/lastfm/Demo-qwen-7B-RecDPO-lastfm/final_checkpoint \
    --rank0_process 0 \
    --save_output False
```

Processed JSON examples contain fields such as:
```json
{
  "historyList": ["item A", "item B"],
  "historyRatingList": ["5", "4"],
  "itemList": ["candidate 1", "candidate 2"],
  "itemScoreList": [4.0, 0.9045],
  "trueSelection": "candidate 1",
  "selectionScore": 4.0,
  "ratingNegative": ["candidate 2"],
  "randomNegative": ["candidate 3"]
}
```

itemScoreList is aligned with itemList.
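A minimal sketch of reading one processed example and pairing candidates with their scores (the file path is illustrative, and it assumes the processed file is a JSON list of such records):

```python
import json

# Hypothetical file name; actual names depend on the data preparation step.
with open("recpo-data/processed_data/lastfm/train.json") as f:
    examples = json.load(f)

ex = examples[0]
# itemScoreList is index-aligned with itemList.
for item, score in zip(ex["itemList"], ex["itemScoreList"]):
    print(f"{item}: {score:.4f}")
```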
For training examples, candidate scores are computed as:

```text
score = rating / (1 + step) ** decay_factor
```

where decay_factor defaults to 0.5.
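As a concrete check of the formula (a sketch, assuming step counts how many interactions back the rating occurred; the helper below is illustrative, not part of the repository):

```python
def decayed_score(rating: float, step: int, decay_factor: float = 0.5) -> float:
    """Discount a raw rating by how far back in the history it occurred."""
    return rating / (1 + step) ** decay_factor

print(decayed_score(4.0, step=0))   # 4.0 -- matches selectionScore above
print(decayed_score(3.0, step=10))  # ~0.9045 -- one rating/step pair consistent with the example
```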
If you use this repository, please consider citing our paper:
```bibtex
@inproceedings{recpo2026,
  title     = {What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context},
  author    = {Zhongyu Ouyang and Qianlong Wen and Chunhui Zhang and Yanfang Ye and Soroush Vosoughi},
  booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
  year      = {2026}
}
```

Thanks 🌲