
What Makes LLMs Effective Sequential Recommenders?

This repository contains the code for the ACL 2026 main paper "What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context".

Paper: https://arxiv.org/abs/2506.02261

Repository Collaborator: Qianlong Wen

The code supports supervised fine-tuning and preference-optimization experiments for sequential recommendation, including SFT, RecPO (ours), SimPO, DPO, and S-DPO.

Setup

Create the conda environment from environment.yml:

conda env create -f environment.yml
conda activate rec

Log in to Hugging Face if your base model or dataset access requires authentication:

huggingface-cli login

Download Data

Download the dataset from Hugging Face:

huggingface-cli download zyouyang/recpo-data \
  --repo-type dataset \
  --local-dir recpo-data

The expected downloaded layout is:

recpo-data/
  raw_data/
    amazon-books/
    beeradvocate/
    lastfm/
    movielens/
    steam/
  processed_data/
    amazon-books/
    beeradvocate/
    lastfm/
    movielens/
    steam/

The raw-data loaders in data/*_data.py read the raw files from recpo-data/raw_data/<dataset>. The training and inference scripts read processed JSON files directly from recpo-data/processed_data/<dataset> by default. To use another location, pass --processed_data_dir /path/to/processed_data to the Python entry point.
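As a minimal sketch of the path convention above, the helper below resolves the processed-data directory for a dataset, defaulting to `recpo-data/processed_data` but accepting an override the way `--processed_data_dir` does. The function names here are illustrative, not part of the repository's API:

```python
import json
from pathlib import Path

def processed_dir(dataset, root="recpo-data/processed_data"):
    """Return the directory holding processed JSON for <dataset>.

    `root` mirrors the --processed_data_dir override; the default
    matches the layout produced by the huggingface-cli download.
    """
    return Path(root) / dataset

def load_examples(path):
    """Load one processed JSON file (assumed to be a list of dicts)."""
    with open(path) as f:
        return json.load(f)
```

For example, `processed_dir("lastfm")` resolves to `recpo-data/processed_data/lastfm`, and `processed_dir("lastfm", root="/data/processed_data")` points training at a custom location.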

Run the Demo Pipeline

The main demo script is exp.sh. It performs the steps documented in the script:

  1. Prepare LastFM SFT/RecPO JSON data.
  2. Run SFT to create an adapter checkpoint.
  3. Run RecDPO, SimPO, DPO, and SoftmaxDPO from the SFT adapter.
  4. Run inference for one selected preference-optimized checkpoint.

Run:

bash exp.sh

By default, exp.sh uses:

MODEL_NAME=Qwen/Qwen2.5-7B
DATASET=lastfm_10000
PROMPT_PATH=./prompt/music_rating.txt
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC_PER_NODE=8

You can override the model and GPU settings from the shell:

CUDA_VISIBLE_DEVICES=0,1 NPROC_PER_NODE=2 MODEL_NAME=Qwen/Qwen2.5-7B bash exp.sh

Outputs are written under:

output/
log/

Running Individual Stages

Prepare data:

cd data
python prepare_sft_data.py
cd ..

Run SFT:

torchrun --nproc_per_node 8 --master_port=25641 sft.py \
  --model_name Qwen/Qwen2.5-7B \
  --train_dataset lastfm_10000 \
  --batch_size 2 \
  --gradient_accumulation_steps 8 \
  --prompt_path ./prompt/music_rating.txt \
  --logging_dir log \
  --output_dir output \
  --learning_rate 1e-5 \
  --num_train_epochs 5 \
  --eval_step 0.2 \
  --cutoff_len 768 \
  --report_to none \
  --wandb_project RecPO \
  --wandb_name Demo-qwen-7B-SFT-lastfm

Run inference:

torchrun --nproc_per_node 8 --master_port=29611 inference.py \
  --dataset lastfm_10000 \
  --model_name Qwen/Qwen2.5-7B \
  --prompt_path ./prompt/music_rating.txt \
  --batch_size 10 \
  --resume_from_checkpoint output/lastfm/Demo-qwen-7B-RecDPO-lastfm/final_checkpoint \
  --rank0_process 0 \
  --save_output False

Data Format

Processed JSON examples contain fields such as:

{
  "historyList": ["item A", "item B"],
  "historyRatingList": ["5", "4"],
  "itemList": ["candidate 1", "candidate 2"],
  "itemScoreList": [4.0, 0.9045],
  "trueSelection": "candidate 1",
  "selectionScore": 4.0,
  "ratingNegative": ["candidate 2"],
  "randomNegative": ["candidate 3"]
}

itemScoreList is aligned with itemList. For training examples, candidate scores are computed as:

score = rating / (1 + step) ** decay_factor

where decay_factor defaults to 0.5.
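The scoring rule above can be sketched in Python. The interpretation of `step` as the candidate's temporal distance in the interaction sequence is an assumption here, as is the function name:

```python
def candidate_score(rating, step, decay_factor=0.5):
    """Decay a raw rating by temporal distance.

    Assumes `step` >= 0 counts how far the interaction sits from the
    prediction point (an assumption; see the repo's data-prep code
    for the exact definition of `step`).
    """
    return rating / (1 + step) ** decay_factor

# With the default decay_factor of 0.5, a rating of 4 at step 0
# keeps its full value, and at step 3 it decays to 4 / 4**0.5 = 2.0.
```

This matches the example record above, where a `selectionScore` of 4.0 corresponds to an undecayed rating of 4.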

Citation

If you use this repository, please consider citing our paper:

@inproceedings{recpo2026,
  title = {What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context},
  author = {Zhongyu Ouyang and Qianlong Wen and Chunhui Zhang and Yanfang Ye and Soroush Vosoughi},
  booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
  year = {2026},
}

Thanks 🌲
