This repository contains the code for the ACL 2026 main-conference paper "What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context".
Paper: https://arxiv.org/abs/2506.02261
Repository Collaborator: Qianlong Wen
The code supports supervised fine-tuning and preference-optimization experiments for sequential recommendation, including SFT, RecPO (ours), SimPO, DPO, and S-DPO.
Create the conda environment from environment.yml:
```bash
conda env create -f environment.yml
conda activate rec
```

Log in to Hugging Face if your base model or dataset access requires authentication:

```bash
huggingface-cli login
```

Download the dataset from Hugging Face:
```bash
huggingface-cli download zyouyang/recpo-data \
    --repo-type dataset \
    --local-dir recpo-data
```

The expected downloaded layout is:
```text
recpo-data/
  raw_data/
    amazon-books/
    beeradvocate/
    lastfm/
    movielens/
    steam/
  processed_data/
    amazon-books/
    beeradvocate/
    lastfm/
    movielens/
    steam/
```
The raw-data loaders in `data/*_data.py` read the raw files from `recpo-data/raw_data/<dataset>`. The training and inference scripts read processed JSON files directly from `recpo-data/processed_data/<dataset>` by default. To use another location, pass `--processed_data_dir /path/to/processed_data` to the relevant Python entry point.
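As a sketch, overriding the data location for SFT might look like this (the directory path is illustrative; the remaining flags mirror the demo defaults shown further below):

```bash
# Hypothetical data location; point --processed_data_dir at your own copy.
torchrun --nproc_per_node 8 sft.py \
    --model_name Qwen/Qwen2.5-7B \
    --train_dataset lastfm_10000 \
    --prompt_path ./prompt/music_rating.txt \
    --processed_data_dir /mnt/shared/recpo-data/processed_data
```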
The main demo script is exp.sh. It performs the steps documented in the script:
- Prepare LastFM SFT/RecPO JSON data.
- Run SFT to create an adapter checkpoint.
- Run RecDPO, SimPO, DPO, and SoftmaxDPO from the SFT adapter.
- Run inference for one selected preference-optimized checkpoint.
Run:
```bash
bash exp.sh
```

By default, exp.sh uses:
```bash
MODEL_NAME=Qwen/Qwen2.5-7B
DATASET=lastfm_10000
PROMPT_PATH=./prompt/music_rating.txt
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC_PER_NODE=8
```

You can override the model and GPU settings from the shell:

```bash
CUDA_VISIBLE_DEVICES=0,1 NPROC_PER_NODE=2 MODEL_NAME=Qwen/Qwen2.5-7B bash exp.sh
```

Outputs are written under:
```text
output/
log/
```
Prepare data:
```bash
cd data
python prepare_sft_data.py
cd ..
```

Run SFT:
```bash
torchrun --nproc_per_node 8 --master_port=25641 sft.py \
    --model_name Qwen/Qwen2.5-7B \
    --train_dataset lastfm_10000 \
    --batch_size 2 \
    --gradient_accumulation_steps 8 \
    --prompt_path ./prompt/music_rating.txt \
    --logging_dir log \
    --output_dir output \
    --learning_rate 1e-5 \
    --num_train_epochs 5 \
    --eval_step 0.2 \
    --cutoff_len 768 \
    --report_to none \
    --wandb_project RecPO \
    --wandb_name Demo-qwen-7B-SFT-lastfm
```

Run inference:
```bash
torchrun --nproc_per_node 8 --master_port=29611 inference.py \
    --dataset lastfm_10000 \
    --model_name Qwen/Qwen2.5-7B \
    --prompt_path ./prompt/music_rating.txt \
    --batch_size 10 \
    --resume_from_checkpoint output/lastfm/Demo-qwen-7B-RecDPO-lastfm/final_checkpoint \
    --rank0_process 0 \
    --save_output False
```

Processed JSON examples contain fields such as:
```json
{
  "historyList": ["item A", "item B"],
  "historyRatingList": ["5", "4"],
  "itemList": ["candidate 1", "candidate 2"],
  "itemScoreList": [4.0, 0.9045],
  "trueSelection": "candidate 1",
  "selectionScore": 4.0,
  "ratingNegative": ["candidate 2"],
  "randomNegative": ["candidate 3"]
}
```

itemScoreList is aligned with itemList.
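A minimal sketch of reading one processed example and pairing candidates with their scores (the file path is illustrative, and it assumes the processed file is a JSON list of such records):

```python
import json

# Hypothetical file name; actual names depend on the data preparation step.
with open("recpo-data/processed_data/lastfm/train.json") as f:
    examples = json.load(f)

ex = examples[0]
# itemScoreList is index-aligned with itemList.
for item, score in zip(ex["itemList"], ex["itemScoreList"]):
    print(f"{item}: {score:.4f}")
```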
For training examples, candidate scores are computed as:

```text
score = rating / (1 + step) ** decay_factor
```

where decay_factor defaults to 0.5.
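As a concrete check of the formula (a sketch, assuming step counts how many interactions back the rating occurred; the helper below is illustrative, not part of the repository):

```python
def decayed_score(rating: float, step: int, decay_factor: float = 0.5) -> float:
    """Discount a raw rating by how far back in the history it occurred."""
    return rating / (1 + step) ** decay_factor

print(decayed_score(4.0, step=0))   # 4.0 -- matches selectionScore above
print(decayed_score(3.0, step=10))  # ~0.9045 -- one rating/step pair consistent with the example
```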
If you use this repository, please consider citing our paper:
```bibtex
@inproceedings{recpo2026,
  title     = {What Makes LLMs Effective Sequential Recommenders? A Study on Preference Intensity and Temporal Context},
  author    = {Zhongyu Ouyang and Qianlong Wen and Chunhui Zhang and Yanfang Ye and Soroush Vosoughi},
  booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
  year      = {2026}
}
```

Thanks 🌲