MLLMRec-R1: Incentivizing Reasoning Capability in Large Language Models for Multimodal Sequential Recommendation
Run the following commands to create the conda environment and install the dependencies:
conda create -y -n mllmrec python=3.11
conda activate mllmrec
pip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu128
pip install transformers==4.57.3 trl==0.26.2 tqdm==4.67.1 pandas==2.3.3 peft==0.18.0 accelerate==1.12.0 modelscope
📦 Install Codebase
cd /absolute_path
unzip MLLMRec-R1-main.zip
mv MLLMRec-R1-main MLLMRec-R1
You can download the processed data files for the following datasets:
MovieLens / MicroLens / Netflix [Google Drive]
📌 Important: After downloading and extracting, place the data/ directory under /absolute_path/MLLMRec-R1/.
cd /absolute_path/MLLMRec-R1
unzip data-mllmrec-r1.zip
data/
├── microlens/ # Same structure as MovieLens
├── movielens/
│ ├── images/ # Item images (optional, used for MSR)
│   ├── processed/           # Processed train/test samples
│ │ ├── test_generate_seed42_neg9.json
│ │ ├── train_train_seed42_neg9_cot_0.1.json
│ │ └── train_generate_seed42_neg9_cot_0.05.json
│ ├── movielens_deepseek_cot.json # Multimodal CoT
│
│ ├── movielens_pairs.csv # Preference pairs (positive/negative, pairwise)
│ ├── movielens_titles.csv # Item metadata (item_id → title, etc.)
│ ├── train.tsv # Raw training interactions
│ └── test.tsv # Raw test interactions
└── netflix/ # Same structure as MovieLens
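If you want to sanity-check the download, the processed files can be inspected directly. The snippet below is a minimal sketch that assumes plain JSON files at the paths shown above; the exact schema is defined by the repo's preprocessing scripts.

```python
import json
from pathlib import Path

root = Path("/absolute_path/MLLMRec-R1")  # adjust to your project root
sample_file = root / "data" / "movielens" / "processed" / "test_generate_seed42_neg9.json"

# Assumption: the processed files are plain JSON (a list or dict of samples), not JSON Lines.
with open(sample_file, "r", encoding="utf-8") as f:
    samples = json.load(f)

print(f"Loaded {len(samples)} records from {sample_file.name}")
first = samples[0] if isinstance(samples, list) else next(iter(samples.values()))
print(first)  # peek at the prompt / candidate / label fields used by the trainers
```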
Before running Step 1 (SFT), the Multimodal CoT construction, and the subsequent steps, you must first download the Qwen3-4B and Qwen3-VL-8B-Instruct models and place them under the following directories:
/absolute_path/MLLMRec-R1/Qwen3-4B/
/absolute_path/MLLMRec-R1/Qwen3-VL-8B-Instruct/
Example:
modelscope download --model Qwen/Qwen3-4B --local_dir Qwen3-4B
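If you prefer the Python API over the CLI, ModelScope's snapshot_download is a rough equivalent. The Qwen3-VL model id and the directory handling below are assumptions, so verify the final folders match the paths listed above.

```python
from modelscope import snapshot_download

# Rough equivalent of the CLI call above (assumed model ids). Downloads land in a
# ModelScope-managed subfolder of cache_dir, so move/rename the results to
# /absolute_path/MLLMRec-R1/Qwen3-4B and /absolute_path/MLLMRec-R1/Qwen3-VL-8B-Instruct.
for model_id in ("Qwen/Qwen3-4B", "Qwen/Qwen3-VL-8B-Instruct"):
    local_path = snapshot_download(model_id, cache_dir="/absolute_path/MLLMRec-R1")
    print(model_id, "->", local_path)
```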
If you would like to fully reproduce our data construction process, you can directly use the scripts provided in agent/.
The recommended pipeline follows three stages in order: (1) caption generation → (2) pseudo-CoT construction → (3) reasoning refinement.
In most cases, you only need to modify two parameters: DATASET_NAME (e.g., movielens, microlens, netflix) and root (the absolute path to the project directory). All other hyper-parameters can remain as default.
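For orientation, the stage ordering and the two parameters you typically edit look roughly like the driver below. The script filenames and flags are placeholders: substitute the actual names you find in agent/.

```python
import subprocess

DATASET_NAME = "movielens"           # movielens / microlens / netflix
root = "/absolute_path/MLLMRec-R1"   # absolute project path (contains train/, data/, agent/)

# Placeholder filenames and flags: replace them with the real scripts/arguments in agent/.
stages = [
    "caption_generation.py",       # (1) caption generation
    "pseudo_cot_construction.py",  # (2) pseudo-CoT construction
    "reasoning_refinement.py",     # (3) reasoning refinement
]
for stage in stages:
    subprocess.run(
        ["python", f"agent/{stage}", "--dataset", DATASET_NAME, "--root", root],
        check=True,
    )
```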
Note:
--root must be the absolute path to the project directory (i.e., the folder that contains train/, data/, etc.).
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --dataset movielens --min_inter 10 --tag qwen3_4b_sft_cot
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --epochs 5 --dataset microlens --min_inter 10 --tag qwen3_4b_sft_cot
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --dataset netflix --min_inter 5 --tag qwen3_4b_sft_cot
After SFT, the LoRA checkpoints are saved under:
checkpoints/<dataset>/qwen3_4b_sft_cot
where <dataset> is one of: movielens, microlens, netflix.
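To verify that an SFT run produced a loadable adapter before merging, you can attach it to the base model with PEFT. This is only a sanity check and assumes the tag folder contains a standard PEFT adapter (adapter_config.json plus weights).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

root = "/absolute_path/MLLMRec-R1"
base = AutoModelForCausalLM.from_pretrained(f"{root}/Qwen3-4B", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(f"{root}/Qwen3-4B")

# Assumption: the SFT run saved a standard PEFT adapter directly under the tag folder.
model = PeftModel.from_pretrained(base, f"{root}/checkpoints/movielens/qwen3_4b_sft_cot")
model.eval()
print("LoRA adapter loaded; active adapter:", model.active_adapter)
```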
2.1 Set parameters in checkpoints/lora_merge.py
Example (MovieLens):
ROOT = "/absolute_path/MLLMRec-R1"
DATASET = "movielens"
EXP_NAME = "qwen3_4b_sft_cot" # your SFT result folder under checkpoints/movielens/
MERGED_NAME = "Qwen3-4B-COT-SFT"
2.2 Run LoRA merging on GPU 0
Use CUDA_VISIBLE_DEVICES=0 to force the merge to run on GPU 0:
CUDA_VISIBLE_DEVICES=0 python checkpoints/lora_merge.py
After merging, the final merged checkpoint will be saved under:
checkpoints/<dataset>/<MERGED_NAME>/
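checkpoints/lora_merge.py handles this step; for reference, the standard PEFT merge flow that such a script follows looks roughly like the sketch below, using the variables set in 2.1. It is not necessarily identical to the repo's implementation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

ROOT = "/absolute_path/MLLMRec-R1"
DATASET = "movielens"
EXP_NAME = "qwen3_4b_sft_cot"
MERGED_NAME = "Qwen3-4B-COT-SFT"

base = AutoModelForCausalLM.from_pretrained(f"{ROOT}/Qwen3-4B", torch_dtype="auto")
lora_model = PeftModel.from_pretrained(base, f"{ROOT}/checkpoints/{DATASET}/{EXP_NAME}")
merged = lora_model.merge_and_unload()  # fold the LoRA deltas into the base weights

out_dir = f"{ROOT}/checkpoints/{DATASET}/{MERGED_NAME}"
merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(f"{ROOT}/Qwen3-4B").save_pretrained(out_dir)
print("Merged checkpoint written to", out_dir)
```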
Run GRPO to further align the model with preference/reward signals. Make sure --sft_tag matches the merged checkpoint folder name produced in Step 2 (e.g., Qwen3-4B-COT-SFT).
python train/grpo.py --dataset movielens --use_cot --cot_prob 0.05 --epochs 3 --root /absolute_path/MLLMRec-R1 --min_inter 10 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.1
python train/grpo.py --dataset microlens --use_cot --cot_prob 0.1 --epochs 3 --root /absolute_path/MLLMRec-R1 --min_inter 10 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.05
python train/grpo.py --dataset netflix --use_cot --cot_prob 0.05 --epochs 2 --root /absolute_path/MLLMRec-R1 --min_inter 5 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.2
| Dataset | cot_prob | epochs | min_inter | #generations | beta |
|---|---|---|---|---|---|
| movielens | 0.05 | 3 | 10 | 8 | 0.10 |
| microlens | 0.10 | 3 | 10 | 8 | 0.05 |
| netflix | 0.05 | 2 | 5 | 8 | 0.20 |
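Since trl is pinned in the environment, train/grpo.py presumably builds on its GRPOTrainer. The sketch below shows the general shape of such a setup with the MovieLens hyper-parameters from the table; the reward function and the one-example dataset are placeholders, not the repo's actual reward.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward: 1.0 if the ground-truth title shows up in the completion, else 0.0.
# The repo's real reward (ranking/format terms) is defined in train/grpo.py.
def recommendation_reward(prompts, completions, target, **kwargs):
    return [1.0 if t.lower() in c.lower() else 0.0 for c, t in zip(completions, target)]

# Toy one-example dataset in prompt/target form (placeholder data).
train_dataset = Dataset.from_list([{
    "prompt": "The user watched: Heat; Casino; Goodfellas. Choose the next movie from the candidate list ...",
    "target": "The Godfather",
}])

args = GRPOConfig(
    output_dir="checkpoints/movielens/qwen3_4b_grpo_cot",
    num_generations=8,   # matches --num_generations
    beta=0.1,            # KL coefficient, matches --beta for MovieLens
    num_train_epochs=3,  # matches --epochs
)
trainer = GRPOTrainer(
    model="/absolute_path/MLLMRec-R1/checkpoints/movielens/Qwen3-4B-COT-SFT",
    reward_funcs=recommendation_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```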
Before running evaluation, explicitly specify the GPUs to use by setting CUDA_VISIBLE_DEVICES. This ensures the evaluation only uses the intended devices and matches --nproc_per_node.
export CUDA_VISIBLE_DEVICES=0,1
Evaluate the GRPO-trained model using distributed inference. Ensure that --tag points to the specific GRPO checkpoint to be evaluated. After GRPO, the LoRA checkpoints are saved under:
checkpoints/<dataset>/qwen3_4b_grpo_cot
| Purpose | Argument | Example Values |
|---|---|---|
| Switch dataset | --dataset | movielens, microlens, netflix |
| Switch top-K | --top_k | 3, 5, 10 |
| Switch number of negatives | --num_neg | 9, 99 |
| Run concurrent distributed jobs | --rdzv_endpoint | 29601, 29602, 29603, … |
torchrun --nproc_per_node=2 --rdzv_backend=c10d --rdzv_endpoint=localhost:29601 train/inference.py --root /absolute_path/MLLMRec-R1 --dataset movielens --min_inter 10 --top_k 3 --num_neg 9 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --distributed
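The evaluation protocol implied by these arguments ranks the ground-truth item against --num_neg sampled negatives and reports top-K metrics. Below is a minimal sketch of HR@K and NDCG@K under that single-positive setup; train/inference.py may compute them differently in detail.

```python
import math

def hit_rate_at_k(rank: int, k: int) -> float:
    """rank is the 1-based position of the ground-truth item among the candidates."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank: int, k: int) -> float:
    """With a single relevant item, IDCG = 1, so NDCG@K reduces to 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

# Example: the model ranks the positive 2nd among 1 positive + 9 negatives (num_neg=9).
rank = 2
for k in (3, 5, 10):
    print(f"HR@{k}={hit_rate_at_k(rank, k):.3f}  NDCG@{k}={ndcg_at_k(rank, k):.4f}")
```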