
🤖 MLLMRec-R1

MLLMRec-R1: Incentivizing Reasoning Capability in Large Language Models for Multimodal Sequential Recommendation

🏗️ Model Architecture

🧰 Environment Setup for MLLMRec-R1

Run the following commands to create the conda environment and install the dependencies:

```shell
conda create -y -n mllmrec python=3.11
conda activate mllmrec
pip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu128
pip install transformers==4.57.3 trl==0.26.2 tqdm==4.67.1 pandas==2.3.3 peft==0.18.0 accelerate==1.12.0 modelscope
```

📦 Install Codebase

```shell
cd /absolute_path
unzip MLLMRec-R1-main.zip
mv MLLMRec-R1-main MLLMRec-R1
```

💡 Dataset Structure and Download

You can download the processed data files for the following datasets:

MovieLens / MicroLens / Netflix [Google Drive]

📌 Important: After downloading and extracting, place the `data/` directory under `/absolute_path/MLLMRec-R1/`.

```shell
cd /absolute_path/MLLMRec-R1
unzip data-mllmrec-r1.zip
```

The extracted layout:

```
data/
├── microlens/                      # Same structure as MovieLens
├── movielens/
│   ├── images/                     # Item images (optional, used for MSR)
│   ├── processed/                  # sample
│   │   ├── test_generate_seed42_neg9.json
│   │   ├── train_train_seed42_neg9_cot_0.1.json
│   │   └── train_generate_seed42_neg9_cot_0.05.json
│   ├── movielens_deepseek_cot.json # Multimodal CoT
│   ├── movielens_pairs.csv         # Preference pairs (positive/negative, pairwise)
│   ├── movielens_titles.csv        # Item metadata (item_id → title, etc.)
│   ├── train.tsv                   # Raw training interactions
│   └── test.tsv                    # Raw test interactions
└── netflix/                        # Same structure as MovieLens
```
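As an optional sanity check (not part of the repository), the snippet below verifies that the raw interaction files from the tree above are in place:

```shell
# Verify the raw train/test interaction files for each dataset.
# Run from /absolute_path/MLLMRec-R1 after extracting the data archive.
missing=0
for d in movielens microlens netflix; do
  for f in train.tsv test.tsv; do
    if [ -f "data/${d}/${f}" ]; then
      echo "OK      data/${d}/${f}"
    else
      echo "MISSING data/${d}/${f}"
      missing=$((missing + 1))
    fi
  done
done
echo "${missing} file(s) missing"
```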

🧱 Download Backbone Model

Before running Step 1 (SFT), the Multimodal CoT pipeline, and subsequent steps, you must first download Qwen3-4B and Qwen3-VL-8B-Instruct and place them under the following directories:

```
/absolute_path/MLLMRec-R1/Qwen3-4B/
/absolute_path/MLLMRec-R1/Qwen3-VL-8B-Instruct/
```

Example download commands (the second ModelScope model ID follows the same naming pattern and is assumed):

```shell
modelscope download --model Qwen/Qwen3-4B --local_dir Qwen3-4B
modelscope download --model Qwen/Qwen3-VL-8B-Instruct --local_dir Qwen3-VL-8B-Instruct
```

♻️ Agent Pipeline for Multimodal CoT

✍️ (Caption → 🧩 Pseudo-CoT → 🧠 Reasoning)

If you would like to fully reproduce our data construction process, you can directly use the scripts provided in agent/.

The recommended pipeline follows three stages in order: (1) caption generation → (2) pseudo-CoT construction → (3) reasoning refinement.

In most cases, you only need to modify two parameters: DATASET_NAME (e.g., movielens, microlens, netflix) and root (the absolute path to the project directory). All other hyper-parameters can remain as default.
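Assuming one driver script per stage (the filenames below are hypothetical; use the actual scripts provided in agent/), the three stages can be driven in order. This sketch prints each command so you can inspect it before running:

```shell
# Hypothetical stage script names -- replace with the real files in agent/.
DATASET_NAME=movielens            # movielens | microlens | netflix
root=/absolute_path/MLLMRec-R1    # absolute path to the project directory

for stage in caption pseudo_cot reasoning; do
  cmd="python agent/${stage}.py --dataset ${DATASET_NAME} --root ${root}"
  echo "${cmd}"                   # inspect; run manually once verified
done
```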

🚀 Examples for Running the Code

🧑‍🏫 Step 1: Supervised Fine-Tuning (SFT)

Note: --root must be the absolute path to the project directory (i.e., the folder that contains train/, data/, etc.).

🎬 MovieLens

```shell
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --dataset movielens --min_inter 10 --tag qwen3_4b_sft_cot
```

🔬 Microlens

```shell
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --epochs 5 --dataset microlens --min_inter 10 --tag qwen3_4b_sft_cot
```

📺 Netflix

```shell
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --dataset netflix --min_inter 5 --tag qwen3_4b_sft_cot
```

🧬 Step 2: Merge LoRA into the Base Model (GPU 0)

After SFT, the LoRA checkpoints are saved under:

```
checkpoints/<dataset>/qwen3_4b_sft_cot
```

where `<dataset>` is one of: movielens, microlens, netflix.

2.1 Set parameters in checkpoints/lora_merge.py

Example (MovieLens):

```python
ROOT = "/absolute_path/MLLMRec-R1"
DATASET = "movielens"
EXP_NAME = "qwen3_4b_sft_cot"     # your SFT result folder under checkpoints/movielens/
MERGED_NAME = "Qwen3-4B-COT-SFT"
```

2.2 Run LoRA merging on GPU 0

Use CUDA_VISIBLE_DEVICES=0 to force the merge to run on GPU 0:

```shell
CUDA_VISIBLE_DEVICES=0 python checkpoints/lora_merge.py
```

After merging, the final merged checkpoint will be saved under:

```
checkpoints/<dataset>/<MERGED_NAME>/
```

🧠 Step 3: GRPO Post-Training (Reinforcement Fine-Tuning)

Run GRPO to further align the model with preference/reward signals. Make sure --sft_tag matches the merged checkpoint folder name produced in Step 2 (e.g., Qwen3-4B-COT-SFT).

🎬 MovieLens

```shell
python train/grpo.py --dataset movielens --use_cot --cot_prob 0.05 --epochs 3 --root /absolute_path/MLLMRec-R1 --min_inter 10 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.1
```

🔬 Microlens

```shell
python train/grpo.py --dataset microlens --use_cot --cot_prob 0.1 --epochs 3 --root /absolute_path/MLLMRec-R1 --min_inter 10 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.05
```

📺 Netflix

```shell
python train/grpo.py --dataset netflix --use_cot --cot_prob 0.05 --epochs 2 --root /absolute_path/MLLMRec-R1 --min_inter 5 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.2
```

✅ GRPO Optimal Parameters

| Dataset   | cot_prob | epochs | min_inter | #generations | beta |
|-----------|----------|--------|-----------|--------------|------|
| movielens | 0.05     | 3      | 10        | 8            | 0.10 |
| microlens | 0.10     | 3      | 10        | 8            | 0.05 |
| netflix   | 0.05     | 2      | 5         | 8            | 0.20 |
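For convenience, the sweep below replays the three GRPO runs with the optimal settings from the table. It is a minimal sketch that prints each command for inspection rather than launching training directly:

```shell
# Print one train/grpo.py command per dataset using the optimal settings.
ROOT=/absolute_path/MLLMRec-R1

run_grpo () {  # args: dataset cot_prob epochs min_inter beta
  echo "python train/grpo.py --dataset $1 --use_cot --cot_prob $2 --epochs $3" \
       "--root ${ROOT} --min_inter $4 --sft_tag Qwen3-4B-COT-SFT" \
       "--tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta $5"
}

run_grpo movielens 0.05 3 10 0.1
run_grpo microlens 0.1  3 10 0.05
run_grpo netflix   0.05 2 5  0.2
```

Pipe each printed line to `sh` (or paste it into a terminal) once it looks right.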

📈 Step 4: Evaluation (Distributed Inference)

Before running evaluation, explicitly specify the GPUs to use by setting CUDA_VISIBLE_DEVICES. This ensures the evaluation only uses the intended devices and matches --nproc_per_node.

```shell
export CUDA_VISIBLE_DEVICES=0,1
```

Evaluate the GRPO-trained model using distributed inference. Ensure that --tag points to the specific GRPO checkpoint to be evaluated. After GRPO, the LoRA checkpoints are saved under:

```
checkpoints/<dataset>/qwen3_4b_sft_grpo
```

🎬 MovieLens / 🔬 MicroLens / 📺 Netflix

| Purpose                         | Argument          | Example Values                  |
|---------------------------------|-------------------|---------------------------------|
| Switch dataset                  | `--dataset`       | movielens, microlens, netflix   |
| Switch top-K                    | `--top_k`         | 3, 5, 10                        |
| Switch number of negatives      | `--num_neg`       | 9, 99                           |
| Run concurrent distributed jobs | `--rdzv_endpoint` | 29601, 29602, 29603, …          |

```shell
torchrun --nproc_per_node=2 --rdzv_backend=c10d --rdzv_endpoint=localhost:29601 train/inference.py --root /absolute_path/MLLMRec-R1 --dataset movielens --min_inter 10 --top_k 3 --num_neg 9 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --distributed
```
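To run several evaluations concurrently (e.g., one per dataset), give each job its own GPU pair and a unique rendezvous port, as the `--rdzv_endpoint` row above suggests. This sketch prints one launch command per dataset for inspection (GPU assignment assumes at least six visible GPUs; adjust to your hardware):

```shell
# Print one distributed-inference command per dataset, each on its own
# GPU pair and rendezvous port (29601, 29602, 29603).
ROOT=/absolute_path/MLLMRec-R1
BASE_PORT=29601
i=0
for entry in "movielens 10" "microlens 10" "netflix 5"; do
  set -- $entry                       # $1 = dataset, $2 = min_inter
  port=$((BASE_PORT + i))
  gpus="$((2 * i)),$((2 * i + 1))"    # two GPUs per job
  echo "CUDA_VISIBLE_DEVICES=${gpus} torchrun --nproc_per_node=2" \
       "--rdzv_backend=c10d --rdzv_endpoint=localhost:${port}" \
       "train/inference.py --root ${ROOT} --dataset $1 --min_inter $2" \
       "--top_k 3 --num_neg 9 --sft_tag Qwen3-4B-COT-SFT" \
       "--tag qwen3_4b_grpo_cot --distributed"
  i=$((i + 1))
done
```

Launching each printed command in a separate terminal (or backgrounding with `&`) runs the jobs in parallel without port or device collisions.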
