MLLMRec-R1: Incentivizing Reasoning Capability in Large Language Models for Multimodal Sequential Recommendation
Run the following commands to create the conda environment and install the dependencies:
conda create -y -n mllmrec python=3.11
conda activate mllmrec
pip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu128
pip install transformers==4.57.3 trl==0.26.2 tqdm==4.67.1 pandas==2.3.3 peft==0.18.0 accelerate==1.12.0 modelscope
📦 Install Codebase
cd /absolute_path
unzip MLLMRec-R1-main.zip
mv MLLMRec-R1-main MLLMRec-R1
You can download the processed data files for the following datasets:
MovieLens / MicroLens / Netflix [Google Drive]
📌 Important: After downloading and extracting, place the data/ directory under /absolute_path/MLLMRec-R1/.
cd /absolute_path/MLLMRec-R1
unzip data-mllmrec-r1.zip
data/
├── microlens/ # Same structure as MovieLens
├── movielens/
│ ├── images/ # Item images (optional, used for MSR)
│   ├── processed/           # Processed train/test samples
│ │ ├── test_generate_seed42_neg9.json
│ │ ├── train_train_seed42_neg9_cot_0.1.json
│ │ └── train_generate_seed42_neg9_cot_0.05.json
│ ├── movielens_deepseek_cot.json # Multimodal CoT
│
│ ├── movielens_pairs.csv # Preference pairs (positive/negative, pairwise)
│ ├── movielens_titles.csv # Item metadata (item_id → title, etc.)
│ ├── train.tsv # Raw training interactions
│ └── test.tsv # Raw test interactions
└── netflix/ # Same structure as MovieLens
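If you want to sanity-check the download, the processed files can be inspected directly. The snippet below is a minimal sketch that assumes plain JSON files at the paths shown above; the exact schema is defined by the repo's preprocessing scripts.

```python
import json
from pathlib import Path

root = Path("/absolute_path/MLLMRec-R1")  # adjust to your project root
sample_file = root / "data" / "movielens" / "processed" / "test_generate_seed42_neg9.json"

# Assumption: the processed files are plain JSON (a list or dict of samples), not JSON Lines.
with open(sample_file, "r", encoding="utf-8") as f:
    samples = json.load(f)

print(f"Loaded {len(samples)} records from {sample_file.name}")
first = samples[0] if isinstance(samples, list) else next(iter(samples.values()))
print(first)  # peek at the prompt / candidate / label fields used by the trainers
```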
Before running Step 1 (SFT), the Multimodal CoT construction, and the subsequent steps, you must first download the Qwen3-4B and Qwen3-VL-8B-Instruct models and place them under the following directories:
/absolute_path/MLLMRec-R1/Qwen3-4B/
/absolute_path/MLLMRec-R1/Qwen3-VL-8B-Instruct/
Example:
modelscope download --model Qwen/Qwen3-4B --local_dir Qwen3-4B
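If you prefer the Python API over the CLI, ModelScope's snapshot_download is a rough equivalent. The Qwen3-VL model id and the directory handling below are assumptions, so verify the final folders match the paths listed above.

```python
from modelscope import snapshot_download

# Rough equivalent of the CLI call above (assumed model ids). Downloads land in a
# ModelScope-managed subfolder of cache_dir, so move/rename the results to
# /absolute_path/MLLMRec-R1/Qwen3-4B and /absolute_path/MLLMRec-R1/Qwen3-VL-8B-Instruct.
for model_id in ("Qwen/Qwen3-4B", "Qwen/Qwen3-VL-8B-Instruct"):
    local_path = snapshot_download(model_id, cache_dir="/absolute_path/MLLMRec-R1")
    print(model_id, "->", local_path)
```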
If you would like to fully reproduce our data construction process, you can directly use the scripts provided in agent/.
The recommended pipeline follows three stages in order: (1) caption generation → (2) pseudo-CoT construction → (3) reasoning refinement.
In most cases, you only need to modify two parameters: DATASET_NAME (e.g., movielens, microlens, netflix) and root (the absolute path to the project directory). All other hyper-parameters can remain as default.
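For orientation, the stage ordering and the two parameters you typically edit look roughly like the driver below. The script filenames and flags are placeholders: substitute the actual names you find in agent/.

```python
import subprocess

DATASET_NAME = "movielens"           # movielens / microlens / netflix
root = "/absolute_path/MLLMRec-R1"   # absolute project path (contains train/, data/, agent/)

# Placeholder filenames and flags: replace them with the real scripts/arguments in agent/.
stages = [
    "caption_generation.py",       # (1) caption generation
    "pseudo_cot_construction.py",  # (2) pseudo-CoT construction
    "reasoning_refinement.py",     # (3) reasoning refinement
]
for stage in stages:
    subprocess.run(
        ["python", f"agent/{stage}", "--dataset", DATASET_NAME, "--root", root],
        check=True,
    )
```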
Note:
--root must be the absolute path to the project directory (i.e., the folder that contains train/, data/, etc.).
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --dataset movielens --min_inter 10 --tag qwen3_4b_sft_cot
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --epochs 5 --dataset microlens --min_inter 10 --tag qwen3_4b_sft_cot
python train/sft.py --root /absolute_path/MLLMRec-R1 --backbone Qwen3-4B --use_cot --cot_prob 0.1 --dataset netflix --min_inter 5 --tag qwen3_4b_sft_cot
After SFT, the LoRA checkpoints are saved under:
checkpoints/<dataset>/qwen3_4b_sft_cot
where <dataset> is one of: movielens, microlens, netflix.
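To verify that an SFT run produced a loadable adapter before merging, you can attach it to the base model with PEFT. This is only a sanity check and assumes the tag folder contains a standard PEFT adapter (adapter_config.json plus weights).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

root = "/absolute_path/MLLMRec-R1"
base = AutoModelForCausalLM.from_pretrained(f"{root}/Qwen3-4B", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(f"{root}/Qwen3-4B")

# Assumption: the SFT run saved a standard PEFT adapter directly under the tag folder.
model = PeftModel.from_pretrained(base, f"{root}/checkpoints/movielens/qwen3_4b_sft_cot")
model.eval()
print("LoRA adapter loaded; active adapter:", model.active_adapter)
```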
2.1 Set parameters in checkpoints/lora_merge.py
Example (MovieLens):
ROOT = "/absolute_path/MLLMRec-R1"
DATASET = "movielens"
EXP_NAME = "qwen3_4b_sft_cot" # your SFT result folder under checkpoints/movielens/
MERGED_NAME = "Qwen3-4B-COT-SFT"
2.2 Run LoRA merging on GPU 0
Use CUDA_VISIBLE_DEVICES=0 to force the merge to run on GPU 0:
CUDA_VISIBLE_DEVICES=0 python checkpoints/lora_merge.py
After merging, the final merged checkpoint will be saved under:
checkpoints/<dataset>/<MERGED_NAME>/
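checkpoints/lora_merge.py handles this step; for reference, the standard PEFT merge flow that such a script follows looks roughly like the sketch below, using the variables set in 2.1. It is not necessarily identical to the repo's implementation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

ROOT = "/absolute_path/MLLMRec-R1"
DATASET = "movielens"
EXP_NAME = "qwen3_4b_sft_cot"
MERGED_NAME = "Qwen3-4B-COT-SFT"

base = AutoModelForCausalLM.from_pretrained(f"{ROOT}/Qwen3-4B", torch_dtype="auto")
lora_model = PeftModel.from_pretrained(base, f"{ROOT}/checkpoints/{DATASET}/{EXP_NAME}")
merged = lora_model.merge_and_unload()  # fold the LoRA deltas into the base weights

out_dir = f"{ROOT}/checkpoints/{DATASET}/{MERGED_NAME}"
merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(f"{ROOT}/Qwen3-4B").save_pretrained(out_dir)
print("Merged checkpoint written to", out_dir)
```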
Run GRPO to further align the model with preference/reward signals. Make sure --sft_tag matches the merged checkpoint folder name produced in Step 2 (e.g., Qwen3-4B-COT-SFT).
python train/grpo.py --dataset movielens --use_cot --cot_prob 0.05 --epochs 3 --root /absolute_path/MLLMRec-R1 --min_inter 10 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.1
python train/grpo.py --dataset microlens --use_cot --cot_prob 0.1 --epochs 3 --root /absolute_path/MLLMRec-R1 --min_inter 10 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.05
python train/grpo.py --dataset netflix --use_cot --cot_prob 0.05 --epochs 2 --root /absolute_path/MLLMRec-R1 --min_inter 5 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --save_steps 500 --num_generations 8 --beta 0.2
| Dataset | cot_prob | epochs | min_inter | #generations | beta |
|---|---|---|---|---|---|
| movielens | 0.05 | 3 | 10 | 8 | 0.10 |
| microlens | 0.10 | 3 | 10 | 8 | 0.05 |
| netflix | 0.05 | 2 | 5 | 8 | 0.20 |
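Since trl is pinned in the environment, train/grpo.py presumably builds on its GRPOTrainer. The sketch below shows the general shape of such a setup with the MovieLens hyper-parameters from the table; the reward function and the one-example dataset are placeholders, not the repo's actual reward.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward: 1.0 if the ground-truth title shows up in the completion, else 0.0.
# The repo's real reward (ranking/format terms) is defined in train/grpo.py.
def recommendation_reward(prompts, completions, target, **kwargs):
    return [1.0 if t.lower() in c.lower() else 0.0 for c, t in zip(completions, target)]

# Toy one-example dataset in prompt/target form (placeholder data).
train_dataset = Dataset.from_list([{
    "prompt": "The user watched: Heat; Casino; Goodfellas. Choose the next movie from the candidate list ...",
    "target": "The Godfather",
}])

args = GRPOConfig(
    output_dir="checkpoints/movielens/qwen3_4b_grpo_cot",
    num_generations=8,   # matches --num_generations
    beta=0.1,            # KL coefficient, matches --beta for MovieLens
    num_train_epochs=3,  # matches --epochs
)
trainer = GRPOTrainer(
    model="/absolute_path/MLLMRec-R1/checkpoints/movielens/Qwen3-4B-COT-SFT",
    reward_funcs=recommendation_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```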
Before running evaluation, explicitly specify the GPUs to use by setting CUDA_VISIBLE_DEVICES. This ensures the evaluation only uses the intended devices and matches --nproc_per_node.
export CUDA_VISIBLE_DEVICES=0,1
Evaluate the GRPO-trained model using distributed inference. Ensure that --tag points to the specific GRPO checkpoint to be evaluated. After GRPO, the LoRA checkpoints are saved under:
checkpoints/<dataset>/qwen3_4b_grpo_cot
| Purpose | Argument | Example Values |
|---|---|---|
| Switch dataset | --dataset | movielens, microlens, netflix |
| Switch top-K | --top_k | 3, 5, 10 |
| Switch number of negatives | --num_neg | 9, 99 |
| Run concurrent distributed jobs | --rdzv_endpoint | 29601, 29602, 29603, … |
torchrun --nproc_per_node=2 --rdzv_backend=c10d --rdzv_endpoint=localhost:29601 train/inference.py --root /absolute_path/MLLMRec-R1 --dataset movielens --min_inter 10 --top_k 3 --num_neg 9 --sft_tag Qwen3-4B-COT-SFT --tag qwen3_4b_grpo_cot --distributed
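The evaluation protocol implied by these arguments ranks the ground-truth item against --num_neg sampled negatives and reports top-K metrics. Below is a minimal sketch of HR@K and NDCG@K under that single-positive setup; train/inference.py may compute them differently in detail.

```python
import math

def hit_rate_at_k(rank: int, k: int) -> float:
    """rank is the 1-based position of the ground-truth item among the candidates."""
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank: int, k: int) -> float:
    """With a single relevant item, IDCG = 1, so NDCG@K reduces to 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

# Example: the model ranks the positive 2nd among 1 positive + 9 negatives (num_neg=9).
rank = 2
for k in (3, 5, 10):
    print(f"HR@{k}={hit_rate_at_k(rank, k):.3f}  NDCG@{k}={ndcg_at_k(rank, k):.4f}")
```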