# UniRec: Unified Multimodal Encoding for LLM-Based Recommendations
This repository demonstrates a nested Q-Former + Qwen3 LoRA recommendation stack (see the sketch after this list):

- **Item encoder + Item Q-Former**: raw item fields (text, CLIP features, etc.) → dense field embeddings → item query tokens.
- **User Q-Former**: user history as a sequence of item query tokens → user query tokens.
- **Qwen3 + LoRA joint model**: injects item/user query tokens as special tokens in Qwen3, then uses the final embedding as a predicted next-item embedding for ranking a candidate pool.
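For orientation, here is a minimal, self-contained sketch of that data flow. The class and tensor names (`QFormer`, `num_query_tokens`, the 1024-dim embeddings) are illustrative assumptions, not the actual API of the modules in `models/`:

```python
import torch
import torch.nn as nn

class QFormer(nn.Module):
    """Toy Q-Former: learnable query tokens cross-attend to input embeddings.
    Illustrative only; the real backbone lives in models/."""
    def __init__(self, dim=1024, num_query_tokens=8, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_query_tokens, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                          # x: (batch, seq, dim)
        q = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        out, _ = self.cross_attn(q, x, x)          # queries attend to inputs
        return out                                 # (batch, num_query_tokens, dim)

# Item level: dense field embeddings (text, CLIP, ...) -> item query tokens.
field_emb = torch.randn(4, 6, 1024)                # 4 items x 6 fields
item_tokens = QFormer()(field_emb)                 # (4, 8, 1024)

# User level: history of item query tokens -> user query tokens.
history = item_tokens.reshape(1, -1, 1024)         # one user, flattened history
user_tokens = QFormer()(history)                   # (1, 8, 1024)

# Joint model (conceptually): the query tokens are injected into Qwen3 as
# special tokens; its final embedding is scored against a candidate pool.
pred = user_tokens.mean(dim=1)                     # stand-in for Qwen3's output
pool = item_tokens.mean(dim=1)                     # candidate item embeddings
print(torch.cosine_similarity(pred, pool).argsort(descending=True))
```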
All scripts here are copies from the original project, reorganized into a GitHub‑friendly layout (no hardcoded API keys or absolute cluster paths).
## 🚀 Quickstart

```bash
conda create -n unirec python=3.9
conda activate unirec

# Core deep learning libraries
pip install torch
pip install transformers
pip install sentence-transformers
pip install peft

# Data processing and utilities
pip install numpy
pip install pandas
pip install scikit-learn
pip install pyyaml
pip install tqdm

# Image processing
pip install Pillow

# HTTP requests (for downloading images)
pip install requests
```

Note: this project uses:
- Qwen3-Embedding-0.6B for text embeddings (via `sentence-transformers`)
- CLIP ViT-Large for image embeddings (via `transformers`)
- Qwen3-Embedding-0.6B as the base model for joint training (via `transformers`)
- PEFT/LoRA for parameter-efficient fine-tuning
Make sure you have CUDA-compatible PyTorch installed if you plan to use GPU acceleration.
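As a quick sanity check after installation, the snippet below loads the text and image encoders and reports GPU availability. The Hugging Face IDs `Qwen/Qwen3-Embedding-0.6B` and `openai/clip-vit-large-patch14` are my assumptions for the checkpoints meant above; adjust them to whatever the configs in this repo actually reference:

```python
import torch
from sentence_transformers import SentenceTransformer
from transformers import CLIPModel

print("CUDA available:", torch.cuda.is_available())

# Text embeddings via sentence-transformers (assumed checkpoint ID).
text_encoder = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
emb = text_encoder.encode(["a red cotton t-shirt"])
print("text embedding shape:", emb.shape)          # (1, 1024) for the 0.6B model

# Image embeddings via transformers (assumed checkpoint ID).
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
print("CLIP projection dim:", clip.config.projection_dim)  # 768 for ViT-L/14
```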
## 📂 Structure

- `data_processing/` – build dicts, process recommendation data, generate CLIP embeddings, run Item Q-Former inference, and batch-generate item query tokens. See `data_processing/README.md` for details and example flows.
- `models/` – core model components (Q-Former backbone + wrappers, item/user encoders, MWNE utilities). See `models/README.md` for a breakdown of each module.
- `training/` – training scripts for the Item Q-Former, the User Q-Former, and the joint Qwen3+LoRA model with injected query tokens. See `training/README.md` for per-script goals and rough pipelines.
- `evaluation/` – evaluation scripts (currently: Item Q-Former reconstruction quality). See `evaluation/README.md` for usage and metrics.
Run the following commands to prepare your dataset. Put your raw dataset under `data_rec/data/...` and run the dict builders and rec processors:

```bash
# Build item dictionary
python data_processing/create_item_dict.py

# Build review dictionary (if using reviews)
python data_processing/create_review_dict.py

# Build triplet dictionary
python data_processing/create_triplet_dict.py

# Process recommendation data
python data_processing/process_rec_new_user.py
python data_processing/process_rec_old_user.py
```

You may refer to the specific README in the `data_processing` directory for detailed argument descriptions.
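If you want to spot-check the outputs before moving on, something like the following works, assuming the dict builders write Python pickles; the path and file name below are placeholders, so check `data_processing/README.md` for the real ones:

```python
import pickle

# Hypothetical output path; the actual name is set by create_item_dict.py.
with open("data_rec/data/item_dict.pkl", "rb") as f:
    item_dict = pickle.load(f)

print("num items:", len(item_dict))
first_key = next(iter(item_dict))
print("example entry:", first_key, "->", item_dict[first_key])
```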
Run the CLIP embedding generation scripts:

```bash
# Generate CLIP embeddings for items
python data_processing/item_embedding_clip.py

# Generate CLIP embeddings for reviews (if using reviews)
python data_processing/review_embedding_clip.py
```

This will generate CLIP embeddings under `data_rec/embeddings/...`.
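Conceptually, each script maps an item (or review) image to a fixed-size vector and caches it. A minimal sketch of that step, with placeholder paths and the assumed `openai/clip-vit-large-patch14` checkpoint:

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-large-patch14"         # assumed checkpoint
model = CLIPModel.from_pretrained(model_id).eval()
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("data_rec/data/images/item_0001.jpg")  # placeholder path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    emb = model.get_image_features(**inputs)       # (1, 768)

np.save("data_rec/embeddings/item_0001.npy", emb.squeeze(0).numpy())
```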
First, optionally precompute field embeddings to speed up training:

```bash
# Precompute and cache all item field embeddings
python training/precompute_full_field_embeddings.py
```

Then train the Item Q-Former:

```bash
# Train Item Q-Former
python training/item_qformer_training.py
```

For more detailed information about the training process, please refer to the specific README in the `training` directory.
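In rough outline, the objective is to compress an item's field embeddings into a fixed set of query tokens and reconstruct the fields from them. The sketch below reuses the toy `QFormer` from the overview and adds an assumed linear reconstruction head; the actual losses and schedules are defined in `training/item_qformer_training.py`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, num_fields = 1024, 6
qformer = QFormer(dim=dim)                         # toy class from the overview
recon_head = nn.Linear(dim, dim)                   # assumed reconstruction head
opt = torch.optim.AdamW(
    list(qformer.parameters()) + list(recon_head.parameters()), lr=1e-4
)

for step in range(100):                            # stand-in for a dataloader
    field_emb = torch.randn(32, num_fields, dim)   # cached field embeddings
    tokens = qformer(field_emb)                    # (32, 8, dim)
    # One simple choice: reconstruct every field from the pooled query tokens.
    pooled = tokens.mean(dim=1, keepdim=True)      # (32, 1, dim)
    recon = recon_head(pooled).expand(-1, num_fields, -1)
    loss = F.mse_loss(recon, field_emb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```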
After training the Item Q-Former, generate item query tokens for all items:

```bash
# Generate item query tokens cache
python data_processing/generate_all_item_embeddings.py
```

Then train the User Q-Former and jointly train Qwen3+LoRA:

```bash
# Train User Q-Former
python training/user_qformer_training.py

# Jointly train Qwen3+LoRA with injected query tokens
python training/train_item_individual_token_joint.py
```

You may refer to the specific README in the `training` directory for detailed instructions and hyperparameter configurations.
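Mechanically, "injected query tokens" means adding placeholder special tokens to the tokenizer and overwriting their embedding rows with the cached query tokens before LoRA fine-tuning. A minimal sketch under those assumptions; the token names, LoRA settings, and use of `AutoModelForCausalLM` for the base checkpoint are all illustrative, not this repo's exact code:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen3-Embedding-0.6B"              # assumed checkpoint ID
tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# One special token per injected query slot (names are made up here).
special = [f"<item_q_{i}>" for i in range(8)]
tok.add_special_tokens({"additional_special_tokens": special})
model.resize_token_embeddings(len(tok))

# Overwrite the new embedding rows with one item's cached query tokens.
item_query_tokens = torch.randn(8, model.config.hidden_size)  # from the cache
with torch.no_grad():
    ids = tok.convert_tokens_to_ids(special)
    model.get_input_embeddings().weight[ids] = item_query_tokens

# LoRA so that only low-rank adapters are trained during the joint stage.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```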
UniRec provides evaluation scripts to assess model performance. Currently supported:

- Item Q-Former reconstruction quality – measures how well the Item Q-Former reconstructs item field embeddings.

To evaluate your model's performance:

```bash
# Evaluate Item Q-Former reconstruction quality
python evaluation/evaluate_item_qformer.py
```

For detailed information about the evaluation framework, supported metrics, and usage instructions, please refer to `evaluation/README.md`.
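Reconstruction quality boils down to agreement between the original and reconstructed field embeddings, for which MSE and cosine similarity are the natural readouts. A hedged sketch of such a report (the script's actual metrics are documented in `evaluation/README.md`):

```python
import torch
import torch.nn.functional as F

def reconstruction_report(original: torch.Tensor, reconstructed: torch.Tensor):
    """Both tensors: (num_items, num_fields, dim)."""
    mse = F.mse_loss(reconstructed, original).item()
    cos = F.cosine_similarity(reconstructed, original, dim=-1).mean().item()
    return {"mse": mse, "mean_cosine": cos}

orig = torch.randn(100, 6, 1024)
recon = orig + 0.1 * torch.randn_like(orig)        # stand-in for model output
print(reconstruction_report(orig, recon))
```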
For a complete end-to-end workflow:

1. **Prepare data**
   - Put your raw dataset under `data_rec/data/...`.
   - Run the dict builders and rec processors in `data_processing/`: `create_item_dict.py`, `create_review_dict.py`, `create_triplet_dict.py`, then `process_rec_new_user.py` / `process_rec_old_user.py`.
2. **Generate base embeddings**
   - Run `item_embedding_clip.py` (and `review_embedding_clip.py` if you use reviews) to generate CLIP embeddings under `data_rec/embeddings/...`.
3. **Train Item Q-Former**
   - Optionally run `precompute_full_field_embeddings.py` to cache field embeddings.
   - Run `item_qformer_training.py` to train the Item Q-Former and save a checkpoint.
4. **Generate item query tokens**
   - Run `generate_all_item_embeddings.py` to create a cache of item query tokens for all items.
5. **Train User Q-Former and Qwen3+LoRA**
   - Run `user_qformer_training.py` to learn user query tokens from history.
   - Run `train_item_individual_token_joint.py` to jointly train Qwen3+LoRA with injected query tokens.
6. **Evaluate**
   - Run `evaluate_item_qformer.py` to measure Item Q-Former reconstruction quality.
All paths and hyperparameters are meant to be edited for your dataset; everything now uses relative paths so the project can be safely pushed to GitHub.
If you find this repository useful, please consider citing:
```bibtex
@misc{unirec2025,
  title={UniRec: Unified Multimodal Encoding for LLM-Based Recommendations},
  author={UIUC U-Lab},
  year={2025},
  howpublished={\url{https://github.com/ulab-uiuc/UniRec}}
}
```