telecomhzj/MaskRAG

MaskRAG: Mask Retrieval Augmented Generation for MLLM-based Referring Expression Segmentation

Abstract

MaskRAG is a multimodal RAG-like framework for referring expression segmentation (RES). It addresses the "segmentation hallucination" issue in existing [SEG]-based methods through two core modules: a Mask Retrieval Module, which encodes region features with customized language templates for finer-grained scene perception, and a Mask Augmentation Module, which combines multi-granularity semantic fusion with an adaptive routing mechanism. MaskRAG achieves state-of-the-art performance on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks.
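
The code for MaskRAG has not been released yet, and the abstract does not say how retrieval is implemented. Purely as an illustration of the general idea of retrieving candidate masks by feature similarity, the sketch below ranks region masks by cosine similarity between a query embedding and per-region feature vectors. Everything in it is an assumption made for illustration and is not taken from MaskRAG: the function names `cosine` and `retrieve_masks`, the use of cosine similarity, the `top_k` parameter, and the toy vectors and masks.

```python
import math
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1 (toy representation)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_masks(
    query: Sequence[float],
    region_feats: Sequence[Sequence[float]],
    masks: Sequence[Mask],
    top_k: int = 1,
) -> List[Tuple[float, Mask]]:
    """Hypothetical retrieval step: return the top_k (score, mask) pairs,
    ranked by similarity between the query and each region feature."""
    scored = [(cosine(query, f), m) for f, m in zip(region_feats, masks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data (assumed): one query embedding and three candidate regions.
query = [1.0, 0.0]
region_feats = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
masks = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [1, 1]],
]

for score, mask in retrieve_masks(query, region_feats, masks, top_k=2):
    print(round(score, 3), mask)
```

In this toy setup the first region is ranked highest because its feature vector points in nearly the same direction as the query. A real system would obtain both the query and the region features from learned encoders, which this sketch does not model.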

Results

Model          RefCOCO Val   RefCOCO+ Val   RefCOCOg Val   Avg.
MaskRAG (4B)   80.7          75.8           77.6           77.8
MaskRAG (8B)   82.1          77.0           78.3           79.0

Installation

Coming soon.

Usage

Coming soon.

Citation

If you find this work useful, please cite:

@article{he2025maskrag,
  title={MaskRAG: Mask Retrieval Augmented Generation for MLLM-based Referring Expression Segmentation},
  author={He, Zhongjiang and Zhao, An and Tang, Canhui and Sun, Hao and Sun, Hongbo and Yuan, Ye and Liang, Kongming and Ma, Zhanyu},
  year={2025}
}
