This page is under construction.
Screening out similar but non-target images in text-based image retrieval is crucial for pinpointing the user’s desired images accurately. However, conventional methods mainly focus on improving text-image matching performance and often fail to identify the images that exactly match the retrieval intention because of limited query quality. User-provided queries frequently lack the information needed to screen out similar but non-target images, especially when the target database (DB) contains numerous similar images. Therefore, a novel approach is needed to extract valuable information from users for effective screening. In this paper, we propose a DB question generation (DQG) model to enhance exact cross-modal image retrieval performance. Our DQG model learns to generate effective questions that precisely screen out similar but non-target images using information about the DB contents. By answering only the questions generated by our model, users can reach their desired images even within DBs with similar content. Experimental results on publicly available datasets show that our proposed approach can significantly improve exact cross-modal image retrieval performance. Code will be publicly available.
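
To make the idea of question-based screening concrete, here is a toy sketch in Python. It is not the DQG model: the learned question generator is replaced by a hand-written heuristic that picks the attribute splitting the remaining candidates most evenly, and each image is assumed to be described by a set of attribute tags. The names `pick_question`, `screen`, and the sample data are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Optional, Set


def pick_question(candidates: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the attribute that splits the candidates most evenly.

    Stand-in heuristic (an assumption), not the learned DQG generator.
    Returns None when no attribute can tell the candidates apart.
    """
    total = len(candidates)
    counts: Dict[str, int] = {}
    for attrs in candidates.values():
        for attr in attrs:
            counts[attr] = counts.get(attr, 0) + 1
    # Keep only attributes held by some, but not all, candidates.
    useful = sorted(a for a, c in counts.items() if 0 < c < total)
    if not useful:
        return None
    return min(useful, key=lambda a: abs(counts[a] - total / 2))


def screen(
    candidates: Dict[str, Set[str]],
    answer: Callable[[str], bool],
    max_questions: int = 5,
) -> List[str]:
    """Ask yes/no questions and drop candidates that contradict the answers."""
    remaining = dict(candidates)
    for _ in range(max_questions):
        if len(remaining) <= 1:
            break
        attribute = pick_question(remaining)
        if attribute is None:
            break
        has_attribute = answer(attribute)
        remaining = {
            name: attrs
            for name, attrs in remaining.items()
            if (attribute in attrs) == has_attribute
        }
    return sorted(remaining)


# Hypothetical DB of similar images, each tagged with made-up attributes.
toy_db = {
    "img_a": {"dog", "grass", "ball"},
    "img_b": {"dog", "grass"},
    "img_c": {"dog", "beach"},
    "img_d": {"dog", "beach", "ball"},
}

# Simulated user who has img_d in mind and answers truthfully.
result = screen(toy_db, lambda attribute: attribute in toy_db["img_d"])
print(result)
```

In this toy setting a query such as "a dog" matches all four images, so the text alone cannot identify the target; a couple of yes/no answers are enough to narrow the candidates down to one.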
If you find our work useful in your research, please consider citing:
This project is licensed under the MIT License - see the LICENSE.md file for details.