
OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding

Wenyuan Huang¹ Zhao Wang² Zhou Wei² Ting Huang¹ Fang Zhao¹ Jian Yang¹ Zhenyu Zhang¹✉️

Project Page | Paper | arXiv | Benchmark

¹Nanjing University, School of Intelligent Science and Technology
²China Mobile Zijin Innovation Institute
✉️Corresponding author


Abstract

3D visual grounding aims to locate objects in 3D scenes based on natural language descriptions. Existing methods rely on a pre-defined Object Lookup Table (OLT) to query Visual Language Models (VLMs) for reasoning about object locations, which limits their applicability in scenarios with undefined or unforeseen targets. To address this problem, we present OpenGround, a novel zero-shot framework for open-world 3D visual grounding. Central to OpenGround is the Active Cognition-based Reasoning (ACR) module, which is designed to overcome the fundamental limitation of pre-defined OLTs by progressively augmenting the cognitive scope of VLMs. The ACR module performs human-like perception of the target via a cognitive task chain and actively reasons about contextually relevant objects, thereby extending VLM cognition through a dynamically updated OLT. This allows OpenGround to function with both pre-defined and open-world categories. We also propose a new dataset named OpenTarget, which contains over 7,000 object-description pairs to evaluate our method in open-world scenarios. Extensive experiments demonstrate that OpenGround achieves competitive performance on Nr3D, state-of-the-art results on ScanRefer, and a substantial 17.6% improvement on OpenTarget.


📝 Updates

  • [25/12/28] Introducing OpenGround, a new framework for open-world 3D visual grounding. For more details, please refer to our Project Page and Preprint. 🚀
  • [26/01/04] The OpenTarget benchmark has been released. 🤗
  • [Coming soon] The code will be released. 🤗

Table of Contents

  • 0. Framework Overview
  • 1. Run
  • 2. Results
  • 3. Citation


0. Framework Overview


Overview of the OpenGround framework. We propose the Active Cognition-based Reasoning (ACR) module, which progressively enhances the cognition of VLMs. The ACR module builds a cognitive task chain to perform human-like cognitive planning of how to find the target, and then actively reasons about the location based on contextually relevant objects, extending the VLMs' cognition via a dynamically updated OLT. This enables OpenGround to function effectively in both pre-defined and open-world scenarios without relying on a complete pre-defined OLT; a hypothetical sketch of this loop follows the component list below.

Key Components:

  • Active Cognition-based Reasoning (ACR): Performs human-like perception via cognitive task chains and actively reasons about contextually relevant objects
  • Dynamic Object Lookup Table (OLT): Continuously updated during the reasoning process to extend VLM cognition
  • OpenTarget Dataset: New benchmark with over 7000 object-description pairs for open-world evaluation
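
The paper describes ACR only at a high level, so the following is a minimal, hypothetical sketch of the loop under our own assumptions: a text-only VLM interface, an open-vocabulary detector, and an `ANSWER:` reply convention. `query_vlm`, `detect_objects`, and the prompt wording are illustrative placeholders, not the authors' released API (the code is not yet public; see Section 1).

```python
from typing import Callable

def active_cognition_grounding(
    scene,
    description: str,
    query_vlm: Callable[[str], str],      # placeholder: text-in, text-out VLM
    detect_objects: Callable[..., list],  # placeholder: open-vocabulary detector
    max_rounds: int = 5,
):
    """Ground `description` in `scene` by progressively extending the OLT."""
    olt: dict[str, list] = {}  # dynamic Object Lookup Table: category -> 3D boxes

    # Cognitive task chain: plan which target and anchor categories must be
    # perceived before the target can be located.
    plan = query_vlm(
        f"To locate '{description}', list the target and the contextually "
        "relevant anchor objects to find, one category per line."
    ).splitlines()

    for _ in range(max_rounds):
        # Actively extend the OLT with any planned category not yet detected,
        # widening the VLM's cognitive scope beyond a fixed, pre-defined table.
        for category in (c for c in plan if c not in olt):
            olt[category] = detect_objects(scene, category)

        # Reason over the current OLT: the VLM either grounds the target or
        # requests further categories, and the loop repeats.
        reply = query_vlm(
            f"Known objects: {olt}. Either reply 'ANSWER: <box>' with the box "
            f"matching '{description}', or list further categories to detect."
        )
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        plan = reply.splitlines()

    # Fall back to the best match among the objects found so far.
    return query_vlm(
        f"Known objects: {olt}. Which box best matches '{description}'?"
    )
```

The point the sketch tries to capture is that the OLT starts empty and is populated only with categories the VLM actively asks for, rather than being enumerated in advance.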

1. Run

Coming soon. The original implementation is over 3,000 lines in a single file; we will release it once it has been reorganized.


2. Results

2.1 Quantitative Results

ScanRefer (Overall):

| Method            | Supervision | Acc@0.25 | Acc@0.5 |
|-------------------|-------------|----------|---------|
| SPAZER            | Zero-Shot   | 57.2     | 48.8    |
| SeqVLM            | Zero-Shot   | 55.6     | 49.6    |
| VLM-Grounder*     | Zero-Shot   | 48.3     | 33.5    |
| SeeGround         | Zero-Shot   | 44.1     | 39.4    |
| OpenGround (Ours) | Zero-Shot   | 61.8     | 53.1    |

Nr3D (Overall):

| Method             | Supervision | Overall |
|--------------------|-------------|---------|
| SPAZER†            | Zero-Shot   | 63.8    |
| SeqVLM             | Zero-Shot   | 53.2    |
| VLM-Grounder       | Zero-Shot   | 48.0    |
| SeeGround          | Zero-Shot   | 46.1    |
| ZSVG3D             | Zero-Shot   | 39.0    |
| OpenGround (Ours)† | Zero-Shot   | 61.7    |

OpenTarget (New Benchmark, GT OLT):

| Method            | OLT | Acc@0.25      | Acc@0.50      |
|-------------------|-----|---------------|---------------|
| VLM-Grounder*     | GT  | 28.6          | 20.4          |
| SeqVLM            | GT  | 19.4          | 19.2          |
| SeeGround         | GT  | 17.9          | 17.4          |
| GPT4Scene         | GT  | 12.1          | 11.8          |
| OpenGround (Ours) | GT  | 54.8 (+27.0%) | 54.3 (+33.9%) |

Note: * denotes results on selected samples. † denotes methods using GPT-4o as VLM.
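
For reference, Acc@0.25 and Acc@0.50 are the usual 3D grounding metrics: the fraction of predictions whose 3D IoU with the ground-truth box meets the threshold. Below is a minimal sketch assuming axis-aligned boxes in (xmin, ymin, zmin, xmax, ymax, zmax) order; the exact evaluation code behind these tables may differ in details.

```python
def iou_3d(a, b):
    """Axis-aligned 3D IoU for boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):  # overlap extent along x, y, z
        inter *= max(0.0, min(a[i + 3], b[i + 3]) - max(a[i], b[i]))

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


def acc_at_iou(pred_boxes, gt_boxes, threshold=0.25):
    """Acc@IoU: share of predictions whose IoU with the ground truth >= threshold."""
    hits = sum(iou_3d(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)
```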

2.2 Qualitative Results

See our Project Page for visual comparisons and interactive demos.


3. Citation

If you find our work and code repository helpful, please consider starring our repository and citing the following paper:

@misc{huang2025opengroundactivecognitionbasedreasoning,
      title={OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding}, 
      author={Wenyuan Huang and Zhao Wang and Zhou Wei and Ting Huang and Fang Zhao and Jian Yang and Zhenyu Zhang},
      year={2025},
      eprint={2512.23020},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.23020}, 
}

Star History

Star History Chart


If you find this project helpful, please consider giving us a star ⭐
