
RelaxFlow: Text-Driven Amodal 3D Generation


Jiayin Zhu¹, Guoji Fu¹, Xiaolu Liu², Qiyuan He¹, Yicong Li¹, Angela Yao¹

¹ National University of Singapore
² Zhejiang University

🎯 What We Do: Resolving Semantic Ambiguity

Image-to-3D generation faces inherent semantic ambiguity under occlusion: partial observation alone is often insufficient to determine the object category. For instance, a visible wooden backboard could plausibly belong to a sofa, a bed, or a dressing table. Existing feedforward models, such as SAM 3D, often collapse to an "observation-overfitted" shape through uncontrolled hallucination.

We formalize text-driven amodal 3D generation. Our task allows users to explicitly steer the completion of unseen regions using text prompts, while strictly preserving the visual evidence of the input observation.

⚙️ How We Do It: Decoupled Control & Relaxation

These dual objectives demand distinct control granularities: rigid control for the visible observation versus relaxed structural control for the text prompt. To solve this, we propose RelaxFlow, a training-free dual-branch framework:

  • Observation Branch: Provides strict adherence to ensure visual fidelity for the observed pixels.
  • Multi-Prior Consensus: Converts the text prompt into visual proxy reference images. Cross-attention across these priors naturally amplifies structural consensus while suppressing inconsistent, instance-specific textures.
  • Visibility-Aware Fusion: A spatial blending mechanism ensuring the semantic guide only steers genuinely occluded regions, while the observation strictly governs the visible pixels.
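The visibility-aware fusion step can be pictured as a per-location convex blend of the two branches. The sketch below is a minimal NumPy illustration, not the repository's implementation; the function name `visibility_aware_fusion` and the array shapes are assumptions for exposition.

```python
import numpy as np

def visibility_aware_fusion(v_obs, v_sem, vis):
    """Blend the two branches' per-location outputs with a visibility mask.

    v_obs : output of the observation branch (strict fidelity), shape (..., C)
    v_sem : output of the semantic/prior branch, same shape
    vis   : visibility weights in [0, 1]; 1 = observed pixel, shape (..., 1)
    """
    # Visible locations are governed by the observation; occluded ones
    # are steered by the semantic guide.
    return vis * v_obs + (1.0 - vis) * v_sem

# Toy check with two locations: one fully visible, one fully occluded.
v_obs = np.array([[1.0, 1.0], [1.0, 1.0]])
v_sem = np.array([[0.0, 0.0], [0.0, 0.0]])
vis = np.array([[1.0], [0.0]])
fused = visibility_aware_fusion(v_obs, v_sem, vis)
```

In practice the mask would be soft near occlusion boundaries, so the blend transitions smoothly rather than switching branches abruptly.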

The Theory: Low-Pass Relaxation

A core challenge is preventing the text prompt's high-frequency details from clashing with the input image. We introduce a Relaxation Mechanism that smooths cross-attention logits within the generation backbone.

Theoretically, we prove this smoothing is equivalent to applying a low-pass filter on the generative vector field. This mathematically suppresses high-frequency instance details and exposes a "coarse semantic corridor," enforcing only the low-frequency global geometry needed to accommodate the observation (e.g., the general shape of a "sofa").
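One simple way to realize logit smoothing is a temperature on the cross-attention logits: a temperature above 1 flattens the softmax, so each query averages over more keys, which acts as a low-pass filter on the attended features. The snippet below is a minimal sketch of this idea in NumPy; the function name, the single-head layout, and the temperature value are assumptions, not the paper's exact operator.

```python
import numpy as np

def relaxed_cross_attention(q, k, v, tau=4.0):
    """Single-head cross-attention with temperature-relaxed logits.

    tau = 1 recovers standard scaled dot-product attention; tau > 1
    flattens the attention weights, keeping coarse structure and
    washing out instance-specific, high-frequency detail.
    """
    logits = (q @ k.T) / np.sqrt(k.shape[-1])
    logits = logits / tau                          # the relaxation step
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ v, w

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 8))
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 8))
_, w_sharp = relaxed_cross_attention(q, k, v, tau=1.0)
_, w_soft = relaxed_cross_attention(q, k, v, tau=8.0)
# w_soft is closer to uniform than w_sharp: its peak weight is lower,
# i.e., each query attends more broadly -- the low-pass effect.
```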

📊 Benchmarks & Results

To facilitate systematic evaluation, we introduce two new diagnostic benchmarks:

  • ExtremeOcc-3D: Targets extreme occlusion in natural indoor scenes where visible evidence cannot identify the object category.
  • AmbiSem-3D: Targets semantic branching, where the same visual evidence admits multiple plausible interpretations, paired with distinct text prompts.

Results

Extensive experiments demonstrate that RelaxFlow successfully steers the generation of unseen regions to match the prompt intent. It avoids the observation-overfitted collapse of existing models and produces high-quality 3D assets without compromising visual fidelity.

🚀 Get Started

Installation

Follow the setup steps of SAM 3D Objects before running the following. Based on our testing, the minimum requirement is a single GPU with 24GB of memory (e.g., NVIDIA RTX A5000).

Quickstart

For a quick start, run `demo_relaxflow.py` on the provided test data:

```shell
FOLDER="test_data/A_bike_with_a_blue_front_wheel_and_a_red_rear_wheel"
OUTNAME=$(basename "$FOLDER")
IMG="${FOLDER}/image.png"
MSK="${FOLDER}/mask.png"
# PRI="${FOLDER}/prior1.png ${FOLDER}/prior2.png ${FOLDER}/prior3.png ${FOLDER}/prior4.png"
PRI="${FOLDER}/prior.png"
python demo_relaxflow.py --image "$IMG" --mask "$MSK" --prior-images $PRI --output-name "$OUTNAME"
```

Another case (no mask, multiple prior images):

```shell
FOLDER="test_data/dressing_table"
OUTNAME=$(basename "$FOLDER")
IMG="${FOLDER}/input.png"
PRI="${FOLDER}/prior1.png ${FOLDER}/prior2.png ${FOLDER}/prior3.png"
python demo_relaxflow.py --image "$IMG" --prior-images $PRI --output-name "$OUTNAME"
```

Results will be saved into `outputs/`.

Benchmarks

For testing the benchmarks **ExtremeOcc-3D** and **AmbiSem-3D**, please first download the datasets via [link](tbd), then run `demo_relaxflow_batch.py` with the prepared manifests:

```shell
python demo_relaxflow_batch.py \... #todo: publish the datasets and manifest files
```

License

This repository is built upon the SAM 3D Objects model as a backbone; both the original SAM 3D Objects code and the modifications in this repository are licensed under the SAM License.

Citing RelaxFlow

If you find our work useful, please use the following BibTeX entry.

< TODO: update bibtex here >

```bibtex
@article{zhu2026relaxflow,
  title={RelaxFlow: Text-Driven Amodal 3D Generation},
  author={Zhu, Jiayin and Fu, Guoji and Liu, Xiaolu and He, Qiyuan and Li, Yicong and Yao, Angela},
  journal={arXiv preprint},
  year={2026}
}
```
