Yixuan Yang1*, Zhen Luo2,3*, Wanshui Gan1*, Jinkun Hao1, Junru Lu4, Jinghao Yan1, Zhaoyang Lyu1, Xudong Xu1†
1Shanghai Artificial Intelligence Laboratory 2Shanghai Innovation Institute 3Southern University of Science and Technology 4University of Warwick
*Equal Contribution †Corresponding Author
Code-as-Room is an MLLM-based agentic framework equipped with a structured execution harness that represents 3D rooms with Blender code. Given a single top-down view image, the framework parses scene elements and their spatial relationships, and synthesizes executable Blender code for geometry, materials, and lighting through a principled, multi-stage pipeline.
The pipeline is agent-driven: LLM/VLM stages produce scene semantics, relation graphs, Blender layout code, object descriptions, detailed geometry, materials, texture prompts, and render settings. Deterministic code handles orchestration, validation, repair, memory, code integration, and several geometry/layout constraints.
This repository releases the Blender-code generation pipeline for Code-as-Room. The goal is to turn a top-down room image into an executable Blender scene by progressively converting visual evidence into structured scene understanding, object layout, geometry code, materials, textures, and render settings.
The current implementation focuses on code-synthesized room reconstruction. It includes isolated run directories, resumable stages, scene-type routing, major-object geometry refinement, material generation, optional texture generation, and final render-script generation.
The asset-retrieval data and checkpoints, and the components for combining the pipeline with 3D generation, are described in the broader project plan but are not yet included in this release. They will be released separately.
Stage 0 Scene classification
Stage 1 Spatial semantic analysis
Stage 2 Scene graph construction
Stage 3 Base Blender code generation
Stage 4 Wall objects and selected minor placeholders
Stage 5 Major object descriptions
Stage 6 Detailed geometry for major objects
Stage 7 Surface-based small-object placement
Stage 8 Detailed small-object descriptions (optional extension)
Stage 9 Detailed small-object geometry (optional extension)
Stage 10 Per-part PBR material generation
Stage 11 Real texture generation and injection
Stage 12 Render-ready lighting and render settings
Stages 8 and 9 are optional extensions in this codebase for giving generated small objects their own descriptions and composite geometry. They are not part of the main paper pipeline. They are off by default and only run when --detail-small-objects is enabled. If you do not need detailed small-object geometry, leave this option disabled.
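The stage-selection rule above can be sketched in a few lines of Python. This is an illustration only, not the repository's actual logic: `stages_to_run` and its parameters are hypothetical names. The only facts taken from the text are the stage numbering (0 to 12) and that Stages 8 and 9 run only when --detail-small-objects is enabled.

```python
# Stages that only run with --detail-small-objects (per the text above).
OPTIONAL_SMALL_OBJECT_STAGES = {8, 9}


def stages_to_run(start: int, end: int, detail_small_objects: bool = False) -> list[int]:
    """Return the stage numbers that would execute for a range (hypothetical helper)."""
    if not 0 <= start <= end <= 12:
        raise ValueError("expected 0 <= start <= end <= 12")
    return [
        stage
        for stage in range(start, end + 1)
        if detail_small_objects or stage not in OPTIONAL_SMALL_OBJECT_STAGES
    ]


print(stages_to_run(1, 12))                             # Stages 8 and 9 are skipped
print(stages_to_run(5, 12, detail_small_objects=True))  # Stages 8 and 9 are included
```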
python run_pipeline.py \
--image example/example1.png \
--detail-small-objects
- Blender code generation release.
  - Initial release of the core 3D room generation pipeline.
- Web-based editing and viewing interface.
  - We plan to release a Web UI for editing generated scenes in the browser, with synchronization between the scene, the underlying code, and Blender.
  - This interface is intended to reduce both time and token cost compared with post-hoc correction inside the agent loop :)
- 3D assets retrieval checkpoint release.
  - Code-only geometry can be insufficient for representing fine-grained small objects in downstream applications such as robotics. We plan to release retrieval data and checkpoints to improve object-level realism and usability.
- Support for more diverse room shapes.
  - The current pipeline works best on rectangular or near-rectangular rooms. We plan to improve support for irregular room layouts.
- Whole-floor-plan to 3D scene generation.
  - The current release focuses on single-room reconstruction. We plan to extend the pipeline to handle multi-room floor plans.
- Benchmark release.
  - Building and scaling the benchmark is resource-intensive, requiring substantial time and token cost. We plan to expand it beyond the current internal version. If you are interested in collaborating on the benchmark, please feel free to contact us.
System:
- Python 3.10+
- Blender 3.6+ or Blender 4.x
- An OpenAI-compatible chat/VLM API endpoint for the text and vision stages
- Optional image-generation endpoint for Stage 11 texture generation
Python packages:
pip install langchain-openai langchain-core openai pillow requests
Blender supplies bpy, bmesh, and mathutils; install those by installing Blender, not with pip.
Clone the repository and install dependencies:
git clone <your-repo-url>
cd Code-as-Room_github
pip install langchain-openai langchain-core openai pillow requests
Configure your API credentials with environment variables:
export SCENEGEN_MODEL="gemini-3.1-pro-preview-thinking"
export SCENEGEN_BASE_URL="https://your-openai-compatible-endpoint/v1"
export SCENEGEN_API_KEY="your-api-key"
# Optional: only needed for Stage 11 real texture generation.
export SCENEGEN_TEXTURE_MODEL="gemini-3-pro-image-preview"
export SCENEGEN_TEXTURE_BASE_URL="https://your-image-generation-endpoint"
export SCENEGEN_TEXTURE_API_KEY="your-texture-api-key"
You can also put these values in a JSON config file instead of environment variables. Copy example/pipeline_config.example.json to a local file, then edit model, base_url, api_key, stage11_texture_model, stage11_texture_base_url, and stage11_texture_api_key there.
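As a rough illustration of how the environment variables relate to the config keys, the sketch below collects them into one dictionary. `settings_from_env` is a hypothetical helper, not part of the repository. The pairing of each variable with a config key is inferred from the order in which they are listed above, and treating the three texture variables as optional follows the note that they are only needed for Stage 11.

```python
import os
from typing import Mapping

# Environment variable -> config key (pairing inferred from the lists above).
REQUIRED = {
    "SCENEGEN_MODEL": "model",
    "SCENEGEN_BASE_URL": "base_url",
    "SCENEGEN_API_KEY": "api_key",
}
OPTIONAL_TEXTURE = {
    "SCENEGEN_TEXTURE_MODEL": "stage11_texture_model",
    "SCENEGEN_TEXTURE_BASE_URL": "stage11_texture_base_url",
    "SCENEGEN_TEXTURE_API_KEY": "stage11_texture_api_key",
}


def settings_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect API settings from the environment (hypothetical helper)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    settings = {key: env[name] for name, key in REQUIRED.items()}
    # Texture settings are only needed for Stage 11, so absent ones are skipped.
    settings.update(
        {key: env[name] for name, key in OPTIONAL_TEXTURE.items() if env.get(name)}
    )
    return settings


# Placeholder values only; never hard-code a real key.
example_env = {
    "SCENEGEN_MODEL": "my-model",
    "SCENEGEN_BASE_URL": "https://your-openai-compatible-endpoint/v1",
    "SCENEGEN_API_KEY": "your-api-key",
}
print(settings_from_env(example_env))
```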
The example/ directory contains small inputs and a runnable config template:
- example/example1.png, example/example2.jpeg, example/example3.png: sample top-down room references.
- example/pipeline_config.example.json: config-file template for run_pipeline.py.
- example/run_*/: generated example outputs. These show the expected stage folders and final scripts.
The image_prompt_gen/ directory contains the image-prompt workflow:
- image_prompt_gen/topdown_room_image_generator.py: generates top-down room image prompts and optionally calls an image-generation endpoint.
- image_prompt_gen/generated_prompts_example.json: example prompt JSON that can be used as input to image generation.
Run the full pipeline:
python run_pipeline.py --image example/example1.png
By default, output is written next to the input image:
example/run_YYYYMMDD_HHMMSS_example1/
To choose a different output parent directory:
python run_pipeline.py \
--image example/example1.png \
--output-dir /path/to/output_root
The final Blender script is:
<run_dir>/stage12_render/render_output.py
Render or inspect it with Blender:
/Applications/Blender.app/Contents/MacOS/Blender \
--python <run_dir>/stage12_render/render_output.py
If Blender is not at the macOS path above, pass it to the pipeline:
python run_pipeline.py --image example/example1.png --blender /path/to/blender
Instead of passing many CLI flags, copy the example config:
cp example/pipeline_config.example.json my_config.json
Edit my_config.json, then run:
python run_pipeline.py --config my_config.json
The API-related fields to edit are:
- model, base_url, api_key: main OpenAI-compatible text/VLM endpoint used by most stages.
- stage11_texture_model, stage11_texture_base_url, stage11_texture_api_key: optional image-generation endpoint used by Stage 11 texture generation.
- image: input image path.
- blender: Blender executable path if your Blender is not at the default macOS location.
CLI arguments override config-file values:
python run_pipeline.py --config my_config.json --start 5 --end 12
Config keys use the CLI option names with underscores instead of hyphens, for example:
{
"image": "example/example1.png",
"output_dir": "outputs",
"start": 1,
"end": 12,
"model": "gemini-3.1-pro-preview-thinking",
"base_url": "https://your-openai-compatible-endpoint/v1",
"api_key": "your-api-key",
"stage11_texture_model": "gemini-3-pro-image-preview",
"stage11_texture_base_url": "https://your-image-generation-endpoint",
"stage11_texture_api_key": "your-texture-api-key",
"wall_intensity": "subtle"
}
Do not commit real API keys.
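The two rules above (CLI values win over config values, and config keys are CLI option names with underscores) can be illustrated as follows. This is a sketch only: `flag_to_key` and `merge_settings` are hypothetical helpers, and treating `None` as "flag not given" is an assumption about how unset options are represented.

```python
def flag_to_key(flag: str) -> str:
    """Convert a CLI option name such as --output-dir to its config key, output_dir."""
    return flag.lstrip("-").replace("-", "_")


def merge_settings(config: dict, cli_args: dict) -> dict:
    """Overlay explicitly passed CLI values on config-file values (hypothetical helper).

    cli_args maps CLI flags to values; None means the flag was not given.
    """
    merged = dict(config)
    for flag, value in cli_args.items():
        if value is not None:
            merged[flag_to_key(flag)] = value
    return merged


config = {"image": "example/example1.png", "start": 1, "end": 12, "wall_intensity": "subtle"}
cli = {"--start": 5, "--end": None, "--wall-intensity": "bold"}
print(merge_settings(config, cli))
# {'image': 'example/example1.png', 'start': 5, 'end': 12, 'wall_intensity': 'bold'}
```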
The repo includes a helper for creating synthetic top-down room images before running the 3D pipeline. It has three modes:
- prompt: generate prompt JSON from a text/VLM model.
- image: generate images from an existing prompt JSON.
- all: generate prompt JSON, then generate images.
Generate prompt JSON only:
python image_prompt_gen/topdown_room_image_generator.py prompt \
--count 20 \
--model gpt-4o \
--api-key "$SCENEGEN_API_KEY" \
--base-url "$SCENEGEN_BASE_URL" \
--scene-scope non_residential \
--output image_prompt_gen/generated_prompts.json
Generate images from an existing prompt file:
python image_prompt_gen/topdown_room_image_generator.py image \
--prompts image_prompt_gen/generated_prompts_example.json \
--image-model gemini-3-pro-image-preview \
--api-key "$SCENEGEN_TEXTURE_API_KEY" \
--base-url "$SCENEGEN_TEXTURE_BASE_URL" \
--output-dir generated_images/example \
--aspect-ratio 16:9 \
--image-size 1K
Run both steps in one command:
python image_prompt_gen/topdown_room_image_generator.py all \
--count 20 \
--prompt-model gpt-4o \
--image-model gemini-3-pro-image-preview \
--api-key "$SCENEGEN_TEXTURE_API_KEY" \
--base-url "$SCENEGEN_TEXTURE_BASE_URL" \
--scene-scope non_residential \
--output image_prompt_gen/generated_prompts.json \
--output-dir generated_images/non_residential \
--aspect-ratio 16:9 \
--image-size 1K
The prompt generator writes a JSON file with metadata and prompts. Each prompt item contains the original structured parameters plus the final image prompt string. The image generator writes PNGs named from the prompt id, room type, and style.
Important: --base-url for prompt mode is OpenAI-compatible chat style and may include /v1. --base-url for image mode is the image-generation proxy root; the script appends the Gemini v1beta/models/...:generateContent path internally.
The pipeline includes additional code generation for small objects beyond the original base-room layout:
- Stage 4 uses Stage 1/2 semantics to add wall-mounted or minor objects that may be absent from the base layout.
- Stage 7 parses Stage 6 detailed geometry, finds usable support surfaces, and adds small objects grounded in the reference image.
- Stage 7 writes stage7_small_objects/small_objects.json and an updated Blender script.
- With --detail-small-objects, Stage 8 describes each added small object and Stage 9 replaces simple small-object primitives with compact composite geometry.
- Stage 10/11 then assign materials and texture integrations over the combined major-object and small-object scene.
This is useful for lab benches, desks, shelves, kitchen counters, office tables, and other clutter-heavy scenes.
Run only the base scene:
python run_pipeline.py --image example/example1.png --end 4
Resume a previous run from Stage 5:
python run_pipeline.py \
--image example/example1.png \
--run-dir <run_dir> \
--start 5 \
--end 12
Run only material, texture, and render stages after geometry is ready:
python run_pipeline.py \
--image example/example1.png \
--run-dir <run_dir> \
--start 10 \
--end 12
Disable image compression:
python run_pipeline.py --image example/example1.png --no-compress
Set the wall texture style:
python run_pipeline.py --image example/example1.png --wall-intensity subtle
python run_pipeline.py --image example/example1.png --wall-intensity bold
python run_pipeline.py --image example/example1.png --wall-intensity mural_like
Force a scene type:
python run_pipeline.py --image example/example1.png --scene-type lab
python run_pipeline.py --image example/example1.png --scene-type residential
Use batch_run_pipeline.py to run run_pipeline.py over every image in a folder. This is the normal entry point for generated image batches.
Preview the batch without running stages:
python batch_run_pipeline.py \
--images-dir example \
--label example \
--model-tag local-test \
--dry-run
Run a batch sequentially:
python batch_run_pipeline.py \
--images-dir generated_images/non_residential \
--label non_residential \
--model-tag gemini31 \
--parallel 16 \
--max-concurrent 1
Run multiple images at the same time:
python batch_run_pipeline.py \
--images-dir generated_images/non_residential \
--label non_residential \
--model-tag gemini31 \
--parallel 16 \
--max-concurrent 2
Output is organized as:
<output-root>/<model-tag>/<label>/run_YYYYMMDD_HHMMSS_<image_stem>/
By default, output-root is:
agent_utils/pipeline_output/
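To preview where a batch would write its results, the layout above can be expanded per image. The sketch below is illustrative: `plan_batch` is a hypothetical helper, the image file names are made up, and it uses one shared timestamp for simplicity, whereas real runs are presumably stamped individually when each one starts.

```python
from datetime import datetime
from pathlib import Path


def plan_batch(images: list[str], output_root: str, model_tag: str, label: str,
               now: datetime) -> dict[str, Path]:
    """Map each image to <output-root>/<model-tag>/<label>/run_<stamp>_<stem> (hypothetical)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    bucket = Path(output_root) / model_tag / label
    return {image: bucket / f"run_{stamp}_{Path(image).stem}" for image in images}


plan = plan_batch(
    ["generated_images/non_residential/a.png", "generated_images/non_residential/b.png"],
    output_root="agent_utils/pipeline_output",
    model_tag="gemini31",
    label="non_residential",
    now=datetime(2026, 1, 2, 3, 4, 5),
)
for image, run_dir in plan.items():
    print(image, "->", run_dir.as_posix())
```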
Use a custom output root when running large batches:
python batch_run_pipeline.py \
--images-dir generated_images/non_residential \
--output-root /path/to/CAR3D_output \
--label non_residential \
--model-tag gemini31 \
--max-concurrent 2
Batch options to keep straight:
- --parallel: internal Stage 6 geometry worker count for one pipeline run.
- --max-concurrent: number of images/pipelines running at the same time.
- --label: dataset or image class folder under the output bucket.
- --model-tag: filesystem-safe model folder name. Use this even when the actual model name is long.
- --stop-on-error: stop the batch after the first failed image.
- --quiet: reduce per-stage logs. In parallel mode, each image writes its full log to <run_dir>/run.log.
Start conservatively. On a laptop, --max-concurrent 1 or 2 is usually safer because each pipeline may call LLM APIs and spawn Blender.
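The effect of --max-concurrent can be pictured as a bounded worker pool. The sketch below is not the batch runner's actual code: `run_batch` is a hypothetical helper, and `fake_pipeline` is a stand-in that returns made-up exit codes instead of launching run_pipeline.py.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_batch(images: list[str], run_one: Callable[[str], int],
              max_concurrent: int = 1) -> dict[str, int]:
    """Run run_one(image) for every image with at most max_concurrent in flight."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves input order, so results line up with images.
        return dict(zip(images, pool.map(run_one, images)))


def fake_pipeline(image: str) -> int:
    """Stand-in for one pipeline run; returns a fake exit code."""
    return 0 if image.endswith(".png") else 1


print(run_batch(["a.png", "b.jpeg", "c.png"], fake_pipeline, max_concurrent=2))
# {'a.png': 0, 'b.jpeg': 1, 'c.png': 0}
```

With max_concurrent=1 the same function processes images one at a time, which mirrors the sequential example above. A real `run_one` would launch one pipeline process per image, for example through `subprocess.run`.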
List historical runs under the default workspace output folder:
python run_pipeline.py --list-runs
Show memory status for a run:
python run_pipeline.py --status --run-dir <run_dir>
Clear all memory for a run:
python run_pipeline.py --clear-memory --run-dir <run_dir>
Clear one stage and rerun from there:
python run_pipeline.py --clear-stage stage7_small_objects --run-dir <run_dir>
python run_pipeline.py --image example/example1.png --run-dir <run_dir> --start 7
A typical run directory contains:
run_YYYYMMDD_HHMMSS_image/
├── agent_memory.jsonl
├── run_config.json
├── compressed_images/
├── stage1/
├── stage2/
├── stage3/
├── stage4/
├── stage5_describe/
├── stage6_geometry/
├── stage7_small_objects/
├── stage8_small_describe/ # only when --detail-small-objects is used
├── stage9_small_geometry/ # only when --detail-small-objects is used
├── stage10_material/
├── stage11_texture/
└── stage12_render/
Important final artifacts:
- stage12_render/render_output.py: final render-ready Blender script.
- stage11_texture/images/: generated texture maps.
- stage11_texture/texture_manifest.json: texture-generation manifest.
- stage10_material/material_config.json: generated material configuration.
- stage7_small_objects/small_objects.json: added small objects and placement data.
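A quick way to check whether a run reached the end is to look for the artifacts listed above. The helper below is a sketch: `missing_artifacts` is a hypothetical name, and it only tests for existence, not for valid contents. Since texture generation is optional, missing Stage 11 entries do not necessarily indicate a failed run.

```python
from pathlib import Path

# Relative paths taken from the artifact list above.
FINAL_ARTIFACTS = [
    "stage12_render/render_output.py",
    "stage11_texture/images",
    "stage11_texture/texture_manifest.json",
    "stage10_material/material_config.json",
    "stage7_small_objects/small_objects.json",
]


def missing_artifacts(run_dir: str) -> list[str]:
    """Return the expected artifacts that are not present under run_dir (hypothetical)."""
    root = Path(run_dir)
    return [rel for rel in FINAL_ARTIFACTS if not (root / rel).exists()]


# For a directory that does not exist, every artifact is reported as missing.
print(missing_artifacts("path/that/does/not/exist"))
```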
This repository is licensed under the Apache License 2.0.
If you find our work useful, please consider citing:
@article{yang2026codeasroom,
title={Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis},
author={Yang, Yixuan and Luo, Zhen and Gan, Wanshui and Hao, Jinkun and Lu, Junru and Yan, Jinghao and Lyu, Zhaoyang and Xu, Xudong},
journal={arXiv preprint arXiv:2605.18451},
year={2026}
}