
MooreThreads/text2world


Text2World

Demo

Information

A Text2World pipeline based on 3D Gaussian Splatting and video generation technology.

Features:

  1. The first open-source pipeline combining T2V models with an LRM built on a Mamba-Transformer architecture
  2. A more efficient LRM reconstruction model: composed of a video VAE encoder and a Mamba-Transformer backbone, allowing the LRM to process more images while consuming less GPU memory
  3. Two pathways for 3DGS construction:
    • Normal Path: decodes the latents, post-processes the video, then re-encodes it to generate video latents
    • Remap Path (experimental): directly maps video latents to decoder-generated latents using a remap model, eliminating the unnecessary decode/encode and post-processing steps and preparing for future end-to-end training
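The Remap Path above can be pictured as a small learned transform applied directly in latent space. The shapes and the per-channel affine form below are illustrative assumptions, not the repository's actual remap model:

```python
import random

# Hypothetical latent shape: (channels, elements), flattened for illustration.
C, N = 16, 64  # latent channels, spatial-temporal elements per channel

rng = random.Random(0)
# Stand-in for a latent produced by the T2V diffusion model.
video_latent = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(C)]

# Illustrative remap: a per-channel affine map (scale + shift) that nudges
# diffusion latents toward the distribution of VAE-encoded latents, skipping
# the decode -> post-process -> re-encode round trip of the Normal Path.
scale = [1.0 + rng.gauss(0, 0.1) for _ in range(C)]  # stand-in for learned weights
shift = [rng.gauss(0, 0.1) for _ in range(C)]

remapped = [[s * x + b for x in chan]
            for chan, s, b in zip(video_latent, scale, shift)]
assert len(remapped) == C and len(remapped[0]) == N  # shape is preserved
```

The remapped latent then feeds the LRM directly, which is what makes the path a candidate for end-to-end training.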

Benchmark

To validate the pipeline's effectiveness and the LRM model's performance, we created an evaluation dataset:

  • Public dataset: Evaluates the LRM's reconstruction performance in real-world scenarios (from the AC3D / RE10K test set, 1,980 scenes)

Dataset   PSNR    SSIM   LPIPS
Public    29.34   0.87   0.205
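For reference, PSNR in the table is the standard peak signal-to-noise ratio. A minimal computation, assuming images normalized to [0, 1]:

```python
import math

def psnr(mse: float, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB from a mean-squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)

# A PSNR of 29.34 dB corresponds to an MSE of about 1.16e-3 on [0, 1] images.
mse = 10 ** (-29.34 / 10)
print(round(psnr(mse), 2))  # → 29.34
```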

Example

[Image grid: example renders and demo videos]

Dataset

Data sources:

Data processing methods:

  1. Pose data preprocessing: pixelsplat
  2. Caption generation: VideoX-Fun or CameraCtrl
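The RE10K-style pose data handled in these steps stores one camera per line. The parser below assumes the common RealEstate10K layout (timestamp, four normalized intrinsics, two unused fields, then a row-major 3x4 world-to-camera matrix); the repository's exact format may differ:

```python
def parse_re10k_line(line: str):
    """Parse one RealEstate10K-style camera line (assumed 19-field layout)."""
    vals = [float(v) for v in line.split()]
    timestamp = vals[0]
    intrinsics = tuple(vals[1:5])          # fx, fy, cx, cy (normalized by image size)
    flat = vals[7:19]                      # row-major 3x4 world-to-camera [R|t]
    extrinsic = [flat[r * 4:(r + 1) * 4] for r in range(3)]
    return timestamp, intrinsics, extrinsic

# Example line: identity rotation, zero translation.
line = "1000 0.48 0.86 0.5 0.5 0 0 1 0 0 0 0 1 0 0 0 0 1 0"
ts, intr, ext = parse_re10k_line(line)
assert len(ext) == 3 and len(ext[0]) == 4
```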

Inference

Considering the differences between diffusion latents and the VAE latents re-encoded from post-processed videos, we provide two inference pipelines:

  • nonmap_pipeline.py (recommended): takes the post-processed video generated by the diffusion model and feeds it to the latentLRM model for rendering
  • remap_pipeline.py: remaps the diffusion-generated latents to mitigate these differences

Command-Line Arguments

  • $pose_folder: Pose folder in RE10K-style format
  • $prompt_txt: List of prompts
  • $MODEL_PATH: Video generation model parameters
  • $ckpt_path: ControlNet model parameters
  • $lrm_weight: LRM model parameters
  • $remap_weight (optional): Remap model parameters
  • $out_dir: Output directory

Non-Mapping Pipeline

python generate_nonmap_api.py \
    --prompt $prompt_txt \
    --lrm_weight $lrm_weight \
    --pose_folder $pose_folder \
    --base_model_path $MODEL_PATH \
    --controlnet_model_path $ckpt_path \
    --output_path $out_dir \
    --start_camera_idx 0 \
    --end_camera_idx 7 \
    --stride_min 2 \
    --stride_max 2 \
    --height 480 \
    --width 720 \
    --controlnet_weights 1.0 \
    --controlnet_guidance_start 0.0 \
    --controlnet_guidance_end 0.4 \
    --controlnet_transformer_num_attn_heads 4 \
    --controlnet_transformer_attention_head_dim 64 \
    --controlnet_transformer_out_proj_dim_factor 64 \
    --num_inference_steps 20
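The --start_camera_idx / --end_camera_idx / --stride_min / --stride_max flags plausibly control which poses from $pose_folder are sampled. A hypothetical frame-selection sketch (the actual sampling logic lives in generate_nonmap_api.py and may differ):

```python
import random

def select_pose_indices(start_idx: int, end_idx: int,
                        stride_min: int, stride_max: int,
                        seed: int = 0) -> list[int]:
    """Walk from start to end (inclusive), stepping by a random stride
    drawn from [stride_min, stride_max] at each step."""
    rng = random.Random(seed)
    indices, i = [], start_idx
    while i <= end_idx:
        indices.append(i)
        i += rng.randint(stride_min, stride_max)
    return indices

# With stride_min == stride_max == 2 the stride is deterministic:
print(select_pose_indices(0, 7, 2, 2))  # → [0, 2, 4, 6]
```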
