Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
arXiv | Project Page | Video
- Ubuntu version >= 20.04 (required by Blender 3.x)
- Metashape software (not necessary, we provide recalibrated poses)
- OpenAI API Key
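A quick sanity check of these prerequisites (standard commands; the g++ and CUDA expectations come from the build steps later in this README):
lsb_release -rs     # expect 20.04 or newer
g++ --version       # expect 7.5.0 or newer (needed to build mcnerf below)
nvidia-smi          # confirm the driver supports CUDA 11.7 for the torch build below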
First clone this repo recursively.
git clone https://github.com/yifanlu0227/ChatSim.git --recursive
conda create -n chatsim python=3.8
conda activate chatsim
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
imageio_download_bin freeimage
The installation is the same as that of F2-NeRF. Please go through the following steps.
cd chatsim/background/mcnerf/
# mcnerf uses the same data directory.
ln -s ../../../data .
For Debian based Linux distributions:
sudo apt install zlib1g-dev
For Arch based Linux distributions:
sudo pacman -S zlib
Taking torch-1.13.1+cu117 as an example:
cd chatsim/background/mcnerf
cd External
wget https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-1.13.1%2Bcu117.zip
unzip ./libtorch-cxx11-abi-shared-with-deps-1.13.1+cu117.zip
rm ./libtorch-cxx11-abi-shared-with-deps-1.13.1+cu117.zip
The minimum supported g++ version is 7.5.0.
cd ..
cmake . -B build
cmake --build build --target main --config RelWithDebInfo -j
cd ../inpainting/Inpaint-Anything/
python -m pip install -e segment_anything
Download pretrained_models from here and place the directory in ./ so that you have ./pretrained_models. Additionally, download pretrain and place the directory in ./pytracking so that you have ./pytracking/pretrain.
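For example, assuming the archives were fetched to ~/Downloads (the paths here are illustrative, not part of the upstream instructions):
# illustrative paths; adjust to wherever you saved the downloads
mv ~/Downloads/pretrained_models ./
mv ~/Downloads/pretrain ./pytracking/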
cd ../latent-diffusion
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .
# download pretrained ldm
wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=1
We tested with Blender 3.5.1. Note that Blender 3+ requires Ubuntu version >= 20.04.
cd ../../Blender
wget https://download.blender.org/release/Blender3.5/blender-3.5.1-linux-x64.tar.xz
tar -xvf blender-3.5.1-linux-x64.tar.xz
rm blender-3.5.1-linux-x64.tar.xz
Locate Blender's internal Python, for example blender-3.5.1-linux-x64/3.5/python/bin/python3.10:
export blender_py=$PWD/blender-3.5.1-linux-x64/3.5/python/bin/python3.10
cd utils
# install dependencies (use -i https://pypi.tuna.tsinghua.edu.cn/simple if you are in the Chinese mainland)
$blender_py -m pip install -r requirements.txt
$blender_py -m pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
$blender_py setup.py develop
If you want to train the skydome model, follow the README in chatsim/foreground/mclight/skydome_lighting/readme.md. Alternatively, you can download our provided skydome HDRI in the next section and start the simulation directly.
mkdir data
mkdir data/waymo_tfrecords
mkdir data/waymo_tfrecords/1.4.2
Download the Waymo Perception Dataset v1.4.2 to data/waymo_tfrecords/1.4.2. In the Google Cloud console, the correct folder path is waymo_open_dataset_v_1_4_2/individual_files/training or waymo_open_dataset_v_1_4_2/individual_files/validation. The static scenes we use are listed below; an example download command follows the lists.
Static waymo scenes in training set
- segment-10676267326664322837_311_180_331_180_with_camera_labels
- segment-11379226583756500423_6230_810_6250_810_with_camera_labels
- segment-1172406780360799916_1660_000_1680_000_with_camera_labels
- segment-12879640240483815315_5852_605_5872_605_with_camera_labels
- segment-13085453465864374565_2040_000_2060_000_with_camera_labels
- segment-13142190313715360621_3888_090_3908_090_with_camera_labels
- segment-13196796799137805454_3036_940_3056_940_with_camera_labels
- segment-13238419657658219864_4630_850_4650_850_with_camera_labels
- segment-13469905891836363794_4429_660_4449_660_with_camera_labels
- segment-14004546003548947884_2331_861_2351_861_with_camera_labels
- segment-14333744981238305769_5658_260_5678_260_with_camera_labels
- segment-14348136031422182645_3360_000_3380_000_with_camera_labels
- segment-14424804287031718399_1281_030_1301_030_with_camera_labels
- segment-14869732972903148657_2420_000_2440_000_with_camera_labels
- segment-15221704733958986648_1400_000_1420_000_with_camera_labels
- segment-15270638100874320175_2720_000_2740_000_with_camera_labels
- segment-15349503153813328111_2160_000_2180_000_with_camera_labels
- segment-15365821471737026848_1160_000_1180_000_with_camera_labels
- segment-15868625208244306149_4340_000_4360_000_with_camera_labels
- segment-16345319168590318167_1420_000_1440_000_with_camera_labels
- segment-16470190748368943792_4369_490_4389_490_with_camera_labels
- segment-16608525782988721413_100_000_120_000_with_camera_labels
- segment-16646360389507147817_3320_000_3340_000_with_camera_labels
- segment-17761959194352517553_5448_420_5468_420_with_camera_labels
- segment-3425716115468765803_977_756_997_756_with_camera_labels
- segment-3988957004231180266_5566_500_5586_500_with_camera_labels
- segment-4058410353286511411_3980_000_4000_000_with_camera_labels
- segment-8811210064692949185_3066_770_3086_770_with_camera_labels
- segment-9385013624094020582_2547_650_2567_650_with_camera_labels
Static waymo scenes in validation set
- segment-10061305430875486848_1080_000_1100_000_with_camera_labels
- segment-10247954040621004675_2180_000_2200_000_with_camera_labels
- segment-10275144660749673822_5755_561_5775_561_with_camera_labels
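If your Waymo Open Dataset access has been approved and gsutil is installed, individual segments can also be fetched from the command line. A sketch (the bucket path matches the console path above; the segment name is one entry from the training list):
# example: fetch one training segment (requires approved Waymo access)
gsutil cp \
  gs://waymo_open_dataset_v_1_4_2/individual_files/training/segment-11379226583756500423_6230_810_6250_810_with_camera_labels.tfrecord \
  data/waymo_tfrecords/1.4.2/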
After downloading the tfrecords, you should see a folder structure like:
data
|-- ...
|-- ...
`-- waymo_tfrecords
`-- 1.4.2
|-- segment-10247954040621004675_2180_000_2200_000_with_camera_labels.tfrecord
|-- segment-11379226583756500423_6230_810_6250_810_with_camera_labels.tfrecord
|-- ...
`-- segment-1172406780360799916_1660_000_1680_000_with_camera_labels.tfrecord
We extract the images, camera poses, LiDAR files, etc. from the tfrecord files with data_utils/process_waymo_script.py:
cd data_utils
python process_waymo_script.py --waymo_data_dir=../data/waymo_tfrecords/1.4.2 --nerf_data_dir=../data/waymo_multi_view
This will generate the data folder data/waymo_multi_view.
You can download our recalibration files from here. If you want to do the recalibration yourself, you need to use Metashape to calibrate the images in the data/waymo_multi_view/{SCENE_NAME}/images folder and convert them back to the Waymo world coordinate system; please follow the tutorial in data_utils/README.md. The final camera extrinsics and intrinsics are stored as cams_meta.npy and poses_bounds.npy. After everything is done, you should see:
data
`-- waymo_multi_view
|-- ...
`-- segment-1172406780360799916_1660_000_1680_000_with_camera_labels
|-- 3d_boxes.npy # 3d bounding boxes of the first frame
|-- images # a clip of waymo images used in chatsim (typically 40 frames)
|-- images_all # full waymo images (typically 198 frames)
|-- map.pkl # map data of this scene
|-- point_cloud # point cloud file of the first frame
|-- camera.xml # recalibration file from Metashape
|-- cams_meta.npy # Camera ext&int calibrated by metashape and transformed to waymo coordinate system.
|-- poses_bounds.npy # Camera ext&int calibrated by metashape and transformed to waymo coordinate system (for mcnerf training)
|-- poses_bounds_metashape.npy # Camera ext&int calibrated by metashape (intermediate file, not required)
|-- poses_bounds_waymo.npy # Camera ext&int from original waymo dataset (intermediate file, not required)
|-- shutters # normalized exposure time (mean=0 std=1)
|-- tracking_info.pkl # tracking data
`-- vehi2veh0.npy # transformation matrices from the i-th frame's vehicle coordinate system to the first frame's
If you want smoother and more realistic trajectories, you can install the trajectory tracking module and set the parameter motion_agent-motion_tracking to True in the .yaml file. To install both the code and the pretrained model, run the following commands in the terminal:
pip install frozendict gym==0.26.2 stable-baselines3[extra] protobuf==3.20.1
cd chatsim/foreground
git clone --recursive git@github.com:MARMOTatZJU/drl-based-trajectory-tracking.git -b v1.0.0
cd drl-based-trajectory-tracking
source setup-minimum.sh
Then switch back to the original directory to run main.py. If the parameter motion_agent-motion_tracking is set to True, each trajectory will be tracked by this module to make it smoother and more realistic.
You need to train the McNeRF model for each scene, as well as McLight's skydome estimation network. To get started quickly, you can download our skydome estimation and some Blender 3D assets.
- Skydome HDRI. Download and put them in data/waymo_skydome (a sketch of placing these downloads follows this list).
- Blender assets. Download and put them in data/blender_assets. Our 3D models are collected from the Internet. We did our best to contact the authors of the models and to ensure that copyright issues are properly dealt with (our open-source project is not for profit). If you are the author of a model and our use infringes your copyright, please contact us immediately and we will delete the model.
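Assuming the assets arrive as zip archives (the archive names here are hypothetical; adjust to the actual downloads):
unzip waymo_skydome.zip -d data/     # -> data/waymo_skydome/${SCENE_NAME}/...
unzip blender_assets.zip -d data/    # -> data/blender_assets/...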
cd chatsim/background/mcnerf
Make sure you have the data folder linked to ../../../data, and train your model with:
python scripts/run.py --config-name=wanjinyou_big \
dataset_name=waymo_multi_view case_name=${CASE_NAME} \
exp_name=${EXP_NAME} dataset.shutter_coefficient=0.15 mode=train_hdr_shutter +work_dir=$(pwd)
where ${CASE_NAME} is a scene name such as segment-11379226583756500423_6230_810_6250_810_with_camera_labels and ${EXP_NAME} can be anything, e.g. exp_coeff_0.15. Both dataset.shutter_coefficient=0.15 and dataset.shutter_coefficient=0.3 work well. A fully expanded example follows.
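For example, with the placeholders filled in:
python scripts/run.py --config-name=wanjinyou_big \
dataset_name=waymo_multi_view case_name=segment-11379226583756500423_6230_810_6250_810_with_camera_labels \
exp_name=exp_coeff_0.15 dataset.shutter_coefficient=0.15 mode=train_hdr_shutter +work_dir=$(pwd)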
You can simply run scripts like bash train-1137.sh for training and bash render_novel_view-1137.sh for testing.
export OPENAI_API_KEY=<your api key>
Now you can start the simulation with
python main.py -y ${CONFIG YAML} \
-p ${PROMPT} \
-s ${SIMULATION NAME}
- ${CONFIG YAML} specifies the scene information; the yaml files are stored in the config folder, e.g. config/waymo-1137.yaml.
- ${PROMPT} is your input prompt, which should be wrapped in quotation marks, e.g. 'add a straight driving car in the scene'.
- ${SIMULATION NAME} determines the name of the folder when saving results. Default: demo.
You can try
python main.py -y config/waymo-1137.yaml -p 'add a straight driving car in the scene' -s demo
The rendered results are saved in results/1137_demo_%Y_%m_%d_%H_%M_%S. Intermediate files are saved in results/cache/1137_demo_%Y_%m_%d_%H_%M_%S for debugging and visualization if save_cache is enabled in config/waymo-1137.yaml.
config/waymo-1137.yaml contains a detailed explanation for each entry. We give some extra explanation below. Suppose the yaml is read into config_dict:
- config_dict['scene']['is_wide_angle'] determines the rendering view. If set to True, we expand Waymo's intrinsics (width -> 3 x width) to render wide-angle images. Also note that is_wide_angle = True comes with rendering_mode = 'render_wide_angle_hdr_shutter', while is_wide_angle = False comes with rendering_mode = 'render_hdr_shutter'.
- config_dict['scene']['frames'] is the number of frames to render.
- config_dict['agents']['background_rendering_agent']['nerf_quiet_render'] determines whether the output of mcnerf is printed to the terminal. Set it to False for debugging.
- config_dict['agents']['foreground_rendering_agent']['use_surrounding_lighting'] defines whether we use the surrounding lighting. Currently, use_surrounding_lighting = True only takes effect when exactly one vehicle is added, because an HDRI is a global illumination source in Blender and it is difficult to set a separate HDRI for each car. use_surrounding_lighting = True can also lead to slow rendering, since it calls the NeRF #frame times. We set it to False in each default yaml.
- config_dict['agents']['foreground_rendering_agent']['skydome_hdri_idx'] is the filename (without extension) chosen from data/waymo_skydome/${SCENE_NAME}/. By default it is the skydome HDRI estimated from the first frame ('000'), but you can manually select a better estimation from another frame. To view an HDRI, we recommend VERIV for VS Code and tev for the desktop environment.

An illustrative yaml excerpt tying these entries together is sketched after this list.
Go to chatsim/foreground/mclight/skydome_lighting and follow chatsim/foreground/mclight/skydome_lighting/readme.md for training.
- arxiv paper release
- code and model release
- motion tracking module drl-based-trajectory-tracking (to smooth trajectory)
- multi-round wrapper code
@InProceedings{wei2024editable,
title={Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents},
author={Yuxi Wei and Zi Wang and Yifan Lu and Chenxin Xu and Changxing Liu and Hao Zhao and Siheng Chen and Yanfeng Wang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month={June},
year={2024},
}