
CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer (MMM'24 Oral)


   

📌 Introduction

In this paper, we propose a novel cross-scale transformer (CT) that processes feature representations at different stages without additional computation. Specifically, we introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales. This combined strategy enables our network to capture intra-image context information and enhance inter-image feature relationships. In addition, we present a dual-feature guided aggregation (DFGA) that embeds coarse global semantic information into the finer cost volume construction to further strengthen global and local feature awareness. Meanwhile, we design a feature metric loss (FM Loss) that evaluates the feature bias before and after transformation to reduce the impact of feature mismatch on depth estimation. Extensive experiments on the DTU dataset and the Tanks and Temples benchmark demonstrate that our method achieves state-of-the-art results.
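
For readers who want a concrete picture of the feature metric idea, the snippet below is a minimal, generic sketch of a feature-consistency penalty (mean normalized difference between features before and after the transformer). It is only an illustration under our own assumptions and is not the exact FM Loss formulation from the paper:

from typing import Optional
import torch
import torch.nn.functional as F

def feature_metric_penalty(feat_before: torch.Tensor,
                           feat_after: torch.Tensor,
                           mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    # feat_before / feat_after: [B, C, H, W] feature maps taken before and
    # after the transformer stage; mask: optional [B, 1, H, W] validity mask.
    diff = (F.normalize(feat_before, dim=1) - F.normalize(feat_after, dim=1)).abs()
    if mask is None:
        return diff.mean()
    return (diff * mask).sum() / (mask.sum() * diff.shape[1] + 1e-6)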

🌑 Preparation

✔ Repo & Environment

Our code is tested with Python 3.8, PyTorch 1.9.0, and CUDA 10.2 on Ubuntu 18.04 with an NVIDIA GeForce RTX 2080 Ti.

To use CT-MVSNet, clone this repo:

git clone https://github.com/wscstrive/CT-MVSNet.git
cd CT-MVSNet

Use the following commands to build the conda environment.

conda create -n ctmvsnet python=3.8
conda activate ctmvsnet
pip install -r requirements.txt
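
Optionally, you can sanity-check the environment before training; the expected values below follow the configuration reported above (other versions may also work):

import torch

print("PyTorch:", torch.__version__)        # tested with 1.9.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)  # tested with 10.2
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))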

✔ Datasets

In CT-MVSNet, we mainly use DTU, BlendedMVS, and Tanks and Temples to train and evaluate our models. You can prepare the corresponding data by following the instructions below.

DTU Dataset

For the DTU training set, download the preprocessed DTU training data and Depths_raw (both from the original MVSNet), and unzip them to construct a dataset folder like this:

dtu_training
 ├── Cameras
 ├── Depths
 ├── Depths_raw
 └── Rectified

For the DTU testing set, download the preprocessed DTU testing data (from the original MVSNet) and unzip it as the test data folder, which should contain a cams folder, an images folder, and a pair.txt file.
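
The pair.txt file follows the original MVSNet convention: the first line gives the number of views, and each view then contributes two lines (its id, followed by "num_src src_id score src_id score ..."). The sketch below is one way to read it; the example path is hypothetical:

def read_pair_file(path):
    # Returns a list of (ref_view_id, [src_view_ids]) tuples from an
    # MVSNet-style pair.txt.
    pairs = []
    with open(path) as f:
        num_views = int(f.readline().strip())
        for _ in range(num_views):
            ref_view = int(f.readline().strip())
            tokens = f.readline().split()
            num_src = int(tokens[0])
            # tokens alternate: src_id, score, src_id, score, ...
            src_views = [int(tokens[1 + 2 * i]) for i in range(num_src)]
            pairs.append((ref_view, src_views))
    return pairs

# e.g. pairs = read_pair_file("dtu_testing/scan1/pair.txt")  # hypothetical path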

BlendedMVS Dataset

We use the low-res set of the BlendedMVS dataset for both training and testing. You can download the low-res set from the original BlendedMVS release and unzip it to form a dataset folder like the one below:

BlendedMVS
 ├── 5a0271884e62597cdee0d0eb
 │     ├── blended_images
 │     ├── cams
 │     └── rendered_depth_maps
 ├── 59338e76772c3e6384afbb15
 ├── 59f363a8b45be22330016cad
 ├── ...
 ├── all_list.txt
 ├── training_list.txt
 └── validation_list.txt
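
As a quick, optional sanity check that the unpacked low-res set matches the layout above, you can run something like the following (the root path is just an example):

import os

root = "BlendedMVS"  # adjust to where you unpacked the low-res set
with open(os.path.join(root, "training_list.txt")) as f:
    scenes = [line.strip() for line in f if line.strip()]

for scene in scenes:
    for sub in ("blended_images", "cams", "rendered_depth_maps"):
        folder = os.path.join(root, scene, sub)
        if not os.path.isdir(folder):
            print("missing:", folder)
print("checked", len(scenes), "scenes")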

Tanks and Temples Dataset

Download our preprocessed Tanks and Temples dataset and unzip it to form a dataset folder like the one below:

tankandtemples
 ├── advanced
 │  ├── Auditorium
 │  ├── Ballroom
 │  ├── ...
 │  └── Temple
 └── intermediate
        ├── Family
        ├── Francis
        ├── ...
        └── Train

🌒 Training

✔ Training on DTU

Set the configuration in scripts/train.sh:

  • Set MVS_TRAINING to the path of the DTU training set.
  • Set LOG_DIR to the directory where checkpoints are saved.
  • Change NGPUS to suit your device.
  • We use torch.distributed.launch by default.

To train your own model, just run:

bash scripts/train.sh

You can conveniently modify more hyper-parameters in scripts/train.sh according to the argparser in train.py, such as summary_freq and save_freq.
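
Since scripts/train.sh starts train.py through torch.distributed.launch, the launcher spawns one process per GPU and passes --local_rank to each. The snippet below is a minimal sketch of that standard initialization pattern in PyTorch 1.9; the actual train.py may organize this differently:

import argparse
import torch
import torch.distributed as dist

# Run via: python -m torch.distributed.launch --nproc_per_node=$NGPUS <script>.py
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)
args, _ = parser.parse_known_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl", init_method="env://")

# model = torch.nn.parallel.DistributedDataParallel(model.cuda(),
#                                                   device_ids=[args.local_rank])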

✔ Finetune on BlendedMVS

For a fair comparison with other SOTA methods on the Tanks and Temples benchmark, we finetune our model on the BlendedMVS dataset after training on DTU.

Set the configuration in scripts/train_bld_fintune.sh:

  • Set MVS_TRAINING to the path of the BlendedMVS dataset.
  • Set LOG_DIR to the directory where the checkpoints and training log are saved.
  • Set CKPT to the path of the .ckpt checkpoint trained on DTU (a loading sketch follows at the end of this subsection).

To finetune your own model, just run:

bash scripts/train_bld_fintune.sh
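
For reference, loading a DTU-trained checkpoint before finetuning usually amounts to something like the sketch below; the checkpoint path and the dictionary keys are assumptions and may differ from what train.py actually uses:

import torch

ckpt_path = "checkpoints/dtu/model_000015.ckpt"  # hypothetical path; set via CKPT in the script
state = torch.load(ckpt_path, map_location="cpu")
# Many MVS training scripts save {"model": state_dict, ...}; fall back to a raw
# state_dict if that key is absent.
state_dict = state["model"] if isinstance(state, dict) and "model" in state else state
# model.load_state_dict(state_dict)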

🌓 Testing

✔ Testing on DTU

Important Tips: to reproduce our reported results, you need to:

  • compile and install the modified gipuma from Yao Yao as introduced below
  • use the latest code as we have fixed tiny bugs and updated the fusion parameters
  • make sure you install the right versions of Python and PyTorch; some older versions throw warnings about the default behavior of align_corners in several functions, which can affect the final results
  • be aware that we only test the code on a 2080 Ti with Ubuntu 18.04; other devices and systems might give slightly different results
  • make sure that you use the correct *.ckpt for testing

To start testing, set the configuration in scripts/test_dtu.sh:

  • Set TESTPATH as the path of the DTU testing set.
  • Set TESTLIST as the path of test list (.txt file).
  • Set CKPT_FILE as the path of the model weights.
  • Set OUTDIR as the path to save results.

Run:

bash scripts/test_dtu.sh

To install gipuma, clone the modified version from Yao Yao. Modify line 10 in CMakeLists.txt to suit your GPU; otherwise you may get warnings when compiling it, which can lead to a failed fusion and a fused point cloud with 0 points. For example, if you use a 2080 Ti GPU, modify line 10 to:

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-O3 --use_fast_math --ptxas-options=-v -std=c++11 --compiler-options -Wall -gencode arch=compute_70,code=sm_70)

If you use another kind of GPU, please modify the arch code to suit your device (arch=compute_XX,code=sm_XX). Then install it with cmake . and make, which will generate the executable at FUSIBILE_EXE_PATH.
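
If you are unsure which arch code your GPU needs, PyTorch can report its compute capability; this small helper just prints the matching flags:

import torch

# Prints the arch/code pair for the first visible GPU, e.g. an RTX 2080 Ti
# reports (7, 5); note that binaries built for sm_70 also run on sm_75 hardware.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"arch=compute_{major}{minor},code=sm_{major}{minor}")
else:
    print("No CUDA device visible to PyTorch.")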

For quantitative evaluation on the DTU dataset, download SampleSet and Points. Unzip them and place the Points folder in SampleSet/MVS Data/. The structure looks like:

SampleSet
 └── MVS Data
       └── Points

In DTU-MATLAB/BaseEvalMain_web.m, set dataPath to the path of SampleSet/MVS Data/, plyPath to the directory that stores the reconstructed point clouds, and resultsPath to the directory where the evaluation results are stored. Then run DTU-MATLAB/BaseEvalMain_web.m in MATLAB.

DTU Dataset   Acc. ↓   Comp. ↓   Overall ↓
CT-MVSNet     0.341    0.264     0.302

✔ Testing on Tanks and Temples

We recommend using the finetuned model (*.ckpt) to test on the Tanks and Temples benchmark.

Similarly, set the configuration in scripts/test_tnt.sh:

  • Set TESTPATH as the path of the intermediate set or the advanced set.
  • Set TESTLIST as the path of test list (.txt file).
  • Set CKPT_FILE as the path of the model weights.
  • Set OUTDIR as the path to save results.

To generate point cloud results, just run:

bash scripts/test_tnt.sh

Note that:

  • The parameters of point cloud fusion have not been studied thoroughly; performance can be better if more appropriate thresholds are cherry-picked for each scene (see the sketch after these notes for the kind of thresholds involved).
  • The dynamic fusion code is borrowed from AA-RMVSNet.
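
To give a sense of what those fusion thresholds control, below is a generic sketch of the geometric consistency test used in MVSNet-style fusion: a pixel is kept only when the reprojection error and the relative depth difference are small. The function and threshold values are illustrative, not the ones in the released fusion code:

import numpy as np

def consistent_mask(depth_ref, depth_reprojected, x_reprojected, y_reprojected,
                    pixel_thresh=1.0, depth_thresh=0.01):
    # depth_ref: [H, W] reference-view depth map; the other arrays hold the
    # source depth and pixel coordinates reprojected into the reference view.
    H, W = depth_ref.shape
    xx, yy = np.meshgrid(np.arange(W), np.arange(H))
    pixel_err = np.sqrt((x_reprojected - xx) ** 2 + (y_reprojected - yy) ** 2)
    depth_err = np.abs(depth_reprojected - depth_ref) / np.maximum(depth_ref, 1e-6)
    return (pixel_err < pixel_thresh) & (depth_err < depth_thresh)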

For quantitative evaluation, you can upload your point clouds to Tanks and Temples benchmark.

T&T (Intermediate)   Mean ↑   Family   Francis   Horse   Lighthouse   M60     Panther   Playground   Train
CT-MVSNet            64.28    81.20    65.09     56.95   62.60        63.07   64.83     61.82        58.68

T&T (Advanced)   Mean ↑   Auditorium   Ballroom   Courtroom   Museum   Palace   Temple
CT-MVSNet        38.03    28.37        44.61      34.83       46.51    34.69    39.15

🔗 Citation

@inproceedings{wang2024ct,
  title={CT-MVSNet: Efficient Multi-view Stereo with Cross-Scale Transformer},
  author={Wang, Sicheng and Jiang, Hao and Xiang, Lei},
  booktitle={International Conference on Multimedia Modeling},
  pages={394--408},
  year={2024},
  organization={Springer}
}

💌 Acknowledgments

We borrow some code from CasMVSNet and TransMVSNet. We thank the authors for releasing their source code.
