Skip to content
Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency

This repository contains the official implementation of Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency presented as an oral at the 3D Scene Understanding for Vision, Graphics, and Robotics workshop, CVPR 2019.

Project Page | Slides | Poster

teaser Our model consumes a collection of calibrated images of a scene from multiple views and produces depth maps for every such view. We show that this depth prediction model can be trained in an unsupervised manner using our robust photo consistency loss. The predicted depth maps are then fused together into a consistent 3D reconstruction which closely resembles and often improves upon the sensor scanned model. Left to Right: Input images, predicted depth maps, our fused 3D reconstruction, ground truth 3D scan.


We present a learning based approach for multi-view stereopsis (MVS). While current deep MVS methods achieve impressive results, they crucially rely on ground-truth 3D training data, and acquisition of such precise 3D geometry for supervision is a major hurdle. Our framework instead leverages photometric consistency between multiple views as supervisory signal for learning depth prediction in a wide baseline MVS setup. However, naively applying photo consistency constraints is undesirable due to occlusion and lighting changes across views. To overcome this, we propose a robust loss formulation that: a) enforces first order consistency and b) for each point, selectively enforces consistency with some views, thus implicitly handling occlusions. We demonstrate our ability to learn MVS without 3D supervision using a real dataset, and show that each component of our proposed robust loss results in a significant improvement. We qualitatively observe that our reconstructions are often more complete than the acquired ground truth, further showing the merits of this approach. Lastly, our learned model generalizes to novel settings, and our approach allows adaptation of existing CNNs to datasets without ground-truth 3D by unsupervised finetuning.



For a set of images of a scene, a given point in a source image may not be visible across all other views.



The predicted depth map from the network, along with the reference image are used to warp and calculate a loss map for each of M non-reference neighboring views. These M loss maps are then concatenated into a volume of dimension H × W × M, where H and W are the image dimensions. This volume is used to perform a pixel-wise selection to pick the K “best” (lowest loss) values, along the 3rd dimension of the volume (i.e. over the M loss maps), using which we take the mean to compute our robust photometric loss.



Pre-trained model weights and outputs (depth maps, point clouds, 3D ply files) for the DTU dataset can be downloaded from Google Drive.



To check out the source code:

git clone

Install CUDA 9.0, CUDNN 7.0 and Python 2.7. Please note that this code has not been tested with other versions and may likely need changes for running with different versions of libraries. This code is also not optimized for multi-GPU execution.

Recommended conda environment:

conda create -n mvs python=2.7 pip
conda activate mvs
pip install -r requirements.txt

Parts of the code for this project are borrowed and modified from the excellent MVSNet repository. Please follow instructions detailed there for downloading and processing data, and for installing fusibile which is used for fusion of the depth maps into a point cloud.


  • Download the preprocessed DTU training data.
  • Create directories for saving logs, model checkpoints and intermediate outputs.
  • Enter the code/unsup_mvsnet/ folder.
  • Train the model on DTU by using appropriate paths as defined by the flags.
python --dtu_data_root <YOUR_PATH> --log_dir <YOUR_PATH> --save_dir <YOUR_PATH> --save_op_dir <YOUR_PATH>


  • Specify dtu_data_root to be the folder where you downloaded the training data. If the data was downloaded to MVS folder, then the path here will be MVS/mvs_training/dtu/.
  • Specify log_dir, save_dir and save_op_dir to the corresponding directories that you created above for saving logs, model checkpoints and intermediate outputs.
  • For training the model with higher depth resolution, note that you have to change both max_d and interval_scale values inversely i.e. 2 * max_d --> 0.5 * interval_scale


  • Download the preprocessed DTU testing data.
  • Enter the code/unsup_mvsnet/ folder.
  • In order to generate the depth predictions from the model, run the test script with appropriate paths.
python --dense_folder <YOUR_PATH> --output_folder <YOUR_PATH> --model_dir <YOUR_PATH> --ckpt_step 45000


  • Specify dense_folder to be the folder where you downloaded the testing data. If the data was downloaded to MVS folder, then the path here will be MVS/mvs_testing/dtu/.
  • Specify output_folder to be the folder where you would like to store the generated depth map outputs.
  • Specify model_folder to be the folder where the model checkpoints are stored.
  • You can specify ckpt_step for the checkpoint step at which you want to test the model. Default is 45000.
  • Next, install fusibile as follows:
    • Check out the modified version fusibile with git clone
    • Enter the clone repository folder.
    • Install fusibile by cmake . and make, which will generate the executable at FUSIBILE_EXE_PATH which is same as the clone repository.
    • Next, we have to perform post processing in the form of depth fusion to create the final 3D point cloud. In the same folder, run the depth fusionn script as shown below.
python --model_folder <YOUR_PATH> --image_folder <YOUR_PATH> --cam_folder <YOUR_PATH> --fusibile_exe_path <YOUR_PATH>


  • Specify model_folder to be the folder where the model checkpoints are stored.
  • Specify image_folder and cam_folder to be the same path as that for dense_folder above.
  • fusibile_exe_path is the FUSIBILE_EXE_PATH.
  • The final point cloud is stored in MVS/mvs_testing/dtu/points_mvsnet/consistencyCheck-TIME/final3d_model.ply.
  • The final ply files can be visualized in an application such as MeshLab.


If you find this work useful, please cite the paper.

  title={Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency},
  author={Khot, Tejas and Agrawal, Shubham and Tulsiani, Shubham and Mertz, Christoph and Lucey, Simon and Hebert, Martial},
  journal={arXiv preprint arXiv:1905.02706},

Please consider also citing MVSNet if you found the code useful.


Code for paper: Learning Unsupervised Multi-View Stereopsis via Robust Photometric Consistency




No releases published


No packages published