MFuseNet

This is the official implementation code for MFuseNet. For technical details, please refer to:

MFuseNet: Robust Depth Estimation with Learned Multiscopic Fusion
Weihao Yuan, Rui Fan, Michael Yu Wang, Qifeng Chen
ICRA2020, RA-L
[Paper] [Project Page]

Bibtex

If you find this code useful, please consider citing:

@article{yuan2020mfusenet,
  title={MFuseNet: Robust Depth Estimation With Learned Multiscopic Fusion},
  author={Yuan, Weihao and Fan, Rui and Wang, Michael Yu and Chen, Qifeng},
  journal={IEEE Robotics and Automation Letters},
  volume={5},
  number={2},
  pages={3113--3120},
  year={2020},
  publisher={IEEE}
}

Contents

  1. Environment Setup
  2. Data Preparation
  3. Train
  4. Pretrained Models

Environment Setup

This code has been tested on Ubuntu 16.04 with CUDA 9.0 and two GTX 1080 Ti GPUs.

Dependencies:

  • Python 2.7
  • PyTorch (0.4.0+)
  • torchvision (0.2.0+)
  • os, time, numpy, argparse, cv2, matplotlib, PIL

Data Preparation

The inputs to the network are the cost volumes produced by the cost-computation step of a stereo matching algorithm. They can be computed with block matching, semi-global matching, graph cuts, deep-network-based methods, etc. The default costs are produced by MC-CNN; please refer to MC-CNN for computing the cost volumes.
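
MC-CNN writes cost volumes as raw binary files, so they can be read with numpy. The following is a minimal sketch only, assuming float32 values laid out as disparity x height x width; verify the dtype and axis order against your own volumes:

import numpy as np

def load_cost_volume(path, num_disp, height, width):
    # Read raw float32 values and reshape to (D, H, W).
    # The dtype and axis order are assumptions; adjust to your data.
    costs = np.fromfile(path, dtype=np.float32)
    return costs.reshape(num_disp, height, width)

# Hypothetical usage for one scene (dimensions are placeholders):
# left = load_cost_volume("dataset/TRAIN/scene1/left.bin", 64, 370, 1226)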

The training data for three-view fusion are organized as follows:

dataset/
    TRAIN/
        scene1/
            view0.png
            view1.png
            view2.png
            disp1.png
            left.bin
            right.bin
    TEST/
    EVAL/

The files view0.png, view1.png, and view2.png are the color images of the left, center, and right views. The file disp1.png is the ground-truth disparity map for view1. The files left.bin and right.bin are the cost volumes obtained by MC-CNN for matching the left and right views against the center view.

For five-view fusion, there are an additional view3.png for the bottom view and view4.png for the top view, along with their corresponding cost volumes bottom.bin and top.bin.
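
As an illustration, a three-view training sample could be assembled from this layout as follows. This is a sketch only: load_cost_volume is the helper shown above, and the cost-volume dimensions are assumptions you should set for your own data.

import os
import cv2

def load_sample(scene_dir, num_disp, height, width):
    # Color images: view0 = left, view1 = center, view2 = right.
    views = [cv2.imread(os.path.join(scene_dir, "view%d.png" % i))
             for i in range(3)]
    # Ground-truth disparity for the center view (view1).
    disp = cv2.imread(os.path.join(scene_dir, "disp1.png"), cv2.IMREAD_UNCHANGED)
    # Cost volumes for left-center and right-center matching.
    left = load_cost_volume(os.path.join(scene_dir, "left.bin"),
                            num_disp, height, width)
    right = load_cost_volume(os.path.join(scene_dir, "right.bin"),
                             num_disp, height, width)
    return views, disp, left, right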

Example data are available here.

Train

. train.sh

Pretrained Models

Five-view, four-cost fusion

Model_5view

Three-view, two-cost fusion

Model_3view
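
To restore one of these checkpoints with PyTorch, something like the sketch below should work. The checkpoint file name and whether the weights live under a "state_dict" key are assumptions that depend on how the model was saved, and the network class comes from this repo's own code:

import torch

# Hypothetical import; substitute the actual model module from this repo.
# from models import MFuseNet

checkpoint = torch.load("Model_3view.tar", map_location="cpu")
# Checkpoints saved with extra metadata keep the weights under "state_dict".
state = checkpoint["state_dict"] if "state_dict" in checkpoint else checkpoint

# model = MFuseNet()
# model.load_state_dict(state)
# model.eval()  # switch to inference mode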

Results on Middlebury 2006:

Model          AvgErr   RMS     Bad 0.5   Bad 1    Bad 2
Model_3view    0.250    1.036   4.08%     1.83%    1.15%

License

Licensed under the MIT License.
