SwiftDepth: An Efficient Hybrid CNN-Transformer Model for Self-Supervised Monocular Depth Estimation on Mobile Devices

xapaxca/swiftdepth

SwiftDepth

2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)

Results on KITTI dataset

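| Model | AbsRel | SqRel | RMSE | RMSElog | δ < 1.25 | δ < 1.25^2 | δ < 1.25^3 | MParam | GMACs |
|---|---|---|---|---|---|---|---|---|---|
| SwiftDepth-small | 0.110 | 0.830 | 4.700 | 0.187 | 0.882 | 0.962 | 0.982 | 3.6 | 3.6 |
| SwiftDepth | 0.107 | 0.790 | 4.643 | 0.182 | 0.888 | 0.963 | 0.983 | 6.4 | 4.9 |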

Model setup: single frame, monocular training, ResNet18-based pose network, no additional supervision, no post-processing, image resolution of 192x640, and pre-training solely on ImageNet.
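For readers unfamiliar with the column names in the results table, the sketch below shows the standard definitions of these depth metrics as commonly used on KITTI. It is written in plain Python to stay dependency-free and is not taken from this repository; the function name `compute_errors` and the dictionary keys are illustrative assumptions.

```python
import math

def compute_errors(gt, pred):
    """Compute standard depth metrics for paired ground-truth and predicted depths.

    gt and pred are equal-length sequences of positive depths (e.g. in metres).
    Hypothetical helper for illustration; not the repository's own code.
    """
    n = len(gt)
    pairs = list(zip(gt, pred))

    # Threshold accuracy: fraction of pixels whose ratio max(gt/pred, pred/gt)
    # falls below 1.25, 1.25^2, and 1.25^3 (higher is better).
    ratios = [max(g / p, p / g) for g, p in pairs]
    a1 = sum(r < 1.25 for r in ratios) / n
    a2 = sum(r < 1.25 ** 2 for r in ratios) / n
    a3 = sum(r < 1.25 ** 3 for r in ratios) / n

    # Error metrics (lower is better).
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )

    return {
        "abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
        "rmse_log": rmse_log, "a1": a1, "a2": a2, "a3": a3,
    }

# Toy example with two pixels: one exact, one over-estimated by 5 m.
errors = compute_errors([10.0, 20.0], [10.0, 25.0])
```

In practice these quantities are averaged over all valid pixels of every test image, usually with array operations rather than Python loops.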

Preparation

The main dependencies are listed in requirements.txt.

Refer to Monodepth2 for KITTI dataset preparation.

Training

SwiftDepth

python train.py --data_path KITTI_DATA_PATH --log_dir LOG_DIR_PATH --model_name SwiftDepth_S_run --split eigen_zhou --num_workers 8 --eval_mono --pose_model_type separate_resnet --learning_rate 1e-4 --num_epochs 20 --scheduler_step_size 17 --config configs.swiftformer_S_pretrained --batch_size 16

SwiftDepth-small

python train.py --data_path KITTI_DATA_PATH --log_dir LOG_DIR_PATH --model_name SwiftDepth_XS_run --split eigen_zhou --num_workers 8 --eval_mono --pose_model_type separate_resnet --learning_rate 1e-4 --num_epochs 20 --scheduler_step_size 17 --config configs.swiftformer_XS_pretrained --batch_size 16
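The `--learning_rate`, `--num_epochs`, and `--scheduler_step_size` flags above suggest a step learning-rate schedule. The sketch below shows what such a schedule would look like, assuming the decay factor is 0.1 as in Monodepth2's default setup; that factor and the helper name `step_lr` are assumptions, not confirmed by this README.

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Learning rate at a given epoch under a step schedule.

    gamma=0.1 is an assumed decay factor (the Monodepth2 default),
    not a value stated in this repository's README.
    """
    return base_lr * gamma ** (epoch // step_size)

# Values taken from the training commands: 1e-4 base rate, 20 epochs, step size 17.
schedule = [step_lr(1e-4, epoch, 17) for epoch in range(20)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.1e}")
```

Under these assumptions the rate stays at 1e-4 for epochs 0 to 16 and drops once, for the last three epochs.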

Evaluation

SwiftDepth

python evaluate_depth.py --config configs.swiftformer_S_pretrained --load_weights_folder .\weights\swiftdepth --eval_mono --data_path KITTI_DATA_PATH --eval_split eigen --batch_size 10

SwiftDepth-small

python evaluate_depth.py --config configs.swiftformer_XS_pretrained --load_weights_folder .\weights\swiftdepth-small --eval_mono --data_path KITTI_DATA_PATH --eval_split eigen --batch_size 10
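Monocular self-supervised models predict depth only up to an unknown scale, so evaluation scripts in the Monodepth2 family typically rescale each prediction by the ratio of medians before computing errors. The sketch below illustrates that step, assuming `--eval_mono` behaves here as it does in Monodepth2; the depth limits and the name `median_scale` are assumptions for illustration.

```python
import statistics

MIN_DEPTH = 1e-3   # assumed lower clamp, as commonly used on KITTI
MAX_DEPTH = 80.0   # assumed upper clamp in metres, as commonly used on KITTI

def median_scale(pred, gt):
    """Align a scale-ambiguous prediction to ground truth using the ratio of medians.

    Returns the rescaled, clamped predictions and the scale ratio.
    Hypothetical helper; not the repository's own code.
    """
    ratio = statistics.median(gt) / statistics.median(pred)
    scaled = [min(max(p * ratio, MIN_DEPTH), MAX_DEPTH) for p in pred]
    return scaled, ratio

# Toy example: the prediction is correct up to a factor of 10.
scaled, ratio = median_scale([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

The rescaled values are then compared against ground truth with the metrics reported in the results table.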

Acknowledgement

The code is adapted primarily from Monodepth2.

It is also adapted from VTDepth, EPCDepth, and Lite-Mono.

Special thanks to the SwiftFormer authors for making publicly available the code and weights of their efficient backbone, which SwiftDepth uses as its encoder.
