Our code is available at TLlib examples for cross-domain object detection
Our code is based on the latest release of Detectron2 (v0.6); please install it before use.
The following is an example based on PyTorch 1.9.0 with CUDA 11.1. For other versions, please refer to the official websites of PyTorch and Detectron2.
# create environment
conda create -n detection python=3.8.3
# activate environment
conda activate detection
# install pytorch
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# install detectron
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.9/index.html
# install other requirements
pip install -r requirements.txt
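After installation, it can help to confirm that the installed packages match the versions used in the commands above. The following is a minimal sketch, not part of the repository: the `EXPECTED` table and the `check_versions` helper are hypothetical names, and the version strings are simply copied from the install commands.

```python
from importlib import metadata

# Versions taken from the install commands above; adjust them if you
# installed a different PyTorch/CUDA combination.
EXPECTED = {"torch": "1.9.0", "torchvision": "0.10.0", "detectron2": "0.6"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local version tags such as "+cu111" are ignored in the comparison.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        print(f"{name}: installed={installed} expected={EXPECTED[name]} ok={ok}")
```

A package reported as `installed=None` was not found in the active environment, which usually means the `detection` conda environment is not activated.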
The following datasets can be downloaded automatically:
- PASCAL_VOC 07+12
- Clipart
- WaterColor
- Comic
You need to prepare the following datasets manually if you want to use them:
- Download the Cityscapes and Foggy Cityscapes datasets from the link. In particular, we use leftImg8bit_trainvaltest.zip for Cityscapes and leftImg8bit_trainvaltest_foggy.zip for Foggy Cityscapes.
- Unzip them under a directory like
object_detection/datasets/cityscapes
├── gtFine
├── leftImg8bit
├── leftImg8bit_foggy
└── ...
Then run
python prepare_cityscapes_to_voc.py
This automatically generates the datasets in VOC format:
object_detection/datasets/cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
object_detection/datasets/foggy_cityscapes_in_voc
├── Annotations
├── ImageSets
└── JPEGImages
- Download the Sim10k dataset from the following link: Sim10k. In particular, we use repro_10k_images.tgz, repro_image_sets.tgz and repro_10k_annotations.tgz for Sim10k.
- Extract the training set from repro_10k_images.tgz, repro_image_sets.tgz and repro_10k_annotations.tgz, then rename the directory VOC2012/ to sim10k/.
After preparation, the following directories should exist:
object_detection/datasets/
├── VOC2007
│ ├── Annotations
│ ├── ImageSets
│ └── JPEGImages
├── VOC2012
│ ├── Annotations
│ ├── ImageSets
│ └── JPEGImages
├── clipart
│ ├── Annotations
│ ├── ImageSets
│ └── JPEGImages
├── watercolor
│ ├── Annotations
│ ├── ImageSets
│ └── JPEGImages
├── comic
│ ├── Annotations
│ ├── ImageSets
│ └── JPEGImages
├── cityscapes_in_voc
│ ├── Annotations
│ ├── ImageSets
│ └── JPEGImages
├── foggy_cityscapes_in_voc
│ ├── Annotations
│ ├── ImageSets
│ └── JPEGImages
└── sim10k
  ├── Annotations
  ├── ImageSets
  └── JPEGImages
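The layout above can be checked before training. The following is a minimal sketch, assuming the dataset root is the relative `datasets` directory used by the training commands; `missing_dirs`, `DATASETS` and `VOC_SUBDIRS` are hypothetical names introduced for illustration, not part of the repository.

```python
from pathlib import Path

# Sub-directories every dataset in the tree above is expected to contain.
VOC_SUBDIRS = ("Annotations", "ImageSets", "JPEGImages")
# Dataset directory names copied from the tree above.
DATASETS = ("VOC2007", "VOC2012", "clipart", "watercolor", "comic",
            "cityscapes_in_voc", "foggy_cityscapes_in_voc", "sim10k")


def missing_dirs(root, datasets=DATASETS):
    """List the expected "<dataset>/<subdir>" entries that do not exist under root."""
    root = Path(root)
    return [
        f"{name}/{sub}"
        for name in datasets
        for sub in VOC_SUBDIRS
        if not (root / name / sub).is_dir()
    ]


if __name__ == "__main__":
    # "datasets" is the relative path used by the training commands.
    for path in missing_dirs("datasets"):
        print("missing:", path)
```

Datasets you do not plan to use can be left out by passing a shorter tuple, for example `missing_dirs("datasets", ("VOC2007", "VOC2012", "clipart"))`.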
Note: The above is a tutorial for using the standard datasets. To use your own datasets, you need to convert them into the corresponding format.
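As a rough illustration of what a VOC-style annotation looks like, the sketch below builds one XML file with the standard library. It follows the common PASCAL VOC layout (`annotation`, `size`, `object`, `bndbox`); the helper name `voc_annotation` and the sample file name are hypothetical, and whether the data loaders here need further fields is an assumption you should check against an existing file in `Annotations`.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, objects):
    """Build a VOC-style annotation tree.

    objects: iterable of (class_name, xmin, ymin, xmax, ymax) in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(int(value))
    return ET.ElementTree(root)


if __name__ == "__main__":
    # Hypothetical image and box, for illustration only.
    tree = voc_annotation("my_image_0001.jpg", 1024, 768,
                          [("car", 48, 240, 195, 371)])
    # Would normally be written to <dataset>/Annotations/my_image_0001.xml
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

In the usual VOC layout, the image itself goes under `JPEGImages`, and the image ids of each split are listed one per line in a text file under `ImageSets/Main`.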
Supported methods include:
- Cycle-Consistent Adversarial Networks (CycleGAN)
- Decoupled Adaptation for Cross-Domain Object Detection (D-adapt)
The shell files provide the scripts to reproduce the benchmarks with the specified hyper-parameters. The basic training pipeline is as follows.
The following command trains a Faster-RCNN detector on task VOC->Clipart, with only source (VOC) data.
CUDA_VISIBLE_DEVICES=0 python source_only.py \
--config-file config/faster_rcnn_R_101_C4_voc.yaml \
-s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 -t Clipart datasets/clipart \
--test VOC2007Test datasets/VOC2007 Clipart datasets/clipart --finetune \
OUTPUT_DIR logs/source_only/faster_rcnn_R_101_C4/voc2clipart
Explanation of some arguments
- --config-file: path to the config file that specifies the training hyper-parameters.
- -s: a list that specifies the source datasets. For each dataset, pass in a (name, path) pair; in the above command, there are two source datasets, VOC2007 and VOC2012.
- -t: a list that specifies the target datasets, same format as above.
- --test: a list that specifies the test datasets, same format as above.
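To make the (name, path) convention concrete, the sketch below shows one way a flat argument list can be grouped into pairs. It is an illustration only and not necessarily how source_only.py parses its arguments: the parser, the long option names `--sources` and `--targets`, and the `to_pairs` helper are hypothetical.

```python
import argparse


def to_pairs(flat):
    """Group ['n1', 'p1', 'n2', 'p2'] into [('n1', 'p1'), ('n2', 'p2')]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of values: NAME PATH [NAME PATH ...]")
    return list(zip(flat[0::2], flat[1::2]))


def build_parser():
    # Hypothetical parser that mirrors only the dataset flags discussed above.
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--sources", nargs="+", default=[])
    parser.add_argument("-t", "--targets", nargs="+", default=[])
    parser.add_argument("--test", nargs="+", default=[])
    return parser


if __name__ == "__main__":
    # Same dataset arguments as in the training command above.
    args = build_parser().parse_args(
        "-s VOC2007 datasets/VOC2007 VOC2012 datasets/VOC2012 "
        "-t Clipart datasets/clipart".split()
    )
    print(to_pairs(args.sources))
    print(to_pairs(args.targets))
```

With this grouping, an odd number of values (a name without a path) is rejected early instead of silently dropping the last entry.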
Detector | Method | AP | AP50 | AP75 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tvmonitor |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Faster RCNN (ResNet101) | Source | 14.9 | 29.3 | 12.6 | 29.6 | 38.0 | 24.7 | 21.7 | 31.9 | 48.0 | 30.8 | 15.9 | 32.0 | 19.2 | 18.2 | 12.1 | 28.2 | 48.8 | 38.3 | 34.6 | 3.8 | 22.5 | 43.7 | 44.0 |
Faster RCNN (ResNet101) | CycleGAN | 20.0 | 37.7 | 18.3 | 37.1 | 41.9 | 29.9 | 26.5 | 40.9 | 65.1 | 37.8 | 23.8 | 40.7 | 48.9 | 12.7 | 14.4 | 27.8 | 63.0 | 55.1 | 40.1 | 8.0 | 30.7 | 54.1 | 55.7 |
Faster RCNN (ResNet101) | D-adapt | 24.8 | 49.0 | 21.5 | 56.4 | 63.2 | 42.3 | 40.9 | 45.3 | 77.0 | 48.7 | 25.4 | 44.3 | 58.4 | 31.4 | 24.5 | 47.1 | 75.3 | 69.3 | 43.5 | 27.9 | 34.1 | 60.7 | 64.0 |
RetinaNet | Source | 18.3 | 32.2 | 17.6 | 34.2 | 42.4 | 27.0 | 21.6 | 36.8 | 48.4 | 35.9 | 16.4 | 38.9 | 22.6 | 27.0 | 15.1 | 27.1 | 46.7 | 42.1 | 36.2 | 8.3 | 29.5 | 42.1 | 46.2 |
RetinaNet | D-adapt | 25.1 | 46.3 | 23.9 | 47.4 | 65.0 | 33.1 | 37.5 | 56.8 | 61.2 | 55.1 | 27.3 | 45.5 | 51.8 | 29.1 | 29.6 | 38.0 | 74.5 | 66.7 | 46.0 | 24.2 | 29.3 | 54.2 | 53.8 |
Method | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
---|---|---|---|---|---|---|---|---|---|
Faster RCNN (ResNet101) | 23.0 | 45.9 | 18.5 | 71.1 | 48.3 | 48.6 | 23.7 | 23.3 | 60.3 |
CycleGAN | 24.9 | 50.8 | 22.4 | 75.8 | 52.1 | 49.8 | 30.1 | 33.4 | 63.6 |
D-adapt | 28.5 | 57.5 | 23.6 | 77.4 | 54.0 | 52.8 | 43.9 | 48.1 | 68.9 |
Target | 23.8 | 51.3 | 17.4 | 48.5 | 54.7 | 41.3 | 36.2 | 52.6 | 74.6 |
Method | AP | AP50 | AP75 | bicycle | bird | car | cat | dog | person |
---|---|---|---|---|---|---|---|---|---|
Faster RCNN (ResNet101) | 13.0 | 25.5 | 11.4 | 33.0 | 15.8 | 28.9 | 16.8 | 19.6 | 39.0 |
CycleGAN | 16.9 | 34.6 | 14.2 | 28.1 | 25.7 | 37.7 | 28.0 | 33.8 | 54.1 |
D-adapt | 20.8 | 41.1 | 18.5 | 49.4 | 25.7 | 43.3 | 36.9 | 32.7 | 58.5 |
Target | 21.9 | 44.6 | 16.0 | 40.7 | 32.3 | 38.3 | 43.9 | 41.3 | 71.0 |
Detector | Method | AP | AP50 | AP75 | bicycle | bus | car | motorcycle | person | rider | train | truck |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Faster RCNN (VGG16) | Source | 14.3 | 25.9 | 13.2 | 33.6 | 27.0 | 40.0 | 22.3 | 31.3 | 38.5 | 2.3 | 12.2 |
Faster RCNN (VGG16) | CycleGAN | 22.5 | 41.6 | 20.7 | 46.5 | 41.5 | 62.0 | 33.8 | 45.0 | 54.5 | 21.7 | 27.7 |
Faster RCNN (VGG16) | D-adapt | 19.4 | 38.1 | 17.5 | 42.0 | 36.8 | 58.1 | 32.2 | 43.1 | 51.8 | 14.6 | 26.3 |
Faster RCNN (VGG16) | Target | 24.0 | 45.3 | 21.3 | 45.9 | 47.4 | 67.3 | 39.7 | 49.0 | 53.2 | 30.0 | 29.6 |
Faster RCNN (ResNet101) | Source | 18.8 | 33.3 | 19.0 | 36.1 | 34.5 | 43.8 | 24.0 | 36.3 | 39.9 | 29.1 | 22.8 |
Faster RCNN (ResNet101) | CycleGAN | 22.9 | 41.8 | 21.9 | 42.0 | 44.5 | 57.6 | 36.3 | 40.9 | 48.0 | 30.8 | 34.3 |
Faster RCNN (ResNet101) | D-adapt | 22.7 | 42.4 | 21.6 | 41.8 | 44.4 | 56.6 | 31.4 | 41.8 | 48.6 | 42.3 | 32.4 |
Faster RCNN (ResNet101) | Target | 25.5 | 45.3 | 24.3 | 41.9 | 53.2 | 63.4 | 36.1 | 42.6 | 47.9 | 42.4 | 35.3 |
Detector | Method | AP | AP50 | AP75 |
---|---|---|---|---|
Faster RCNN (VGG16) | Source | 24.8 | 43.4 | 23.6 |
Faster RCNN (VGG16) | CycleGAN | 29.3 | 51.9 | 28.6 |
Faster RCNN (VGG16) | D-adapt | 23.6 | 48.5 | 18.7 |
Faster RCNN (VGG16) | Target | 24.8 | 43.4 | 23.6 |
Faster RCNN (ResNet101) | Source | 24.6 | 44.4 | 23.0 |
Faster RCNN (ResNet101) | CycleGAN | 26.5 | 47.4 | 24.0 |
Faster RCNN (ResNet101) | D-adapt | 27.4 | 51.9 | 25.7 |
Faster RCNN (ResNet101) | Target | 24.6 | 44.4 | 23.0 |
We provide code for visualization in visualize.py. For example, suppose you have trained the source-only model for the task VOC->Clipart using the provided scripts. The following command visualizes the predictions of the detector on Clipart.
CUDA_VISIBLE_DEVICES=0 python visualize.py --config-file config/faster_rcnn_R_101_C4_voc.yaml \
--test Clipart datasets/clipart --save-path visualizations/source_only/voc2clipart \
MODEL.WEIGHTS logs/source_only/faster_rcnn_R_101_C4/voc2clipart/model_final.pth
Explanation of some arguments
- --test: a list that specifies the test datasets for visualization.
- --save-path: where to save the visualization results.
- MODEL.WEIGHTS: path to the model.
Support methods: SWDA, Global/Local Alignment
If you use these methods in your research, please consider citing.
@inproceedings{jiang2021decoupled,
title = {Decoupled Adaptation for Cross-Domain Object Detection},
author = {Junguang Jiang and Baixu Chen and Jianmin Wang and Mingsheng Long},
booktitle = {ICLR},
year = {2022}
}
@inproceedings{CycleGAN,
title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
booktitle={ICCV},
year={2017}
}