A simple, loop-less PyTorch implementation of the YOLO loss and non-maximum
suppression (NMS), based on `torchvision.ops`.
This implementation mostly follows the darknet implementation, with two exceptions:
- The MSE loss on the bbox center (`x`, `y`) is replaced with binary cross entropy.
- The parameter `subdivision` is replaced with `accumulation`. The effective batch size in this implementation is `batch_size` x `accumulation`.
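To make the first difference concrete, here is a minimal pure-Python sketch that compares the two losses on a single center offset. It assumes the usual YOLOv3 parameterization, where the raw network output is passed through a sigmoid and compared against a target offset in (0, 1) inside the grid cell. The function names are hypothetical and are not taken from this repository, which computes its loss on PyTorch tensors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_xy(logit, target):
    # Squared error on the sigmoid output (the darknet-style loss for x, y).
    p = sigmoid(logit)
    return (p - target) ** 2

def bce_xy(logit, target, eps=1e-12):
    # Binary cross entropy with a soft target in (0, 1).
    p = sigmoid(logit)
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

# Illustrative values only: one target offset, three raw predictions.
target = 0.3
for logit in (-4.0, 0.0, 4.0):
    print(f"logit={logit:+.1f}  "
          f"mse={mse_xy(logit, target):.4f}  "
          f"bce={bce_xy(logit, target):.4f}")
```

Both losses are smallest when the sigmoid output equals the target. With a soft target the BCE minimum is the entropy of the target rather than zero, and its gradient with respect to the raw output is `p - target`, so it does not shrink when the sigmoid saturates.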
Install all required packages:
$ pip install -r requirements.txt
- Download the pretrained backbone and the weights of 3 full models.
$ cd weights
$ sh download_weights.sh
- Download MS COCO 2014. The script downloads the dataset and rearranges it into the trainval splits used by darknet.
$ cd data
$ sh download_coco.sh
A Python script is executed at the end of `download_coco.sh`; it needs `pycocotools` to be installed in advance. If that step fails because the package is missing, nothing serious has happened: you can manually run `python generate_annotations.py` inside the `data` directory to generate the darknet annotations in COCO format.
Use `--model` and `--weights` to set the model architecture and the pretrained weights.
python eval.py --model yolov3 --weights ./weights/yolov3.weights --img_size 416
Model | AP@.5 (darknet) | AP@.5 (ours) | `--img_size` |
---|---|---|---|
yolov3-320 | 51.5 | 51.4 | 320 |
yolov3-416 | 55.3 | 55.3 | 416 |
yolov3-608 | 57.9 | 58.4 | 608 |
yolov3-tiny | 33.1 | 32.8 | 416 |
yolov3-spp | 60.6 | 61.0 | 608 |
NOTE: the darknet weights for `yolov3-tiny` were trained with incorrect prior anchors, which are listed here.
The correct ones are shown beside them for reference.
python demo.py --image ./data/street.jpg --model yolov3 --weights ./weights/yolov3.weights --img_size 416
The result is saved as `demo.png`.
Note that the confidence threshold and the NMS IoU threshold can be changed with the
flags `--conf_threshold` and `--nms_threshold`, respectively.
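As a rough illustration of what the two thresholds control, here is a minimal pure-Python sketch of confidence filtering followed by greedy NMS. It is a loop-based, class-agnostic stand-in written for readability, not the loop-less `torchvision.ops` code used in this repository. The function names, the box format `(x1, y1, x2, y2)`, the `>=` comparison against the confidence threshold, and the sample values are all assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_and_nms(boxes, scores, conf_threshold, nms_threshold):
    """Return indices of kept boxes, highest score first."""
    # Step 1: drop detections below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    # Step 2: visit the rest by descending score and suppress any box
    # whose IoU with an already kept box exceeds the NMS threshold.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep

# Illustrative detections: two heavily overlapping boxes, one separate
# box, and one low-confidence box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.05]
print(filter_and_nms(boxes, scores, conf_threshold=0.1, nms_threshold=0.5))
print(filter_and_nms(boxes, scores, conf_threshold=0.1, nms_threshold=0.7))
```

Raising the confidence threshold removes more low-score detections before NMS, while raising the NMS threshold lets more overlapping boxes survive.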
The default training arguments are the same as the official ones for `yolov3`.
- Single GPU training (defaults to `cuda:0`)
  python train.py \
      --weights ./weights/darknet53.conv.74 \
      --logdir ./logs/yolov3
  The defaults `batch_size=4` and `accumulation=16` are designed for a single GPU with 8G VRAM.
- Multi GPU training
  CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py \
      --weights ./weights/darknet53.conv.74 \
      --logdir ./logs/yolov3 \
      --batch_size 64 \
      --accumulation 1
  The inter-process communication uses port `39846` by default, which can be changed by editing `train.py`.

In my experiments, the full 500k iterations take about 63 hours on a 4xRTX3090 server.
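Both configurations above give the same product `batch_size` x `accumulation` (4 x 16 = 64 and 64 x 1 = 64). The following pure-Python sketch shows why accumulating gradients over several small batches can stand in for one large batch. It uses a toy one-parameter model with a mean squared error loss and assumes each micro-batch gradient is scaled by `1 / accumulation`. Whether `train.py` scales gradients this way is not stated here, and the helper names are hypothetical.

```python
import math

def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, batch_size, accumulation):
    """Average the gradients of `accumulation` micro-batches."""
    assert len(xs) == batch_size * accumulation
    grad = 0.0
    for step in range(accumulation):
        lo = step * batch_size
        hi = lo + batch_size
        # Each micro-batch contributes 1 / accumulation of the update.
        grad += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accumulation
    return grad

batch_size, accumulation = 4, 16
n = batch_size * accumulation  # effective batch size
xs = [0.1 * i for i in range(n)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

full = mse_grad(w, xs, ys)
acc = accumulated_grad(w, xs, ys, batch_size, accumulation)
print(n, math.isclose(full, acc, rel_tol=1e-9))
```

The equivalence holds exactly only for losses averaged over equally sized micro-batches. Layers that depend on batch statistics, such as batch normalization, still see the small batch.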
Model | AP@.5 (darknet) | AP@.5 (ours) | AP@.5:.95 (ours) |
---|---|---|---|
yolov3-320 | 51.5 | 48.5 | 27.5 |
yolov3-416 | 55.3 | 53.0 | 30.6 |
yolov3-608 | 57.9 | 54.8 | 31.7 |