How to train yolov4-csp via darknet? #13

toplinuxsir · 2020-11-18T05:11:24Z

How to train yolov4-csp using darknet?

Should I use yolov4-csp.cfg?
How do I change the classes and filters params? The same way as for yolov4-custom?
Can the yolov4-csp weights be loaded by OpenCV?

Thanks

WongKinYiu · 2020-11-18T05:26:32Z

yes
yes
For OpenCV DNN, use the weights file provided in the darknet model zoo.
The weights file provided here needs some modification.
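
For reference, the usual darknet rule for custom training is that the `filters=` value of the [convolutional] layer directly before each [yolo] layer equals (classes + 5) * (number of mask entries in that [yolo] layer). A minimal sketch of that arithmetic, assuming 3 masks per [yolo] layer; the helper name `yolo_head_filters` is made up for illustration and is not part of darknet:

```python
def yolo_head_filters(classes: int, masks_per_layer: int = 3) -> int:
    """Return filters= for the [convolutional] layer right before a [yolo] layer.

    Each anchor predicts 4 box coordinates + 1 objectness score + one score
    per class, and each [yolo] layer handles `masks_per_layer` anchors.
    """
    if classes < 1 or masks_per_layer < 1:
        raise ValueError("classes and masks_per_layer must be positive")
    return (classes + 5) * masks_per_layer


if __name__ == "__main__":
    for n in (1, 10, 80):
        print(f"classes={n} -> filters={yolo_head_filters(n)}")
```

For 10 classes this gives 45, and for the 80 COCO classes it gives 255.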

toplinuxsir · 2020-11-18T06:47:38Z

@WongKinYiu Thanks!
When training with darknet, which conv file should I use? The same one as for yolov4, the file yolov4.conv.137?

wuzhenxin1989 · 2020-11-18T07:24:43Z

@WongKinYiu Where is yolov4-csp?

kadirbeytorun · 2020-11-27T10:49:06Z

@WongKinYiu Thanks!
When training with darknet, which conv file should I use? The same one as for yolov4, the file yolov4.conv.137?

I don't think you can use yolov4.conv.137 in this case, since there are many differences between these networks. You either need to train from scratch or create your own conv file from the csp weights.

toplinuxsir · 2020-11-29T04:49:34Z

@kadirbeytorun How do I create a conv file from a weights file? Thanks

WongKinYiu · 2020-11-29T05:21:00Z

#4 (comment)

toplinuxsir · 2020-11-30T07:27:27Z

I trained on my own custom dataset via darknet; the mAP is always zero and the avg loss stays between 1000 and 2000.
Is that normal?

 Tensor Cores are disabled until the first 3000 iterations are reached.
 Last accuracy mAP@0.5 = 0.00 %, best = 0.00 % 
 1160: 1306.229736, 1395.514893 avg loss, 0.001000 rate, 5.132274 seconds, 74240 images, 1646.073728 hours left
Loaded: 6.347659 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.414428), count: 378, total_loss = 2930.492432 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.541770), count: 47, total_loss = 48.747066 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.575222), count: 5, total_loss = 1.038805 
 total_bbox = 5755854, rewritten_bbox = 0.230183 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.393561), count: 630, total_loss = 4719.420898 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.520402), count: 54, total_loss = 48.751522 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.539931), count: 3, total_loss = 0.804829 
 total_bbox = 5756541, rewritten_bbox = 0.230173 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.385769), count: 665, total_loss = 4913.453613 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.499050), count: 61, total_loss = 61.827621 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.688603), count: 4, total_loss = 1.349541 
 total_bbox = 5757269, rewritten_bbox = 0.230196 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.396953), count: 378, total_loss = 2812.761475 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.516003), count: 29, total_loss = 26.940224 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.000000), count: 1, total_loss = 0.000001 
 total_bbox = 5757676, rewritten_bbox = 0.230180 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.366531), count: 378, total_loss = 2609.476807 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.522645), count: 29, total_loss = 25.787998 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.000000), count: 1, total_loss = 0.000001 
 total_bbox = 5758083, rewritten_bbox = 0.230163 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.401975), count: 619, total_loss = 4721.038574 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.525033), count: 65, total_loss = 59.644394 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.504015), count: 2, total_loss = 0.201438 
 total_bbox = 5758769, rewritten_bbox = 0.230188 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.373822), count: 756, total_loss = 5677.616699 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.542468), count: 56, total_loss = 61.038700 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.658722), count: 4, total_loss = 0.895291 
 total_bbox = 5759585, rewritten_bbox = 0.230242 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.374134), count: 791, total_loss = 5751.552246 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.528585), count: 54, total_loss = 62.833313 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.682631), count: 6, total_loss = 1.954589 
 total_bbox = 5760436, rewritten_bbox = 0.230208 %
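
To follow the avg loss over many iterations instead of reading the raw output, the progress lines can be pulled out of a saved log. A small sketch, assuming the log was redirected to a text file and that progress lines look like the `1160: 1306.229736, 1395.514893 avg loss, 0.001000 rate, ...` line above; the function name `parse_progress` is hypothetical:

```python
import re
from typing import Iterable, List, Tuple

# Matches darknet progress lines such as:
#  1160: 1306.229736, 1395.514893 avg loss, 0.001000 rate, 5.13 seconds, ...
PROGRESS_RE = re.compile(
    r"^\s*(\d+):\s*([0-9.]+),\s*([0-9.]+) avg loss,\s*([0-9.]+) rate"
)


def parse_progress(lines: Iterable[str]) -> List[Tuple[int, float, float, float]]:
    """Return (iteration, loss, avg_loss, learning_rate) for each progress line."""
    rows = []
    for line in lines:
        match = PROGRESS_RE.match(line)
        if match:
            rows.append((
                int(match.group(1)),
                float(match.group(2)),
                float(match.group(3)),
                float(match.group(4)),
            ))
    return rows


if __name__ == "__main__":
    sample = [
        " Last accuracy mAP@0.5 = 0.00 %, best = 0.00 % ",
        " 1160: 1306.229736, 1395.514893 avg loss, 0.001000 rate, "
        "5.132274 seconds, 74240 images, 1646.073728 hours left",
    ]
    for iteration, loss, avg_loss, rate in parse_progress(sample):
        print(iteration, loss, avg_loss, rate)
```

Lines that are not progress lines (the per-region `v3 (iou loss, ...` lines, `total_bbox`, and so on) are simply skipped.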

kadirbeytorun · 2020-11-30T07:51:14Z

There is some serious problem with your dataset or cfg file. You need to share more information about your dataset, and also share your cfg file here, if you want to get help.

toplinuxsir · 2020-11-30T10:09:25Z

@kadirbeytorun
Training on my dataset with yolov4.cfg works fine.
My yolov4-csp.cfg file is below:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=8
width=640
height=640
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1

mosaic=1

letter_box=1

optimized_memory=1

#23:104x104 54:52x52 85:26x26 104:13x13 for 416



[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=mish

#[convolutional]
#batch_normalize=1
#filters=64
#size=1
#stride=1
#pad=1
#activation=mish

#[route]
#layers = -2

#[convolutional]
#batch_normalize=1
#filters=64
#size=1
#stride=1
#pad=1
#activation=mish

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

#[convolutional]
#batch_normalize=1
#filters=64
#size=1
#stride=1
#pad=1
#activation=mish

#[route]
#layers = -1,-7

#[convolutional]
#batch_normalize=1
#filters=64
#size=1
#stride=1
#pad=1
#activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1,-10

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1,-28

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1,-28

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1,-16

[convolutional]
batch_normalize=1
filters=1024
size=1
stride=1
pad=1
activation=mish

##########################

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

### SPP ###
[maxpool]
stride=1
size=5

[route]
layers=-2

[maxpool]
stride=1
size=9

[route]
layers=-4

[maxpool]
stride=1
size=13

[route]
layers=-1,-3,-5,-6
### End SPP ###

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=mish

[route]
layers = -1, -13

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[upsample]
stride=2

[route]
layers = 79

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1, -3

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=mish

[route]
layers = -1, -6

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[upsample]
stride=2

[route]
layers = 48

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1, -3

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=128
activation=mish

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=128
activation=mish

[route]
layers = -1, -6

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish

##########################

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=mish

[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear


[yolo]
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=10
num=9
jitter=.1
objectness_smooth=0
ignore_thresh = .7
truth_thresh = 1
#random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=4.0
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=20

[route]
layers = -4

[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=256
activation=mish

[route]
layers = -1, -20

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=mish

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=mish

[route]
layers = -1,-6

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=mish

[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear


[yolo]
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=10
num=9
jitter=.1
objectness_smooth=1
ignore_thresh = .7
truth_thresh = 1
#random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=1.0
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=5

[route]
layers = -4

[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=512
activation=mish

[route]
layers = -1, -49

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=mish

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=mish

[route]
layers = -1,-6

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=mish

[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear


[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=10
num=9
jitter=.1
objectness_smooth=1
ignore_thresh = .7
truth_thresh = 1
#random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=0.4
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=2

kadirbeytorun · 2020-11-30T11:00:08Z

I don't see anything weird with the cfg file. Did you perhaps create your conv file wrong?

Maybe you couldn't create it properly, so you cannot do transfer learning, and your network is trying to learn it from scratch

Show me how you created your conv file.

toplinuxsir · 2020-11-30T22:59:08Z

@kadirbeytorun
./darknet partial cfg/yolov4-csp.cfg yolov4-csp.weights yolov4-csp.conv.166 166
Is that right?
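
The last argument of `darknet partial` is the number of leading layers whose weights are kept, so it helps to know which layer index each cfg section gets. A rough sketch that lists them, assuming darknet numbers every section after [net] starting from 0 and that commented-out sections such as `#[route]` do not count; `list_layers` is a hypothetical helper, not a darknet tool:

```python
import re
from typing import List, Tuple

# A section header is a line that starts with "[name]".
# Commented-out headers such as "#[route]" start with "#" and do not match.
SECTION_RE = re.compile(r"^\[(\w+)\]")


def list_layers(cfg_text: str) -> List[Tuple[int, str]]:
    """Return (layer_index, layer_type) for every section after [net]."""
    layers: List[Tuple[int, str]] = []
    for raw in cfg_text.splitlines():
        match = SECTION_RE.match(raw.strip())
        if match and match.group(1) not in ("net", "network"):
            layers.append((len(layers), match.group(1)))
    return layers


if __name__ == "__main__":
    demo_cfg = """[net]
batch=64

[convolutional]
filters=32

#[route]
#layers = -2

[convolutional]
filters=45

[yolo]
classes=10
"""
    for index, kind in list_layers(demo_cfg):
        print(index, kind)
```

Run on the full cfg, the indices of the [yolo] entries can be compared with the `Region 144`, `Region 159` and `Region 174` numbers in the training log, which appear to be layer indices; the cutoff passed to `partial` would then be chosen below the first detection head.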

AlexeyAB · 2020-12-01T00:37:45Z

@toplinuxsir

Did you try to train with new_coords=0 for each [yolo] layer? Does it help?

Or did you try to train with optimized_memory=0 ? Does it help?

toplinuxsir · 2020-12-01T02:24:30Z

@AlexeyAB OK, I will test it and let you know.

toplinuxsir · 2020-12-01T04:21:42Z

@AlexeyAB Can I set new_coords=0 and optimized_memory=0 at the same time for training?
Can the optimized_memory option reduce the training time?

toplinuxsir · 2020-12-01T08:10:42Z

@AlexeyAB Thanks, I changed to new_coords=0 and training on my custom dataset works OK, but the avg loss is still very large.

Training log below:

Tensor Cores are disabled until the first 3000 iterations are reached.
 Last accuracy mAP@0.5 = 98.08 %, best = 98.08 % 
 1661: 1119.606445, 977.863708 avg loss, 0.001000 rate, 5.367847 seconds, 106304 images, 72.456934 hours left
Loaded: 6.305674 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.784817), count: 373, total_loss = 1715.903076 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.868090), count: 37, total_loss = 15.113337 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.863180), count: 1, total_loss = 0.146742 
 total_bbox = 8285078, rewritten_bbox = 0.243281 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.766573), count: 363, total_loss = 1641.005249 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.864607), count: 41, total_loss = 14.837458 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.909243), count: 4, total_loss = 0.652445 
 total_bbox = 8285486, rewritten_bbox = 0.243269 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.755187), count: 334, total_loss = 1625.987427 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.865474), count: 40, total_loss = 15.278171 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.840126), count: 1, total_loss = 0.121653 
 total_bbox = 8285861, rewritten_bbox = 0.243282 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.780519), count: 363, total_loss = 2101.045654 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.884945), count: 32, total_loss = 13.281761 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.880044), count: 4, total_loss = 0.525622 
 total_bbox = 8286260, rewritten_bbox = 0.243270 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.790372), count: 422, total_loss = 2187.463623 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.910127), count: 37, total_loss = 16.859039 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.885419), count: 5, total_loss = 0.736598 
 total_bbox = 8286724, rewritten_bbox = 0.243257 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.711878), count: 610, total_loss = 2663.282715 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.813086), count: 51, total_loss = 22.179031 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.889040), count: 3, total_loss = 0.467306 
 total_bbox = 8287388, rewritten_bbox = 0.243273 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.708827), count: 790, total_loss = 3897.820068 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.832588), count: 84, total_loss = 36.573296 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.832710), count: 4, total_loss = 0.415506 
 total_bbox = 8288266, rewritten_bbox = 0.243296 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 144 Avg (IOU: 0.721851), count: 794, total_loss = 4043.173584 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 159 Avg (IOU: 0.832350), count: 85, total_loss = 35.234844 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 174 Avg (IOU: 0.843946), count: 5, total_loss = 0.594249 
 total_bbox = 8289149, rewritten_bbox = 0.243306 %

AlexeyAB · 2020-12-01T15:12:18Z

@toplinuxsir I did some fixes. Try to download the latest Darknet version.

And use this for each [yolo] layer:

[yolo]
new_coords=1
scale_x_y = 2.0
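
Since the cfg has three [yolo] sections, editing them by hand is easy to get wrong. A minimal sketch that sets the same options in every [yolo] section of a cfg text, assuming plain `key=value` lines; the function names are made up for illustration:

```python
import re
from typing import Dict


def _patch_section(chunk: str, options: Dict[str, str]) -> str:
    """Overwrite or append key=value options inside one cfg section."""
    lines = chunk.splitlines()
    remaining = dict(options)
    for i, line in enumerate(lines):
        if "=" not in line or line.lstrip().startswith("#"):
            continue  # skip the header, blank lines and comments
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            lines[i] = f"{key}={remaining.pop(key)}"
    # Append options the section did not contain, before trailing blank lines.
    end = len(lines)
    while end > 0 and not lines[end - 1].strip():
        end -= 1
    lines[end:end] = [f"{k}={v}" for k, v in remaining.items()]
    return "\n".join(lines) + "\n"


def set_yolo_options(cfg_text: str, options: Dict[str, str]) -> str:
    """Apply `options` to every [yolo] section; other sections are untouched."""
    # Split in front of every line that starts a section header (Python 3.7+).
    chunks = re.split(r"(?m)^(?=\[)", cfg_text)
    return "".join(
        _patch_section(c, options) if c.startswith("[yolo]") else c
        for c in chunks
    )


if __name__ == "__main__":
    demo = "[net]\nbatch=64\n\n[yolo]\nclasses=10\nnew_coords=0\n\n[route]\nlayers=-4\n"
    print(set_yolo_options(demo, {"new_coords": "1", "scale_x_y": "2.0"}))
```

Existing keys are overwritten in place and missing ones are appended at the end of the section, so running it twice gives the same result.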

toplinuxsir · 2020-12-01T23:28:54Z

@AlexeyAB OK, I will test it again and let you know.

toplinuxsir · 2020-12-02T02:18:56Z

@AlexeyAB

1. Which conv file should I use?
The one I extracted with the command

./darknet partial cfg/yolov4-csp.cfg yolov4-csp.weights yolov4-csp.conv.166 166

(file size: 162M), or the one I downloaded from https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4x-mish.conv.166 (file size: 265M)?
2. Which cfg file should I use: yolov4-csp.cfg or yolov4x-mish.cfg?

Thanks

AlexeyAB · 2020-12-02T02:31:30Z

@toplinuxsir

For yolov4-csp.cfg use (140 MB): https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp.conv.142
For yolov4x-mish.cfg use (264 MB): https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4x-mish.conv.166

All these files are in the assets at the bottom: https://github.com/AlexeyAB/darknet/releases/tag/darknet_yolo_v4_pre

Which cfg file should I use: yolov4-csp.cfg or yolov4x-mish.cfg?

You can use either cfg-file.
It depends on the speed and accuracy you want to achieve.

toplinuxsir · 2020-12-02T07:25:32Z

@AlexeyAB
I trained with the command below:

./darknet detector train ./data/obj.data  ./yolov4x-mish.cfg   ./yolov4x-mish.conv.166  -map

The darknet build, cfg file and conv file were all downloaded from GitHub (latest darknet version),
with:

new_coords=1
scale_x_y = 2.0

but the training is not normal: the avg loss is very large and the AP is zero. The log is below:

Tensor Cores are disabled until the first 3000 iterations are reached.
 (next mAP calculation at 1161 iterations) 
 1161: 1480.196045, 1471.214966 avg loss, 0.001000 rate, 7.448454 seconds, 74304 images, 65.356810 hours left

 calculation mAP (mean average precision)...
 Detection layer: 168 - type = 28 
 Detection layer: 185 - type = 28 
 Detection layer: 202 - type = 28 
2580
 detections_count = 0, unique_truth_count = 79482  
class_id = 0, name = xkakou, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 1, name = dkakou, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 2, name = bkakou, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 3, name = flamp, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 4, name = blamp, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 5, name = diankuai1, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 6, name = diankuai2, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 7, name = xianshu, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 8, name = mic, ap = 0.00%   	 (TP = 0, FP = 0) 
class_id = 9, name = xinyinmian, ap = 0.00%   	 (TP = 0, FP = 0) 

 for conf_thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan 
 for conf_thresh = 0.25, TP = 0, FP = 0, FN = 79482, average IoU = 0.00 % 

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall 
 mean average precision (mAP@0.50) = 0.000000, or 0.00 % 
Total Detection Time: 259 Seconds

Set -points flag:
 `-points 101` for MS COCO 
 `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data) 
 `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset

 mean_average_precision (mAP@0.5) = 0.000000 
Loaded: 0.000077 seconds
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.391380), count: 430, total_loss = 3136.637695 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.607068), count: 40, total_loss = 39.292522 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.700347), count: 2, total_loss = 0.296184 
 total_bbox = 5829818, rewritten_bbox = 0.244433 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.385886), count: 633, total_loss = 4726.900391 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.608623), count: 60, total_loss = 80.453018 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.706707), count: 2, total_loss = 0.717614 
 total_bbox = 5830513, rewritten_bbox = 0.244867 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.389196), count: 794, total_loss = 5844.792480 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.579420), count: 68, total_loss = 88.285965 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.446411), count: 3, total_loss = 0.351362 
 total_bbox = 5831372, rewritten_bbox = 0.244882 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.411063), count: 409, total_loss = 3106.046631 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.592833), count: 41, total_loss = 43.277283 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.622946), count: 3, total_loss = 1.329083 
 total_bbox = 5831825, rewritten_bbox = 0.244863 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.397945), count: 809, total_loss = 5976.632324 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.559342), count: 82, total_loss = 95.763527 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.612882), count: 6, total_loss = 1.274195 
 total_bbox = 5832722, rewritten_bbox = 0.244843 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.353410), count: 478, total_loss = 3151.194580 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.652449), count: 34, total_loss = 49.702888 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.670376), count: 3, total_loss = 0.935958 
 total_bbox = 5833237, rewritten_bbox = 0.244838 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.400875), count: 432, total_loss = 3263.950928 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.559650), count: 50, total_loss = 50.028606 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.607934), count: 4, total_loss = 0.708170 
 total_bbox = 5833723, rewritten_bbox = 0.244818 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.376857), count: 816, total_loss = 5678.036133 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.543922), count: 70, total_loss = 66.156876 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.659107), count: 4, total_loss = 1.028102 
 total_bbox = 5834613, rewritten_bbox = 0.244849 %
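
A note on the `-nan` values in the mAP report: with `detections_count = 0` both TP and FP are zero, so precision = TP / (TP + FP) is 0/0, while recall = TP / (TP + FN) is a well-defined 0. The sketch below uses the textbook formulas to show this; it is an illustration, not darknet's actual code:

```python
import math
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Textbook precision, recall and F1; NaN where the ratio is 0/0."""
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    if math.isnan(precision) or math.isnan(recall) or precision + recall == 0:
        f1 = math.nan
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Counts taken from the report: TP = 0, FP = 0, FN = 79482.
    print(precision_recall_f1(0, 0, 79482))
```

So the `-nan` is a symptom of the network producing no detections above the threshold at all, rather than a separate problem.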

toplinuxsir · 2020-12-02T23:03:44Z

@AlexeyAB Throughout the training iterations, the mAP is always 0.

AlexeyAB · 2020-12-02T23:38:24Z

@toplinuxsir

I fixed an issue with letter_box for yolov4x-mish.cfg and yolov4-csp.cfg. Try to download the latest Darknet version, recompile and check the mAP by using the ./darknet detector map ... command. What mAP do you get?
How many iterations did you train?
Do you get the same issue if you train yolov4x-mish.cfg with new_coords=0?
What width= and height= do you use in cfg-file? And what resolution of your images?

toplinuxsir · 2020-12-03T00:53:29Z

@AlexeyAB OK, I will test with the latest darknet version and let you know the result.

toplinuxsir · 2020-12-03T01:09:49Z

@AlexeyAB

With previous darknet version:

How many iterations did you train? (about 3000 iterations)

Do you get the same issue if you train yolov4x-mish.cfg with new_coords=0? (yes, the avg loss is smaller, but the mAP is zero)

What width= and height= do you use in cfg-file? And what resolution of your images? (width=640 height=640, image resolution: 5496x3672)

I will test with the latest darknet version and let you know the result.

AlexeyAB · 2020-12-03T01:32:12Z

@toplinuxsir

What width= and height= do you use in cfg-file? And what resolution of your images? (width=640 height=640, image resolution: 5496x3672)

Just be sure that you can see objects on your image after resizing it to 640x640 resolution.
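
To put numbers on that check: going from 5496x3672 to a 640x640 network input shrinks everything by roughly a factor of 6 to 9. A small sketch of the arithmetic, assuming that `letter_box=1` keeps the aspect ratio (one scale factor for both axes) and that without it the image is stretched independently per axis; the function name is hypothetical:

```python
from typing import Tuple


def resized_object_size(
    img_w: int, img_h: int,
    obj_w: float, obj_h: float,
    net_w: int = 640, net_h: int = 640,
    letter_box: bool = True,
) -> Tuple[float, float]:
    """Approximate object size in pixels after resizing to the network input."""
    if letter_box:
        # Aspect ratio preserved: both axes share the smaller scale factor.
        scale_x = scale_y = min(net_w / img_w, net_h / img_h)
    else:
        # Plain resize: each axis is stretched to fit independently.
        scale_x, scale_y = net_w / img_w, net_h / img_h
    return obj_w * scale_x, obj_h * scale_y


if __name__ == "__main__":
    # A 100x100 px object in a 5496x3672 image, network input 640x640.
    print(resized_object_size(5496, 3672, 100, 100, letter_box=True))
    print(resized_object_size(5496, 3672, 100, 100, letter_box=False))
```

With letter_box the scale factor is 640/5496, about 0.116, so an object needs to be fairly large in the original image to remain more than a few pixels wide at the network input.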

toplinuxsir · 2020-12-03T04:53:24Z

@toplinuxsir

What width= and height= do you use in cfg-file? And what resolution of your images? (width=640 height=640, image resolution: 5496x3672)

Just be sure that you can see objects on your image after resizing it to 640x640 resolution.

Yes, I can see the objects after resizing to 640x640. Training on this dataset works OK with yolov4.

toplinuxsir · 2020-12-03T04:58:56Z

@toplinuxsir

I fixed an issue with letter_box for yolov4x-mish.cfg and yolov4csp.cfg. Try to download the latest Darknet version, recompile and check the mAP by using ./darknet detector map ... command. What mAP do you get?

How many iterations did you train?

Do you get the same issue if you train yolov4x-mish.cfg with new_coords=0?

What width= and height= do you use in cfg-file? And what resolution of your images?

With the latest darknet version, trained for 1167 iterations, the mAP@0.5 is still zero.
When I run the command

darknet detector map ./data/obj.data ./yolov4x-mish.cfg ./yolov4x-mish_best.weights

the mAP is still zero.

The training log is below:

Tensor Cores are disabled until the first 3000 iterations are reached.
 Last accuracy mAP@0.5 = 0.00 %, best = 0.00 % 
 1167: 1556.674927, 1418.323975 avg loss, 0.001000 rate, 7.389055 seconds, 74688 images, 76.743021 hours left
Loaded: 2.928214 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.412603), count: 372, total_loss = 2719.300049 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.526694), count: 48, total_loss = 50.768501 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.505199), count: 4, total_loss = 0.829803 
 total_bbox = 5800145, rewritten_bbox = 0.241839 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.391346), count: 415, total_loss = 3045.585938 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.563026), count: 41, total_loss = 36.331882 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.517813), count: 1, total_loss = 0.433905 
 total_bbox = 5800602, rewritten_bbox = 0.241820 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.380171), count: 465, total_loss = 3351.983398 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.597791), count: 37, total_loss = 31.914627 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.734160), count: 3, total_loss = 1.090034 
 total_bbox = 5801107, rewritten_bbox = 0.241850 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.382941), count: 838, total_loss = 6034.654297 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.557431), count: 74, total_loss = 77.567657 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.639629), count: 7, total_loss = 1.671899 
 total_bbox = 5802026, rewritten_bbox = 0.241829 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.400005), count: 437, total_loss = 3457.242432 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.575009), count: 19, total_loss = 23.943022 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.605979), count: 2, total_loss = 0.505783 
 total_bbox = 5802484, rewritten_bbox = 0.241810 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.411907), count: 365, total_loss = 2944.755371 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.505411), count: 26, total_loss = 28.957836 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.519159), count: 2, total_loss = 0.166492 
 total_bbox = 5802877, rewritten_bbox = 0.241846 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.393304), count: 395, total_loss = 2971.791260 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.539111), count: 35, total_loss = 34.049568 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.655177), count: 2, total_loss = 0.889932 
 total_bbox = 5803309, rewritten_bbox = 0.241828 % 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 4.00, cls: 0.50) Region 168 Avg (IOU: 0.388944), count: 397, total_loss = 2937.497314 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 1.00, cls: 0.50) Region 185 Avg (IOU: 0.567104), count: 38, total_loss = 34.991013 
v3 (iou loss, Normalizer: (iou: 0.05, obj: 0.40, cls: 0.50) Region 202 Avg (IOU: 0.652811), count: 3, total_loss = 1.634178 
 total_bbox = 5803747, rewritten_bbox = 0.241809 %
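
The per-iteration summary line in the log above packs several values into one line. As a minimal sketch, it can be parsed like this to track the avg loss over time. The regular expression and field names are assumptions inferred only from the lines shown in this thread, not an official format description.

```python
import re

# Pattern inferred from the log lines in this thread; not an official format spec.
LINE_RE = re.compile(
    r"^\s*(?P<iteration>\d+):\s*(?P<loss>[-\w.]+),\s*(?P<avg_loss>[-\w.]+) avg loss,\s*"
    r"(?P<rate>[\d.]+) rate,\s*(?P<seconds>[\d.]+) seconds,\s*(?P<images>\d+) images"
)


def parse_iteration_line(line):
    """Return a dict of fields from a summary line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "iteration": int(m.group("iteration")),
        "loss": float(m.group("loss")),          # float() also accepts "nan" / "-nan"
        "avg_loss": float(m.group("avg_loss")),
        "rate": float(m.group("rate")),
        "seconds": float(m.group("seconds")),
        "images": int(m.group("images")),
    }


sample = (" 1167: 1556.674927, 1418.323975 avg loss, 0.001000 rate, "
          "7.389055 seconds, 74688 images, 76.743021 hours left")
print(parse_iteration_line(sample))
```

Lines that do not start with an iteration number, such as the per-layer `v3 (iou loss, ...` lines, simply return None.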

toplinuxsir · 2020-12-03T22:57:08Z

@AlexeyAB I used the latest darknet version and trained for more than 6500 iterations; the mAP is still zero.

toplinuxsir · 2020-12-04T04:03:36Z

I trained using the latest commit AlexeyAB/darknet@8d6e56e.
The mAP is still zero after 1000 iterations, but the avg loss has become smaller.
The log is below:

Tensor Cores are disabled until the first 3000 iterations are reached.
(next mAP calculation at 1000 iterations)
1000: 602.429871, 527.480469 avg loss, 0.001000 rate, 7.381818 seconds, 64000 images, 63.177227 hours left

calculation mAP (mean average precision)...
Detection layer: 168 - type = 28
Detection layer: 185 - type = 28
Detection layer: 202 - type = 28
2580
detections_count = 0, unique_truth_count = 79482
class_id = 0, name = xkakou, ap = 0.00% (TP = 0, FP = 0)
class_id = 1, name = dkakou, ap = 0.00% (TP = 0, FP = 0)
class_id = 2, name = bkakou, ap = 0.00% (TP = 0, FP = 0)
class_id = 3, name = flamp, ap = 0.00% (TP = 0, FP = 0)
class_id = 4, name = blamp, ap = 0.00% (TP = 0, FP = 0)
class_id = 5, name = diankuai1, ap = 0.00% (TP = 0, FP = 0)
class_id = 6, name = diankuai2, ap = 0.00% (TP = 0, FP = 0)
class_id = 7, name = xianshu, ap = 0.00% (TP = 0, FP = 0)
class_id = 8, name = mic, ap = 0.00% (TP = 0, FP = 0)
class_id = 9, name = xinyinmian, ap = 0.00% (TP = 0, FP = 0)

for conf_thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan
for conf_thresh = 0.25, TP = 0, FP = 0, FN = 79482, average IoU = 0.00 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.000000, or 0.00 %
Total Detection Time: 258 Seconds

Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset

mean_average_precision (mAP@0.5) = 0.000000
New best mAP!
Saving weights to backup//yolov4x-mish_best.weights
Saving weights to backup//yolov4x-mish_1000.weights
Saving weights to backup//yolov4x-mish_last.weights
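
The `precision = -nan` line in this log follows directly from the counts: with TP = 0 and FP = 0 there were no detections at all, so precision is 0/0. A small sketch using the standard precision/recall/F1 definitions reproduces that. The function name is illustrative and this is not darknet's own code.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Standard definitions; returns NaN where the denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    recall = tp / (tp + fn) if (tp + fn) > 0 else math.nan
    if math.isnan(precision) or math.isnan(recall) or (precision + recall) == 0:
        f1 = math.nan
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the log above: TP = 0, FP = 0, FN = 79482.
p, r, f1 = precision_recall_f1(0, 0, 79482)
print(p, r, f1)
```

Since detections_count = 0, the zero mAP here suggests the network is emitting no boxes at all, rather than emitting boxes in the wrong places.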

AlexeyAB · 2020-12-04T18:25:42Z

@toplinuxsir

mAP is 0% for 1000 iterations, and is 5% for 2000 iterations.
But it seems that training with new_coords=0 is better than with new_coords=1

yolov4-csp.cfg (320x320) b=32 on MS COCO:
./darknet detector train F:/MSCOCO/coco_f.data cfg/yolov4-csp.cfg yolov4-csp.conv.142 -map

toplinuxsir · 2020-12-05T13:11:32Z

@AlexeyAB I used your latest commit AlexeyAB/darknet@4709f61. With new_coords=1, the avg loss is nan after 1000 iterations.

AlexeyAB · 2020-12-05T13:23:45Z

@toplinuxsir

Can you share your cfg-file?
And what training command do you use?

toplinuxsir · 2020-12-06T07:48:38Z

@AlexeyAB

my training command

./darknet detector train ./data/obj.data  ./yolov4x-mish.cfg   ./yolov4x-mish.conv.166  -map

yolov4x-mish.cfg

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=8
width=640
height=640
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1

mosaic=1

letter_box=1

#optimized_memory=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=80
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=40
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=80
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=160
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=80
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=80
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=80
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=80
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=80
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=80
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=80
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=80
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=80
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1,-13

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=320
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1,-34

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=640
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear


[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1,-34

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

# Downsample

[convolutional]
batch_normalize=1
filters=1280
size=3
stride=2
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=3
stride=1
pad=1
activation=mish

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1,-19

[convolutional]
batch_normalize=1
filters=1280
size=1
stride=1
pad=1
activation=mish

########################## 6 0 6 6 3

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=640
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

### SPP ###
[maxpool]
stride=1
size=5

[route]
layers=-2

[maxpool]
stride=1
size=9

[route]
layers=-4

[maxpool]
stride=1
size=13

[route]
layers=-1,-3,-5,-6
### End SPP ###

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=640
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=640
activation=mish

[route]
layers = -1, -15

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[upsample]
stride=2

[route]
layers = 94

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1, -3

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=320
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=320
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=320
activation=mish

[route]
layers = -1, -8

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[upsample]
stride=2

[route]
layers = 57

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[route]
layers = -1, -3

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=160
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=160
activation=mish

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=160
activation=mish

[route]
layers = -1, -8

[convolutional]
batch_normalize=1
filters=160
size=1
stride=1
pad=1
activation=mish
stopbackward=800

##########################

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=320
activation=mish

[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear


[yolo]
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=10
num=9
jitter=.1
scale_x_y = 2.0
objectness_smooth=0
ignore_thresh = .7
truth_thresh = 1
#random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=4.0
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=20

[route]
layers = -4

[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=320
activation=mish

[route]
layers = -1, -22

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=320
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=320
activation=mish

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=320
activation=mish

[route]
layers = -1,-8

[convolutional]
batch_normalize=1
filters=320
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=640
activation=mish

[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear


[yolo]
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=10
num=9
jitter=.1
scale_x_y = 2.0
objectness_smooth=1
ignore_thresh = .7
truth_thresh = 1
#random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=1.0
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=5

[route]
layers = -4

[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=640
activation=mish

[route]
layers = -1, -55

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=640
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=640
activation=mish

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=640
activation=mish

[route]
layers = -1,-8

[convolutional]
batch_normalize=1
filters=640
size=1
stride=1
pad=1
activation=mish

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1280
activation=mish

[convolutional]
size=1
stride=1
pad=1
filters=45
activation=linear


[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=10
num=9
jitter=.1
scale_x_y = 2.0
objectness_smooth=1
ignore_thresh = .7
truth_thresh = 1
#random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=0.4
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=2
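
Each convolutional layer directly before a [yolo] layer in the cfg above has filters=45 with classes=10. That matches the usual darknet rule filters = (classes + 5) * (number of anchors in mask), which is also how the value is changed for a custom class count. A small sketch to recompute it (the function name is illustrative):

```python
def yolo_filters(classes, mask):
    """filters for the conv layer before a [yolo] layer: (classes + 5) * anchors used."""
    # 5 = 4 box coordinates + 1 objectness score per anchor.
    return (classes + 5) * len(mask)


print(yolo_filters(10, [0, 1, 2]))  # the cfg above: 10 classes, mask = 0,1,2
print(yolo_filters(80, [6, 7, 8]))  # MS COCO has 80 classes
```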

toplinuxsir · 2020-12-06T22:56:27Z

@toplinuxsir

mAP is 0% for 1000 iterations, and is 5% for 2000 iterations.

But it seems that training with new_coords=0 is better than with new_coords=1

yolov4-csp.cfg (320x320) b=32 on MS COCO:
./darknet detector train F:/MSCOCO/coco_f.data cfg/yolov4-csp.cfg yolov4-csp.conv.142 -map

I trained for more than 9000 iterations, but the mAP is still zero. I will train again with your latest commit AlexeyAB/darknet@c47b24a.

AlexeyAB · 2020-12-06T23:12:55Z

@toplinuxsir

Try to use the latest commit with new_coords=1. And use logistic activation instead of linear before [yolo] layers:

mAP increases faster with new_coords=1 than with new_coords=0

[Two charts, labeled new_coords=1 and new_coords=0]
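
One way to apply the activation change suggested in this comment without hand-editing three places is a small script. This is only a sketch under assumptions: it treats the cfg as plain text, assumes the section directly before each [yolo] section is the [convolutional] layer to change, and uses a hypothetical helper name. Review the resulting file before training.

```python
import re


def use_logistic_before_yolo(cfg_text):
    """Set activation=logistic in the section that directly precedes each [yolo] section."""
    # Split so that every chunk starts at a "[section]" header line.
    chunks = re.split(r"(?m)^(?=\[)", cfg_text)
    for i in range(1, len(chunks)):
        is_yolo = chunks[i].lstrip().startswith("[yolo]")
        prev_is_conv = chunks[i - 1].lstrip().startswith("[convolutional]")
        if is_yolo and prev_is_conv:
            # Only spaces/tabs are allowed around the value, so blank lines are kept.
            chunks[i - 1] = re.sub(
                r"(?m)^activation[ \t]*=[ \t]*linear[ \t]*$",
                "activation=logistic",
                chunks[i - 1],
            )
    return "".join(chunks)


sample = """[convolutional]
size=1
filters=45
activation=linear

[yolo]
mask = 0,1,2
"""
out = use_logistic_before_yolo(sample)
print(out)
```

Convolutional layers that are not immediately followed by a [yolo] section are left untouched.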

toplinuxsir · 2020-12-07T02:23:58Z

@AlexeyAB Your new commit seems OK.
After 1000 iterations, the mAP is 86%. I will continue training until I get the best result.
Thanks!

toplinuxsir · 2020-12-07T22:59:03Z

@AlexeyAB I trained for more than 6000 iterations and got an mAP of 99.45% on my custom dataset. The avg loss is 845 (very big), but OpenCV dnn does not work with yolov4x-mish.

toplinuxsir · 2020-12-08T04:26:11Z

@AlexeyAB The newest darknet version does not save training results every 1000 iterations. It saves only two files: the best and the last.

abdulghani91 · 2021-01-14T09:13:33Z

@toplinuxsir If you are using AlexeyAB's darknet, it saves weights every 10000 iterations. To change that to every 1000 iterations, open detector.c in the src folder, go to line 385 where you will find the if condition, and replace every 10000 with 1000. I tried it and it works for me.

if ((iteration >= (iter_save + 1000) || iteration % 1000 == 0) ||
(iteration >= (iter_save + 1000) || iteration % 1000 == 0) && net.max_batches < 1000)

toplinuxsir · 2021-01-14T23:05:24Z

@abdulghani91
Yes, AlexeyAB has already fixed it.

abdulghani91 · 2021-01-15T08:22:02Z

@toplinuxsir You mean it will now save weights every 1000 iterations? For me, I have to change the condition every time to make it save every 1000 iterations; the latest version saves weights every 10000 iterations.

toplinuxsir · 2021-01-17T01:50:20Z

@abdulghani91 Are you sure you are using the latest commit?
The commit with the fix:
AlexeyAB/darknet@b5ff7f4

abdulghani91 · 2021-01-17T07:25:31Z

@toplinuxsir I'm using Google Colab to train. When I tried it two weeks ago, a weights file was stored only every 10000 iterations (10000, 20000, 30000). After I changed line 385 in detector.c, darknet started storing weights every 1000 iterations. I don't know if my change to the condition is right; I just changed every 10000 to 1000. I cloned darknet from this link (https://github.com/AlexeyAB/darknet).
To make sure, I will run it again and give you feedback.

abdulghani91 · 2021-01-17T07:49:15Z

@abdulghani91 Are you sure you are using the latest commit?
The commit with the fix:
AlexeyAB/darknet@b5ff7f4

When I open the detector.c file in AlexeyAB's darknet, I find this condition:
if ((iteration >= (iter_save + 10000) || iteration % 10000 == 0) ||
(iteration >= (iter_save + 1000) || iteration % 1000 == 0) && net.max_batches < 10000)

But in the fix that you mentioned, they changed the condition to:
if (iteration >= (iter_save + 10000) || iteration % 10000 == 0) {
if ((iteration >= (iter_save + 10000) || iteration % 10000 == 0) ||
(iteration >= (iter_save + 1000) || iteration % 1000 == 0) && net.max_batches < 10000)
So I think that, if I'm using the right darknet link, detector.c has not been updated. Or should I change it myself?
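
To see what the first condition quoted in this comment does in practice, here is a small Python model of it. It is a sketch under assumptions: iter_save is taken to start at 0 and to be set to the current iteration after each save, and the function names are illustrative. Only the boolean expression itself comes from the snippet above.

```python
def should_save(iteration, iter_save, max_batches):
    """Python transcription of the C condition; '&&' binds tighter than '||' in C."""
    every_10000 = iteration >= iter_save + 10000 or iteration % 10000 == 0
    every_1000 = iteration >= iter_save + 1000 or iteration % 1000 == 0
    return every_10000 or (every_1000 and max_batches < 10000)


def save_points(max_batches, last_iteration):
    """Iterations at which the condition fires, assuming iter_save starts at 0."""
    iter_save = 0  # assumption: training starts from iteration 0
    points = []
    for iteration in range(1, last_iteration + 1):
        if should_save(iteration, iter_save, max_batches):
            points.append(iteration)
            iter_save = iteration  # assumption: updated after each save
    return points


print(save_points(max_batches=6000, last_iteration=3000))
print(save_points(max_batches=500500, last_iteration=30000))
```

Under these assumptions the first call saves at 1000, 2000 and 3000, while the second saves only at 10000, 20000 and 30000. In this model the 1000-iteration branch applies only when max_batches is below 10000, so a run with max_batches = 500500, as in the cfg posted earlier in this thread, would still save every 10000 iterations.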

abdulghani91 · 2021-01-17T11:44:36Z

@toplinuxsir Is this (AlexeyAB/darknet@b5ff7f4) the latest version, or is there a newer one?

Fetulhak · 2021-04-23T07:35:45Z

Hi @toplinuxsir and @WongKinYiu, I am working on YOLOv4 with the latest version of AlexeyAB's repo, but I get a -nan loss. Here is the output of my training log.

Training starts at iteration number 2 with the following output, where some of the numbers are not yet -nan:

2: -nan, -nan avg loss, 0.000000 rate, 34.954638 seconds, 128 images, 32.050234 hours left
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 139 Avg (IOU: 0.000000), count: 208, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 150 Avg (IOU: 0.000000), count: 220, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 161 Avg (IOU: 0.152179), count: 230, class_loss = 299.755157, iou_loss = 451.484528, total_loss = 751.239685

By iteration number 147, all the numbers have become -nan:

147: -nan, -nan avg loss, 0.000000 rate, 23.773124 seconds, 9408 images, 20.681359 hours left
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 139 Avg (IOU: 0.000000), count: 152, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 150 Avg (IOU: 0.000000), count: 150, class_loss = -nan, iou_loss = -nan, total_loss = -nan
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 161 Avg (IOU: 0.000000), count: 141, class_loss = -nan, iou_loss = -nan, total_loss = -nan
total_bbox = 2658827, rewritten_bbox = 11.277079 %

Any idea what the solution might be?
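
A note on the `0.000000 rate` field in these lines: during burn-in, darknet ramps the learning rate up as learning_rate * (iteration / burn_in)^power, so early iterations print a rate that rounds to zero. The sketch below assumes learning_rate=0.001 and burn_in=1000 (the values from the cfg posted earlier in this thread, not necessarily those of this run) and power=4, which is understood to be darknet's default. Treat all three as assumptions.

```python
def burn_in_rate(iteration, learning_rate=0.001, burn_in=1000, power=4.0):
    """Learning rate during burn-in; the base rate once burn-in is over."""
    if iteration < burn_in:
        return learning_rate * (iteration / burn_in) ** power
    return learning_rate


for it in (2, 147, 1167):
    print(f"{it}: {burn_in_rate(it):.6f} rate")
```

Under these assumptions a zero-looking rate at iterations 2 and 147 is expected, and it does not by itself explain the -nan loss.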

YhyBYK · 2021-09-04T08:00:35Z

Hi @toplinuxsir and @AlexeyAB ,

I am trying to fix the same problem.
The avg loss values are too high for the yolov4-csp model. (My dataset consists mostly of 1920x1080 px drone images.)
I have already tried a 512x512 network size and have started a new training with a 416x416 network size, but the avg loss values are still too high.
Do you have any suggestions?

class: 12
iteration value: 24000
Note: I used AlexeyAB's repo for yolov4-csp, and only the class, filter number, network size and iteration values were changed.

Thanks.

aseprohman · 2021-09-07T08:55:00Z

Hi @AlexeyAB,

If the images I use for training YOLOv4-CSP have different resolutions, should I enable "random=1"?

P-Phyoe · 2022-04-20T14:23:02Z

@AlexeyAB

I would like to use the yolov4-leaky cfg file. If so, which weights file should I use?

elnaz-t · 2022-05-09T04:01:26Z

@AlexeyAB
Hi,
I'm using yolov4.cfg for training on my dataset. (I'm following this repository: https://github.com/Abhi-899/YOLOV4-Custom-Object-Detection)
After training for 300 iterations, I at first didn't get any bounding boxes for my images.
After searching about this problem, I found that I should decrease the threshold, so I changed my 3 yolo layers as follows:

[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=3
num=9
jitter=.3
ignore_thresh = 0.07 ############## .7
truth_thresh = 1 ############ 1
random=1
scale_x_y = 1.05
iou_thresh= 0.213 ##############0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
max_delta=5

Now I get a lot of bounding boxes!
What should I do?
Can anyone help me?
My output image is at the link below:
https://drive.google.com/file/d/1Jm7pAk8a89JgtPPeXLCLhRW_6hpV68l1/view?usp=sharing

aryannuha · 2024-04-28T09:16:48Z

@AlexeyAB

Tensor Cores are disabled until the first 3000 iterations are reached.
(next mAP calculation at 1000 iterations) 1000/6000: loss=0.2 hours left=1.4
1000: 0.246971, 0.319099 avg loss, 0.002610 rate, 0.934611 seconds, 64000 images, 1.413358 hours left
calculation mAP (mean average precision)...
Detection layer: 30 - type = 28
Detection layer: 37 - type = 28
4
cuDNN status Error in: file: ./src/convolutional_kernels.cu function: forward_convolutional_layer_gpu() line: 541

cuDNN Error: CUDNN_STATUS_BAD_PARAM
Darknet error location: ./src/convolutional_kernels.cu, forward_convolutional_layer_gpu(), line #541
cuDNN Error: CUDNN_STATUS_BAD_PARAM: Success
backtrace (14 entries)
1/14: ./darknet(log_backtrace+0x38) [0x55fde50b2208]
2/14: ./darknet(error+0x3d) [0x55fde50b22ed]
3/14: ./darknet(+0x7ba70) [0x55fde50b4a70]
4/14: ./darknet(cudnn_check_error_extended+0x7c) [0x55fde50b506c]
5/14: ./darknet(forward_convolutional_layer_gpu+0x2c5) [0x55fde518fe75]
6/14: ./darknet(forward_network_gpu+0xc1) [0x55fde51a4431]
7/14: ./darknet(network_predict_gpu+0x140) [0x55fde51a7160]
8/14: ./darknet(validate_detector_map+0xa17) [0x55fde513c4c7]
9/14: ./darknet(train_detector+0x197d) [0x55fde513f15d]
10/14: ./darknet(run_detector+0xa32) [0x55fde5142e92]
11/14: ./darknet(main+0x332) [0x55fde5071772]
12/14: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7a477a9d4d90]
13/14: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7a477a9d4e40]
14/14: ./darknet(_start+0x25) [0x55fde50739f5]

After iteration 999, the training process suddenly stops, even though max_batches=6000.

wuzuiyuzui mentioned this issue Nov 18, 2020

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR #21

Open

arnaud-nt2i mentioned this issue Dec 7, 2020

Accuracy and speed of yolov4x-mish AlexeyAB/darknet#6987

Open

AhmedMesih mentioned this issue Jul 8, 2021

What' s difference .yaml and .cfg file in /models path for csp and large branch and is that important for custom training? #289

Open

How to train yolov4-csp via darknet ? #13

How to train yolov4-csp via darknet ? #13
