
Network is not learning. mAP stays the same and Total Loss does not go below 6 #5

Closed
gustavz opened this issue Jan 19, 2018 · 7 comments


gustavz commented Jan 19, 2018

Hey Victor, I had the same idea to build a real-time capable hand detector.
First I also tried it with the Oxford dataset, but as you said, it is not really good.
Then I saw your repo and tried the EgoHands dataset, which looks really promising.

But with both datasets my mAP always stays at 0 and the total loss never goes below ~6.
If you look at the weights, you can see that they stay around zero, so there is no learning at all.

Do you have an idea why this could be?

Here is a screenshot of my current training with the exact same config you are using.
(By the way, why do you use a batch size of only 6? Does it have any benefit besides saving memory?)
[Two screenshots attached: training runs from 2018-01-19]

I would really appreciate it if you could help me with this.

And yeah, your egohands_dataset_clean.py code is really nice 👍

EDIT: Are you sure the CSV files that you create contain the correct bounding boxes? Could you share the CSV files you used to create the TFRecord files? Thanks!

victordibia (Owner) commented Jan 19, 2018

Hi @gustavz

  • Zero mAP
    This is quite unusual, and I agree with you that an mAP of 0 suggests something is wrong with the labelling. Given that training is initialized from pretrained weights, it's unlikely that its predictions are wrong all the time.

  • Batch Size
    This was selected only because of memory issues.

  • Original CSV files
    I am attaching my initial train/test label CSV files, which you can compare against what the egohands script in the repo generates. I did some code cleanup after building the original model, so there is a slight possibility that something changed.

csvs.zip

Other Thoughts

Let me know how you resolve this.
P.S. I also wanted to combine both datasets but just haven't had time to get around to it! Looking forward to learning more about your findings.

Thanks.
-V.

gustavz (Author) commented Jan 19, 2018 via email

gustavz (Author) commented Jan 22, 2018

I compared your CSV files with mine and they are totally different.
Yours are perfectly ordered by filename, while mine are all shuffled.
Furthermore, the values are not the same, and the sets are of unequal size.

Here is an example of unequal values (the rest is attached):
csvs_gustav.tar.gz
YOURS:

filename,width,height,class,xmin,ymin,xmax,ymax
CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,647,453,824,551
CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,515,431,622,543

MINE:

CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,803,623,981,718
CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,679,340,856,464
CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,511,299,676,428

But the thing is, I did not create the CSV files on my own; I used your script to create them from the default dataset.

How did you modify your code after the commit to GitHub? Could you update it?

I would be very thankful!
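(For anyone debugging a similar mismatch: a minimal sketch for comparing two label CSVs by filename, ignoring row order, assuming both files carry the `filename,width,height,class,xmin,ymin,xmax,ymax` header row. `load_boxes` and `diff_labels` are hypothetical helpers, not part of the repo.)

```python
import csv
from collections import defaultdict

def load_boxes(path):
    """Group bounding boxes by image filename. Expects the header
    filename,width,height,class,xmin,ymin,xmax,ymax."""
    boxes = defaultdict(set)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            boxes[row["filename"]].add(
                (int(row["xmin"]), int(row["ymin"]),
                 int(row["xmax"]), int(row["ymax"])))
    return boxes

def diff_labels(path_a, path_b):
    """Return the filenames whose box sets differ between two CSVs,
    regardless of the order the rows appear in."""
    a, b = load_boxes(path_a), load_boxes(path_b)
    return sorted(f for f in set(a) | set(b) if a.get(f) != b.get(f))
```

Because the comparison is set-based per filename, a merely shuffled CSV would report no differences; real differences point at the boxes themselves.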

victordibia (Owner) commented Jan 22, 2018

Hmmm ... I actually did a similar check on Friday before I sent you my original CSVs. I generated train and test CSVs using the script, compared them, and they had similar values for the same images. Can you retry the process? I'd suggest you pull a fresh copy of the repo and start from scratch (i.e., in a new folder, where the dataset gets downloaded from scratch).
Let me know if you still get different values.

-V.

gustavz (Author) commented Jan 22, 2018

I did a complete delete and re-clone of your repo.
But the results are still the same: the CSVs are still in a shuffled order, and the train and test sets consist of different images with different bboxes than your CSV files.

What could be the reason? I am using Python 2.7 and OpenCV 3.3.1, if that could be important.

gustavz (Author) commented Jan 23, 2018

Hey Victor,
I still have not found the reason why my machine produces completely different CSV files with the same code you are using.
Have you ever experienced something similar when creating CSV files?
Could you maybe share your train/test directories, including the images and the final CSV files?

EDIT: Maybe it is something with the random 10% test dataset creation?
Could you come up with a version of your function split_data_test_eval_train(image_dir)
that does not use random, but a fixed number instead?

EDIT2: OK, the test and train datasets created by your script can't be equal each time you run the code, as you don't call random.seed().
But either way, the code will behave differently on Python 2.x and 3.x because of this:
https://stackoverflow.com/questions/11929701/why-is-seeding-the-random-generator-not-stable-between-versions-of-python
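A fully deterministic split along these lines, with no RNG at all (so it is identical on every machine and every Python version), might look like this. `split_train_test` is a hypothetical replacement, not the repo's actual function:

```python
def split_train_test(filenames, test_every=10):
    """Deterministic ~10% test split: sort the filenames first so the
    input order is canonical, then send every `test_every`-th image to
    the test set. No random module involved, so Python 2 vs 3 seeding
    differences cannot change the result."""
    files = sorted(filenames)
    test = files[::test_every]
    test_set = set(test)
    train = [f for f in files if f not in test_set]
    return train, test
```

Sorting before slicing is the key step: it makes the split independent of whatever order the filenames arrive in.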

gustavz (Author) commented Jan 23, 2018

Are you sure you convert the bounding boxes correctly?
Where do you put the coordinate origin in the image: top left or bottom left?
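(For context: image coordinates conventionally put the origin at the top-left corner with y increasing downward, which is the OpenCV convention, so a polygon annotation reduces to a box via per-axis min/max. A minimal sketch; `polygon_to_bbox` is a hypothetical helper, not the repo's code:)

```python
def polygon_to_bbox(points, img_w, img_h):
    """Reduce a hand polygon, a list of (x, y) points in top-left-origin
    image coordinates, to an axis-aligned box clipped to the image."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin = max(0, int(min(xs)))
    ymin = max(0, int(min(ys)))
    xmax = min(img_w, int(max(xs)))
    ymax = min(img_h, int(max(ys)))
    return xmin, ymin, xmax, ymax
```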

Also, I saw that you have some unused variables in your get_bbox_visualize() function:

index = 0
height = width = 0

I don't know if this was on purpose or left over from some value conversion.

Furthermore, in your README you write that you split the data into three sets, but actually you only split into two; that's a bit confusing.

CLOSE REASON: sorting the image array solved all problems.
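For future readers, the fix boils down to sorting the directory listing before pairing frames with their annotations, since `os.listdir()` returns entries in arbitrary, filesystem-dependent order. A minimal sketch (`list_frames` is a hypothetical helper, not the repo's function):

```python
import os

def list_frames(image_dir):
    """Return frame filenames in a stable order. Without sorted(),
    os.listdir() order varies by filesystem and platform, which can
    pair images with the wrong annotation entries."""
    return sorted(f for f in os.listdir(image_dir) if f.endswith(".jpg"))
```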
