
Network is not learning. mAP stays the same and Total Loss does not go below 6 #5

Closed
gustavz opened this issue Jan 19, 2018 · 7 comments


gustavz commented Jan 19, 2018

Hey Victor, I had the same idea to build a real-time capable hand detector.
First I also tried it with the Oxford dataset, but as you said, it is not really good.
Then I saw your repo and tried the EgoHands dataset, which looks really promising.

But with both datasets my mAP always stays at 0 and the total loss never goes below ~6.
If you look at the weights, you can see that they stay around zero, so there is no learning at all.

Do you have an idea why this could be?

Here is a screenshot of my current training with the exact same config you are using.
(By the way, why do you use a batch size of only 6? Does it have any benefit besides saving memory?)
[Two screenshots attached: training runs from 2018-01-19]

I would really appreciate it if you could help me with this.

And yeah, your egohands_dataset_clean.py code is really nice 👍

EDIT: Are you sure the CSV files that you create contain the correct bounding boxes? Could you share the CSV files you used to create the TFRecord files? Thanks!

victordibia (Owner) commented Jan 19, 2018

Hi @gustavz

  • Zero mAP
    This is quite unusual, and I agree with you that an mAP of 0 suggests something is wrong with the labelling. Given that training is initialized from pretrained weights, it's unlikely that its predictions are wrong all the time.

  • Batch Size
    This was selected only because of memory issues.

  • Original CSV files
    I am attaching my initial train/test label CSV files, which you can compare against what the egohands script in the repo generates. I did some code cleanup after building the original model, so there is a slight possibility that something changed.

csvs.zip

Other Thoughts

Let me know how you resolve this.
P.S. I also wanted to combine both datasets but just haven't had time to get around to it! Looking forward to learning more about your findings.

Thanks.
-V.

gustavz (Author) commented Jan 19, 2018 via email

gustavz (Author) commented Jan 22, 2018

I compared your CSV files with mine and they are totally different.
Yours are perfectly ordered by filename, while mine are all shuffled.
Furthermore, the values are not the same, and the sets are of unequal size.

Here is an example of unequal values (the rest is attached):
csvs_gustav.tar.gz
YOURS:

filename,width,height,class,xmin,ymin,xmax,ymax
CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,647,453,824,551
CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,515,431,622,543

MINE:

CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,803,623,981,718
CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,679,340,856,464
CARDS_COURTYARD_B_T_frame_0011.jpg,1280,720,hand,511,299,676,428

But the thing is, I did not create the CSV files on my own; I used your script to create them from the default dataset.

How did you modify your code after the commit to GitHub? Could you update it?

I would be very thankful!
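(For anyone debugging a similar mismatch: a minimal sketch for comparing two label CSVs by filename, ignoring row order, assuming both files carry the `filename,width,height,class,xmin,ymin,xmax,ymax` header row. `load_boxes` and `diff_labels` are hypothetical helpers, not part of the repo.)

```python
import csv
from collections import defaultdict

def load_boxes(path):
    """Group bounding boxes by image filename. Expects the header
    filename,width,height,class,xmin,ymin,xmax,ymax."""
    boxes = defaultdict(set)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            boxes[row["filename"]].add(
                (int(row["xmin"]), int(row["ymin"]),
                 int(row["xmax"]), int(row["ymax"])))
    return boxes

def diff_labels(path_a, path_b):
    """Return the filenames whose box sets differ between two CSVs,
    regardless of the order the rows appear in."""
    a, b = load_boxes(path_a), load_boxes(path_b)
    return sorted(f for f in set(a) | set(b) if a.get(f) != b.get(f))
```

Because the comparison is set-based per filename, a merely shuffled CSV would report no differences; real differences point at the boxes themselves.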

victordibia (Owner) commented Jan 22, 2018

Hmmm ... I actually did a similar check on Friday before I sent you my original CSVs. I generated train and test CSVs using the script, compared them, and they had similar values for the same images. Can you retry the process? I'd suggest you pull a fresh copy of the repo and start from scratch (i.e., in a new folder, where the dataset gets downloaded from scratch).
Let me know if you still get different values.

-V.

gustavz (Author) commented Jan 22, 2018

I did a complete delete and re-clone of your repo.
But the results are still the same: the CSVs are still in a shuffled order, and the train and test sets consist of different images with different bboxes than your CSV files.

What could be the reason? I am using Python 2.7 and OpenCV 3.3.1, if that could be important.

gustavz (Author) commented Jan 23, 2018

Hey Victor,
I still have not found the reason why my machine produces completely different CSV files with the same code you are using.
Have you ever experienced something similar when creating CSV files?
Could you maybe share your train/test directories, including the images and the final CSV files?

EDIT: Maybe it is something with the random 10% test dataset creation?
Could you come up with a version of your function split_data_test_eval_train(image_dir)
that does not use random, but a fixed number instead?

EDIT2: OK, the test and train datasets created by your script can't be equal each time you run the code, as you don't call random.seed().
But either way, the code will behave differently on Python 2.x and 3.x because of this:
https://stackoverflow.com/questions/11929701/why-is-seeding-the-random-generator-not-stable-between-versions-of-python
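A fully deterministic split along these lines, with no RNG at all (so it is identical on every machine and every Python version), might look like this. `split_train_test` is a hypothetical replacement, not the repo's actual function:

```python
def split_train_test(filenames, test_every=10):
    """Deterministic ~10% test split: sort the filenames first so the
    input order is canonical, then send every `test_every`-th image to
    the test set. No random module involved, so Python 2 vs 3 seeding
    differences cannot change the result."""
    files = sorted(filenames)
    test = files[::test_every]
    test_set = set(test)
    train = [f for f in files if f not in test_set]
    return train, test
```

Sorting before slicing is the key step: it makes the split independent of whatever order the filenames arrive in.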

gustavz (Author) commented Jan 23, 2018

Are you sure you convert the bounding boxes correctly?
Where do you put the coordinate origin in the image: top left or bottom left?
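(For context: image coordinates conventionally put the origin at the top-left corner with y increasing downward, which is the OpenCV convention, so a polygon annotation reduces to a box via per-axis min/max. A minimal sketch; `polygon_to_bbox` is a hypothetical helper, not the repo's code:)

```python
def polygon_to_bbox(points, img_w, img_h):
    """Reduce a hand polygon, a list of (x, y) points in top-left-origin
    image coordinates, to an axis-aligned box clipped to the image."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin = max(0, int(min(xs)))
    ymin = max(0, int(min(ys)))
    xmax = min(img_w, int(max(xs)))
    ymax = min(img_h, int(max(ys)))
    return xmin, ymin, xmax, ymax
```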

Also, I saw that you have some unused variables in your get_bbox_visualize() function:

index = 0
height = width = 0

I don't know if this was on purpose or left over from some value conversion.

Furthermore, in your README you write that you split the data into three sets, but actually you only split into two; that's a bit confusing.

CLOSE REASON: sorting the image array solved all problems.
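For future readers, the fix boils down to sorting the directory listing before pairing frames with their annotations, since `os.listdir()` returns entries in arbitrary, filesystem-dependent order. A minimal sketch (`list_frames` is a hypothetical helper, not the repo's function):

```python
import os

def list_frames(image_dir):
    """Return frame filenames in a stable order. Without sorted(),
    os.listdir() order varies by filesystem and platform, which can
    pair images with the wrong annotation entries."""
    return sorted(f for f in os.listdir(image_dir) if f.endswith(".jpg"))
```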
