Unable to train on custom dataset (Object Detection) #1863
Comments
Sorry for asking, but how did you produce the files that replace the "INSERT_PATH_HERE" placeholders? I mean, how did you produce the train.record and eval.record files that those paths need to point to? |
@EmmanouelP I have a custom labeled dataset that was not in TFRecord form. So, I wrote a script to collect the labels from my dataset and output them as a TFRecord, which is essentially a file containing a list of TFExamples. If you go here, you can see TensorFlow's sample script that does the same thing with another dataset that was downloaded online. https://github.com/tensorflow/models/blob/master/object_detection/create_pascal_tf_record.py |
@timisplump So basically you had your own raw dataset (with images and your annotation files) and you used one of the provided scripts (modified in some way) to produce the TFRecord files? Thanks in advance for all the help. I'm just trying to run the same kind of custom training so I can compare and share results, and maybe even help solve your problem too :). |
Hi @timisplump - can you provide your labelmap too please? Sometimes that is at fault. |
@jch1
I'm still curious why that didn't work. Strangely enough, I reverted a few of the specs that I changed from the PETs file (learning rate, l2_regularizer weights) back to what they were, trained on a dataset of size 5000 (somewhat close to the pets dataset), and the training seemed to work correctly.
After the above changes, the specs were the same as the PET example except for batch_size (12, memory issues), num_classes (1 in my case), and image_resizer (height=504, width=960 b/c that's the size of my images). This allowed training to work for some reason. I doubt the Do you have any insight on what caused the original problem? |
@EmmanouelP Yeah, that's exactly what I did. I wrote my own script to retrieve them and then I stored them in the tfrecord file the same way that other script did ( |
@timisplump We currently ignore any class that has label index 0 (this is not very well documented, and we are in the process of adding better documentation). In your original label map, this would have caused your model to throw out all cars. |
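For anyone who lands here with the same symptom, the check described above can be sketched in a few lines of plain Python. This is only an illustration, not part of the Object Detection API: the helper names `parse_label_map` and `find_ignored_classes` are made up, the two label maps are hypothetical stand-ins (the original label map was not posted in this thread), and the regex parser only handles simple `item { id: ... name: ... }` blocks.

```python
import re


def parse_label_map(text):
    """Return (id, name) pairs from simple 'item { id: N name: "..." }' blocks.

    Hypothetical helper: a regex sketch, not a real protobuf text parser.
    """
    items = []
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        id_match = re.search(r"\bid\s*:\s*(\d+)", block)
        name_match = re.search(r"\bname\s*:\s*['\"]([^'\"]*)['\"]", block)
        if id_match and name_match:
            items.append((int(id_match.group(1)), name_match.group(1)))
    return items


def find_ignored_classes(text):
    """Names of classes whose id is 0, which the comment above says are ignored."""
    return [name for class_id, name in parse_label_map(text) if class_id == 0]


# Hypothetical label map that uses index 0 for the only class.
BAD_LABEL_MAP = """
item {
  id: 0
  name: 'car'
}
"""

# The same class with ids starting at 1 instead.
GOOD_LABEL_MAP = """
item {
  id: 1
  name: 'car'
}
"""

if __name__ == "__main__":
    for label, text in (("bad", BAD_LABEL_MAP), ("good", GOOD_LABEL_MAP)):
        ignored = find_ignored_classes(text)
        status = "would be ignored: " + ", ".join(ignored) if ignored else "ok"
        print(label, "->", status)
```

If a check like this flags a class, renumbering the label map so ids start at 1 and regenerating the TFRecords with matching class labels would be the natural next step. |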
@jch1 thanks a bunch for the reply. I bet that's the problem. Please document that soon so that others don't have to suffer through the pain I did! :) |
Yup, this is already in the works and my apologies that you had to go through this. Thanks for sticking it out! I'm closing this issue, but feel free to re-open if you have more to discuss. |
Hi folks, I am facing an issue while trying to run train.py on a Windows 10 system. Below is the error message I am getting.
PYTHONPATH set as
Command I am using to train my model is
I have been struggling with this issue for the last couple of days; any help or guidance to resolve it would be highly appreciated. Thanks, |
System information
Describe the problem
I am unable to train any of the pre-trained models on my own dataset. For testing purposes, I constructed a training dataset with only one image, so the model should simply learn to memorize that image's objects. This image is also used for the "test" set. Also, to make things simpler, I'm using only one class (cars) for detection.
I trained on this image with the SSD mobilenet and inception networks (and then tried again with Faster R-CNN, to the same results). Each model converged, or at least the loss went to 0. See below for training logs. However, when I ran eval.py on the latest saved model checkpoint, every single time it returns a mAP of 0.0. I froze the models using the export_inference_graph.py script, and output their detections using the iPython notebook and there are 25+ boxes, none of which are near any of the 9 cars in the image.
I modified trainer.py so that it saves my model's checkpoint every minute of training, this way I don't have to wait until the saver decides to save the checkpoint. This was the only modification I made to trainer or any of the training scripts.
To construct my dataset, I used a custom script that took our annotations/labels and output them into TFRecords, the same way the examples did it. In my script, to be sure nothing weird was going on, I printed out the TFExample I wrote to file right before writing it. Below is the TFExample with the bytes_list omitted due to its size.
I've been debugging this issue for days.. Strangely, I am able to successfully train on the PETS dataset and the model appears to learn something when training on it. I'm really confused what I did wrong and what is making the model's loss go to 0 when it clearly isn't learning anything. Thanks for any help!
Source code / logs
Train logs
SSD_mobilenet config: