Guideline to train against other datasets with different classes #31
Good suggestion on the self-driving dataset. Indeed.
Well, I wonder if it's possible to dynamically construct an FC layer according to the number of classes you have in a labels.txt. For example, if I want to use yolo_tiny, but for a 5-class dataset rather than a 20-class dataset, we could reformat the FC layers to generate the appropriate number of outputs. In darkflow's current form, I would have to modify the yolo_tiny.cfg file and tell the training script to ignore the FC weights and reinitialize new ones?
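For context on "the appropriate number of outputs": in YOLOv1 the size of the final connected layer follows directly from the grid side S, boxes per cell B, and class count C. A sketch of the arithmetic (my own illustration, not darkflow code):

```python
def yolo_v1_output_size(side: int, num_boxes: int, num_classes: int) -> int:
    """Number of outputs the final connected layer must produce:
    each of the side*side grid cells predicts num_boxes boxes
    (x, y, w, h, confidence) plus one score per class."""
    return side * side * (num_boxes * 5 + num_classes)

# stock 20-class PASCAL VOC setup vs. a 5-class dataset
print(yolo_v1_output_size(7, 2, 20))  # 1470
print(yolo_v1_output_size(7, 2, 5))   # 735
```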
That's a good suggestion too. The current design of darkflow reuses weights from identical layers between the old and new cfg. To completely initialize the new net, just leave the weight-loading option out.
Just so I understand you fully: when you say "identical layer", I just need to not modify the layers I don't want to change, and darkflow will detect changes in a new cfg file and initialize those variables properly?
Correct, and notice the word "first" is also there: the first matching layers are reused, and the first mismatch will cause the rest of the net to be initialized.
@thtrieu I just realized layers don't have IDs, and to introduce changes in the cfg file, you actually have to change the layer structure. I'm wondering if I could avoid that? The reason being: I want to swap out the FC layers, train them, and fine-tune the entire network with a lower learning rate. I want to load the pretrained weights and still train them. Is it possible to add an extra parameter that specifies train=true and reinitialize=random, or something of that sort, for each layer?
Sure, you can do that, but it will require source code modification.
OK, working on that right now. Could you please point me to where the process script decides to reinitialize a layer when changes are detected?
A bit complicated:
Hope this helps.
Reading through weight_loader in loader.py, I'm having a hard time locating the exact line where the signature is compared and rejected. Could you kindly clarify? In the meantime, I'm planning to not touch the convolution layers at all and swap out the FC and detection layers with the following.
Will keep you updated on how it works.
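The actual snippet above was elided; a hypothetical reconstruction of such a swap for a 5-class tiny-yolov1 (values are illustrative guesses based on the stock v1 cfg, assuming a 7x7 grid and 2 boxes per cell, hence 7*7*(2*5+5) = 735 outputs) might look like:

```
[connected]
output=735
activation=linear

[detection]
classes=5
coords=4
rescore=1
side=7
num=2
softmax=0
sqrt=1
jitter=.2
```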
And I change the number of classes in the detection layer, as you have mentioned before, right?
Yes.
Will result in:
Training now! To detail what I'm doing: I loaded the CSV annotation file from the Udacity dataset to produce dumps in the same format you expect in data.py; the Udacity dataset has 5 different classes. Will keep you posted, and any tips would be much appreciated!
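As a rough illustration of that CSV-to-dumps conversion (the column order and the dump layout here are assumptions for the sketch, not the actual Udacity schema or darkflow's exact internal format):

```python
import csv
import io
from collections import defaultdict

def udacity_csv_to_dumps(csv_text, img_w=1920, img_h=1200):
    """Group one-box-per-row CSV annotations into darkflow-style dumps.
    Assumed row format: frame,xmin,ymin,xmax,ymax,label (header skipped).
    Assumed dump format: [img, [w, h, [[label, xmin, ymin, xmax, ymax], ...]]]."""
    boxes = defaultdict(list)
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip header row
    for frame, xmin, ymin, xmax, ymax, label in reader:
        boxes[frame].append([label, int(xmin), int(ymin), int(xmax), int(ymax)])
    return [[img, [img_w, img_h, objs]] for img, objs in boxes.items()]

sample = ("frame,xmin,ymin,xmax,ymax,label\n"
          "1.jpg,10,20,60,80,car\n"
          "1.jpg,5,5,30,40,pedestrian\n")
dumps = udacity_csv_to_dumps(sample)
print(dumps[0][0])          # 1.jpg
print(len(dumps[0][1][2]))  # 2
```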
The bullet points at the end of this post might be helpful: https://thtrieu.github.io/notes/Fine-tuning-YOLO-4-classes#hand-picking-good-feature
Besides, I would love to reference your training results/demo in this repo's README. If that's okay, do notify me when you're ready.
Really good tips. I have the following sample sizes at the moment:
Loss converged to 3.0 now. Will run the regular test to see if it's reasonable.
@thtrieu the loss shows up as 2.4, but when I perform testing on a test data point, the probability produces NaN. Just wondering if you have any clue how that could be possible? I'm guessing that NaN would've been produced during training as well?
Can you describe in detail which commands you ran to obtain these results? They all seem new to me.
To train I did: To run I did: Interestingly, when I pass -1 to --load to load the latest checkpoint for both the train and test options, I got the following output:
It's failing to load any convolution layers, it seems; no wonder it spits out NaN :( It does not do this when I pass a weights file as my --load argument, which seems to suggest there might be versioning issues with the ckpt format. I'm currently using TensorFlow 0.12.1, if it helps. tiny-yolov1-5c.cfg was modified from tiny-yolov1.cfg, with the changes to [connected] and [detection] posted above.
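For readers of this thread, the elided commands were probably of this general shape (not the poster's exact invocation; flag names follow darkflow's README, paths are placeholders):

```shell
# Train: start from the released weights with the modified 5-class cfg
./flow --model cfg/tiny-yolov1-5c.cfg --load bin/tiny-yolo.weights --train \
       --dataset /path/to/udacity/images --annotation /path/to/udacity/annotations

# Test: --load -1 resumes from the most recent checkpoint in ckpt/
./flow --model cfg/tiny-yolov1-5c.cfg --load -1
```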
That output is totally okay; the weights are in fact being loaded. The strange thing to me is: how can you get any loss value when running a test?
Ah, good to know that it's loading the weights. I don't actually have a NaN loss value; what I'm referring to is the NaN matrix it produces when I run a forward pass during the test procedure. I printed out the result of line 94 in net/flow.py, and out showed up as a NaN matrix, which makes it hard to believe that it would've produced a valid loss during training.
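A generic way to locate NaNs in such an output matrix (plain NumPy, unrelated to darkflow's internals):

```python
import numpy as np

def first_nan_report(name, arr):
    """Return a short diagnostic: whether (and where) NaNs appear in arr."""
    arr = np.asarray(arr, dtype=np.float64)
    mask = np.isnan(arr)
    if not mask.any():
        return f"{name}: no NaN"
    idx = tuple(int(i) for i in np.argwhere(mask)[0])  # first NaN position
    return f"{name}: {int(mask.sum())} NaN(s), first at index {idx}"

out = np.array([[0.1, 0.5], [np.nan, 0.2]])
print(first_nan_report("out", out))  # out: 1 NaN(s), first at index (1, 0)
```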
NaN is not necessarily the probabilities in YOLO's formulation; it can be the coordinate offsets, confidence, class scores, etc. You can always check what the output matrix is during training by putting it into the fetches. If the matrices are indeed NaN during training, then there is a scaling problem due to overusing the old weights (N-1 layers are reused with totally different classes of objects, and v1.1 is using batch-norm with arbitrarily large scaling/offset parameters). To check this, try running the model without loading from any weights.
Thanks for the tips. I'm not sure what you mean by "putting it into the fetches". And I'm seeing the same NaN matrix coming out of the forward pass.
Putting `fetches = [self.train_op, loss_op, self.top.out, self.top.inp.out, self.top.inp.inp.out, self.top.inp.inp.inp.out]` will allow you to fetch the train op (meaning to train the net), the loss op (to see the loss), and the last four layers' output matrices. You can certainly use a loop to create this list; the way I did it above is just illustrative. If you are able to print the output of all intermediate layers, it will be easier to debug your program (to see at which layer the NaN problem starts to happen). I believe this is a problem-specific issue, because the YOLO models on the PASCAL VOC dataset are all running fine.
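The loop version suggested above could be sketched like this (Layer is a minimal stand-in mimicking the .inp/.out attribute chain from the snippet, not darkflow's real class):

```python
class Layer:
    """Minimal stand-in: each layer keeps its output tensor and its input layer."""
    def __init__(self, out, inp=None):
        self.out, self.inp = out, inp

def last_layer_outputs(top, n=4):
    """Walk the .inp chain from the top layer, collecting up to n output tensors."""
    outs, layer = [], top
    while layer is not None and len(outs) < n:
        outs.append(layer.out)
        layer = layer.inp
    return outs

# build a 4-layer chain: conv -> fc1 -> fc2 -> detection
top = Layer("detection_out", Layer("fc2_out", Layer("fc1_out", Layer("conv_out"))))
print(last_layer_outputs(top))  # ['detection_out', 'fc2_out', 'fc1_out', 'conv_out']
# fetches = [train_op, loss_op] + last_layer_outputs(self.top)
```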
I used your command to fetch the intermediate layer outputs, and I actually don't see NaN output at the last few layers during training, but I do see NaN output during testing, starting at one of the layers. I would expect that if the network is producing NaN results, it would've done so during training as well?
Found out something really peculiar. I downloaded the tiny-yolo.weights from the link referred to by the YOLOv1 site, and found out the link actually points to the tiny-yolov2 weights. This is confirmed by the successful load of the final convolution layer when I use the v2 tiny-yolo.cfg. The NaN starts right at that layer as well, so I'm going to try tracking down the correct tiny-yolov1 weights and train against those.
Yes, the official site of YOLO is now providing YOLO9000 only. If you want older versions, tell me and I'll upload them.
If you could upload tiny-yolo-v1, that would be much appreciated. Just so you know, when I try to load yolov1.weights, the walker asserts "Over-read". Not sure if you wish to maintain yolov1 loading anymore, but I thought I would bring that to your attention.
To be clear: there is v1.0 (without batch-norm), v1.1 (with batch-norm), and v2 (YOLO9000). Which one are you referring to? It might be this one.
Just to update you on this: I'm training the weights you provided using v1.1/tiny-yolov1.cfg, with the 5-class modification I made above. The loss is around 2.2, and the outputs are not really valid. Will try to keep it going for one more day before I give up :) I had to disable the following assert on line 74 of loader.py to load tiny-yolov1.weights at all.
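For context, an "over-read" style assertion in a binary weight loader typically guards against reading past the end of the file. A generic sketch of that kind of check (not darkflow's actual loader.py):

```python
import struct

def read_floats(buf, offset, count):
    """Read `count` float32 values starting at byte `offset`,
    asserting we do not over-read past the end of the buffer."""
    end = offset + 4 * count
    assert end <= len(buf), f"Over-read: need {end} bytes, have {len(buf)}"
    return list(struct.unpack_from(f"<{count}f", buf, offset)), end

weights = struct.pack("<6f", *range(6))    # fake 6-float weight file
vals, offset = read_floats(weights, 0, 4)  # first layer: 4 floats
print(vals)    # [0.0, 1.0, 2.0, 3.0]
print(offset)  # 16
```

Skipping such a check silently misaligns every layer after the mismatch, which is consistent with the invalid outputs described above.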
Training YOLO can be a daunting task, especially for those with limited computational resources. I encourage you to go a little further.
It's odd that the training loss for tiny-yolov1.weights is in the same 1.8-2.0 region, yet it actually makes sensible detections. I do have a GTX 1070, so I'm doing a bit better than running purely on CPU. Will keep you posted tomorrow.
Getting some results that make sense now! YOLO is picking up cars in the dataset, although the bounding box is often drawn with an offset and with the wrong width/height.
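For anyone debugging similar offset/size symptoms, YOLOv1-style box decoding (my paraphrase of the paper's parameterization; with sqrt=1 the net predicts square roots of width/height relative to the image) goes roughly:

```python
def decode_box(col, row, side, pred, img_w, img_h):
    """Convert one YOLOv1 box prediction to pixel-space (cx, cy, w, h).
    pred = (x, y, sqrt_w, sqrt_h): x, y are offsets within cell (col, row);
    sqrt_w, sqrt_h are square roots of width/height relative to the image."""
    x, y, sw, sh = pred
    cx = (col + x) / side * img_w
    cy = (row + y) / side * img_h
    return cx, cy, (sw ** 2) * img_w, (sh ** 2) * img_h

# box centered in cell (3, 2) of a 7x7 grid on a 448x448 image, quarter-size
print(decode_box(3, 2, 7, (0.5, 0.5, 0.5, 0.5), 448, 448))  # roughly (224, 160, 112, 112)
```

A consistent offset usually points at the cell-index term; consistently wrong sizes point at the squaring (or a missing sqrt in the annotations).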
Make sure you are using Python 3, or convert your code appropriately, because there is a difference in integer/float division between Python 2 and Python 3 that can cause a consistent mislocation of bounding boxes.
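Concretely, the pitfall being described:

```python
# Python 3 semantics shown; under Python 2, `7 / 2` on ints truncates to 3.
print(7 / 2)   # 3.5  (true division in Python 3)
print(7 // 2)  # 3    (floor division, identical in both versions)

# Porting trick for a Python 2 codebase:
# from __future__ import division   # makes `/` behave like Python 3

# A grid-to-pixel conversion like this keeps its fractional offset in
# Python 3, but would silently truncate intermediate results in Python 2:
print((3 + 0.5) / 7 * 448)  # 224.0
```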
Yeesh, I'm fairly certain that I'm not using Python 3 at the moment. Will try that. In general, the bounding boxes seem to be very small, which may be caused by the small bounding-box annotations in the Udacity dataset (sometimes below 5 pixels in width or height). If that doesn't improve things, I'll move to Python 3.
It's not converging to the right solutions :( The boxes show up at roughly the right place, but the sizes are wrong. I'll put the code up on my fork for anyone to investigate!
That will rule out many possibilities. Debugging deep learning applications is not simple.
Overfitting did the trick!! Will post my results shortly. Thanks a lot for your help.
@thtrieu, here's my fork for training against the Udacity SDC dataset: https://github.com/y22ma/darkflow/tree/udacity Udacity employs a different annotation format than PASCAL VOC, so I hacked the dataset.py script to load the Udacity annotations using my own function. How would you like this to be handled?
Could you please say more about the theory behind step 3?
Hello there, I am really interested in using this library for training on my own datasets. I have some problems when trying to test a few images after training. Could you help me understand better how it works?
While testing I have the following output:
but on the test images it detects nothing. Do you have any idea what's wrong?
In what format should the annotations be? Is XML required, or are other formats acceptable?
@eugtanchik Add a print of boxes.probs in $DARKFLOW_ROOT/net/yolov2/test.py, and make sure your confidence is beyond the threshold.
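A generic version of that confidence check (plain NumPy, not the actual test.py internals):

```python
import numpy as np

def confident_detections(probs, threshold=0.1):
    """probs: (num_boxes, num_classes) class probabilities per box.
    Return (box_index, class_index, prob) for entries above threshold."""
    keep = np.argwhere(probs > threshold)
    return [(int(b), int(c), float(probs[b, c])) for b, c in keep]

probs = np.array([[0.05, 0.60], [0.02, 0.03]])
print(confident_detections(probs))  # [(0, 1, 0.6)]
```

If every entry falls below the threshold, the test run draws nothing even though the forward pass succeeded, which matches the symptom described above.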
Hi,
E:\Users\ZP\Desktop\Getdata>flow.py --model cfg/yolov2-tiny-voc.cfg --load bin/yolov2-tiny-voc.weights --savepb
Parsing ./cfg/yolov2-tiny-voc.cfg
A guideline to train against other datasets, such as the Udacity self-driving dataset, would be much appreciated.
Do I create a labels.txt in the root folder, and specify a model name outside of the coco_models and voc_models listed in darkflow/misc.py?
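For what it's worth, darkflow's labels.txt is one class name per line; a hypothetical one for the five Udacity classes (the exact names depend on the annotation file and are my guess here) could be:

```
car
truck
pedestrian
biker
trafficLight
```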