Various questions #13

Closed
abagshaw opened this issue Apr 13, 2018 · 8 comments

Comments

@abagshaw

abagshaw commented Apr 13, 2018

First off, thanks very much for all your work on this - a really clean and understandable implementation of YOLO 😄 I have a few questions (probably very basic) about getting started with training and inference with tensornets, specifically the YOLOv2 model:

  1. I want to train YOLOv2 on the Open Images dataset. How can the tensornets implementation of YOLOv2 be used with a custom number of classes (545 in my case)?
  2. After training, how would one save the weights to a file and load them back in later for inference?
  3. Can layers be frozen during training (i.e. only train the last 4 layers) similar to what one can do in darknet with stopbackward=1?
  4. Is there support for training the YOLOv2 model with a TPU? If not, is there a somewhat straightforward path to adding that functionality?
  5. Can YOLOv2 be trained and used for inference with FP16 precision to speed things up - and can the existing pretrained weights be converted to FP16 so as to not necessitate training from scratch?

Thanks very much!

@taehoonlee
Owner

Thank you for your attention, @abagshaw! Basically, the model functions always return a tf.Tensor, so you can manipulate the models with any regular TensorFlow APIs. My answers are:

  1. I am planning to release the loss function and training codes for YOLO in a week. The following is an example of defining your own YOLOv2 (I think I should write a simpler API 😂); a sketch tying answers 1-3 together follows after this list:
import tensornets as nets
from tensornets.utils import set_args
from tensornets.utils import var_scope

# Start from the reference YOLOv2-VOC options and override the class count.
opts = nets.references.yolo_utils.opts('yolov2voc')
opts['classes'] = 545

@var_scope('yolov2')
@set_args(nets.references.yolos.__args__)
def yolov2_for_you(x, is_training=False, scope=None, reuse=None):
    def _get_boxes(*args, **kwargs):
        return nets.references.yolo_utils.get_v2_boxes(opts, *args, **kwargs)
    # 125 = (20 classes + 5) * 5 anchors for VOC; see the follow-up below for 545 classes.
    x = nets.references.yolos.yolo(x, [1, 1, 3, 3, 5, 5], 125, is_training, scope, reuse)
    x.get_boxes = _get_boxes
    return x
  2. You can use a regular tf.train.Saver. Note that the saving and restoring mechanism used in TensorNets looks like this:
np.save('your_weights', sess.run(model.get_weights()))  # save (writes your_weights.npy)
sess.run([w.assign(v) for (w, v) in zip(model.get_weights(), np.load('your_weights.npy'))])  # load
  3. There are many ways to freeze weights, and I recommend:
train_list = model.get_weights()[6:]  # The first six weights are not going to be updated.
train = tf.train.AdamOptimizer().minimize(loss, var_list=train_list)
  4. I haven't tried it, but it seems easy to couple the models with a TPU (see the official example). If you don't mind, can you give it a try? 😄

  5. Very good point. I will add FP16 support. Thank you!
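
As a rough sketch of how answers 1-3 could fit together (this is not from the original answer; the 416x416 input size, the learning rate, and the dummy loss are illustrative assumptions, since the real YOLO loss is not released yet):

import numpy as np
import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 416, 416, 3])
model = yolov2_for_you(inputs, is_training=True)  # the custom model from answer 1

# Placeholder loss: replace with the YOLO loss once it is released.
loss = tf.reduce_mean(tf.square(model))

# Freeze the first six weights as in answer 3.
train_list = model.get_weights()[6:]
train = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=train_list)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training steps feeding `inputs` (and your ground-truth tensors) ...
    np.save('your_weights', sess.run(model.get_weights()))  # save as in answer 2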

@abagshaw
Author

@taehoonlee Thanks for the super quick response!

  1. Thanks for the example - makes sense. One other thing: shouldn't the number of filters be changed to 2750 given the number of classes?

I'll certainly see if I can get training up and running on a TPU when I have a chance after the loss function is implemented. Thanks again for all your work on this!

@taehoonlee
Owner

You are right @abagshaw, (# classes + 5) * # anchors = (545 + 5) * 5 = 2750. Thank you so much!
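
For reference, the corrected filter count can be computed and plugged into the custom model sketched above (only the variable names here are illustrative):

num_classes = 545
num_anchors = 5
filters = (num_classes + 5) * num_anchors  # (545 + 5) * 5 = 2750

# Inside yolov2_for_you, pass `filters` instead of the hard-coded 125:
# x = nets.references.yolos.yolo(x, [1, 1, 3, 3, 5, 5], filters, is_training, scope, reuse)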

@taehoonlee
Owner

@abagshaw, please see #14 for the FP16 support. The native operations have supported FP16 since TF >= 1.1.0, except that tf.nn.depthwise_conv2d, tf.nn.separable_conv2d, and tf.contrib.layers.batch_norm require TF >= 1.5.0. Therefore, with TF >= 1.5.0 you can simply define a tf.placeholder(tf.float16) and then call any of the network functions.
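
For example, a minimal sketch under that assumption (TF >= 1.5.0; the input shape is illustrative) only changes the placeholder dtype:

import tensorflow as tf

inputs = tf.placeholder(tf.float16, [None, 416, 416, 3])
outputs = yolov2_for_you(inputs)  # the custom model from above, now built in FP16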

@abagshaw
Author

@taehoonlee Awesome, thanks! Also, no pressure - but any progress on YOLO training? 😃

@taehoonlee
Owner

@abagshaw, Almost finished 🔥.

@taehoonlee
Owner

@abagshaw, I wrote and tested training codes based on darknet and darkflow, but learning did not progress beyond 5% mAP. Thus, I added a draft version first and will revise it by referring to a PyTorch implementation.

Would you have the bandwidth to review the draft, @abagshaw? I would appreciate it if you could find any reasons for the poor convergence 😭.

@abagshaw
Author

@taehoonlee Awesome, thanks so much! Right now I'm swamped with work so I don't see myself getting time to tinker around with it - but if a break comes up I'll certainly give it a shot.
