evaluating the trained model performance based on ckpt file #19
Comments
@YunYang1994 Hey, could you give some advice on these issues?
Have you solved it? I am running into the same problem.
Add _ = tf.Variable(initial_value='fake_variable') before saver = tf.train.Saver(); it works in my code. You can try it.
@WeifaGan Thanks a lot for sharing! I tried adding _ = tf.Variable(initial_value='fake_variable') before saver = tf.train.Saver(), but I get an error: NotFoundError (see above for traceback): Key Variable not found in checkpoint. Here is my question. When I use the previous version of train.py, it can run and create But when I use the latest version of train.py, it shows return object_mask, intersect_area, iou_scores, and this error comes up: Could you share your train.py or give some advice?
I ran into the errors you mentioned above.
@WeifaGan Hi, you are so helpful, thank you very much. For the IndexError, I tried commenting out return object_mask, intersect_area, iou_scores, but this error comes up: => loading yolov3/darknet-53/Conv_49/BatchNorm/gamma:0 During handling of the above exception, another exception occurred: Traceback (most recent call last): Caused by op 'yolov3/PyFunc_1', defined at: InvalidArgumentError (see above for traceback): ValueError: could not broadcast input array from shape (10,4) into shape (8,4) You can see there are two problems: one is that the training loss is NaN, and the other is the shape error. Do you have any advice?
@qiaomai89 For the second issue, I changed "true_boxes_batch[i][0][0][0][0:len(true_boxes_per_layer)] = true_boxes_per_layer" at around line 370 of yolo3.py as follows: After modifying it, I think you can run the code.
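Note on the shape error: the modified line from the comment above is not preserved in this thread, so the following is only a minimal NumPy sketch of why "could not broadcast input array from shape (10,4) into shape (8,4)" can occur and one possible way around it. It assumes the target buffer holds at most 8 boxes while an image has 10 ground-truth boxes. The names fill_boxes, max_boxes, buffer and boxes are hypothetical and do not come from yolo3.py, and truncating the extra boxes is an assumption, not necessarily the change that was actually made.

```python
import numpy as np


def fill_boxes(buffer, boxes):
    """Copy as many rows of `boxes` as fit into `buffer`; return the count copied.

    Hypothetical helper: extra boxes beyond the buffer capacity are dropped.
    """
    n = min(len(boxes), buffer.shape[0])
    buffer[:n] = boxes[:n]
    return n


max_boxes = 8  # assumed capacity of the per-image box buffer
buffer = np.zeros((max_boxes, 4), dtype=np.float32)
boxes = np.arange(40, dtype=np.float32).reshape(10, 4)  # 10 ground-truth boxes

# Slicing past the end of the buffer silently clamps to 8 rows,
# so assigning 10 rows into that slice raises the broadcast error.
try:
    buffer[0:len(boxes)] = boxes
except ValueError as err:
    print("direct assignment failed:", err)

# Clamping the row count on both sides avoids the mismatch.
copied = fill_boxes(buffer, boxes)
print("boxes copied:", copied)
```

An alternative under the same assumptions would be to raise the buffer capacity so that no ground-truth boxes are discarded; which option is appropriate depends on how the training code sizes that array.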
@WeifaGan You are really so nice, kind, patient and helpful! I really appreciate your help! Now I can run both training and testing. By the way, how does your model perform? Is it on par with the model trained on the same dataset using the author's original code (darknet)?
@qiaomai89 |
@WeifaGan Hi, I am training with both frameworks (darknet and tf) on the same dataset. If I get any results, I will let you know. And if you find anything, please share it with me. Thanks!
@WeifaGan Could you run testing on the ckpt files? I have tried two ways. First, I ran convert_weight.py and got three pb files, then ran nms_demo.py to see the jpg results, but nothing is drawn on the picture, and there is the log: The other way I tried was running test.py on the tfrecords data, and there is the log: Any advice about this one?
@qiaomai89 To make communication easier, I think we could connect on QQ or WeChat.
@WeifaGan you can add me wechat: 455741772 |
@WeifaGan Hi, I followed your suggested code change and also lowered score_thresh to 0.1, but I still got the same result as @qiaomai89 did: l GPU (device: 0, name: TITAN V, pci bus id: 0000:65:00.0, compute capability: 7.0) Did you solve this issue?
Hi,
When we run train.py, three ckpt files are saved. But how can we run these model files to test performance?
Traceback (most recent call last):
File "test.py", line 28, in <module>
saver = tf.train.Saver()
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1293, in __init__
self.build()
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1302, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1327, in _build
raise ValueError("No variables to save")
ValueError: No variables to save
I also tried running convert_weight.py --ckpt_file file --freeze, but it reports:
Traceback (most recent call last):
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [255] rhs shape= [33]
[[Node: save/Assign_349 = Assign[T=DT_FLOAT, _class=["loc:@yolov3/yolo-v3/Conv_6/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](yolov3/yolo-v3/Conv_6/biases, save/RestoreV2/_149)]]
Could you help me with this problem? Thanks a lot!
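Note on the second traceback: the shapes [255] and [33] are consistent with a class-count mismatch between the graph being built and the checkpoint being restored. The sketch below assumes the standard YOLOv3 head layout, where each detection layer outputs 3 anchors times (4 box coordinates + 1 objectness score + num_classes) channels. The function names are hypothetical and are not part of this repository.

```python
def yolo_head_channels(num_classes, anchors_per_scale=3):
    """Output channels of one detection layer, assuming the standard YOLOv3 layout."""
    return anchors_per_scale * (num_classes + 5)


def classes_from_channels(channels, anchors_per_scale=3):
    """Invert yolo_head_channels: recover the class count from a bias shape."""
    per_anchor, remainder = divmod(channels, anchors_per_scale)
    if remainder != 0 or per_anchor < 5:
        raise ValueError("channel count does not fit the assumed layout")
    return per_anchor - 5


# lhs shape [255] is the variable in the graph; rhs shape [33] is the value
# stored in the checkpoint (as reported by the Assign op in the traceback).
graph_classes = classes_from_channels(255)
ckpt_classes = classes_from_channels(33)
print("graph built for", graph_classes, "classes")
print("checkpoint trained with", ckpt_classes, "classes")
```

Under that assumption, 255 corresponds to 80 classes (the COCO default) and 33 to 6 classes, which suggests the conversion script built the graph with a different class count than the one used for training. Building the graph with the same number of classes as the training data would likely make the shapes match.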