
Conversation

Viktor-Nilsson

Added an optional parameter that allows passing a path to a checkpoint file when calling object_detector.create().
If a checkpoint path is passed, the underlying tf.keras.Model will load the model weights from the checkpoint before training starts.
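For illustration, a minimal usage sketch (the parameter name load_checkpoint_path is assumed from the branch name and the later discussion, and the data paths and checkpoint prefix are placeholders; the exact signature may differ):

from tflite_model_maker import model_spec, object_detector

# Standard Model Maker setup with placeholder paths.
spec = model_spec.get('efficientdet_lite0')
train_data = object_detector.DataLoader.from_pascal_voc(
    'images/', 'annotations/', label_map={1: 'person'})

# load_checkpoint_path is the new option proposed in this PR (name assumed);
# weights would be restored from the checkpoint before training starts.
model = object_detector.create(
    train_data,
    model_spec=spec,
    epochs=50,
    train_whole_model=True,
    load_checkpoint_path='model_dir/ckpt-100',  # hypothetical checkpoint prefix
)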


@bhack left a comment

Can you extend the test to cover this new option?

@Viktor-Nilsson
Author

@MarkDaoust
I'm not sure how to do that in a good way. The existing test uses a randomly generated .jpg (to avoid binary files in the git repo?).

I could add a new test case that loads one of my existing trained checkpoints, evaluates the model, and verifies that the weights were actually loaded by checking that the AP is high enough.
Adding a checkpoint file for EfficientDet-Lite0 to the git repo is not so nice, however, since it is ~33 MB of binary data.
Thoughts?
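Roughly, such a test could look like this (just a sketch meant to sit inside the existing test class; the checkpoint location, the AP threshold, the self.train_data / self.validation_data fixtures, and the load_checkpoint_path name are placeholders, not existing helpers):

from tflite_model_maker import model_spec, object_detector

def test_create_with_checkpoint(self):
    # Placeholder path; a real test would need a small checkpoint checked into testdata/.
    checkpoint_path = 'testdata/efficientdet_lite0/ckpt-100'
    spec = model_spec.get('efficientdet_lite0')
    model = object_detector.create(
        self.train_data,
        model_spec=spec,
        epochs=1,
        load_checkpoint_path=checkpoint_path,  # option under test (assumed name)
    )
    metrics = model.evaluate(self.validation_data)
    # If the pretrained weights were actually restored, AP should be well above random.
    self.assertGreater(metrics['AP'], 0.5)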

@MarkDaoust
Member

I'm not sure why the CODEOWNERS file didn't assign Khanh and Lu directly. They're the real owners here.

@bhack

bhack commented Jan 18, 2022

> I'm not sure why the CODEOWNERS file didn't assign Khanh and Lu directly. They're the real owners here.

Because the pattern is wrong. Here we are in a subdirectory that is only covered by your global rule.

@MarkDaoust
Member

Oh, right. I'll send a fix for that.

@khanhlvg
Member

@ziyeqinghan Could you take a look?

@ThuhinSatheesh

ThuhinSatheesh commented Feb 21, 2022

I tried training a model from a checkpoint, but the losses returned NaN values during training. Is there a way around this, or am I doing something wrong?
[screenshot attached: error]

@MarkDaoust MarkDaoust requested review from khanhlvg and removed request for MarkDaoust and wolffg February 22, 2022 20:33
@IvanColantoni

IvanColantoni commented Apr 7, 2022

Hi, and thanks for this.
I would add that loading weights with the model.load_weights() method didn't work in my case.
I restored the checkpoint from model_dir by importing restore_ckpt with
from tensorflow_examples.lite.model_maker.third_party.efficientdet.keras.util_keras import restore_ckpt
in the object_detector_spec.py file and calling it in the if block before the model.fit() call, as you suggested:

if load_checkpoint_path is not None:
    # restore_ckpt comes from efficientdet's util_keras; plain model.load_weights() did not work here.
    restore_ckpt(model, load_checkpoint_path)

From what I understand, this is because checkpoints for EfficientDetNetTrainHub are different and need a custom function to restore them correctly. Not sure about it, though.
Be sure that the load_checkpoint_path directory contains the ckpt-xx.data-* and ckpt-xx.index files, plus a plain-text file named checkpoint pointing at the checkpoint number you want to restore, e.g.:

From my terminal, in the model_dir path, cat checkpoint gives:

model_checkpoint_path: "ckpt-100"
all_model_checkpoint_paths: "ckpt-100"
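
As a quick sanity check (just a sketch using standard TensorFlow APIs; model_dir is a placeholder path), you can resolve and inspect the checkpoint before passing it on:

import tensorflow as tf

ckpt_dir = 'model_dir'  # placeholder: directory containing ckpt-xx.* and the checkpoint file

# Resolves the latest prefix by reading the plain-text checkpoint file, e.g. 'model_dir/ckpt-100'.
latest = tf.train.latest_checkpoint(ckpt_dir)
print(latest)

# List the variables stored in the checkpoint to confirm it is readable.
for name, shape in tf.train.list_variables(latest):
    print(name, shape)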

@imneonizer

Since there is no option to create issues, I just have a question: how do I do multi-GPU training using TFLite Model Maker?
https://github.com/tensorflow/examples/blob/master/tensorflow_examples/lite/model_maker/core/task/object_detector.py#L73-L75

@grewe

grewe commented Oct 26, 2022

This does not seem to be in the actual code, yet I see a commit here. What is the status?

@Viktor-Nilsson
Author

Closing this PR since it was reported not to work for others who attempted to use the code, and I have no capacity to investigate it further.

@Viktor-Nilsson Viktor-Nilsson deleted the vnilsson_load_checkpoint branch October 27, 2022 07:30
@Bede-sv

Bede-sv commented Jan 11, 2023

@Viktor-Nilsson This worked for me when I tried it.

@justingrayston

I'd be keen to get this supported too, as I'm sure many others would: the ability to keep improving your own custom model is key, without wasting GPU time retraining on data you've already trained on before.
