cannot save/restore contrib.learn.DNNClassifier #3340

Closed
llealgt opened this Issue Jul 16, 2016 · 13 comments

8 participants

@llealgt
llealgt commented Jul 16, 2016

Hi, I have been struggling for some days trying to save a contrib.learn.DNNClassifier and I'm getting desperate. Can you help me? I tried everything in the official documentation, but the documentation does not seem consistent with the API. Things I have tried:

  • train.Saver(), but got a "No variables to save" error.
  • Tried the examples at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/learn/python/learn and created my DNNClassifier, but when I call save() on my classifier, it says 'DNNClassifier' object has no attribute 'save'.
  • Tried the deprecated class TensorflowDNNClassifier. It can be saved fine, but when you try to restore it, it says that there is no model to restore.
  • Tried to restore the saved TensorflowDNNClassifier with Estimator.restore(), but it says that Estimator has no attribute restore.

Is there a way to save and restore a DNNClassifier? This question has been asked multiple times on Stack Overflow and in https://gitter.im/tensorflow/skflow

I would be very thankful if you could help me.

@terrytangyuan
Member

@llealgt Thanks for reporting. I believe you are not able to restore at this moment. There are checkpoint loading util functions but they are not integrated with estimators yet.
@martinwicke Any ideas on what's the timing on those? Many people are having this issue.

@llealgt
llealgt commented Jul 18, 2016

Hello guys, any news on this? :)

@terrytangyuan
Member

Actually, I missed this earlier, but you can restore by specifying the same model_dir when you call the constructor, and it will load the saved model for you. Let me know if that solves your issue. Thanks.

@SuperJonotron

Similarly trying to figure this one out using the example here using a DNNClassifier:
https://www.tensorflow.org/versions/r0.9/tutorials/tflearn/index.html

It looks like the answer currently provided is only for deprecated TensorflowDNNClassifier (as it pertained to restore only) and does not address the initial question of saving and restoring the new DNNClassifier.

Is the functionality for the DNNClassifier to be saved and restored currently in place? If so, could we see an example and/or be pointed to the documentation where this is explained?

@terrytangyuan
Member

Check master version of the doc and use what I described above.

@SuperJonotron

Couldn't find anything in the docs that actually explains the behavior but it looks like as long as you have the model_dir in the constructor it is automatically saved and/or restored with nothing else needing to be done.
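
As a rough illustration of that behavior (plain Python, not TensorFlow code), the model_dir pattern can be sketched as below. TinyEstimator, the checkpoint.json file name, and the one-weight model are all hypothetical stand-ins; the real estimator's checkpoint format and internals are not described in this thread.

```python
import json
import os
import tempfile


class TinyEstimator:
    """Hypothetical stand-in for an estimator; NOT the contrib.learn API."""

    def __init__(self, model_dir):
        self.model_dir = model_dir
        self._path = os.path.join(model_dir, "checkpoint.json")  # assumed file name
        self.weight = 0.0
        self.global_step = 0
        # Restore automatically if a checkpoint already exists in model_dir.
        if os.path.exists(self._path):
            with open(self._path) as f:
                state = json.load(f)
            self.weight = state["weight"]
            self.global_step = state["global_step"]

    def fit(self, xs, ys, steps, lr=0.01):
        """Run `steps` gradient steps on y ~ weight * x, then save a checkpoint."""
        for _ in range(steps):
            grad = sum(2 * (self.weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            self.weight -= lr * grad
            self.global_step += 1
        os.makedirs(self.model_dir, exist_ok=True)
        with open(self._path, "w") as f:
            json.dump({"weight": self.weight, "global_step": self.global_step}, f)
        return self


model_dir = tempfile.mkdtemp()
first = TinyEstimator(model_dir).fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], steps=100)
restored = TinyEstimator(model_dir)  # same model_dir, no explicit restore call
```

In this toy, the second object picks up both the weight and the step counter from the directory, and calling fit() on it again continues counting from where the first run stopped.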

@terrytangyuan
Member

Yeah it needs to be updated/clarified

@llealgt
llealgt commented Jul 20, 2016

Thank you guys, I tested it and it seems to work. My test was: create a DNNClassifier specifying model_dir in the constructor, call fit(), use get_variable_value to get the weights of the last layer, create another DNNClassifier with the same model_dir in the constructor, and call get_variable_value again. The same weights were printed, so it worked as you said. But now I have some additional questions.
From what I saw, you can "pause" and "resume" training using the generated files. What happens if the power goes off or something else happens during training? Does it store multiple checkpoints and use the last one? Is creating a model and training it for, say, 100 steps, then restoring it with this method and training another 100 steps, equivalent to training 200 steps from the beginning?

My other question: I trained using model_dir, then created another model with the same model_dir to restore it, but when I tried to predict with the restored model I got the error: ValueError: Either linear_feature_columns or dnn_feature_columns should be defined.
So I had to train for 5 steps and then predict. Is there a way to restore and predict without this "dummy training"? Can you get the total count of training steps using get_variable_value()? And the last one: can you create a multi-output classifier, e.g. by passing a list or array of outputs when you train and then getting a list of the same size when you predict?

@martinwicke
Member
martinwicke commented Jul 20, 2016 edited

Right now, there isn't a way to restore and predict without running at least one step of training (which is a missing feature).

You can get the value of global_step (which is the total number of training steps).

We're working on a multi-headed classifier.

@llealgt
llealgt commented Jul 20, 2016

Thank you guys! These answers will help me with my current task. Greetings!

@michaelisard
Member

Closing for now. @martinwicke please open a separate issue if you want to track the feature of restoring without predicting.

@IncubatedChun

I am using v0.10. I created learn.DNNRegressor(..., model_dir='some path') and then
new_regressor = learn.DNNRegressor('path where the model is'), and got the error

tensorflow.contrib.learn.python.learn.estimators._sklearn.NotFittedError: Couldn't find trained model at /var/folders/yf/gdqcvwpd67j98_zn92qy3bl80000gn/T/tmpjp19wmrx.

while calling new_regressor.predict(some data).
I checked the path and the model is indeed saved there.

@npakhomova
npakhomova commented Nov 29, 2016 edited

Hi, I've run into the same problem.
The only Estimator that has save/restore methods is TensorFlowEstimator, which is deprecated, and its restore method throws NotImplementedError.
@terrytangyuan Thank you for advising to use the same model directory for restoring the model.
But I ran into another problem: when restoring a model from the directory, it is also necessary to set (somehow) self._targets_info to initialize the target variable in the tensorflow.contrib.learn.python.learn.estimators.estimator.Estimator#_get_predict_ops method.

I have found only one way to do it: call the classifier.fit method with steps=0.

So this is what my restore method looks like:

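from tensorflow.contrib import learn
# Rebuild the estimator over the same model_dir so the saved model is picked up.
classifier = learn.Estimator(model_fn=myModel, model_dir=modelPath)
# Workaround described above: fit with steps=0 so the target info gets set before predict.
classifier.fit(train, target, steps=0)
classifier.predict(textForClassification, as_iterable=True)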

train can be empty, but its dimensions must match those of the original training data; target must contain the target classes for classification.
