cannot save/restore contrib.learn.DNNClassifier #3340

Closed
llealgt opened this issue Jul 16, 2016 · 28 comments

@llealgt commented Jul 16, 2016

Hi, I've been struggling for some days trying to save a contrib.learn.DNNClassifier and I'm getting desperate. Can you help me? I tried everything the official documentation says, but it seems the documentation isn't consistent with the API. Things I have tried:

  • tf.train.Saver(), but I got a "No variables to save" error
  • Tried the examples at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/learn/python/learn, created my DNNClassifier, but when trying to call save() on my classifier it says 'DNNClassifier' object has no attribute 'save'
  • Tried the deprecated class TensorFlowDNNClassifier; it can be saved fine, but when you try to restore it, it says there's no model to restore
  • Tried to restore the saved TensorFlowDNNClassifier with Estimator.restore(), but it says that Estimator has no attribute restore

Is there a way to save and restore a DNNClassifier? This question has been asked multiple times on Stack Overflow and in https://gitter.im/tensorflow/skflow.

I would be very thankful if you can help me.

@terrytangyuan (Member) commented Jul 16, 2016

@llealgt Thanks for reporting. I believe you are not able to restore at this moment. There are checkpoint loading util functions but they are not integrated with estimators yet.
@martinwicke Any ideas on what's the timing on those? Many people are having this issue.

@llealgt (Author) commented Jul 18, 2016

Hello guys, any news on this? :)

@terrytangyuan (Member) commented Jul 19, 2016

Actually, I missed this earlier, but you can restore by specifying the same model_dir when you call the constructor, and it will load the saved model for you. Let me know if that solves your issue. Thanks.
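A minimal sketch of that pattern (assuming the r0.9-era contrib.learn API; x_train, y_train, the column definition, and the layer sizes are hypothetical):

import tensorflow as tf
from tensorflow.contrib import learn

feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

# Training: everything is checkpointed under model_dir.
classifier = learn.DNNClassifier(feature_columns=feature_columns,
                                 hidden_units=[10, 10],
                                 n_classes=3,
                                 model_dir="/tmp/my_dnn_model")
classifier.fit(x=x_train, y=y_train, steps=100)

# Later (even in a new process): constructing with the same arguments and the
# same model_dir picks up the latest checkpoint automatically.
restored = learn.DNNClassifier(feature_columns=feature_columns,
                               hidden_units=[10, 10],
                               n_classes=3,
                               model_dir="/tmp/my_dnn_model")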

@SuperJonotron commented Jul 20, 2016

I'm similarly trying to figure this out, using the DNNClassifier example here:
https://www.tensorflow.org/versions/r0.9/tutorials/tflearn/index.html

It looks like the answer currently provided only applies to the deprecated TensorFlowDNNClassifier (and to restoring only) and does not address the initial question of saving and restoring the new DNNClassifier.

Is the functionality for the DNNClassifier to be saved and restored currently in place? If so, could we see an example and/or be pointed to the documentation where this is explained?

@terrytangyuan (Member) commented Jul 20, 2016

Check the master version of the docs and use what I described above.

@SuperJonotron commented Jul 20, 2016

I couldn't find anything in the docs that actually explains the behavior, but it looks like as long as you pass model_dir in the constructor, the model is automatically saved and/or restored with nothing else needing to be done.

@terrytangyuan (Member) commented Jul 20, 2016

Yeah, it needs to be updated/clarified.


@llealgt (Author) commented Jul 20, 2016

Thank you guys, I tested it and it seems to work. My test: I created a DNNClassifier specifying model_dir in the constructor, called fit(), and used get_variable_value to get the weights of the last layer; then I created another DNNClassifier using the same model_dir in the constructor, used get_variable_value again, and the same weights were printed. So it worked as you said, but now I have some additional questions.
In this case, from what I saw, you can "pause" and "resume" training using the generated files. What happens if the power goes off while you are training, or something else happens? Does it store multiple checkpoints and use the last one? Is creating a model, training it for, say, 100 steps, then restoring it with this method and training another 100 steps equivalent to training 200 steps from the beginning?

My other question: I trained using model_dir, then created another model with the same model_dir to restore it, but when I tried to predict with the restored model I got the error: ValueError: Either linear_feature_columns or dnn_feature_columns should be defined.
So I had to train 5 steps and then predict. Is there a way to restore and predict without having to do this "dummy training"? Can you get the total count of training steps using get_variable_value()? And the last one: can you create a multi-output classifier, i.e. send a list or array of outputs when you train and get back a list of the same size when you predict?

@martinwicke (Member) commented Jul 20, 2016

Right now, there isn't a way to restore and predict without running at least one step of training (which is a missing feature).

You can get the value of global_step (which is the total number of training steps).

We're working on a multi-headed classifier.
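For example, a minimal sketch (assuming classifier was constructed with the same model_dir as the trained model, and that the step counter is stored under the usual name global_step):

# Read the persisted training-step counter from the checkpointed estimator.
current_step = classifier.get_variable_value("global_step")
print("trained for %d steps so far" % current_step)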

@llealgt (Author) commented Jul 20, 2016

Thank you guys! These answers will help me with my current task. Greetings!

@michaelisard commented Jul 25, 2016

Closing for now. @martinwicke please open a separate issue if you want to track the feature of restoring without predicting.

@IncubatedChun commented Sep 28, 2016

I am using v0.10. I did learn.DNNRegressor(..., model_dir='some path') and then
new_regressor = learn.DNNRegressor('path where the model is') and got the error

tensorflow.contrib.learn.python.learn.estimators._sklearn.NotFittedError: Couldn't find trained model at /var/folders/yf/gdqcvwpd67j98_zn92qy3bl80000gn/T/tmpjp19wmrx.

while calling new_regressor.predict(some data).
I checked the path and the model is indeed saved.

@npakhomova commented Nov 29, 2016

Hi, I've run into the same problem.
The only Estimator that has save/restore methods is TensorFlowEstimator, which is deprecated, and its restore method throws NotImplementedError.
@terrytangyuan Thank you for the advice to use the same model_dir for restoring the model.
But I ran into a problem: when restoring the model from a directory, it is also necessary to set (somehow) self._targets_info to initialize the target variable in the tensorflow.contrib.learn.python.learn.estimators.estimator.Estimator#_get_predict_ops method.

The only way I've found to do this is to call the classifier.fit method with steps=0.

So this is how my restore code looks:

classifier = learn.Estimator(model_fn=myModel, model_dir=modelPath)
classifier.fit(train, target, steps=0)
classifier.predict(textForClassification, as_iterable=True)

train can be empty, but its dimensions must match the original; target must contain the target classes for classification.

@jony0917 (Contributor) commented Apr 12, 2017

I have an extended question: how can I restore a DNNClassifier from a checkpoint?

@martinwicke (Member) commented Apr 12, 2017

Just give the directory containing the checkpoint to its constructor.

@rjpg commented Apr 23, 2017

Using classifier = learn.DNNClassifier, how can we get a saved_model.pb instead of graph.pbtxt?

In other words, using learn.DNNClassifier, how can we save the model in Python and then use the files in TensorFlow for Java (using SavedModelBundle.load())?
This function requires a "saved_model.pb" or "saved_model.pbtxt" to be in the directory.

Defining a "model_dir" in DNNClassifier in Python does not produce a saved_model.pbtxt file; it generates a graph.pbtxt file, and even if I rename it to saved_model.pbtxt it will not open in TensorFlow for Java.

I have tested saving models to .pbtxt files and loading them in Java worked. But using
tensorflow.contrib.learn.DNNClassifier

I don't see how to get a saved_model.pbtxt file to load in Java ...

@martinwicke (Member) commented Apr 24, 2017

Use the export_savedmodel method to export a SavedModel. Look at, or post to, StackOverflow if you have trouble.

@rjpg commented Apr 24, 2017

Hello,

the problem with using the export method is understanding what serving_input_fn is and how to define it.

I still don't know what it is (if there is documentation about it I would be grateful).

I managed to use export with the following lines (I don't know for sure if this is correct):

tfrecord_serving_input_fn = tf.contrib.learn.build_parsing_serving_input_fn(layers.create_feature_spec_for_parsing(feature_columns))
classifier.export_savedmodel(export_dir_base="test", serving_input_fn=tfrecord_serving_input_fn, as_text=True)

Now the saved_model.pbtxt is loaded in Java.

using:

SavedModelBundle bundle= SavedModelBundle.load("/java/workspace/APIJavaSampleCode/tfModels/dnn/ModelSave","serve");

Then I had another problem: to execute the model we need to pass strings saying what the input and output "operations" are.

Someone told me (on YouTube) they are "input_example_tensor" for the input and "dnn/multi_class_head/predictions/probabilities" for the output. I don't know how they discovered this.
(Again, if there is documentation about this I would be grateful.)
So the code I have in Java for the moment is this:

SavedModelBundle bundle = SavedModelBundle.load("/java/workspace/APIJavaSampleCode/tfModels/dnn/ModelSave", "serve");
Session s = bundle.session();

double[] inputDouble = {1.0,0.7982741870963959,1.0,-0.46270838239235024,0.040320274521029376,0.443451913224413,-1.0,1.0,1.0,-1.0,0.36689718911339564,-0.13577379160035796,-0.5162916256414466,-0.03373651520104648,1.0,1.0,1.0,1.0,0.786999801054777,-0.43856035121103853,-0.8199093927945158,1.0,-1.0,-1.0,-0.1134921695894473,-1.0,0.6420892436196663,0.7871737734493178,1.0,0.6501788845358409,1.0,1.0,1.0,-0.17586627413625022,0.8817194210401085};
float[] inputfloat = new float[inputDouble.length];
for (int i = 0; i < inputfloat.length; i++) {
    inputfloat[i] = (float) inputDouble[i];
}
Tensor inputTensor = Tensor.create(new long[] {35}, FloatBuffer.wrap(inputfloat));

Tensor result = s.runner()
        .feed("input_example_tensor", inputTensor)
        .fetch("dnn/multi_class_head/predictions/probabilities")
        .run().get(0);

Now I have the following error on the Java side:

Exception in thread "main" org.tensorflow.TensorFlowException: Output 0 of type float does not match declared output type string for node _recv_input_example_tensor_0 = _Recv[_output_shapes=[[-1]], client_terminated=true, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=-5952653602839817343, tensor_name="input_example_tensor:0", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/cpu:0"]() at org.tensorflow.Session.run(Native Method) at org.tensorflow.Session.access$100(Session.java:48) at org.tensorflow.Session$Runner.runHelper(Session.java:285) at org.tensorflow.Session$Runner.run(Session.java:235) ...

Any suggestion to overcome this error (I hope it's the last challenging problem in executing a DNNClassifier, made in Python, from Java)?

Thanks in advance.
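A note on that error: build_parsing_serving_input_fn exports a graph whose input_example_tensor expects serialized tf.Example protos (DT_STRING), not raw floats, which matches the type mismatch in the message above. A minimal Python sketch of serializing one example (assuming a single real-valued feature column whose name is the empty string):

import tensorflow as tf

values = [1.0, 0.798, -0.462]  # hypothetical feature vector

# Serialize one tf.Example; the resulting bytes are what the exported graph's
# input_example_tensor (a DT_STRING tensor) should be fed, one string per example.
example = tf.train.Example(features=tf.train.Features(feature={
    "": tf.train.Feature(float_list=tf.train.FloatList(value=values)),
}))
serialized = example.SerializeToString()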

@cejudo commented May 17, 2017

Hi

I've been following the Estimator tutorial https://www.tensorflow.org/extend/estimators and it works, but now that I want to save the model I can't find a way to do it.
I am using export_savedmodel but I can't generate a serving input function. I've tried doing it this way:

from tensorflow.contrib.layers import create_feature_spec_for_parsing
feature_spec = create_feature_spec_for_parsing(feature_columns)
from tensorflow.contrib.learn.python.learn.utils import input_fn_utils
sif = input_fn_utils.build_parsing_serving_input_fn(feature_spec)
nn.export_savedmodel(export_dir_base='PATH', serving_input_fn=sif)

But I don't know what values to use for the feature_columns variable, because the tutorial says that our features are our input data.

Can you help me figure this out?

@rjpg commented May 17, 2017

In my script I have something like this, which I found on the net (and it works):

# Save model into a saved_model.pbtxt file (possible to load in Java)
tfrecord_serving_input_fn = tf.contrib.learn.build_parsing_serving_input_fn(layers.create_feature_spec_for_parsing(feature_columns))
classifier.export_savedmodel(export_dir_base="test", serving_input_fn=tfrecord_serving_input_fn, as_text=True)

@cejudo commented May 17, 2017

Thanks for the reply.

I checked the code lines, but in this one

tfrecord_serving_input_fn = tf.contrib.learn.build_parsing_serving_input_fn(layers.create_feature_spec_for_parsing(feature_columns))

what value does 'feature_columns' hold?
That's what I don't quite understand. In the classifier example we do have a variable called 'feature_columns' and it is the combination of 'wide_columns + deep_columns'.

Now, for the estimator example we don't have the 'feature_columns' variable. So I tried to use my input data as my features, as the tutorial says, but that doesn't seem to be working.

I also tried using

feature_columns=tf.contrib.learn.infer_real_valued_columns_from_input(input_data)

but that does not work either. I got the following error:

TypeError: Expected binary or unicode string, got {'': <tf.Tensor 'ParseExample/ParseExample:0' shape=(?, 9) dtype=float32>}

Am I doing something wrong?

@rjpg commented May 17, 2017

feature_columns = [tf.contrib.layers.real_valued_column("", dimension=train_inputs.shape[1])]

It's the size of the inputs (the dimension is the number of input features).
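Putting the pieces together, a minimal sketch (train_inputs is a hypothetical NumPy array of shape [num_examples, num_features], and classifier is an already-trained contrib.learn estimator):

import tensorflow as tf
from tensorflow.contrib import layers

# One real-valued column under the default (empty) name, wide enough for all features.
feature_columns = [layers.real_valued_column("", dimension=train_inputs.shape[1])]

# Serving input fn that parses serialized tf.Example protos at prediction time.
feature_spec = layers.create_feature_spec_for_parsing(feature_columns)
serving_input_fn = tf.contrib.learn.build_parsing_serving_input_fn(feature_spec)

# Writes saved_model.pbtxt (because as_text=True) under a timestamped subdirectory of "export".
classifier.export_savedmodel(export_dir_base="export",
                             serving_input_fn=serving_input_fn,
                             as_text=True)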

@xav12358 commented Jan 22, 2018

I also had trouble saving and freezing a DNNClassifier model. I tried to freeze the model generated in the TensorFlow tutorial:
https://www.tensorflow.org/get_started/estimator

The model generates these files in /tmp/iris_model:

    checkpoint
    eval
    events.out.tfevents.1516649318.xavier-OMEN-by-HP-Laptop
    graph.pbtxt
    model.ckpt-10.data-00000-of-00001
    model.ckpt-10.index
    model.ckpt-10.meta
    model.ckpt-1.data-00000-of-00001
    model.ckpt-1.index
    model.ckpt-1.meta

When I try to freeze the model with the freeze_graph tool in the TensorFlow source I get this error:

    python3 ./tensorflow/tensorflow/python/tools/freeze_graph.py /tmp/iris_model checkpoint '' doesn't exist!
@JackKZ commented Apr 4, 2018

Does anyone know how to restore a trained DNNClassifier and use it on a new data set, please? I tried this:

x={"x": np.array(test_set.data)}

with tf.Session() as sess:    
    saver = tf.train.import_meta_graph(base_dir+"Neural Net Final Dictionary/tmp checkpoints/"+'model.ckpt-5000.meta')
    saver.restore(sess,tf.train.latest_checkpoint(base_dir+"Neural Net Final Dictionary/tmp checkpoints/"))
    graph = tf.get_default_graph()
    sess.run(graph,feed_dict={'x:0':np.array(test_set.data)})

I got an error saying:

TypeError: Cannot interpret feed_dict key as Tensor: The name 'x:0' refers to a Tensor which does not exist. The operation, 'x', does not exist in the graph.

Does anyone know what's wrong here?

Thanks!

@martinwicke (Member) commented Apr 4, 2018

@JackKZ commented Apr 9, 2018

Thanks a lot Martin! That solves my problem perfectly!

@sharmavipin1608 commented Jul 3, 2018

I went through past comments, but I'm still having trouble trying to save and restore a trained model from this tutorial (https://www.tensorflow.org/get_started/get_started_for_beginners) to be used for prediction.

Could anyone please show the code on how to do it?
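A minimal sketch of the model_dir pattern from this thread, adapted to the tf.estimator API that tutorial uses (the feature column, layer sizes, and x_new are hypothetical):

import numpy as np
import tensorflow as tf

feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

# Passing model_dir makes the estimator checkpoint there during train() and
# reload from there whenever a new instance is created with the same arguments.
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[10, 10],
                                        n_classes=3,
                                        model_dir="/tmp/my_model")
# ... train as in the tutorial, e.g. classifier.train(input_fn=train_input_fn, steps=1000)

# Later / in another script: same constructor arguments, same model_dir.
# Prediction loads the latest checkpoint found in model_dir.
restored = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                      hidden_units=[10, 10],
                                      n_classes=3,
                                      model_dir="/tmp/my_model")

predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_new}, num_epochs=1, shuffle=False)
predictions = list(restored.predict(input_fn=predict_input_fn))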

@monk1337 commented Sep 7, 2018

I was facing many issues while restoring and saving; finally, here is a working IPython notebook with data:
https://github.com/monk1337/DNNClassifier-example
