Fix cuDNN LSTM implementation selection with LoadSavedModel C++ API. #56525
Conversation
Hi @penpornk Can you please review this PR? Thank you!

Hi @ezhulenev Can you please review this PR? Thank you!

@API92 Can you please check the build failures? Thank you!

@gbaned Fixed.

Hi @ezhulenev Can you please review this PR? Thank you!

Adding @reedwm since @ezhulenev is on vacation for a while.

Hi @API92, can you please resolve the conflicts? Thank you!
(Force-pushed from 95bdb1d to e87a7c7.)
If I save a tf.keras.layers.LSTM layer with _could_use_gpu_kernel=True into the SavedModel format with tf.saved_model.save, then the cuDNN kernel is used when I load this model with tf.saved_model.load, and it runs fast. But if I load the same model from C++ with the tensorflow::LoadSavedModel function, the cuDNN kernel is not used and inference is slow.
Here is a colab demonstrating the issue: https://colab.research.google.com/drive/16WN0sqOoL37M7-5XMhGb-irkRX7fh503?usp=sharing . If the model is loaded with tf.saved_model.load, the tf.keras.layers.LSTM and tf.raw_ops.CudnnLSTM layers both take about 100 ms. But if the model is loaded with LoadSavedModel from C++, tf.keras.layers.LSTM takes about 275 ms, while tf.raw_ops.CudnnLSTM still takes 100 ms.
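For context, the Python side of the repro can be sketched as below. This is a minimal illustration, not the exact colab code: the shapes, save path, and layer size are made up for the example. The LSTM is left at its default settings (tanh activation, sigmoid recurrent activation, no masking tricks), which is what makes Keras set _could_use_gpu_kernel=True and dispatch to the cuDNN kernel on GPU.

```python
import numpy as np
import tensorflow as tf

# Build an LSTM with cuDNN-compatible defaults; under these settings
# Keras marks the layer as eligible for the fused cuDNN kernel.
inputs = tf.keras.Input(shape=(16, 8))
outputs = tf.keras.layers.LSTM(32)(inputs)
model = tf.keras.Model(inputs, outputs)

# Export to the SavedModel format.
tf.saved_model.save(model, "/tmp/lstm_savedmodel")

# Reloading through the Python API keeps the implementation-selection
# metadata, so the cuDNN kernel is picked when a GPU is available.
# Loading the same directory via tensorflow::LoadSavedModel from C++
# is where the PR reports the cuDNN selection being lost.
reloaded = tf.saved_model.load("/tmp/lstm_savedmodel")
y = reloaded(tf.constant(np.zeros((1, 16, 8), dtype=np.float32)))
```

On a CPU-only machine this still runs, but silently falls back to the generic kernel, so the timing gap described above only shows up on GPU.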
There were several problems in the FunctionOptimizer, ImplementationSelector, and MetaOptimizer Grappler optimizers: