Description
Describe the feature and the current behavior/state.
Hi TFA team, the current seq2seq beam search decoder implementation takes `beam_width` and `maximum_iterations` at decoder construction time (`__init__()`) rather than at call time (`__call__()`). This makes it difficult to tune `beam_width` once a model is trained. In practice, it is a common request that, given a trained model, we would like to tune `beam_width` and `maximum_iterations` for the best inference latency and quality without retraining or tweaking the model.
Can you support this request? Or is there a simple way to achieve this with the existing API?
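One possible workaround under the current API (just a sketch; `decode_with` is a hypothetical helper, and it assumes the trained weights live in `cell`, `output_layer`, and `embedding` from the snippet above): since the decoder object holds no weights of its own, it can be rebuilt around the same trained layers for each candidate setting:

```python
def decode_with(beam_width, maximum_iterations):
    # Hypothetical helper: rebuild the decoder (cheap, no retraining)
    # around the already-trained cell / output_layer / embedding.
    dec = tfa.seq2seq.BeamSearchDecoder(
        cell,
        beam_width=beam_width,
        output_layer=output_layer,
        maximum_iterations=maximum_iterations,
    )
    init = cell.get_initial_state(
        batch_size=batch_size * beam_width, dtype=tf.float32)
    return dec(embedding, start_tokens=start_tokens,
               end_token=end_token, initial_state=init)

# Sweep beam widths at inference time to trade off latency vs. quality.
for bw in (1, 3, 5, 10):
    outputs, _, _ = decode_with(bw, maximum_iterations=50)
    print(bw, outputs.predicted_ids.shape)
```

But this only works while Python can rebuild the decoder; it does not help once the model is exported as a SavedModel for TF Serving or JNI, which is exactly the case this request targets.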
Relevant information
- Are you willing to contribute it (yes/no): no
- Are you willing to maintain it going forward? (yes/no): no
- Is there a relevant academic paper? (if so, where): no
- Is there already an implementation in another framework? (if so, where): no
- Was it part of tf.contrib? (if so, where): no
Which API type would this fall under (layer, metric, optimizer, etc.)
Addons - Seq2seq
Who will benefit from this feature?
TensorFlow Addons users who want to tune `beam_width` as an input parameter at inference time (TF Serving, JNI, etc.), instead of tweaking models
Any other info.