
How to speed up text generation in TensorFlow reference example notebook? #39654

Closed
zredlined opened this issue May 18, 2020 · 9 comments · Fixed by tensorflow/text#626
Labels: comp:gpu (GPU related issues), TF 2.1 (for tracking issues in 2.1 release), type:performance (Performance Issue)

Comments

@zredlined

The tensorflow official example for text generation (https://github.com/tensorflow/docs/blob/master/site/en/tutorials/text/text_generation.ipynb) runs in a loop as defined below. The text generation feels slow, and according to NVTOP only uses a fraction of the available GPU resources (15-20%).

def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

Do you have any suggestions on how I can speed this up? Or parallelize it by generating multiple examples at the same time? A quick look with cProfile shows that 90% of the time is spent on the single line predictions = model(input_eval), so this is where we'd most likely find a speedup. Would appreciate any advice, and happy to submit a PR if I'm able to speed it up!
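For reference, the per-step sampling that the tutorial's loop performs (temperature-scale the logits, then draw one index from the resulting categorical distribution, as tf.random.categorical does) can be sketched in plain Python. This is only an illustration of the math, not TensorFlow code; softmax and sample_next_id are hypothetical helper names:

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_id(logits, temperature=1.0, rng=random.random):
    # Temperature-scale the logits, then draw one index from the
    # resulting categorical distribution. Lower temperature sharpens
    # the distribution toward the argmax; higher flattens it.
    probs = softmax([x / temperature for x in logits])
    r = rng()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

Note that this per-character Python loop is inherently sequential, which is one reason the GPU sits mostly idle: each step launches a tiny amount of work.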

System information

Describe the current behavior
Text generation works fine, but feels slow. Using NVTOP it shows only 15% GPU utilization on average.

Describe the expected behavior
Hoping to speed up text generation by better leveraging the GPU

Standalone code to reproduce the issue
This issue can be replicated by running the standard TensorFlow text generation tutorial on Google Colaboratory with GPU

Other info / logs

[Screenshot: Screen Shot 2020-05-18 at 10:20:17 AM]

@zredlined zredlined added the type:performance Performance Issue label May 18, 2020
@Saduf2019 Saduf2019 added the TF 2.1 for tracking issues in 2.1 release label May 19, 2020
@Saduf2019
Contributor

@zredlined
Could you please share simple standalone code to replicate the issue, or if possible a Colab gist, so we can analyse the error?

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label May 19, 2020
@zredlined
Author

zredlined commented May 19, 2020

@Saduf2019 It is not an error; the code just does not efficiently leverage the GPU by default, and I'm hoping to find some advice on speeding it up. You can run the Colab here:

https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/text/text_generation.ipynb

The line I'm hoping to speed up is in the generate_text() function above:
predictions = model(input_eval)

@Saduf2019 Saduf2019 added comp:gpu GPU related issues and removed stat:awaiting response Status - Awaiting response from author labels May 20, 2020
@zredlined
Author

@jvishnuvardhan It seems to me the challenge is getting parallelization for the GPU, while maintaining statefulness of the LSTM to predict the next character in the sentence.

Perhaps I can batch several lines to generate at once into the model.predict() while maintaining individual LSTM state per line in the batch? Or load multiple models as workers? Any suggestions or pseudocode would be much appreciated!

@zredlined
Author

Any suggestions here? It would be acceptable to generate multiple texts simultaneously to use the GPU more effectively. Any insights would be appreciated.

@gowthamkpr

@zredlined Try batching several lines to generate at once in model.predict() while maintaining individual LSTM state per line in the batch, and let us know whether it speeds things up.
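One way to see why per-line state can survive batching: in a recurrent layer the state tensor has one row per batch element, and each row is updated only from that row's own input. A toy pure-Python sketch (rnn_step and batched_step are hypothetical stand-ins, not Keras API):

```python
def rnn_step(state_row, token_id):
    # Hypothetical recurrent cell standing in for one LSTM step:
    # the new state depends only on this row's old state and input.
    return (state_row * 31 + token_id + 1) % 97

def batched_step(states, token_ids):
    # One batched forward pass. Row i of the state list is updated
    # from sequence i's input only, so every line in the batch keeps
    # its own independent recurrent state, just as a [batch, units]
    # state tensor does.
    return [rnn_step(s, t) for s, t in zip(states, token_ids)]
```

Under this model, running N sequences in one batch gives the same per-sequence states as running them one at a time, which is the property the batched-generation idea relies on.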

@gowthamkpr gowthamkpr added the stat:awaiting response Status - Awaiting response from author label May 31, 2020
@zredlined
Author

@gowthamkpr thanks! I can't figure out how to maintain LSTM state per line in the batch. The model.predict() appears to just update a single LSTM state after processing each line in the batch. Any suggestions on how to do this?

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Jun 3, 2020
@gowthamkpr gowthamkpr assigned sanjoy and unassigned gowthamkpr Jun 8, 2020
@gowthamkpr gowthamkpr added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jun 8, 2020
@MarkDaoust MarkDaoust self-assigned this Jun 24, 2020
@MarkDaoust
Member

MarkDaoust commented Jun 24, 2020

I'm working on this, for other reasons, but I'll try to fix this at the same time.
It may take a little while to land, but wrapping that in a tf.function and batching the inputs should give a good speedup.

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jun 26, 2020
@Saduf2019
Contributor

@zredlined
Could you please check on tf 2.4.1 and let us know if you still face this issue.

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Apr 29, 2021
@MarkDaoust
Member

I got the tf.function implementation working in that tutorial.

The tf.function only runs one step at a time, so it's still not ideal.

In this commit I fixed the NMT-with-attention tutorial to tf.function-compile the whole loop, with batched inputs. That should be even faster.

tensorflow/docs@9e18593

That commit got rolled-back because of 2.4/2.5 incompatibilities, but I'm planning to resubmit it as soon as tf 2.5 is released.
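Compiling the whole loop with batched inputs amounts to something like the following. This is a toy pure-Python sketch of the control flow only, not the tutorial's TF implementation; generate_batch, step_fn, and toy_step are hypothetical names:

```python
def generate_batch(step_fn, start_ids, num_steps):
    # Generate num_steps tokens for every sequence in the batch inside
    # one loop. step_fn is a hypothetical stand-in for one model
    # forward pass: (state_row, token_id) -> (new_state_row, logits).
    states = [None] * len(start_ids)
    current = list(start_ids)
    generated = [[] for _ in start_ids]
    for _ in range(num_steps):
        for i, token in enumerate(current):
            states[i], logits = step_fn(states[i], token)
            # Greedy argmax for brevity; the tutorial samples from
            # tf.random.categorical instead.
            current[i] = max(range(len(logits)), key=logits.__getitem__)
            generated[i].append(current[i])
    return generated
```

In real TF code, the inner per-sequence loop disappears: one batched model call updates every row of the state tensor at once, and wrapping the outer loop in tf.function keeps the whole thing on the GPU instead of bouncing back to Python each step.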

@Saduf2019 Saduf2019 removed the stat:awaiting response Status - Awaiting response from author label Apr 30, 2021
tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 370250185
7 participants