# TensorFlow. Word Generation with LSTM (existing model)

Word generation is a natural language processing (NLP) task in which a model produces new, plausible-looking words. TensorFlow, Google's open-source machine learning framework, provides the tools to build and train such models. This notebook loads an existing LSTM model together with its tokenizer and uses them to generate words one character at a time.

![](./assets/cover.jpg)

Blog Post: [TensorFlow. Word Generation with LSTM](https://vitalyzhukov.com/en/tensorflow-word-generation-with-lstm)

# Prerequisites

To work with TensorFlow models, you will need two open-source Python libraries: NumPy (Numerical Python) and TensorFlow.

In [None]:
!pip install tensorflow
!pip install numpy
!pip install matplotlib

Let's import the required packages:

In [None]:
# Import TensorFlow
import tensorflow as tf

# Import numpy
import numpy as np

Load the existing model:

In [None]:
model = tf.keras.models.load_model("./model/model_word_generation.keras")

Load the tokenizer instance that was saved with the model:

In [None]:
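# Read the tokenizer configuration and close the file afterwards
with open("./model/model_tokenizer.json", encoding="utf-8") as tokenizer_file:
    tokenizer = tf.keras.preprocessing.text.tokenizer_from_json(tokenizer_file.read())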
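
To make the tokenizer's role concrete, here is a minimal pure-Python sketch of what character-level encoding followed by left ("pre") padding produces. It assumes the tokenizer maps each character to an integer index and reserves 0 for padding. The vocabulary `toy_index` and the helpers `encode` and `pad_pre` are hypothetical stand-ins, not the contents of `model_tokenizer.json`; the notebook itself relies on `texts_to_sequences` and `pad_sequences`.

```python
# Hypothetical toy vocabulary; index 0 is assumed to be reserved for padding.
toy_index = {"a": 1, "b": 2, "c": 3, "<END>": 4}


def encode(text, index):
    """Map each known character to its integer index, skipping unknown ones."""
    return [index[ch] for ch in text if ch in index]


def pad_pre(sequence, maxlen, value=0):
    """Left-pad (or left-truncate) a sequence to exactly maxlen items."""
    sequence = list(sequence)
    if len(sequence) > maxlen:
        # keep only the most recent maxlen items
        sequence = sequence[len(sequence) - maxlen:]
    return [value] * (maxlen - len(sequence)) + sequence


encoded = encode("cab", toy_index)
padded = pad_pre(encoded, maxlen=6)
print(encoded)  # [3, 1, 2]
print(padded)   # [0, 0, 0, 3, 1, 2]
```

Under the same assumptions, an empty prefix becomes a sequence of zeros only, which is the input the model receives when generation starts from `""`.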

In [None]:
def predict_next_character(prefix, crazy_index: int):
    """Predict the next character for a given prefix.

    :param prefix: Existing part of the word
    :param crazy_index: Number of top-ranked candidate characters to sample from;
        None or 0 always picks the single most likely character.
    :return: Token index of the predicted character
    """
    encoded = tokenizer.texts_to_sequences([prefix])
    encoded = tf.keras.utils.pad_sequences(encoded, maxlen=50, padding="pre")
    predicted_characters = np.asarray(model.predict(encoded, verbose=0, batch_size=1)[0]).astype('float64')
    
    if crazy_index is None or crazy_index == 0:
        return np.argmax(predicted_characters)
    else:
        # cap crazy_index at the number of available characters
        crazy_index = min(crazy_index, len(predicted_characters))

        # get the top {crazy_index} candidate characters
        candidate_args = np.argsort(predicted_characters, axis=0)[-crazy_index:]
        probas = np.take(predicted_characters, candidate_args)

        # randomly pick one of the top candidates, weighted by a softmax over arctan(probability)
        weights = np.exp(np.arctan(probas))
        probas = np.random.multinomial(1, weights / np.sum(weights), 1)
        
        return candidate_args[np.argmax(probas)]
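
The sampling step can be tried in isolation on a made-up probability vector. The sketch below mirrors the top-k selection and the arctan-softmax weighting used in `predict_next_character`. The values in `toy_probs`, the helper name `sample_top_k`, and the seeded `np.random.default_rng` generator are assumptions for illustration only, and `rng.choice` stands in for the `np.random.multinomial` plus `np.argmax` pair (both draw one candidate according to the weights).

```python
import numpy as np


def sample_top_k(probs, k, rng):
    """Sample one index from the k most likely entries of probs."""
    probs = np.asarray(probs, dtype="float64")
    k = min(k, len(probs))
    candidates = np.argsort(probs)[-k:]        # indices of the k largest values
    weights = np.exp(np.arctan(probs[candidates]))
    weights = weights / weights.sum()          # normalize to a distribution
    return int(rng.choice(candidates, p=weights)), weights


toy_probs = [0.05, 0.70, 0.10, 0.15]           # hypothetical model output
rng = np.random.default_rng(seed=0)

index, weights = sample_top_k(toy_probs, k=3, rng=rng)
print(index)    # one of 1, 2, or 3
print(weights)  # flatter than the raw probabilities
```

Because arctan maps probabilities in [0, 1] to roughly [0, 0.79], the resulting weights are spread fairly evenly across the candidates. In this sketch the most likely character keeps less than half of the weight, which suggests that larger `crazy_index` values make the generated words noticeably more random.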

In [None]:
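def generate_words(prefix, no_words: int, crazy_index: int):
    """Generate words that start with the given prefix.

    :param prefix: Starting characters shared by every generated word (may be empty)
    :param no_words: Number of words to generate
    :param crazy_index: Number of top-ranked candidate characters to sample from
        at each step; None or 0 always picks the single most likely character.
    :return: List of generated words
    """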
    # maximum number of characters appended to the prefix
    max_text_length = 20
    meta_token = tokenizer.word_index["<END>"]
    words = []

    for _ in range(no_words):
        word_prefix = prefix
        for _ in range(max_text_length):
            predicted_character = predict_next_character(word_prefix, crazy_index)
            
            # stop if the next character is the meta token that represents the end of a word
            if predicted_character == meta_token:
                break
            
            # convert the predicted token index to its character
            predicted_char = tokenizer.sequences_to_texts([[predicted_character]])
            
            # append the character to the result word
            word_prefix = word_prefix + predicted_char[0]
                
        words.append(word_prefix)

    return words

In [None]:
generate_words("", 10, 3)
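
Because `predict_next_character` draws from NumPy's global random state through `np.random.multinomial`, the cell above may return different words on each run whenever `crazy_index` is nonzero. If repeatable output is wanted, one option is to seed that global state before generating. The sketch below shows the effect on a standalone multinomial draw; the seed value 42, the weights, and the helper name `draw_sequence` are arbitrary choices for illustration.

```python
import numpy as np

weights = [0.5, 0.3, 0.2]  # arbitrary example weights that sum to 1


def draw_sequence(seed, n=5):
    """Seed NumPy's global generator, then draw n one-hot samples."""
    np.random.seed(seed)
    return [int(np.argmax(np.random.multinomial(1, weights))) for _ in range(n)]


first = draw_sequence(42)
second = draw_sequence(42)
print(first == second)  # True: same seed, same draws
```

In the notebook, this would amount to calling `np.random.seed(...)` immediately before `generate_words(...)`.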