# PROJECT DESCRIPTION

1. Import the Required Libraries
    * random: For generating random numbers.
    * numpy (np): For numerical operations and array manipulation.
    * tensorflow (tf): Deep learning framework for building and training models.
    * Sequential: A linear stack of layers used to build the model.
    * LSTM, Dense, Activation, and Dropout: Layers used in the model.
    * RMSprop: The optimizer used to train the model.
    
    
2. Download and Preprocess the Text
    * get_file() downloads the Shakespeare dataset from a URL and saves it to a local file.
    * The text file is then opened, decoded as UTF-8, and converted to lowercase.
    * Only a portion of the text (characters 300,000 to 800,000) is kept for faster processing.
    
    
3. Create Character Mappings
    * The unique characters in the text are collected and sorted into a list.
    * Two dictionaries, char_to_index and index_to_char, map characters to indices and indices back to characters.
    
    
4. Define Constants
    * SEQUENCE_LENGTH sets the length of each input sequence used to train the model.
    * STEP_SIZE sets the number of characters between the starts of consecutive sequences.
    
    
5. Generate Training Sequences and Labels
    * The text is divided into overlapping sequences of length SEQUENCE_LENGTH, taken every STEP_SIZE characters.
    * For each sequence, the character that immediately follows it is extracted as the label.
    * The input array x and the label array y are initialized as boolean arrays of zeros.
    * The characters in each sequence are one-hot encoded and stored in the x array.
    * The next characters are one-hot encoded and stored in the y array.
   
   
6. Build the Model
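    * A Sequential model is created.
    * A first LSTM layer with 128 units takes the input sequences and returns the full output sequence.
    * A second LSTM layer with 128 units follows; each LSTM layer is followed by a Dropout layer with rate 0.2.
    * A Dense layer with as many units as there are unique characters is added.
    * A softmax Activation layer turns the outputs into a probability distribution over characters.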
   
   
7. Compile the Model
    * The model is compiled with the categorical cross-entropy loss function and the RMSprop optimizer with a learning rate of 0.001.
   
   
8. Train the Model
    * The model is trained with the fit() function on the input sequences x and labels y.
    * Training runs in batches of size 256 for 6 epochs.
    
    
9. Load the Trained Model
    * The model is loaded from the file "textgenerator.model" using tf.keras.models.load_model().
    
    
10. Define the "sample()" function
    * The function takes the predicted probabilities (preds) and a temperature value as input.
    * The probabilities are converted to log scale and divided by the temperature.
    * A softmax is applied to obtain the rescaled probabilities.
    * One draw is made from the resulting distribution with np.random.multinomial(), and the index of the selected character is returned.
    
    
11. Define the "generate_text()" function
    * The function takes a length and a temperature as input.
    * A random starting index within the text is chosen.
    * An initial sequence of SEQUENCE_LENGTH characters starting at that index is selected as the seed.
    * The model predicts the next character from the current input sequence, and the sampled character is appended to the generated text.
    * The first character is removed from the input sequence to shift the window, and the process repeats.
    * The generated text is returned.
    
    
12. Generate the Text with Different Temperatures
    * A list of temperatures [0.2, 0.4, 0.6, 0.8, 1.0] and a fixed length (100) are defined.
    * For each temperature, the generate_text() function is called and its output is printed to the console.

In [114]:
#Import libraries
import random
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Activation, Dropout
from tensorflow.keras.optimizers import RMSprop

In [None]:
#Download and preprocess the text
filepath = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
with open(filepath, 'rb') as f:
    text = f.read().decode(encoding='utf-8').lower()
text = text[300000:800000]

In [None]:
#Create character mappings
characters = sorted(set(text))
char_to_index = {c: i for i, c in enumerate(characters)}
index_to_char = {i: c for i, c in enumerate(characters)}
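The two dictionaries are inverses of each other, which is easiest to see on a small string. The following is a minimal sketch on a toy corpus, not the Shakespeare text; the `toy_` names and the `encode`/`decode` helpers are hypothetical and exist only for this illustration.

```python
# Toy corpus used only for illustration (not the Shakespeare dataset)
toy_text = "hello world"

# Same construction as above: sorted unique characters, then two lookup tables
toy_characters = sorted(set(toy_text))
toy_char_to_index = {c: i for i, c in enumerate(toy_characters)}
toy_index_to_char = {i: c for i, c in enumerate(toy_characters)}


def encode(s):
    """Hypothetical helper: map each character of s to its integer index."""
    return [toy_char_to_index[c] for c in s]


def decode(indices):
    """Hypothetical helper: map integer indices back to a string."""
    return "".join(toy_index_to_char[i] for i in indices)


encoded = encode("hello")
round_trip = decode(encoded)

print(toy_characters)  # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
print(encoded)         # [3, 2, 4, 4, 5]
print(round_trip)      # hello
```

Because the characters are sorted before enumeration, the mapping is the same on every run for the same text.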

In [None]:
#Define Constants
SEQUENCE_LENGTH = 40
STEP_SIZE = 3

In [None]:
#Generate training sequences and labels
sentences = [text[i:i + SEQUENCE_LENGTH] for i in range(0, len(text) - SEQUENCE_LENGTH, STEP_SIZE)]
next_characters = [text[i + SEQUENCE_LENGTH] for i in range(0, len(text) - SEQUENCE_LENGTH, STEP_SIZE)]

x = np.zeros((len(sentences), SEQUENCE_LENGTH, len(characters)), dtype=np.bool_)
y = np.zeros((len(sentences), len(characters)), dtype=np.bool_)

for i, sentence in enumerate(sentences):
    x[i, np.arange(SEQUENCE_LENGTH), [char_to_index[char] for char in sentence]] = 1
    y[i, char_to_index[next_characters[i]]] = 1
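To make the windowing concrete, here is a small sketch that applies the same slicing to a toy string with a shorter window. The toy string and the `TOY_` constants are assumptions chosen for readability; the real cell uses SEQUENCE_LENGTH = 40 on the Shakespeare slice.

```python
# Toy values chosen for readability (the notebook uses 40 and 3)
toy_text = "to be or not to be"
TOY_SEQUENCE_LENGTH = 5
TOY_STEP_SIZE = 3

# Same slicing logic as the cell above
toy_sentences = [
    toy_text[i:i + TOY_SEQUENCE_LENGTH]
    for i in range(0, len(toy_text) - TOY_SEQUENCE_LENGTH, TOY_STEP_SIZE)
]
toy_next_characters = [
    toy_text[i + TOY_SEQUENCE_LENGTH]
    for i in range(0, len(toy_text) - TOY_SEQUENCE_LENGTH, TOY_STEP_SIZE)
]

# Each input window is paired with the single character that follows it
for window, target in zip(toy_sentences, toy_next_characters):
    print(repr(window), "->", repr(target))
# 'to be' -> ' '
# 'be or' -> ' '
# 'or no' -> 't'
# 'not t' -> 'o'
# ' to b' -> 'e'
```

Consecutive windows overlap by TOY_SEQUENCE_LENGTH - TOY_STEP_SIZE characters, so a smaller step yields more, and more redundant, training examples.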

In [None]:
#Build the model
model = Sequential([
    LSTM(128, input_shape=(SEQUENCE_LENGTH, len(characters)), return_sequences=True),
    Dropout(0.2),
    LSTM(128),
    Dropout(0.2),
    Dense(len(characters)),
    Activation('softmax')
])

model.compile(loss='categorical_crossentropy', optimizer=RMSprop(learning_rate=0.001))

In [None]:
#Train the model
model.fit(x, y, batch_size=256, epochs=6)
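As a rough size check, the number of training sequences and the number of batches per epoch follow directly from the constants. This sketch assumes the sliced text really holds 500,000 characters (that is, the downloaded file has at least 800,000); `text_length` and `BATCH_SIZE` are names introduced here for illustration.

```python
import math

text_length = 500_000   # assumed length of text[300000:800000]
SEQUENCE_LENGTH = 40
STEP_SIZE = 3
BATCH_SIZE = 256        # same value as batch_size in model.fit()

# Same range as the list comprehensions that build the training sequences
num_sequences = len(range(0, text_length - SEQUENCE_LENGTH, STEP_SIZE))

# Keras processes the last, partially filled batch too, hence the ceiling
steps_per_epoch = math.ceil(num_sequences / BATCH_SIZE)

print(num_sequences)    # 166654
print(steps_per_epoch)  # 651
```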

In [None]:
#Load the trained model
#Assumes the trained model was saved to this path beforehand; the save step is not shown here
model = tf.keras.models.load_model('textgenerator.model')

def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

def generate_text(length, temperature):
    # Pick a random seed window from the text
    start_index = random.randint(0, len(text) - SEQUENCE_LENGTH - 1)
    sentence = text[start_index: start_index + SEQUENCE_LENGTH]
    generated = sentence
    for _ in range(length):
        # One-hot encode the current input window
        x_pred = np.zeros((1, SEQUENCE_LENGTH, len(characters)))
        for t, character in enumerate(sentence):
            x_pred[0, t, char_to_index[character]] = 1

        predictions = model.predict(x_pred, verbose=0)[0]
        next_index = sample(predictions, temperature)
        next_character = index_to_char[next_index]

        # Append to the output, then slide the input window forward by one character
        generated += next_character
        sentence = sentence[1:] + next_character

    return generated
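The effect of the temperature in sample() can be seen without a trained model. The sketch below re-implements only the rescaling step in plain Python on a made-up three-class distribution; `apply_temperature` and `toy_probs` are hypothetical names, and subtracting the maximum before exponentiating is an added numerical-stability detail that is not part of the original function.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution: log, divide by temperature, softmax."""
    logits = [math.log(p) / temperature for p in probs]
    max_logit = max(logits)  # stability tweak, not in the original sample()
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up distribution over three characters
toy_probs = [0.5, 0.3, 0.2]

sharp = apply_temperature(toy_probs, 0.2)      # low temperature: more peaked
unchanged = apply_temperature(toy_probs, 1.0)  # temperature 1 leaves it as is
flat = apply_temperature(toy_probs, 2.0)       # high temperature: closer to uniform

for name, dist in [("0.2", sharp), ("1.0", unchanged), ("2.0", flat)]:
    print(name, [round(p, 3) for p in dist])
```

Low temperatures concentrate probability on the most likely character, which tends to give more repetitive text, while higher temperatures spread it out and make the samples more varied.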

In [None]:
#Generate text with different temperatures
temperatures = [0.2, 0.4, 0.6, 0.8, 1.0]
length = 100

for temp in temperatures:
    print(f'---------Temperature {temp}---------')
    print(generate_text(length, temp))