# Fine-Tuning of GPT-2

Large Language Models (LLMs) have been shown to be effective at a variety of NLP tasks. An LLM is first pre-trained on a large corpus of text in a self-supervised fashion. Pre-training helps LLMs learn general-purpose knowledge, such as statistical relationships between words. An LLM can then be fine-tuned on a downstream task of interest (such as sentiment analysis).

See this: <https://keras.io/examples/nlp/parameter_efficient_finetuning_of_gpt2_with_lora/>

In [3]:
import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import keras_nlp
import keras
import matplotlib.pyplot as plt
import tensorflow as tf
import time

keras.mixed_precision.set_global_policy("mixed_float16")

The dtype policy mixed_float16 may run slowly because this machine does not have a GPU. Only Nvidia GPUs with compute capability of at least 7.0 run quickly with mixed_float16.


In [4]:
from tensorflow.python.client import device_lib

def is_compatible_gpu_available():
    devices = device_lib.list_local_devices()
    compatible_gpu = any("compute capability: 7" in d.physical_device_desc for d in devices if d.device_type == "GPU")
    return compatible_gpu

if is_compatible_gpu_available():
    keras.mixed_precision.set_global_policy("mixed_float16")
else:
    print("Compatible GPU not found. Using default precision.")


Compatible GPU not found. Using default precision.


In [5]:
# General hyperparameters
BATCH_SIZE = 32
NUM_BATCHES = 500
EPOCHS = 1  # Can be set to a higher value for better results
MAX_SEQUENCE_LENGTH = 128
MAX_GENERATION_LENGTH = 200

GPT2_PRESET = "gpt2_base_en"

# LoRA-specific hyperparameters
RANK = 4
ALPHA = 32.0

In [6]:
import requests
import tensorflow as tf

# URL of your Shakespeare data
data_url = 'https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt'

# Download the Shakespeare dataset
response = requests.get(data_url)
shakespeare_text = response.text

# Example: Print the first 500 characters to verify
print(shakespeare_text[:500])


First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.

All:
We know't, we know't.

First Citizen:
Let us kill him, and we'll have corn at our own price.
Is't a verdict?

All:
No more talking on't; let it be done: away, away!

Second Citizen:
One word, good citizens.

First Citizen:
We are accounted poor


In [7]:
# Split the text into lines
documents = shakespeare_text.split('\n')



In [8]:
# Create a dataset of documents
documents_ds = tf.data.Dataset.from_tensor_slices(documents)

In [9]:
BATCH_SIZE = 16  # Adjust based on your setup
NUM_BATCHES = 100  # Define how many batches you want to take for training

train_ds = (
    documents_ds
    .batch(BATCH_SIZE)
    .cache()
    .prefetch(tf.data.AUTOTUNE)
)

train_ds = train_ds.take(NUM_BATCHES)


In [10]:
def generate_text(model, input_text, max_length=200):
    start = time.time()

    output = model.generate(input_text, max_length=max_length)
    print("\nOutput:")
    print(output)

    end = time.time()
    print(f"Total Time Elapsed: {end - start:.2f}s")

In [44]:
import tensorflow as tf

def get_optimizer_and_loss():
    # Use the legacy version of the Adam optimizer for better performance on M1/M2 Macs
    optimizer = tf.keras.optimizers.legacy.Adam(
        learning_rate=5e-5
    )
    
    # Continue using the SparseCategoricalCrossentropy loss as before
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    
    return optimizer, loss




In [29]:
preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
    "gpt2_base_en",
    sequence_length=MAX_SEQUENCE_LENGTH,
)
gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_base_en", preprocessor=preprocessor
)

gpt2_lm.summary()

In [46]:
gpt2_lm.compile(
    optimizer='adam',  # Specify optimizer by string, letting Keras handle instantiation.
    loss='sparse_categorical_crossentropy',  # Similarly, specify loss by string if applicable.
    metrics=['accuracy']
)


In [49]:
generate_text(gpt2_lm, "All the bettle dreene, for To his like thou thron!", max_length=MAX_GENERATION_LENGTH)



Output:
All the bettle dreene, for To his like thou thron!

The man that is to be, the one to come.


The man that is to be, the one to come.


To his like thou hast!


The man that is to be!


And he shall be with you in heaven,

and with him in earth,

in the sea, and with him in heaven.


And he shall be with thee in heaven,

and with thee in earth,

in the sea, in the sea, and in heaven.


And he shall be with thee in heaven,

and with thee in earth,

in the sea, in the sea, in the heaven.


And thou shall be with me,

in my power and my glory,


and with me, in my glory,


And with me, in my power and my glory,
Total Time Elapsed: 11.31s


In [50]:
gpt2_lm.fit(train_ds, epochs=EPOCHS)


2024-04-05 18:34:30.253535: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator compile_loss/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/assert_equal_1/Assert/Assert


[1m 83/100[0m [32m━━━━━━━━━━━━━━━━[0m[37m━━━━[0m [1m1:31[0m 5s/step - accuracy: 0.0048 - loss: 0.6715

2024-04-05 18:42:14.558561: W tensorflow/core/kernels/data/cache_dataset_ops.cc:854] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.


[1m100/100[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m565s[0m 5s/step - accuracy: 0.0044 - loss: 0.6736


<keras_core.src.callbacks.history.History at 0x3c3079250>

In [51]:
generate_text(gpt2_lm, "All the bettle dreene, for To his like thou thron!", max_length=MAX_GENERATION_LENGTH)


Output:
All the bettle dreene, for To his like thou thron! on on we then on we we we we we then in on on on, then in we, on in we, in then we in then in in, on in, on on, in on,,, then, on on on we on in on then we we we we then, on we we then give on we on we, we, on on on we, we then give on give, on then,, in on on on on, on on we we, we on then, on give give we, then on on, we we we then on on we, on on on,,, then we, on on, we on on,, on, on on on on on on we we in, then then we we we we then on, on, on,, we we on then, on, we on, on we on,, on, on we on on then
Total Time Elapsed: 11.05s


We'll be back to this later with a more in-depth look.  The env is important.  Check your python env html for more details