## Lab Assignment

Use `tf.GradientTape` to track gradients. You can learn more about this approach by reading the eager execution guide.
The procedure is:
1. Run the model and compute the loss inside a `tf.GradientTape` context.
2. Compute the updates and apply them to the model with the optimizer.
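
The two steps above can be sketched without any framework. The following is a minimal illustration, not TensorFlow code: it fits a one-parameter model `y_hat = w * x` using a hand-derived gradient, which stands in for what `tape.gradient` computes automatically, and a plain gradient-descent update, which stands in for the optimizer. The names `forward_and_loss`, `gradient`, and `sgd_step`, the data, and the learning rate are all hypothetical.

```python
# Framework-free sketch of the two-step procedure (hypothetical names and data).
# Model: y_hat = w * x, loss = mean squared error.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with the true weight w = 2


def forward_and_loss(w):
    """Step 1: run the model and compute the loss."""
    errors = [w * x - y for x, y in zip(xs, ys)]
    return sum(e * e for e in errors) / len(errors)


def gradient(w):
    """d(loss)/dw, derived by hand; tf.GradientTape would derive it automatically."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def sgd_step(w, grad, learning_rate=0.05):
    """Step 2: compute the update and apply it to the parameter."""
    return w - learning_rate * grad


w = 0.0
initial_loss = forward_and_loss(w)
for _ in range(100):
    w = sgd_step(w, gradient(w))
final_loss = forward_and_loss(w)

print(f"w = {w:.4f}, loss {initial_loss:.4f} -> {final_loss:.6f}")
```

In the notebook itself, the tape replaces the hand-written `gradient` function and the Keras optimizer replaces `sgd_step`.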

### Repeating Lab 2

In [1]:
import tensorflow as tf
import numpy as np
import os
import time

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


In [3]:
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')

print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


In [4]:
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [5]:
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')

65 unique characters


In [6]:
example_texts = ['abcdefg', 'xyz']
chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [7]:
ids_from_chars = tf.keras.layers.StringLookup(vocabulary=list(vocab), mask_token=None)

In [8]:
ids = ids_from_chars(chars)

In [9]:
chars_from_ids = tf.keras.layers.StringLookup(vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

In [10]:
chars = chars_from_ids(ids)
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [11]:
tf.strings.reduce_join(chars, axis=-1).numpy()

array([b'abcdefg', b'xyz'], dtype=object)

In [12]:
def text_from_ids(ids):
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

In [13]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids

<tf.Tensor: shape=(1115394,), dtype=int64, numpy=array([19, 48, 57, ..., 46,  9,  1])>

In [14]:

ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

In [15]:
for ids in ids_dataset.take(10):
    print(chars_from_ids(ids).numpy().decode('utf-8'))

F
i
r
s
t
 
C
i
t
i


In [16]:
seq_length = 100

In [17]:
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

for seq in sequences.take(1):
  print(chars_from_ids(seq))

tf.Tensor(
[b'F' b'i' b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':'
 b'\n' b'B' b'e' b'f' b'o' b'r' b'e' b' ' b'w' b'e' b' ' b'p' b'r' b'o'
 b'c' b'e' b'e' b'd' b' ' b'a' b'n' b'y' b' ' b'f' b'u' b'r' b't' b'h'
 b'e' b'r' b',' b' ' b'h' b'e' b'a' b'r' b' ' b'm' b'e' b' ' b's' b'p'
 b'e' b'a' b'k' b'.' b'\n' b'\n' b'A' b'l' b'l' b':' b'\n' b'S' b'p' b'e'
 b'a' b'k' b',' b' ' b's' b'p' b'e' b'a' b'k' b'.' b'\n' b'\n' b'F' b'i'
 b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':' b'\n' b'Y'
 b'o' b'u' b' '], shape=(101,), dtype=string)


In [18]:
for seq in sequences.take(5):
    print(text_from_ids(seq).numpy())

b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
b'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
b"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
b"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
b'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'


In [19]:
def split_input_target(sequence):
    input_text = sequence[:-1]
    target_text = sequence[1:]
    return input_text, target_text

In [20]:
split_input_target(list("Tensorflow"))

(['T', 'e', 'n', 's', 'o', 'r', 'f', 'l', 'o'],
 ['e', 'n', 's', 'o', 'r', 'f', 'l', 'o', 'w'])

In [21]:
dataset = sequences.map(split_input_target)

In [22]:
for input_example, target_example in dataset.take(1):
    print("Input :", text_from_ids(input_example).numpy())
    print("Target:", text_from_ids(target_example).numpy())

Input : b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target: b'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


In [23]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE))

dataset

<_PrefetchDataset element_spec=(TensorSpec(shape=(64, 100), dtype=tf.int64, name=None), TensorSpec(shape=(64, 100), dtype=tf.int64, name=None))>

In [24]:
# Length of the vocabulary in StringLookup Layer
vocab_size = len(ids_from_chars.get_vocabulary())

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [25]:
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

In [26]:
model = MyModel(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

In [27]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 66) # (batch_size, sequence_length, vocab_size)


In [28]:
model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  16896     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  67650     
                                                                 
Total params: 4022850 (15.35 MB)
Trainable params: 4022850 (15.35 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [29]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

In [30]:
sampled_indices

array([33, 17, 27, 11, 65,  2, 39, 40, 35, 44, 54, 21, 42, 48, 35, 46, 12,
       20, 29, 34, 64, 39, 65,  8, 57, 45, 40, 56, 24, 16, 15, 46, 36, 13,
       12, 58, 42, 47, 55, 27, 21,  7, 34, 54, 34, 48, 35, 50,  5,  8, 15,
       27, 50, 22, 10, 58, 37, 22,  6, 52, 60, 17, 12, 49, 31, 64, 65, 47,
       39, 49, 12,  3, 10,  2, 12, 46, 42, 59, 50,  8, 13, 61, 37,  7, 53,
       61,  8, 45, 55, 36, 17,  6,  1, 51,  9, 28, 40, 52, 62, 43])

In [31]:
print("Input:\n", text_from_ids(input_example_batch[0]).numpy())
print()
print("Next Char Predictions:\n", text_from_ids(sampled_indices).numpy())

Input:
 b'ORTHUMBERLAND:\nNor I.\n\nCLIFFORD:\nCome, cousin, let us tell the queen these news.\n\nWESTMORELAND:\nFare'

Next Char Predictions:
 b"TDN:z ZaVeoHciVg;GPUyZz-rfaqKCBgW?;schpNH,UoUiVk&-BNkI3sXI'muD;jRyzhZj;!3 ;gctk-?vX,nv-fpWD'\nl.Oamwd"


In [32]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

In [33]:
example_batch_mean_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:        ", example_batch_mean_loss)

Prediction shape:  (64, 100, 66)  # (batch_size, sequence_length, vocab_size)
Mean loss:         tf.Tensor(4.1897697, shape=(), dtype=float32)


In [34]:
tf.exp(example_batch_mean_loss).numpy()

66.00759

In [35]:
model.compile(optimizer='adam', loss=loss)

In [36]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

In [37]:
EPOCHS = 20

history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20



### Assignment Lab

In [38]:
class CustomTraining(MyModel):
    @tf.function
    def train_step(self, inputs):
        # Split the batch into input data and labels
        inputs, labels = inputs

        # Use GradientTape to record the operations needed to compute gradients
        with tf.GradientTape() as tape:
            # Get the model's predictions
            predictions = self(inputs, training=True)

            # Compute the loss
            loss = self.loss(labels, predictions)

        # Compute gradients with respect to this model's parameters
        # (self, not the global `model`, so the class does not depend on a global name)
        grads = tape.gradient(loss, self.trainable_variables)

        # Use the optimizer to apply the gradients to the model's parameters
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))

        # Return the loss value
        return {'loss': loss}

In [39]:
# Create an instance of the CustomTraining class
model = CustomTraining(
    vocab_size=len(ids_from_chars.get_vocabulary()), # Vocabulary size taken from the ids_from_chars layer
    embedding_dim=embedding_dim,                      # Dimension of the embedding vectors
    rnn_units=rnn_units                               # Number of units in the GRU layer
)

In [40]:
# Compile the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)  # Sparse categorical cross-entropy loss
)

In [41]:
# Train the model on the dataset for one epoch
model.fit(dataset, epochs=1)



<keras.src.callbacks.History at 0x7807f31c67d0>

Alternatively, to see more of what happens under the hood, we can write the custom training loop ourselves:

In [42]:
EPOCHS = 20  # Number of training epochs

mean = tf.metrics.Mean()  # Mean object used to average the loss over an epoch

for epoch in range(EPOCHS):
    start = time.time()  # Start time of the epoch

    mean.reset_state()  # Reset the Mean object's state for the new epoch

    for (batch_n, (inp, target)) in enumerate(dataset):
        # Run one training step on the model for each batch
        logs = model.train_step([inp, target])

        # Update the Mean object's state with the loss from this batch
        mean.update_state(logs['loss'])

        # Print progress every 50 batches
        if batch_n % 50 == 0:
            template = f"Epoch {epoch+1} Batch {batch_n} Loss {logs['loss']:.4f}"
            print(template)

    # Every 5 epochs, save the model weights
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))

    print()
    print(f'Epoch {epoch+1} Loss: {mean.result().numpy():.4f}')
    print(f'Time taken for 1 epoch {time.time() - start:.2f} sec')
    print("_" * 80)

# Save the model weights once all training is finished
model.save_weights(checkpoint_prefix.format(epoch=epoch))


Epoch 1 Batch 0 Loss 2.1697
Epoch 1 Batch 50 Loss 2.1011
Epoch 1 Batch 100 Loss 1.9801
Epoch 1 Batch 150 Loss 1.8832

Epoch 1 Loss: 1.9950
Time taken for 1 epoch 13.67 sec
________________________________________________________________________________
Epoch 2 Batch 0 Loss 1.8333
Epoch 2 Batch 50 Loss 1.7817
Epoch 2 Batch 100 Loss 1.6886
Epoch 2 Batch 150 Loss 1.6303

Epoch 2 Loss: 1.7190
Time taken for 1 epoch 20.48 sec
________________________________________________________________________________
Epoch 3 Batch 0 Loss 1.6056
Epoch 3 Batch 50 Loss 1.5730
Epoch 3 Batch 100 Loss 1.5864
Epoch 3 Batch 150 Loss 1.5282

Epoch 3 Loss: 1.5570
Time taken for 1 epoch 10.87 sec
________________________________________________________________________________
Epoch 4 Batch 0 Loss 1.4771
Epoch 4 Batch 50 Loss 1.4450
Epoch 4 Batch 100 Loss 1.4089
Epoch 4 Batch 150 Loss 1.4176

Epoch 4 Loss: 1.4564
Time taken for 1 epoch 10.89 sec
_____________________________________________________________________

### Run the code above and state how it differs from Lab 2
The difference between the code above and Lab 2 lies in how much control you have over training. Lab 2 trains with `model.fit` and uses "teacher forcing": during training, the model receives the true (ground-truth) character as the next input in the sequence rather than its own prediction. This prevents the error accumulation that can occur when a wrong prediction is fed back as the next input.
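
To make the error-accumulation point concrete, here is a small sketch that contrasts teacher forcing with free-running decoding. It is only an illustration: the "model" is a hypothetical bigram lookup table (`next_char`) with one deliberate mistake, not the GRU from this notebook, and the function names are made up for this example.

```python
# Toy comparison of teacher forcing and free-running decoding.
# The "model" is a hypothetical lookup table, not the GRU used in this notebook.

target = "abcde"

# Deliberately imperfect predictor: it maps "b" to "x" instead of "c".
next_char = {"a": "b", "b": "x", "c": "d", "d": "e", "x": "x"}


def teacher_forced(sequence):
    """Each step receives the ground-truth previous character as input."""
    return [next_char[ch] for ch in sequence[:-1]]


def free_running(sequence):
    """Each step receives the model's own previous prediction as input."""
    predictions = []
    current = sequence[0]
    for _ in range(len(sequence) - 1):
        current = next_char[current]
        predictions.append(current)
    return predictions


def count_errors(predictions, sequence):
    """Compare predictions with the targets, i.e. the sequence shifted by one."""
    return sum(p != t for p, t in zip(predictions, sequence[1:]))


teacher_preds = teacher_forced(target)
free_preds = free_running(target)
teacher_errors = count_errors(teacher_preds, target)
free_errors = count_errors(free_preds, target)

print("teacher forcing:", "".join(teacher_preds), "errors:", teacher_errors)
print("free running:   ", "".join(free_preds), "errors:", free_errors)
```

With teacher forcing the single mistake stays local, while in free-running mode every later step inherits it.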

The code above, by contrast, gives more control over training. It trains on the same teacher-forced input/target pairs, but the model is trained explicitly through a `train_step` method that uses `tf.GradientTape` to track gradients and apply updates to the model. This allows a more flexible approach in which each training step can be customized in detail. The custom loop also makes it possible to save model checkpoints at chosen epoch intervals.
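
The checkpoint schedule can be checked without training anything. The sketch below reuses only the naming logic from the custom loop (`checkpoint_prefix`, the `(epoch + 1) % 5 == 0` test, and the final save) and collects the paths that would be written; the `saved` list and `final_path` variable are introduced here purely for illustration.

```python
import os

# Same naming scheme as in the notebook
checkpoint_dir = "./training_checkpoints"
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
EPOCHS = 20

saved = []
for epoch in range(EPOCHS):
    # Mirrors the periodic save in the custom loop: every 5th epoch
    if (epoch + 1) % 5 == 0:
        saved.append(checkpoint_prefix.format(epoch=epoch))

# Mirrors the save after the loop; `epoch` still holds the last index
final_path = checkpoint_prefix.format(epoch=epoch)

for path in saved:
    print(path)
print("final:", final_path)
```

Two details follow from this: the file names use the zero-based epoch index (so the 5th epoch is stored as `ckpt_4`), and with 20 epochs the save after the loop writes to the same path as the last periodic save.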