## Practicum Assignment

Use tf.GradientTape to track gradients. You can learn more about this approach in the eager execution guide.
The procedure is:
1. Run the model and compute the loss under a tf.GradientTape.
2. Compute the updates and apply them to the model with the optimizer.
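The two steps can be illustrated without TensorFlow. The sketch below is a framework-free analogue and not the TensorFlow API. It fits a single weight `w` in `y = w * x` with a mean squared error loss. The gradient is derived by hand and stands in for `tape.gradient`. A plain SGD update stands in for `optimizer.apply_gradients`. The function name `train_step`, the toy data and the learning rate are illustrative assumptions.

```python
def train_step(w, xs, ys, lr=0.05):
    """One hypothetical training step for y = w * x with mean squared error."""
    n = len(xs)
    # Step 1: run the "model" and compute the loss.
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    # Gradient of the loss w.r.t. w, derived by hand (the role tape.gradient plays).
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
    # Step 2: apply the update (plain SGD standing in for a Keras optimizer).
    w = w - lr * grad
    return w, loss


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # toy data generated with the true weight w = 2
w = 0.0
for step in range(200):
    w, loss = train_step(w, xs, ys)
print(round(w, 4))  # 2.0
```

In TensorFlow the hand-derived gradient line is replaced by recording the forward pass under `tf.GradientTape`. The framework then computes the same derivative automatically for every trainable variable.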

### Practicum 2

In [1]:
import tensorflow as tf 
import numpy as np 
import os 
import time 

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt


In [3]:
text = open(path_to_file, 'rb').read().decode(encoding='utf-8') 

print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


In [4]:
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [5]:
vocab = sorted(set(text)) 
print(f'{len(vocab)} unique characters') 

65 unique characters


In [6]:
example_texts = ['abcdefg', 'xyz']
chars = tf.strings.unicode_split(example_texts, input_encoding='UTF-8')
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [7]:
ids_from_chars = tf.keras.layers.StringLookup(vocabulary=list(vocab), mask_token=None)

In [8]:
ids = ids_from_chars(chars)
ids

<tf.RaggedTensor [[40, 41, 42, 43, 44, 45, 46], [63, 64, 65]]>
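These ids come from how `StringLookup` builds its table when `mask_token=None`. Index 0 is reserved for the out-of-vocabulary token `[UNK]`, and the sorted vocabulary fills indices 1 onward. This is why the vocabulary size used later is 66 rather than 65.

The plain-Python sketch below mimics that behaviour. `build_lookup`, `encode` and `decode` are hypothetical helpers, not Keras functions. The single OOV slot assumes the layer's default `num_oov_indices=1`. The 65-character set in the last part is also an assumption: newline, space, `!$&',-.3:;?` and the 52 ASCII letters. It is, however, consistent with the ids printed above.

```python
import string


def build_lookup(vocab):
    """Mimic StringLookup(mask_token=None): [UNK] at index 0, then the sorted vocabulary."""
    table = ["[UNK]"] + sorted(vocab)
    char_to_id = {ch: i for i, ch in enumerate(table)}
    return table, char_to_id


def encode(text, char_to_id):
    # Characters missing from the vocabulary fall back to the OOV index 0.
    return [char_to_id.get(ch, 0) for ch in text]


def decode(ids, table):
    return "".join(table[i] for i in ids)


table, char_to_id = build_lookup(set("First Citizen"))
ids = encode("Citizen!", char_to_id)
print(ids)                 # [2, 5, 9, 5, 10, 4, 6, 0]
print(decode(ids, table))  # Citizen[UNK]

# Assumed Shakespeare character set (65 characters).
shakespeare_chars = set("\n !$&',-.3:;?" + string.ascii_uppercase + string.ascii_lowercase)
full_table, full_ids = build_lookup(shakespeare_chars)
print(len(shakespeare_chars), len(full_table))        # 65 66
print(full_ids["a"], full_ids["x"], full_ids["F"])    # 40 63 19
```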

In [9]:
chars_from_ids = tf.keras.layers.StringLookup(vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None)

In [10]:
chars = chars_from_ids(ids)
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

In [11]:
tf.strings.reduce_join(chars, axis=-1).numpy()

array([b'abcdefg', b'xyz'], dtype=object)

In [12]:
def text_from_ids(ids):
    return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

In [13]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids

<tf.Tensor: shape=(1115394,), dtype=int64, numpy=array([19, 48, 57, ..., 46,  9,  1])>

In [14]:

ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids) 

In [15]:
for ids in ids_dataset.take(10): 
    print(chars_from_ids(ids).numpy().decode('utf-8')) 

F
i
r
s
t
 
C
i
t
i


In [16]:
seq_length = 100 

In [17]:
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True) 

for seq in sequences.take(1): 
  print(chars_from_ids(seq)) 

tf.Tensor(
[b'F' b'i' b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':'
 b'\n' b'B' b'e' b'f' b'o' b'r' b'e' b' ' b'w' b'e' b' ' b'p' b'r' b'o'
 b'c' b'e' b'e' b'd' b' ' b'a' b'n' b'y' b' ' b'f' b'u' b'r' b't' b'h'
 b'e' b'r' b',' b' ' b'h' b'e' b'a' b'r' b' ' b'm' b'e' b' ' b's' b'p'
 b'e' b'a' b'k' b'.' b'\n' b'\n' b'A' b'l' b'l' b':' b'\n' b'S' b'p' b'e'
 b'a' b'k' b',' b' ' b's' b'p' b'e' b'a' b'k' b'.' b'\n' b'\n' b'F' b'i'
 b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':' b'\n' b'Y'
 b'o' b'u' b' '], shape=(101,), dtype=string)


In [18]:
for seq in sequences.take(5): 
    print(text_from_ids(seq).numpy()) 

b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
b'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
b"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
b"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
b'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'
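`batch(seq_length+1, drop_remainder=True)` keeps only full windows. The number of training sequences, and later the number of batches per epoch, therefore follow directly from the text length printed earlier.

The sketch below is plain Python. `chunk` is a hypothetical helper that mirrors that batching, and `split_input_target` repeats the shift used in the notebook. The batch size of 64 anticipates `BATCH_SIZE`.

```python
def chunk(seq, size):
    """Split seq into consecutive windows of `size`, dropping an incomplete tail."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def split_input_target(sequence):
    # Same shift as the notebook: input drops the last item, target drops the first.
    return sequence[:-1], sequence[1:]


windows = chunk("Tensorflow rocks", 6)
print(windows)  # ['Tensor', 'flow r']  (the leftover "ocks" is dropped)
print([split_input_target(w) for w in windows])

text_length = 1115394  # "Length of text" printed above
seq_length = 100
num_sequences = text_length // (seq_length + 1)
batches_per_epoch = num_sequences // 64
print(num_sequences, batches_per_epoch)  # 11043 172
```

172 batches per epoch is consistent with the custom training loop's log, which prints batches 0, 50, 100 and 150 in each epoch.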


In [19]:
def split_input_target(sequence):
    input_text = sequence[:-1] 
    target_text = sequence[1:]
    return input_text, target_text 

In [20]:
split_input_target(list("Tensorflow")) 

(['T', 'e', 'n', 's', 'o', 'r', 'f', 'l', 'o'],
 ['e', 'n', 's', 'o', 'r', 'f', 'l', 'o', 'w'])

In [21]:
dataset = sequences.map(split_input_target) 

In [22]:
for input_example, target_example in dataset.take(1): 
    print("Input :", text_from_ids(input_example).numpy()) 
    print("Target:", text_from_ids(target_example).numpy()) 

Input : b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target: b'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


In [23]:
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000 

dataset = (
    dataset 
    .shuffle(BUFFER_SIZE) 
    .batch(BATCH_SIZE, drop_remainder=True) 
    .prefetch(tf.data.AUTOTUNE)) 

dataset 

<_PrefetchDataset element_spec=(TensorSpec(shape=(64, 100), dtype=tf.int64, name=None), TensorSpec(shape=(64, 100), dtype=tf.int64, name=None))>
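The comment in the cell above describes a bounded shuffle buffer. The plain-Python sketch below illustrates the idea:

1. Fill a buffer with `buffer_size` elements.
2. Emit a random element from the buffer.
3. Refill its slot from the input stream, and repeat.

`buffered_shuffle` is a hypothetical helper written for illustration. It is not the `tf.data` implementation.

```python
import random


def buffered_shuffle(stream, buffer_size, seed=None):
    """Yield the elements of `stream` in shuffled order using a bounded buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

With this scheme no element can be emitted more than `buffer_size - 1` positions ahead of its original place, so a small buffer only shuffles locally. Here `BUFFER_SIZE = 10000` is close to the 11,043 training sequences, so the shuffle is nearly global.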

In [24]:
# Length of the vocabulary in StringLookup Layer
vocab_size = len(ids_from_chars.get_vocabulary()) 

# The embedding dimension
embedding_dim = 256 

# Number of RNN units
rnn_units = 1024 

In [25]:
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim) 
    self.gru = tf.keras.layers.GRU(rnn_units, 
                                   return_sequences=True,
                                   return_state=True) 
    self.dense = tf.keras.layers.Dense(vocab_size) 

  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs 
    x = self.embedding(x, training=training)
    if states is None:
      states = self.gru.get_initial_state(x)
    x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training) 

    if return_state:
      return x, states 
    else:
      return x

In [26]:
model = MyModel( 
    vocab_size=vocab_size, 
    embedding_dim=embedding_dim, 
    rnn_units=rnn_units)

In [27]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch) 
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)") 

(64, 100, 66) # (batch_size, sequence_length, vocab_size)


In [28]:
model.summary() 

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       multiple                  16896     
                                                                 
 gru (GRU)                   multiple                  3938304   
                                                                 
 dense (Dense)               multiple                  67650     
                                                                 
Total params: 4022850 (15.35 MB)
Trainable params: 4022850 (15.35 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
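The parameter counts in the summary can be checked by hand. The sketch assumes Keras's default GRU configuration (`reset_after=True`). In that configuration each of the three gates has an input kernel, a recurrent kernel and two bias vectors. The size line assumes 4-byte float32 weights.

```python
vocab_size = 66      # 65 characters plus the [UNK] token
embedding_dim = 256
rnn_units = 1024

# Embedding: one trainable vector of size embedding_dim per token id.
embedding_params = vocab_size * embedding_dim

# GRU (reset_after=True): 3 gates x (input kernel + recurrent kernel + 2 bias vectors).
gru_params = 3 * (embedding_dim * rnn_units + rnn_units * rnn_units + 2 * rnn_units)

# Dense: weight matrix plus one bias per output logit.
dense_params = rnn_units * vocab_size + vocab_size

total = embedding_params + gru_params + dense_params
size_mb = total * 4 / 2**20  # float32 = 4 bytes per parameter
print(embedding_params, gru_params, dense_params, total)  # 16896 3938304 67650 4022850
print(round(size_mb, 2))  # 15.35
```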


In [29]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1) 
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy() 

In [30]:
sampled_indices  # display the sampled token ids

array([16, 50, 52,  9, 48, 18, 60, 36, 24, 40, 50, 17, 58, 65, 49, 53, 40,
        3,  5,  1, 45, 57, 35, 10, 30, 19, 37, 28, 33,  6,  7, 15,  8, 63,
       21, 45,  0, 18, 30, 25, 38, 29,  5, 50, 18,  0,  9, 62, 39, 31, 24,
       49,  3,  1, 46, 61,  5, 31, 39, 62, 51, 14, 56, 35,  0, 47,  1,  3,
       15, 34, 51, 33, 12,  0, 43, 16, 48,  2, 17, 39, 28,  9, 37, 25, 40,
       45, 53, 58, 25, 47, 61, 29, 47, 53, 22, 17, 15, 16, 26, 16])

In [31]:
print("Input:\n", text_from_ids(input_example_batch[0]).numpy()) 
print()
print("Next Char Predictions:\n", text_from_ids(sampled_indices).numpy()) 

Input:
 b'furnish me of reason. They are come.\nYour mother was most true to wedlock, prince;\nFor she did print'

Next Char Predictions:
 b"Ckm.iEuWKakDszjna!&\nfrV3QFXOT',B-xHf[UNK]EQLYP&kE[UNK].wZRKj!\ngv&RZwlAqV[UNK]h\n!BUlT;[UNK]dCi DZO.XLafnsLhvPhnIDBCMC"
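`tf.random.categorical` treats each row of logits as unnormalised log-probabilities. Applying softmax gives a distribution over the 66 ids, and one id is drawn from it per time step. The model is still untrained, so that distribution is close to uniform. This is why the predictions above are gibberish and can even include `[UNK]` (id 0).

The plain-Python sketch below imitates this for a single row. `sample_from_logits` is a hypothetical helper, and `random.choices` stands in for TensorFlow's sampler.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_from_logits(logits, num_samples, seed=None):
    """Draw ids with probability softmax(logits), like one row of tf.random.categorical."""
    rng = random.Random(seed)
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=num_samples)


logits = [2.0, 0.5, 0.0, -1.0]
print([round(p, 3) for p in softmax(logits)])
draws = sample_from_logits(logits, num_samples=1000, seed=42)
print([draws.count(i) for i in range(len(logits))])  # id 0 is drawn most often
```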


In [32]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True) 

In [33]:
example_batch_mean_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("Mean loss:        ", example_batch_mean_loss)

Prediction shape:  (64, 100, 66)  # (batch_size, sequence_length, vocab_size)
Mean loss:         tf.Tensor(4.1890483, shape=(), dtype=float32)


In [34]:
tf.exp(example_batch_mean_loss).numpy()

65.959984
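A mean loss of about 4.19, with an exponential of about 66, is what an untrained model should produce. With (near-)uniform logits, every one of the 66 ids gets probability 1/66. The sparse categorical cross-entropy is then ln 66 ≈ 4.19, and exp(loss) recovers the vocabulary size.

The plain-Python sketch below computes the loss from logits by hand to show this. `sparse_ce_from_logits` is a hypothetical helper, not the Keras loss class.

```python
import math


def sparse_ce_from_logits(logits, target):
    """Cross-entropy at one position: -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


vocab_size = 66
uniform_logits = [0.0] * vocab_size
loss = sparse_ce_from_logits(uniform_logits, target=19)
print(round(loss, 4), round(math.exp(loss), 4))  # 4.1897 66.0

# A model that strongly favours the correct id gets a much lower loss.
peaked_logits = [0.0] * vocab_size
peaked_logits[19] = 5.0
print(round(sparse_ce_from_logits(peaked_logits, target=19), 4))
```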

In [35]:
model.compile(optimizer='adam', loss=loss) 

In [36]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints' 
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True) 

In [37]:
EPOCHS = 20

history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback]) 

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20



### Assignment Practicum

In [38]:
class CustomTraining(MyModel):
  @tf.function  # compile the Python function into a TensorFlow graph
  def train_step(self, inputs):
    inputs, labels = inputs  # unpack the batch into model inputs and target labels
    with tf.GradientTape() as tape:  # record the forward pass so gradients can be computed
      predictions = self(inputs, training=True)  # forward pass in training mode
      loss = self.loss(labels, predictions)  # compare the predictions against the labels

    grads = tape.gradient(loss, self.trainable_variables)  # gradients of the loss w.r.t. this model's trainable variables
    self.optimizer.apply_gradients(zip(grads, self.trainable_variables))  # update the weights with the optimizer

    return {'loss': loss}  # report the loss under the 'loss' key

In [39]:
model = CustomTraining(  # create the CustomTraining model
    vocab_size=len(ids_from_chars.get_vocabulary()),  # vocabulary size of the StringLookup layer built earlier
    embedding_dim=embedding_dim,  # embedding size from embedding_dim
    rnn_units=rnn_units)  # number of GRU units from rnn_units

In [40]:
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))  # Adam optimizer; the loss expects raw logits from the model (from_logits=True)

In [41]:
model.fit(dataset, epochs=1)  # train the model on the dataset for 1 epoch



<keras.src.callbacks.History at 0x7b9b08f7d330>

Alternatively, for even more control over what happens inside training, we can write our own custom training loop:

In [42]:
EPOCHS = 20  # number of training epochs

mean = tf.metrics.Mean()  # running mean of the batch losses

for epoch in range(EPOCHS):  # loop over the epochs
    start = time.time()  # record when this epoch started

    mean.reset_state()  # clear the running mean at the start of each epoch
    for (batch_n, (inp, target)) in enumerate(dataset):  # iterate over (input, target) batches together with a batch index
        logs = model.train_step([inp, target])  # run one training step on this batch
        mean.update_state(logs['loss'])  # add the batch loss to the running mean

        if batch_n % 50 == 0:  # every 50 batches
            template = f"Epoch {epoch+1} Batch {batch_n} Loss {logs['loss']:.4f}"  # current epoch, batch index and batch loss
            print(template)

    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:  # after every 5th epoch
        model.save_weights(checkpoint_prefix.format(epoch=epoch))  # save the weights using checkpoint_prefix defined earlier

    print()
    print(f'Epoch {epoch+1} Loss: {mean.result().numpy():.4f}')  # mean loss over this epoch
    print(f'Time taken for 1 epoch {time.time() - start:.2f} sec')  # elapsed time for this epoch
    print("_"*80)  # separator line of 80 underscores

model.save_weights(checkpoint_prefix.format(epoch=epoch))  # save the final weights using checkpoint_prefix

Epoch 1 Batch 0 Loss 2.1850
Epoch 1 Batch 50 Loss 2.0719
Epoch 1 Batch 100 Loss 1.9632
Epoch 1 Batch 150 Loss 1.8372

Epoch 1 Loss: 1.9996
Time taken for 1 epoch 13.58 sec
________________________________________________________________________________
Epoch 2 Batch 0 Loss 1.7799
Epoch 2 Batch 50 Loss 1.7642
Epoch 2 Batch 100 Loss 1.6440
Epoch 2 Batch 150 Loss 1.6568

Epoch 2 Loss: 1.7203
Time taken for 1 epoch 11.90 sec
________________________________________________________________________________
Epoch 3 Batch 0 Loss 1.6131
Epoch 3 Batch 50 Loss 1.5627
Epoch 3 Batch 100 Loss 1.5742
Epoch 3 Batch 150 Loss 1.5305

Epoch 3 Loss: 1.5578
Time taken for 1 epoch 11.72 sec
________________________________________________________________________________
Epoch 4 Batch 0 Loss 1.4767
Epoch 4 Batch 50 Loss 1.4475
Epoch 4 Batch 100 Loss 1.4692
Epoch 4 Batch 150 Loss 1.4812

Epoch 4 Loss: 1.4576
Time taken for 1 epoch 11.65 sec
_____________________________________________________________________
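The loop above relies on `tf.metrics.Mean` to turn per-batch losses into the epoch loss:

- `update_state` adds each batch loss to a running total and count.
- `result` divides the total by the count.
- Resetting the metric clears both at the start of each epoch.

This is why the reported epoch loss differs from the individual batch losses printed every 50 batches. A plain-Python stand-in with the same three methods is sketched below. `RunningMean` is hypothetical and only mirrors the unweighted behaviour used here.

```python
class RunningMean:
    """Hypothetical stand-in for an unweighted tf.metrics.Mean."""

    def __init__(self):
        self.reset_state()

    def reset_state(self):
        self.total = 0.0
        self.count = 0

    def update_state(self, value):
        self.total += value
        self.count += 1

    def result(self):
        # Mean of all values seen since the last reset (0.0 if none yet).
        return self.total / self.count if self.count else 0.0


mean = RunningMean()
for batch_loss in [2.0, 1.5, 1.0]:
    mean.update_state(batch_loss)
print(mean.result())  # 1.5
mean.reset_state()
print(mean.result())  # 0.0
```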

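Run the code above and explain how it differs from Practicum 2.
> The difference is that Practicum 2 trains with model.fit, which gives us little control over the training process. The code above performs each step by hand with tf.GradientTape in train_step and a custom loop. This lets us log the loss every 50 batches, average it per epoch, time each epoch, and choose when to save checkpoints. Both versions train with "teacher forcing": the model is always fed the true previous characters rather than its own predictions. Bad predictions are therefore never fed back into the model, so it never learns to recover from its own mistakes.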
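The teacher-forcing point can be made concrete with a toy predictor. In the sketch below a fixed bigram table stands in for the trained model, purely for illustration. `NEXT_CHAR`, `predict`, `teacher_forced_pairs` and `free_running` are hypothetical.

Under teacher forcing, every step is conditioned on the true previous character, so one wrong guess does not change the inputs of later steps. During generation the model's own output is fed back instead, so a single mistake shapes everything that follows.

```python
# Hypothetical "model": predicts the next character from the current one only.
NEXT_CHAR = {"t": "h", "h": "e", "e": " ", " ": "t"}


def predict(ch):
    return NEXT_CHAR.get(ch, "?")  # "?" marks a character the toy model cannot handle


def teacher_forced_pairs(text):
    """(prediction, truth) pairs where each prediction uses the TRUE previous character."""
    return [(predict(prev), true_next) for prev, true_next in zip(text[:-1], text[1:])]


def free_running(start, steps):
    """Feed each prediction back in as the next input, as during text generation."""
    out = start
    for _ in range(steps):
        out += predict(out[-1])
    return out


target = "the cat"
print(teacher_forced_pairs(target))
# [('h', 'h'), ('e', 'e'), (' ', ' '), ('t', 'c'), ('?', 'a'), ('?', 't')]
print(free_running("t", 6))  # the the
```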