<a href="https://colab.research.google.com/github/won-hj/deep_learning_study/blob/main/text_generation/text_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2019 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Text generation with an RNN

This tutorial demonstrates how to generate text using a character-based RNN. You will work with a dataset of Shakespeare's writing from Andrej Karpathy's [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Given a sequence of characters from this data ("Shakespear"), train a model to predict the next character in the sequence ("e"). Longer sequences of text can be generated by calling the model repeatedly.

Note: Enable GPU acceleration to execute this notebook faster. In Colab: *Runtime > Change runtime type > Hardware accelerator > GPU*.

This tutorial includes runnable code implemented using [tf.keras](https://www.tensorflow.org/guide/keras/sequential_model) and [eager execution](https://www.tensorflow.org/guide/eager). The following is the sample output when the model in this tutorial trained for 30 epochs, and started with the prompt "Q":

<pre>
QUEENE:
I had thought thou hadst a Roman; for the oracle,
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
Your children were in your holy love,
And the precipitation through the bleeding throne.

BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?

ESCALUS:
The cause why then we are all resolved more sons.

VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.

QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.

PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m
</pre>

While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but consider:

* The model is character-based. When training started, the model did not know how to spell an English word, or that words were even a unit of text.

* The structure of the output resembles a play—blocks of text generally begin with a speaker name, in all capital letters similar to the dataset.

* As demonstrated below, the model is trained on small batches of text (100 characters each), and is still able to generate a longer sequence of text with coherent structure.

대부분 문법파괴적인 문장들이고, 모델도 단어의 의미를 학습하진 않았음
* 모델은 문자 기반으로 만들어져있어서, 학습이 시작되면 영어 단어가 어떻게 이루어져있는지 혹은 그 단어들이 텍스트를 구성하는 의미인지도 모름
* 모델은 화자의 이름이 대문자로 표시된 각본과 같은 구조를 가진 말뭉치 구조의 출력을 만듦
* 100자의 작은 묶음으로 학습됐지만 문법, 문맥, 맥락 등 전체적인 흐름이 자연스럽고 더 긴 텍스트 시퀀스를 만들 수 있

## 셋업

### Import TensorFlow and other libraries

In [2]:
import tensorflow as tf

import numpy as np
import os
import time

### Download the Shakespeare dataset

Change the following line to run this code on your own data.

In [3]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', origin='https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
[1m1115394/1115394[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step


### Read the data

First, look in the text:

In [4]:
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


In [5]:
# take a look at the first 250 characters in text
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



In [6]:
# the unique characters in the file
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')

65 unique characters


## Process the text

### Vectorize the text

Before training, you need to convert the strings to a numerical representation.

The `tf.keras.layers.StringLookup` layer can convert each character into a numeric ID. It just needs the text to be split into tokens first.

훈련하기 전, 문자열을 숫자로 표현해야 함

`tf.keras.layers.StringLookup` 레이어는 문자열과 인덱스 간 매칭을 해줌
토큰 단위로 분리될 텍스트만 필요함

In [7]:
example_text = ['abcdefg', 'xyz']

chars = tf.strings.unicode_split(example_text, input_encoding='UTF-8')
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

`tf.keras.layers.StringLookup' 레이어:

In [8]:
ids_from_chars = tf.keras.layers.StringLookup(
    vocabulary=list(vocab), mask_token=None,
)

실행하면 토큰들을 character IDs로 변환

In [9]:
ids = ids_from_chars(chars)
ids

<tf.RaggedTensor [[40, 41, 42, 43, 44, 45, 46], [63, 64, 65]]>

Since the goal of this tutorial is to generate text, it will also be important to invert this representation and recover human-readable strings from it. For this you can use `tf.keras.layers.StringLookup(..., invert=True)`.  

이 튜토리얼은 텍스트를 생성하는게 목표이기 때문에 이렇게 문자열을 벡터로 만들고, 다시 사람이 읽을 수 있도록 문자열로 되돌리는 과정이 중요함
`tf.keras.layers.StringLookup(...,invert=True)`를 사

Note: Here instead of passing the original vocabulary generated with `sorted(set(text))` use the `get_vocabulary()` method of the `tf.keras.layers.StringLookup` layer so that the `[UNK]` tokens is set the same way.

`sorted(set(text))`로 만든 단어장보단 `tf.keras.layers.StringLoopup`의 `get_vocabulary()`를 사용하는게 `[UNK]`토큰이 일관되게 적용됨

In [10]:
chars_from_ids = tf.keras.layers.StringLookup(
    vocabulary=ids_from_chars.get_vocabulary(), invert=True, mask_token=None
)

This layer recovers the characters from the vectors of IDs, and returns them as a `tf.RaggedTensor` of characters:

인덱스 벡터를 원래 문자로 되돌리는 레이어이며, `tf.RaggedTensor` 형태의 문자들을 반환

In [11]:
chars = chars_from_ids(ids)
chars

<tf.RaggedTensor [[b'a', b'b', b'c', b'd', b'e', b'f', b'g'], [b'x', b'y', b'z']]>

You can `tf.strings.reduce_join` to join the characters back into strings.

`tf.strings.reduce_join`으로도 사용 가능

In [12]:
tf.strings.reduce_join(chars, axis=-1).numpy()

array([b'abcdefg', b'xyz'], dtype=object)

In [13]:
def text_from_ids(ids):
  return tf.strings.reduce_join(chars_from_ids(ids), axis=-1)

### The prediction task

Given a character, or a sequence of characters, what is the most probable next character? This is the task you're training the model to perform. The input to the model will be a sequence of characters, and you train the model to predict the output—the following character at each time step.

Since RNNs maintain an internal state that depends on the previously seen elements, given all the characters computed until this moment, what is the next character?

하나의 문자나 문자열이 주어질 때 어떤 문자가 다음에 올 수 있을지에 대한 문제가 모델 훈련 시의 목표
각 시점마다 다음에 올 문자를 예측하기 위해 시퀀스 현태의 문자들을 모델에 넣어 훈련

RNN은 이전 상태를 기억하고 활용하기 때문에 현재까지의 계산 결과를 가지고 다음 상태를 예측

### Create training examples and targets

Next divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text.

For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.

So break the text into chunks of `seq_length+1`. For example, say `seq_length` is 4 and our text is "Hello". The input sequence would be "Hell", and the target sequence "ello".

To do this first use the `tf.data.Dataset.from_tensor_slices` function to convert the text vector into a stream of character indices.

텍스트를 시퀀스로 나눔
각 입력 시퀀스에는 `seq_length`개의 문자 포함

각각의 인풋 시퀀스마다 연관있는 타겟들은 오른쪽으로 한 칸 이동하지 않는 이상 같은 길이를 가짐

따라서 텍스트를 `seq_length+1` 이라는 청크들로 쪼갬
예를 들어, `seq_length`가 4이고 "Hello"라는 텍스트가 있다면 입력 시퀀스는 "Hell"이며 타겟 시퀀스는 "ello"가 됨

이것을 하려면 먼저 `tf.data.Dataset.from_tensor_slices`를 이용해 텍스트 벡터를 문자인덱스 스트림으로 바꿔야 함

In [14]:
all_ids = ids_from_chars(tf.strings.unicode_split(text, 'UTF-8'))
all_ids

<tf.Tensor: shape=(1115394,), dtype=int64, numpy=array([19, 48, 57, ..., 46,  9,  1])>

In [15]:
ids_dataset = tf.data.Dataset.from_tensor_slices(all_ids)

In [16]:
for ids in ids_dataset.take(10):
  print(chars_from_ids(ids).numpy().decode('utf-8'))

F
i
r
s
t
 
C
i
t
i


In [17]:
seq_length=100



The `batch` method lets you easily convert these individual characters to sequences of the desired size.

`batch`메서드를 사용하면 독립된 문자들을 원하는 크기의 시퀀스로 만드는데 좋음

In [18]:
sequences = ids_dataset.batch(seq_length+1, drop_remainder=True)

for seq in sequences.take(1):
  print(chars_from_ids(seq))

tf.Tensor(
[b'F' b'i' b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':'
 b'\n' b'B' b'e' b'f' b'o' b'r' b'e' b' ' b'w' b'e' b' ' b'p' b'r' b'o'
 b'c' b'e' b'e' b'd' b' ' b'a' b'n' b'y' b' ' b'f' b'u' b'r' b't' b'h'
 b'e' b'r' b',' b' ' b'h' b'e' b'a' b'r' b' ' b'm' b'e' b' ' b's' b'p'
 b'e' b'a' b'k' b'.' b'\n' b'\n' b'A' b'l' b'l' b':' b'\n' b'S' b'p' b'e'
 b'a' b'k' b',' b' ' b's' b'p' b'e' b'a' b'k' b'.' b'\n' b'\n' b'F' b'i'
 b'r' b's' b't' b' ' b'C' b'i' b't' b'i' b'z' b'e' b'n' b':' b'\n' b'Y'
 b'o' b'u' b' '], shape=(101,), dtype=string)


It's easier to see what this is doing if you join the tokens back into strings:

토큰들을 스트링으로 다시 합치면 어떻게 되는지 보기 좋음

In [62]:
for seq in sequences.take(5):
  print(text_from_ids(seq).numpy())

b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
b'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
b"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
b"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
b'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'


For training you'll need a dataset of `(input, label)` pairs. Where `input` and
`label` are sequences. At each time step the input is the current character and the label is the next character.

Here's a function that takes a sequence as input, duplicates, and shifts it to align the input and label for each timestep:

훈련에는 시퀀스 형태의 `(input, label)`쌍이 필요
각 시점에서 input은 현재의 문자이고 label은 다음에 올 문자

다음은 시퀀스를 입력으로 받아 각 시점마다의 input과 label의 위치를 맞춤

In [19]:
def split_input_target(sequence):
  input_text = sequence[:-1]
  target_text = sequence[1:]
  return input_text, target_text

In [20]:
split_input_target(list('Tensorflow'))

(['T', 'e', 'n', 's', 'o', 'r', 'f', 'l', 'o'],
 ['e', 'n', 's', 'o', 'r', 'f', 'l', 'o', 'w'])

In [21]:
dataset = sequences.map(split_input_target)

In [22]:
for input_ex, target_ex in dataset.take(1):
  print('input: ', text_from_ids(input_ex).numpy())
  print('target: ', text_from_ids(target_ex).numpy())

input:  b'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
target:  b'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '


### Create training batches

You used `tf.data` to split the text into manageable sequences. But before feeding this data into the model, you need to shuffle the data and pack it into batches.

`tf.data`로 텍스트를 사용할 시퀀스로 만들었음
그러나 모델에 입력하기 전에 섞고 배치로 묶어야 함

In [23]:
BATCH_SIZE = 64

# buffer size to shuffle
'''
TensorFlow의 tf.data는 무한한 데이터 스트림도 처리할 수 있어야 해.
(예: 끝이 없는 데이터 생성기)
그런데 데이터를 한 번에 다 메모리에 올려서 섞으면?
→ 메모리 부족으로 속도가 느려지고, 큰 데이터는 아예 불가능!
그래서 일정 크기의 버퍼(buffer)를 유지하면서 그 안에서만 랜덤하게 섞어
'''
BUFFER_SIZE = 10000

dataset = (dataset
           .shuffle(BUFFER_SIZE)
           .batch(BATCH_SIZE)
           .prefetch(tf.data.experimental.AUTOTUNE))

dataset

<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 100), dtype=tf.int64, name=None), TensorSpec(shape=(None, 100), dtype=tf.int64, name=None))>

## Build The Model

This section defines the model as a `keras.Model` subclass (For details see [Making new Layers and Models via subclassing](https://www.tensorflow.org/guide/keras/custom_layers_and_models)).

This model has three layers:

* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that will map each character-ID to a vector with `embedding_dim` dimensions;
* `tf.keras.layers.GRU`: A type of RNN with size `units=rnn_units` (You can also use an LSTM layer here.)
* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs. It outputs one logit for each character in the vocabulary. These are the log-likelihood of each character according to the model.

* `tf.keras.layers.Embedding`: 입력 계층, 학습 가능한 lookup 테이블, 각 character-ID를 `embedding_dim`차원에 매핑
* `tf.keras.layers.GRU`: `units=rnn_units`크기의 RNN(LSTM 사용 가능)
* `tf.keras.layers.Dense`: 출력 계층, `vocab_size`개의 출력을 가짐, 각 문자당 하나의 로짓을 출력, 각 문자의 로그우도 점수로 출력



In [24]:
# Length of the vocabulary in StringLookup Layer
vocab_size = len(ids_from_chars.get_vocabulary())

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024

In [54]:
class MyModel(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, rnn_units):
    super().__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(rnn_units,
                                   return_sequences=True,
                                   return_state=True)
    self.dense = tf.keras.layers.Dense(vocab_size)

  #@tf.function
  def call(self, inputs, states=None, return_state=False, training=False):
    x = inputs
    x = self.embedding(x, training=training)
    if states is None:
        #1 불필요한 [0] 사용, 다른 반환값 사
        #states = self.gru.get_initial_state(x)[0]# 동적인 batch_size 가져오기
        #states = tf.zeros([batch_size, self.gru.units])  # rnn_units 크기로 초기화
        #2 반환값 개수 불일치
        #states = self.gru.get_initial_state(batch_size=tf.shape(x)[0])
        ####
        #states = self.gru.get_initial_state(tf.shape(x)[0]) #[0]
        batch_size = tf.shape(x)[0]  # 동적 batch_size 가져오기
        states = tf.zeros([batch_size, self.gru.units])  # (batch_size, rnn_units)

    #r = self.gru(x, initial_state=states, training=training)
    #x, states = r[0], r[1:] #
    outputs  = self.gru(x, initial_state=states, training=training)
    print('gru 출력 수 :  ',len(outputs)) #65
    print(f"Type of outputs: {type(outputs)}")
    print(f"Length of outputs: {len(outputs)}")

    if isinstance(outputs, tuple) or isinstance(outputs, list):
            x = outputs[0]  # 첫 번째 요소 (출력 시퀀스)
            states = outputs[1] if len(outputs) > 1 else None  # 두 번째 요소 (최종 상태)
    else:
            x = outputs  # 단일 텐서일 경우

    #x, states = self.gru(x, initial_state=states, training=training)
    x = self.dense(x, training=training)

    if return_state:
      return x, states
    else:
      return x

In [55]:
model = MyModel(
    vocab_size=vocab_size,
    embedding_dim=embedding_dim,
    rnn_units=rnn_units)

For each character the model looks up the embedding, runs the GRU one timestep with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character:


각 문자(character)에 대해,
1️⃣ 모델은 해당 문자의 임베딩을 조회(look up the embedding)하고,
2️⃣ 임베딩 벡터를 입력으로 사용해 GRU를 한 타임스텝 실행(runs the GRU one timestep with the embedding as input)한 후,
3️⃣ Dense 레이어를 적용하여 다음 문자의 로그-우도를 예측하는 로짓(logits)을 생성(applies the dense layer to generate logits predicting the log-likelihood of the next character)한다.

![A drawing of the data passing through the model](https://github.com/tensorflow/text/blob/master/docs/tutorials/images/text_generation_training.png?raw=1)

Note: For training you could use a `keras.Sequential` model here. To  generate text later you'll need to manage the RNN's internal state. It's simpler to include the state input and output options upfront, than it is to rearrange the model architecture later. For more details see the [Keras RNN guide](https://www.tensorflow.org/guide/keras/rnn#rnn_state_reuse).

## Try the model

Now run the model to see that it behaves as expected.

First check the shape of the output:

In [56]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape)  # (batch_size, sequence_length, vocab_size)

gru 출력 수 :   65
Type of outputs: <class 'tuple'>
Length of outputs: 65
(64, 100, 66)


In the above example the sequence length of the input is `100` but the model can be run on inputs of any length:

위의 예시의 경우 seq_length가 100이지만 입력의 길이는 상관없음

In [57]:
model.summary()

To get actual predictions from the model you need to sample from the output distribution, to get actual character indices. This distribution is defined by the logits over the character vocabulary.

Note: It is important to _sample_ from this distribution as taking the _argmax_ of the distribution can easily get the model stuck in a loop.

Try it for the first example in the batch:

실제 예측 결과를 가져오려면 문자 vocab으로 만들어지는 로짓에 의해 정의되는 출력의 분포로부터 샘플링하여 문자 인덱스를 가져와야 함

샘플링이 중요한데, 출력 분포에서 argmax를 사용하면 모델이 루프에 빠지귀 쉬

In [58]:
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()

Decode these to see the text predicted by this untrained model:

In [59]:
print('input: \n', text_from_ids(input_example_batch[0]).numpy())
print()
print('Next Char Predictions:\n', text_from_ids(sampled_indices).numpy())

input: 
 b're note: the report of her is\nextended more than can be thought to begin from such a cottage.\n\nPOLIX'

Next Char Predictions:
 b':m,&uZ.lHV$K!jTuCAHclkO;xAroibHpHEUJCF[UNK]KuTqr&\n&!zELnPYVy;WrKes-OYU;M\nw qrmaLCkcNhijKESrxNVTYUfe:jlQn'


## Train the model

At this point the problem can be treated as a standard classification problem. Given the previous RNN state, and the input this time step, predict the class of the next character.

이 시점에서 문제를 일반적인 분류 문제(classification problem) 로 처리할 수 있습니다.
이전 RNN 상태와 현재 입력을 기반으로 다음 문자의 클래스를 예측하면 됩니다.

### Attach an optimizer, and a loss function

The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case because it is applied across the last dimension of the predictions.

Because your model returns logits, you need to set the `from_logits` flag.


표준 손실 함수인 `tf.keras.losses.sparse_categorical_crossentropy`를 사용할 수 있습니다.

이 함수는 예측값의 마지막 차원(각 문자의 클래스 확률)에서 적용되므로 적절합니다.

또한, 모델이 로짓(logits)을 반환하므로 `from_logits=True` 옵션을 설정해야 합니다.

In [60]:
loss = tf.losses.SparseCategoricalCrossentropy(from_logits=True)

In [61]:
example_batch_mean_loss = loss(target_example_batch, example_batch_predictions)
print('Prediction shape: ', example_batch_predictions.shape, ' # (batch_size, sequence_length, vocab_size)')
print('Mean loss:        ', example_batch_mean_loss)

Prediction shape:  (64, 100, 66)  # (batch_size, sequence_length, vocab_size)
Mean loss:         tf.Tensor(4.189313, shape=(), dtype=float32)


A newly initialized model shouldn't be too sure of itself, the output logits should all have similar magnitudes. To confirm this you can check that the exponential of the mean loss is approximately equal to the vocabulary size. A much higher loss means the model is sure of its wrong answers, and is badly initialized:

새로 초기화된 모델은 자신감이 너무 높으면 안 됩니다.
즉, 출력되는 로짓(logits) 값들은 비슷한 크기를 가져야 합니다.

이를 확인하는 방법 중 하나는,
👉 손실 값의 지수(exponential) 를 계산하면 대략 어휘 크기(vocabulary size) 와 비슷해야 합니다.

만약 손실 값이 훨씬 크다면?

모델이 틀린 정답에 대해 너무 확신하고 있다는 의미입니다.
즉, 잘못된 초기화(bad initialization) 가 이루어졌을 가능성이 큽니다.

In [62]:
tf.exp(example_batch_mean_loss).numpy()

65.97745

Configure the training procedure using the `tf.keras.Model.compile` method. Use `tf.keras.optimizers.Adam` with default arguments and the loss function.



In [63]:
model.compile(optimizer='adam', loss=loss)

### Configure checkpoints

Use a `tf.keras.callbacks.ModelCheckpoint` to ensure that checkpoints are saved during training:

1️⃣ 체크포인트 저장 (Checkpoints)
모델 학습 도중 tf.keras.callbacks.ModelCheckpoint 를 사용해 체크포인트를 저장하면,
👉 학습이 중간에 중단되더라도 저장된 가중치를 불러와 다시 학습할 수 있음.

In [64]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}', '.weights.h5')

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)

### Execute the training

To keep training time reasonable, use 10 epochs to train the model. In Colab, set the runtime to GPU for faster training.

2️⃣ 모델 학습 실행 (Execute Training)
Colab을 사용할 경우, GPU 런타임 설정을 하면 학습 속도가 훨씬 빨라짐
예제에서는 10 에포크(epoch) 를 사용
하지만 더 좋은 성능을 원하면 EPOCHS = 30 처럼 더 오래 학습 가능


In [65]:
EPOCHS = 20
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

## 계속 문제 생김 아마 튜토리얼 만들어진 때랑 지금이랑 버전차이가 나서 생기는 출력 형식때문에 안맞아서 그런듯

Epoch 1/20


OperatorNotAllowedInGraphError: Exception encountered when calling GRU.call().

[1mIterating over a symbolic `tf.Tensor` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.[0m

Arguments received by GRU.call():
  • sequences=tf.Tensor(shape=(None, 100, 256), dtype=float32)
  • initial_state=tf.Tensor(shape=(None, 1024), dtype=float32)
  • mask=None
  • training=True

## Generate text

The simplest way to generate text with this model is to run it in a loop, and keep track of the model's internal state as you execute it.



📌 텍스트 생성 과정 시각화
![To generate text the model's output is fed back to the input](https://github.com/tensorflow/text/blob/master/docs/tutorials/images/text_generation_sampling.png?raw=1)

Each time you call the model you pass in some text and an internal state. The model returns a prediction for the next character and its new state. Pass the prediction and state back in to continue generating text.

3️⃣ 텍스트 생성 (Text Generation)
기본 원리
모델의 출력을 다시 입력으로 사용
내부 상태(state)를 유지하면서 반복 실행
한 글자씩 예측해 텍스트를 점진적으로 생성

The following makes a single step prediction:

In [None]:
class OneStep(tf.keras.Model):
  def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
    super().__init()
    self.temperature = temperature
    self.model = model
    self.chars_from_ids = chars_from_ids
    self.ids_from_chars = ids_from_chars

    #create a mask to prevent '[UNK]' from being generated
    skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
    sparse_mask = tf.SparseTensor(
        #put a -inf at each bad index
        values=[-float('inf')]*len(skip_ids),
        indices=skip_ids,
        #match the shape to the vocabulary
        dense_shape=[len(ids_from_chars.get_vocabulary())])
    self.prediction_mask = tf.sparse.to_dense(sparse_mask)

  @tf.function
  def generate_one_step(self, inputs, states=None):
    #convert strings to token ids
    input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
    input_ids = self.ids_from_chars(input_chars).to_tensor()

    #run the model
    #predicted_logits.shape is [batch, char, next_char_logits]
    predicted_logits, states = self.model(inputs=input_ids, states=states,returen_state=True)

    #only use the last prediction
    predicted_logits = predicted_logits[:, -1, :]
    predicted_logits = predicted_logits/self.temperature
    #apply the prediction mask: prevent '[UNK]' from being generated
    predicted_logits = predicted_logits + self.prediction_mask

    #sample the output logits to generate token ids
    predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
    predicted_ids = tf.squeeze(predicted_ids, axis=-1)

    #convert from token ids to characters
    predicted_chars = self.chars_from_ids(predicted_ids)

    returen predicted_chars, states


Run it in a loop to generate some text. Looking at the generated text, you'll see the model knows when to capitalize, make paragraphs and imitates a Shakespeare-like writing vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.

4️⃣ 텍스트 생성 방식
💡 한 번에 한 글자씩 예측하는 방식

초기 문자열과 내부 상태를 모델에 입력
다음 글자에 대한 확률 분포(logits) 예측
예측된 글자를 입력으로 다시 사용
이를 반복하여 새로운 텍스트 생성
🔹 한 번 실행하면 한 글자 예측
🔹 반복 실행하면 전체 문장 생성 가능

In [None]:
start = time.time()
states=None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end = time.time()
print(result[0].numpy().decode('utf-8'), '\n\n', '_'*80)
print('\nRun time:', end-start)

The easiest thing you can do to improve the results is to train it for longer (try `EPOCHS = 30`).

You can also experiment with a different start string, try adding another RNN layer to improve the model's accuracy, or adjust the temperature parameter to generate more or less random predictions.

5️⃣ 성능 개선 방법
에포크(Epochs) 증가: EPOCHS = 30 으로 학습량 증가
시작 문자열(Start String) 변경: 다른 입력 패턴 시도
RNN 계층 추가: 모델의 학습 용량 증가
온도(Temperature) 조절:
낮추면 (temperature < 1.0) → 예측이 더 확정적 (보수적)
높이면 (temperature > 1.0) → 더 창의적이지만 랜덤성이 증가

If you want the model to generate text *faster* the easiest thing you can do is batch the text generation. In the example below the model generates 5 outputs in about the same time it took to generate 1 above.

6️⃣ 더 빠르게 텍스트 생성하는 방법
텍스트 생성을 배치(batch)로 수행
기존 방식: 한 글자씩 순차적으로 생성 → 느림
배치 방식: 한 번에 여러 개의 출력을 생성 → 더 빠름

In [None]:
start = time.time()
states=None
next_char = tf.constant(['ROMEO:' 'ROMEO', "ROMEO", "ROMEO", "ROMEO"])
result = [next_char]

for n in range(1000):
  next_char, states = one_step_model.generate_one_step(next_char, states=states)
  result.append(next_char)

result = tf.strings.join(result)
end=time.time()
print(result, '\n\n'+'_'*80)
print('\nRun time:', end-start)

## Export the generator

This single-step model can easily be [saved and restored](https://www.tensorflow.org/guide/saved_model), allowing you to use it anywhere a `tf.saved_model` is accepted.

1️⃣ 모델 내보내기 (Export the Generator)
단일 스텝 모델(single-step model) 을 tf.saved_model 형식으로 저장 가능
이렇게 하면 다른 환경에서도 모델을 쉽게 불러와 사용 가능
👉 관련 문서: SavedModel Guide

In [None]:
tf.save_model.save(one_step_model, 'one_step')
one_step_reloaded=tf.saved_model.load('one_step')

In [None]:
states=None
next_char = tf.constant(['ROMEO:'])
result = [next_char]

for n in range(100):
  next_char, states = one_step_reloaded.generate_one_step(next_char, states=states)
  result.append(next_char)

print(tf.strings.join(result)[0].numpy().decode('utf-8'))

## Advanced: Customized Training

The above training procedure is simple, but does not give you much control.
It uses teacher-forcing which prevents bad predictions from being fed back to the model, so the model never learns to recover from mistakes.

So now that you've seen how to run the model manually next you'll implement the training loop. This gives a starting point if, for example, you want to implement _curriculum  learning_ to help stabilize the model's open-loop output.

The most important part of a custom training loop is the train step function.

Use `tf.GradientTape` to track the gradients. You can learn more about this approach by reading the [eager execution guide](https://www.tensorflow.org/guide/eager).

The basic procedure is:

1. Execute the model and calculate the loss under a `tf.GradientTape`.
2. Calculate the updates and apply them to the model using the optimizer.


2️⃣ 고급: 맞춤형 학습 (Customized Training)
위에서 사용한 학습 방식은 "교사 강제 학습(Teacher Forcing)"
즉, 정답을 직접 입력으로 주기 때문에 모델이 틀렸을 때 복구하는 법을 배우지 못함
따라서, 맞춤형 학습 루프를 만들어 모델이 더 강건하게 학습할 수 있도록 개선 가능
📌 이런 방식을 적용하는 이유?
✅ 오픈 루프(open-loop) 예측 안정화
✅ 커리큘럼 학습(curriculum learning) 적용 가능
✅ 보다 세밀한 학습 제어 가능

3️⃣ 맞춤형 학습 루프(Custom Training Loop)
✔ tf.GradientTape 를 사용해 모델 학습 과정을 직접 제어 가능
✔ Eager Execution 모드에서 동작하며, 보다 유연한 학습 가능

📌 기본 학습 절차
1️⃣ 모델 실행 및 손실 계산 (tf.GradientTape 사용)
2️⃣ 손실에 대한 그래디언트(Gradient) 계산
3️⃣ 옵티마이저를 사용해 모델 업데이트

👉 관련 문서: Eager Execution Guide

In [None]:
class CustomTraining(MyModel):
  @tf.function
  def train_step(self, inputs):
    inputs, labels = inputs
    with tf.GradientTape() as tape:
      predictions = self(inputs, training=True)
      loss = self.loss(labels, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    self.optimizer.apply_gradients(zip(grads, model.trainable_variables))

    return {'loss':loss}

The above implementation of the `train_step` method follows [Keras' `train_step` conventions](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit). This is optional, but it allows you to change the behavior of the train step and still use keras' `Model.compile` and `Model.fit` methods.

4️⃣ train_step 함수 사용 방식
💡 Keras의 train_step 규칙을 따르는 방법

train_step 을 직접 정의해도 Model.compile() 및 Model.fit()을 그대로 사용할 수 있음
💡 완전한 맞춤형 학습 루프 작성

Model.fit()을 사용하지 않고, 완전히 독립적인 학습 루프를 만들 수도 있음
예를 들어, 더 높은 수준의 제어가 필요한 경우 직접 학습 루프를 구현 가능
✅ ➡ 따라서, 기본 학습 방법을 활용할 수도 있고, 필요에 따라 완전한 맞춤형 학습 루프를 구현할 수도 있음! 🚀

In [None]:
model = CustomTraining(
    vocab_size=len(ids_from_chars.get_vocabulary()),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units
)

In [None]:
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

In [None]:
model.fit(dataset, epochs=1)

Or if you need more control, you can write your own complete custom training loop:

In [None]:
EPOCHS=10

mean = tf.metrics.Mean()

for epoch in range(EPOCHS):
  start = time.time()

  mean.reset_states()
  for (batch_n, (inp, target)) in enumerate(dataset):
    logs = model.train_step([inp, target])
    mean.update_state(logs['loss'])

    if batch_n % 50 ==0:
      template = f'Epoch {epoch+1} Batch {batch_n} Loss {logs['loss']:.4f}'
      print(template)

  # saving checkpoint the model every 5 epochs
  if (epoch + 1) % 5 == 0:
    model.save_weights(checkpoint_prefix.format(epoch=epoch))

  print()
  print(f'Epoch {epoch+1} Loss: {mean.result().numpy():.4f}')
  print(f'Time taken for 1 epoch {time.time() - start:.2f} sec')
  print("_"*80)

model.save_weights(checkpoint_prefix.format(epoch=epoch))