## Implementing an RNN with Keras
[SimpleRNN official documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNN)

```python
from tensorflow.keras.layers import SimpleRNN

model.add(SimpleRNN(hidden_units))

# With additional arguments
model.add(SimpleRNN(hidden_units, input_shape=(timesteps, input_dim)))

# Alternative notation
model.add(SimpleRNN(hidden_units, input_length=M, input_dim=N))
```

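* hidden_units
  * Dimensionality of the output space
  * Defines the size of the hidden state
  * Equal to the size of the value (output_dim) that a memory cell sends to the memory cell at the next time step and to the output layer
  * Increasing it increases the capacity of the RNN
  * Typical values are 128, 256, 512, and 1024
* timesteps
  * Length of the input sequence (input_length)
  * Number of time steps
* input_dim
  * Size of the input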

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN

model = Sequential()
model.add(SimpleRNN(units=3, input_shape=(2, 10)))
# Equivalent to model.add(SimpleRNN(3, input_length=2, input_dim=10))
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn (SimpleRNN)      (None, 3)                 42        
                                                                 
Total params: 42
Trainable params: 42
Non-trainable params: 0
_________________________________________________________________
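
The parameter count of 42 in the summary can be checked by hand. The following is a minimal sketch, assuming the usual SimpleRNN parameterization (one input kernel, one recurrent kernel, one bias vector). The helper name `simple_rnn_param_count` is hypothetical and not part of Keras.

```python
def simple_rnn_param_count(units, input_dim):
    """Count the trainable parameters of one SimpleRNN layer (hypothetical helper)."""
    input_kernel = input_dim * units   # weights applied to the input x_t
    recurrent_kernel = units * units   # weights applied to the previous hidden state h_{t-1}
    bias = units                       # one bias per hidden unit
    return input_kernel + recurrent_kernel + bias


print(simple_rnn_param_count(units=3, input_dim=10))  # 42
```

Note that the count does not depend on `timesteps`: the same weights are reused at every time step.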



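* An RNN layer takes a 3D tensor of shape (batch_size, timesteps, input_dim) as input
* In natural language processing, these typically correspond to
  * input_dim : dimension of the encoded word vector
  * timesteps : number of words, i.e. the sentence length
  * batch_size : number of sentences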
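
To make the three axes concrete, here is a small NumPy sketch. The random values are only a stand-in for real encoded sentences (an assumption for illustration), and the sizes 8, 2, and 10 are chosen to match the Keras examples in this section.

```python
import numpy as np

# 8 sentences, 2 words per sentence, each word encoded as a 10-dimensional vector
batch_size, timesteps, input_dim = 8, 2, 10

# Stand-in data: real inputs would come from one-hot encoding or an embedding
batch = np.random.random((batch_size, timesteps, input_dim))

print(batch.shape)        # (8, 2, 10): the whole batch
print(batch[0].shape)     # (2, 10): one sentence
print(batch[0, 0].shape)  # (10,): one word vector
```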

In [4]:
model = Sequential()
model.add(SimpleRNN(3, batch_input_shape=(8,2,10)))
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_2 (SimpleRNN)    (8, 3)                    42        
                                                                 
Total params: 42
Trainable params: 42
Non-trainable params: 0
_________________________________________________________________


In [3]:
# return_sequences=True
# Whether to return the last output in the output sequence, or the full sequence. Default: False.
# With return_sequences=True, the layer returns a 3D tensor of shape (batch_size, timesteps, output_dim)
model = Sequential()
model.add(SimpleRNN(3, batch_input_shape=(8,2,10), return_sequences=True))
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_1 (SimpleRNN)    (8, 2, 3)                 42        
                                                                 
Total params: 42
Trainable params: 42
Non-trainable params: 0
_________________________________________________________________
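
The difference between the two output shapes, (8, 3) and (8, 2, 3), can be illustrated without Keras. The following is a NumPy sketch and not the Keras implementation. The function name `simple_rnn_forward`, the random weights, and the (input_dim, units) kernel layout are assumptions made for illustration.

```python
import numpy as np


def simple_rnn_forward(x, Wx, Wh, b, return_sequences=False):
    """Sketch of a batched tanh RNN forward pass over x of shape (batch, timesteps, input_dim)."""
    batch_size, timesteps, _ = x.shape
    h = np.zeros((batch_size, Wh.shape[0]))        # initial hidden state
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ Wx + h @ Wh + b)  # hidden state at time step t
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)           # (batch, timesteps, units)
    return h                                       # (batch, units): last time step only


rng = np.random.default_rng(0)
x = rng.random((8, 2, 10))
Wx = rng.normal(size=(10, 3))
Wh = rng.normal(size=(3, 3))
b = np.zeros(3)

last = simple_rnn_forward(x, Wx, Wh, b)
full = simple_rnn_forward(x, Wx, Wh, b, return_sequences=True)

print(last.shape)                         # (8, 3)
print(full.shape)                         # (8, 2, 3)
print(np.allclose(full[:, -1, :], last))  # True
```

The last check shows that the default output is simply the final time step of the full sequence.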


## Implementing an RNN in Python

In [7]:
import numpy as np

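# Sentence length (number of time steps)
timesteps = 10

# Input dimension, i.e. the dimension of each word vector
input_dim = 4

# Size of the hidden state, i.e. the capacity of the memory cell
hidden_units = 8

inputs = np.random.random((timesteps, input_dim))
print(inputs)

# Initial hidden state h_{t-1}: a zero vector of dimension 8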
hidden_state_t = np.zeros((hidden_units))
print("Initial hidden state:", hidden_state_t)

[[0.93679362 0.28312536 0.26017233 0.56932907]
 [0.44912398 0.80783303 0.04227668 0.82879041]
 [0.59945913 0.52241481 0.58891286 0.33280499]
 [0.77877785 0.24336579 0.99086586 0.1379726 ]
 [0.54610933 0.84598669 0.85937943 0.84233939]
 [0.41027956 0.98810542 0.2214896  0.80721748]
 [0.50277082 0.11929504 0.71400028 0.78613042]
 [0.36680646 0.85468146 0.8977861  0.97187282]
 [0.61733787 0.45114618 0.66857906 0.46742497]
 [0.54387829 0.99271519 0.69112894 0.47563419]]
Initial hidden state: [0. 0. 0. 0. 0. 0. 0. 0.]


### Defining the weights and bias with matching shapes

In [8]:
# Weights for the input, sized to match the word-vector dimension
Wx = np.random.random((hidden_units, input_dim))

# Weights for the hidden state
Wh = np.random.random((hidden_units, hidden_units))

# Bias
b = np.random.random((hidden_units))

print("Shape of weight Wx:", np.shape(Wx))
print("Shape of weight Wh:", np.shape(Wh))
print("Shape of bias b:", np.shape(b))

Shape of weight Wx: (8, 4)
Shape of weight Wh: (8, 8)
Shape of bias b: (8,)


In [16]:
total_hidden_states = []

# Iterate over the words of the sentence one at a time
for input_t in inputs:

    # Compute the hidden state at this time step
    # h_t = tanh(Wx · x_t + Wh · h_{t-1} + b)
    output_t = np.tanh(np.dot(Wx, input_t) + np.dot(Wh, hidden_state_t) + b)
    print("Hidden state at this time step:", output_t)
    total_hidden_states.append(list(output_t))
    hidden_state_t = output_t

Hidden state at this time step: [0.99998491 0.9998602  0.99999718 0.99998758 0.99843477 0.9999498
 0.99997012 0.99998906]
Hidden state at this time step: [0.99999411 0.99988677 0.99999855 0.99998583 0.99827419 0.99994577
 0.99997031 0.99998988]
Hidden state at this time step: [0.99998381 0.99989015 0.9999971  0.99998765 0.99886745 0.99994869
 0.99996718 0.99998153]
Hidden state at this time step: [0.99997325 0.99989941 0.99999625 0.99999015 0.99917252 0.99995628
 0.99997307 0.99997973]
Hidden state at this time step: [0.99999613 0.9999522  0.99999949 0.9999939  0.99934314 0.99998268
 0.99998948 0.99999541]
Hidden state at this time step: [0.9999954  0.99991695 0.99999907 0.99998861 0.99872396 0.99996442
 0.99997596 0.99999191]
Hidden state at this time step: [0.99998886 0.99988512 0.99999726 0.99998878 0.99866778 0.99991844
 0.99998263 0.99998587]
Hidden state at this time step: [0.99999693 0.99995385 0.99999953 0.99999375 0.99931852 0.99997881
 0.9999908  0.9999952 ]
Hidden state at this time step: [0.99998671 0.99990015 0.99999765 0.99998918 0.9989323  0.99995273
 0.99997525 0.99998558]
Hidden state at this time step: [0.99999321 0.9999396  0.99999912 0.99999174 0.99926495 0.99997957
 0.99997894 0.99999153]

In [14]:
total_hidden_states = np.stack(total_hidden_states, axis=0)
print('Hidden states at all time steps:')
print(total_hidden_states)

Hidden states at all time steps:
[[0.99998491 0.9998602  0.99999718 0.99998758 0.99843477 0.9999498
  0.99997012 0.99998906]
 [0.99999411 0.99988677 0.99999855 0.99998583 0.99827419 0.99994577
  0.99997031 0.99998988]
 [0.99998381 0.99989015 0.9999971  0.99998765 0.99886745 0.99994869
  0.99996718 0.99998153]
 [0.99997325 0.99989941 0.99999625 0.99999015 0.99917252 0.99995628
  0.99997307 0.99997973]
 [0.99999613 0.9999522  0.99999949 0.9999939  0.99934314 0.99998268
  0.99998948 0.99999541]
 [0.9999954  0.99991695 0.99999907 0.99998861 0.99872396 0.99996442
  0.99997596 0.99999191]
 [0.99998886 0.99988512 0.99999726 0.99998878 0.99866778 0.99991844
  0.99998263 0.99998587]
 [0.99999693 0.99995385 0.99999953 0.99999375 0.99931852 0.99997881
  0.9999908  0.9999952 ]
 [0.99998671 0.99990015 0.99999765 0.99998918 0.9989323  0.99995273
  0.99997525 0.99998558]
 [0.99999321 0.9999396  0.99999912 0.99999174 0.99926495 0.99997957
  0.99997894 0.99999153]]
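
Every hidden state above is close to 1 because `np.random.random` draws all weights and biases from [0, 1), so the pre-activation is a large positive sum and `tanh` saturates. The following sketch repeats the same loop under the assumption of zero-centered weights with a small scale. The scale of 0.1, the zero bias, and the fixed seed are arbitrary choices for illustration and are not from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

timesteps, input_dim, hidden_units = 10, 4, 8
inputs = rng.random((timesteps, input_dim))

# Assumption: zero-centered weights with a small scale instead of values in [0, 1)
Wx = rng.normal(scale=0.1, size=(hidden_units, input_dim))
Wh = rng.normal(scale=0.1, size=(hidden_units, hidden_units))
b = np.zeros(hidden_units)

hidden_state_t = np.zeros(hidden_units)
total_hidden_states = []
for input_t in inputs:
    # h_t = tanh(Wx · x_t + Wh · h_{t-1} + b)
    hidden_state_t = np.tanh(Wx @ input_t + Wh @ hidden_state_t + b)
    total_hidden_states.append(hidden_state_t)

total_hidden_states = np.stack(total_hidden_states, axis=0)
print(total_hidden_states.shape)          # (10, 8)
print(np.abs(total_hidden_states).max())  # largest magnitude; far from the saturated values above
```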


## Deep recurrent neural network

In [18]:
# Stacking two hidden layers
model = Sequential()
model.add(SimpleRNN(hidden_units, input_length=10, input_dim=5, return_sequences=True))
model.add(SimpleRNN(hidden_units, return_sequences=True))
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_2 (SimpleRNN)    (None, 10, 8)             112       
                                                                 
 simple_rnn_3 (SimpleRNN)    (None, 10, 8)             136       
                                                                 
Total params: 248
Trainable params: 248
Non-trainable params: 0
_________________________________________________________________


## Bidirectional recurrent neural network

In [19]:
from tensorflow.keras.layers import Bidirectional

timesteps = 10
input_dim = 5

model = Sequential()
model.add(Bidirectional(SimpleRNN(hidden_units, return_sequences=True), input_shape=(timesteps, input_dim)))
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bidirectional (Bidirectiona  (None, 10, 16)           224       
 l)                                                              
                                                                 
Total params: 224
Trainable params: 224
Non-trainable params: 0
_________________________________________________________________
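
A bidirectional layer runs one RNN forward in time and a second, independently parameterized RNN backward in time, then merges the two outputs. With concatenation, the last dimension doubles from 8 to 16, as in the summary above. The following is a NumPy sketch of that idea, assuming concatenation as the merge step. The helper `rnn_pass`, the helper `init_params`, and the random weights are hypothetical.

```python
import numpy as np


def rnn_pass(x, Wx, Wh, b):
    """Run a tanh RNN over x of shape (timesteps, input_dim) and return all hidden states."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        states.append(h)
    return np.stack(states)  # (timesteps, hidden_units)


def init_params(rng, hidden_units, input_dim):
    """Small random weights for one direction (illustrative values only)."""
    Wx = rng.normal(scale=0.1, size=(hidden_units, input_dim))
    Wh = rng.normal(scale=0.1, size=(hidden_units, hidden_units))
    b = np.zeros(hidden_units)
    return Wx, Wh, b


rng = np.random.default_rng(0)
timesteps, input_dim, hidden_units = 10, 5, 8
x = rng.random((timesteps, input_dim))

forward_states = rnn_pass(x, *init_params(rng, hidden_units, input_dim))
# Process the reversed sequence, then flip the result back into the original time order
backward_states = rnn_pass(x[::-1], *init_params(rng, hidden_units, input_dim))[::-1]

merged = np.concatenate([forward_states, backward_states], axis=-1)
print(merged.shape)  # (10, 16)
```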


## Deep bidirectional recurrent neural network

In [20]:
from tensorflow.keras.layers import Bidirectional

timesteps = 10
input_dim = 5

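# As with other neural network models, adding more hidden layers does not guarantee better performance
# More hidden layers increase what the model can learn, but they also require more training data
# The code below stacks four hidden layers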
model = Sequential()
model.add(Bidirectional(SimpleRNN(hidden_units, return_sequences=True), input_shape=(timesteps, input_dim)))
model.add(Bidirectional(SimpleRNN(hidden_units, return_sequences=True)))
model.add(Bidirectional(SimpleRNN(hidden_units, return_sequences=True)))
model.add(Bidirectional(SimpleRNN(hidden_units, return_sequences=True)))
model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bidirectional_1 (Bidirectio  (None, 10, 16)           224       
 nal)                                                            
                                                                 
 bidirectional_2 (Bidirectio  (None, 10, 16)           400       
 nal)                                                            
                                                                 
 bidirectional_3 (Bidirectio  (None, 10, 16)           400       
 nal)                                                            
                                                                 
 bidirectional_4 (Bidirectio  (None, 10, 16)           400       
 nal)                                                            
                                                                 
Total params: 1,424
Trainable params: 1,424
Non-trainable params: 0
_________________________________________________________________
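
The per-layer counts of 224 and 400 follow from two facts: each bidirectional layer holds two SimpleRNN parameter sets, and every layer after the first receives the 16 concatenated features of the previous layer rather than the 5 input features. A sketch of that bookkeeping, assuming the usual SimpleRNN parameterization; the helper `bidirectional_stack_param_counts` is hypothetical.

```python
def bidirectional_stack_param_counts(input_dim, units, num_layers):
    """Per-layer parameter counts for stacked Bidirectional(SimpleRNN) layers (hypothetical helper)."""
    counts = []
    features = input_dim
    for _ in range(num_layers):
        one_direction = units * features + units * units + units  # input kernel + recurrent kernel + bias
        counts.append(2 * one_direction)  # forward and backward RNNs have separate weights
        features = 2 * units              # concatenated output feeds the next layer
    return counts


counts = bidirectional_stack_param_counts(input_dim=5, units=8, num_layers=4)
print(counts)       # [224, 400, 400, 400]
print(sum(counts))  # 1424
```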

### GRU in Keras

In [21]:
from tensorflow.keras.layers import GRU

model = Sequential()
model.add(GRU(hidden_units, input_shape=(timesteps, input_dim)))
model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 gru (GRU)                   (None, 8)                 360       
                                                                 
Total params: 360
Trainable params: 360
Non-trainable params: 0
_________________________________________________________________
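
The 360 parameters in the summary can also be reproduced by hand. This sketch assumes a GRU with three weight sets (update gate, reset gate, candidate state) and, as with the TensorFlow 2 default `reset_after=True`, two bias vectors per set. The helper `gru_param_count` is hypothetical.

```python
def gru_param_count(units, input_dim, reset_after=True):
    """Count the trainable parameters of one GRU layer (hypothetical helper)."""
    input_kernel = input_dim * units
    recurrent_kernel = units * units
    # reset_after=True keeps separate input and recurrent biases
    bias = 2 * units if reset_after else units
    return 3 * (input_kernel + recurrent_kernel + bias)  # three gates/candidates


print(gru_param_count(units=8, input_dim=5))                     # 360
print(gru_param_count(units=8, input_dim=5, reset_after=False))  # 336
```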
