# Hands-on: Using the Keras Embedding Layer

This notebook was written while working through the examples in the following article.

* [How to Use Word Embedding Layers for Deep Learning with Keras](https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/)

## Preparing the Data

In [1]:
import numpy as np

# define documents
docs = ['Well done!',
    'Good work',
    'Great effort',
    'nice work',
    'Excellent!',
    'Weak',
    'Poor effort!',
    'not good',
    'poor work',
    'Could have done better.']

# define class labels
labels = np.array([1,1,1,1,1,0,0,0,0,0])
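The two size settings used later, `vocab_size = 50` and `max_length = 4`, can be checked against the data. The quantities to check are the number of distinct lowercase words and the length of the longest document. The sketch below uses `simple_tokenize`, a hypothetical helper built on a regex. It only approximates Keras's own text splitting, which lowercases the text and strips a fixed set of punctuation.

```python
import re

# The ten example documents defined above.
docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!',
        'Weak', 'Poor effort!', 'not good', 'poor work', 'Could have done better.']

def simple_tokenize(text):
    # Hypothetical helper: lowercase and keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())

tokenized = [simple_tokenize(d) for d in docs]
vocab = sorted({w for words in tokenized for w in words})
longest = max(len(words) for words in tokenized)

print(len(vocab), longest)  # 14 4
```

The data has 14 distinct words, so `vocab_size = 50` is comfortably larger than needed. The longest document, 'Could have done better.', has 4 words, which matches `max_length = 4`.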

## Learning an Embedding

In [2]:
from tensorflow.keras.preprocessing.text import one_hot

# integer encode the documents
vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)


[[32, 48], [19, 5], [23, 46], [11, 5], [9], [37], [19, 46], [13, 19], [19, 5], [29, 8, 48, 30]]
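Despite its name, `one_hot()` does not return one-hot vectors. It is built on the hashing trick: each word is hashed, reduced modulo `vocab_size - 1`, and shifted by 1. Indices therefore fall in `1..vocab_size-1`, and 0 is never assigned to a word.

The plain-Python sketch below follows the same idea under two assumptions:

* It uses an md5-based hash, chosen because Python's built-in `hash()` on strings is randomized per process. Keras's `hashing_trick()` also accepts `hash_function='md5'`.
* It uses a simplified whitespace split.

Because of these choices, its numbers will not match the output above. `text_to_words`, `stable_hash` and `hashing_one_hot` are hypothetical names.

```python
import hashlib

# Punctuation stripped by Keras's default text filters.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def text_to_words(text, filters=FILTERS):
    # Lowercase, turn filtered characters into spaces, split on whitespace.
    table = str.maketrans({c: ' ' for c in filters})
    return text.lower().translate(table).split()

def stable_hash(word):
    # md5-based hash: the same word gives the same number in every run.
    return int(hashlib.md5(word.encode('utf-8')).hexdigest(), 16)

def hashing_one_hot(text, n):
    # Map each word into [1, n - 1]; index 0 is left free for padding.
    return [stable_hash(w) % (n - 1) + 1 for w in text_to_words(text)]

print(hashing_one_hot('Could have done better.', 50))
```

Only the upper bound is fixed in advance, which is why two different words can land on the same index.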


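Because one_hot() assigns indices by hashing words, two different words can receive the same integer (a hash collision). In the encoding of `docs` above, for example, 'good' and 'poor', words of opposite sentiment, both map to 19. By default one_hot() uses Python's built-in hash(), which is randomized per process for strings. The exact indices, and which words collide, can therefore change from run to run. The cell below encodes a few word combinations separately so their indices can be compared.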

In [3]:
print(one_hot('Well better', vocab_size))
print(one_hot('Good Great good', vocab_size))
print(one_hot('not have', vocab_size))

[32, 30]
[19, 23, 19]
[13, 8]
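Comparing indices by eye is error-prone. A more systematic check groups every word of `docs` by the index it received in this run. The mapping below is read directly off the `encoded_docs` output above.

The sketch also estimates how likely a collision is, using the birthday-problem formula. It assumes each word is hashed independently and uniformly into the 49 possible indices (1 to 49). `find_collisions` and `collision_probability` are hypothetical helper names.

```python
from collections import defaultdict

# word -> index pairs taken from this notebook run's encoded_docs output.
observed = {
    'well': 32, 'done': 48, 'good': 19, 'work': 5, 'great': 23, 'effort': 46,
    'nice': 11, 'excellent': 9, 'weak': 37, 'poor': 19, 'not': 13,
    'could': 29, 'have': 8, 'better': 30,
}

def find_collisions(word_to_index):
    # Group words by index; keep indices shared by two or more distinct words.
    buckets = defaultdict(list)
    for word, idx in word_to_index.items():
        buckets[idx].append(word)
    return {idx: sorted(ws) for idx, ws in buckets.items() if len(ws) > 1}

def collision_probability(num_words, num_buckets):
    # P(at least two words share an index) under uniform, independent hashing.
    p_all_distinct = 1.0
    for i in range(num_words):
        p_all_distinct *= (num_buckets - i) / num_buckets
    return 1.0 - p_all_distinct

print(find_collisions(observed))                # {19: ['good', 'poor']}
print(round(collision_probability(14, 49), 3))  # 0.872
```

With 14 distinct words and only 49 slots, at least one collision is expected about 87% of the time, so the good/poor clash is not unusual.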


In [4]:
from tensorflow.keras.utils import pad_sequences

# pad documents to a max length of 4 words
max_length = 4
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
print(padded_docs)

[[32 48  0  0]
 [19  5  0  0]
 [23 46  0  0]
 [11  5  0  0]
 [ 9  0  0  0]
 [37  0  0  0]
 [19 46  0  0]
 [13 19  0  0]
 [19  5  0  0]
 [29  8 48 30]]
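With `padding='post'`, zeros are appended after the encoded words. Zero is a safe filler because `one_hot()` never assigns index 0 to a word.

`pad_sequences` truncates from the front by default (`truncating='pre'`). That would only matter for a document longer than `max_length`, and none is longer here.

The NumPy sketch below reproduces this behaviour. `pad_post` is a hypothetical helper, not the Keras function.

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Pad integer sequences at the end to a fixed length.

    Sketch of pad_sequences(..., padding='post') with the default
    truncating='pre': a longer sequence keeps its LAST maxlen items.
    """
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        seq = list(seq)
        kept = seq[max(len(seq) - maxlen, 0):]  # drop extra tokens from the front
        out[row, :len(kept)] = kept             # remaining cells keep `value`
    return out

print(pad_post([[32, 48], [9], [29, 8, 48, 30]], maxlen=4))
print(pad_post([[1, 2, 3, 4, 5]], maxlen=4))  # [[2 3 4 5]]
```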


In [5]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense

# define the model
model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_length))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# summarize the model
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 4, 8)              400       
                                                                 
 flatten (Flatten)           (None, 32)                0         
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 433 (1.69 KB)
Trainable params: 433 (1.69 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
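The `Embedding` layer is a trainable lookup table of shape `(vocab_size, 8)`, and each input integer selects one row of it. The NumPy sketch below reproduces this model's forward pass with random weights. The values are illustrative only: they are neither Keras's initializers nor the trained weights.

The sketch checks two things:

* Row indexing gives the same result as multiplying one-hot vectors by the table.
* The weight arrays add up to the 433 parameters in the summary: 50 × 8 = 400 for the embedding and 32 × 1 + 1 = 33 for the dense layer.

```python
import numpy as np

vocab_size, embed_dim, max_length = 50, 8, 4
rng = np.random.default_rng(0)

# Illustrative random weights with the same shapes as the Keras layers.
embeddings = rng.uniform(-0.05, 0.05, size=(vocab_size, embed_dim))  # (50, 8)
kernel = rng.normal(0.0, 0.1, size=(max_length * embed_dim, 1))      # (32, 1)
bias = np.zeros(1)                                                   # (1,)

def forward(padded):
    vectors = embeddings[padded]             # Embedding: (batch, 4) -> (batch, 4, 8)
    flat = vectors.reshape(len(padded), -1)  # Flatten: -> (batch, 32)
    logits = flat @ kernel + bias            # Dense: -> (batch, 1)
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid: probability of label 1

batch = np.array([[32, 48, 0, 0], [29, 8, 48, 30]])  # two rows of padded_docs
probs = forward(batch)

# Looking up rows equals multiplying one-hot vectors by the embedding table.
one_hot_rows = np.eye(vocab_size)[batch]             # (2, 4, 50)
assert np.allclose(one_hot_rows @ embeddings, embeddings[batch])

# Parameter count and float32 storage (4 bytes each, reported in units of 1024).
n_params = embeddings.size + kernel.size + bias.size
print(probs.shape, n_params, round(n_params * 4 / 1024, 2))  # (2, 1) 433 1.69
```

Because `mask_zero` is not set on the `Embedding` layer, padding index 0 is looked up like any other row. Its vector still feeds the dense layer.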


In [6]:
# fit the model
model.fit(padded_docs, labels, epochs=50, verbose=0)

# evaluate the model
loss, accuracy = model.evaluate(padded_docs, labels, verbose=0)
print('Accuracy: %f' % (accuracy*100))

Accuracy: 89.999998
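With `loss='binary_crossentropy'`, Keras treats `metrics=['accuracy']` as binary accuracy: a sigmoid output above 0.5 counts as class 1. The printed 89.999998 therefore most likely means 9 of the 10 documents were classified correctly. The trailing digits come from 0.9 being held as a float32. This figure is also measured on the training data itself, so it reflects fit rather than generalization.

The NumPy sketch below computes both the loss and the accuracy. It uses hypothetical probabilities, not this model's predictions, and the 1e-7 clipping constant is an assumption.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip probabilities away from 0 and 1 so log() stays finite (eps is assumed).
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # A prediction counts as class 1 when its probability exceeds the threshold.
    return float(np.mean((y_prob > threshold) == (y_true == 1)))

labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
# Hypothetical sigmoid outputs (NOT this notebook's): the fifth positive is missed.
probs = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.2, 0.3, 0.1, 0.2, 0.3])

acc = binary_accuracy(labels, probs)
print(acc)                                               # 0.9
print('Accuracy: %f' % (float(np.float32(acc)) * 100))   # Accuracy: 89.999998
print(binary_crossentropy(labels, probs))
```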


## Using Pre-trained GloVe Embeddings