IMDB 영화평데이터 > 감성분류를 위한 트랜스포머 아키텍처 모델 구축

1. 정수토큰 시퀀스(길이80)입력
2. 토큰 임베딩 + 위치임베딩
3. 멀티헤드어텐션
4. concate + 정규화 
5. FFN(Dense + Dense)
6. concate + 정규화 
7. 분류기

In [17]:
# 정수토큰 시퀀스(길이80)입력

In [18]:
import tensorflow as tf
from tensorflow.keras import Model, layers

In [19]:
# 토큰 임베딩
inputs = layers.Input(shape = (80,))
input_embedding = layers.Embedding(input_dim=1000, output_dim=32)(inputs)

In [20]:
# 위치 임베딩
positions = tf.range(start=0, limit=80)
pos_embedding = layers.Embedding(input_dim=80, output_dim=32)(positions)
pos_enc_output = pos_embedding + input_embedding

In [21]:
# 3. 멀티헤드어텐션 3헤드

In [22]:
attention_ouput = layers.MultiHeadAttention(num_heads=3, key_dim=32)(pos_enc_output, pos_enc_output) # K,V


In [23]:
# 4. concate + 정규화

In [24]:
x = layers.add([pos_enc_output, attention_ouput])
x = layers.BatchNormalization()(x)

In [25]:
# 5. FFN(Dense + Dense)
# 6. concate + 정규화 

In [26]:
from tensorflow.keras.models import Sequential
ffnn = Sequential(
    [
        layers.Dense(64,activation='relu'),
        layers.Dense(32,activation='relu')
    ]
)(x)
x = layers.add([ffnn, x])
layers.BatchNormalization()(x)

<KerasTensor shape=(None, 80, 32), dtype=float32, sparse=False, ragged=False, name=keras_tensor_33>

In [27]:
# 7. 분류기 (Dense))

In [28]:
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(64, activation='relu')(x)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(2, activation='softmax')(x)

In [29]:
# 모델 구성

In [30]:
model = Model(inputs=inputs, outputs=outputs)
model.summary()

In [32]:
# 손실함수와 옵디마이저 지정
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [33]:
# imdb data load

In [34]:
from tensorflow.keras.datasets import imdb
(X_train, y_train),(X_test,y_test) = imdb.load_data(num_words=10000)
(X_train, y_train),(X_test,y_test)

((array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),
         list([1, 194, 1153, 194, 8255, 78, 

In [35]:
# 텍스트데이터 전처리 - 데이터 패딩 (길이 통일)
from tensorflow.keras.preprocessing.sequence import pad_sequences
x_train_padd= pad_sequences(X_train, maxlen=80, padding ='post', truncating = 'post')
x_test_padd= pad_sequences(X_test, maxlen=80, padding ='post', truncating = 'post')

In [36]:
model.fit(x_train_padd, y_train,
                        epochs=10,
                        batch_size=200,
                        )

Epoch 1/10


2025-09-05 12:30:59.843924: I external/local_xla/xla/service/service.cc:163] XLA service 0x7ca3a4019160 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2025-09-05 12:30:59.843954: I external/local_xla/xla/service/service.cc:171]   StreamExecutor device (0): NVIDIA GeForce RTX 4060 Laptop GPU, Compute Capability 8.9
2025-09-05 12:30:59.905927: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2025-09-05 12:31:00.212873: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:473] Loaded cuDNN version 91200
2025-09-05 12:31:00.391489: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:31:00.

[1m  7/125[0m [32m━[0m[37m━━━━━━━━━━━━━━━━━━━[0m [1m2s[0m 23ms/step - accuracy: 0.5319 - loss: 0.6950

I0000 00:00:1757043068.021545   21152 device_compiler.h:196] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 7ms/step - accuracy: 0.7180 - loss: 0.5345
Epoch 2/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 4ms/step - accuracy: 0.7853 - loss: 0.4563
Epoch 3/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.7906 - loss: 0.4449
Epoch 4/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.7928 - loss: 0.4397
Epoch 5/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - accuracy: 0.7967 - loss: 0.4308
Epoch 6/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 4ms/step - accuracy: 0.7980 - loss: 0.4249
Epoch 7/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 4ms/step - accuracy: 0.8021 - loss: 0.4188
Epoch 8/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 4ms/step - accuracy: 0.8036 - loss: 0.4144
Epoch 9/10
[1m125/125[0m [32m━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7ca3f07b80d0>

In [37]:
model.evaluate(x_test_padd, y_test)

2025-09-05 12:31:34.636903: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:31:34.636963: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:31:34.637012: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.








[1m756/782[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 2ms/step - accuracy: 0.7453 - loss: 0.5324

2025-09-05 12:31:38.557529: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:31:38.557586: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.
2025-09-05 12:31:38.557630: I external/local_xla/xla/service/gpu/autotuning/dot_search_space.cc:208] All configs were filtered out because none of them sufficiently match the hints. Maybe the hints set does not contain a good representative set of valid configs? Working around this by using the full hints set instead.








[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 5ms/step - accuracy: 0.7461 - loss: 0.5310


[0.5310009121894836, 0.7461199760437012]

125/125 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - accuracy: 0.8141 - loss: 0.4019

In [38]:
import numpy as np
pred = model.predict(x_test_padd)
pred = np.argmax(pred, axis=1)

[1m782/782[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step


In [39]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, pred)

array([[ 7277,  5223],
       [ 1124, 11376]])