[TensorFlow] Attention Mechanisms 예제

17 Aug 2023

tensorflow

Attention 메커니즘은 주로 시퀀스 데이터에서 특정 위치에 가중치를 부여하여 중요한 정보에 집중하는 데 사용됩니다. 아래는 간단한 Attention 메커니즘을 구현한 예제입니다. 이 예제는 시퀀스 데이터에서 각 시퀀스 단계에 대한 가중치를 계산하고, 이 가중치를 사용하여 출력을 생성하는 방식을 보여줍니다.

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, LSTM, Attention, Concatenate, Bidirectional
from tensorflow.keras.models import Model

## 더미 시퀀스 데이터 생성
dummy_sequence = tf.random.normal(shape=(4, 10, 32))  # 4개의 시퀀스, 각 시퀀스 길이: 10, 특성 개수: 32

## 입력 레이어 정의
input_layer = Input(shape=(10, 32))

## Bidirectional LSTM 레이어 정의
bi_lstm_layer = Bidirectional(LSTM(64, return_sequences=True))(input_layer)

## Attention 레이어 정의
attention = Attention()([bi_lstm_layer, bi_lstm_layer])

## 가중 평균 계산
weighted_sum = tf.reduce_sum(attention * bi_lstm_layer, axis=1)

## 밀집 레이어 정의
output_layer = Dense(10, activation='softmax')(weighted_sum)

## 모델 생성
model = Model(inputs=input_layer, outputs=output_layer)

## 모델 컴파일
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

## 모델 요약
model.summary()

## 모델 훈련
model.fit(dummy_sequence, [0, 1, 2, 3], epochs=5)` 

위 예제에서는 Bidirectional LSTM 레이어를 사용하여 양방향으로 시퀀스를 처리한 후, Attention 레이어를 통해 가중치를 계산합니다. 그리고 가중 평균을 계산하여 출력을 생성합니다.

실행 결과 예시:

Input shape: (4, 10, 32)
Output shape: (4, 10, 64)` 

위에서 확인할 수 있듯이, Attention 메커니즘을 사용하여 시퀀스 데이터의 가중치를 계산하고, 가중 평균을 통해 각 시퀀스 단계의 중요한 정보를 추출하는 모델을 구성했습니다.