Keras implementation of the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", which introduces an attention-based image caption generator. The model shifts its attention to the relevant part of the image as it generates each word.
This figure from the original paper gives a short explanation of the network's structure.
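To make the per-word attention step more concrete, here is a minimal NumPy sketch of one soft-attention step. It is an illustration only, not this project's code and not the kulc API. The function name `soft_attention`, the weight names `W_f`, `W_h`, `v`, the additive (tanh) scoring form, and all sizes are assumptions chosen for the example.

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """One soft-attention step (illustrative, hypothetical names).

    features: (L, D) array, one D-dim feature vector per image location
    hidden:   (H,) array, current decoder state
    W_f: (D, K), W_h: (H, K), v: (K,) are assumed learned projections
    """
    # Score every image location against the current decoder state.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # shape (L,)
    # Numerically stable softmax over locations.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Context vector: attention-weighted average of the location features.
    context = weights @ features  # shape (D,)
    return context, weights

# Random stand-in data; the sizes below are arbitrary example values.
rng = np.random.default_rng(0)
L, D, H, K = 196, 512, 256, 128
features = rng.normal(size=(L, D))
hidden = rng.normal(size=(H,))
W_f = rng.normal(size=(D, K)) * 0.01
W_h = rng.normal(size=(H, K)) * 0.01
v = rng.normal(size=(K,))

context, weights = soft_attention(features, hidden, W_f, W_h, v)
```

In a caption generator, a step like this would run once per generated word, so `weights` changes as the decoder state changes, which is what lets the model focus on different image regions over time.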
This project depends on the Keras Utility & Layer Collection (kulc), which implements many useful layers and utility functions for attention-based models.