Attn-CapBN: An Attention-based Bangla Image Captioning using the Bangla View Dataset

Submitted to Intelligent Systems with Applications (Elsevier, Q1; under review), 2025

Understanding images through natural language descriptions is a fundamental challenge in artificial intelligence, bridging computer vision and natural language processing. While significant progress has been made in English image captioning, resource-scarce languages such as Bangla remain underexplored. To address this gap, this study introduces BanglaView, a large-scale Bangla image captioning dataset inspired by Flickr30k, consisting of 31,783 images paired with 158,915 professionally verified Bangla captions. Each image is annotated with five diverse and fluent descriptions, ensuring both quality and linguistic richness. Building on this resource, we propose Attn-CapBN, an encoder–decoder framework that employs a custom-designed CNN feature extractor and a visual attention mechanism with a GRU decoder to generate contextually coherent Bangla captions. Rigorous training and evaluation were conducted using both BanglaView and the BAN-Cap dataset. Experimental results demonstrate that Attn-CapBN achieves strong performance, with BLEU-1 to BLEU-4 scores of 0.623, 0.487, 0.394, and 0.333 on BanglaView, and 0.620, 0.485, 0.398, and 0.332 on BAN-Cap, respectively. These results surpass existing baselines, highlighting the effectiveness of the proposed CNN-based feature extractor and attention-guided decoding. The contributions of this work include the release of BanglaView as a benchmark dataset and the introduction of Attn-CapBN as a robust architecture for Bangla image captioning, paving the way for further advancements in regional language captioning and multimodal AI research.
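
For readers unfamiliar with the metric, the following is a minimal sketch of how cumulative BLEU-1 to BLEU-4 can be computed for one generated caption against several reference captions. It illustrates the standard definition (clipped n-gram precision, uniform weights, brevity penalty) and is not the evaluation code used in the paper, which the abstract does not specify. Whitespace tokenization, the absence of smoothing, sentence-level rather than corpus-level scoring, and the example captions are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against several references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # The clip limit for an n-gram is its maximum count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Cumulative sentence-level BLEU-max_n, uniform weights, no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical captions (not taken from BanglaView), split on whitespace.
candidate = "একটি ছেলে মাঠে ফুটবল খেলছে".split()
references = [
    "একটি ছেলে মাঠে ফুটবল খেলছে".split(),
    "একটি ছেলে ফুটবল খেলছে".split(),
]
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(candidate, references, n):.3f}")
```

In a dataset with five captions per image, `references` would hold the five tokenized reference captions for that image. Published scores are typically aggregated at corpus level, so per-sentence values from this sketch are not directly comparable to the numbers reported above.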

Recommended citation: Md Anwar Hossain, Mirza AFM Rashidul Hasan, Sajeeb Kumar Ray, and Naima Islam. "Attn-CapBN: An Attention-based Bangla Image Captioning using the Bangla View Dataset." Available at SSRN: https://ssrn.com/abstract=5708615 or http://dx.doi.org/10.2139/ssrn.5708615