BanglaView: A Bangla Image Captioning Dataset

Published: 27 December 2023 | Version 1 | DOI: 10.17632/rrv8pbxrxv.1


Image description is a crucial part of artificial intelligence because it spans two important fields: computer vision and natural language processing. As computers have become better at interpreting visual data and converting it into text, research interest in tasks such as automated image description has surged in recent years. Most of this research focuses on English in monolingual settings, and far less attention has been directed towards resource-constrained languages like Bangla. This gap stems primarily from the absence of standardized datasets for such languages. To address the limited availability of Bangla image captioning data, we propose BanglaView, a novel dataset inspired by Flickr30k. English captions are translated into Bangla using Google Translate and then refined by expert annotators. The BanglaView dataset contains a total of 31,783 images with 158,915 sentences; each image comes with five accompanying descriptions. The total word count, number of unique words, mean sentence length, and sentence length variance are 1,668,322, 26,348, 10.49, and 20.82, respectively. The annotations were carefully created by native Bangla speakers who are fluent in both Bangla and English. To maintain high standards, a team of professionals reviewed and post-processed the dataset to ensure its quality.
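For readers who want to recompute the corpus statistics quoted in the description, the following is a minimal sketch rather than the authors' procedure. It assumes whitespace tokenization and population variance, neither of which is specified in the description; the function name `caption_stats` and the toy captions are hypothetical and not taken from the dataset.

```python
from statistics import mean, pvariance


def caption_stats(captions):
    """Compute simple corpus statistics for a list of caption strings."""
    # Assumption: tokens are separated by whitespace. The tokenizer used
    # for the published figures is not documented in the description.
    tokenized = [caption.split() for caption in captions]
    lengths = [len(tokens) for tokens in tokenized]
    vocabulary = {word for tokens in tokenized for word in tokens}
    return {
        "sentences": len(captions),
        "total_words": sum(lengths),
        "unique_words": len(vocabulary),
        "length_mean": mean(lengths),
        # Assumption: population variance; sample variance is equally plausible.
        "length_variance": pvariance(lengths),
    }


# Toy captions made up for illustration (not from BanglaView).
toy_captions = [
    "একটি কুকুর মাঠে দৌড়াচ্ছে",
    "দুটি শিশু খেলছে",
    "একটি কুকুর খেলছে",
]
stats = caption_stats(toy_captions)

# Consistency check on the published counts: five captions per image.
images = 31_783
captions_per_image = 5
expected_sentences = images * captions_per_image
```

The last three lines only confirm that the stated image count and the five captions per image agree with the stated sentence total of 158,915. Results on the full dataset may differ from the published figures if a different tokenizer or variance definition was used.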



Pabna University of Science and Technology


Artificial Intelligence, Computer Vision, Natural Language Processing, Machine Learning, Bengali Language, Deep Learning, Image Analysis