BaitBuster-Bangla: A Comprehensive Dataset for Clickbait Detection in Bangla with Multi-Feature and Multi-Modal Analysis

Published: 25 October 2023| Version 2 | DOI: 10.17632/3c6ztw5nft.2
Abdullah Al Imran, Md Sakib Hossain Shovon, Firoz Mridha


This dataset is a multi-feature and multi-modal dataset for Bangla clickbait detection in video sharing platforms. The dataset is collected from YouTube using its official public API with the objective of classifying clickbait content in the Bangla language. The dataset consists of 253,070 entries with 18 columns covering a curated list of 28 Not Clickbait, and 26 Clickbait Bangla youtube channels. The dataset provides valuable information for studying clickbait content and includes various metadata related to the videos, user engagement statistics, and labels. The dataset has been labeled in three different strategies: i) pre-defined auto labels, ii) labels by human annotator, and iii) labels by fine-tuned AI model. However, human labels are are available for 10000 entries. The dataset is available in three different formats: xlsx, csv, and parquet.



Artificial Intelligence, Social Media, Data Science, Natural Language Processing, Machine Learning, Supervised Learning, Multimodality