BaitBuster-Bangla: A Comprehensive Dataset for Clickbait Detection in Bangla with Multi-Feature and Multi-Modal Analysis
Description
This is a multi-feature, multi-modal dataset for Bangla clickbait detection on video-sharing platforms. It was collected from YouTube through the official public API, with the objective of classifying clickbait content in the Bangla language. The dataset consists of 253,070 entries with 18 columns, drawn from a curated list of 28 Not Clickbait and 26 Clickbait Bangla YouTube channels. It includes video metadata, user engagement statistics, and labels, which makes it useful for studying clickbait content. The dataset has been labeled using three different strategies: i) pre-defined auto labels, ii) labels by a human annotator, and iii) labels by a fine-tuned AI model. However, human labels are available for only 10,000 entries. The dataset is available in three formats: xlsx, csv, and parquet.
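As a quick sanity check after downloading, the CSV version can be compared against the size stated above (253,070 entries, 18 columns). The following is a minimal sketch using only the Python standard library. The file name `BaitBuster-Bangla.csv` is a hypothetical placeholder, and the sketch assumes the file is UTF-8 encoded with a single header row; neither is confirmed by this description.

```python
import csv
from pathlib import Path

# Figures taken from the description above.
EXPECTED_ROWS = 253_070
EXPECTED_COLS = 18


def summarize_csv(path):
    """Return (row_count, column_names) for a CSV file with one header row.

    Assumes UTF-8 encoding (an assumption, not stated in the description).
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])          # first record is the header
        n_rows = sum(1 for _ in reader)    # remaining records are data rows
    return n_rows, header


def matches_description(n_rows, header):
    """True if the counts agree with the figures stated in the description."""
    return n_rows == EXPECTED_ROWS and len(header) == EXPECTED_COLS


if __name__ == "__main__":
    csv_path = Path("BaitBuster-Bangla.csv")  # hypothetical file name
    if csv_path.exists():
        n_rows, header = summarize_csv(csv_path)
        print(f"{n_rows} rows, {len(header)} columns")
        print("Matches description:", matches_description(n_rows, header))
```

The `csv` module is used instead of a naive line count because text fields such as titles or descriptions may contain embedded newlines inside quoted cells, which would inflate a line-based count.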