MultiBanFakeDetect: An Extensive Benchmark Dataset for Multimodal Under-Resource Bangla Fake News Detection

Published: 8 August 2024 | Version 1 | DOI: 10.17632/k5pbz9795f.1

Description

The MultiBanFakeDetect dataset consists of 9,600 text-image instances collected from online forums, news websites, and social media. It covers political, social, technology, and entertainment themes and is balanced between real and fake instances. The data are split into 7,680 instances for training, 960 for testing, and 960 for validation.

Statistical Overview of Text-Image Pair Data Across Different Types of Fake News
=================================================
Types             Training   Testing   Validation
=================================================
Misinformation        1288       161          162
Rumor                 1215       152          151
Clickbait             1337       167          167
Non-fake              3840       480          480
=================================================
Total                 7680       960          960

Distribution of Text-Image Pair Data within Labels
=================================================
Label             Training   Testing   Validation
=================================================
1 (Fake)              3840       480          480
0 (Non-Fake)          3840       480          480
=================================================
Total                 7680       960          960

Statistical Overview of Text-Image Pair Data Across Different Categories of Fake News
=================================================
Category          Training   Testing   Validation
=================================================
Entertainment          640        80           80
Sports                 640        80           80
Technology             640        80           80
National               640        80           80
Lifestyle              640        80           80
Politics               640        80           80
Education              640        80           80
International          640        80           80
Crime                  640        80           80
Finance                640        80           80
Business               640        80           80
Miscellaneous          640        80           80
=================================================
Total                 7680       960          960
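
The counts in the description can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers, not a loader for the dataset files. The variable and function names are illustrative assumptions, and the counts are copied by hand from the description.

```python
# Counts copied from the dataset description, as (training, testing, validation).
# All names here are illustrative; nothing below reads the actual dataset files.
fake_types = {
    "Misinformation": (1288, 161, 162),
    "Rumor": (1215, 152, 151),
    "Clickbait": (1337, 167, 167),
}
non_fake = (3840, 480, 480)

# Each of the 12 news categories contributes the same number of instances.
num_categories = 12
per_category = (640, 80, 80)


def column_sums(rows):
    """Sum a collection of (training, testing, validation) tuples column-wise."""
    return tuple(sum(column) for column in zip(*rows))


fake_totals = column_sums(fake_types.values())        # all fake instances per split
split_totals = column_sums([fake_totals, non_fake])   # fake + non-fake per split
category_totals = tuple(n * num_categories for n in per_category)

# Fake and non-fake counts should match in every split (balanced labels).
labels_balanced = fake_totals == non_fake

# Both breakdowns (by type and by category) should give the same split sizes.
breakdowns_agree = split_totals == category_totals

total_instances = sum(split_totals)
split_fractions = tuple(n / total_instances for n in split_totals)

print("Fake per split:     ", fake_totals)
print("Total per split:    ", split_totals)
print("Labels balanced:    ", labels_balanced)
print("Breakdowns agree:   ", breakdowns_agree)
print("Total instances:    ", total_instances)
print("Split fractions:    ", split_fractions)
```

Under these numbers, the three fake types add up to the same per-split counts as the non-fake class, and the splits correspond to an 80/10/10 division of the 9,600 instances.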

Institutions

Ahsanullah University of Science and Technology

Categories

Social Media, Image Classification, Bengali Language, Multimodality, Deep Learning