BanglaCalamityMMD: A Comprehensive Benchmark Dataset for Multimodal Disaster Identification in the Low-Resource Bangla Language

Published: 7 August 2024 | Version 1 | DOI: 10.17632/7dggbjn5sd.1
Contributors:
Fatema Tuj Johora Faria, et al.

Description

The BanglaCalamityMMD dataset is a multimodal resource designed to address the gap in disaster identification for Bangla-language text. It contains 7,903 instances across eight categories: Earthquake, Flood, Landslides, Wildfires, Tropical Storms, Droughts, Human Damage, and Non-Disaster. The instances are divided into three subsets: 6,323 for training, 790 for testing, and 790 for validation. This is roughly an 80/10/10 split, and it holds within every category, so each subset has a comparable class distribution. Models can therefore be trained, tuned, and evaluated on consistent data, which supports more reliable disaster identification systems for Bangla.

The per-category distribution is as follows:

Category         Train  Test  Validation  Total
===============================================
Earthquake         800   100         100   1000
Flood              800   100         100   1000
Landslides         803   100         100   1003
Wildfires          720    90          90    900
Tropical Storms    800   100         100   1000
Droughts           800   100         100   1000
Human Damage       800   100         100   1000
Non-Disaster       800   100         100   1000
===============================================
Total             6323   790         790   7903
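The subset totals in the table can be checked mechanically. The per-category counts also match a simple rule: hold out round(10%) of each category for testing, hold out the same number for validation, and keep the remainder for training. The source does not say how the split was produced. The sketch below is therefore a hypothetical illustration and not the authors' procedure. The helper names `split_sizes` and `stratified_split`, the 0.1 fraction, and the fixed seed are all assumptions.

```python
import random

# Per-category counts copied from the table: (train, test, validation).
SPLITS = {
    "Earthquake": (800, 100, 100),
    "Flood": (800, 100, 100),
    "Landslides": (803, 100, 100),
    "Wildfires": (720, 90, 90),
    "Tropical Storms": (800, 100, 100),
    "Droughts": (800, 100, 100),
    "Human Damage": (800, 100, 100),
    "Non-Disaster": (800, 100, 100),
}


def split_sizes(n, held_out_frac=0.1):
    """Hypothetical rule: test and validation each get round(frac * n)
    items, and training gets whatever remains."""
    held = round(held_out_frac * n)
    return n - 2 * held, held, held


def stratified_split(labels, held_out_frac=0.1, seed=0):
    """Split item indices category by category using split_sizes.
    This is an assumed procedure, not the dataset's documented one."""
    rng = random.Random(seed)
    by_cat = {}
    for idx, label in enumerate(labels):
        by_cat.setdefault(label, []).append(idx)
    train, test, val = [], [], []
    for _, idxs in sorted(by_cat.items()):
        rng.shuffle(idxs)  # shuffle within the category before slicing
        n_train, n_test, _ = split_sizes(len(idxs), held_out_frac)
        train += idxs[:n_train]
        test += idxs[n_train:n_train + n_test]
        val += idxs[n_train + n_test:]
    return train, test, val


if __name__ == "__main__":
    # Column sums of the table: train, test, validation.
    totals = [sum(col) for col in zip(*SPLITS.values())]
    print("train/test/val totals:", totals)  # [6323, 790, 790]
    print("grand total:", sum(totals))        # 7903
    # The assumed rule reproduces every row of the table.
    for cat, row in SPLITS.items():
        assert split_sizes(sum(row)) == row, cat
    # Dummy labels with the same class sizes as the dataset.
    labels = [cat for cat, row in SPLITS.items() for _ in range(sum(row))]
    train, test, val = stratified_split(labels)
    print(len(train), len(test), len(val))   # 6323 790 790
```

Splitting inside each category, rather than over the pooled data, keeps every subset's class proportions close to the full dataset's. For Landslides (1,003 items), round(100.3) = 100, which leaves 803 for training and matches the table.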

Institutions

Ahsanullah University of Science and Technology

Categories

Social Media, Bengali Language, Multimodality, Natural Disaster, Deep Learning
