SOMADHAN: A Bengali Math Word Problem Dataset

Published: 27 May 2025 | Version 1 | DOI: 10.17632/34bs5cxk9j.1

Description

SOMADHAN is a Bengali Math Word Problem (MWP) dataset designed to support research in multilingual and low-resource natural language processing, particularly mathematical reasoning. It consists of 4,000 complex Bengali math word problems, each accompanied by a detailed, step-by-step solution that mirrors the logical flow of human problem-solving. SOMADHAN is modeled on the structure of GSM8K, a widely used English-language benchmark for mathematical reasoning. As source material, we used the train and test portions of the publicly available GSM8K dataset, which together contain 8,792 English math problems with solutions.
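
As a quick sanity check on the figures above, the following minimal sketch adds up the two GSM8K splits and compares the result with the size of SOMADHAN. The per-split counts (7,473 train and 1,319 test) come from the public GSM8K release rather than from this description, and the helper names `pool_size` and `coverage` are hypothetical, introduced only for illustration. The ratio is a plain size comparison; it does not assume anything about how the 4,000 problems were selected.

```python
# Split sizes of the public GSM8K release (an assumption taken from the
# GSM8K dataset card, not stated in the SOMADHAN description).
GSM8K_SPLITS = {"train": 7473, "test": 1319}

# Number of Bengali problems, as given in the description.
SOMADHAN_SIZE = 4000


def pool_size(splits):
    """Total number of problems across all listed splits."""
    return sum(splits.values())


def coverage(subset, pool):
    """Size of `subset` as a fraction of `pool`."""
    return subset / pool


total = pool_size(GSM8K_SPLITS)
print(total)  # 8792
print(f"{coverage(SOMADHAN_SIZE, total):.1%}")  # 45.5%
```

The total matches the 8,792 problems quoted in the description, and SOMADHAN is a little under half that size.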


Institutions

Ahsanullah University of Science and Technology

Categories

Natural Language Processing
