Household Socioeconomic and Livelihood Dataset of Fishermen and Non-Fishermen in Chalan Beel, Bangladesh

Published: 26 May 2026 | Version 1 | DOI: 10.17632/by8mht79xv.1
Contributor:
Arijit Biswas

Description

This dataset contains household-level survey data collected from fishermen and non-fishermen communities in the Chalan Beel area of Bangladesh. The cleaned dataset includes 282 household observations and 77 questionnaire-based variables. The target/grouping variable is community, with 141 fishermen households and 141 non-fishermen households. The variables cover demographic characteristics, education, employment, income, housing, sanitation, water source, electricity access, women’s household participation, welfare access, livelihood diversification, migration, food security, health, treatment choices, community development role, and social or cultural participation. The dataset is suitable for socioeconomic comparison, livelihood analysis, data-driven community studies, and machine-learning benchmark validation.

Steps to reproduce

The data were collected through a primary household survey of fishermen and non-fishermen communities in the Chalan Beel area of Bangladesh. Two structured questionnaires were used, one for fishermen households and one for non-fishermen households. Both followed the same broad structure, covering socioeconomic condition and livelihood diversification. The survey collected household-level information on demographic characteristics, education, employment, income, housing, sanitation, drinking water, electricity, women’s participation, welfare access, livelihood diversification, migration, food scarcity, health, treatment choices, and community participation.

After data collection, the responses were entered into a tabular dataset and cleaned in Python. Column names were standardized to machine-readable snake_case. The final cleaned dataset contains 282 household observations and 77 questionnaire-based variables. The target (grouping) variable is community, with two balanced classes: 141 fishermen households and 141 non-fishermen households. Missing values in conditional questions, which apply only to some households, were recoded with context-specific categories rather than left blank. For example, missing values in if_no_living_land were coded as “Not applicable,” missing values in alternative_income_type as “No alternative income,” and missing values in loan_source as “No loan.”

To reproduce the dataset workflow, download the full repository and open the cleaned dataset in the data/cleaned/ folder. Use the files in the metadata/ folder, especially data_dictionary.xlsx and variable_description.csv, to look up each variable's description, data type, and possible values. The original survey instruments are provided in the questionnaires/ folder. To reproduce the machine-learning benchmark, open code/ml_benchmark_clean_reproducible.ipynb in Jupyter Notebook, JupyterLab, or Google Colab.
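The two cleaning steps described above (snake_case column names and context-specific recoding of missing answers to conditional questions) can be sketched in pandas as follows. This is a minimal illustration, not the author's cleaning script: the snake_case rule in to_snake_case, the helper names, and the toy headers and values are assumptions; only the three column names and their fill labels come from the dataset notes.

```python
import re

import numpy as np
import pandas as pd

# Context-specific fills for conditional questions, taken from the dataset
# notes. The actual cleaning script may handle more columns than these three.
CONDITIONAL_FILLS = {
    "if_no_living_land": "Not applicable",
    "alternative_income_type": "No alternative income",
    "loan_source": "No loan",
}


def to_snake_case(name: str) -> str:
    """Lower-case a header and join word runs with underscores (assumed rule)."""
    name = name.strip().lower()
    name = re.sub(r"[^0-9a-z]+", "_", name)  # any run of non-alphanumerics -> "_"
    return name.strip("_")


def clean_survey(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names, then recode missing answers to conditional questions."""
    df = raw.copy()
    df.columns = [to_snake_case(col) for col in df.columns]
    # Only recode conditional columns that are actually present.
    fills = {col: val for col, val in CONDITIONAL_FILLS.items() if col in df.columns}
    return df.fillna(value=fills)


# Toy input with made-up headers and values, for illustration only.
raw = pd.DataFrame({
    "Community": ["Fishermen", "Non-fishermen"],
    "Loan Source": [np.nan, "NGO"],
    "Alternative Income Type": ["Poultry", None],
})
cleaned = clean_survey(raw)
print(cleaned.columns.tolist())
# ['community', 'loan_source', 'alternative_income_type']
print(cleaned["loan_source"].tolist())
# ['No loan', 'NGO']
```

Recoding with explicit labels instead of leaving NaN keeps "question did not apply" distinct from "answer genuinely missing", which matters once categorical columns are one-hot encoded for modelling.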
Install the required Python libraries, including pandas, numpy, scikit-learn, openpyxl, and matplotlib, then run the notebook cells in order. The workflow loads the cleaned dataset, checks its shape and class distribution, separates the target variable community from the predictors, applies preprocessing, and evaluates supervised machine-learning models with stratified cross-validation. The generated outputs can be compared with the benchmark result files in the results/ folder.

The dataset can also be reused on its own for descriptive socioeconomic comparison, for livelihood, food security, migration, and health analyses, and for machine-learning benchmark validation.
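The benchmark workflow described above (check shape and class balance, separate community, preprocess, evaluate with stratified cross-validation) can be sketched with scikit-learn as below. This is a hedged sketch, not the contents of ml_benchmark_clean_reproducible.ipynb: the choice of models (logistic regression and random forest), the imputation and encoding steps, five folds, accuracy as the metric, and the synthetic columns household_size, monthly_income, and house_type are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

TARGET = "community"  # grouping variable named in the dataset description


def build_preprocessor() -> ColumnTransformer:
    """Impute and scale numeric columns; impute and one-hot encode the rest."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    )
    return ColumnTransformer([
        ("num", numeric, make_column_selector(dtype_include="number")),
        ("cat", categorical, make_column_selector(dtype_exclude="number")),
    ])


def benchmark(df: pd.DataFrame, n_splits: int = 5, seed: int = 42) -> dict:
    """Return mean and std of stratified cross-validated accuracy per model."""
    X = df.drop(columns=[TARGET])
    y = df[TARGET]
    print("shape:", df.shape)
    print(y.value_counts())  # class distribution check
    models = {  # hypothetical model choice
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    # Stratified folds keep the fishermen / non-fishermen ratio in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        # Preprocessing sits inside the pipeline so it is fit on training folds only.
        pipe = Pipeline([("prep", build_preprocessor()), ("model", model)])
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


# Synthetic stand-in data with hypothetical columns; not the real survey.
rng = np.random.default_rng(0)
n = 60
demo = pd.DataFrame({
    TARGET: ["Fishermen"] * (n // 2) + ["Non-fishermen"] * (n // 2),
    "household_size": rng.integers(2, 9, size=n),
    "monthly_income": rng.normal(12000, 3000, size=n),
    "house_type": rng.choice(["A", "B", "C"], size=n),
})
results = benchmark(demo)
for name, (mean, std) in results.items():
    print(f"{name}: accuracy {mean:.3f} +/- {std:.3f}")
```

Because the synthetic features are random, the printed accuracies only demonstrate that the pipeline runs; they say nothing about the real dataset. To use it on the actual data, load the cleaned file from data/cleaned/ into a DataFrame and pass it to benchmark in place of demo.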

Categories

Socialization and Social Development, Sociodemographics, Coastal Fisheries