Dataset supporting: Gender budgeting, school infrastructure, and girls' schooling participation: Evidence from the Indian states
Description
The datasets used in this study cover school infrastructure, available resources, teachers, and educational participation. They are drawn from two major repositories and are used to analyze how gender budgeting policy changes affected school infrastructure, specifically facilities for girls such as separate toilets, and how those facilities in turn influenced schooling participation in India. The first dataset, the District Information System for Education (DISE), is derived primarily from the state report cards of the National University of Educational Planning and Administration (NUEPA). It covers all states in India from 2004 to 2017 and is the primary source of information on school infrastructure and facilities. The second dataset, the India Human Development Survey (IHDS), is available in two waves and covers various aspects of human development; this study uses its primary school facility survey data. Together, these datasets underpin the assessment of the impact of gender budgeting on school infrastructure and educational outcomes for girls in India. Our findings indicate that gender budgeting has had a positive impact on these outcomes. In addition, state education expenditure data obtained from the Reserve Bank of India (RBI) are used to describe the spending patterns of state governments.
Files
Steps to reproduce
Data preparation began with the DISE dataset, which consists of annual state report cards from 2003-04 to 2016-17 in raw Excel files with complex structures. The first step was to separate and clearly label the columns under each school level so that the data were standardized across states and years. Academic years were relabeled by their ending calendar year for consistency (for example, 2003-04 became 2004). Variables were aggregated across school levels to create total state-level indicators, and units were harmonized by converting percentages to absolute numbers where needed. The dataset was divided into three subsamples based on reporting formats (2004–06, 2007–12, 2013–17), each cleaned and standardized separately before being combined into a continuous panel. Telangana was excluded due to limited data availability. Variables were renamed for ease of analysis, and the cleaned data were imported into Stata for further processing: appending the three Excel files, creating an event-year variable, and keeping only the variables relevant to the analysis. The study mainly uses three infrastructure-related variables.
For the IHDS dataset, variable names and codes were harmonized across the two survey waves, and special codes for non-response were recoded as missing (.). Unique school identifiers were created by concatenating geographic and school codes to link data across waves. The dataset was filtered to retain only schools appearing in both waves, and observations with missing values were removed from each wave separately to obtain a balanced panel. A treatment-period dummy was created to distinguish pre- and post-policy observations, along with a wave identifier. Odisha was excluded due to policy timing conflicts, and boys-only schools were removed so that the sample covers girls-only and co-educational schools. States that implemented the policy late, in or after 2012, were classified as controls.
State education expenditure data, as provided by the RBI, were plotted over time by state.
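The DISE harmonization steps described above (relabeling academic years, converting percentages to counts, and tagging the reporting-format subsample) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Stata code: the column names (`acad_year`, `schools`, `pct_girls_toilet`), the state label, and all values are hypothetical and do not come from the DISE files.

```python
# Illustrative sketch only: field names and numbers below are hypothetical.

def academic_year_to_year(label: str) -> int:
    """Map an academic-year label such as '2003-04' to its ending year (2004)."""
    start_year = int(label.split("-")[0])
    return start_year + 1


def pct_to_count(pct: float, total: int) -> int:
    """Convert a percentage of schools into an absolute number of schools."""
    return round(pct / 100 * total)


def reporting_format(year: int) -> str:
    """Assign a year to one of the three reporting-format subsamples."""
    if 2004 <= year <= 2006:
        return "2004-06"
    if 2007 <= year <= 2012:
        return "2007-12"
    if 2013 <= year <= 2017:
        return "2013-17"
    raise ValueError(f"year {year} is outside the 2004-2017 panel")


# Hypothetical raw rows, one per state and academic year.
raw_rows = [
    {"state": "State A", "acad_year": "2003-04", "schools": 1200, "pct_girls_toilet": 25.0},
    {"state": "State A", "acad_year": "2016-17", "schools": 1500, "pct_girls_toilet": 90.0},
]

panel = []
for row in raw_rows:
    year = academic_year_to_year(row["acad_year"])
    panel.append(
        {
            "state": row["state"],
            "year": year,
            "format": reporting_format(year),
            # Absolute number of schools with a girls' toilet (hypothetical variable).
            "schools_girls_toilet": pct_to_count(row["pct_girls_toilet"], row["schools"]),
        }
    )

for record in panel:
    print(record)
```

Keeping the subsample tag on each row makes it straightforward to clean each reporting format separately before appending the pieces into one panel.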
Throughout, data cleaning, harmonization, and variable construction were carefully performed to ensure consistency and compatibility, enabling reliable merging and panel data analysis.
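As a rough illustration of the IHDS steps (recoding non-response as missing, building a school identifier, and keeping a balanced two-wave panel), the following Python sketch may help. It rests on several assumptions that are not taken from the source: the non-response codes, the field names, the identifier widths, the order of operations (drop missing values in each wave, then intersect), and the choice to treat wave 2 as the post-policy period are all hypothetical.

```python
# Illustrative sketch only: codes, field names, and values are hypothetical.

NONRESPONSE_CODES = {-1, 98, 99}  # assumed special codes, not the actual IHDS codes


def recode_missing(value):
    """Return None (missing) for assumed non-response codes, else the value."""
    return None if value in NONRESPONSE_CODES else value


def school_id(row) -> str:
    """Concatenate zero-padded geographic and school codes into one identifier."""
    return f"{row['state']:02d}{row['district']:02d}{row['school']:03d}"


def clean_wave(rows, wave, fields):
    """Recode non-response, drop rows with missing values, and index by school id."""
    cleaned = {}
    for row in rows:
        values = {f: recode_missing(row[f]) for f in fields}
        if any(v is None for v in values.values()):
            continue  # drop observations with missing values within this wave
        sid = school_id(row)
        # 'post' is the treatment-period dummy; wave 2 = post is an assumption.
        cleaned[sid] = {"school_id": sid, "wave": wave, "post": int(wave == 2), **values}
    return cleaned


def balanced_panel(wave1, wave2, fields):
    """Keep only schools with complete data in both waves."""
    w1 = clean_wave(wave1, 1, fields)
    w2 = clean_wave(wave2, 2, fields)
    common = sorted(w1.keys() & w2.keys())
    return [w[sid] for sid in common for w in (w1, w2)]


# Hypothetical facility records for two waves.
wave1 = [
    {"state": 9, "district": 1, "school": 1, "girls_toilet": 0},
    {"state": 9, "district": 1, "school": 2, "girls_toilet": 99},
    {"state": 9, "district": 2, "school": 1, "girls_toilet": 1},
]
wave2 = [
    {"state": 9, "district": 1, "school": 1, "girls_toilet": 1},
    {"state": 9, "district": 1, "school": 2, "girls_toilet": 1},
    {"state": 9, "district": 3, "school": 5, "girls_toilet": 1},
]

panel = balanced_panel(wave1, wave2, ["girls_toilet"])
for record in panel:
    print(record)
```

Intersecting the identifiers after dropping incomplete rows guarantees that every retained school contributes exactly one observation per wave.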
Institutions
- Indian Institute of Technology Kanpur