Singapore “housing-index” dataset - building-level marker of socioeconomic status

Published: 23 March 2021 | Version 1 | DOI: 10.17632/pj3dkvskr9.1
Ting Hway Wong,
Pin Pin Pek,
Shao Wei Sean Lam,
Fuh Yong Wong,
Ru Xin Wong,
Daniel Yan Zheng Lim,
Marcus Eng Hock Ong,
Andrew Fuwah Ho


Socioeconomic status (SES) is an important determinant of health and is of interest to policy makers, researchers and other data users. In particular, health services and epidemiological researchers need SES data to control for confounding. In our previous studies, we used housing type as a surrogate marker for SES and investigated its influence on patients with head and neck cancer and on patients with breast cancer. Here, we describe the housing-index dataset that was constructed for that research. To construct the dataset, we merged datasets from the Singapore Land Authority (purchased) and the Housing and Development Board (public domain). The dataset contains all postal codes in Singapore. For residential postal codes in public housing, we derived a housing-index from the mean number of rooms per apartment in the building (codes 1-5). Private residential properties were assigned codes 6-7. The higher the code, the higher the SES. The housing-index dataset was created in 2016 and updated in 2020. It can be readily linked to existing population and clinical datasets in Singapore using the postal code of each patient's residential address. The added “housing-index” data field serves as a block-level SES measure.
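The derivation and linkage described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, field names, postal codes and unit counts are all hypothetical, and the rule that maps the mean onto codes 1-5 is not stated here, so the sketch stops at the mean itself.

```python
from typing import Dict, List, Optional


def mean_rooms_per_unit(unit_counts: Dict[int, int]) -> Optional[float]:
    """Weighted mean number of rooms per apartment in one block.

    unit_counts maps a room count to the number of such units in the block.
    Returns None when the block has no recorded units.
    """
    total_units = sum(unit_counts.values())
    if total_units == 0:
        return None
    total_rooms = sum(rooms * count for rooms, count in unit_counts.items())
    return total_rooms / total_units


def link_housing_index(records: List[dict], lookup: Dict[str, float]) -> List[dict]:
    """Attach a housing_index field to each record by postal code.

    Records whose postal code is absent from the lookup get None.
    """
    return [{**r, "housing_index": lookup.get(r["postal_code"])} for r in records]


# Hypothetical block with 40 three-room and 60 four-room units.
example_block = {3: 40, 4: 60}
example_index = mean_rooms_per_unit(example_block)  # (3*40 + 4*60) / 100

# Hypothetical lookup table and patient records (all values invented).
housing_index_by_postal = {"123456": example_index, "654321": 6}
patients = [
    {"patient_id": "P1", "postal_code": "123456"},
    {"patient_id": "P2", "postal_code": "999999"},  # not in the lookup
]
linked = link_housing_index(patients, housing_index_by_postal)
```

Postal codes are kept as strings so that leading zeros survive the join. Unmatched records are retained with a missing value rather than dropped, which keeps the size of the linked dataset auditable.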


Steps to reproduce

Singapore’s land use master plan data (2016), consisting of all postal codes in Singapore and their corresponding latitude/longitude coordinates, street addresses, building types, postal districts and regions, were purchased from the Singapore Land Authority. Public housing information from the Housing and Development Board was obtained from an open-source website. It consists of street addresses, the composition of unit types in each block (the number of rooms per unit type and the number of units of each type), and the year completed. Property information for residential building postal codes that were missing from our master list was retrieved manually by triangulating information from OneMap, Google Maps, and commercial property websites. Microsoft Excel was used to merge the datasets, with street address as the merging variable.
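The merge itself was done in Microsoft Excel. As an illustration only, the same kind of join on street address could be sketched in Python as below. The column names, the normalization rule and the sample rows are assumptions made for this example and do not come from the original datasets.

```python
import re
from typing import Dict, List, Tuple


def normalize_address(address: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]+", " ", address.upper())
    return " ".join(cleaned.split())


def merge_on_address(
    master_rows: List[Dict], hdb_rows: List[Dict]
) -> Tuple[List[Dict], List[Dict]]:
    """Join master-list rows to public housing rows on normalized street address.

    Returns (merged, unmatched). Unmatched master rows are the ones that
    would need a manual lookup.
    """
    hdb_by_address = {normalize_address(r["street_address"]): r for r in hdb_rows}
    merged, unmatched = [], []
    for row in master_rows:
        match = hdb_by_address.get(normalize_address(row["street_address"]))
        if match is None:
            unmatched.append(row)
        else:
            # Master-list fields take precedence where keys collide.
            merged.append({**match, **row})
    return merged, unmatched


# Hypothetical rows; the addresses and postal codes are invented.
master = [
    {"postal_code": "123456", "street_address": "123 Example Ave. 4"},
    {"postal_code": "234567", "street_address": "9 Sample Road"},
]
hdb = [
    {"street_address": "123 EXAMPLE AVE 4", "unit_counts": {3: 40, 4: 60},
     "year_completed": 1985},
]
merged, unmatched = merge_on_address(master, hdb)
```

Normalizing before joining reduces spurious mismatches caused by case or punctuation differences between sources. It does not resolve abbreviations such as "Ave" versus "Avenue", so some records would still need manual review.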


Epidemiology, Demography Related to Public Health, Health Services Research, Socioeconomic Factor in Health