Synthetic populations of South African urban areas

Published: 24 June 2021 | Version 2 | DOI: 10.17632/dh4gcm7ckb.2
Johan W. Joubert


The data accompanying this article include the compressed Extensible Markup Language (XML) files of the synthetic populations for the nine areas of importance in South Africa. The populations are controlled at the household level using household income, and at the individual level using gender and population group. The result is a complete stock of individuals that accounts for detailed demographic and socioeconomic information as well as household structure.

The detailed XML Schema Definition (XSD) and XML Document Type Definition (DTD), which contain the declarations that describe the formally acceptable structure of the XML files, are available. More specifically, there is one XSD file for the households, `households_v1.0.xsd`, and one DTD file for the individuals, `population_v6.dtd`. The files are standard XML and can be read with many parsers. We chose the Multi-Agent Transport Simulation (MATSim) infrastructure because, in our context, the populations are frequently used in large-scale mesoscopic transport models built with the agent-based MATSim.

This version has a number of updates. First, the populations increased in size to reflect the 2019 population, based on the Statistics South Africa mid-year population estimates. Second, we added three household attributes taken from census data: the household's access to a working car (`carAccess`), access to piped water (`pipedWater`), and the type of ablution facility accessible to the household (`toilet`). The first attribute is useful for later estimation of travel behaviour, while the latter two were introduced as proxies for housing quality in land use transport interaction (LUTI) models. Third, we added a single person-specific attribute: the individual's income (`personIncome`).
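
Because the files are plain XML, they can be read without MATSim. The following is a minimal Python sketch using only the standard library, not the author's tooling. It assumes that the files are gzip-compressed, that each record element (for example `person` or `household`) carries an `id`, and that attributes such as `personIncome` or `carAccess` are stored as nested `<attribute name="...">` elements. These layout details, the helper names `local_name` and `read_attributes`, and the sample values are illustrative assumptions; check them against `population_v6.dtd` and `households_v1.0.xsd` before relying on them.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{uri}person' -> 'person'."""
    return tag.rsplit("}", 1)[-1]


def read_attributes(source, record_tag):
    """Stream a gzip-compressed XML file and yield (id, attributes) pairs.

    `source` is a path or a binary file object. The nested
    <attribute name="...">value</attribute> layout is an assumption.
    """
    with gzip.open(source, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if local_name(elem.tag) != record_tag:
                continue
            attrs = {
                child.get("name"): (child.text or "").strip()
                for child in elem.iter()
                if local_name(child.tag) == "attribute"
            }
            yield elem.get("id"), attrs
            elem.clear()  # release the finished record to keep memory low


# Made-up, in-memory sample standing in for a real population file.
sample = b"""<population>
  <person id="1">
    <attributes>
      <attribute name="personIncome">1200.0</attribute>
    </attributes>
  </person>
</population>"""

records = dict(read_attributes(io.BytesIO(gzip.compress(sample)), "person"))
print(records)
```

Streaming with `iterparse` rather than loading the whole tree is a design choice suited to large populations; for a real file, pass its path instead of the in-memory sample. Values are returned as strings and need converting to the types the schema declares.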


Steps to reproduce

The detailed description of the steps to reproduce is published in the Data-in-Brief article with the same title as this data set.


Population, Extensible Markup Language, Synthesis, Agent-Based Modeling, South Africa, Household