Synthetic populations of South African urban areas

Published: 10 October 2022 | Version 4 | DOI: 10.17632/dh4gcm7ckb.4
Johan W. Joubert


The data accompanying this article include the compressed Extensible Markup Language (XML) files of the synthetic populations for the nine areas of importance in South Africa. The populations are controlled at the household level using household income, and at the individual level using gender and population group. The result is a complete stock of individuals that accounts for detailed demographic and socioeconomic information as well as household structure.

The XML Schema Definition (XSD) and the XML Document Type Definition (DTD), which contain the declarations describing the formal structure an acceptable XML file must follow, are available on […]. More specifically, there is one XSD for the household file, `households_v1.0.xsd`, and one DTD for the individuals, `population_v6.dtd`. The files are standard XML and can be read with most XML parsers. We chose the Multi-Agent Transport Simulation (MATSim) infrastructure because, in our context, the populations are frequently used for large-scale mesoscopic transport models built with the agent-based MATSim.

In this version, the populations have been scaled up to reflect the 2022 population, based on the Statistics South Africa mid-year population estimates. The same extended household and person attributes from versions 2 and 3 apply. The major difference in this version is that, instead of only the nine (9) specific study areas, we create complete populations for all nine (9) provinces of the country. Consequently, this version covers the entire country at the sub-place level.
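Because the files are plain XML, they can be streamed with a generic parser rather than loaded whole. The sketch below is a minimal, non-authoritative example using only the Python standard library. It assumes the files are gzip-compressed (common for MATSim `.xml.gz` files, but not stated here), that person attributes sit in MATSim-style `<attributes>`/`<attribute name="..." class="...">` elements as in `population_v6.dtd`, and that household members and income appear as `<personId refId="...">` and `<income>` elements as in `households_v1.0.xsd`. The function names, the sample records, the attribute name `gender`, and the income value are hypothetical placeholders, not taken from the data set.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import IO, Dict, Iterator, List, Optional, Tuple

# Hypothetical sample records; real attribute names and values may differ.
SAMPLE_POPULATION = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE population SYSTEM "http://www.matsim.org/files/dtd/population_v6.dtd">
<population>
  <person id="1">
    <attributes>
      <attribute name="gender" class="java.lang.String">female</attribute>
    </attributes>
  </person>
  <person id="2">
    <attributes>
      <attribute name="gender" class="java.lang.String">male</attribute>
    </attributes>
  </person>
</population>
"""

SAMPLE_HOUSEHOLDS = b"""<?xml version="1.0" encoding="utf-8"?>
<households xmlns="http://www.matsim.org/files/dtd">
  <household id="h1">
    <members>
      <personId refId="1"/>
      <personId refId="2"/>
    </members>
    <income currency="ZAR" period="year">120000</income>
  </household>
</households>
"""


def local_name(tag: str) -> str:
    # Drop any XML namespace: '{http://www.matsim.org/files/dtd}household' -> 'household'
    return tag.rsplit("}", 1)[-1]


def iter_persons(stream: IO[bytes]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Stream (person id, {attribute name: text}) pairs from a population file."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "person":
            continue
        attrs: Dict[str, str] = {}
        for child in elem:
            if local_name(child.tag) == "attributes":
                for attr in child:
                    attrs[attr.get("name")] = (attr.text or "").strip()
        yield elem.get("id"), attrs
        elem.clear()  # release the person's subtree once it has been read


def iter_households(stream: IO[bytes]) -> Iterator[Tuple[str, List[str], Optional[float]]]:
    """Stream (household id, member person ids, income or None) from a households file."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "household":
            continue
        members = [p.get("refId") for p in elem.iter() if local_name(p.tag) == "personId"]
        income_el = next((c for c in elem if local_name(c.tag) == "income"), None)
        text = (income_el.text or "").strip() if income_el is not None else ""
        income = float(text) if text else None
        yield elem.get("id"), members, income
        elem.clear()  # release the household's subtree once it has been read


if __name__ == "__main__":
    # Compress the samples in memory to mimic compressed XML files on disk.
    with gzip.open(io.BytesIO(gzip.compress(SAMPLE_POPULATION)), "rb") as f:
        print(Counter(a.get("gender") for _, a in iter_persons(f)))
        # prints Counter({'female': 1, 'male': 1})
    with gzip.open(io.BytesIO(gzip.compress(SAMPLE_HOUSEHOLDS)), "rb") as f:
        print(list(iter_households(f)))
        # prints [('h1', ['1', '2'], 120000.0)]
```

With a real file, `gzip.open("path/to/file.xml.gz", "rb")` would replace the in-memory stream (the path is a placeholder). Streaming with `iterparse` and clearing each record after use keeps memory use far below that of building the full tree, which matters now that each file covers a whole province.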


Steps to reproduce

A detailed description of the steps to reproduce the data is published in the Data in Brief article with the same title as this data set.


University of Pretoria


Population, Extensible Markup Language, Synthesis, Agent-Based Modeling, South Africa, Household