Ghost City Occupancy Index (GCOI) Dataset: Vietnamese New Towns and HCMC Residential Projects, 2010–2024

Published: 30 April 2026 | Version 2 | DOI: 10.17632/rffz276tnc.2
Contributors:
Joolim Lee

Description

This dataset supports the empirical analyses reported in "Reverse Death Spiral of Supply-Led Ghost Cities: Formalising and Testing the Structural Failure Mechanism of Vietnamese New Towns" (submitted to Urban Geography). It comprises three components. First, GCOI (Ghost City Occupancy Index) scores and component indicators for five new towns in the Hanoi and Ho Chi Minh City metropolitan areas (Vinhomes Ocean Park, Ecopark, Nam An Khanh, Thu Duc City, Binh Duong New City), covering 2010–2024: blooming-corrected VIIRS nighttime light-based occupancy rate estimates (NTL-OCR), electricity consumption-based occupancy rate estimates (OCR_util), and commercial facility opening rates (CI_comm). Second, a project-level monitoring dataset of 68 newly launched residential projects in the Ho Chi Minh City metropolitan area (2022–2024), including presale rates, occupancy conversion rates (OCR), speculative occupancy gaps (SOG), primary and secondary market prices, and product type classifications. Third, Nam An Khanh system dynamics calibration outputs, including observed vs. model-predicted OCR trajectories (2010–2024) and counterfactual simulation results. VIIRS nighttime light data were processed in Google Earth Engine (GEE); the processing code is provided in the accompanying appendix. Primary market data were sourced from the National Housing Organization (NHO) internal project monitoring database and cross-referenced with Batdongsan.com.vn and CBRE Vietnam (2022). All data are reported at the project or new town level; no individual-level personal data are included.
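The project-level component can be pictured as one typed record per project. The sketch below is illustrative only: the `ProjectRecord` class, its field names, the assumption that rates are stored as fractions between 0 and 1, and in particular the SOG formula (presale rate minus OCR) are assumptions made for this example and are not taken from the dataset files, which may define SOG differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectRecord:
    """One row of the HCMC project monitoring dataset (hypothetical field names)."""
    project_id: str
    launch_year: int                 # 2022-2024
    product_type: str                # product type classification
    presale_rate: float              # share of units presold, assumed stored as 0-1
    ocr: float                       # occupancy conversion rate, assumed stored as 0-1
    ocr_directly_confirmed: bool     # True if confirmed by telephone or management office
    primary_price: float             # primary market price (unit as in the source data)
    secondary_price: Optional[float] = None  # secondary prices cover 66 of 68 projects
    secondary_price_estimated: bool = False  # True where the price was filled by OLS

    def __post_init__(self) -> None:
        # Reject rates outside [0, 1] under the fractional-storage assumption.
        for name in ("presale_rate", "ocr"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

    def sog(self) -> float:
        """Hypothetical SOG definition: presale rate minus occupancy conversion rate."""
        return self.presale_rate - self.ocr
```

A record with a 90% presale rate and 35% OCR would, under this assumed definition, carry an SOG of 0.55.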

Steps to reproduce

1. VIIRS Nighttime Light Data Processing
Download NASA VIIRS DNB VNP46A2 BRDF-corrected monthly composites (500 m resolution) from NASA's LAADS DAAC (https://ladsweb.modaps.eosdis.nasa.gov) and process the imagery in Google Earth Engine (GEE); the full processing code is provided in Online Appendix A. Define an area-of-interest (AOI) polygon for each new town. Retain only dry-season observations (November–April, excluding Tet) flagged "confident clear" or "probably clear" by the cloud quality mask. Apply a three-step blooming correction: (1) subtract the median NTL value of a 500 m outer buffer; (2) apply a 3×3 median spatial filter; (3) classify pixels with radiance below 0.5 nW/cm²/sr as unoccupied. Extract the mean radiance per AOI to derive NTL-OCR estimates.

2. Electricity Consumption Data (OCR_util)
Obtain district-level electricity consumption data from Vietnam Electricity (EVN) for 2020–2023. Normalise consumption against a full-occupancy baseline (planned units × average household consumption) to derive OCR_util estimates.

3. Commercial Facility Opening Rate (CI_comm)
Conduct field surveys, cross-validated against Google Earth Pro high-resolution imagery, to confirm the operating status of commercial facilities within each new town. Compute CI_comm as the operating commercial floor area (m²) divided by the planned commercial floor area in the master plan.

4. GCOI Calculation
Apply PCA to the three component variables (NTL-OCR, OCR_util, CI_comm) across 20 observations (5 new towns × 4 years, 2020–2023) to derive the weights w₁ = 0.359, w₂ = 0.340 and w₃ = 0.300. Compute GCOI = 1 − [w₁·NTL-OCR + w₂·OCR_util + w₃·CI_comm], so that higher values indicate lower occupancy. Verify robustness across 10 alternative weight specifications (Table A).

5. HCMC Project Monitoring Dataset (N = 68)
Compile project-level data from the NHO internal database, cross-referenced with sales office inquiries, Batdongsan.com.vn, Mogi.vn, and research reports by CBRE Vietnam (2022) and JLL Vietnam (2023). Directly confirm OCR for 42 of the 68 projects via telephone or management office disclosure; for the remaining 26, estimate OCR using comparable project matching and NTL pattern analysis. Obtain secondary market prices for 66 projects, of which 31 are supplemented via OLS estimation.

6. System Dynamics Calibration
Implement the four-equation reverse spiral model in Vensim PLE. Calibrate the parameters (λ, μ, φ, γ) against Nam An Khanh's observed NTL-OCR trajectory (2010–2024) and assess fit using the RMSE. Run a counterfactual simulation in which a public facility opening rate of 30% is achieved by 2012.

7. Statistical Analyses
Conduct Hansen (2000) threshold regression, a quasi difference-in-differences (DiD) comparison, Spearman rank correlation, Welch t-tests, OLS regression with HC3 robust standard errors, and ANOVA using standard statistical software (R or Stata). Compute bootstrap confidence intervals (B = 5,000) for the SOG regression coefficients.
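The three-step blooming correction in Step 1 can be sketched on a small in-memory radiance grid. This is a minimal pure-Python illustration, not the GEE code in Online Appendix A: it assumes the AOI is a rectangular grid of radiance values, that the outer-buffer pixels are supplied as a separate list, and that edge pixels of the median filter use only the neighbours that exist. Because the text does not state how mean radiance is converted into NTL-OCR, the sketch returns the mean corrected radiance and the share of pixels at or above the 0.5 nW/cm²/sr cutoff, leaving that final mapping open.

```python
from statistics import median
from typing import List, Tuple

Grid = List[List[float]]


def subtract_buffer_median(grid: Grid, buffer_values: List[float]) -> Grid:
    """Step (1): subtract the median radiance of the 500 m outer buffer."""
    background = median(buffer_values)
    return [[value - background for value in row] for row in grid]


def median_filter_3x3(grid: Grid) -> Grid:
    """Step (2): 3x3 median filter; edge pixels use only the neighbours that exist."""
    n_rows, n_cols = len(grid), len(grid[0])
    filtered = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            window = [
                grid[r][c]
                for r in range(max(0, i - 1), min(n_rows, i + 2))
                for c in range(max(0, j - 1), min(n_cols, j + 2))
            ]
            row.append(median(window))
        filtered.append(row)
    return filtered


def summarise_aoi(grid: Grid, threshold: float = 0.5) -> Tuple[float, float]:
    """Step (3): pixels below `threshold` (nW/cm²/sr) count as unoccupied.

    Returns (mean corrected radiance, share of pixels classed as occupied).
    """
    pixels = [value for row in grid for value in row]
    occupied = sum(1 for value in pixels if value >= threshold)
    return sum(pixels) / len(pixels), occupied / len(pixels)


def blooming_corrected_summary(grid: Grid, buffer_values: List[float]) -> Tuple[float, float]:
    """Apply steps (1) to (3) in order."""
    return summarise_aoi(median_filter_3x3(subtract_buffer_median(grid, buffer_values)))
```

For example, `blooming_corrected_summary([[1.25, 1.25, 0.25], [1.25, 4.25, 0.25], [0.25, 0.25, 0.25]], [0.25, 0.25, 0.5])` returns (4/9, 5/9): the buffer median of 0.25 is removed first, and the isolated bright centre pixel is then suppressed by the median filter.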
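Steps 2 to 4 reduce to two ratios and one weighted sum. The sketch below uses the weights reported in Step 4 as defaults; the function names and the example inputs (consumption, unit counts, floor areas) are hypothetical, and OCR_util is not capped at 1 because the text does not mention a cap.

```python
from typing import Tuple

REPORTED_WEIGHTS: Tuple[float, float, float] = (0.359, 0.340, 0.300)  # w1, w2, w3 (Step 4)


def ocr_util(consumption_kwh: float, planned_units: int, household_kwh: float) -> float:
    """Step 2: consumption relative to a full-occupancy baseline (planned units x household use)."""
    return consumption_kwh / (planned_units * household_kwh)


def ci_comm(operating_floor_m2: float, planned_floor_m2: float) -> float:
    """Step 3: operating commercial floor area over planned commercial floor area."""
    return operating_floor_m2 / planned_floor_m2


def gcoi(ntl_ocr: float, util: float, comm: float,
         weights: Tuple[float, float, float] = REPORTED_WEIGHTS) -> float:
    """Step 4: GCOI = 1 - (w1*NTL-OCR + w2*OCR_util + w3*CI_comm); higher means emptier."""
    w1, w2, w3 = weights
    return 1 - (w1 * ntl_ocr + w2 * util + w3 * comm)


# Hypothetical inputs for one town-year (consumption and household use over the same period)
util = ocr_util(consumption_kwh=1_200_000, planned_units=1_000, household_kwh=2_400)  # 0.5
comm = ci_comm(operating_floor_m2=3_000, planned_floor_m2=12_000)                     # 0.25
score = gcoi(ntl_ocr=0.5, util=util, comm=comm)                                       # about 0.5755
```

Alternative weight specifications, such as the robustness checks in Table A, can be passed through the `weights` argument, e.g. `gcoi(0.5, 0.5, 0.25, weights=(1/3, 1/3, 1/3))`. Because the reported weights sum to 0.999 after rounding, a fully occupied town scores about 0.001 rather than exactly 0.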
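Step 4 does not spell out how the PCA output is turned into weights. One common rule, assumed here and not confirmed by the source, takes the absolute loadings of the first principal component of the correlation matrix and rescales them to sum to 1. The sketch computes that component by power iteration in pure Python; it expects rows of (NTL-OCR, OCR_util, CI_comm) such as the 20 town-year observations, and it fails if any column has zero variance.

```python
from math import sqrt
from typing import List, Sequence


def _standardise(column: Sequence[float]) -> List[float]:
    """Centre and scale one variable using the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlations between the columns of `rows` (one row per town-year)."""
    z = [_standardise(col) for col in zip(*rows)]
    n, k = len(rows), len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]


def first_pc_weights(rows: Sequence[Sequence[float]], iterations: int = 1000) -> List[float]:
    """Assumed rule: |first-PC loadings| of the correlation matrix, rescaled to sum to 1."""
    r = correlation_matrix(rows)
    k = len(r)
    v = [1.0] * k  # start vector; power iteration converges to the dominant eigenvector
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    total = sum(abs(x) for x in v)
    return [abs(x) / total for x in v]
```

When the three components move in perfect lockstep the rule gives equal weights of 1/3; the reported weights (0.359, 0.340, 0.300) depart only modestly from that, which is what one would expect if the components are strongly but not perfectly correlated.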
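Step 6 calibrates λ, μ, φ and γ in Vensim PLE and judges fit by RMSE. The four model equations are not given here, so the sketch below does not reproduce the reverse spiral model; it only shows how RMSE between an observed and a simulated trajectory can drive a brute-force grid search. `simulate` stands for any user-supplied function (hypothetical) that maps keyword parameters to a predicted OCR trajectory of the same length as the observations. Vensim's own calibration tooling works differently; this is a transparent stand-in, not a description of it.

```python
from itertools import product
from math import sqrt
from typing import Callable, Dict, Sequence, Tuple


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equally long trajectories."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted trajectories differ in length")
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def grid_calibrate(
    simulate: Callable[..., Sequence[float]],
    observed: Sequence[float],
    grid: Dict[str, Sequence[float]],
) -> Tuple[Dict[str, float], float]:
    """Evaluate every parameter combination in `grid`; keep the lowest-RMSE one."""
    names = list(grid)
    best_params: Dict[str, float] = {}
    best_error = float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = rmse(observed, simulate(**params))
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error
```

With a hypothetical linear `simulate(lam, mu)` returning `[lam * t + mu for t in range(3)]` and observations `[0.1, 0.2, 0.3]`, the search over `lam` in (0.05, 0.1, 0.2) and `mu` in (0.0, 0.1) selects `lam=0.1, mu=0.1`. A grid search scales poorly with four parameters, but it makes the RMSE criterion explicit.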
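The core of Hansen's (2000) threshold regression in Step 7 is a search over candidate thresholds γ for the value that minimises the pooled sum of squared residuals when the sample is split by a threshold variable q. The sketch below covers only that point estimate, for a model with one regressor and regime-specific intercepts and slopes; the 15% trimming share and the minimum of three observations per regime are assumptions, Hansen's likelihood-ratio confidence interval for γ is not implemented, and a regime in which all x values are equal would raise a division error. Which variable served as q in the paper is not stated in this section.

```python
from typing import Optional, Sequence, Tuple


def _ols_ssr(x: Sequence[float], y: Sequence[float]) -> float:
    """SSR from regressing y on a constant and x (one regime)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def estimate_threshold(q: Sequence[float], x: Sequence[float], y: Sequence[float],
                       trim: float = 0.15) -> Tuple[Optional[float], float]:
    """Least-squares threshold for y = a1 + b1*x if q <= gamma, else y = a2 + b2*x."""
    n = len(y)
    min_obs = max(3, int(trim * n))  # trimming: each regime keeps enough observations
    best_gamma: Optional[float] = None
    best_ssr = float("inf")
    for gamma in sorted(set(q)):
        low = [i for i in range(n) if q[i] <= gamma]
        high = [i for i in range(n) if q[i] > gamma]
        if len(low) < min_obs or len(high) < min_obs:
            continue
        ssr = (_ols_ssr([x[i] for i in low], [y[i] for i in low])
               + _ols_ssr([x[i] for i in high], [y[i] for i in high]))
        if ssr < best_ssr:
            best_gamma, best_ssr = gamma, ssr
    return best_gamma, best_ssr
```

For q = x = 1 to 10 with y = x up to 5 and y = 20 + x afterwards, the search returns γ = 5 with a pooled SSR of zero.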
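Step 7 also reports OLS with HC3 robust standard errors. For a single regressor the HC3 sandwich reduces to a closed form: each squared residual is inflated by 1/(1 − h_i)², where h_i is the observation's leverage. The sketch below assumes a constant plus one regressor rather than the paper's full specification and returns the slope with its HC3 standard error; in R the equivalent is `sandwich::vcovHC(fit, type = "HC3")`, and in Stata `regress y x, vce(hc3)`. It breaks down if any observation has leverage 1.

```python
from math import sqrt
from typing import Sequence, Tuple


def ols_hc3(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """OLS of y on a constant and x; returns (slope, HC3 standard error of the slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    meat = 0.0
    for a, b in zip(x, y):
        resid = b - intercept - slope * a
        leverage = 1 / n + (a - mx) ** 2 / sxx                     # hat-matrix diagonal
        meat += (a - mx) ** 2 * resid ** 2 / (1 - leverage) ** 2   # HC3 inflation
    return slope, sqrt(meat) / sxx
```

For x = (0, 1, 2, 3) and y = (0, 1, 1, 3) the slope is 0.9 and the HC3 standard error is about 0.425.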
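Step 7 also uses bootstrap confidence intervals with B = 5,000. The resampling scheme is not stated, so the sketch assumes a pairs (case-resampling) bootstrap with percentile intervals for the slope of a one-regressor OLS model, which is only a stand-in for the SOG regressions. Resamples in which every x is identical are redrawn because the slope is undefined there, and the fixed seed only makes the illustration reproducible.

```python
import random
from typing import Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope from regressing y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def bootstrap_slope_ci(x: Sequence[float], y: Sequence[float], b: int = 5000,
                       alpha: float = 0.05, seed: int = 1) -> Tuple[float, float]:
    """Percentile CI for the OLS slope from a pairs (case-resampling) bootstrap."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < b:
        idx = [rng.randrange(n) for _ in range(n)]   # draw n cases with replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:                         # degenerate resample: redraw
            continue
        estimates.append(ols_slope(xs, [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(alpha / 2 * b)]
    upper = estimates[int((1 - alpha / 2) * b) - 1]
    return lower, upper
```

Pairs resampling keeps each project's x and y together, so it remains valid under heteroskedasticity, which fits alongside the HC3 standard errors used in the same step.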

Categories

Urban Economics, Urban Development
