Data for: The Environmental Cost of Rapid Urbanization: Residential Construction and Lake Depletion in Astana, Kazakhstan

Published: 14 January 2026 | Version 1 | DOI: 10.17632/5hbrh7vrhn.1
Contributor:
Serik Iskakov

Description

RESEARCH HYPOTHESIS: This dataset tests whether rapid urbanization in Astana, Kazakhstan contributed to the depletion of Bolshoy Taldykol Lake through groundwater extraction and land use changes. The hypothesis predicts that increased construction activity correlates with decreased lake surface area when controlling for climatic variables.

WHAT THE DATA SHOWS: The dataset combines satellite-derived lake surface area, residential construction statistics, and weather data for Astana (2015-2024). Three sources: (1) lake surface area from Landsat 8/9 imagery using the Modified Normalized Difference Water Index (MNDWI); (2) monthly residential construction data; (3) weather indicators. The analysis shows a roughly 70% reduction in lake surface area from the 2017 peak (0.856 km²) to 2024 (0.260 km²), coinciding with a construction boom.

DATA COLLECTION:

Lake Surface Area: Google Earth Engine with Landsat 8/9 Collection 2 Level-2 surface reflectance (scaling: multiply by 0.0000275, add -0.2). Processing: filter to cloud cover <15%, create summer composites (June-September), calculate MNDWI = (Green - SWIR1) / (Green + SWIR1), apply threshold MNDWI > 0, export GeoTIFF at 30 m resolution for 2015, 2017, 2020, 2022, 2024. Region: lake plus 500 m buffer. JavaScript code provided.

Surface Area Calculation: Python with rasterio loads the water masks, sums water pixels (value = 1), multiplies by pixel area (900 m²), and converts to km². Visualization includes temporal grids, before/after comparisons, and change detection. Python code provided (Google Colab compatible).

Weather Data (March 2001-December 2024): Weather Underground (wunderground.com), Astana/UACC station, collected with a Selenium scraper. Monthly variables: max/avg/min temperatures, dew point, precipitation (with sum), snow depth (with sum), wind, gust wind, sea level pressure. Python script with retry logic and checkpoints. 285 monthly observations.

Residential Construction (January 2015-December 2024): Bureau of National Statistics Kazakhstan (taldau.stat.gov.kz). Variable: "Total area of commissioned residential buildings" in m², reported monthly for completed construction. Manually extracted from "Investment and Construction Statistics / Commissioning of Housing / Residential Buildings Information". 120 monthly observations.

INTERPRETATION: Water masks can be processed using the provided Python code (rasterio) or GIS software. Each pixel = 900 m². Water is detected where MNDWI > 0. Weather variables control for climate (precipitation inputs; evaporation via temperature, wind, and dew point). Construction represents urbanization pressure on groundwater. Account for seasonal lake variations.

Limitations: (1) summer satellite data only; (2) no direct groundwater measurements; (3) completed buildings only; (4) missing water policy data.

Applications: environmental impact assessment, urban planning, water management, climate adaptation, sustainable development.

Files: water mask GeoTIFFs (5 years), RGB composites (5 years), weather CSV (285 months), construction CSV (120 months), JavaScript code (GEE), Python analysis code.
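The MNDWI thresholding and pixel-count area calculation described above can be sketched in a few lines. This is a minimal illustration, not the dataset's own script: the function names (`mndwi`, `water_area_km2`, `percent_change`) and the 2x2 input arrays are hypothetical, and the bands are plain NumPy arrays rather than GeoTIFFs read with rasterio. It also assumes every pixel covers exactly 900 m², as the description states.

```python
import numpy as np

# Constants taken from the dataset description.
SCALE = 0.0000275      # Landsat Collection 2 Level-2 reflectance scale factor
OFFSET = -0.2          # reflectance offset
PIXEL_AREA_M2 = 900    # 30 m x 30 m pixel


def mndwi(green_dn, swir1_dn):
    """Return MNDWI = (Green - SWIR1) / (Green + SWIR1) from raw digital numbers."""
    green = green_dn.astype("float64") * SCALE + OFFSET
    swir1 = swir1_dn.astype("float64") * SCALE + OFFSET
    denom = green + swir1
    # Pixels with a zero denominator become NaN and never pass the threshold.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (green - swir1) / denom, np.nan)


def water_area_km2(water_mask):
    """Count pixels equal to 1 and convert the total to square kilometres."""
    return int(np.count_nonzero(water_mask == 1)) * PIXEL_AREA_M2 / 1_000_000


def percent_change(start, end):
    """Relative change from start to end, in percent."""
    return (end - start) / start * 100


# Synthetic 2x2 digital numbers (illustrative only, not real Landsat data).
green_dn = np.array([[12000, 9000], [8000, 20000]])
swir1_dn = np.array([[8000, 15000], [8000, 10000]])

water_mask = (mndwi(green_dn, swir1_dn) > 0).astype("uint8")  # threshold MNDWI > 0
example_area = water_area_km2(water_mask)

# Lake areas reported in the description: 2017 peak and 2024.
reported_change = percent_change(0.856, 0.260)
```

Applied to the two reported areas, `percent_change` gives a decline of about 69.6%, which matches the roughly 70% reduction quoted in the description.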

Steps to reproduce

1. LAKE SURFACE AREA DATA COLLECTION Platform: Google Earth Engine (earthengine.google.com). Data sources: Landsat 8 (LANDSAT/LC08/C02/T1_L2, 2013-present) and Landsat 9 (LANDSAT/LC09/C02/T1_L2, 2021-present). Processing: Filter by lake location and cloud cover under 15%. Select bands SR_B3 (Green), SR_B4 (Red), SR_B5 (NIR), SR_B6 (SWIR1). Apply scaling: multiply by 0.0000275, add -0.2. Create June-September median composites for 2015, 2017, 2020, 2022, 2024. Calculate MNDWI = (Green - SWIR1) / (Green + SWIR1). Water mask: MNDWI greater than 0. Export RGB composite, water mask, and MNDWI per year as GeoTIFF at 30 m resolution, EPSG:4326, covering the lake plus a 500 m buffer. JavaScript code provided. Files saved to folder "Taldykol_Lake_Analysis". Time: 10-15 min/year.

2. SURFACE AREA CALCULATION Software: Python 3.7+ with rasterio, matplotlib, pandas. Process: Load water mask GeoTIFFs from the Taldykol_Lake_Analysis folder. Count pixels where value = 1. Calculate area: water pixels x 900 m² / 1,000,000 = area in km². Generate visualizations: temporal grids, before/after comparisons, time series, change detection. Save to the Output folder. Export CSV with year, area, percentage. Python script provided. Time: 5 min.

3. WEATHER DATA COLLECTION Source: Weather Underground (wunderground.com/history/monthly/kz/astana/UACC) for the Astana/UACC station. Period: March 2001-December 2024 (285 months). Software: Python 3.7+ with selenium, pandas, webdriver_manager, psutil. Process: Selenium WebDriver automates scraping in Chrome. Monthly data: max/avg/min temperatures, dew point, wind, gust wind, sea level pressure, precipitation (with sum), snow depth (with sum). A checkpoint is saved every 5 months, the browser restarts automatically every 20 months, and the batch CSV files are combined at the end. Python script provided. Time: 4-6 hrs. Output: CSV, 285 records.

4. CONSTRUCTION DATA COLLECTION Source: Bureau of National Statistics Kazakhstan (taldau.stat.gov.kz/ru/NewIndex/GetIndex/701925?regionId=268012&periodId=7). Variable: Total area of commissioned residential buildings (square meters). Category: Investment and Construction Statistics, Commissioning of Housing, Residential Buildings. Period: January 2015-December 2024 (120 months).

5. DATA INTEGRATION Temporal alignment: lake area has 5 annual observations, weather 285 monthly, construction 120 monthly. Options: aggregate weather and construction to annual values, or interpolate lake area to monthly. Software: Python (pandas), R, or Stata. Analysis: correlate construction with lake depletion while controlling for climate variables.

Time: GEE exports 10-15 min/year, Python 5 min, weather scraping 4-6 hrs, statistics under 5 min.

The dataset includes an Output folder containing the Taldykol_Lake_Analysis subfolder with GeoTIFF files (water masks and RGB composites for 5 years), analysis visualizations, statistics CSV, weather CSV, construction CSV, and the JavaScript and Python code.
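The first alignment option in step 5 (aggregating the monthly series to annual values and joining them to the lake observations) might look like the sketch below. It is an illustration under stated assumptions rather than the dataset's integration script: the helper `to_annual`, all column names (`date`, `area_m2`, `precip_mm`, `temp_avg_c`, `lake_km2`), and the constant monthly values are hypothetical placeholders. Only the two lake areas (2017 and 2024) come from the description.

```python
import pandas as pd


def to_annual(monthly, date_col, agg):
    """Aggregate a monthly table to one row per calendar year."""
    out = monthly.copy()
    out["year"] = pd.to_datetime(out[date_col]).dt.year
    return out.groupby("year", as_index=False).agg(agg)


# Placeholder monthly series (constant values, NOT real observations).
months = pd.date_range("2017-01-01", "2024-12-01", freq="MS")
construction = pd.DataFrame({"date": months, "area_m2": 1000.0})
weather = pd.DataFrame({"date": months, "precip_mm": 10.0, "temp_avg_c": 5.0})

# Lake areas for the two years quoted in the description.
lake = pd.DataFrame({"year": [2017, 2024], "lake_km2": [0.856, 0.260]})

# Flows (commissioned area, precipitation) are summed; temperature is averaged.
annual_construction = to_annual(construction, "date", {"area_m2": "sum"})
annual_weather = to_annual(weather, "date", {"precip_mm": "sum", "temp_avg_c": "mean"})

# Left joins keep exactly the years that have a lake observation.
merged = (
    lake.merge(annual_construction, on="year", how="left")
        .merge(annual_weather, on="year", how="left")
)
```

Because the lake series has only five annual observations, a join of this kind yields at most five rows, which is worth keeping in mind when choosing between annual aggregation and monthly interpolation.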

Categories

Environmental Economics, Spatial Analysis, Central Asia, Urbanization, Kazakhstan, Environmental Economics of Transitional Economy, Urban Economics of Transitional Economy