Housing submarket segmentation house price dataset. A typology-based case study in Madrid

Published: 20 March 2023 | Version 1 | DOI: 10.17632/ksypsm2zh3.1
David Blanco, Juan Ramón Selva-Royo


This dataset contains a series of house sale price metrics at neighborhood level for the city of Madrid, derived from the Spanish real estate listings portal www.idealista.com. The objective of the related research is to demonstrate that segmenting administrative divisions improves the significance of price parameters, such as average price.

We defined 8 typomorphological groups using the K-Means algorithm:

Compact fabric (historic center)
Consolidated fabric (old neighborhoods)
Compact block (expansion)
Open block (new extensions)
Open building 1 (compact)
Open building 2 (fine grain)
Single-family home 1 (medium intensity)
Single-family home 2 (low intensity)

Data is delivered at neighborhood level, with the following variable set:

DISTRICTNAME: District name
NEIGHBORHOODNAME: Neighborhood name
CLUSTER: Cluster name (typomorphological group, or 'Complete neighborhood')
N: Number of ads in the stratum
AVG_UNITPRICE: Average unit price in eur/sqm (geometric mean)
Q1_UNITPRICE: 1st quartile of unit price in eur/sqm
Q3_UNITPRICE: 3rd quartile of unit price in eur/sqm
IQR: Interquartile range
CV: Coefficient of variation of the stratum

We include data split by typomorphological group and for each neighborhood as a whole (rows where CLUSTER contains the literal 'Complete neighborhood'). Polygons are also included, in EPSG:4326. The data correspond to ads published during 2018.
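As an illustration of how the per-stratum variables relate to one another, the following minimal sketch computes them from a list of unit prices. It is not the authors' code: the function name `stratum_metrics`, the sample prices, the quartile interpolation method, and the definition of CV as sample standard deviation divided by the arithmetic mean are all assumptions, since the dataset description does not specify them.

```python
from statistics import geometric_mean, mean, quantiles, stdev


def stratum_metrics(unit_prices):
    """Summarise unit prices (eur/sqm) for one neighborhood/cluster stratum.

    Assumptions (not stated in the dataset description):
    - quartiles use the 'inclusive' interpolation method;
    - CV = sample standard deviation / arithmetic mean.
    """
    q1, _median, q3 = quantiles(unit_prices, n=4, method="inclusive")
    return {
        "N": len(unit_prices),
        "AVG_UNITPRICE": geometric_mean(unit_prices),
        "Q1_UNITPRICE": q1,
        "Q3_UNITPRICE": q3,
        "IQR": q3 - q1,
        "CV": stdev(unit_prices) / mean(unit_prices),
    }


# Made-up unit prices, for illustration only.
sample = [2000.0, 2500.0, 3000.0, 3500.0, 4000.0]
metrics = stratum_metrics(sample)
print(metrics)
```

The geometric mean is never larger than the arithmetic mean, so AVG_UNITPRICE is less affected by a few very expensive listings than a plain average would be.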


Steps to reproduce

Data is obtained by averaging prices by neighborhood and typomorphological cluster. The clusters are obtained by applying a clustering algorithm (K-Means) to urban features.
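The averaging step can be sketched as follows, assuming each ad has already been assigned a typomorphological cluster. The record layout, the placeholder neighborhood names, the prices, and the `aggregate` helper are hypothetical; only the use of the geometric mean and the 'Complete neighborhood' literal come from the dataset description.

```python
from collections import defaultdict
from statistics import geometric_mean

# Hypothetical ad records: (neighborhood, cluster, unit price in eur/sqm).
ads = [
    ("Neighborhood A", "Compact fabric (historic center)", 4000.0),
    ("Neighborhood A", "Compact fabric (historic center)", 5000.0),
    ("Neighborhood A", "Compact block (expansion)", 3000.0),
    ("Neighborhood B", "Compact block (expansion)", 6000.0),
]


def aggregate(records):
    """Group prices by (neighborhood, cluster) and by whole neighborhood."""
    groups = defaultdict(list)
    for neighborhood, cluster, price in records:
        # One stratum per typomorphological cluster within the neighborhood.
        groups[(neighborhood, cluster)].append(price)
        # One extra stratum covering every ad in the neighborhood.
        groups[(neighborhood, "Complete neighborhood")].append(price)
    return {
        key: {"N": len(prices), "AVG_UNITPRICE": geometric_mean(prices)}
        for key, prices in sorted(groups.items())
    }


rows = aggregate(ads)
for key, values in rows.items():
    print(key, values)
```

Each ad contributes to two rows: its own cluster stratum and the 'Complete neighborhood' stratum, which mirrors how the delivered table mixes both levels.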


Universidad Nacional de Educacion a Distancia - Campus de Senda del Rey


Real Estate Economics, Real Estate Market, Real Estate Sector