Python code for the estimation of missing prices in real-estate market with a dataset of house prices from the center of Teruel city

Published: 17 September 2017 | Version 1 | DOI: 10.17632/mxpgf54czz.1
Contributors:
Iván García-Magariño

Description

This research data file contains the software and the dataset needed to estimate the missing prices of house units. The approach combines several machine learning techniques (linear regression, support vector regression, k-nearest neighbors and a multi-layer perceptron neural network) with several dimensionality reduction techniques (non-negative factorization, recursive feature elimination and feature selection with a variance threshold). The input dataset consists of the house prices available for the center of Teruel city (Spain) on December 30, 2016 on the Idealista website. This dataset supports the authors' research on improving the setup of agent-based simulations of the real-estate market. The work on this dataset has been submitted to a scientific journal for consideration for publication.

The open-source Python code consists of all the files with the “.py” extension. The main program is run from the “main.py” file. The “boxplotErrors.eps” file is a chart generated by running the code; it compares the results of the different combinations of machine learning techniques and dimensionality reduction methods.

The dataset is in the “data” folder. The raw input data of the house prices are in the “dataRaw.csv” file. These were shuffled into the “dataShuffled.csv” file. We used cross-validation to obtain the estimates of house prices. The estimates are stored alongside the real values in separate files in the “data” folder; each filename is composed of the abbreviation of the machine learning technique and the abbreviation of the dimensionality reduction method.
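To illustrate how one such combination might work, the following is a minimal sketch that pairs k-nearest neighbors with variance-threshold feature selection and obtains every estimate through k-fold cross-validation. It is not the code from the archive: it uses only the Python standard library, and the function names, the synthetic data (area, rooms and a constant column), the number of folds and the value of k are all assumptions made for illustration.

```python
import random
import statistics


def variance_threshold(rows, threshold=0.0):
    """Return indices of feature columns whose variance exceeds threshold."""
    n_features = len(rows[0])
    return [
        j for j in range(n_features)
        if statistics.pvariance([r[j] for r in rows]) > threshold
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Average the prices of the k training rows closest to query."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_val_estimates(features, prices, n_folds=5, k=3):
    """Estimate every price from the other folds (k-fold cross-validation)."""
    estimates = [None] * len(prices)
    for fold in range(n_folds):
        test_idx = [i for i in range(len(prices)) if i % n_folds == fold]
        train_idx = [i for i in range(len(prices)) if i % n_folds != fold]
        train_rows = [features[i] for i in train_idx]
        # Feature selection is fitted on the training fold only.
        keep = variance_threshold(train_rows)
        train_x = [[features[i][j] for j in keep] for i in train_idx]
        train_y = [prices[i] for i in train_idx]
        for i in test_idx:
            query = [features[i][j] for j in keep]
            estimates[i] = knn_predict(train_x, train_y, query, k)
    return estimates


# Hypothetical stand-in data: (area in m2, rooms, constant column) -> price.
# The rows are generated in random order, so they are already shuffled.
rng = random.Random(0)
features = [[rng.uniform(40, 150), rng.randint(1, 5), 1.0] for _ in range(50)]
prices = [1000.0 * area + 5000.0 * rooms for area, rooms, _ in features]

estimates = cross_val_estimates(features, prices)
mean_abs_error = statistics.fmean(abs(e - p) for e, p in zip(estimates, prices))
print(f"mean absolute error: {mean_abs_error:.0f}")
```

The constant third column has zero variance, so the threshold step drops it before the distance computation. The features are left unscaled here for brevity, so the area dominates the distance; a real setup would likely normalize them first.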

Steps to reproduce

Execute “main.py” with Python from within the folder extracted from the zip file. The code has been tested with Python 3.5.2, but it may work with other versions.

Categories

Software, Machine Learning, Dimensionality Reduction, Software Agent, Big Data, Agent-Based Modeling, Multi-Agent Systems, Housing Market

Licence