Cleaned and validated Irish EPC database

Published: 14 February 2023 | Version 1 | DOI: 10.17632/5vnnbf8hd6.1
Kumar Raushan


The raw EPC database was filtered using Python and R; scripting made it possible to handle a large amount of data efficiently. The scripts are combinations of data manipulation steps, applied in series, which filter and change data based on user-specified criteria. In this case, the entire EPC database, which accounted for 1 million entries in quarter 1 of 2022, was used as the input. This data flow captures the relative frequency of each filter and the resulting number of EPC entries removed from the dataset as erroneous or as outliers. Approximately 30% of EPC entries are flagged and labeled as outliers, which gives an overview of the data quality of the EPC database. As discussed earlier in section 3.7, the issue is not confined to Ireland but is widespread across Member States (MSs). The most striking finding was that many features which play a pivotal part in the overall performance of a dwelling, and could raise or lower its energy rating, are plagued with poor data. One example is Living Area Percentage, which assigns the areas within a dwelling assumed to be heated to 21 °C and 18 °C. Getting this right is crucial for estimating the representative theoretical energy performance of dwellings. Ceiling height data is also of poor quality, making it unrepresentative of dwellings’ geometry, which again results in inaccurate overall energy performance.
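The idea of filters applied in series, with a count of the entries each one removes, can be illustrated with a minimal Python sketch. This is not the published cleaning script: the field names (`living_area_pct`, `ceiling_height_m`), the filter names, and the thresholds are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
# Each filter is a (name, keep-predicate) pair; a record is kept when the predicate is True.
Filter = Tuple[str, Callable[[Record], bool]]

# Hypothetical filters and thresholds, NOT the criteria used for the published dataset.
FILTERS: List[Filter] = [
    ("living_area_pct_in_range", lambda r: 0 < r["living_area_pct"] <= 100),
    ("ceiling_height_plausible", lambda r: 2.0 <= r["ceiling_height_m"] <= 4.0),
]


def apply_filters(
    records: List[Record], filters: List[Filter]
) -> Tuple[List[Record], Dict[str, int]]:
    """Apply the filters in series; return surviving records and removals per filter."""
    removed: Dict[str, int] = {}
    kept = list(records)
    for name, keep in filters:
        survivors = [r for r in kept if keep(r)]
        removed[name] = len(kept) - len(survivors)  # entries dropped by this step
        kept = survivors
    return kept, removed


# Tiny made-up sample standing in for EPC entries.
sample: List[Record] = [
    {"living_area_pct": 25.0, "ceiling_height_m": 2.4},   # passes both filters
    {"living_area_pct": 0.0, "ceiling_height_m": 2.4},    # fails the first filter
    {"living_area_pct": 30.0, "ceiling_height_m": 0.3},   # fails the second filter
    {"living_area_pct": 150.0, "ceiling_height_m": 9.0},  # fails the first filter
]

cleaned, removed = apply_filters(sample, FILTERS)
flagged_share = 1 - len(cleaned) / len(sample)  # share of entries flagged as outliers
print(removed, flagged_share)
```

Because the filters run in series, an entry that violates several criteria is counted only against the first filter that removes it, so the per-filter counts depend on the order of the filters.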



Energy Efficiency, Energy Efficiency Certificate, Built Environment


Science Foundation Ireland