A Review of Big Data and Machine Learning Operations in Official Statistics: MLOps and Feature Store Adoption

Published: 13 May 2024| Version 1 | DOI: 10.17632/96mxz7jvkr.1
Carlos Nunes, Afshin Ashofteh


Integrating machine learning (ML) into the official statisticians' toolset is gaining popularity as National Statistical Offices (NSOs) strive to improve their methodologies. This trend poses new challenges and implications for incorporating innovative techniques that ensure the reliability of the official statistical production process. A comprehensive literature review was conducted using Scopus and Web of Science databases to explore the contemporary applications of data science in official statistics. A total of 178 research articles were identified, focusing on areas such as big data, machine learning, and data quality. While the literature review revealed extensive proposals on utilizing alternative data and applying machine learning techniques to support official statistics production, it also identified research gaps in the post-training steps of the machine learning process. Areas requiring further investigation include machine learning operations in a production environment, data quality assurance, and governance.



Machine Learning, Big Data